Yasunori Goto [Tue, 27 Jun 2006 09:53:34 +0000 (02:53 -0700)]
[PATCH] pgdat allocation for new node add (call pgdat allocation)
Add node-hot-add support to add_memory().
node hotadd uses this sequence.
1. allocate pgdat.
2. refresh NODE_DATA()
3. call free_area_init_node() to initialize
4. create sysfs entry
5. add memory (old add_memory())
6. set node online
7. run kswapd for new node.
(8). update zonelist after pages are onlined. (This is already merged in -mm
due to update phase is difference.)
Note:
To make common function as much as possible,
there is 2 changes from v2.
- The old add_memory(), which is defiend by each archs,
is renamed to arch_add_memory(). New add_memory becomes
caller of arch dependent function as a common code.
- This patch changes add_memory()'s interface
From: add_memory(start, end)
TO : add_memory(nid, start, end).
It was cause of similar code that finding node id from
physical address is inside of old add_memory() on each arch.
In addition, acpi memory hotplug driver can find node id easier.
In v2, it must walk DSDT'S _CRS by matching physical address to
get the handle of its memory device, then get _PXM and node id.
Because input is just physical address.
However, in v3, the acpi driver can use handle to get _PXM and node id
for the new memory device. It can pass just node id to add_memory().
Fix interface of arch_add_memory() is in next patche.
Yasunori Goto [Tue, 27 Jun 2006 09:53:33 +0000 (02:53 -0700)]
[PATCH] pgdat allocation for new node add (refresh node_data[])
Refresh NODE_DATA() for generic archs. In this case, NODE_DATA(nid) ==
node_data[nid]. node_data[] is array of address of pgdat. So, refresh is
quite simple.
Yasunori Goto [Tue, 27 Jun 2006 09:53:32 +0000 (02:53 -0700)]
[PATCH] pgdat allocation for new node add (generic alloc node_data)
For node hotplug, basically we have to allocate new pgdat. But, there are
several types of implementations of pgdat.
1. Allocate only pgdat.
This style allocate only pgdat area.
And its address is recorded in node_data[].
It is most popular style.
2. Static array of pgdat
In this case, all of pgdats are static array.
Some archs use this style.
3. Allocate not only pgdat, but also per node data.
To increase performance, each node has copy of some data as
a per node data. So, this area must be allocated too.
Ia64 is this style. Ia64 has the copies of node_data[] array
on each per node data to increase performance.
In this series of patches, treat (1) as generic arch.
generic archs can use generic function. (2) and (3) should have
its own if necessary.
This patch defines pgdat allocator.
Updating NODE_DATA() macro function is in other patch.
Yasunori Goto [Tue, 27 Jun 2006 09:53:31 +0000 (02:53 -0700)]
[PATCH] pgdat allocation for new node add (get node id by acpi)
This is to find node id from acpi's handle of memory_device in DSDT. _PXM for
the new node can be found by acpi_get_pxm() by using new memory's handle. So,
node id can be found by pxm_to_nid_map[].
This patch becomes simpler than v2 of node hot-add patch.
Because old add_memory() function doesn't have node id parameter.
So, kernel must find its handle by physical address via DSDT again.
But, v3 just give node id to add_memory() now.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Yasunori Goto [Tue, 27 Jun 2006 09:53:30 +0000 (02:53 -0700)]
[PATCH] pgdat allocation for new node add (specify node id)
Change the name of old add_memory() to arch_add_memory. And use node id to
get pgdat for the node at NODE_DATA().
Note: Powerpc's old add_memory() is defined as __devinit. However,
add_memory() is usually called only after bootup.
I suppose it may be redundant. But, I'm not well known about powerpc.
So, I keep it. (But, __meminit is better at least.)
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Yasunori Goto [Tue, 27 Jun 2006 09:53:29 +0000 (02:53 -0700)]
[PATCH] Catch notification of memory add event of ACPI via container driver. (avoid redundant call add_memory)
When acpi_memory_device_init() is called at boottime to register struct
memory acpi_memory_device, acpi_bus_add() are called via
acpi_driver_attach().
But it also calls ops->start() function. It is called even if the memory
blocks are initialized at early boottime. In this case add_memory() return
-EEXIST, and the memory blocks becomes INVALID state even if it is normal.
This is patch to avoid calling add_memory() for already available memory.
Yasunori Goto [Tue, 27 Jun 2006 09:53:28 +0000 (02:53 -0700)]
[PATCH] Catch notification of memory add event of ACPI via container driver. (register start func for memory device)
This is a patch to call add_memroy() when notify reaches for new node's add
event.
When new node is added, notify of ACPI reaches container device which means
the node.
Container device driver calls acpi_bus_scan() to find and add belonging
devices (which means cpu, memory and so on). Its function calls add and
start function of belonging devices's driver.
Howevever, current memory hotplug driver just register add function to
create sysfs file for its memory. But, acpi_memory_enable_device() is not
called because it is considered just the case that notify reaches memory
device directly. So, if notify reaches container device nothing can call
add_memory().
This is a patch to create start function which calls add_memory().
add_memory() can be called by this when notify reaches container device.
[PATCH] acpi memory hotplug cannot manage _CRS with plural resoureces
Current acpi memory hotplug just looks into the first entry of resources in
_CRS. But, _CRS can contain plural resources. So, if _CRS contains plural
resoureces, acpi memory hot add cannot add all memory.
With this patch, acpi memory hotplug can deal with Memory Device, whose
_CRS contains plural resources.
Tested on ia64 memory hotplug test envrionment (not emulation, uses alpha
version firmware which supports dynamic reconfiguration of NUMA.)
Note: Microsoft's Windows Server 2003 requires big (>4G)resoureces to be
divided into small (<4G) resources. looks crazy, but not invalid.
(See http://www.microsoft.com/whdc/system/pnppwr/hotadd/hotaddmem.mspx)
For this reason, a firmware vendor who supports Windows writes plural
resources in a _CRS even if they are contiguous.
Randy Dunlap [Tue, 27 Jun 2006 09:53:26 +0000 (02:53 -0700)]
[PATCH] zlib inflate: fix function definitions
Fix function definitions to be ANSI-compliant:
lib/zlib_inflate/inffast.c:68:1: warning: non-ANSI definition of function 'inflate_fast'
lib/zlib_inflate/inftrees.c:33:1: warning: non-ANSI definition of function 'zlib_inflate_table'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Brownell [Tue, 27 Jun 2006 19:59:15 +0000 (12:59 -0700)]
[PATCH] fix static linking of NFS
Builds on ARM report link problems with common configurations like
statically linked NFS (for nfsroot). The symptom is that __init
section code references __exit section code; that won't work since
the exit sections are discarded (since they can never be called).
The best fix for these particular cases would be an "__init_or_exit"
section annotation.
Dmitry Torokhov [Tue, 27 Jun 2006 12:30:31 +0000 (08:30 -0400)]
Input: fix resetting name, phys and uniq when unregistering device
It should be done before calling class_device_unregister() because
it will destroy the device and free memory if there are no other
references to the device.
Thanks to Daniel Ritz and Michal Piotrowski for noticing the problem.
Daniel says:
"[The] reason is a recent change that made modules always shows as
module.mod. it breaks modprobe and probably many scripts..besides
lsmod looking horrible
stuff like this in modprobe.conf:
install pcmcia_core /sbin/modprobe --ignore-install pcmcia_core; /sbin/modprobe pcmcia
makes modprobe fork/exec endlessly calling itself...until oom
interrupts it"
Linus Torvalds [Mon, 26 Jun 2006 23:06:08 +0000 (16:06 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (56 commits)
[PATCH] fs/ocfs2/dlm/: cleanups
ocfs2: fix compiler warnings in dlm_convert_lock_handler()
ocfs2: dlm_print_one_mle() needs to be defined
ocfs2: remove whitespace in dlmunlock.c
ocfs2: move dlm work to a private work queue
ocfs2: fix incorrect error returns
ocfs2: tune down some noisy messages during dlm recovery
ocfs2: display message before waiting for recovery to complete
ocfs2: mlog in dlm_convert_lock_handler() should be ML_ERROR
ocfs2: retry operations when a lock is marked in recovery
ocfs2: use cond_resched() in dlm_thread()
ocfs2: use GFP_NOFS in some dlm operations
ocfs2: wait for recovery when starting lock mastery
ocfs2: continue recovery when a dead node is encountered
ocfs2: remove unneccesary spin_unlock() in dlm_remaster_locks()
ocfs2: dlm_remaster_locks() should never exit without completing
ocfs2: special case recovery lock in dlmlock_remote()
ocfs2: pending mastery asserts and migrations should block each other
ocfs2: temporarily disable automatic lock migration
ocfs2: do not unconditionally purge the lockres in dlmlock_remote()
...
Kurt Hackel [Mon, 1 May 2006 21:29:28 +0000 (14:29 -0700)]
ocfs2: retry operations when a lock is marked in recovery
Before checking for a nonexistent lock, make sure the lockres is not marked
RECOVERING. The caller will just retry and the state should be fixed up when
recovery completes.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Kurt Hackel [Mon, 1 May 2006 20:49:20 +0000 (13:49 -0700)]
ocfs2: dlm_remaster_locks() should never exit without completing
We cannot restart recovery. Once we begin to recover a node, keep the state
of the recovery intact and follow through, regardless of any other node
deaths that may occur.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Kurt Hackel [Mon, 1 May 2006 20:47:50 +0000 (13:47 -0700)]
ocfs2: special case recovery lock in dlmlock_remote()
If the previous master of the recovery lock dies, let calc_usage take it
down completely and let the caller completely redo the dlmlock() call.
Otherwise, there will never be an opportunity to re-master the lockres and
recovery wont be able to progress.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Kurt Hackel [Mon, 1 May 2006 20:32:27 +0000 (13:32 -0700)]
ocfs2: pending mastery asserts and migrations should block each other
Use the existing structure for blocking migrations when ASTs are pending to
achieve the same result. If we can catch the assert before it goes on the
wire, just cancel it and let the migration continue.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Now we never change the owner of a lock resource until unmount or node
death. This will be re-enabled once some issues in the algorithm used have
been resolved.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Kurt Hackel [Mon, 1 May 2006 20:27:10 +0000 (13:27 -0700)]
ocfs2: do not unconditionally purge the lockres in dlmlock_remote()
In dlmlock_remote(), do not call purge_lockres until the lock resource
actually changes. otherwise, the mastery info on the lockres will go away
underneath the caller.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Kurt Hackel [Mon, 1 May 2006 19:02:07 +0000 (12:02 -0700)]
ocfs2: increase backoff before waiting for recovery
When mastering non-recovery lock resources, additional time was frequently
needed to allow the disk heartbeat to catch up with the network timeout. the
recovery lock resource is time critical and avoids this path.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Joel Becker [Fri, 17 Mar 2006 01:40:37 +0000 (17:40 -0800)]
[PATCH] ocfs2: Alloc at least a page for the DLM hash
The OCFS2 DLM allocates a number of pages for a hash to lookup locks.
There was a bug where a PAGE_SIZE bigger than the hash size (eg, 64K
pages) would result in zero pages allocated.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
typo fixes
Clean up 'inline is not at beginning' warnings for usb storage
Storage class should be first
i386: Trivial typo fixes
ixj: make ixj_set_tone_off() static
spelling fixes
fix paniced->panicked typos
Spelling fixes for Documentation/atomic_ops.txt
move acknowledgment for Mark Adler to CREDITS
remove the bouncing email address of David Campbell
Karsten Keil [Mon, 26 Jun 2006 18:21:01 +0000 (20:21 +0200)]
[PATCH] fix processing of the last byte in isdn_readbchan_tty()
The changes in the tty handling contain a bug while accessing
the last byte in the skb. Since special sequence for control of
DTMF and FAX via ttyI* devices handled via this path, these services
do not work anymore.
"It seems too little tested: "losetup -d /dev/loop0" fails with
EINVAL because nothing sets lo_thread; but even when you patch
loop_thread() to set lo->lo_thread = current, it can't survive
more than a few dozen iterations of the loop below (with a tmpfs
mounted on /tst):
j=0
cp /dev/zero /tst
while :
do
let j=j+1
echo "Doing pass $j"
losetup /dev/loop0 /tst/zero
mkfs -t ext2 -b 1024 /dev/loop0 >/dev/null 2>&1
mount -t ext2 /dev/loop0 /mnt
umount /mnt
losetup -d /dev/loop0
done
it collapses with failed ioctl then BUG_ON(!bio).
I think the original lo_done completion was more subtle and safe
than the kthread conversion has allowed for."
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (40 commits)
kbuild: trivial fixes in Makefile
kbuild: adding symbols in Kconfig and defconfig to TAGS
kbuild: replace abort() with exit(1)
kbuild: support for %.symtypes files
kbuild: fix silentoldconfig recursion
kbuild: add option for stripping modules while installing them
kbuild: kill some false positives from modpost
kbuild: export-symbol usage report generator
kbuild: fix make -rR breakage
kbuild: append -dirty for updated but uncommited changes
kbuild: append git revision for all untagged commits
kbuild: fix module.symvers parsing in modpost
kbuild: ignore make's built-in rules & variables
kbuild: bugfix with initramfs
kbuild: modpost build fix
kbuild: check license compatibility when building modules
kbuild: export-type enhancement to modpost.c
kbuild: add dependency on kernel.release to the package targets
kbuild: `make kernelrelease' speedup
kconfig: KCONFIG_OVERWRITECONFIG
...
Greg Ungerer [Mon, 26 Jun 2006 06:33:15 +0000 (16:33 +1000)]
[PATCH] m68knommu: use configurable RAM setup page_offset.h
Remove board specific base RAM conditionals from page_offset.h
With the Kconfig time configurable RAM setup none of this is required.
It is all based on the Kconfig (CONFIG_RAMBASE) option now.
Greg Ungerer [Mon, 26 Jun 2006 06:33:05 +0000 (16:33 +1000)]
[PATCH] m68knommu: use configurable RAM setup in linker script
Remove the fixed RAM configurations for each board type from the
linker script. Replace with simple defines usng the flexible RAM
configuration options. This cleans out of lot of board specific
munging of addresses.
Greg Ungerer [Mon, 26 Jun 2006 06:32:59 +0000 (16:32 +1000)]
[PATCH] m68knommu: create configurable RAM setup
Reworked the way RAM regions are defined. Instead of coding all the
variations for each board type we now just configure RAM base and size
in the usual Kconfig setup. This much simplifies the code, and makes it
a lot more flexible when setting up new boards or board varients.
Greg Ungerer [Mon, 26 Jun 2006 06:34:09 +0000 (16:34 +1000)]
[PATCH] m68knommu: remove unused vars from generic 68328 start code
Clean out unused variable definitions from 68328 start up code.
Also use a more appropriate start address for the case of relocating
the kernel code to RAM (from ROM/flash).
Greg Ungerer [Mon, 26 Jun 2006 06:34:04 +0000 (16:34 +1000)]
[PATCH] m68knommu: remove __ramvec from 68328/pilot start code
__ramvec has been removed from the linker script. The vector base
address is defined as a configurable option, use that. Remove its
use from the 68328/pilot startup code.
* master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
V4L/DVB (4227): Update this driver for recent header file movement.
V4L/DVB (4223): Add V4L2_CID_MPEG_STREAM_VBI_FMT control
V4L/DVB (4222): Always switch tuner mode when calling VIDIOC_S_FREQUENCY.
V4L/DVB (4221): Add HM12 YUV format define.
V4L/DVB (4219): Av7110: analog sound output of DVB-C rev 2.3
V4L/DVB (4217): Fix a misplaced closing bracket/else, which caused swzigzag not to be called
V4L/DVB (4215): Make VIDEO_CX88_BLACKBIRD a separate build option
V4L/DVB (4214): Make VIDEO_CX2341X a selectable build option
V4L/DVB (4213): Cx88: cleanups
V4L/DVB (4211): Fix an Oops for all fe that have get_frontend_algo == NULL
Take two, now without spurious whitespace :( Applies to git & 2.6.17-rc6
CONFIG_DEBUG_STACKOVERFLOW existed for x86_64 in 2.4, but seems to have gone AWOL in 2.6.
I've pretty much just copied this over from the 2.4 code, with
appropriate tweaks for the 2.6 kernel, plus a bugfix. I'd personally
rather see it printed out the way other arches do it, i.e.
bytes-remaining-until-overflow, rather than having to do the subtraction
yourself. Also, only 128 bytes remaining seems awfully late to issue a
warning. But I'll start here :)
Christian Kujau [Mon, 26 Jun 2006 12:00:02 +0000 (14:00 +0200)]
[PATCH] x86_64: msi_apic.c build fix
CC drivers/pci/msi-apic.o
In file included from include/asm/msi.h:11,
from drivers/pci/msi.h:71,
from drivers/pci/msi-apic.c:8:
include/asm/smp.h:103: error: syntax error before '->' token
akpm: nasty. It appears to be
static inline unsigned int cpu_mask_to_apicid(cpumask_t cpumask)
[PATCH] x86_64: i386/x86-64 Add nmi watchdog support for new Intel CPUs
Intel now has support for Architectural Performance Monitoring Counters
( Refer to IA-32 Intel Architecture Software Developer's Manual
http://www.intel.com/design/pentium4/manuals/253669.htm ). This
feature is present starting from Intel Core Duo and Intel Core Solo processors.
What this means is, the performance monitoring counters and some performance
monitoring events are now defined in an architectural way (using cpuid).
And there will be no need to check for family/model etc for these architectural
events.
Below is the patch to use this performance counters in nmi watchdog driver.
Patch handles both i386 and x86-64 kernels.
Keith Owens [Mon, 26 Jun 2006 11:59:56 +0000 (13:59 +0200)]
[PATCH] x86_64: Avoid broadcasting NMI IPIs
On some i386/x86_64 systems, sending an NMI IPI as a broadcast will
reset the system. This seems to be a BIOS bug which affects machines
where one or more cpus are not under OS control. It occurs on HT
systems with a version of the OS that is not compiled without HT
support. It also occurs when a system is booted with max_cpus=n where
2 <= n < cpus known to the BIOS. The fix is to always send NMI IPI as
a mask instead of as a broadcast.
Siddha, Suresh B [Mon, 26 Jun 2006 11:59:53 +0000 (13:59 +0200)]
[PATCH] x86_64: fix apic error on bootup
Appended patch fixes the "APIC error on CPUX: 00(40)" observed during bootup.
From SDM Vol-3A "Valid Interrupt Vectors" section:
"When an illegal vector value (0-15) is written to an LVT entry
and the delivery mode is Fixed, the APIC may signal an illegal
vector error, with out regard to whether the mask bit is set
or whether an interrupt is actually seen on input."
Chuck Ebbert [Mon, 26 Jun 2006 11:59:50 +0000 (13:59 +0200)]
[PATCH] x86_64: enlarge window for stack growth
Allow stack growth so the 'enter' instruction works. Also
fixes problem in compat_sys_kexec_load() which could allocate
more than 128 bytes using compat_alloc_user_space().