H. Peter Anvin [Tue, 1 Dec 2009 05:33:51 +0000 (21:33 -0800)]
x86, mm: Correct the implementation of is_untracked_pat_range()
The semantics the PAT code expect of is_untracked_pat_range() is "is
this range completely contained inside the untracked region." This
means that checkin 8a27138924f64d2f30c1022f909f74480046bc3f was
technically wrong, because the implementation needlessly confusing.
The sane interface is for it to take a semiclosed range like just
about everything else (as evidenced by the sheer number of "- 1"'s
removed by that patch) so change the actual implementation to match.
Reported-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jack Steiner <steiner@sgi.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
Yinghai Lu [Tue, 24 Nov 2009 10:46:59 +0000 (02:46 -0800)]
x86, mtrr: Fix sorting of mtrr after subtracting
In some cases we can coalesce MTRR entries after cleanup; this may
allow us to have more entries. As such, introduce clean_sort_range to
to sort and coaelsce the MTRR entries.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B0BB9A3.5020908@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Tue, 24 Nov 2009 10:48:18 +0000 (02:48 -0800)]
x86: Move find_smp_config() earlier and avoid bootmem usage
Move the find_smp_config() call to before bootmem is initialized.
Use reserve_early() instead of reserve_bootmem() in it.
This simplifies the code, we only need to call find_smp_config()
once and can remove the now unneeded reserve parameter from
x86_init_mpparse::find_smp_config.
We thus also reduce x86's dependency on bootmem allocations.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B0BB9F2.70907@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
H. Peter Anvin [Mon, 23 Nov 2009 22:46:07 +0000 (14:46 -0800)]
x86, platform: Change is_untracked_pat_range() to bool; cleanup init
- Change is_untracked_pat_range() to return bool.
- Clean up the initialization of is_untracked_pat_range() -- by default,
we simply point it at is_ISA_range() directly.
- Move is_untracked_pat_range to the end of struct x86_platform, since
it is the newest field.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
H. Peter Anvin [Mon, 23 Nov 2009 22:44:39 +0000 (14:44 -0800)]
x86: Change is_ISA_range() into an inline function
Change is_ISA_range() from a macro to an inline function. This makes
it type safe, and also allows it to be assigned to a function pointer
if necessary.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <20091119202341.GA4420@sgi.com>
H. Peter Anvin [Mon, 23 Nov 2009 22:49:20 +0000 (14:49 -0800)]
x86, mm: is_untracked_pat_range() takes a normal semiclosed range
is_untracked_pat_range() -- like its components, is_ISA_range() and
is_GRU_range(), takes a normal semiclosed interval (>=, <) whereas the
PAT code called it as if it took a closed range (>=, <=). Fix.
Although this is a bug, I believe it is non-manifest, simply because
none of the callers will call this with non-page-aligned addresses.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
H. Peter Anvin [Mon, 23 Nov 2009 23:12:07 +0000 (15:12 -0800)]
x86, mm: Call is_untracked_pat_range() rather than is_ISA_range()
Checkin fd12a0d69aee6d90fa9b9890db24368a897f8423 made the PAT
untracked range a platform configurable, but missed on occurrence of
is_ISA_range() which still refers to PAT-untracked memory, and
therefore should be using the configurable.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
Jack Steiner [Thu, 19 Nov 2009 20:23:41 +0000 (14:23 -0600)]
x86: UV SGI: Don't track GRU space in PAT
GRU space is always mapped as WB in the page table. There is
no need to track the mappings in the PAT. This also eliminates
the "freeing invalid memtype" messages when the GRU space is
unmapped.
Yinghai Lu [Sat, 21 Nov 2009 08:23:37 +0000 (00:23 -0800)]
x86, numa: Use near(er) online node instead of roundrobin for NUMA
CPU to node mapping is set via the following sequence:
1. numa_init_array(): Set up roundrobin from cpu to online node
2. init_cpu_to_node(): Set that according to apicid_to_node[]
according to srat only handle the node that
is online, and leave other cpu on node
without ram (aka not online) to still
roundrobin.
3. later call srat_detect_node for Intel/AMD, will use first_online
node or nearby node.
Problem is that setup_per_cpu_areas() is not called between 2 and 3,
the per_cpu for cpu on node with ram is on different node, and could
put that on node with two hops away.
So try to optimize this and add find_near_online_node() and call
init_cpu_to_node().
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: David Rientjes <rientjes@google.com> Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4B07A739.3030104@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu [Sat, 21 Nov 2009 08:23:37 +0000 (00:23 -0800)]
x86, numa, bootmem: Only free bootmem on NUMA failure path
In the NUMA bootmem setup failure path we freed nodedata_phys
incorrectly.
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: David Rientjes <rientjes@google.com> Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4B07A739.3030104@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu [Mon, 23 Nov 2009 01:18:49 +0000 (17:18 -0800)]
x86: Change crash kernel to reserve via reserve_early()
use find_e820_area()/reserve_early() instead.
-v2: address Eric's request, to restore original semantics.
will fail, if the provided address can not be used.
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <4B09E2F9.7040403@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jan Beulich [Fri, 13 Nov 2009 11:54:40 +0000 (11:54 +0000)]
x86: Eliminate redundant/contradicting cache line size config options
Rather than having X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
(with inconsistent defaults), just having the latter suffices as
the former can be easily calculated from it.
To be consistent, also change X86_INTERNODE_CACHE_BYTES to
X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA
to account for last level cache line size (which here matters
more than L1 cache line size).
Finally, make sure the default value for X86_L1_CACHE_SHIFT,
when X86_GENERIC is selected, is being seen before that for the
individual CPU model options (other than on x86-64, where
GENERIC_CPU is part of the choice construct, X86_GENERIC is a
separate option on ix86).
Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Ravikiran Thirumalai <kiran@scalex86.org> Acked-by: Nick Piggin <npiggin@suse.de>
LKML-Reference: <4AFD5710020000780001F8F0@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu [Tue, 17 Nov 2009 07:04:56 +0000 (23:04 -0800)]
x86: When cleaning MTRRs, do not fold WP into UC
The current MTRR code treats WP as a form of UC. This really isn't
desirable behaviour, except possibly in the case of severe MTRR
shortage. Disable this, to allow legitimate uses of WP to remain
unmolested.
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Fri, 13 Nov 2009 23:28:17 +0000 (15:28 -0800)]
x86, mm: Report state of NX protections during boot
It is possible for x86_64 systems to lack the NX bit either due to the
hardware lacking support or the BIOS having turned off the CPU capability,
so NX status should be reported. Additionally, anyone booting NX-capable
CPUs in 32bit mode without PAE will lack NX functionality, so this change
provides feedback for that case as well.
Signed-off-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1258154897-6770-6-git-send-email-hpa@zytor.com>
H. Peter Anvin [Fri, 13 Nov 2009 23:28:16 +0000 (15:28 -0800)]
x86, mm: Clean up and simplify NX enablement
The 32- and 64-bit code used very different mechanisms for enabling
NX, but even the 32-bit code was enabling NX in head_32.S if it is
available. Furthermore, we had a bewildering collection of tests for
the available of NX.
This patch:
a) merges the 32-bit set_nx() and the 64-bit check_efer() function
into a single x86_configure_nx() function. EFER control is left
to the head code.
b) eliminates the nx_enabled variable entirely. Things that need to
test for NX enablement can verify __supported_pte_mask directly,
and cpu_has_nx gives the supported status of NX.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Tejun Heo <tj@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Vegard Nossum <vegardno@ifi.uio.no> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Chris Wright <chrisw@sous-sol.org>
LKML-Reference: <1258154897-6770-5-git-send-email-hpa@zytor.com> Acked-by: Kees Cook <kees.cook@canonical.com>
H. Peter Anvin [Fri, 13 Nov 2009 23:28:15 +0000 (15:28 -0800)]
x86, pageattr: Make set_memory_(x|nx) aware of NX support
Make set_memory_x/set_memory_nx directly aware of if NX is supported
in the system or not, rather than requiring that every caller assesses
that support independently.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: Tim Starling <tstarling@wikimedia.org> Cc: Hannes Eder <hannes@hanneseder.net>
LKML-Reference: <1258154897-6770-4-git-send-email-hpa@zytor.com> Acked-by: Kees Cook <kees.cook@canonical.com>
H. Peter Anvin [Fri, 13 Nov 2009 23:28:14 +0000 (15:28 -0800)]
x86, sleep: Always save the value of EFER
Always save the value of EFER, regardless of the state of NX. Since
EFER may not actually exist, use rdmsr_safe() to do so.
v2: check the return value from rdmsr_safe() instead of relying on
the output values being unchanged on error.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@tuxonice.net>
LKML-Reference: <1258154897-6770-3-git-send-email-hpa@zytor.com> Acked-by: Kees Cook <kees.cook@canonical.com>
H. Peter Anvin [Fri, 13 Nov 2009 23:28:13 +0000 (15:28 -0800)]
x86-32: Use symbolic constants, safer CPUID when enabling EFER.NX
Use symbolic constants rather than hard-coded values when setting
EFER.NX in head_32.S, and do a more rigorous test for the validity of
the response when probing for the extended CPUID range.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1258154897-6770-2-git-send-email-hpa@zytor.com> Acked-by: Kees Cook <kees.cook@canonical.com>
Yinghai Lu [Wed, 11 Nov 2009 02:27:23 +0000 (18:27 -0800)]
x86: Make sure wakeup trampoline code is below 1MB
Instead of using bootmem, try find_e820_area()/reserve_early(),
and call acpi_reserve_memory() early, to allocate the wakeup
trampoline code area below 1M.
This is more reliable, and it also removes a dependency on
bootmem.
-v2: change function name to acpi_reserve_wakeup_memory(),
as suggested by Rafael.
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: pm list <linux-pm@lists.linux-foundation.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4AFA210B.3020207@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Randy Dunlap [Wed, 28 Oct 2009 23:09:55 +0000 (16:09 -0700)]
x86: k8.h: Add struct bootnode
k8.h uses struct bootnode but does not #include a header file
for it, so provide a simple declaration for it.
arch/x86/include/asm/k8.h:13: warning: 'struct bootnode'
declared inside parameter list arch/x86/include/asm/k8.h:13:
warning: its scope is only this definition or declaration, which is probably not what you want
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20091028160955.d27ccb16.randy.dunlap@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Suresh Siddha [Thu, 29 Oct 2009 02:46:58 +0000 (18:46 -0800)]
x86_64, cpa: Use only text section in set_kernel_text_rw/ro
set_kernel_text_rw()/set_kernel_text_ro() are marking pages
starting from _text to __start_rodata as RW or RO.
With CONFIG_DEBUG_RODATA, there might be free pages (associated
with padding the sections to 2MB large page boundary) between
text and rodata sections that are given back to page allocator.
So we should use only use the start (__text) and end
(__stop___ex_table) of the text section in
set_kernel_text_rw()/set_kernel_text_ro().
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091029024821.164525222@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Suresh Siddha [Thu, 29 Oct 2009 02:46:57 +0000 (18:46 -0800)]
x86_64, ftrace: Make ftrace use kernel identity mapping to modify code
On x86_64, kernel text mappings are mapped read-only with
CONFIG_DEBUG_RODATA. So use the kernel identity mapping instead
of the kernel text mapping to modify the kernel text.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091029024821.080941108@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Suresh Siddha [Thu, 29 Oct 2009 02:46:56 +0000 (18:46 -0800)]
x86, cpa: Fix kernel text RO checks in static_protection()
Steven Rostedt reported that we are unconditionally making the
kernel text mapping as read-only. i.e., if someone does cpa() to
the kernel text area for setting/clearing any page table
attribute, we unconditionally clear the read-write attribute for
the kernel text mapping that is set at compile time.
We should delay (to forbid the write attribute) and enforce only
after the kernel has mapped the text as read-only.
Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091029024820.996634347@sbs-t61.sc.intel.com>
[ marked kernel_set_to_readonly as __read_mostly ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Tue, 27 Oct 2009 17:15:11 +0000 (13:15 -0400)]
tracing: allow to change permissions for text with dynamic ftrace enabled
The commit 74e081797bd9d2a7d8005fe519e719df343a2ba8
x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA
prevents text sections from becoming read/write using set_memory_rw.
The dynamic ftrace changes all text pages to read/write just before
converting the calls to tracing to nops, and vice versa.
I orginally just added a flag to allow this transaction when ftrace
did the change, but I also found that when the CPA testing was running
it would remove the read/write as well, and ftrace does not do the text
conversion on boot up, and the CPA changes caused the dynamic tracer
to fail on self tests.
The current solution I have is to simply not to prevent
change_page_attr from setting the RW bit for kernel text pages.
Reported-by: Ingo Molnar <mingo@elte.hu> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Suresh Siddha [Wed, 14 Oct 2009 21:46:56 +0000 (14:46 -0700)]
x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA
CONFIG_DEBUG_RODATA chops the large pages spanning boundaries of kernel
text/rodata/data to small 4KB pages as they are mapped with different
attributes (text as RO, RODATA as RO and NX etc).
On x86_64, preserve the large page mappings for kernel text/rodata/data
boundaries when CONFIG_DEBUG_RODATA is enabled. This is done by allowing the
RODATA section to be hugepage aligned and having same RWX attributes
for the 2MB page boundaries
Extra Memory pages padding the sections will be freed during the end of the boot
and the kernel identity mappings will have different RWX permissions compared to
the kernel text mappings.
Kernel identity mappings to these physical pages will be mapped with smaller
pages but large page mappings are still retained for kernel text,rodata,data
mappings.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091014220254.190119924@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Suresh Siddha [Wed, 14 Oct 2009 21:46:55 +0000 (14:46 -0700)]
x86-64: preserve large page mapping for 1st 2MB kernel txt with CONFIG_DEBUG_RODATA
In the first 2MB, kernel text is co-located with kernel static
page tables setup by head_64.S. CONFIG_DEBUG_RODATA chops this
2MB large page mapping to small 4KB pages as we mark the kernel text as RO,
leaving the static page tables as RW.
With CONFIG_DEBUG_RODATA disabled, OLTP run on NHM-EP shows 1% improvement
with 2% reduction in system time and 1% improvement in iowait idle time.
To recover this, move the kernel static page tables to .data section, so that
we don't have to break the first 2MB of kernel text to small pages with
CONFIG_DEBUG_RODATA.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091014220254.063193621@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
David Rientjes [Fri, 25 Sep 2009 22:20:09 +0000 (15:20 -0700)]
x86: Interleave emulated nodes over physical nodes
Add interleaved NUMA emulation support
This patch interleaves emulated nodes over the system's physical
nodes. This is required for interleave optimizations since
mempolicies, for example, operate by iterating over a nodemask and
act without knowledge of node distances. It can also be used for
testing memory latencies and NUMA bugs in the kernel.
There're a couple of ways to do this:
- divide the number of emulated nodes by the number of physical
nodes and allocate the result on each physical node, or
- allocate each successive emulated node on a different physical
node until all memory is exhausted.
The disadvantage of the first option is, depending on the asymmetry
in node capacities of each physical node, emulated nodes may
substantially differ in size on a particular physical node compared
to another.
The disadvantage of the second option is, also depending on the
asymmetry in node capacities of each physical node, there may be
more emulated nodes allocated on a single physical node as another.
This patch implements the second option; we sacrifice the
possibility that we may have slightly more emulated nodes on a
particular physical node compared to another in lieu of node size
asymmetry.
[ Note that "node capacity" of a physical node is not only a
function of its addressable range, but also is affected by
subtracting out the amount of reserved memory over that range.
NUMA emulation only deals with available, non-reserved memory
quantities. ]
We ensure there is at least a minimal amount of available memory
allocated to each node. We also make sure that at least this
amount of available memory is available in ZONE_DMA32 for any node
that includes both ZONE_DMA32 and ZONE_NORMAL.
This patch also cleans the emulation code up by no longer passing
the statically allocated struct bootnode array among the various
functions. This init.data array is not allocated on the stack since
it may be very large and thus it may be accessed at file scope.
The WARN_ON() for nodes_cover_memory() when faking proximity
domains is removed since it relies on successive nodes always
having greater start addresses than previous nodes; with
interleaving this is no longer always true.
Signed-off-by: David Rientjes <rientjes@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Ankita Garg <ankita@in.ibm.com> Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251519150.14754@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
David Rientjes [Fri, 25 Sep 2009 22:20:04 +0000 (15:20 -0700)]
x86: Export srat physical topology
This is the counterpart to "x86: export k8 physical topology" for
SRAT. It is not as invasive because the acpi code already seperates
node setup into detection and registration steps, with the
exception of registering e820 active regions in
acpi_numa_memory_affinity_init(). This is now moved to
acpi_scan_nodes() if NUMA emulation is disabled or deferred.
acpi_numa_init() now returns a value which specifies whether an
underlying SRAT was located. If so, that topology can be used by
the emulation code to interleave emulated nodes over physical nodes
or to register the nodes for ACPI.
acpi_get_nodes() may now be used to export the srat physical
topology of the machine for NUMA emulation.
Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Ankita Garg <ankita@in.ibm.com> Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518580.14754@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
David Rientjes [Fri, 25 Sep 2009 22:20:00 +0000 (15:20 -0700)]
x86: Export k8 physical topology
To eventually interleave emulated nodes over physical nodes, we
need to know the physical topology of the machine without actually
registering it. This does the k8 node setup in two parts:
detection and registration. NUMA emulation can then used the
physical topology detected to setup the address ranges of emulated
nodes accordingly. If emulation isn't used, the k8 nodes are
registered as normal.
Two formals are added to the x86 NUMA setup functions: `acpi' and
`k8'. These represent whether ACPI or K8 NUMA has been detected;
both cannot be true at the same time. This specifies to the NUMA
emulation code whether an underlying physical NUMA topology exists
and which interface to use.
This patch deals solely with separating the k8 setup path into
Northbridge detection and registration steps and leaves the ACPI
changes for a subsequent patch. The `acpi' formal is added here,
however, to avoid touching all the header files again in the next
patch.
This approach also ensures emulated nodes will not span physical
nodes so the true memory latency is not misrepresented.
k8_get_nodes() may now be used to export the k8 physical topology
of the machine for NUMA emulation.
Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Ankita Garg <ankita@in.ibm.com> Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
We simply can't do the USB handoff at FIXUP_HEADER time, since it will
often require us to have valid IO mappings etc. But that in turn
requires a whole different approach, not this trivial one-liner.
Maybe we could teach all the USB quirk handoff handlers to only do the
quirk if the device has all its registers set up (since if it isn't
initialized, it's unlikely to be active), but regardless that will need
a whole lot more code than just saying "let's do it really early".
The proper fix is almost certainly to just leave the legacy IOMMU
mappings active until after all devices have been initialized.
Reported-by: Nick Piggin <npiggin@suse.de> Cc: David Woodhouse <David.Woodhouse@intel.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yinghai Lu [Sun, 11 Oct 2009 21:17:16 +0000 (14:17 -0700)]
pci: increase alignment to make more space for hidden code
As reported in
http://bugzilla.kernel.org/show_bug.cgi?id=13940
on some system when acpi are enabled, acpi clears some BAR for some
devices without reason, and kernel will need to allocate devices for
them. It then apparently hits some undocumented resource conflict,
resulting in non-working devices.
Try to increase alignment to get more safe range for unassigned devices.
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bernd Schmidt [Tue, 6 Oct 2009 08:55:26 +0000 (09:55 +0100)]
ROMFS: fix length used with romfs_dev_strnlen() function
An interestingly corrupted romfs file system exposed a problem with the
romfs_dev_strnlen function: it's passing the wrong value to its helpers.
Rather than limit the string to the length passed in by the callers, it
uses the size of the device as the limit.
Signed-off-by: Bernd Schmidt <bernds_cb1@t-online.de> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (32 commits)
USB: serial: no unnecessary GFP_ATOMIC in oti6858
USB: serial: fix race between unthrottle and completion handler in visor
USB: serial: fix assumption that throttle/unthrottle cannot sleep
USB: serial: fix race between unthrottle and completion handler in symbolserial
USB: serial: fix race between unthrottle and completion handler in opticon
USB: ehci: Fix isoc scheduling boundary checking.
USB: storage: When a device returns no sense data, call it a Hardware Error
USB: small fix in error case of suspend in generic usbserial code
USB: visor: fix trivial accounting bug in visor driver
USB: Fix throttling in generic usbserial driver
USB: cp210x: Add support for the DW700 UART
USB: ipaq: fix oops when device is plugged in
USB: isp1362: fix build warnings on 64-bit systems
USB: gadget: imx_udc: Use resource size
USB: storage: iRiver P7 UNUSUAL_DEV patch
USB: musb: make HAVE_CLK support optional
USB: xhci: Fix dropping endpoints from the xHC schedule.
USB: xhci: Don't wait for a disable slot cmd when HC dies.
USB: xhci: Handle canceled URBs when HC dies.
USB: xhci: Stop debugging polling loop when HC dies.
...
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: fix file clone ioctl for bookend extents
Btrfs: fix uninit compiler warning in cow_file_range_nocow
Btrfs: constify dentry_operations
Btrfs: optimize back reference update during btrfs_drop_snapshot
Btrfs: remove negative dentry when deleting subvolumne
Btrfs: optimize fsync for the single writer case
Btrfs: async delalloc flushing under space pressure
Btrfs: release delalloc reservations on extent item insertion
Btrfs: delay clearing EXTENT_DELALLOC for compressed extents
Btrfs: cleanup extent_clear_unlock_delalloc flags
Btrfs: fix possible softlockup in the allocator
Btrfs: fix deadlock on async thread startup
Alexey Dobriyan [Wed, 7 Oct 2009 13:09:06 +0000 (17:09 +0400)]
headers: remove sched.h from interrupt.h
After m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (34 commits)
[SCSI] qla2xxx: Fix NULL ptr deref bug in fail path during queue create
[SCSI] st: fix possible memory use after free after MTSETBLK ioctl
[SCSI] be2iscsi: Moving to pci_pools v3
[SCSI] libiscsi: iscsi_session_setup to allow for private space
[SCSI] be2iscsi: add 10Gbps iSCSI - BladeEngine 2 driver
[SCSI] zfcp: Fix hang when offlining device with offline chpid
[SCSI] zfcp: Fix lockdep warning when offlining device with offline chpid
[SCSI] zfcp: Fix oops during shutdown of offline device
[SCSI] zfcp: Fix initial device and cfdc for delayed adapter allocation
[SCSI] zfcp: correctly initialize unchained requests
[SCSI] mpt2sas: Bump version 02.100.03.00
[SCSI] mpt2sas: Support dev remove when phy status is MPI2_EVENT_SAS_TOPO_PHYSTATUS_VACANT
[SCSI] mpt2sas: Timeout occurred within the HANDSHAKE logic while waiting on firmware to ACK.
[SCSI] mpt2sas: Call init_completion on a per request basis.
[SCSI] mpt2sas: Target Reset will be issued from Interrupt context.
[SCSI] mpt2sas: Added SCSIIO, Internal and high priority memory pools to support multiple TM
[SCSI] mpt2sas: Copyright change to 2009.
[SCSI] mpt2sas: Added mpi2_history.txt for MPI2 headers.
[SCSI] mpt2sas: Update driver to MPI2 REV K headers.
[SCSI] bfa: Brocade BFA FC SCSI driver
...
Oliver Neukum [Wed, 7 Oct 2009 08:50:23 +0000 (10:50 +0200)]
USB: serial: fix assumption that throttle/unthrottle cannot sleep
many serial subdrivers are clearly written as if throttle/unthrottle
cannot sleep. This leads to unneeded atomic submissions. This
patch converts affected drivers in a way to makes very clear that
throttle/unthrottle can sleep. Thus future misdesigns can be avoided
and efficiency and reliability improved.
This removes any such assumption using GFP_KERNEL and spin_lock_irq()
Signed-off-by: Oliver Neukum <oliver@neukum.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sarah Sharp [Tue, 6 Oct 2009 20:45:59 +0000 (13:45 -0700)]
USB: ehci: Fix isoc scheduling boundary checking.
The EHCI driver does some bounds checking when it's scheduling an iTD for
an active endpoint. It sets the local variable start to
stream->next_uframe and moves that variable further in the schedule if
necessary. However, the driver fails to do anything with start before
jumping to the ready label and setting the URB's starting frame to
stream->next_uframe. Alan Stern confirms the EHCI driver should set
stream->next_uframe to start before jumping.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Stern [Tue, 6 Oct 2009 18:07:57 +0000 (14:07 -0400)]
USB: storage: When a device returns no sense data, call it a Hardware Error
This patch (as1294) fixes a problem that has plagued users for several
kernel releases. Some USB mass-storage devices don't return any sense
data when they encounter certain kinds of errors. The SCSI layer
interprets this to mean that the operation should be retried, and the
same thing happens -- over and over again with no limit. In some
circumstances (such as when a bus reset occurs) that is the right
thing to do, but not here.
The patch checks for this condition (a transport failure with no sense
data) and changes the result code to DID_ERROR and the sense code to
Hardware Error. This does get only a limited number of retries, and
so the command will fail relatively quickly instead of getting stuck
in an infinite loop.
The generic usbserial driver in Linux 2.6.31 halts its receiving
channel in response to throttle requests from the line discipline.
Unfortunately it drops the contents of the first URB received after
throttling takes effect. This patch corrects that problem.
Signed-off-by: Joris van Rantwijk <jorispubl@xs4all.nl> Acked-by: Johan Hovold <jhovold@gmail.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Éric Piel [Sun, 4 Oct 2009 11:45:07 +0000 (13:45 +0200)]
USB: cp210x: Add support for the DW700 UART
In the Dell inspiron mini 10, the GPS is connected via a cp2102. This patch
adds detection of this USB device. (I haven't managed to use the GPS under
Linux yet, though)
Alan Stern [Mon, 5 Oct 2009 19:53:58 +0000 (15:53 -0400)]
USB: ipaq: fix oops when device is plugged in
This patch (as1293) fixes a problem with the ipaq serial driver. It
tries to bind to all the interfaces, even those that don't have enough
endpoints. The symptom is an invalid memory reference and oops when
the device is plugged in.
Mike Frysinger [Thu, 17 Sep 2009 01:10:53 +0000 (21:10 -0400)]
USB: musb: make HAVE_CLK support optional
The Blackfin port doesn't support HAVE_CLK and the musb driver works fine
with support stubbed out, so take the existing Blackfin clk stubs and move
them to common musb code so we can drop the Kconfig dependency.
Sarah Sharp [Wed, 16 Sep 2009 23:42:39 +0000 (16:42 -0700)]
USB: xhci: Don't wait for a disable slot cmd when HC dies.
When the host controller dies or is removed while a device is plugged in,
the USB core will attempt to deallocate the struct usb_device. That will
call into xhci_free_dev(). This function used to attempt to submit a
disable slot command to the host controller and clean up the device
structures when that command returned. Change xhci_free_dev() to skip the
command submission and just free the memory if the host controller died.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sarah Sharp [Tue, 29 Sep 2009 00:21:37 +0000 (17:21 -0700)]
USB: xhci: Handle canceled URBs when HC dies.
When the host controller dies (e.g. it is removed from a PCI card slot),
the xHCI driver cannot expect commands to complete. The buggy code this
patch fixes would mark an URB as canceled and then expect the URB to be
completed when the stop endpoint command completed. That would never
happen if the host controller was dead, so the USB core would just hang in
the disconnect code.
If the host controller died, and the driver asks to cancel an URB, free
any structures associated with that URB and immediately give it back.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Gergely Imreh [Tue, 15 Sep 2009 08:03:31 +0000 (16:03 +0800)]
USB: usbtmc: fix timeout increase
The current 10ms timeout is too short for some normal USBTMC device
operation, increase it to a value which was tested with previously
affected Tektronix oscilloscopes.
Signed-off-by: Gergely Imreh <imrehg@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Stern [Fri, 9 Oct 2009 16:43:12 +0000 (12:43 -0400)]
USB: serial: don't call release without attach
This patch (as1295) fixes a recently-added bug in the USB serial core.
If certain kinds of errors occur during probing, the core may call a
serial driver's release method without previously calling the attach
method. This causes some drivers (io_ti in particular) to perform an
invalid memory access.
The patch adds a new flag to keep track of whether or not attach has
been called.
Johan Hovold [Wed, 7 Oct 2009 18:05:07 +0000 (20:05 +0200)]
USB: ftdi_sio: re-implement read processing
- Re-structure read processing.
- Kill obsolete work queue and always push to tty in completion handler.
- Use tty_insert_flip_string instead of per character push when
possible.
- Fix stalled-read regression in 2.6.31 by using urb status to
determine when port is closed rather than port count.
- Fix race with open/close by checking ASYNCB_INITIALIZED in
unthrottle.
- Kill private rx_flag and lock and use throttle flags in
usb_serial_port instead.
Johan Hovold [Wed, 7 Oct 2009 18:05:06 +0000 (20:05 +0200)]
USB: ftdi_sio: clean up read completion handler
Remove superfluous error checks in completion handler:
- No need to check private data and urb pointers as we check urb-status
before dereferencing priv (which is not freed until urb has been killed
on close).
- No need to check tty as it is checked again when processing.
- No need to check urb->number_of_packets on bulk urb.
Note that both private data and tty are checked again before processing
(possibly from work queue which also is cancelled on close).
Ian Abbott [Mon, 21 Sep 2009 19:48:33 +0000 (15:48 -0400)]
Staging: comedi: s526: fixes for pulse generator
Some changes and corrections to handling of
INSN_CONFIG_GPCT_SINGLE_PULSE_GENERATOR, and
INSN_CONFIG_GPCT_PULSE_TRAIN_GENERATOR, so they interpret insn->data[]
as per the comments in the code.
Signed-off-by: Frank Mori Hess <fmhess@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
1. The memory into which we copy 'u1@u2' needs space for u1, @,
u2, and a final \0 which strcat copies in.
2. Strsep changes the value of its first argument. So use a
temporary variable to pass to it, so we pass the original
value to kfree!
3. Allocate an extra char to user_buf, because we need a trailing \0
since we later kstrdup it.
I am about to send out an LTP testcase for this driver, but
in addition the correctness of the hashing can be verified as
follows:
Pekka Enberg [Mon, 21 Sep 2009 09:52:15 +0000 (12:52 +0300)]
Staging: w35und: Fix ->beacon_int breakage
Commit f424afa17899408cbd267a4c4534ca6fc9d8f71c ("mac80211: remove
deprecated API") removed ->beacon_int from struct ieee80211_conf. Fix
breakage in w35und by setting beacon period in ->add_interface to
bss_conf.beacon_int.
Cc: Jiri Benc <jbenc@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: Sandro Bonazzola <sandro.bonazzola@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The author has found a number of problems with the current version
of this driver in the current kernel, and is reworking it to get
things working again. Because of that, it would be better to remove
the driver now and add it back in a future kernel release.