Change the ordering of code in kernel/power/user.c so that device_suspend() is
called before disable_nonboot_cpus() and device_resume() is called after
enable_nonboot_cpus(). This is needed to make the userland suspend call
pm_ops->finish() after enable_nonboot_cpus() and before device_resume(), as
indicated by the recent discussion on Linux-PM (cf.
http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html).
The changes here only affect the userland interface of swsusp.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the ordering of code in kernel/power/disk.c so that device_suspend() is
called before disable_nonboot_cpus() and platform_finish() is called after
enable_nonboot_cpus() and before device_resume(), as indicated by the recent
discussion on Linux-PM (cf.
http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html).
The changes here only affect the built-in swsusp.
[alexey.y.starikovskiy@linux.intel.com: fix LED blinking during image load] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Patrick Mochel <mochel@digitalimplant.org> Cc: Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As indicated in a recent thread on Linux-PM, it's necessary to call
pm_ops->finish() before devce_resume(), but enable_nonboot_cpus() has to be
called before pm_ops->finish() (cf.
http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). For
consistency, it seems reasonable to call disable_nonboot_cpus() after
device_suspend().
This way the suspend code will remain symmetrical with respect to the resume
code and it may allow us to speed up things in the future by suspending and
resuming devices and/or saving the suspend image in many threads.
The following series of patches reorders the suspend and resume code so that
nonboot CPUs are disabled after devices have been suspended and enabled before
the devices are resumed. It also causes pm_ops->finish() to be called after
enable_nonboot_cpus() wherever necessary.
This patch:
Change the ordering of code in kernel/power/main.c so that device_suspend()
is called before disable_nonboot_cpus() and pm_ops->finish() is called after
enable_nonboot_cpus() and before device_resume(), as indicated by recent
discussion on Linux-PM
(cf. http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ahmed S. Darwish [Sat, 10 Feb 2007 09:43:29 +0000 (01:43 -0800)]
[PATCH] ARM26: Use ARRAY_SIZE macro when appropriate
Use ARRAY_SIZE macro already defined in linux/kernel.h
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Acked-by: Ian Molton <spyro@f2s.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Paris [Sat, 10 Feb 2007 09:43:19 +0000 (01:43 -0800)]
[PATCH] make reading /proc/sys/kernel/cap-bould not require CAP_SYS_MODULE
Reading /proc/sys/kernel/cap-bound requires CAP_SYS_MODULE. (see
proc_dointvec_bset in kernel/sysctl.c)
sysctl appears to drive all over proc reading everything it can get it's
hands on and is complaining when it is being denied access to read
cap-bound. Clearly writing to cap-bound should be a sensitive operation
but requiring CAP_SYS_MODULE to read cap-bound seems a bit to strong. I
believe the information could with reasonable certainty be obtained by
looking at a bunch of the output of /proc/pid/status which has very low
security protection, so at best we are just getting a little obfuscation of
information.
Currently SELinux policy has to 'dontaudit' capability checks for
CAP_SYS_MODULE for things like sysctl which just want to read cap-bound.
In doing so we also as a byproduct have to hide warnings of potential
exploits such as if at some time that sysctl actually tried to load a
module. I wondered if anyone would have a problem opening cap-bound up to
read from anyone?
Acked-by: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ken Chen [Sat, 10 Feb 2007 09:43:18 +0000 (01:43 -0800)]
[PATCH] do not disturb page referenced state when unmapping memory range
When kernel unmaps an address range, it needs to transfer PTE state into
page struct. Currently, kernel transfer access bit via
mark_page_accessed(). The call to mark_page_accessed in the unmap path
doesn't look logically correct.
At unmap time, calling mark_page_accessed will causes page LRU state to be
bumped up one step closer to more recently used state. It is causing quite
a bit headache in a scenario when a process creates a shmem segment, touch
a whole bunch of pages, then unmaps it. The unmapping takes a long time
because mark_page_accessed() will start moving pages from inactive to
active list.
I'm not too much concerned with moving the page from one list to another in
LRU. Sooner or later it might be moved because of multiple mappings from
various processes. But it just doesn't look logical that when user asks a
range to be unmapped, it's his intention that the process is no longer
interested in these pages. Moving those pages to active list (or bumping
up a state towards more active) seems to be an over reaction. It also
prolongs unmapping latency which is the core issue I'm trying to solve.
As suggested by Peter, we should still preserve the info on pte young
pages, but not more.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ken Chen <kenchen@google.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
shmem backed file does not have page writeback, nor it participates in
backing device's dirty or writeback accounting. So using generic
__set_page_dirty_nobuffers() for its .set_page_dirty aops method is a bit
overkill. It unnecessarily prolongs shm unmap latency.
For example, on a densely populated large shm segment (sevearl GBs), the
unmapping operation becomes painfully long. Because at unmap, kernel
transfers dirty bit in PTE into page struct and to the radix tree tag. The
operation of tagging the radix tree is particularly expensive because it
has to traverse the tree from the root to the leaf node on every dirty
page. What's bothering is that radix tree tag is used for page write back.
However, shmem is memory backed and there is no page write back for such
file system. And in the end, we spend all that time tagging radix tree and
none of that fancy tagging will be used. So let's simplify it by introduce
a new aops __set_page_dirty_no_writeback and this will speed up shm unmap.
Signed-off-by: Ken Chen <kenchen@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Sat, 10 Feb 2007 09:43:14 +0000 (01:43 -0800)]
[PATCH] zoneid: fix up calculations for ZONEID_PGSHIFT
Currently if we have a non-zero ZONES_SHIFT we assume we are able to rely
on that as the bottom edge of the ZONEID, if not then we use the
NODES_PGOFF as the right end of either NODES _or_ SECTION. This latter is
more luck than judgement and would be incorrect if we reordered the
SECTION,NODE,ZONE options in the fields space.
Really what we want is the lower of the right hand end of the two fields we
are using (either NODE,ZONE or SECTION,ZONE). Codify that explicitly. As
always allow for there being no bits in either of the fields, such as might
be valid in a non-numa machine with only a zone NORMAL.
I have checked that the compiler is still able to constant fold all of this
away correctly.
Signed-off-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[PATCH] Set CONFIG_ZONE_DMA for arches with GENERIC_ISA_DMA
As Andi pointed out: CONFIG_GENERIC_ISA_DMA only disables the ISA DMA
channel management. Other functionality may still expect GFP_DMA to
provide memory below 16M. So we need to make sure that CONFIG_ZONE_DMA is
set independent of CONFIG_GENERIC_ISA_DMA. Undo the modifications to
mm/Kconfig where we made ZONE_DMA dependent on GENERIC_ISA_DMA and set
theses explicitly in each arches Kconfig.
Reviews must occur for each arch in order to determine if ZONE_DMA can be
switched off. It can only be switched off if we know that all devices
supported by a platform are capable of performing DMA transfers to all of
memory (Some arches already support this: uml, avr32, sh sh64, parisc and
IA64/Altix).
In order to switch ZONE_DMA off conditionally, one would have to establish
a scheme by which one can assure that no drivers are enabled that are only
capable of doing I/O to a part of memory, or one needs to provide an
alternate means of performing an allocation from a specific range of memory
(like provided by alloc_pages_range()) and insure that all drivers use that
call. In that case the arches alloc_dma_coherent() may need to be modified
to call alloc_pages_range() instead of relying on GFP_DMA.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[PATCH] optional ZONE_DMA: optional ZONE_DMA in the VM
Make ZONE_DMA optional in core code.
- ifdef all code for ZONE_DMA and related definitions following the example
for ZONE_DMA32 and ZONE_HIGHMEM.
- Without ZONE_DMA, ZONE_HIGHMEM and ZONE_DMA32 we get to a ZONES_SHIFT of
0.
- Modify the VM statistics to work correctly without a DMA zone.
- Modify slab to not create DMA slabs if there is no ZONE_DMA.
[akpm@osdl.org: cleanup]
[jdike@addtoit.com: build fix]
[apw@shadowen.org: Simplify calculation of the number of bits we need for ZONES_SHIFT] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <willy@debian.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch simply defines CONFIG_ZONE_DMA for all arches. We later do special
things with CONFIG_ZONE_DMA after the VM and an arch are prepared to work
without ZONE_DMA.
CONFIG_ZONE_DMA can be defined in two ways depending on how an architecture
handles ISA DMA.
First if CONFIG_GENERIC_ISA_DMA is set by the arch then we know that the arch
needs ZONE_DMA because ISA DMA devices are supported. We can catch this in
mm/Kconfig and do not need to modify arch code.
Second, arches may use ZONE_DMA in an unknown way. We set CONFIG_ZONE_DMA for
all arches that do not set CONFIG_GENERIC_ISA_DMA in order to insure backwards
compatibility. The arches may later undefine ZONE_DMA if their arch code has
been verified to not depend on ZONE_DMA.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <willy@debian.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[PATCH] optional ZONE_DMA: deal with cases of ZONE_DMA meaning the first zone
This patchset follows up on the earlier work in Andrew's tree to reduce the
number of zones. The patches allow to go to a minimum of 2 zones. This one
allows also to make ZONE_DMA optional and therefore the number of zones can be
reduced to one.
ZONE_DMA is usually used for ISA DMA devices. There are a number of reasons
why we would not want to have ZONE_DMA
1. Some arches do not need ZONE_DMA at all.
2. With the advent of IOMMUs DMA zones are no longer needed.
The necessity of DMA zones may drastically be reduced
in the future. This patchset allows a compilation of
a kernel without that overhead.
3. Devices that require ISA DMA get rare these days. All
my systems do not have any need for ISA DMA.
4. The presence of an additional zone unecessarily complicates
VM operations because it must be scanned and balancing
logic must operate on its.
5. With only ZONE_NORMAL one can reach the situation where
we have only one zone. This will allow the unrolling of many
loops in the VM and allows the optimization of varous
code paths in the VM.
6. Having only a single zone in a NUMA system results in a
1-1 correspondence between nodes and zones. Various additional
optimizations to critical VM paths become possible.
Many systems today can operate just fine with a single zone. If you look at
what is in ZONE_DMA then one usually sees that nothing uses it. The DMA slabs
are empty (Some arches use ZONE_DMA instead of ZONE_NORMAL, then ZONE_NORMAL
will be empty instead).
On all of my systems (i386, x86_64, ia64) ZONE_DMA is completely empty. Why
constantly look at an empty zone in /proc/zoneinfo and empty slab in
/proc/slabinfo? Non i386 also frequently have no need for ZONE_DMA and zones
stay empty.
The patchset was tested on i386 (UP / SMP), x86_64 (UP, NUMA) and ia64 (NUMA).
The RFC posted earlier (see
http://marc.theaimsgroup.com/?l=linux-kernel&m=115231723513008&w=2) had lots
of #ifdefs in them. An effort has been made to minize the number of #ifdefs
and make this as compact as possible. The job was made much easier by the
ongoing efforts of others to extract common arch specific functionality.
I have been running this for awhile now on my desktop and finally Linux is
using all my available RAM instead of leaving the 16MB in ZONE_DMA untouched:
In two places in the VM we use ZONE_DMA to refer to the first zone. If
ZONE_DMA is optional then other zones may be first. So simply replace
ZONE_DMA with zone 0.
This also fixes ZONETABLE_PGSHIFT. If we have only a single zone then
ZONES_PGSHIFT may become 0 because there is no need anymore to encode the zone
number related to a pgdat. However, we still need a zonetable to index all
the zones for each node if this is a NUMA system. Therefore define
ZONETABLE_SHIFT unconditionally as the offset of the ZONE field in page flags.
[apw@shadowen.org: fix mismerge] Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <willy@debian.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
nr_free_pages is now a simple access to a global variable. Make it a macro
instead of a function.
The nr_free_pages now requires vmstat.h to be included. There is one
occurrence in power management where we need to add the include. Directly
refrer to global_page_state() there to clarify why the #include was added.
[akpm@osdl.org: arm build fix]
[akpm@osdl.org: sparc64 build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The global and per zone counter sums are in arrays of longs. Reorder the ZVCs
so that the most frequently used ZVCs are put into the same cacheline. That
way calculations of the global, node and per zone vm state touches only a
single cacheline. This is mostly important for 64 bit systems were one 128
byte cacheline takes only 8 longs.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The determination of the dirty ratio to determine writeback behavior is
currently based on the number of total pages on the system.
However, not all pages in the system may be dirtied. Thus the ratio is always
too low and can never reach 100%. The ratio may be particularly skewed if
large hugepage allocations, slab allocations or device driver buffers make
large sections of memory not available anymore. In that case we may get into
a situation in which f.e. the background writeback ratio of 40% cannot be
reached anymore which leads to undesired writeback behavior.
This patchset fixes that issue by determining the ratio based on the actual
pages that may potentially be dirty. These are the pages on the active and
the inactive list plus free pages.
The problem with those counts has so far been that it is expensive to
calculate these because counts from multiple nodes and multiple zones will
have to be summed up. This patchset makes these counters ZVC counters. This
means that a current sum per zone, per node and for the whole system is always
available via global variables and not expensive anymore to calculate.
The patchset results in some other good side effects:
- Removal of the various functions that sum up free, active and inactive
page counts
- Cleanup of the functions that display information via the proc filesystem.
This patch:
The use of a ZVC for nr_inactive and nr_active allows a simplification of some
counter operations. More ZVC functionality is used for sums etc in the
following patches.
[akpm@osdl.org: UP build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 10 Feb 2007 09:43:00 +0000 (01:43 -0800)]
[PATCH] page_mkwrite caller race fix
After do_wp_page has tested page_mkwrite, it must release old_page after
acquiring page table lock, not before: at some stage that ordering got
reversed, leaving a (very unlikely) window in which old_page might be
truncated, freed, and reused in the same position.
Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Sat, 10 Feb 2007 09:42:57 +0000 (01:42 -0800)]
[PATCH] Avoid excessive sorting of early_node_map[]
find_min_pfn_for_node() and find_min_pfn_with_active_regions() sort
early_node_map[] on every call. This is an excessive amount of sorting and
that can be avoided. This patch always searches the whole early_node_map[]
in find_min_pfn_for_node() instead of returning the first value found. The
map is then only sorted once when required. Successfully boot tested on a
number of machines.
Robert P. J. Day [Sat, 10 Feb 2007 09:42:56 +0000 (01:42 -0800)]
[PATCH] Remove final references to deprecated "MAP_ANON" page protection flag
Remove the last vestiges of the long-deprecated "MAP_ANON" page protection
flag: use "MAP_ANONYMOUS" instead.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pekka Enberg [Sat, 10 Feb 2007 09:42:53 +0000 (01:42 -0800)]
[PATCH] slab: cache alloc cleanups
Clean up __cache_alloc and __cache_alloc_node functions a bit. We no
longer need to do NUMA_BUILD tricks and the UMA allocation path is much
simpler. No functional changes in this patch.
Note: saves few kernel text bytes on x86 NUMA build due to using gotos in
__cache_alloc_node() and moving __GFP_THISNODE check in to
fallback_alloc().
Cc: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Manfred Spraul <manfred@colorfullife.com> Acked-by: Christoph Lameter <christoph@lameter.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pekka Enberg [Sat, 10 Feb 2007 09:42:52 +0000 (01:42 -0800)]
[PATCH] slab: remove broken PageSlab check from kfree_debugcheck
The PageSlab debug check in kfree_debugcheck() is broken for compound
pages. It is also redundant as we already do BUG_ON for non-slab pages in
page_get_cache() and page_get_slab() which are always called before we free
any actual objects.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan [Mon, 5 Feb 2007 16:28:30 +0000 (16:28 +0000)]
libata: Early CFA adapters are not required to support mode setting
If we are doing a PIO setup for a CFA card and it blows up with a device
error then assume it is an older CFA card which doesn't support this
rather than failing the device out of existance.
Stands seperate to the quieting patch but that is obviously useful with
this change.
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Robert Hancock [Tue, 6 Feb 2007 00:26:04 +0000 (16:26 -0800)]
sata_nv: propagate ata_pci_device_do_resume return value
ata_pci_device_do_resume can fail if the PCI device couldn't be re-enabled.
Update sata_nv to propagate the return value from this call and to not try
to do any other resume activities if it fails. Fixes a compile warning.
Signed-off-by: Robert Hancock <hancockr@shaw.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Robert Hancock [Tue, 6 Feb 2007 00:26:03 +0000 (16:26 -0800)]
sata_nv: wait for response on entering/leaving ADMA mode
Update sata_nv to wait for the controller to indicate via the status
register that it has entered the requested state when switching between
ADMA mode and register mode. This issue came up recently when debugging
some problems with cache flush command timeouts and while it didn't appear
to fix that problem, this is something we should likely be doing in any
case.
Signed-off-by: Robert Hancock <hancockr@shaw.ca> Cc: Tejun Heo <htejun@gmail.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Robert Hancock [Tue, 6 Feb 2007 00:26:02 +0000 (16:26 -0800)]
sata_nv: use ADMA for NODATA commands
Some problems showed up recently with cache flush commands timing out on
sata_nv. Previously these commands were always handled by transitioning to
legacy mode from ADMA mode first. The timeout problem was worked around
already by a change to the interrupt handling code for legacy mode, but for
non-data commands like these it appears we can handle them in ADMA mode, so
the switch to legacy mode is not needed.
This patch changes the behavior so that we use ADMA mode to submit
interrupt-driven commands with ATA_PROT_NODATA protocol. In addition to
avoiding the problem mentioned above entirely, this avoids the overhead of
switching to legacy mode and back to ADMA mode for handling cache flushes.
When handling non-DMA-mapped commands, we leave the APRD blank and clear
the NV_CPB_CTL_APRD_VALID field in the CPB so the controller does not
attempt to read it.
Signed-off-by: Robert Hancock <hancockr@shaw.ca> Cc: Jeff Garzik <jeff@garzik.org> Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Robert Hancock [Tue, 6 Feb 2007 00:26:01 +0000 (16:26 -0800)]
sata_nv: cleanup ADMA error handling
This cleans up a few issues with the error handling in sata_nv in ADMA mode
to make it more consistent with other NCQ-capable drivers like ahci and
sata_sil24:
- When a command failed, we would effectively set AC_ERR_DEV on the
queued command always. In the case of NCQ commands this prevents libata
from doing a log page query to determine the details of the failed
command, since it thinks we've already analyzed. Just set flags in the
port ehi->err_mask, then freeze or abort and let libata figure out what
went wrong.
- The code handled NV_ADMA_STAT_CPBERR as a "really bad error" which
caused it to set error flags on every queued command. I don't know
exactly what this flag means (no docs, grr!) but from what I can guess
from the standard ADMA spec, it just means that one or more of the CPBs
had an error, so we just need to go through and do our normal checks in
this case.
- In the error_handler function the code would always dump the state of
all the CPBs. This output seems redundant at this point since libata
already dumps the state of all active commands on errors (and it also
triggers at times when it shouldn't, like when suspending). Take this
out.
[akpm@osdl.org: many coding-style fixes] Signed-off-by: Robert Hancock <hancockr@shaw.ca> Cc: Jeff Garzik <jeff@garzik.org> Cc: Tejun Heo <htejun@gmail.com> Cc: Allen Martin <AMartin@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Sergei Shtylyov [Mon, 5 Feb 2007 18:08:55 +0000 (21:08 +0300)]
(2.6.20) pata_mpiix: probing cleanup (resend)
MPIIX has only single channel IDE which can be configured for either primary or
secondary legacy I/O ports and IRQ. So, get rid of the unneeded second probe
entry in mpiix_init_one() and of the invalid (but unused anyway) enable bits in
mpiix_pre_reset().
Warning: this cleanup has only been compile-tested...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Sergei Shtylyov [Mon, 5 Feb 2007 17:24:57 +0000 (20:24 +0300)]
(2.6.20) pata_mpiix: fix PIO setup issues
Fix clearing/setting the wrong TIME/IE/PPE bits for a slave drive caused by a
wrong shift count.
Fix the PIO mode 1 being overclocked by wrongly selecting the fast timing bank.
Also, fix/rephrase some comments while at it.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Sergei Shtylyov [Mon, 5 Feb 2007 16:45:38 +0000 (19:45 +0300)]
(2.6.20) pata_oldpiix: fix PIO2 underclocking
Fix the PIO mode 2 using mode 0 timings -- this driver should enable the
fast timing bank starting with PIO2, just like the ata_piix driver does.
Also, fix/rephrase some comments while at it.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Fri, 2 Feb 2007 06:29:27 +0000 (15:29 +0900)]
libata: add 150ms between completion of hardreset and status checking
Follow the old SRST rule and delay 150ms between completion of
hardreset and status checking. Debouncing delay should usually cover
this but debounce duration could be shorter than 150ms under certain
circumstances.
Usefulness depends on host controller implementation but it can't hurt
and serves as a reminder that 2s delay for GoVault should also be
added here.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Eric D. Mudama [Wed, 31 Jan 2007 06:00:40 +0000 (23:00 -0700)]
libata: rearrange dmesg info to add full ATA revision
Per Jeff's suggestion, this patch rearranges the info printed for ATA
drives into dmesg to add the full ATA firmware revision and model
information, while keeping the output to 2 lines.
Signed-off-by: Eric D. Mudama <edmudama@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Sergei Shtylyov [Tue, 30 Jan 2007 17:40:30 +0000 (20:40 +0300)]
pata_sl82c105: wrong assumptions about compatible PIO modes
Fix the wrong "compatible" PIO mode choices: MWDMA0 has 480 ns cycle while PIO1
only has 383 ns cycle, and MWDMA2 timings matchs those of PIO4 exactly.
Andrew Morton [Sat, 3 Feb 2007 02:07:15 +0000 (18:07 -0800)]
git-libata-all: forward declare struct device
In file included from drivers/infiniband/hw/ipath/ipath_diag.c:44:
include/linux/io.h:35: warning: 'struct device' declared inside parameter list
include/linux/io.h:35: warning: its scope is only this definition or declaration
Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Sat, 20 Jan 2007 07:00:28 +0000 (16:00 +0900)]
devres: implement pcim_iomap_regions()
Implement pcim_iomap_regions(). This function takes mask of BARs to
request and iomap. No BAR should have length of zero. BARs are
iomapped using pcim_iomap_table().
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Sat, 20 Jan 2007 07:00:28 +0000 (16:00 +0900)]
libata: update libata LLDs to use devres
Update libata LLDs to use devres. Core layer is already converted to
support managed LLDs. This patch simplifies initialization and fixes
many resource related bugs in init failure and detach path. For
example, all converted drivers now handle ata_device_add() failure
gracefully without excessive resource rollback code.
As most resources are released automatically on driver detach, many
drivers don't need or can do with much simpler ->{port|host}_stop().
In general, stop callbacks are need iff port or host needs to be given
commands to shut it down. Note that freezing is enough in many cases
and ports are automatically frozen before being detached.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Sat, 20 Jan 2007 07:00:28 +0000 (16:00 +0900)]
libata: update libata core layer to use devres
Update libata core layer to use devres.
* ata_device_add() acquires all resources in managed mode.
* ata_host is allocated as devres associated with ata_host_release.
* Port attached status is handled as devres associated with
ata_host_attach_release().
* Initialization failure and host removal is handedl by releasing
devres group.
* Except for ata_scsi_release() removal, LLD interface remains the
same. Some functions use hacky is_managed test to support both
managed and unmanaged devices. These will go away once all LLDs are
updated to use devres.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Sat, 20 Jan 2007 07:00:26 +0000 (16:00 +0900)]
libata: implement ata_host_detach()
Implement ata_host_detach() which calls ata_port_detach() for each
port in the host and export it. ata_port_detach() is now internal and
thus un-exported. ata_host_detach() will be used as the 'deregister
from libata layer' function after devres conversion.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Sat, 20 Jan 2007 07:00:26 +0000 (16:00 +0900)]
devres: device resource management
Implement device resource management, in short, devres. A device
driver can allocate arbirary size of devres data which is associated
with a release function. On driver detach, release function is
invoked on the devres data, then, devres data is freed.
devreses are typed by associated release functions. Some devreses are
better represented by single instance of the type while others need
multiple instances sharing the same release function. Both usages are
supported.
devreses can be grouped using devres group such that a device driver
can easily release acquired resources halfway through initialization
or selectively release resources (e.g. resources for port 1 out of 4
ports).
This patch adds devres core including documentation and the following
managed interfaces.
Adrian Bunk [Tue, 30 Jan 2007 08:59:17 +0000 (00:59 -0800)]
fix CONFIG_SATA_SIS=y compile error
Static code shouldn't be used from other modules.
drivers/built-in.o: In function `sis_init_one':
sata_sis.c:(.text+0x7634cd): undefined reference to `sis_info133'
sata_sis.c:(.text+0x7634d6): undefined reference to `sis_info133'
While I was at it, I also moved the prototype of this struct to a header
file.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Jeff Garzik <jeff@garzik.org> Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan [Mon, 8 Jan 2007 16:11:07 +0000 (16:11 +0000)]
sata_sis: Support for PATA supports
This is quick rework of the patch Uwe proposed but using Kconfig not
ifdefs and user selection to sort out PATA support. Instead of ifdefs and
requiring the user to select both drivers the SATA driver selects the
PATA one.
For neatness I've also moved the extern into the function that uses it.
Signed-off-by: Alan Cox Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Tue, 2 Jan 2007 11:20:07 +0000 (20:20 +0900)]
libata: implement HDIO_GET_IDENTITY
'hdparm -I' doesn't work with ATAPI devices and sg_sat is not widely
spread yet leaving no easy way to access ATAPI IDENTIFY data.
Implement HDIO_GET_IDENTITY such that at least 'hdparm -i' works.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan [Wed, 10 Jan 2007 17:13:38 +0000 (17:13 +0000)]
libata: PIIX3 support
This I believe completes the PIIX range of support for libata
This adds the table entries needed for the PIIX3, both a new PCI
identifier and a new mode list. It also fixes an erroneous access to PCI
configuration 0x48 on non UDMA capable chips.
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch extends sata_promise to handle ATAPI_NODATA
commands internally. However, commands destined to
ATA_DFLAG_CDB_INTR devices are excluded from this and
continue to be returned to libata.
Concrete changes:
- pdc_atapi_dma_pkt() is renamed to pdc_atapi_pkt(), and is
extended to set up correct headers for NODATA packets
- pdc_qc_prep() calls pdc_atapi_pkt() for ATAPI_NODATA
- pdc_host_intr() handles ATAPI_NODATA
- pdc_qc_issue_prot() sends ATAPI_NODATA packets via the
chip's packet mechanism, except for CDB_INTR devices
Tested on first- and second-generation chips, SATAPI and PATAPI,
with no observable regressions.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Jeff Garzik <jeff@garzik.org>
sata_promise: issue ATAPI commands as normal packets
This patch (against libata #upstream + the ATAPI cleanup patch)
reimplements sata_promise's ATAPI support to format ATAPI DMA
commands as normal packets, and to issue them via the hardware's
normal packet machinery.
It turns out that the only reason for issuing ATAPI DMA
commands via the pdc_issue_atapi_pkt_cmd() procedure was to
perform two interrupt-fiddling steps for ATA_DFLAG_CDB_INTR
devices. But these steps aren't needed because sata_promise
sets ATA_FLAG_PIO_POLLING, which disables DMA for those devices.
The remaining steps can easily be done in ATA taskfile packets.
Concrete changes:
- pdc_atapi_dma_pkt() is extended to program all packet setup
steps, and not just contain the CDB; the sequence of steps
exactly mirrors what pdc_issue_atapi_pkt_cmd() did
- pdc_atapi_dma_pkt() needed more parameters: simplify it by
just passing 'qc' and having it extract the data it needs
- pdc_issue_atai_pkt_cmd() and its two helper procedures
pdc_wait_for_drq() and pdc_wait_on_busy() are removed
Tested on first- and second-generation chips, SATAPI and PATAPI,
with no observable regressions.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Here's a cleanup for yesterday's sata_promise ATAPI patch:
- add and use a symbolic constant for the altstatus register
- check return status from ata_busy_wait()
- add missing newline in a warning printk()
- update comment in pdc_issue_atapi_pkt_cmd() to clarify
that the maybe-wait-for-INT issue cannot occur in the
current driver, but may occur if the driver starts issuing
ATAPI non-DMA commands as PDC packets
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Wed, 3 Jan 2007 08:32:45 +0000 (17:32 +0900)]
sata_inic162x: finally, driver for initio 162x SATA controllers, take #2
Driver for Initio 162x SATA controllers. ATA r/w, ATAPI r, hotplug
and suspend/resume work. ATAPI w (recording, that is) broken. Feel
free to fix it, but be warned, this controller is weird.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Wed, 3 Jan 2007 08:30:39 +0000 (17:30 +0900)]
libata: kill qc->nsect and cursect
libata used two separate sets of variables to record request size and
current offset for ATA and ATAPI. This is confusing and fragile.
This patch replaces qc->nsect/cursect with qc->nbytes/curbytes and
kills them. Also, ata_pio_sector() is updated to use bytes for
qc->cursg_ofs instead of sectors. The field used to be used in bytes
for ATAPI and in sectors for ATA.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Tue, 26 Dec 2006 10:39:50 +0000 (19:39 +0900)]
libata: handle pci_enable_device() failure while resuming
Handle pci_enable_device() failure while resuming. This patch kills
the "ignoring return value of 'pci_enable_device'" warning message and
propagates __must_check through ata_pci_device_do_resume().
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Tue, 2 Jan 2007 11:19:40 +0000 (20:19 +0900)]
libata: use ata_id_c_string()
There were several places where ATA ID strings are manually terminated
and in some places possibly unterminated strings were passed to string
functions which don't limit length like strstr(). This patch converts
all of them over to ata_id_c_string().
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Robert Hancock [Thu, 4 Jan 2007 00:13:57 +0000 (18:13 -0600)]
sata_nv: add suspend/resume support v3 (Resubmit)
Thoughts from Jeff & company on merging the patch below into libata-dev?
This has been in the -mm tree for over a month now, I haven't heard any
complaints about regressions..
Adrian Bunk [Wed, 3 Jan 2007 23:09:36 +0000 (00:09 +0100)]
drivers/ata/: make 4 functions static
This patch makes the following needlessly global functions static:
- libata-core.c: ata_qc_complete_internal()
- libata-scsi.c: ata_scsi_qc_new()
- libata-scsi.c: ata_dump_status()
- libata-scsi.c: ata_to_sense_error()
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan [Mon, 8 Jan 2007 12:07:25 +0000 (12:07 +0000)]
ahci: Remove jmicron fixup
The AHCI set up is handled properly along with the other bits in the
JMICRON quirk. Remove the code whacking it in ahci.c as its un-needed and
also blindly fiddles with bits it doesn't own.
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan [Mon, 8 Jan 2007 12:10:05 +0000 (12:10 +0000)]
libata-sff: Don't try and activate channels which are not in use
An ATA controller in native mode may have one or more channels disabled
and not assigned resources. In that case the existing code crashes trying
to access I/O ports 0-7.
Add the neccessary check.
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
pata_sis: implement laptop list and add ASUS A6K/A6U
In ASUS A6K/A6U hdd is connected to SiS 96x via 40c cable, however it
is short cable and is UDMA66 capable.
tj: fixed if () conditionals
ah: fixed infinite loop
Signed-off-by: Jakub W. Jozwicki <jakub007@go2.pl> Cc: Andreas Henriksson <andreas@fatal.se> Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch adds ATAPI support to the sata_promise driver.
This has been tested on both first- and second-generation
chips (20378 and 20575), and with both SATAPI and PATAPI
devices. CD-writing works.
SATAPI DMA works on second-generation chips, but on
first-generation chips SATAPI is limited to PIO due
to what appears to be HW limitations.
PATAPI DMA works on both first- and second-generation
chips, but requires the separate PATA support patch
before it can be used on TX2plus chips.
The functional changes to the driver are:
- remove ATA_FLAG_NO_ATAPI from PDC_COMMON_FLAGS
- add ->check_atapi_dma() operation to enable DMA for bulk data
transfers but force PIO for other ATAPI commands; this filter
is from Promise's driver and largely matches pata_pdc207x.c
- use a more restrictive ->check_atapi_dma() on first-generation
chips to force SATAPI to always use PIO
- add handling of ATAPI protocols to pdc_qc_prep(), pdc_host_intr(),
and pdc_qc_issue_prot(): ATAPI_DMA is handled by the driver
while non-DMA protocols are handed over to libata generic code
- add pdc_issue_atapi_pkt_cmd() to handle the initial steps in
issuing ATAPI DMA commands before sending the actual CDB;
this procedure was ported from Promise's driver
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch implements a simple way of setting up per-port
flags on the SATA+PATA Promise TX2plus chips, which is a
prerequisite for supporting the PATA port on those chips.
It is based on the observation that ap->flags isn't really
used until after ->port_start() has been invoked. So it
places the "exceptional" per-port flags array in the driver's
private host structure, and uses it in ->port_start() to
finalise the port's flags.
This patch obsoletes the #promise-sata-pata branch included
in the #all branch.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Arjan van de Ven [Tue, 19 Dec 2006 21:05:53 +0000 (13:05 -0800)]
[PATCH] user of the jiffies rounding patch: ATA subsystem
This patch introduces users of the round_jiffies() function: ATA subsystem
This delayed work is of the "about once a second" variety and can be rounded
to coincide with other wakers.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan [Thu, 7 Dec 2006 16:59:14 +0000 (08:59 -0800)]
[PATCH] pata_it8213: Add new driver for the IT8213 card
Add a driver for the IT8213 which is a single channel ICH-ish PATA
controller. As it is very different to the IT8211/2 it gets its own
driver. There is a legacy drivers/ide driver also available and I'll post
that once I get time to test it all out (probably early January). If
anyone else needs the drivers/ide driver and wants to do the merge for
drivers/ide (Bart ??) then I'll forward it.
[akpm@osdl.org: add PCI ID, constify needed_pio[]] Signed-off-by: Alan Cox <alan@redhat.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Greg Ungerer [Wed, 7 Feb 2007 02:03:08 +0000 (12:03 +1000)]
[PATCH] uclinux: correctly remap bin_fmtflat exe allocated mem regions
remap() the region we get from mmap() to mark the fact that we are
using all of the available slack space. Any slack space is used
to form a simple brk region, and potentially more stack space than
requested at load time.
Any searches of the vma chain may well fail looking for
stack (and especially arg) addresses if the remaping is not done.
The simplest example is /proc/<pid>/cmdline, since the args
are pretty much always at the top of the data/bss/stack region.