Fix ia64 build setup_per_cpu_areas() redifinition issue in UP
configuration. When compiling ia64 kernel in UP configuration, the
following compilation errors are reported:
arch/ia64/kernel/setup.c:860: error: redefinition of 'setup_per_cpu_areas'
include/linux/percpu.h:185: error: previous definition of 'setup_per_cpu_areas' was here
The patch fixes the issue in arch/ia64/kernel/setup.c
Discarded sections in different archs share some commonality but have
considerable differences. This led to linker script for each arch
implementing its own /DISCARD/ definition, which makes maintaining
tedious and adding new entries error-prone.
This patch makes all linker scripts to move discard definitions to the
end of the linker script and use the common DISCARDS macro. As ld
uses the first matching section definition, archs can include default
discarded sections by including them earlier in the linker script.
ia64 is notable because it first throws away some ia64 specific
subsections and then include the rest of the sections into the final
image, so those sections must be discarded before the inclusion.
defconfig compile tested for x86, x86-64, powerpc, powerpc64, ia64,
alpha, sparc, sparc64 and s390. Michal Simek tested microblaze.
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Mike Frysinger <vapier@gentoo.org> Tested-by: Michal Simek <monstr@monstr.eu> Cc: linux-arch@vger.kernel.org Cc: Michal Simek <monstr@monstr.eu> Cc: microblaze-uclinux@itee.uq.edu.au Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Tony Luck <tony.luck@intel.com>
Michal Simek [Thu, 9 Jul 2009 02:27:40 +0000 (11:27 +0900)]
microblaze: include EXIT_TEXT to _stext
Microblaze wants to throw out EXIT_TEXT during runtime too. This
hasn't caused trouble till now because the linker script didn't
discard EXIT_TEXT and it ended up in its default output section. As
discard definition is about to be unified, include EXIT_TEXT into
_stext explicitly and while at it replace explicit exitcall definition
to EXIT_CALL.
Signed-off-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Tejun Heo <tj@kernel.org>
Large page first chunk allocator is primarily used for NUMA machines;
however, its NUMA handling is extremely simplistic. Regardless of
their proximity, each cpu is put into separate large page just to
return most of the allocated space back wasting large amount of
vmalloc space and increasing cache footprint.
This patch teachs NUMA details to large page allocator. Given
processor proximity information, pcpu_lpage_build_unit_map() will find
fitting cpu -> unit mapping in which cpus in LOCAL_DISTANCE share the
same large page and not too much virtual address space is wasted.
This greatly reduces the unit and thus chunk size and wastes much less
address space for the first chunk. For example, on 4/4 NUMA machine,
the original code occupied 16MB of virtual space for the first chunk
while the new code only uses 4MB - one 2MB page for each node.
[ Impact: much better space efficiency on NUMA machines ]
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Miller <davem@davemloft.net>
percpu: allow non-linear / sparse cpu -> unit mapping
Currently cpu and unit are always identity mapped. To allow more
efficient large page support on NUMA and lazy allocation for possible
but offline cpus, cpu -> unit mapping needs to be non-linear and/or
sparse. This can be easily implemented by adding a cpu -> unit
mapping array and using it whenever looking up the matching unit for a
cpu.
The only unusal conversion is in pcpu_chunk_addr_search(). The passed
in address is unit0 based and unit0 might not be in use so it needs to
be converted to address of an in-use unit. This is easily done by
adding the unit offset for the current processor.
[ Impact: allows non-linear/sparse cpu -> unit mapping, no visible change yet ]
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net>
percpu core doesn't need to tack all the allocated pages. It needs to
know whether certain pages are populated and a way to reverse map
address to page when freeing. This patch drops pcpu_chunk->page[] and
use populated bitmap and vmalloc_to_page() lookup instead. Using
vmalloc_to_page() exclusively is also possible but complicates first
chunk handling, inflates cache footprint and prevents non-standard
memory allocation for percpu memory.
pcpu_chunk->page[] was used to track each page's allocation and
allowed asymmetric population which happens during failure path;
however, with single bitmap for all units, this is no longer possible.
Bite the bullet and rewrite (de)populate functions so that things are
done in clearly separated steps such that asymmetric population
doesn't happen. This makes the (de)population process much more
modular and will also ease implementing non-standard memory usage in
the future (e.g. large pages).
This makes @get_page_fn parameter to pcpu_setup_first_chunk()
unnecessary. The parameter is dropped and all first chunk helpers are
updated accordingly. Please note that despite the volume most changes
to first chunk helpers are symbol renames for variables which don't
need to be referenced outside of the helper anymore.
This change reduces memory usage and cache footprint of pcpu_chunk.
Now only #unit_pages bits are necessary per chunk.
[ Impact: reduced memory usage and cache footprint for bookkeeping ]
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net>
(de)populate functions are about to be reimplemented to drop
pcpu_chunk->page array. Move a few functions so that the rewrite
patch doesn't have code movement making it more difficult to read.
Now that all first chunk allocator helpers allocate and map the first
chunk themselves, there's no need to have optional default alloc/map
in pcpu_setup_first_chunk(). Drop @populate_pte_fn and only leave
@dyn_size optional and make all other params mandatory.
This makes it much easier to follow what pcpu_setup_first_chunk() is
doing and what actual differences tweaking each parameter results in.
x86,percpu: generalize lpage first chunk allocator
Generalize and move x86 setup_pcpu_lpage() into
pcpu_lpage_first_chunk(). setup_pcpu_lpage() now is a simple wrapper
around the generalized version. Other than taking size parameters and
using arch supplied callbacks to allocate/free/map memory,
pcpu_lpage_first_chunk() is identical to the original implementation.
This simplifies arch code and will help converting more archs to
dynamic percpu allocator.
While at it, factor out pcpu_calc_fc_sizes() which is common to
pcpu_embed_first_chunk() and pcpu_lpage_first_chunk().
[ Impact: code reorganization and generalization ]
At first, percpu first chunk was always setup page-by-page by the
generic code. To add other allocators, different parts of the generic
initialization was made optional. Now we have three allocators -
embed, remap and 4k. embed and remap fully handle allocation and
mapping of the first chunk while 4k still depends on generic code for
those. This makes the generic alloc/map paths specifci to 4k and
makes the code unnecessary complicated with optional generic
behaviors.
This patch makes the 4k allocator to allocate and map memory directly
instead of depending on the generic code. The only outside visible
change is that now dynamic area in the first chunk is allocated
up-front instead of on-demand. This doesn't make any meaningful
difference as the area is minimal (usually less than a page, just
enough to fill the alignment) on 4k allocator. Plus, dynamic area in
the first chunk usually gets fully used anyway.
This will allow simplification of pcpu_setpu_first_chunk() and removal
of chunk->page array.
[ Impact: no outside visible change other than up-front allocation of dyn area ]
Generalize and move x86 setup_pcpu_4k() into pcpu_4k_first_chunk().
setup_pcpu_4k() now is a simple wrapper around the generalized
version. Other than taking size parameters and using arch supplied
callbacks to allocate/free memory, pcpu_4k_first_chunk() is identical
to the original implementation.
This simplifies arch code and will help converting more archs to
dynamic percpu allocator.
While at it, s/pcpu_populate_pte_fn_t/pcpu_fc_populate_pte_fn_t/ for
consistency.
[ Impact: code reorganization and generalization ]
percpu: drop @unit_size from embed first chunk allocator
The only extra feature @unit_size provides is making dead space at the
end of the first chunk which doesn't have any valid usecase. Drop the
parameter. This will increase consistency with generalized 4k
allocator.
James Bottomley spotted missing conversion for the default
setup_per_cpu_areas() which caused build breakage on all arcsh which
use it.
x86: make pcpu_chunk_addr_search() matching stricter
The @addr passed into pcpu_chunk_addr_search() is unit0 based address
and thus should be matched inside unit0 area. Currently, when it uses
chunk size when determining whether the address falls in the first
chunk. Addresses in unitN where N>0 shouldn't be passed in anyway, so
this doesn't cause any malfunction but fix it for consistency.
Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix
changes. As alpha in percpu tree uses 'weak' attribute instead of
inline assembly, there's no need for __used attribute.
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: fix error message formatting
Btrfs: fix use after free in btrfs_start_workers fail path
Btrfs: honor nodatacow/sum mount options for new files
Btrfs: update backrefs while dropping snapshot
Btrfs: account for space we may use in fallocate
Btrfs: fix the file clone ioctl for preallocated extents
Btrfs: don't log the inode in file_write while growing the file
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
[SCSI] cxgb3i: fix connection error when vlan is enabled
[SCSI] FC transport: Locking fix for common-code FC pass-through patch
[SCSI] zalon: fix oops on attach failure
[SCSI] fnic: use DMA_BIT_MASK(nn) instead of deprecated DMA_nnBIT_MASK
[SCSI] fnic: remove redundant BUG_ONs and fix checks on unsigned
[SCSI] ibmvscsi: Fix module load hang
* git://git.infradead.org/iommu-2.6: (38 commits)
intel-iommu: Don't keep freeing page zero in dma_pte_free_pagetable()
intel-iommu: Introduce first_pte_in_page() to simplify PTE-setting loops
intel-iommu: Use cmpxchg64_local() for setting PTEs
intel-iommu: Warn about unmatched unmap requests
intel-iommu: Kill superfluous mapping_lock
intel-iommu: Ensure that PTE writes are 64-bit atomic, even on i386
intel-iommu: Make iommu=pt work on i386 too
intel-iommu: Performance improvement for dma_pte_free_pagetable()
intel-iommu: Don't free too much in dma_pte_free_pagetable()
intel-iommu: dump mappings but don't die on pte already set
intel-iommu: Combine domain_pfn_mapping() and domain_sg_mapping()
intel-iommu: Introduce domain_sg_mapping() to speed up intel_map_sg()
intel-iommu: Simplify __intel_alloc_iova()
intel-iommu: Performance improvement for domain_pfn_mapping()
intel-iommu: Performance improvement for dma_pte_clear_range()
intel-iommu: Clean up iommu_domain_identity_map()
intel-iommu: Remove last use of PHYSICAL_PAGE_MASK, for reserving PCI BARs
intel-iommu: Make iommu_flush_iotlb_psi() take pfn as argument
intel-iommu: Change aligned_size() to aligned_nrpages()
intel-iommu: Clean up intel_map_sg(), remove domain_page_mapping()
...
These macros had two bugs:
- the type of the mask was not correctly expanded to the full size of
the argument being expanded, resulting in possible loss of high bits
when mixing types.
- the alignment argument was evaluated twice, despite the macro looking
like a fancy function (but it really does need to be a macro, since
it works on arbitrary integer types)
Noticed by Peter Anvin, and with a fix that is a modification of his
suggestion (bug noticed by Yinghai Lu).
Cc: Peter Anvin <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Mason [Thu, 2 Jul 2009 16:26:06 +0000 (12:26 -0400)]
Btrfs: honor nodatacow/sum mount options for new files
The btrfs attr patches unconditionally inherited the inode flags field
without honoring nodatacow and nodatasum. This fix makes sure
we properly record the nodatacow/sum mount options in new inodes.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Yan Zheng [Sun, 28 Jun 2009 01:07:35 +0000 (21:07 -0400)]
Btrfs: update backrefs while dropping snapshot
The new backref format has restriction on type of backref item. If a tree
block isn't referenced by its owner tree, full backrefs must be used for the
pointers in it. When a tree block loses its owner tree's reference, backrefs
for the pointers in it should be updated to full backrefs. Current
btrfs_drop_snapshot misses the code that updates backrefs, so it's unsafe for
general use.
This patch adds backrefs update code to btrfs_drop_snapshot. It isn't a
problem in the restricted form btrfs_drop_snapshot is used today, but for
general snapshot deletion this update is required.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Sun, 28 Jun 2009 01:07:34 +0000 (21:07 -0400)]
Btrfs: account for space we may use in fallocate
Using Eric Sandeen's xfstest for fallocate, you can easily trigger a ENOSPC
panic on btrfs. This is because we do not account for data we may use when
doing the fallocate. This patch fixes the problem by properly reserving space,
and then just freeing it when we are done. The reservation stuff was made with
delalloc in mind, so its a little crude for this case, but it keeps the box
from panicing.
Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Keith Packard [Thu, 2 Jul 2009 04:56:38 +0000 (21:56 -0700)]
fs/notify/inotify: decrement user inotify count on close
The per-user inotify_devs value is incremented each time a new file is
allocated, but never decremented. This led to inotify_init failing after a
limited number of calls.
Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Eric Paris <eparis@redhat.com>
David Woodhouse [Thu, 2 Jul 2009 10:21:16 +0000 (11:21 +0100)]
intel-iommu: Introduce first_pte_in_page() to simplify PTE-setting loops
On Wed, 2009-07-01 at 16:59 -0700, Linus Torvalds wrote:
> I also _really_ hate how you do
>
> (unsigned long)pte >> VTD_PAGE_SHIFT ==
> (unsigned long)first_pte >> VTD_PAGE_SHIFT
Kill this, in favour of just looking to see if the incremented pte
pointer has 'wrapped' onto the next page. Which means we have to check
it _after_ incrementing it, not before.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: LCDC dcache flush for deferred io
sh: Fix compiler error and include the definition of IS_ERR_VALUE
sh: re-add LCDC fbdev support to the Migo-R defconfig
sh: fix se7724 ceu names
sh: ms7724se: Enable sh_eth in defconfig.
arch/sh/boards/mach-se/7206/io.c: Remove unnecessary semicolons
sh: ms7724se: Add sh_eth support
nommu: provide follow_pfn().
sh: Kill off unused DEBUG_BOOTMEM symbol.
perf_counter tools: add cpu_relax()/rmb() definitions for sh.
sh64: Hook up page fault events for software perf counters.
sh: Hook up page fault events for software perf counters.
sh: make set_perf_counter_pending() static inline.
clocksource: sh_tmu: Make undefined TCOR behaviour less undefined.
David Woodhouse [Wed, 1 Jul 2009 18:30:28 +0000 (19:30 +0100)]
intel-iommu: Kill superfluous mapping_lock
Since we're using cmpxchg64() anyway (because that's the only way to do
an atomic 64-bit store on i386), we might as well ditch the extra
locking and just use cmpxchg64() to ensure that we don't add the page
twice.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Paul Mundt [Wed, 1 Jul 2009 06:50:31 +0000 (06:50 +0000)]
sh: LCDC dcache flush for deferred io
Since writenotify on uncached vmas is unsupported in 2.6.31,
live with cached framebuffer memory in the deferred io
case for now and flush the dcache before forcing refresh.
Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Magnus damm <damm@igel.co.jp>
Matt Fleming [Wed, 1 Jul 2009 18:14:18 +0000 (18:14 +0000)]
sh: Fix compiler error and include the definition of IS_ERR_VALUE
When arch/sh/include/asm/syscall_32.h is included from a file that
doesn't also include linux/err.h the following error is produced,
In file included from /home/matt/src/kernels/sh-2.6/arch/sh/include/asm/syscall.h:5,
from kernel/trace/trace_syscalls.c:3:
/home/matt/src/kernels/sh-2.6/arch/sh/include/asm/syscall_32.h: In function 'syscall_get_error':
/home/matt/src/kernels/sh-2.6/arch/sh/include/asm/syscall_32.h:28: error: implicit declaration of function 'IS_ERR_VALUE'
make[2]: *** [kernel/trace/trace_syscalls.o] Error 1
make[1]: *** [kernel/trace] Error 2
make: *** [kernel] Error 2
Signed-off-by: Matt Fleming <matt@console-pimps.org> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* git://git.infradead.org/mtd-2.6:
mtd: nand: fix build failure and incorrect return from omap_wait()
mtd: Use BLOCK_NIL consistently in NFTL/INFTL
mtd: m25p80 timeout too short for worst-case m25p16 devices
mtd: atmel_nand: Fix typo s/parititions/partitions/
mtd: cmdlineparts: Use 64-bit format when printing a debug message.
mtd: maps: Remove BUS_ID_SIZE from integrator_flash
jffs2: fix another potential leak on error path in scan.c
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: invalidation reverse calls
fuse: allow umask processing in userspace
fuse: fix bad return value in fuse_file_poll()
fuse: fix return value of fuse_dev_write()
David Woodhouse [Wed, 1 Jul 2009 17:49:06 +0000 (18:49 +0100)]
Fix iommu address space allocation
This fixes kernel.org bug #13584. The IOVA code attempted to optimise
the insertion of new ranges into the rbtree, with the unfortunate result
that some ranges just didn't get inserted into the tree at all. Then
those ranges would be handed out more than once, and things kind of go
downhill from there.
Amerigo Wang [Wed, 1 Jul 2009 05:06:26 +0000 (01:06 -0400)]
elf: fix one check-after-use
Check before use it.
Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
cfq-iosched: remove redundant check for NULL cfqq in cfq_set_request()
blocK: Restore barrier support for md and probably other virtual devices.
block: get rid of queue-private command filter
block: Create bip slabs with embedded integrity vectors
cfq-iosched: get rid of the need for __GFP_NOFAIL in cfq_find_alloc_queue()
cfq-iosched: move cfqq initialization out of cfq_find_alloc_queue()
Trivial typo fixes in Documentation/block/data-integrity.txt.
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md: use interruptible wait when duration is controlled by userspace.
md/raid5: suspend shouldn't affect read requests.
md: tidy up error paths in md_alloc
md: fix error path when duplicate name is found on md device creation.
md: avoid dereferencing NULL pointer when accessing suspend_* sysfs attributes.
md: Use new topology calls to indicate alignment and I/O sizes
It called deep inside the codepath and in a conditional way,
and that is what crapped up when one of the new scan_block()
uses grew a tasklist_lock dependency.
This minimal patch removes that yielding stuff and adds the
proper cond_resched().
The background scanning thread could probably also be reniced
to +10.
NeilBrown [Tue, 30 Jun 2009 07:35:44 +0000 (09:35 +0200)]
blocK: Restore barrier support for md and probably other virtual devices.
The next_ordered flag is only meaningful for devices that use __make_request.
So move the test against next_ordered out of generic code and in to
__make_request
Since this test was added, barriers have not worked on md or any
devices that don't use __make_request and so don't bother to set
next_ordered. (dm explicitly sets something other than
QUEUE_ORDERED_NONE since
commit 99360b4c18f7675b50d283301d46d755affe75fd
but notes in the comments that it is otherwise meaningless).
Jens Axboe [Fri, 26 Jun 2009 14:27:10 +0000 (16:27 +0200)]
block: get rid of queue-private command filter
The initial patches to support this through sysfs export were broken
and have been if 0'ed out in any release. So lets just kill the code
and reclaim some space in struct request_queue, if anyone would later
like to fixup the sysfs bits, the git history can easily restore
the removed bits.
block: Create bip slabs with embedded integrity vectors
This patch restores stacking ability to the block layer integrity
infrastructure by creating a set of dedicated bip slabs. Each bip slab
has an embedded bio_vec array at the end. This cuts down on memory
allocations and also simplifies the code compared to the original bvec
version. Only the largest bip slab is backed by a mempool. The pool is
contained in the bio_set so stacking drivers can ensure forward
progress.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.(none)>
Jens Axboe [Tue, 30 Jun 2009 07:34:12 +0000 (09:34 +0200)]
cfq-iosched: get rid of the need for __GFP_NOFAIL in cfq_find_alloc_queue()
Setup an emergency fallback cfqq that we allocate at IO scheduler init
time. If the slab allocation fails in cfq_find_alloc_queue(), we'll just
punt IO to that cfqq instead. This ensures that cfq_find_alloc_queue()
never fails without having to ensure free memory.
On cfqq lookup, always try to allocate a new cfqq if the given cfq io
context has the oom_cfqq assigned. This ensures that we only temporarily
punt to this shared queue.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Andre Noll [Thu, 25 Jun 2009 11:03:04 +0000 (13:03 +0200)]
Trivial typo fixes in Documentation/block/data-integrity.txt.
Signed-off-by: Andre Noll <maan@systemlinux.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Magnus Damm [Wed, 1 Jul 2009 04:55:35 +0000 (04:55 +0000)]
sh: fix se7724 ceu names
Use "ceu0" and "ceu1" as CEU names instead of "ceu".
This fixes "memchunk" kernel command line selection
on the solution engine 7724 board.
With this patch applied use "memchunk.ceu0=1m" or
"memchunk.ceu1=1m" on kernel command line to override
physically memory size to one meg for CEU0 or CEU1.
Signed-off-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
md: use interruptible wait when duration is controlled by userspace.
User space can set various limits on an md array so that resync waits
when it gets to a certain point, or so that I/O is blocked for a short
while.
When md is waiting against one of these limit, it should use an
interruptible wait so as not to add to the load average, and so are
not to trigger a warning if the wait goes on for too long.
md allows write to regions on an array to be suspended temporarily.
This allows user-space to participate is aspects of reshape.
In particular, data can be copied with not risk of a race.
We should not be blocking read requests though, so don't.
After discovering that we don't listen to gratuitious arps in 2.6.30
I tracked the failure down to this commit.
The patch makes absolutely no sense. RFC2131 RFC3927 and RFC5227.
are all in agreement that an arp request with sip == 0 should be used
for the probe (to prevent learning) and an arp request with sip == tip
should be used for the gratitous announcement that people can learn
from.
It appears the author of the broken patch got those two cases confused
and modified the code to drop all gratuitous arp traffic. Ouch!
Cc: stable@kernel.org Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 30 Jun 2009 12:46:34 +0000 (12:46 +0000)]
igb: return PCI_ERS_RESULT_DISCONNECT on permanent error
PCI drivers that implement the io_error_detected callback should return
PCI_ERS_RESULT_DISCONNECT if the state passed in is
pci_channel_io_perm_failure. This patch fixes the issue for igb.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Mason [Tue, 30 Jun 2009 12:45:53 +0000 (12:45 +0000)]
e1000e: io_error_detected callback should return PCI_ERS_RESULT_DISCONNECT
on permanent failure
PCI drivers that implement the io_error_detected callback
should return PCI_ERS_RESULT_DISCONNECT if the state
passed in is pci_channel_io_perm_failure. This state is not
checked in many of the network drivers.
This patch fixes the omission in the e1000e driver.
Signed-off-by: Mike Mason <mmlnx@us.ibm.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andre Detsch [Tue, 30 Jun 2009 12:46:13 +0000 (12:46 +0000)]
e1000: return PCI_ERS_RESULT_DISCONNECT on permanent error
PCI drivers that implement the io_error_detected callback
should return PCI_ERS_RESULT_DISCONNECT if the state
passed in is pci_channel_io_perm_failure. This state is
not checked in many of the network drivers.
The patch fixes the omission in the e1000 driver.
Based on Mike Mason's similar patch for e1000e.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com> CC: Mike Mason <mmlnx@us.ibm.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix is to correctly zero out internal ->dma value when unmapping
and make sure never to unmap unless there specifically was a mapping done.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Tue, 30 Jun 2009 12:45:15 +0000 (12:45 +0000)]
igb: fix unmap length bug
driver was mixing NET_IP_ALIGN count bytes in map/unmap calls
unevenly. Only map the bytes that the hardware might dma into
also fix unmap related bug where ->dma was not being cleared
after unmap
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Tue, 30 Jun 2009 11:44:56 +0000 (11:44 +0000)]
ixgbe: fix unmap length bug
This patch addresses three WARN_ON statements from DMA-API debug code
ixgbe is mapping more than it unmaps, reduce the length of the map call and
remove the "used once" local variable.
found by Joerg Roedel <joerg.roedel@amd.com> in 2.6.30, so is a candidate
for -stable.
in addition, fix missing ->dma = 0 after unmap to prevent double free with
pci_unmap_single
and lastly, don't unmap (half) pages that aren't mapped.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ixgbe: Fix link capabilities during adapter resets
Adapter link advertisement capabilities were not persistent during
adapter resets. While configuring multispeed fiber link check for
phy autoneg_advertised settings before overwriting with default
link capabilities
Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ixgbe: Fix device capabilities of 82599 single speed fiber NICs.
82599 single speed fiber modules only support 10G/Full. Return
proper device capabilities while querrying the adapter and error
while changing device advertisement/speed/duplex capabilities.
Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Tue, 30 Jun 2009 11:43:55 +0000 (11:43 +0000)]
ixgbe: Fix SFP log messages
We had a wide range of log messages for the same sort of SFP
failure. This patch makes them all more similar and less
confusing along with converting them to dev_err.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
As the recent bug in md_alloc showed, having a single exit path for
unlocking and putting is a good idea. So restructure md_alloc to have
a single mutex_unlock and mddev_put, and use gotos where necessary.
md: fix error path when duplicate name is found on md device creation.
When an md device is created by name (rather than number) we need to
check that the name is not already in use. If this check finds a
duplicate, we return an error without dropping the lock or freeing
the newly create mddev.
This patch fixes that.
Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6
* 'kmemleak' of git://linux-arm.org/linux-2.6:
kmemleak: Inform kmemleak about pid_hash
kmemleak: Do not warn if an unknown object is freed
kmemleak: Do not report new leaked objects if the scanning was stopped
kmemleak: Slightly change the policy on newly allocated objects
kmemleak: Do not trigger a scan when reading the debug/kmemleak file
kmemleak: Simplify the reports logged by the scanning thread
kmemleak: Enable task stacks scanning by default
kmemleak: Allow the early log buffer to be configurable.
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
dm table: fix blk_stack_limits arg to use bytes not sectors
dm exception store: really fix type lookup
Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (47 commits)
perf report: Add --symbols parameter
perf report: Add --comms parameter
perf report: Add --dsos parameter
perf_counter tools: Adjust only prelinked symbol's addresses
perf_counter: Provide a way to enable counters on exec
perf_counter tools: Reduce perf stat measurement overhead/skew
perf stat: Use percentages for scaling output
perf_counter, x86: Update x86_pmu after WARN()
perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run
perf stat: Improve output
perf stat: Fix multi-run stats
perf stat: Add -n/--null option to run without counters
perf_counter tools: Remove dead code
perf_counter: Complete counter swap
perf report: Print sorted callchains per histogram entries
perf_counter tools: Prepare a small callchain framework
perf record: Fix unhandled io return value
perf_counter tools: Add alias for 'l1d' and 'l1i'
perf-report: Add bare minimum PERF_EVENT_READ parsing
perf-report: Add modes for inherited stats and no-samples
...
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
Add Fenghua Yu as temporary co-maintainer for ia64
[IA64] address compiler warnings perfmon.c/salinfo.c
[IA64] Remove unnecessary semicolons
[IA64] sprintf should not be used with same source & destination address
hostfs: set maximum filesize in superblock for proper LFS support
Maximum file size for hostfs mounts defaults to 2GB, so bigger files cannot be
read/written through hostfs. This patch initializes the maximum file size to
MAX_LFS_SIZE.
Signed-off-by: Wolfgang Illmeyer <wolfgang@illmeyer.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Tue, 30 Jun 2009 18:41:43 +0000 (11:41 -0700)]
bfin: delay IRQ registration until driver is ready
Make sure we do not actually request the RTC IRQ until the device driver
is fully ready to handle and process any interrupt. This way a spurious
interrupt won't crash the system (which may happen if the bootloader was
poking the RTC right before booting Linux).
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ville Syrjala [Tue, 30 Jun 2009 18:41:42 +0000 (11:41 -0700)]
atyfb: fix alignment for block writes
Block writes require 64 byte alignment. Since block writes could be used
with SGRAM or WRAM also refine the memory type detection to check for
either type before deciding to use the 64 byte alignment.
Signed-off-by: Ville Syrjala <syrjala@sci.fi> Tested-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ville Syrjala [Tue, 30 Jun 2009 18:41:40 +0000 (11:41 -0700)]
atyfb: fix HP OmniBook 500 reboot hang
Apparently HP OmniBook 500's BIOS doesn't like the way atyfb reprograms
the hardware. The BIOS will simply hang after a reboot. Fix the problem
by restoring the hardware to it's original state on reboot.
Signed-off-by: Ville Syrjala <syrjala@sci.fi> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yinghai Lu [Tue, 30 Jun 2009 18:41:37 +0000 (11:41 -0700)]
x86: only clear node_states for 64bit
Nathan reported that
| commit 73d60b7f747176dbdff826c4127d22e1fd3f9f74
| Author: Yinghai Lu <yinghai@kernel.org>
| Date: Tue Jun 16 15:33:00 2009 -0700
|
| page-allocator: clear N_HIGH_MEMORY map before we set it again
|
| SRAT tables may contains nodes of very small size. The arch code may
| decide to not activate such a node. However, currently the early boot
| code sets N_HIGH_MEMORY for such nodes. These nodes therefore seem to be
| active although these nodes have no present pages.
|
| For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too
unintentionally and incorrectly clears the cpuset.mems cgroup attribute on
an i386 kvm guest, meaning that cpuset.mems can not be used.
Fix this by only clearing node_states[N_NORMAL_MEMORY] for 64bit only.
and need to do save/restore for that in find_zone_movable_pfn
Reported-by: Nathan Lynch <ntl@pobox.com> Tested-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cpusets: document adding/removing cpus to cpuset elaborately
By writing a tasks's pid to the file, a process adds that task to that
cgroup/cpuset. But to add a cpu/mem to a cpuset, the new list of cpus
should be written to the cpuset.mems file which would replace the old list
of cpus. Make this clearer in the documentation.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Kennedy [Tue, 30 Jun 2009 18:41:35 +0000 (11:41 -0700)]
mm: prevent balance_dirty_pages() from doing too much work
balance_dirty_pages can overreact and move all of the dirty pages to
writeback unnecessarily.
balance_dirty_pages makes its decision to throttle based on the number of
dirty plus writeback pages that are over the calculated limit,so it will
continue to move pages even when there are plenty of pages in writeback
and less than the threshold still dirty.
This allows it to overshoot its limits and move all the dirty pages to
writeback while waiting for the drives to catch up and empty the writeback
list.
A simple fio test easily demonstrates this problem.
This is the simplest fix I could find, but I'm not entirely sure that it
alone will be enough for all cases. But it certainly is an improvement on
my desktop machine writing to 2 disks.
Do we need something more for machines with large arrays where
bdi_threshold * number_of_drives is greater than the dirty_ratio ?
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Renaud Lottiaux [Tue, 30 Jun 2009 18:41:34 +0000 (11:41 -0700)]
bsdacct: fix access to invalid filp in acct_on()
The file opened in acct_on and freshly stored in the ns->bacct struct can
be closed in acct_file_reopen by a concurrent call after we release
acct_lock and before we call mntput(file->f_path.mnt).
Record file->f_path.mnt in a local variable and use this variable only.
Signed-off-by: Renaud Lottiaux <renaud.lottiaux@kerlabs.com> Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 30 Jun 2009 18:41:30 +0000 (11:41 -0700)]
spi: bitbang bugfix in message setup
Bugfix to spi_bitbang infrastructure: make sure to always set transfer
parameters on the first pass through the message's per-transfer loop.
This can matter with drivers that replace the per-word or per-buffer
transfer primitives, on busses with multiple SPI devices.
Previously, this could have started messages using the settings left after
previous messages. The problem was observed when a high speed chip
(m25p80 type flash) was running very slowly because a low speed device
(avr8 microcontroller) had previously used the bus. Similar faults could
have driven the low speed device too fast, or used an unexpected word
size.
Acked-by: Steven A. Falco <sfalco@harris.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>