Borislav Petkov [Thu, 25 Jun 2009 17:32:38 +0000 (19:32 +0200)]
EDAC: move MCE error descriptions to EDAC core
This is in preparation of adding AMD-specific MCE decoding functionality
to the EDAC core. The error decoding macros originate from the AMD64
EDAC driver albeit in a simplified and cleaned up version here.
While at it, add macros to generate the error description strings and
use them in the error type decoders directly which removes a bunch of
code and makes the decoding functions much more readable. Also, fix
strings and shorten macro names.
Alex Chiang [Thu, 10 Sep 2009 18:34:09 +0000 (12:34 -0600)]
PCI Hotplug: convert acpi_pci_detect_ejectable() to take an acpi_handle
acpi_pci_detect_ejectable() goes through effort to convert its
struct pci_bus arg to an acpi_handle, but every time we use this
interface, we already have the handle available.
So let's just use the handle instead of converting back and forth.
fsync: wait for data writeout completion before calling ->fsync
Currenly vfs_fsync(_range) first calls filemap_fdatawrite to write out
the data, the calls into ->fsync to write out the metadata and then finally
calls filemap_fdatawait to wait for the data I/O to complete. What sounds
like a clever micro-optimization actually is nast trap for many filesystems.
For many modern filesystems i_size or other inode information is only
updated on I/O completion and we need to wait for I/O to finish before
we can write out the metadata. For old fashionen filesystems that
instanciate blocks during the actual write and also update the metadata
at that point it opens up a large window were we could expose uninitialized
blocks after a crash. While a few filesystems that need it already wait
for the I/O to finish inside their ->fsync methods it is rather suboptimal
as it is done under the i_mutex and also always for the whole file instead
of just a part as we could do for O_SYNC handling.
Here is a small audit of all fsync instances in the tree:
never touch pagecache themselves. We need to wait before if we do
not want to expose stale data after an allocation.
- afs_fsync:
- fuse_fsync_common:
do the waiting writeback itself in awkward ways, would benefit from
proper semantics
- block_fsync:
Does a filemap_write_and_wait on the block device inode. Because we
now have f_mapping that is the same inode we call it on in vfs_fsync.
So just removing it and letting the VFS do the work in one go would
be an improvement.
would need the wait to work correctly for delalloc mode with late
i_size updates. Otherwise the ext3 comment applies.
currently implemens it's own writeback and wait in an odd way,
could benefit from doing it properly.
- gfs2_fsync:
not needed for journaled data mode, but probably harmless there.
Currently writes back data asynchronously itself. Needs some
major audit.
- hostfs_fsync:
just calls fsync/datasync on the host FD. Without the wait before
data might not even be inflight yet if we're unlucky.
- hpfs_file_fsync:
- ncp_fsync:
no-ops. Dangerous before and after.
- jffs2_fsync:
just calls jffs2_flush_wbuf_gc, not sure how this relates to data.
- nfs_fsync_dir:
just increments stats, claims all directory operations are synchronous
- nfs_file_fsync:
only writes out data??? Looks very odd.
- nilfs_sync_file:
looks like it expects all data done, but not sure from the code
- ntfs_dir_fsync:
- ntfs_file_fsync:
appear to do their own data writeback. Very convoluted code.
- ocfs2_sync_file:
does it's own data writeback, but no wait. probably needs the wait.
- smb_fsync:
according to a comment expects all pages written already, probably needs
the wait before.
This patch only changes vfs_fsync_range, removal of the wait in the methods
that have it is left to the filesystem maintainers. Note that most
filesystems really do need an audit for their fsync methods given the
gems found in this very brief audit.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara [Mon, 17 Aug 2009 15:00:02 +0000 (17:00 +0200)]
fat: Opencode sync_page_range_nolock()
fat_cont_expand() is the only user of sync_page_range_nolock(). It's also the
only user of generic_osync_inode() which does not have a file open. So
opencode needed actions for FAT so that we can convert generic_osync_inode() to
a standard syncing path.
Update a comment about generic_osync_inode().
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara [Tue, 18 Aug 2009 16:32:55 +0000 (18:32 +0200)]
xfs: Convert sync_page_range() to simple filemap_write_and_wait_range()
Christoph Hellwig says that it is enough for XFS to call
filemap_write_and_wait_range() instead of sync_page_range() because we do
all the metadata syncing when forcing the log.
CC: Felix Blyakher <felixb@sgi.com> CC: xfs@oss.sgi.com CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara [Tue, 18 Aug 2009 16:24:31 +0000 (18:24 +0200)]
ocfs2: Update syncing after splicing to match generic version
Update ocfs2 specific splicing code to use generic syncing helper. The sync now
does not happen under rw_lock because generic_write_sync() acquires i_mutex
which ranks above rw_lock. That should not matter because standard fsync path
does not hold it either.
Acked-by: Joel Becker <Joel.Becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> CC: ocfs2-devel@oss.oracle.com Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara [Tue, 18 Aug 2009 16:13:58 +0000 (18:13 +0200)]
ntfs: Use new syncing helpers and update comments
Use new syncing helpers in .write and .aio_write functions. Also
remove superfluous syncing in ntfs_file_buffered_write() and update
comments about generic_osync_inode().
CC: Anton Altaparmakov <aia21@cantab.net> CC: linux-ntfs-dev@lists.sourceforge.net Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara [Mon, 17 Aug 2009 17:52:36 +0000 (19:52 +0200)]
vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode
Introduce new function for generic inode syncing (vfs_fsync_range) and use
it from fsync() path. Introduce also new helper for syncing after a sync
write (generic_write_sync) using the generic function.
Use these new helpers for syncing from generic VFS functions. This makes
O_SYNC writes to block devices acquire i_mutex for syncing. If we really
care about this, we can make block_fsync() drop the i_mutex and reacquire
it before it returns.
CC: Evgeniy Polyakov <zbr@ioremap.net> CC: ocfs2-devel@oss.oracle.com CC: Joel Becker <joel.becker@oracle.com> CC: Felix Blyakher <felixb@sgi.com> CC: xfs@oss.sgi.com CC: Anton Altaparmakov <aia21@cantab.net> CC: linux-ntfs-dev@lists.sourceforge.net CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> CC: linux-ext4@vger.kernel.org CC: tytso@mit.edu Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
generic_file_aio_write_nolock() is now used only by block devices and raw
character device. Filesystems should use __generic_file_aio_write() in case
generic_file_aio_write() doesn't suit them. So rename the function to
blkdev_aio_write() and move it to fs/blockdev.c.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara [Mon, 17 Aug 2009 16:50:08 +0000 (18:50 +0200)]
ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolock
Use the new helper. We have to submit data pages ourselves in case of O_SYNC
write because __generic_file_aio_write does not do it for us. OCFS2 developpers
might think about moving the sync out of i_mutex which seems to be easily
possible but that's out of scope of this patch.
CC: ocfs2-devel@oss.oracle.com Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara [Tue, 18 Aug 2009 14:18:20 +0000 (16:18 +0200)]
vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write()
generic_file_direct_write() and generic_file_buffered_write() called
generic_osync_inode() if it was called on O_SYNC file or IS_SYNC inode. But
this is superfluous since generic_file_aio_write() does the syncing as well.
Also XFS and OCFS2 which call these functions directly handle syncing
themselves. So let's have a single place where syncing happens:
generic_file_aio_write().
We slightly change the behavior by syncing only the range of file to which the
write happened for buffered writes but that should be all that is required.
CC: ocfs2-devel@oss.oracle.com CC: Joel Becker <joel.becker@oracle.com> CC: Felix Blyakher <felixb@sgi.com> CC: xfs@oss.sgi.com Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara [Mon, 17 Aug 2009 16:10:06 +0000 (18:10 +0200)]
vfs: Export __generic_file_aio_write() and add some comments
Rename __generic_file_aio_write_nolock() to __generic_file_aio_write(), add
comments to write helpers explaining how they should be used and export
__generic_file_aio_write() since it will be used by some filesystems.
CC: ocfs2-devel@oss.oracle.com CC: Joel Becker <joel.becker@oracle.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
Merge branch 'x86-percpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-percpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, percpu: Collect hot percpu variables into one cacheline
x86, percpu: Fix DECLARE/DEFINE_PER_CPU_PAGE_ALIGNED()
x86, percpu: Add 'percpu_read_stable()' interface for cacheable accesses
Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, highmem_32.c: Clean up comment
x86, pgtable.h: Clean up types
x86: Clean up dump_pagetable()
Merge branch 'x86-kbuild-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-kbuild-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Simplify the Makefile in a minor way through use of cc-ifversion
Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64: move clts into batch cpu state updates when preloading fpu
x86-64: move unlazy_fpu() into lazy cpu state part of context switch
x86-32: make sure clts is batched during context switch
x86: split out core __math_state_restore
Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Decrease the level of some NUMA messages to KERN_DEBUG
Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits)
x86: Fix code patching for paravirt-alternatives on 486
x86, msr: change msr-reg.o to obj-y, and export its symbols
x86: Use hard_smp_processor_id() to get apic id for AMD K8 cpus
x86, sched: Workaround broken sched domain creation for AMD Magny-Cours
x86, mcheck: Use correct cpumask for shared bank4
x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors
x86: Fix CPU llc_shared_map information for AMD Magny-Cours
x86, msr: Fix msr-reg.S compilation with gas 2.16.1, on 32-bit too
x86: Move kernel_fpu_using to irq_fpu_usable in asm/i387.h
x86, msr: fix msr-reg.S compilation with gas 2.16.1
x86, msr: Export the register-setting MSR functions via /dev/*/msr
x86, msr: Create _on_cpu helpers for {rw,wr}msr_safe_regs()
x86, msr: Have the _safe MSR functions return -EIO, not -EFAULT
x86, msr: CFI annotations, cleanups for msr-reg.S
x86, asm: Make _ASM_EXTABLE() usable from assembly code
x86, asm: Add 32-bit versions of the combined CFI macros
x86, AMD: Disable wrongly set X86_FEATURE_LAHF_LM CPUID bit
x86, msr: Rewrite AMD rd/wrmsr variants
x86, msr: Add rd/wrmsr interfaces with preset registers
x86: add specific support for Intel Atom architecture
...
Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Make memtype_seq_ops const
x86: uv: Clean up uv_ptc_init(), use proc_create()
x86: Use printk_once()
x86/cpu: Clean up various files a bit
x86: Remove duplicated #include
x86, ipi: Clean up safe_smp_processor_id() by using the cpu_has_apic() macro helper
x86: Clean up idt_descr and idt_tableby using NR_VECTORS instead of hardcoded number
x86: Further clean up of mtrr/generic.c
x86: Clean up mtrr/main.c
x86: Clean up mtrr/state.c
x86: Clean up mtrr/mtrr.h
x86: Clean up mtrr/if.c
x86: Clean up mtrr/generic.c
x86: Clean up mtrr/cyrix.c
x86: Clean up mtrr/cleanup.c
x86: Clean up mtrr/centaur.c
x86: Clean up mtrr/amd.c:
x86: ds.c fix invalid assignment
Merge branch 'x86-asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: remove all now-duplicate header files
x86: convert termios.h to the asm-generic version
x86: convert almost generic headers to asm-generic version
x86: convert trivial headers to asm-generic version
x86: add copies of some headers to convert to asm-generic
Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
ACPI, x86: expose some IO-APIC routines when CONFIG_ACPI=n
x86, apic: Slim down stack usage in early_init_lapic_mapping()
x86, ioapic: Get rid of needless check and simplify ioapic_setup_resources()
x86, ioapic: Define IO_APIC_DEFAULT_PHYS_BASE constant
x86: Fix x86_model test in es7000_apic_is_cluster()
x86, apic: Move dmar_table_init() out of enable_IR()
x86, ioapic: Panic on irq-pin binding only if needed
x86/apic: Enable x2APIC without interrupt remapping under KVM
x86, apic: Drop redundant bit assignment
x86, ioapic: Throw BUG instead of NULL dereference
x86, ioapic: Introduce for_each_irq_pin() helper
x86: Remove superfluous NULL pointer check in destroy_irq()
x86/ioapic.c: unify ioapic_retrigger_irq()
x86/ioapic.c: convert __target_IO_APIC_irq to conventional for() loop
x86/ioapic.c: clean up replace_pin_at_irq_node logic and comments
x86/ioapic.c: convert replace_pin_at_irq_node to conventional for() loop
x86/ioapic.c: simplify add_pin_to_irq_node()
x86/ioapic.c: convert io_apic_level_ack_pending loop to normal for() loop
x86/ioapic.c: move lost comment to what seems like appropriate place
x86/ioapic.c: remove redundant declaration of irq_pin_list
...
Ryusuke Konishi [Sat, 8 Aug 2009 07:09:46 +0000 (16:09 +0900)]
fs/Kconfig: move nilfs2 outside misc filesystems
Some people asked me questions like the following:
On Wed, 15 Jul 2009 13:11:21 +0200, Leon Woestenberg wrote:
> just wondering, any reasons why NILFS2 is one of the miscellaneous
> filesystems and, for example, btrfs, is not in Kconfig?
Actually, nilfs is NOT a filesystem came from other operating systems,
but a filesystem created purely for Linux. Nor is it a flash
filesystem but that for generic block devices.
So, this moves nilfs outside the misc category as I responded in LKML
"Re: Why does NILFS2 hide under Miscellaneous filesystems?"
(Message-Id: <20090716.002526.93465395.ryusuke@osrg.net>).
Ryusuke Konishi [Sat, 15 Aug 2009 06:34:33 +0000 (15:34 +0900)]
nilfs2: allow btree code to directly call dat operations
The current btree code is written so that btree functions call dat
operations via wrapper functions in bmap.c when they allocate, free,
or modify virtual block addresses.
This abstraction requires additional function calls and causes
frequent call of nilfs_bmap_get_dat() function since it is used in the
every wrapper function.
This removes the wrapper functions and makes them available from
btree.c and direct.c, which will increase the opportunity of
compiler optimization.
Ryusuke Konishi [Sat, 15 Aug 2009 04:47:09 +0000 (13:47 +0900)]
nilfs2: remove individual gfp constants for each metadata file
This gets rid of NILFS_CPFILE_GFP, NILFS_SUFILE_GFP, NILFS_DAT_GFP,
and NILFS_IFILE_GFP. All of these constants refer to NILFS_MDT_GFP,
and can be removed.
nilfs2: shorten freeze period due to GC in write operation v3
This is a re-revised patch to shorten freeze period.
This version include a fix of the bug Konishi-san mentioned last time.
When GC is runnning, GC moves live block to difference segments.
Copying live blocks into memory is done in a transaction,
however it is not necessarily to be in the transaction.
This patch will get the nilfs_ioctl_move_blocks() out from
transaction lock and put it before the transaction.
I ran sysbench fileio test against nilfs partition.
I copied some DVD/CD images and created snapshot to create live blocks
before starting the benchmark.
Followings are summary of rc8 and rc8 w/ the patch of per-request
statistics, which is min/max and avg. I ran each test three times and
bellow is average of those numers.
According to this benchmark result, average time is slightly degrated.
However, worstcase (max) result is significantly improved.
This can address a few seconds write freeze.
- random write per-request performance of rc8
min 0.843ms
max 680.406ms
avg 3.050ms
- random write per-request performance of rc8 w/ this patch
min 0.843ms -> 100.00%
max 380.490ms -> 55.90%
avg 3.233ms -> 106.00%
- sequential write per-request performance of rc8
min 0.736ms
max 774.343ms
avg 2.883ms
- sequential write per-request performance of rc8 w/ this patch
min 0.720ms -> 97.80%
max 644.280ms-> 83.20%
avg 3.130ms -> 108.50%
-----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<-----
protection_period 150
selection_policy timestamp # timestamp in ascend order
nsegments_per_clean 2
cleaning_interval 2
retry_interval 60
use_mmap
log_priority info
-----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<-----
Zhu Yanhai [Wed, 12 Aug 2009 06:17:59 +0000 (14:17 +0800)]
nilfs2: add more check routines in mount process
nilfs2: Add more safeguard routines and protections in mount process,
which also makes nilfs2 report consistency error messages when
checkpoint number is invalid.
Zhang Qiang [Sun, 9 Aug 2009 11:13:10 +0000 (19:13 +0800)]
nilfs2: An unassigned variable is assigned to a never used structure member
nilfs2: In procedure 'nilfs_get_sb()', when a nilfs filesysttem is
mounted for the first time, local variable 'nilfs->ns_last_cno' is
used before loading the latest checkpoint number from disk (in
'nilfs_fill_super'). 'nilfs->ns_last_cno' is assigned to 'sd.cno', but
'sd.cno' has never been used in the procedure.
Ryusuke Konishi [Fri, 19 Jun 2009 06:25:42 +0000 (15:25 +0900)]
nilfs2: use GFP_NOIO for bio_alloc instead of GFP_NOWAIT
Alberto Bertogli advised me about bio_alloc() use in nilfs:
On Sat, 13 Jun 2009 22:52:40 -0300, Alberto Bertogli wrote:
> By the way, those bio_alloc()s are using GFP_NOWAIT but it looks
> like they could use at least GFP_NOIO or GFP_NOFS, since the caller
> can (and sometimes do) sleep. The only caller is nilfs_submit_bh(),
> which calls nilfs_submit_seg_bio() which can sleep calling
> wait_for_completion().
This takes in the comment and replaces the use of GFP_NOWAIT flag with
GFP_NOIO.
This removes nilfs_write_super and commit super block in nilfs
internal thread, instead of periodic write_super callback.
VFS layer calls ->write_super callback periodically. However,
it looks like that calling back is ommited when disk I/O is busy.
And when cleanerd (nilfs GC) is runnig, disk I/O tend to be busy thus
nilfs superblock is not synchronized as nilfs designed.
To avoid it, syncing superblock by nilfs thread instead of pdflush.
nilfs_write_super will call nilfs_commit_super to store super block
into block device. However, nilfs_put_super will call
nilfs_commit_super right after calling nilfs_write_super. So calling
nilfs_write_super in nilfs_put_super would be redundant.
nilfs2: always lookup disk block address before reading metadata block
The current metadata file code skips disk address lookup for its data
block if the buffer has a mapped flag.
This has a potential risk to cause read request to be performed
against the stale block address that GC moved, and it may lead to meta
data corruption. The mapped flag is safe if the buffer has an
uptodate flag, otherwise it may prevent necessary update of disk
address in the next read.
This will avoid the potential problem by ensuring disk address lookup
before reading metadata block even for buffers with the mapped flag.
Ryusuke Konishi [Sun, 2 Aug 2009 13:45:33 +0000 (22:45 +0900)]
nilfs2: use semaphore to protect pointer to a writable FS-instance
will get rid of nilfs_get_writer() and nilfs_put_writer() pair used to
retain a writable FS-instance for a period.
The pair functions were making up some kind of recursive lock with a
mutex, but they became overkill since the commit 201913ed746c7724a40d33ee5a0b6a1fd2ef3193. Furthermore, they caused
the following lockdep warning because the mutex can be released by a
task which didn't lock it:
=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
kswapd0/422 is trying to release lock (&nilfs->ns_writer_mutex) at:
[<c1359ff5>] mutex_unlock+0x8/0xa
but there are no more locks to release!
other info that might help us debug this:
no locks held by kswapd0/422.
Unlike on most other architectures ino_t is an unsigned int on s390.
So add an explicit cast to avoid this compile warning:
fs/nilfs2/recovery.c: In function 'recover_dsync_blocks':
fs/nilfs2/recovery.c:555: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t'
Ryusuke Konishi [Sat, 22 Aug 2009 10:10:07 +0000 (19:10 +0900)]
nilfs2: fix ignored error code in __nilfs_read_inode()
The __nilfs_read_inode function is ignoring the error code returned
from nilfs_read_inode_common(), and wrongly delivers a success code
(zero) when it escapes from the function in erroneous cases.
Zhenyu Wang [Mon, 14 Sep 2009 02:47:06 +0000 (10:47 +0800)]
agp/intel: remove restore in resume
As early pci resume has already restored config for host
bridge and graphics device, don't need to restore it again,
This removes an original order hack for graphics device restore.
This fixed the resume hang issue found by Alan Stern on 845G,
caused by extra config restore on graphics device.
Cc: Stable Team <stable@kernel.org> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Dave Airlie <airlied@linux.ie>
block: use blkdev_issue_discard in blk_ioctl_discard
blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard,
the only difference between the two is that blkdev_issue_discard needs to
send a barrier discard request and blk_ioctl_discard a non-barrier one,
and blk_ioctl_discard needs to wait on the request. To facilitates this
add a flags argument to blkdev_issue_discard to control both aspects of the
behaviour. This will be very useful later on for using the waiting
funcitonality for other callers.
Based on an earlier patch from Matthew Wilcox <matthew@wil.cx>.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
David Woodhouse [Sat, 12 Sep 2009 05:35:37 +0000 (07:35 +0200)]
Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads
The commands are conceptually writes, and in the case of IDE and SCSI
commands actually are writes. They were only reads because we thought
that would interact better with the elevators. Now the elevators know
about discard requests, that advantage no longer exists.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Implement blk_limits_io_opt() and make blk_queue_io_opt() a wrapper
around it. DM needs this to avoid poking at the queue_limits directly.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Seperate read and write statistics of in_flight requests
Currently, there is a single in_flight counter measuring the number of
requests in the request_queue. But some monitoring tools would like to
know how many read requests and write requests are in progress. Split the
current in_flight counter into two seperate counters for read and write.
This information is exported as a sysfs attribute, as changing the
currently available stat files would break the existing tools.
[ 5259.349897] aoe: bi_io_vec is NULL
[ 5259.349940] ------------[ cut here ]------------
[ 5259.349958] kernel BUG at /usr/src/linux-2.6/drivers/block/aoe/aoeblk.c:177!
[ 5259.349990] invalid opcode: 0000 [#1]
The bio in question is a barrier. Jens Axboe suggested that such bios
need to be recognized and ended with -EOPNOTSUPP by any driver that
provides its own ->make_request_fn handler and does not handle
barriers.
In testing the changes below eliminate the BUG.
(Better would be real barrier support, something that Ed says he'll add
for later in the .32 cycle. For now, this at least gets rid of a bug
with crashing on an empty barrier. Jens)
Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Eric Dumazet [Thu, 3 Sep 2009 14:08:06 +0000 (16:08 +0200)]
slub: fix slab_pad_check()
When SLAB_POISON is used and slab_pad_check() finds an overwrite of the
slab padding, we call restore_bytes() on the whole slab, not only
on the padding.
Acked-by: Christoph Lameer <cl@linux-foundation.org> Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
V4L/DVB (12712): em28xx: properly load ir-kbd-i2c when needed
Currently, the logic to load ir i2c ancillary module is broken. It is
associated to Hauppauge devices with IR flag on their eeprom, no matter
if the device uses i2c or em28xx direct IR support. That's wrong.
Instead, add a flag to the boards that use i2c IR chips and load the
module only for those devices and if ir is not disabled.
V4L/DVB (12701): saa7134: ir-kbd-i2c init data needs a persistent object
ir-kbd-i2c's ir_probe() function can be called much later (i.e. at
ir-kbd-i2c module load), than the lifetime of a struct IR_i2c_init_data
allocated off of the stack in cx18_i2c_new_ir() at registration time.
Make sure we pass a pointer to a persistent IR_i2c_init_data object at
i2c registration time.
Thanks to Brian Rogers, Dustin Mitchell, Andy Walls and Jean Delvare to
rise this question.
Before this patch, if ir-kbd-i2c were probed after SAA7134, trash data
were used.
Compile tested only, but the patch is identical to em28xx one. So, it
should work properly.
Andy Walls [Sat, 5 Sep 2009 13:58:37 +0000 (10:58 -0300)]
V4L/DVB (12699): cx18: ir-kbd-i2c initialization data should point to a persistent object
ir-kbd-i2c's ir_probe() function can be called much later (i.e. at ir-kbd-i2c
module load), than the lifetime of a struct IR_i2c_init_data allocated off of
the stack in cx18_i2c_new_ir() at registration time. Make sure we pass
a pointer to a persistent IR_i2c_init_data object at i2c registration time.
Thanks to Brian Rogers for pointing out a solution, and Dustin Mitchell for
testing against a 2.6.30 kernel.
V4L/DVB (12698): em28xx: ir-kbd-i2c init data needs a persistent object
ir-kbd-i2c's ir_probe() function can be called much later (i.e. at
ir-kbd-i2c module load), than the lifetime of a struct IR_i2c_init_data
allocated off of the stack in cx18_i2c_new_ir() at registration time.
Make sure we pass a pointer to a persistent IR_i2c_init_data object at
i2c registration time.
Thanks to Brian Rogers, Dustin Mitchell, Andy Walls and Jean Delvare to
rise this question.
Before this patch, if ir-kbd-i2c were probed after em28xx, trash data
were used. After the patch, no matter what order, it is properly
reported as tested by me:
input: i2c IR (i2c IR (EM2840 Hauppaug as /class/input/input10
ir-kbd-i2c: i2c IR (i2c IR (EM2840 Hauppaug detected at i2c-4/4-0030/ir0 [em28xx #0]
Joe Perches [Wed, 2 Sep 2009 04:12:13 +0000 (01:12 -0300)]
V4L/DVB (12703): gspca - sn9c20x: Reduces size of object
Use s16 instead of int where possible.
Use struct instead of arrays
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Brian Johnson <brijohn@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Janne Grunau [Tue, 1 Sep 2009 22:23:09 +0000 (19:23 -0300)]
V4L/DVB (12685): dvb-core: check fe->ops.set_frontend return value
Various frontend driver have parameter checks in their set_frontend
functions and return an error if the parameters are not supported,
tda10021 and cx24116 to name two.
The tuning ioctls FE_SET_FRONTEND/FE_SET_PROPERTY only change values
in the property cache and return before set_frontend is called. If a
set_frontend call in software zigzag algorithm fails and the card was
previously locked it will report a lock and the new parameters but is
still tuned to the old transport. This is not detectable from
userspace.
This change checks the return values of fe->ops.set_frontend and
changes the state to the added FESTATE_ERROR for software zigzag.
No lock will be reported to userspace if the State is FESTATE_ERROR.
Signed-off-by: Janne Grunau <j@jannau.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Janne Grunau [Tue, 1 Sep 2009 22:15:39 +0000 (19:15 -0300)]
V4L/DVB (12684): DVB: make DVB_MAX_ADAPTERS configurable
Support for more than 8 DVB devices is requested regularly so make it
a kconfig variable instead of a header define. Values in the range 4-32
are tested.
Signed-off-by: Janne Grunau <j@jannau.net> Signed-off-by: Michael Krufky <mkrufky@kernellabs.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Hans de Goede [Mon, 31 Aug 2009 14:32:46 +0000 (11:32 -0300)]
V4L/DVB (12625): Add new V4L2_FMT_FLAG_EMULATED flag to videodev2.h
V4L2_FMT_FLAG_EMULATED 0x0002 This format is not native to the device but
emulated through software (usually libv4l2), where possible try to use a
native format instead for better performance.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Hans de Goede [Fri, 14 Aug 2009 13:40:26 +0000 (10:40 -0300)]
V4L/DVB (12621): gspca_mr97310a: Move detection of CIF sensor type to probe() function
gspca_mr97310a: Move detection of CIF sensor type to probe() function,
so that the right controls are set to disabled from the start, rather then
having them disappear all of a sudden when the stream is started.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>