If we share the tag map between two or more queues, then we cannot
use __set_bit() to set the bit. In fact we need to make sure we
atomically acquire this tag, so loop using test_and_set_bit() to
protect from that.
David Howells [Tue, 29 Aug 2006 18:06:31 +0000 (19:06 +0100)]
[PATCH] BLOCK: Make USB storage depend on SCSI rather than selecting it [try #6]
This makes CONFIG_USB_STORAGE depend on CONFIG_SCSI rather than selecting it,
as selecting it makes CONFIG_USB_STORAGE override the dependencies of SCSI,
causing it to turn on even if they aren't all met.
Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Sat, 30 Sep 2006 18:45:40 +0000 (20:45 +0200)]
[PATCH] BLOCK: Make it possible to disable the block layer [try #6]
Make it possible to disable the block layer. Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.
This patch does the following:
(*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
support.
(*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
an item that uses the block layer. This includes:
(*) Block I/O tracing.
(*) Disk partition code.
(*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
(*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
block layer to do scheduling. Some drivers that use SCSI facilities -
such as USB storage - end up disabled indirectly from this.
(*) Various block-based device drivers, such as IDE and the old CDROM
drivers.
(*) MTD blockdev handling and FTL.
(*) JFFS - which uses set_bdev_super(), something it could avoid doing by
taking a leaf out of JFFS2's book.
(*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
however, still used in places, and so is still available.
(*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
parts of linux/fs.h.
(*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
(*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
(*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
is not enabled.
(*) fs/no-block.c is created to hold out-of-line stubs and things that are
required when CONFIG_BLOCK is not set:
(*) Default blockdev file operations (to give error ENODEV on opening).
(*) Makes some /proc changes:
(*) /proc/devices does not list any blockdevs.
(*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
(*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
(*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
given command other than Q_SYNC or if a special device is specified.
(*) In init/do_mounts.c, no reference is made to the blockdev routines if
CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.
(*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
error ENOSYS by way of cond_syscall if so).
(*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
CONFIG_BLOCK is not set, since they can't then happen.
Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Tue, 29 Aug 2006 18:06:18 +0000 (19:06 +0100)]
[PATCH] BLOCK: Move the ReiserFS device ioctl compat stuff to the ReiserFS driver [try #6]
Move the ReiserFS device ioctl compat stuff from fs/compat_ioctl.c to the
ReiserFS driver so that the ReiserFS header file doesn't need to be included.
Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Tue, 29 Aug 2006 18:06:09 +0000 (19:06 +0100)]
[PATCH] BLOCK: Dissociate generic_writepages() from mpage stuff [try #6]
Dissociate the generic_writepages() function from the mpage stuff, moving its
declaration to linux/mm.h and actually emitting a full implementation into
mm/page-writeback.c.
The implementation is a partial duplicate of mpage_writepages() with all BIO
references removed.
It is used by NFS to do writeback.
Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Oleg Nesterov [Tue, 29 Aug 2006 07:15:14 +0000 (09:15 +0200)]
[PATCH] exit_io_context: don't disable irqs
We don't need to disable irqs to clear current->io_context, it is protected
by ->alloc_lock. Even IF it was possible to submit I/O from IRQ on behalf of
current this irq_disable() can't help: current_io_context() will re-instantiate
->io_context after irq_enable().
We don't need task_lock() or local_irq_disable() to clear ioc->task. This can't
prevent other CPUs from playing with our io_context anyway.
[PATCH] ll_rw_blk: allow more flexibility for read_ahead_kb store
It can make sense to set read-ahead larger than a single request.
We should not be enforcing such policy on the user. Additionally,
using the BLKRASET ioctl doesn't impose such a restriction. So
additionally we now expose identical behaviour through the two.
CFQ implements this on its own now, but it's really block layer
knowledge. Tells a device queue to start dispatching requests to
the driver, taking care to unplug if needed. Also fixes the issue
where as/cfq will invoke a stopped queue, which we really don't
want.
[PATCH] cfq-iosched: Kill O(N) runtime of cfq_resort_rr_list()
Currently it scales with number of processes in that priority group,
which is potentially not very nice as it's called quite often.
Basically we always need to do tail inserts, except for the case of a
new process. So just mark/detect a queue as such.
[PATCH] cfq-iosched: use new io context counting mechanism
It's ok if the read path is a lot more costly, as long as inc/dec is
really cheap. The inc/dec will happen for each created/freed io context,
while the reading only happens when a disk queue exits.
[PATCH] as-iosched: use new io context counting mechanism
It's ok if the read path is a lot more costly, as long as inc/dec is
really cheap. The inc/dec will happen for each created/freed io context,
while the reading only happens when a disk queue exits.
None of the in-kernel primitives for handling "atomic" counting seem
to be a good fit. We need something that is essentially free for
incrementing/decrementing, while the read side may be more expensive
as we only ever need to do that when a device is removed from the
kernel.
Use a per-cpu variable for maintaining a per-cpu ioc count and define
a reading mechanism that just sums up the values.
Jens Axboe [Tue, 29 Aug 2006 07:05:44 +0000 (09:05 +0200)]
[PATCH] cfq-iosched: kill cfq_exit_lock
cfq_exit_lock is protecting two things now:
- The per-ioc rbtree of cfq_io_contexts
- The per-cfqd linked list of cfq_io_contexts
The per-cfqd linked list can be protected by the queue lock, as it is (by
definition) per cfqd as the queue lock is.
The per-ioc rbtree is mainly used and updated by the process itself only.
The only outside use is the io priority changing. If we move the
priority changing to not browsing the rbtree, we can remove any locking
from the rbtree updates and lookup completely. Let the sys_ioprio syscall
just mark processes as having the iopriority changed and lazily update
the private cfq io contexts the next time io is queued, and we can
remove this locking as well.
[PATCH] cfq-iosched: cleanups, fixes, dead code removal
A collection of little fixes and cleanups:
- We don't use the 'queued' sysfs exported attribute, since the
may_queue() logic was rewritten. So kill it.
- Remove dead defines.
- cfq_set_active_queue() can be rewritten cleaner with else if conditions.
- Several places had cfq_exit_cfqq() like logic, abstract that out and
use that.
- Annotate the cfqq kmem_cache_alloc() so the allocator knows that this
is a repeat allocation if it fails with __GFP_WAIT set. Allows the
allocator to start freeing some memory, if needed. CFQ already loops for
this condition, so might as well pass the hint down.
- Remove cfqd->rq_starved logic. It's not needed anymore after we dropped
the crq allocation in cfq_set_request().
After Christophs SCSI change, the only usage left is RQ_ACTIVE
and RQ_INACTIVE. The block layer sets RQ_INACTIVE right before freeing
the request, so any check for RQ_INACTIVE in a driver is a bug and
indicates use-after-free.
So kill/clean the remaining users, straight forward.
Jens Axboe [Thu, 10 Aug 2006 06:59:11 +0000 (08:59 +0200)]
[PATCH] Remove struct request_list from struct request
It is always identical to &q->rq, and we only use it for detecting
whether this request came out of our mempool or not. So replace it
with an additional ->flags bit flag.
[PATCH] Remove ->waiting member from struct request
As the comments indicates in blkdev.h, we can fold it into ->end_io_data
usage as that is really what ->waiting is. Fixup the users of
blk_end_sync_rq().
Get rid of the as_rq request type. With the added elevator_private2, we
have enough room in struct request to get rid of any arq allocation/free
for each request.
Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de>
Get rid of the cfq_rq request type. With the added elevator_private2, we
have enough room in struct request to get rid of any crq allocation/free
for each request.
[PATCH] deadline-iosched: remove elevator private drq request type
A big win, we now save an allocation/free on each request! With the
previous rb/hash abstractions, we can just reuse queuelist/donelist
for the FIFO data and be done with it.
[PATCH] elevator: abstract out the rbtree sort handling
The rbtree sort/lookup/reposition logic is mostly duplicated in
cfq/deadline/as, so move it to the elevator core. The io schedulers
still provide the actual rb root, as we don't want to impose any sort
of specific handling on the schedulers.
Introduce the helpers and rb_node in struct request to help migrate the
IO schedulers.
[PATCH] elevator: move the backmerging logic into the elevator core
Right now, every IO scheduler implements its own backmerging (except for
noop, which does no merging). That results in duplicated code for
essentially the same operation, which is never a good thing. This patch
moves the backmerging out of the io schedulers and into the elevator
core. We save 1.6kb of text and as a bonus get backmerging for noop as
well. Win-win!
Jens Axboe [Thu, 10 Aug 2006 06:44:47 +0000 (08:44 +0200)]
[PATCH] Split struct request ->flags into two parts
Right now ->flags is a bit of a mess: some are request types, and
others are just modifiers. Clean this up by splitting it into
->cmd_type and ->cmd_flags. This allows introduction of generic
Linux block message types, useful for sending generic Linux commands
to block devices.
Jean Delvare [Sat, 30 Sep 2006 15:18:59 +0000 (17:18 +0200)]
[PATCH] i2c: Prevent deadlock on i2c client registration
Delay the call to adapter->client_register() until after we are
certain that the client registration is a success. At this point the
client is fully initialized and we no longer hold the adapter->clist
mutex, so this should prevent the deadlocks if the client_register()
callback needs to take that mutex too, as is the case for the bttv
driver.
This fixes bug #7234.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (180 commits)
V4L/DVB (4641): Trivial: use lowercase letters in hex subsystem ids
V4L/DVB (4639): Cx88: add autodetection for alternate revision of Leadtek PVR
V4L/DVB (4638): Basic DVB-T and analog TV support for the HVR1300.
V4L/DVB (4637): Add a default method for VIDIOC_G_PARM
V4L/DVB (4635): Extend bttv and saa7134 to check for both AGP and PCI PCI failure case
V4L/DVB (4634): Zr36120: implement pcipci checks
V4L/DVB (4632): Zoran: Implement pcipci failure check
V4L/DVB (4631): Av7110: remove V4L2_CAP_VBI_CAPTURE flag
V4L/DVB (4630): Av7110: FW_LOADER depemdency fixed
V4L/DVB (4629): Saa7134: add card support for Proteus Pro 2309
V4L/DVB (4628): Fix VIDIOC_ENUMSTD ioctl in videodev.c
V4L/DVB (4627): Vivi crashes with mplayer
V4L/DVB (4626): On saa7111/7113, LUMA_CTRL need a different value
V4L/DVB (4624): Tvaudio: Replaced kernel_thread() with kthread_run()
V4L/DVB (4622): Copy-paste bug in videodev.c
V4L/DVB (4620): Fix AGC configuration for MOD3000P-based boards
V4L/DVB (4619): Fixes some I2C dependencies on V4L devices
V4L/DVB (4617): Problem with dibusb-mb.c USB IDs
V4L/DVB (4616): [PATCH] Nebula DigiTV USB RC support
V4L/DVB (4614): Export symbol saa7134_tvaudio_setmute from saa7134 for saa7134-alsa
...
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (48 commits)
ieee1394: raw1394: arm functions slept in atomic context
ieee1394: sbp2: enable auto spin-up for all SBP-2 devices
MAINTAINERS: updates to IEEE 1394 subsystem maintainership
ieee1394: ohci1394: check for errors in suspend or resume
set power state of firewire host during suspend
ieee1394: ohci1394: more obvious endianess handling
ieee1394: ohci1394: fix endianess bug in debug message
ieee1394: sbp2: don't prefer MODE SENSE 10
ieee1394: nodemgr: grab class.subsys.rwsem in nodemgr_resume_ne
ieee1394: nodemgr: fix rwsem recursion
ieee1394: sbp2: more help in Kconfig
ieee1394: sbp2: prevent rare deadlock in shutdown
ieee1394: sbp2: update includes
ieee1394: sbp2: better handling of transport errors
ieee1394: sbp2: recheck node generation in sbp2_update
ieee1394: sbp2: safer agent reset in error handlers
ieee1394: sbp2: handle "sbp2util_node_write_no_wait failed"
CONFIG_PM=n slim: drivers/ieee1394/ohci1394.c
ieee1394: safer definition of empty macros
video1394: add poll file operation support
...
Merge branch 'intelfb-patches' of master.kernel.org:/pub/scm/linux/kernel/git/airlied/intelfb-2.6
* 'intelfb-patches' of master.kernel.org:/pub/scm/linux/kernel/git/airlied/intelfb-2.6:
intelfbhw.c: intelfbhw_get_p1p2 defined but not used
intelfb: fix mtrr_reg signedness
intelfb: update doc and Kconfig (supported devices)
intelfb: add preliminary i2c support
intelfb: add preliminary i2c support
intelfb: add preliminary i2c support
intelfb: add preliminary i2c support
intelfb: add preliminary i2c support
intelfb: add preliminary i2c support
intelfb: add preliminary i2c support
intelfb: add preliminary i2c support
intelfb: add vsync interrupt support
intelfb: add vsync interrupt support
intelfb: add vsync interrupt support
intelfb: add vsync interrupt support
intelfb: add vsync interrupt support
Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6:
[PATCH] Use early clobber in semaphores
[PATCH] Define vsyscall cache as blob to make clearer that user space shouldn't use it
[PATCH] Re-positioning the bss segment
[PATCH] Use ARRAY_SIZE in setup.c
[PATCH] i386: replace intermediate array-size definitions with ARRAY_SIZE()
[PATCH] x86: Clean up x86 NMI sysctls
[PATCH] Refactor some duplicated code in mpparse.c
[PATCH] Document iommu=panic
[PATCH] Fix broken indentation in iommu_setup
[PATCH] Allow disabling DAC using command line options
[PATCH] Add proper sparse __user casts to __copy_to_user_inatomic
[PATCH] i386: Update defconfig
[PATCH] Update defconfig
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[ATM]: [lec] use refcnt to protect lec_arp_entries outside lock
[ATM]: [lec] add reference counting to lec_arp entries
[ATM]: [lec] use work queue instead of timer for lec arp expiry
[ATM]: [lec] old_close is no longer used
[ATM]: [lec] convert lec_arp_table to hlist
[ATM]: [lec] header indent, comment and whitespace cleanup
[ATM]: [lec] indent, comment and whitespace cleanup [continued]
[ATM]: [lec] indent, comment and whitespace cleanup
[SCTP]: Do not timestamp every SCTP packet.
[SCTP]: Use correct mask when disabling PMTUD.
[SCTP]: Include sk_buff overhead while updating the peer's receive window.
[SCTP]: Enable Nagle algorithm by default.
[BNX2]: Disable MSI on 5706 if AMD 8132 bridge is present.
[NetLabel]: audit fixups due to delayed feedback
We only need the timestamp on COOKIE-ECHO chunks, so instead of always
timestamping every SCTP packet, let common code timestamp if the socket
option is set. For COOKIE-ECHO, simply get the time of day if we don't
have a timestamp. This introduces a small possibility that the cookie
may be considered expired, but it will be renegotiated.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[SCTP]: Include sk_buff overhead while updating the peer's receive window.
Currently if the sender is sending small messages, it can cause a receiver
to run out of receive buffer space even when the advertised receive window
is still open and results in packet drops and retransmissions. Including
a overhead while updating the sender's view of peer receive window will
reduce the chances of receive buffer space overshooting the receive window.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sat, 30 Sep 2006 00:06:23 +0000 (17:06 -0700)]
[BNX2]: Disable MSI on 5706 if AMD 8132 bridge is present.
MSI is defined to be 32-bit write. The 5706 does 64-bit MSI writes
with byte enables disabled on the unused 32-bit word. This is legal
but causes problems on the AMD 8132 which will eventually stop
responding after a while.
Without this patch, the MSI test done by the driver during open will
pass, but MSI will eventually stop working after a few MSIs are
written by the device.
AMD believes this incompatibility is unique to the 5706, and
prefers to locally disable MSI rather than globally disabling it
using pci_msi_quirk.
Update version to 1.4.45.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Moore [Sat, 30 Sep 2006 00:05:05 +0000 (17:05 -0700)]
[NetLabel]: audit fixups due to delayed feedback
Fix some issues Steve Grubb had with the way NetLabel was using the audit
subsystem. This should make NetLabel more consistent with other kernel
generated audit messages specifying configuration changes.
Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[AK: This apparently broke some systems, but we need it to fix
a compile problem with old binutils and in theory the patch
is correct. So let's trying reenabling it again.]
o Currently bss segment is being placed somewhere in the middle (after .data)
section and after bss lots of init section and data sections are coming.
Is it intentional?
o One side affect of placing bss in the middle is that objcopy keeps the
bss in raw binary image (vmlinux.bin) hence unnecessarily increasing
the size of raw binary image. (In my case ~600K). It also increases
the size of generated bzImage, though the increase is very small
(896 bytes), probably a very high compression ratio for stream
of zeros.
o This patch moves the bss at the end hence reducing the size of
bzImage by 896 bytes and size of vmlinux.bin by 600K.
o This change benefits in the context of relocatable kernel patches. If
kernel bss is not part of compressed data (vmlinux.bin) then it does
not have to be decompressed and this area can be used by the decompressor
for its execution hence keeping the memory requirements bounded and
decompressor code does not stomp over any other data loaded beyond
kernel image (As might be the case with bootloaders like kexec).