git.karo-electronics.de Git - mv-sheeva.git/log

block: drop request->hard_* and *nr_sectors

struct request has had a few different ways to represent some
properties of a request.  ->hard_* represent block layer's view of the
request progress (completion cursor) and the ones without the prefix
are supposed to represent the issue cursor and allowed to be updated
as necessary by the low level drivers.  The thing is that as block
layer supports partial completion, the two cursors really aren't
necessary and only cause confusion.  In addition, manual management of
request detail from low level drivers is cumbersome and error-prone at
the very least.

Another interesting duplicate fields are rq->[hard_]nr_sectors and
rq->{hard_cur|current}_nr_sectors against rq->data_len and
rq->bio->bi_size.  This is more convoluted than the hard_ case.

rq->[hard_]nr_sectors are initialized for requests with bio but
blk_rq_bytes() uses it only for !pc requests.  rq->data_len is
initialized for all request but blk_rq_bytes() uses it only for pc
requests.  This causes good amount of confusion throughout block layer
and its drivers and determining the request length has been a bit of
black magic which may or may not work depending on circumstances and
what the specific LLD is actually doing.

rq->{hard_cur|current}_nr_sectors represent the number of sectors in
the contiguous data area at the front.  This is mainly used by drivers
which transfers data by walking request segment-by-segment.  This
value always equals rq->bio->bi_size >> 9.  However, data length for
pc requests may not be multiple of 512 bytes and using this field
becomes a bit confusing.

In general, having multiple fields to represent the same property
leads only to confusion and subtle bugs.  With recent block low level
driver cleanups, no driver is accessing or manipulating these
duplicate fields directly.  Drop all the duplicates.  Now rq->sector
means the current sector, rq->data_len the current total length and
rq->bio->bi_size the current segment length.  Everything else is
defined in terms of these three and available only through accessors.

* blk_recalc_rq_sectors() is collapsed into blk_update_request() and
  now handles pc and fs requests equally other than rq->sector update.
  This means that now pc requests can use partial completion too (no
  in-kernel user yet tho).

* bio_cur_sectors() is replaced with bio_cur_bytes() as block layer
  now uses byte count as the primary data length.

* blk_rq_pos() is now guranteed to be always correct.  In-block users
  converted.

* blk_rq_bytes() is now guaranteed to be always valid as is
  blk_rq_sectors().  In-block users converted.

* blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9.
  More convenient one is used.

* blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const
  pointer to request.

[ Impact: API cleanup, single way to represent one property of a request ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ide: convert to rq pos and nr_sectors accessors

ide doesn't manipulate request fields anymore and thus all hard and
their soft equivalents are always equal. Convert all references to
accessors.

[ Impact: use pos and nr_sectors accessors ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: convert to pos and nr_sectors accessors

With recent cleanups, there is no place where low level driver
directly manipulates request fields. This means that the 'hard'
request fields always equal the !hard fields. Convert all
rq->sectors, nr_sectors and current_nr_sectors references to
accessors.

While at it, drop superflous blk_rq_pos() < 0 test in swim.c.

[ Impact: use pos and nr_sectors accessors ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Tested-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Acked-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Acked-by: Mike Miller <mike.miller@hp.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Dario Ballabio <ballabio_dario@emc.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: unsik Kim <donari75@gmail.com>
Cc: Laurent Vivier <Laurent@lvivier.info>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: implement blk_rq_pos/[cur_]sectors() and convert obvious ones

Implement accessors - blk_rq_pos(), blk_rq_sectors() and
blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors
and rq->hard_cur_sectors respectively and convert direct references of
the said fields to the accessors.

This is in preparation of request data length handling cleanup.

Geert : suggested adding const to struct request * parameter to accessors
Sergei : spotted error in patch description

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Ackec-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: add rq->resid_len

rq->data_len served two purposes - the length of data buffer on issue
and the residual count on completion.  This duality creates some
headaches.

First of all, block layer and low level drivers can't really determine
what rq->data_len contains while a request is executing.  It could be
the total request length or it coulde be anything else one of the
lower layers is using to keep track of residual count.  This
complicates things because blk_rq_bytes() and thus
[__]blk_end_request_all() relies on rq->data_len for PC commands.
Drivers which want to report residual count should first cache the
total request length, update rq->data_len and then complete the
request with the cached data length.

Secondly, it makes requests default to reporting full residual count,
ie. reporting that no data transfer occurred.  The residual count is
an exception not the norm; however, the driver should clear
rq->data_len to zero to signify the normal cases while leaving it
alone means no data transfer occurred at all.  This reverse default
behavior complicates code unnecessarily and renders block PC on some
drivers (ide-tape/floppy) unuseable.

This patch adds rq->resid_len which is used only for residual count.

While at it, remove now unnecessasry blk_rq_bytes() caching in
ide_pc_intr() as rq->data_len is not changed anymore.

Boaz : spotted missing conversion in osd
Sergei : spotted too early conversion to blk_rq_bytes() in ide-tape

[ Impact: cleanup residual count handling, report 0 resid by default ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: Darrick J. Wong <djwong@us.ibm.com>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ide-tape: don't initialize rq->sector for rw requests

rq->sector is set to the tape->first_frame but it's never actually
used and not even in the correct unit (512 byte sectors). Don't set
it.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Borislav Petkov <petkovbb@gmail.com>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

nbd: don't clear rq->sector and nr_sectors unnecessarily

There's no reason to clear rq->sector and nr_sectors after calling
blk_rq_init(). They're guaranteed to be clear. Drop unnecessary
clearing.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

mg_disk: use defines from <linux/ata.h>

While at it:
- remove MG_REG_HEAD_MUST_BE_ON define
- remove MG_REG_CTRL_INTR_ENABLE define
- remove MG_REG_HEAD_LBA_MODE define
- remove unused defines

Cc: unsik Kim <donari75@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

mg_disk: fix dependency on libata

Add local copies of ata_id_string() and ata_id_c_string() to mg_disk
so there is no need for the driver to depend on ATA and SCSI.

[ Impact: break dependency on libata by copying ata id string functions ]

Cc: unsik Kim <donari75@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

mg_disk: clean up request completion paths

mg_disk implements its own partial completion. Convert to standard
block layer partial completion.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: unsik Kim <donari75@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

mg_disk: fold mg_disk.h into mg_disk.c

include/linux/mg_disk.h is used only by drivers/block/mg_disk.c. No
reason to put it in a separate header. Fold it into mg_disk.c.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: unsik Kim <donari75@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

swim: clean up request completion paths

swim curiously tries to update request parameters before calling
__blk_end_request() when __blk_end_request() will do it anyway and
unnecessarily checks whether current_nr_sectors is zero right after
fetching.

Drop unnecessary stuff and use standard block layer mechanisms.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Laurent Vivier <Laurent@lvivier.info>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

swim3: clean up request completion paths

swim3 curiously tries to update request parameters before calling
__blk_end_request() when __blk_end_request() will do it anyway, and it
updates request for partial completion manually instead of using
blk_update_request(). Also, it does some spurious checks on rq such
as testing whether rq->sector is negative or current_nr_sectors is
zero right after fetching.

Drop unnecessary stuff and use standard block layer mechanisms.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

hd: clean up request completion paths

hd read/write_intr() functions manually manipulate request to
incrementally complete it, which block layer already supports. Simply
use block layer completion routines instead of manual partial
completion.

While at it, clear unnecessary elv_next_request() check at the tail of
read_intr(). This also makes read and write_intr() more consistent.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ubd: drop unnecessary rq->sector manipulation

ubd curiously updates rq->sector while issuing the request in multiple
pieces. Don't do it and simply use local copy of sector.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ubd: cleanup completion path

ubd had its own block request partial completion mechanism, which is
unnecessary as block layer already does it. Kill ubd_end_request()
and ubd_finish() and replace them with direct call to
blk_end_request().

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

sunvdc: kill vdc_end_request()

vdc_end_request() is a thin silly wrapper on top of
__blk_end_request(). Kill it.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ps3disk: simplify request completion

ps3disk_interrupt() always completes requests fully but it uses
rq->hard_cur_sectors for FLUSH requests for some reason. Drop them
and simply use __blk_end_request_all().

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

amiflop,ataflop,xd,mg_disk: clean up unnecessary stuff from block drivers

rq_data_dir() can only be READ or WRITE and rq->sector and nr_sectors
are always automatically updated after partial request completion.
Don't worry about rq_data_dir() not being either READ or WRITE or
manually update sector and nr_sectors.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jörg Dorchain <joerg@dorchain.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: unsik Kim <donari75@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: don't init rq fields unnecessarily

blk_get_request() always returns properly zeroed requests. Don't set
fields to zero/NULL unnecessarily.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: make blk_end_request_cur() return bool

In the process of mindlessly copying [__]blk_end_request_all(),
[__]blk_end_request_cur() ended up returning void even though they're
partial completion functions. Fix it.

[ Impact: fix braindead API ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: catch trying to use more bits than request->cmd_flags has

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: include discard requests in IO accounting

We currently don't do merging on discard requests, but we potentially
could. If we do, then we need to include discard requests in the IO
accounting, or merging would end up decrementing in_flight IO counters
for an IO which never incremented them.

So enable accounting for discard requests.

Problem found by Nikanth Karthikesan <knikanth@suse.de>

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: make blk_do_io_stat() do the full "is this rq accountable" checks

We currently check for file system requests outside of blk_do_io_stat(rq),
but we may as well just include it.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: kill rq->data

Now that all block request data transfer is done via bio, rq->data
isn't used. Kill it.

While at it, make the roles of rq->special and buffer clear.

[ Impact: drop now unncessary field from struct request ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Boaz Harrosh <bharrosh@panasas.com>

arm-omap: don't abuse rq->data

omap mailbox uses rq->data as the second opaque pointer to carry
mbox_msg_t and rq->special message argument which is needed only for
tx. Add and use omap_msg_tx_data struct for tx and use rq->special
for mbox_msg_t for rx such that only rq->special is used as opaque
pointer.

[ Impact: cleanup rq->data usage, extra kmalloc in msg_send ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>

block: replace end_request() with [__]blk_end_request_cur()

end_request() has been kept around for backward compatibility;
however, it's about time for it to go away.

* There aren't too many users left.

* Its use of @updtodate is pretty confusing.

* In some cases, newer code ends up using mixture of end_request() and
  [__]blk_end_request[_all](), which is way too confusing.

So, add [__]blk_end_request_cur() and replace end_request() with it.
Most conversions are straightforward.  Noteworthy ones are...

* paride/pcd: next_request() updated to take 0/-errno instead of 1/0.

* paride/pf: pf_end_request() and next_request() updated to take
  0/-errno instead of 1/0.

* xd: xd_readwrite() updated to return 0/-errno instead of 1/0.

* mtd/mtd_blkdevs: blktrans_discard_request() updated to return
  0/-errno instead of 1/0.  Unnecessary local variable res
  initialization removed from mtd_blktrans_thread().

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Joerg Dorchain <joerg@dorchain.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Laurent Vivier <Laurent@lvivier.info>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: unsik Kim <donari75@gmail.com>

block: implement and use [__]blk_end_request_all()

There are many [__]blk_end_request() call sites which call it with
full request length and expect full completion.  Many of them ensure
that the request actually completes by doing BUG_ON() the return
value, which is awkward and error-prone.

This patch adds [__]blk_end_request_all() which takes @rq and @error
and fully completes the request.  BUG_ON() is added to to ensure that
this actually happens.

Most conversions are simple but there are a few noteworthy ones.

* cdrom/viocd: viocd_end_request() replaced with direct calls to
  __blk_end_request_all().

* s390/block/dasd: dasd_end_request() replaced with direct calls to
  __blk_end_request_all().

* s390/char/tape_block: tapeblock_end_request() replaced with direct
  calls to blk_end_request_all().

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>

block: move rq->start_time initialization to blk_rq_init()

rq->start_time was initialized in init_request_from_bio() so special
requests didn't have start_time set.  This has been okay as start_time
has been used only for fs requests; however, there is no indication of
this actually is the case or not.  Set rq->start_time in blk_rq_init()
and guarantee that all initialized rq's have its start_time set.  This
improves consistency at virtually no cost and future changes will make
use of the timestamp for !bio requests.

[ Impact: rq->start_time is valid for all requests ]

Signed-off-by: Tejun Heo <tj@kernel.org>

block: clean up request completion API

Request completion has gone through several changes and became a bit
messy over the time.  Clean it up.

1. end_that_request_data() is a thin wrapper around
   end_that_request_data_first() which checks whether bio is NULL
   before doing anything and handles bidi completion.
   blk_update_request() is a thin wrapper around
   end_that_request_data() which clears nr_sectors on the last
   iteration but doesn't use the bidi completion.

   Clean it up by moving the initial bio NULL check and nr_sectors
   clearing on the last iteration into end_that_request_data() and
   renaming it to blk_update_request(), which makes blk_end_io() the
   only user of end_that_request_data().  Collapse
   end_that_request_data() into blk_end_io().

2. There are four visible completion variants - blk_end_request(),
   __blk_end_request(), blk_end_bidi_request() and end_request().
   blk_end_request() and blk_end_bidi_request() uses blk_end_request()
   as the backend but __blk_end_request() and end_request() use
   separate implementation in __blk_end_request() due to different
   locking rules.

   blk_end_bidi_request() is identical to blk_end_io().  Collapse
   blk_end_io() into blk_end_bidi_request(), separate out request
   update into internal helper blk_update_bidi_request() and add
   __blk_end_bidi_request().  Redefine [__]blk_end_request() as thin
   inline wrappers around [__]blk_end_bidi_request().

3. As the whole request issue/completion usages are about to be
   modified and audited, it's a good chance to convert completion
   functions return bool which better indicates the intended meaning
   of return values.

4. The function name end_that_request_last() is from the days when it
   was a public interface and slighly confusing.  Give it a proper
   internal name - blk_finish_request().

5. Add description explaning that blk_end_bidi_request() can be safely
   used for uni requests as suggested by Boaz Harrosh.

The only visible behavior change is from #1.  nr_sectors counts are
cleared after the final iteration no matter which function is used to
complete the request.  I couldn't find any place where the code
assumes those nr_sectors counters contain the values for the last
segment and this change is good as it makes the API much more
consistent as the end result is now same whether a request is
completed using [__]blk_end_request() alone or in combination with
blk_update_request().

API further cleaned up per Christoph's suggestion.

[ Impact: cleanup, rq->*nr_sectors always updated after req completion ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Christoph Hellwig <hch@infradead.org>

block: kill blk_end_request_callback()

With recent IDE updates, blk_end_request_callback() doesn't have any
user now. Kill it.

[ Impact: removal of unused convoluted interface ]

Signed-off-by: Tejun Heo <tj@kernel.org>

block: reorganize request fetching functions

Impact: code reorganization

elv_next_request() and elv_dequeue_request() are public block layer
interface than actual elevator implementation. They mostly deal with
how requests interact with block layer and low level drivers at the
beginning of rqeuest processing whereas __elv_next_request() is the
actual eleveator request fetching interface.

Move the two functions to blk-core.c. This prepares for further
interface cleanup.

Signed-off-by: Tejun Heo <tj@kernel.org>

block: reorder request completion functions

Reorder request completion functions such that

* All request completion functions are located together.

* Functions which are used by only one caller is put right above the
caller.

* end_request() is put after other completion functions but before
blk_update_request().

This change is for completion function cleanup which will follow.

[ Impact: cleanup, code reorganization ]

Signed-off-by: Tejun Heo <tj@kernel.org>

block: clean up misc stuff after block layer timeout conversion

* In blk_rq_timed_out_timer(), else { if } to else if

* In blk_add_timer(), simplify if/else block

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>

block: cleanup REQ_SOFTBARRIER usages

blk_insert_request() doesn't need to worry about REQ_SOFTBARRIER.
Don't set it. Combined with recent ide updates, REQ_SOFTBARRIER is
now only used in elevator proper and for discard requests.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>

block: don't set REQ_NOMERGE unnecessarily

RQ_NOMERGE_FLAGS already clears defines which REQ flags aren't
mergeable.  There is no reason to specify it superflously.  It only
adds to confusion.  Don't set REQ_NOMERGE for barriers and requests
with specific queueing directive.  REQ_NOMERGE is now exclusively used
by the merging code.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>

block: kill blk_start_queueing()

blk_start_queueing() is identical to __blk_run_queue() except that it
doesn't check for recursion. None of the current users depends on
blk_start_queueing() running request_fn directly. Replace usages of
blk_start_queueing() with [__]blk_run_queue() and kill it.

[ Impact: removal of mostly duplicate interface function ]

Signed-off-by: Tejun Heo <tj@kernel.org>

block: merge blk_invoke_request_fn() into __blk_run_queue()

__blk_run_queue wraps blk_invoke_request_fn() such that it
additionally removes plug and bails out early if the queue is empty.
Both extra operations have their own pending mechanisms and don't
cause any harm correctness-wise when they are done superflously.

The only user of blk_invoke_request_fn() being blk_start_queue(),
there isn't much reason to keep both functions around. Merge
blk_invoke_request_fn() into __blk_run_queue() and make
blk_start_queue() use __blk_run_queue() instead.

[ Impact: merge two subtly different internal functions ]

Signed-off-by: Tejun Heo <tj@kernel.org>

block: implement blkdev_readpages

Doing a proper block dev ->readpages() speeds up the crazy dump(8)
approach of using interleaved process IO.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: enable by default support for large devices and files on 32-bit archs

Enable by default support for large devices and files (CONFIG_LBD):

- With 1TB disks being a commodity hardware it is quite easy to hit 2TB
  limitation while building RAIDs etc. and many distros have been using
  CONFIG_LBD=y by default already (at least Fedora 10 and openSUSE 11.1).

- This should also prevent a subtle ext4 filesystem compatibility issue:
  mke2fs.ext4 defaults to creating filesystems with huge_files feature
  enabled and such filesystems cannot be later mounted read-write on
  machines with CONFIG_LBD=n (it should be quite easy to hit this issue
  when trying to use filesystem created using distro kernel on system
  running the self-build kernel, think about USB disk enclosures & co.).

While at it:

- Clarify config option help text w.r.t. mounting ext4 filesystems
  (they can be mounted with CONFIG_LBD=n but in the read-only mode).

Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ide-dma: don't reset request fields on dma_timeout_retry()

Impact: drop unnecessary code

Now that everything uses bio and block operations, there is no need to
reset request fields manually when retrying a request. Every field is
guaranteed to be always valid. Drop unnecessary request field
resetting from ide_dma_timeout_retry().

Signed-off-by: Tejun Heo <tj@kernel.org>

ide: drop rq->data handling from ide_map_sg()

Impact: remove code path which is no longer necessary

All IDE data transfers now use rq->bio. Simplify ide_map_sg()
accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>

ide-atapi: kill unused fields and callbacks

Impact: remove fields and code paths which are no longer necessary

Now that ide-tape uses standard mechanisms to transfer data, special
case handling for bh handling can be dropped from ide-atapi. Drop the
followings.

* pc->cur_pos, b_count, bh and b_data
* drive->pc_update_buffers() and pc_io_buffers().

Signed-off-by: Tejun Heo <tj@kernel.org>

ide-tape: simplify read/write functions

Impact: cleanup

idetape_chrdev_read/write() functions are unnecessarily complex when
everything can be handled in a single loop. Collapse
idetape_add_chrdev_read/write_request() into the rw functions and
simplify the implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>

ide-tape: use byte size instead of sectors on rw issue functions

Impact: cleanup

Byte size is what most issue functions deal with, make
idetape_queue_rw_tail() and its wrappers take byte size instead of
sector counts. idetape_chrdev_read() and write() functions are
converted to use tape->buffer_size instead of ctl from tape->cap.

This cleans up code a little bit and will ease the next r/w
reimplementation.

Signed-off-by: Tejun Heo <tj@kernel.org>

ide-tape: unify r/w init paths

Impact: cleanup

Read and write init paths are almost identical. Unify them into
idetape_init_rw().

Signed-off-by: Tejun Heo <tj@kernel.org>

ide-tape: kill idetape_bh

Impact: kill now unnecessary idetape_bh

With everything using standard mechanisms, there is no need for
idetape_bh anymore. Kill it and use tape->buf, cur and valid to
describe data buffer instead.

Changes worth mentioning are...

* idetape_queue_rq_tail() now always queue tape->buf and and adjusts
buffer state properly before completion.

* idetape_pad_zeros() clears the buffer only once.

Signed-off-by: Tejun Heo <tj@kernel.org>

ide-tape: use standard data transfer mechanism

Impact: use standard way to transfer data

ide-tape uses rq in an interesting way.  For r/w requests, rq->special
is used to carry a private buffer management structure idetape_bh and
rq->nr_sectors and current_nr_sectors are initialized to the number of
idetape blocks which isn't necessary 512 bytes.  Also,
rq->current_nr_sectors is used to report back the residual count in
units of idetape blocks.

This peculiarity taxes both block layer and ide.  ide-atapi has
different paths and hooks to accomodate it and what a rq means becomes
quite confusing and making changes at the block layer becomes quite
difficult and error-prone.

This patch makes ide-tape use bio instead.  With the previous patch,
ide-tape currently is using single contiguos buffer so replacing it
isn't difficult.  Data buffer is mapped into bio using
blk_rq_map_kern() in idetape_queue_rw_tail().  idetape_io_buffers()
and idetape_update_buffers() are dropped and pc->bh is set to null to
tell ide-atapi to use standard data transfer mechanism and idetape_bh
byte counts are updated by the issuer on completion using the residual
count.

This change also nicely removes the FIXME in ide_pc_intr() where
ide-tape rqs need to be completed using ide_rq_bytes() instead of
blk_rq_bytes() (although this didn't really matter as the request
didn't have bio).

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <jens.axboe@oracle.com>

ide-tape: use single continuous buffer

Impact: simpler buffer allocation and handling, kills OOM, fix DMA transfers

ide-tape has its own multiple buffer mechanism using struct
idetape_bh.  It allocates buffer with decreasing order-of-two
allocations so that it results in minimum number of segments.
However, the implementation is quite complex and works in a way that
no other block or ide driver works necessitating a lot of special case
handling.

The benefit this complex allocation scheme brings is questionable as
PIO or DMA the number of segments (16 maximum) doesn't make any
noticeable difference and it also doesn't negate the need for multiple
order allocation which can fail under memory pressure or high
fragmentation although it does lower the highest order necessary by
one when the buffer size isn't power of two.

As the first step to remove the custom buffer management, this patch
makes ide-tape allocate single continous buffer.  The maximum order is
four.  I doubt the change would cause any trouble but if it ever
matters, it should be converted to regular sg mechanism like everyone
else and even in that case dropping custom buffer handling and moving
to standard mechanism first make sense as an intermediate step.

This patch makes the first bh to contain the whole buffer and drops
multi bh handling code.  Following patches will make further changes.

This patch has the side effect of killing OOM triggered by allocation
path and fixing DMA transfers.  Previously, bug in alloc path
triggered OOM on command issue and commands were passed to DMA engine
without DMA-mapping all the segments.

Signed-off-by: Tejun Heo <tj@kernel.org>

ide-atapi,tape,floppy: allow ->pc_callback() to change rq->data_len

Impact: allow residual count implementation in ->pc_callback()

rq->data_len has two duties - carrying the number of input bytes on
issue and carrying residual count back to the issuer on completion.
ide-atapi completion callback ->pc_callback() is the right place to do
this but currently ide-atapi depends on rq->data_len carrying the
original request size after calling ->pc_callback() to complete the pc
request.

This patch makes ide_pc_intr(), ide_tape_issue_pc() and
ide_floppy_issue_pc() cache length to complete before calling
->pc_callback() so that it can modify rq->data_len as necessary.

Note: As using rq->data_len for two purposes can make cases like this
incorrect in subtle ways, future changes will introduce separate
field for residual count.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <jens.axboe@oracle.com>

ide-tape,floppy: fix failed command completion after request sense

Impact: fix infinite retry loop

After a command failed, ide-tape and floppy inserts REQUEST_SENSE in
front of the failed command and according to the result, sets
pc->retries, flags and errors.  After REQUEST_SENSE is complete, the
failed command is again at the front of the queue and if the verdict
was to terminate the request, the issue functions tries to complete it
directly by calling drive->pc_callback() and returning ide_stopped.

However, drive->pc_callback() doesn't complete a request.  It only
prepares for completion of the request.  As a result, this creates an
infinite loop where the failed request is retried perpetually.

Fix it by actually ending the request by calling ide_complete_rq().

Signed-off-by: Tejun Heo <tj@kernel.org>

ide-pm: don't abuse rq->data

Impact: cleanup rq->data usage

ide-pm uses rq->data to carry pointer to struct request_pm_state
through request queue and rq->special is used to carray pointer to
local struct ide_cmd, which isn't necessary. Use rq->special for
request_pm_state instead and use local ide_cmd in
ide_start_power_step().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>

ide-cd,atapi: use bio for internal commands

Impact: unify request data buffer handling

rq->data is used mostly to pass kernel buffer through request queue
without using bio.  There are only a couple of places which still do
this in kernel and converting to bio isn't difficult.

This patch converts ide-cd and atapi to use bio instead of rq->data
for request sense and internal pc commands.  With previous change to
unify sense request handling, this is relatively easily achieved by
adding blk_rq_map_kern() during sense_rq prep and PC issue.

If blk_rq_map_kern() fails for sense, the error is deferred till sense
issue and aborts the failed command which triggered the sense.  Note
that this is a slim possibility as sense prep is done on each command
issue, so for the above condition to actually trigger, all preps since
the last sense issue till the issue of the request which would require
a sense should fail.

* do_request functions might sleep now.  This should be okay as ide
  request_fn - do_ide_request() - is invoked only from make_request
  and plug work.  Make sure this is the case by adding might_sleep()
  to do_ide_request().

* Functions which access the read sense data before the sense request
  is complete now should access bio_data(sense_rq->bio) as the sense
  buffer might have been copied during blk_rq_map_kern().

* ide-tape updated to map sg.

* cdrom_do_block_pc() now doesn't have to deal with REQ_TYPE_ATA_PC
  special case.  Simplified.

* tp_ops->output/input_data path dropped from ide_pc_intr().

Signed-off-by: Tejun Heo <tj@kernel.org>

ide-atapi: convert ide-{floppy,tape} to using preallocated sense buffer

Since we're issuing REQ_TYPE_SENSE now we need to allow those types of
rqs in the ->do_request callbacks. As a future improvement, sense_len
assignment might be unified across all ATAPI devices. Borislav to
check with specs and test.

As a result, get rid of ide_queue_pc_head() and
drive->request_sense_rq.

tj: * Init request sense ide_atapi_pc from sense request.  In the
      longer timer, it would probably better to fold
      ide_create_request_sense_cmd() into its only current user -
      ide_floppy_get_format_progress().

    * ide_retry_pc() no longer takes @disk.

CC: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
CC: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

ide-cd: convert to using generic sense request

Preallocate a sense request in the ->do_request method and reinitialize
it only on demand, in case it's been consumed in the IRQ handler path.
The reason for this is that we don't want to be mapping rq to bio in
the IRQ path and introduce all kinds of unnecessary hacks to the block
layer.

tj: * Both user and kernel PC requests expect sense data to be stored
      in separate storage other than drive->sense_data.  Copy sense
      data to rq->sense on completion if rq->sense is not NULL.  This
      fixes bogus sense data on PC requests.

As a result, remove cdrom_queue_request_sense.

CC: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
CC: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

ide: add helpers for preparing sense requests

This is in preparation of removing the queueing of a sense request out
of the IRQ handler path.

Use struct request_sense as a general sense buffer for all ATAPI
devices ide-{floppy,tape,cd}.

tj: * blk_get_request(__GFP_WAIT) can't be called from do_request() as
      it can cause deadlock.  Converted to use inline struct request
      and blk_rq_init().

    * Added xfer / cdb len selection depending on device type.

    * All sense prep logics folded into ide_prep_sense() which never
      fails.

    * hwif->rq clearing and sense_rq used handling moved into
      ide_queue_sense_rq().

    * blk_rq_map_kern() conversion is moved to later patch.

CC: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
CC: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

ide-cd: don't abuse rq->buffer

Impact: rq->buffer usage cleanup

ide-cd uses rq->buffer to carry pointer to the original request when
issuing REQUEST_SENSE. Use rq->special instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>

ide-atapi: don't abuse rq->buffer

Impact: rq->buffer usage cleanup

ide-atapi uses rq->buffer as private opaque value for internal special
requests. rq->special isn't used for these cases (the only case where
rq->special is used is for ide-tape rw requests). Use rq->special
instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>

ide-taskfile: don't abuse rq->buffer

Impact: rq->buffer usage cleanup

ide_raw_taskfile() directly uses rq->buffer to carry pointer to the
data buffer. This complicates both block interface and ide backend
request handling. Use blk_rq_map_kern() instead and drop special
handling for REQ_TYPE_ATA_TASKFILE from ide_map_sg().

Note that REQ_RW setting is moved upwards as blk_rq_map_kern() uses it
to initialize bio rw flag.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>

ide-floppy: block pc always uses bio

Impact: remove unnecessary code path

Block pc requests always use bio and rq->data is always NULL. No need
to worry about !rq->bio cases in idefloppy_block_pc_cmd(). Note that
ide-atapi uses ide_pio_bytes() for bio PIO transfer which handle sg
fine.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>

ide-cd: clear sense buffer before issuing request sense

Impact: code simplification

ide_cd_request_sense_fixup() clears the tail of the sense buffer if
the device didn't completely fill it. This patch makes
cdrom_queue_request_sense() clear the sense buffer before issuing the
command instead of clearing it afterwards. This simplifies code and
eases future changes.

Signed-off-by: Tejun Heo <tj@kernel.org>

ide kill unused ide_cmd->special

Impact: removal of unused field

No one uses ide_cmd->special anymore. Kill it.

Signed-off-by: Tejun Heo <tj@kernel.org>

ide: don't set REQ_SOFTBARRIER

ide doesn't have to worry about REQ_SOFTBARRIER. Don't set it.

Signed-off-by: Tejun Heo <tj@kernel.org>

ide: use blk_run_queue() instead of blk_start_queueing()

blk_start_queueing() is being phased out in favor of
[__]blk_run_queue(). Switch.

Signed-off-by: Tejun Heo <tj@kernel.org>

ide-tape: remove back-to-back REQUEST_SENSE detection

Impact: fix an oops which always triggers

ide_tape_issue_pc() assumed drive->pc isn't NULL on invocation when
checking for back-to-back request sense issues but drive->pc can be
NULL and even when it's not NULL, it's not safe to dereference it once
the previous command is complete because pc could have been freed or
was on stack. Kill back-to-back REQUEST_SENSE detection.

Signed-off-by: Tejun Heo <tj@kernel.org>

block: clear req->errors on bio completion only for fs requests

Impact: subtle behavior change

For fs requests, rq is only carrier of bios and rq error status as a
whole doesn't mean much.  This is the reason why rq->errors is being
cleared on each partial completion of a request as on each partial
completion the error status is transferred to the respective bios.

For pc requests, rq->errors is used to carry error status to the
issuer and thus __end_that_request_first() doesn't clear it on such
cases.

The condition was fine till now as only fs and pc requests have used
bio and thus the bio completion path.  However, future changes will
unify data accesses to bio and all non fs users care about rq error
status.  Clear rq->errors on bio completion only for fs requests.

In general, the implicit clearing is a bit too subtle especially as
the meaning of rq->errors is completely dependent on low level
drivers.  Unifying / cleaning up rq->errors usage and letting llds
manage it would be better.  TODO comment added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jens Axboe <axboe@kernel.dk>

loop: use BIO list management functions

Now that the bio list management stuff is generic, convert loop to use
bio lists instead of its own private bio list implementation.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

hd: fix locking

hd dance around local irq and HD_IRQ enable without achieving much.
It ends up transferring data from irq handler with both local irq and
HD_IRQ disabled. The only place it actually does something is while
transferring the first block of a request which it does with HD_IRQ
disabled but local irq enabled.

Unfortunately, the dancing is horribly broken from locking POV. IRQ
and timeout handlers access block queue without grabbing the queue
lock and running the driver in SMP configuration crashes the whole
machine pretty quickly.

Remove meaningless irq enable/disable dancing and add proper locking
in issue, irq and timeout paths.

Signed-off-by: Tejun Heo <tj@kernel.org>

mg_disk: fix CONFIG_LBD=y warning

drivers/block/mg_disk.c: In function ‘mg_dump_status’:
drivers/block/mg_disk.c:265: warning: format ‘%ld’ expects type ‘long int’, but
argument 2 has type ‘sector_t’

[ Impact: kill build warning ]

Cc: unsik Kim <donari75@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

mg_disk: fix locking

IRQ and timeout handlers call functions which expect locked queue lock
without locking it. Fix it.

While at it, convert 0s used as null pointer constant to NULLs.

[ Impact: fix locking, cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: unsik Kim <donari75@gmail.com>

sparc: convert to use __HEAD and HEAD_TEXT macros.

This has the consequence of changing the section name use for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sh: convert to use __HEAD and HEAD_TEXT macros.

This has the consequence of changing the section name use for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

s390: convert to use __HEAD and HEAD_TEXT macros.

This has the consequence of changing the section name use for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

powerpc: convert to use __HEAD and HEAD_TEXT macros.

This has the consequence of changing the section name use for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mn10300: convert to use __HEAD and HEAD_TEXT macros.

This has the consequence of changing the section name use for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

m68k: convert to use __HEAD and HEAD_TEXT macros.

This has the consequence of changing the section name use for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

m32r: convert to use __HEAD and HEAD_TEXT macros.

This has the consequence of changing the section name use for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

frv: convert frv to use __HEAD and HEAD_TEXT macros.

This has the consequence of changing the section name use for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

alpha: convert to use __HEAD and HEAD_TEXT macros.

This has the consequence of changing the section name use for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Richard Henderson <rth@twiddle.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xtensa: convert to use __HEAD and HEAD_TEXT macros.

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Add new HEAD_TEXT_SECTION macro.

This patch is preparation for replacing all uses of ".head.text" or
".text.head" in the kernel with macros, so that the section name can
later be changed without having to touch a lot of the kernel.

Since some linker scripts do more complex things than referencing
HEAD_TEXT, we add a HEAD_TEXT_SECTION macro that just contains the
actual name.

I've defined HEAD_TEXT_SECTION in a new header,
include/linux/section-names.h, so that this section name only needs to
appear in one place. I anticipate creating similar macro structures
for a number of other section names.

The long-term goal here is to be able to change the kernel's magic
section names to those that are compatible with -ffunction-sections
-fdata-sections. This requires renaming all magic sections with names
of the form ".text.foo".

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

PM/Hibernate: Fix waiting for image device to appear on resume

Commit c751085943362143f84346d274e0011419c84202 ("PM/Hibernate: Wait for
SCSI devices scan to complete during resume") added a call to
scsi_complete_async_scans() to software_resume(), so that it waited for
the SCSI scanning to complete, but the call was added at a wrong place.

Namely, it should have been added after wait_for_device_probe(), which
is called only if the image partition hasn't been specified yet. Also,
it's reasonable to check if the image partition is present and only wait
for the device probing and SCSI scanning to complete if it is not the
case.

Additionally, since noresume is checked right at the beginning of
software_resume() and the function returns immediately if it's set, it
doesn't make sense to check it once again later.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ROMFS: Advance destination buffer pointer when reading from a blockdev

RomFS should advance the destination buffer pointer when reading data from a
blockdev source (the data may be split over multiple blocks, each requiring its
own sb_read() call). Without this, all the data is copied to the beginning of
the output buffer.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ROMFS: romfs_lookup() shouldn't be doing a partial name comparison

romfs_lookup() should be using a routine akin to strcmp() on the backing store,
rather than one akin to strncmp(). If it uses the latter, it's liable to match
/bin/shutdown when looking up /bin/sh.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

lib: find_last_bit.o needed by a module only, move it from lib to obj

Currently, although find_last_bit is EXPORTed, it is statically linked
with the kernel and is referenced only under CONFIG_SMP.

When CONFIG_SMP is undefined and find_last_bit is referenced only by
modules, linking fails with:

ERROR: "find_last_bit" [fs/nfs/nfs.ko] undefined!

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

CacheFiles: Fix the documentation to use the correct credential pointer names

Adjust the CacheFiles documentation to use the correct names of the credential
pointers in task_struct.

The documentation was using names from the old versions of the credentials
patches.

Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

virtio-rng: Remove false BUG for spurious callbacks

The virtio-rng drivers checks for spurious callbacks. Since
callbacks can be implemented via shared interrupts (e.g. PCI) this
could lead to guest kernel oopses with lots of virtio devices.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'fixes-for-linus' of git://git.monstr.eu/linux-2.6-microblaze

* 'fixes-for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: add parameter to microblaze_read()
  microblaze: Use CFLAGS_KERNEL instead of CFLAGS
  microblaze: Add STATE_SAVE_ARG_SPACE for noMMU kernel too
  microblaze: Do not check use_dcache
  microblaze: Do not use PVR configuration for broken MB version
  microblaze: Fix USR1/2 pvr printing message
  microblaze: iowrite upon timeout
  microblaze: Correspond CONFIG...PCMP in Makefile/Kconfig
  microblaze: Remove redundant variable
  microblaze: Move start_thread to process.c
  microblaze: Add missing preadv and pwritev syscalls
  microblaze: Add missing declaration for die and _exception func
  microblaze: Remove sparse error in traps.c
  microblaze: Move task_pt_regs up
  microblaze: Rename kernel_mode to pt_mode in pt_regs
  microblaze: Remove uncache shadow condition
  microblaze: Remove while(1) loop from show_regs function
  microblaze: Remove unneded per cpu SYSCALL_SAVE variable

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (34 commits)
  ACPI, i915: Register ACPI video even when not modesetting
  Revert "ACPICA: delete check for AML access to port 0x81-83"
  I/O port protection: update for windows compatibility.
  sony-laptop: always try to unblock rfkill on load
  sony-laptop: fix bogus error message display on resume
  ACPI: EC: Fix ACPI EC resume non-query interrupt message
  sony-laptop: SNC input event 38 fix
  sony-laptop: SNC 127 Initialization Fix
  sony-laptop: Duplicate SNC 127 Event Fix
  ACPI: prevent processor.max_cstate=0 boot crash
  ACPI/hpet: prevent boot hang when hpet=force used on ICH-4M
  ACPI: delete obsolete "bus master activity" proc field
  ACPI: idle: mark_tsc_unstable() at init-time, not run-time
  ACPI: add /sys/firmware/acpi/interrupts/sci_not counter
  ACPI video: fix an error when the brightness levels on AC and on Battery are same
  acpi-cpufreq: Do not let get_measured perf depend on internal variable
  acpi-cpufreq: style-only: add parens to math expression
  acpi-cpufreq: Cleanup: Use printk_once
  x86, acpi_cpufreq: Fix the NULL pointer dereference in get_measured_perf
  thinkpad-acpi: bump up version to 0.23
  ...

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68knommu: update the default config for the ColdFire 5407C3 board
  m68knommu: update the default config for the ColdFire 5307C3 board
  m68knommu: update the default config for the ColdFire 5257EVB board
  m68knommu: update the default config for the ColdFire 5249EVB.
  m68knommu: add a defconfig for the ColdFire M5272C3 board
  m68knommu: update the defconfig for the ColdFire 5208evb board
  m68knommu: fix DMA support for ColdFire
  m68knommu: remove unused kernel stats offsets
  m68knommu: fix missing .data.cacheline_aligned section
  m68knommu: Fixed GPIO pin initialization for CONFIG_M5271 FEC.

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Fix potential inode allocation soft lockup in Orlov allocator
  ext4: Make the extent validity check more paranoid
  jbd: use SWRITE_SYNC_PLUG when writing synchronous revoke records
  jbd2: use SWRITE_SYNC_PLUG when writing synchronous revoke records
  ext4: really print the find_group_flex fallback warning only once

Merge master.kernel.org:/home/rmk/linux-2.6-arm

* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] 5460/1: Orion: reduce namespace pollution
  [ARM] 5458/1: pcmcia: pxa2xx-sharpsl: check if we do have Scoop config
  [ARM] 5457/1: mach-imx gpio buildfix
  [ARM] 5456/1: add sys_preadv and sys_pwritev
  [ARM] pxa/pcm990: start external GPIOs immediately after built-in ones
  [ARM] pxa/palm27x: General fix for Palm27x aSoC driver
  [ARM] pxa/mioa701: use GPIO95 as AC97 reset line
  [ARM] pxa: merge AC97 platform data structures
  [ARM] pxa/magician: remove un-necessary #include of pxa-regs.h and hardware.h

Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
  USB: pwc : do not pass stack allocated buffers to USB core.
  USB: otg: Fix bug on remove path without transceiver
  USB: correct error handling in cdc-wdm
  USB: removal of tty->low_latency hack dating back to the old serial code
  USB: serial: sierra driver bug fix for composite interface
  USB: gadget: omap_udc uses platform_driver_probe()
  USB: ci13xxx_udc: fix build error
  USB: musb: Prevent multiple includes of musb.h
  USB: pass mem_flags to dma_alloc_coherent
  USB: g_file_storage: fix use-after-free bug when closing files
  USB: ehci-sched.c: EHCI SITD scheduling bugfix
  USB: fix mos7840 problem with minor numbers
  USB: mos7840: add new device id
  USB: musb: fix build when !CONFIG_PM
  USB: musb: Remove my email address from few musb related drivers
  USB: Gadget: MIPS CI13xxx UDC bugfixes
  USB: Unusual Device support for Gold MP3 Player Energy
  USB: serial: fix lifetime and locking problems

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  eCryptfs: Larger buffer for encrypted symlink targets
  eCryptfs: Lock lower directory inode mutex during lookup
  eCryptfs: Remove ecryptfs_unlink_sigs warnings
  eCryptfs: Fix data corruption when using ecryptfs_passthrough
  eCryptfs: Print FNEK sig properly in /proc/mounts
  eCryptfs: NULL pointer dereference in ecryptfs_send_miscdev()
  eCryptfs: Copy lower inode attrs before dentry instantiation

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: Update defconfigs for 2.6.30-rc3
  m68k,m68knommu: Wire up preadv and pwritev
  scsi: a4000 - Correct driver unregistration in case of failure

Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6

* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] update default configuration.
  [S390] omit frame pointers on s390 when possible
  [S390] Use tape_generic_offline directly.
  [S390] /proc/stat idle field for idle cpus
  [S390] appldata: avoid deadlock with appldata_mem
  [S390] ipl: fix compile breakage

Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  GFS2: Ensure that the inode goal block settings are updated
  GFS2: Fix bug in block allocation
  bitops: Add __ffs64 bitop

Merge branch 'kvm-updates/2.6.30' of git://git.kernel.org/pub/scm/virt/kvm/kvm

* 'kvm-updates/2.6.30' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Unregister cpufreq notifier on unload
  KVM: x86: release time_page on vcpu destruction
  KVM: Fix overlapping check for memory slots
  KVM: MMU: disable global page optimization
  KVM: ia64: fix locking order entering guest
  KVM: MMU: Fix off-by-one calculating large page count

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
  MAINTAINERS: update IDE entry
  palm_bk3710: palm_bk3710_udmatimings[] CodingStyle fixup
  palm_bk3710: those registers/bitfields don't exist
  mediabay: fix build for CONFIG_BLOCK=n
  ide: Stop disks on reboot for laptop which cuts power
  ide-cd: fix kernel crash on hppa regression
  palm_bk3710: UDMA performance fix

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Add quirk for Packard Bell RS65
  [ALSA] intel8x0: another attempt to fix ac97_clock measure routine
  [ALSA] ac97_codec: increase timeout for analog subsections
  ALSA: hda - Add quirks for Realtek codecs
  ALSA: hda - Fix alc662_init_verbs
  ALSA: keywest: Convert to new-style i2c driver
  ALSA: AOA: Convert onyx and tas codecs to new-style i2c drivers
  ALSA: Atiixp: Add SSID for mute_led quirk (unknown HP model)
  ALSA: us122l: add snd_us122l_free()
  ASoC: Fix warning in wm9705
  ASoC: OMAP: Update contact addresses
  ASoC: pxa-ssp: Don't use SSCR0_SerClkDiv and SSCR0_SCR
  ALSA: us122l: Fix signedness in comparisions