Dan Carpenter [Thu, 29 Mar 2012 18:57:08 +0000 (20:57 +0200)]
blkcg: change a spin_lock() to spin_lock_irq()
Smatch complains that we re-enable IRQs twice. It looks like we forgot
to disable them here on the spin_trylock() failure path. This was added
in 9f13ef678e "blkcg: use double locking instead of RCU for blkg
synchronization".
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>` Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Mon, 19 Mar 2012 22:10:58 +0000 (15:10 -0700)]
cfq: don't use icq_get_changed()
cfq caches the associated cfqq's for a given cic. The cache needs to
be flushed if the cic's ioprio or blkcg has changed. It is currently
done by requiring the changing action to set the respective
ICQ_*_CHANGED bit in the icq and testing it from cfq_set_request(),
which involves iterating through all the affected icqs.
All cfq wants to know is whether ioprio and/or blkcg have changed
since the last flush and can be easily achieved by just remembering
the current ioprio and blkcg ID in cic.
This patch adds cic->{ioprio|blkcg_id}, updates all ioprio users to
use the remembered value instead, and updates cfq_set_request() path
such that, instead of using icq_get_changed(), the current values are
compared against the remembered ones and trigger appropriate flush
action if not. Condition tests are moved inside both _changed
functions which are now named check_ioprio_changed() and
check_blkcg_changed().
ioprio.h::task_ioprio*() can't be used anymore and replaced with
open-coded IOPRIO_CLASS_NONE case in cfq_async_queue_prio().
Tejun Heo [Mon, 19 Mar 2012 22:10:57 +0000 (15:10 -0700)]
cfq: pass around cfq_io_cq instead of io_context
Now that io_cq is managed by block core and guaranteed to exist for
any in-flight request, it is easier and carries more information to
pass around cfq_io_cq than io_context.
This patch updates cfq_init_prio_data(), cfq_find_alloc_queue() and
cfq_get_queue() to take @cic instead of @ioc. This change removes a
duplicate cfq_cic_lookup() from cfq_find_alloc_queue().
This change enables the use of cic-cached ioprio in the next patch.
Tejun Heo [Thu, 8 Mar 2012 18:54:00 +0000 (10:54 -0800)]
blkcg: remove blkio_group->stats_lock
With recent plug merge updates, all non-percpu stat updates happen
under queue_lock making stats_lock unnecessary to synchronize stat
updates. The only synchronization necessary is stat reading, which
can be done using u64_stats_sync instead.
This patch removes blkio_group->stats_lock and adds
blkio_group_stats->syncp for reader synchronization.
Tejun Heo [Thu, 8 Mar 2012 18:53:58 +0000 (10:53 -0800)]
blkcg: simplify stat reset
blkiocg_reset_stats() implements stat reset for blkio.reset_stats
cgroupfs file. This feature is very unconventional and something
which shouldn't have been merged. It's only useful when there's only
one user or tool looking at the stats. As soon as multiple users
and/or tools are involved, it becomes useless as resetting disrupts
other usages. There are very good reasons why all other stats expect
readers to read values at the start and end of a period and subtract
to determine delta over the period.
The implementation is rather complex - some fields shouldn't be
cleared and it saves some fields, resets whole and restores for some
reason. Reset of percpu stats is also racy. The comment points to
64bit store atomicity for the reason but even without that stores for
zero can simply race with other CPUs doing RMW and get clobbered.
Simplify reset by
* Clear selectively instead of resetting and restoring.
* Grouping debug stat fields to be reset and using memset() over them.
Tejun Heo [Thu, 8 Mar 2012 18:53:57 +0000 (10:53 -0800)]
blkcg: don't use percpu for merged stats
With recent plug merge updates, merged stats are no longer called for
plug merges and now only updated while holding queue_lock. As
stats_lock is scheduled to be removed, there's no reason to use percpu
for merged stats. Don't use percpu for merged stats.
Vivek Goyal [Thu, 8 Mar 2012 18:53:56 +0000 (10:53 -0800)]
blkcg: alloc per cpu stats from worker thread in a delayed manner
Current per cpu stat allocation assumes GFP_KERNEL allocation flag. But in
IO path there are times when we want GFP_NOIO semantics. As there is no
way to pass the allocation flags to alloc_percpu(), this patch delays the
allocation of stats using a worker thread.
v2-> tejun suggested following changes. Changed the patch accordingly.
- move alloc_node location in structure
- reduce the size of names of some of the fields
- Reduce the scope of locking of alloc_list_lock
- Simplified stat_alloc_fn() by allocating stats for all
policies in one go and then assigning these to a group.
v3 -> Andrew suggested to put some comments in the code. Also raised
concerns about trying to allocate infinitely in case of allocation
failure. I have changed the logic to sleep for 10ms before retrying.
That should take care of non-preemptible UP kernels.
v4 -> Tejun had more suggestions.
- drop list_for_each_entry_all()
- instead of msleep() use queue_delayed_work()
- Some cleanups realted to more compact coding.
v5-> tejun suggested more cleanups leading to more compact code.
tj: - Relocated pcpu_stats into blkio_stat_alloc_fn().
- Minor comment update.
- This also fixes suspicious RCU usage warning caused by invoking
cgroup_path() from blkg_alloc() without holding RCU read lock.
Now that blkg_alloc() doesn't require sleepable context, RCU
read lock from blkg_lookup_create() is maintained throughout
blkg_alloc().
Tejun Heo [Mon, 5 Mar 2012 21:15:29 +0000 (13:15 -0800)]
block: make blk-throttle preserve the issuing task on delayed bios
Make blk-throttle call bio_associate_current() on bios being delayed
such that they get issued to block layer with the original io_context.
This allows stacking blk-throttle and cfq-iosched propio policies.
bios will always be issued with the correct ioc and blkcg whether it
gets delayed by blk-throttle or not.
Tejun Heo [Mon, 5 Mar 2012 21:15:28 +0000 (13:15 -0800)]
block: make block cgroup policies follow bio task association
Implement bio_blkio_cgroup() which returns the blkcg associated with
the bio if exists or %current's blkcg, and use it in blk-throttle and
cfq-iosched propio. This makes both cgroup policies honor task
association for the bio instead of always assuming %current.
As nobody is using bio_set_task() yet, this doesn't introduce any
behavior change.
Tejun Heo [Mon, 5 Mar 2012 21:15:27 +0000 (13:15 -0800)]
block: implement bio_associate_current()
IO scheduling and cgroup are tied to the issuing task via io_context
and cgroup of %current. Unfortunately, there are cases where IOs need
to be routed via a different task which makes scheduling and cgroup
limit enforcement applied completely incorrectly.
For example, all bios delayed by blk-throttle end up being issued by a
delayed work item and get assigned the io_context of the worker task
which happens to serve the work item and dumped to the default block
cgroup. This is double confusing as bios which aren't delayed end up
in the correct cgroup and makes using blk-throttle and cfq propio
together impossible.
Any code which punts IO issuing to another task is affected which is
getting more and more common (e.g. btrfs). As both io_context and
cgroup are firmly tied to task including userland visible APIs to
manipulate them, it makes a lot of sense to match up tasks to bios.
This patch implements bio_associate_current() which associates the
specified bio with %current. The bio will record the associated ioc
and blkcg at that point and block layer will use the recorded ones
regardless of which task actually ends up issuing the bio. bio
release puts the associated ioc and blkcg.
It grabs and remembers ioc and blkcg instead of the task itself
because task may already be dead by the time the bio is issued making
ioc and blkcg inaccessible and those are all block layer cares about.
elevator_set_req_fn() is updated such that the bio elvdata is being
allocated for is available to the elevator.
This doesn't update block cgroup policies yet. Further patches will
implement the support.
-v2: #ifdef CONFIG_BLK_CGROUP added around bio->bi_ioc dereference in
rq_ioc() to fix build breakage.
Tejun Heo [Mon, 5 Mar 2012 21:15:26 +0000 (13:15 -0800)]
block: add io_context->active_ref
Currently ioc->nr_tasks is used to decide two things - whether an ioc
is done issuing IOs and whether it's shared by multiple tasks. This
patch separate out the first into ioc->active_ref, which is acquired
and released using {get|put}_io_context_active() respectively.
This will be used to associate bio's with a given task. This patch
doesn't introduce any visible behavior change.
Tejun Heo [Mon, 5 Mar 2012 21:15:25 +0000 (13:15 -0800)]
block: ioc_task_link() can't fail
ioc_task_link() is used to share %current's ioc on clone. If
%current->io_context is set, %current is guaranteed to have refcount
on the ioc and, thus, ioc_task_link() can't fail.
Replace error checking in ioc_task_link() with WARN_ON_ONCE() and make
it just increment refcount and nr_tasks.
Tejun Heo [Mon, 5 Mar 2012 21:15:24 +0000 (13:15 -0800)]
block: interface update for ioc/icq creation functions
Make the following interface updates to prepare for future ioc related
changes.
* create_io_context() returning ioc only works for %current because it
doesn't increment ref on the ioc. Drop @task parameter from it and
always assume %current.
* Make create_io_context_slowpath() return 0 or -errno and rename it
to create_task_io_context().
* Make ioc_create_icq() take @ioc as parameter instead of assuming
that of %current. The caller, get_request(), is updated to create
ioc explicitly and then pass it into ioc_create_icq().
Tejun Heo [Mon, 5 Mar 2012 21:15:23 +0000 (13:15 -0800)]
block: restructure get_request()
get_request() is structured a bit unusually in that failure path is
inlined in the usual flow with goto labels atop and inside it.
Relocate the error path to the end of the function.
This is to prepare for icq handling changes in get_request() and
doesn't introduce any behavior change.
Tejun Heo [Mon, 5 Mar 2012 21:15:22 +0000 (13:15 -0800)]
blkcg: drop unnecessary RCU locking
Now that blkg additions / removals are always done under both q and
blkcg locks, the only places RCU locking is necessary are
blkg_lookup[_create]() for lookup w/o blkcg lock. This patch drops
unncessary RCU locking replacing it with plain blkcg locking as
necessary.
* blkiocg_pre_destroy() already perform proper locking and don't need
RCU. Dropped.
* blkio_read_blkg_stats() now uses blkcg->lock instead of RCU read
lock. This isn't a hot path.
* Now unnecessary synchronize_rcu() from queue exit paths removed.
This makes q->nr_blkgs unnecessary. Dropped.
* RCU annotation on blkg->q removed.
-v2: Vivek pointed out that blkg_lookup_create() still needs to be
called under rcu_read_lock(). Updated.
-v3: After the update, stats_lock locking in blkio_read_blkg_stats()
shouldn't be using _irq variant as it otherwise ends up enabling
irq while blkcg->lock is locked. Fixed.
Tejun Heo [Mon, 5 Mar 2012 21:15:21 +0000 (13:15 -0800)]
blkcg: use double locking instead of RCU for blkg synchronization
blkgs are chained from both blkcgs and request_queues and thus
subjected to two locks - blkcg->lock and q->queue_lock. As both blkcg
and q can go away anytime, locking during removal is tricky. It's
currently solved by wrapping removal inside RCU, which makes the
synchronization complex. There are three locks to worry about - the
outer RCU, q lock and blkcg lock, and it leads to nasty subtle
complications like conditional synchronize_rcu() on queue exit paths.
For all other paths, blkcg lock is naturally nested inside q lock and
the only exception is blkcg removal path, which is a very cold path
and can be implemented as clumsy but conceptually-simple reverse
double lock dancing.
This patch updates blkg removal path such that blkgs are removed while
holding both q and blkcg locks, which is trivial for request queue
exit path - blkg_destroy_all(). The blkcg removal path,
blkiocg_pre_destroy(), implements reverse double lock dancing
essentially identical to ioc_release_fn().
This simplifies blkg locking - no half-dead blkgs to worry about. Now
unnecessary RCU annotations will be removed by the next patch.
Tejun Heo [Mon, 5 Mar 2012 21:15:20 +0000 (13:15 -0800)]
blkcg: unify blkg's for blkcg policies
Currently, blkg is per cgroup-queue-policy combination. This is
unnatural and leads to various convolutions in partially used
duplicate fields in blkg, config / stat access, and general management
of blkgs.
This patch make blkg's per cgroup-queue and let them serve all
policies. blkgs are now created and destroyed by blkcg core proper.
This will allow further consolidation of common management logic into
blkcg core and API with better defined semantics and layering.
As a transitional step to untangle blkg management, elvswitch and
policy [de]registration, all blkgs except the root blkg are being shot
down during elvswitch and bypass. This patch adds blkg_root_update()
to update root blkg in place on policy change. This is hacky and racy
but should be good enough as interim step until we get locking
simplified and switch over to proper in-place update for all blkgs.
-v2: Root blkgs need to be updated on elvswitch too and blkg_alloc()
comment wasn't updated according to the function change. Fixed.
Both pointed out by Vivek.
-v3: v2 updated blkg_destroy_all() to invoke update_root_blkg_pd() for
all policies. This freed root pd during elvswitch before the
last queue finished exiting and led to oops. Directly invoke
update_root_blkg_pd() only on BLKIO_POLICY_PROP from
cfq_exit_queue(). This also is closer to what will be done with
proper in-place blkg update. Reported by Vivek.
Tejun Heo [Mon, 5 Mar 2012 21:15:19 +0000 (13:15 -0800)]
blkcg: let blkcg core manage per-queue blkg list and counter
With the previous patch to move blkg list heads and counters to
request_queue and blkg, logic to manage them in both policies are
almost identical and can be moved to blkcg core.
This patch moves blkg link logic into blkg_lookup_create(), implements
common blkg unlink code in blkg_destroy(), and updates
blkg_destory_all() so that it's policy specific and can skip root
group. The updated blkg_destroy_all() is now used to both clear queue
for bypassing and elv switching, and release all blkgs on q exit.
This patch introduces a race window where policy [de]registration may
race against queue blkg clearing. This can only be a problem on cfq
unload and shouldn't be a real problem in practice (and we have many
other places where this race already exists). Future patches will
remove these unlikely races.
Tejun Heo [Mon, 5 Mar 2012 21:15:18 +0000 (13:15 -0800)]
blkcg: move per-queue blkg list heads and counters to queue and blkg
Currently, specific policy implementations are responsible for
maintaining list and number of blkgs. This duplicates code
unnecessarily, and hinders factoring common code and providing blkcg
API with better defined semantics.
After this patch, request_queue hosts list heads and counters and blkg
has list nodes for both policies. This patch only relocates the
necessary fields and the next patch will actually move management code
into blkcg core.
Note that request_queue->blkg_list[] and ->nr_blkgs[] are hardcoded to
have 2 elements. This is to avoid include dependency and will be
removed by the next patch.
This patch doesn't introduce any behavior change.
-v2: Now unnecessary conditional on CONFIG_BLK_CGROUP_MODULE removed
as pointed out by Vivek.
Tejun Heo [Mon, 5 Mar 2012 21:15:17 +0000 (13:15 -0800)]
blkcg: don't use blkg->plid in stat related functions
blkg is scheduled to be unified for all policies and thus there won't
be one-to-one mapping from blkg to policy. Update stat related
functions to take explicit @pol or @plid arguments and not use
blkg->plid.
This is painful for now but most of specific stat interface functions
will be replaced with a handful of generic helpers.
Tejun Heo [Mon, 5 Mar 2012 21:15:16 +0000 (13:15 -0800)]
blkcg: make blkg->pd an array and move configuration and stats into it
To prepare for unifying blkgs for different policies, make blkg->pd an
array with BLKIO_NR_POLICIES elements and move blkg->conf, ->stats,
and ->stats_cpu into blkg_policy_data.
This patch doesn't introduce any functional difference.
Tejun Heo [Mon, 5 Mar 2012 21:15:15 +0000 (13:15 -0800)]
blkcg: move refcnt to blkcg core
Currently, blkcg policy implementations manage blkg refcnt duplicating
mostly identical code in both policies. This patch moves refcnt to
blkg and let blkcg core handle refcnt and freeing of blkgs.
* cfq blkgs now also get freed via RCU.
* cfq blkgs lose RB_EMPTY_ROOT() sanity check on blkg free. If
necessary, we can add blkio_exit_group_fn() to resurrect this.
Tejun Heo [Mon, 5 Mar 2012 21:15:14 +0000 (13:15 -0800)]
blkcg: let blkcg core handle policy private data allocation
Currently, blkg's are embedded in private data blkcg policy private
data structure and thus allocated and freed by policies. This leads
to duplicate codes in policies, hinders implementing common part in
blkcg core with strong semantics, and forces duplicate blkg's for the
same cgroup-q association.
This patch introduces struct blkg_policy_data which is a separate data
structure chained from blkg. Policies specifies the amount of private
data it needs in its blkio_policy_type->pdata_size and blkcg core
takes care of allocating them along with blkg which can be accessed
using blkg_to_pdata(). blkg can be determined from pdata using
pdata_to_blkg(). blkio_alloc_group_fn() method is accordingly updated
to blkio_init_group_fn().
For consistency, tg_of_blkg() and cfqg_of_blkg() are replaced with
blkg_to_tg() and blkg_to_cfqg() respectively, and functions to map in
the reverse direction are added.
Except that policy specific data now lives in a separate data
structure from blkg, this patch doesn't introduce any functional
difference.
This will be used to unify blkg's for different policies.
Tejun Heo [Mon, 5 Mar 2012 21:15:13 +0000 (13:15 -0800)]
blkcg: clear all request_queues on blkcg policy [un]registrations
Keep track of all request_queues which have blkcg initialized and turn
on bypass and invoke blkcg_clear_queue() on all before making changes
to blkcg policies.
This is to prepare for moving blkg management into blkcg core. Note
that this uses more brute force than necessary. Finer grained shoot
down will be implemented later and given that policy [un]registration
almost never happens on running systems (blk-throtl can't be built as
a module and cfq usually is the builtin default iosched), this
shouldn't be a problem for the time being.
Tejun Heo [Mon, 5 Mar 2012 21:15:12 +0000 (13:15 -0800)]
blkcg: add blkcg_{init|drain|exit}_queue()
Currently block core calls directly into blk-throttle for init, drain
and exit. This patch adds blkcg_{init|drain|exit}_queue() which wraps
the blk-throttle functions. This is to give more control and
visiblity to blkcg core layer for proper layering. Further patches
will add logic common to blkcg policies to the functions.
While at it, collapse blk_throtl_release() into blk_throtl_exit().
There's no reason to keep them separate.
Tejun Heo [Mon, 5 Mar 2012 21:15:11 +0000 (13:15 -0800)]
blkcg: let blkio_group point to blkio_cgroup directly
Currently, blkg points to the associated blkcg via its css_id. This
unnecessarily complicates dereferencing blkcg. Let blkg hold a
reference to the associated blkcg and point directly to it and disable
css_id on blkio_subsys.
This change requires splitting blkiocg_destroy() into
blkiocg_pre_destroy() and blkiocg_destroy() so that all blkg's can be
destroyed and all the blkcg references held by them dropped during
cgroup removal.
Vivek Goyal [Mon, 5 Mar 2012 21:15:10 +0000 (13:15 -0800)]
blkcg: skip blkg printing if q isn't associated with disk
blk-cgroup printing code currently assumes that there is a device/disk
associated with every queue in the system, but modules like floppy,
can instantiate request queues without registering disk which can lead
to oops.
Skip the queue/blkg which don't have dev/disk associated with them.
-tj: Factored out backing_dev_info check into blkg_dev_name().
Tejun Heo [Mon, 5 Mar 2012 21:15:09 +0000 (13:15 -0800)]
blkcg: kill the mind-bending blkg->dev
blkg->dev is dev_t recording the device number of the block device for
the associated request_queue. It is used to identify the associated
block device when printing out configuration or stats.
This is redundant to begin with. A blkg is an association between a
cgroup and a request_queue and it of course is possible to reach
request_queue from blkg and synchronization conventions are in place
for safe q dereferencing, so this shouldn't be necessary from the
beginning. Furthermore, it's initialized by sscanf()ing the device
name of backing_dev_info. The mind boggles.
Anyways, if blkg is visible under rcu lock, we *know* that the
associated request_queue hasn't gone away yet and its bdi is
registered and alive - blkg can't be created for request_queue which
hasn't been fully initialized and it can't go away before blkg is
removed.
Let stat and conf read functions get device name from
blkg->q->backing_dev_info.dev and pass it down to printing functions
and remove blkg->dev.
Tejun Heo [Mon, 5 Mar 2012 21:15:08 +0000 (13:15 -0800)]
blkcg: kill blkio_policy_node
Now that blkcg configuration lives in blkg's, blkio_policy_node is no
longer necessary. Kill it.
blkio_policy_parse_and_set() now fails if invoked for missing device
and functions to print out configurations are updated to print from
blkg's.
cftype_blkg_same_policy() is dropped along with other policy functions
for consistency. Its one line is open coded in the only user -
blkio_read_blkg_stats().
-v2: Update to reflect the retry-on-bypass logic change of the
previous patch.
Tejun Heo [Mon, 5 Mar 2012 21:15:07 +0000 (13:15 -0800)]
blkcg: don't allow or retain configuration of missing devices
blkcg is very peculiar in that it allows setting and remembering
configurations for non-existent devices by maintaining separate data
structures for configuration.
This behavior is completely out of the usual norms and outright
confusing; furthermore, it uses dev_t number to match the
configuration to devices, which is unpredictable to begin with and
becomes completely unuseable if EXT_DEVT is fully used.
It is wholely unnecessary - we already have fully functional userland
mechanism to program devices being hotplugged which has full access to
device identification, connection topology and filesystem information.
Add a new struct blkio_group_conf which contains all blkcg
configurations to blkio_group and let blkio_group, which can be
created iff the associated device exists and is removed when the
associated device goes away, carry all configurations.
Note that, after this patch, all newly created blkg's will always have
the default configuration (unlimited for throttling and blkcg's weight
for propio).
This patch makes blkio_policy_node meaningless but doesn't remove it.
The next patch will.
-v2: Updated to retry after short sleep if blkg lookup/creation failed
due to the queue being temporarily bypassed as indicated by
-EBUSY return. Pointed out by Vivek.
Tejun Heo [Mon, 5 Mar 2012 21:15:06 +0000 (13:15 -0800)]
blkcg: factor out blkio_group creation
Currently both blk-throttle and cfq-iosched implement their own
blkio_group creation code in throtl_get_tg() and cfq_get_cfqg(). This
patch factors out the common code into blkg_lookup_create(), which
returns ERR_PTR value so that transitional failures due to queue
bypass can be distinguished from other failures.
* New plkio_policy_ops methods blkio_alloc_group_fn() and
blkio_link_group_fn added. Both are transitional and will be
removed once the blkg management code is fully moved into
blk-cgroup.c.
* blkio_alloc_group_fn() allocates policy-specific blkg which is
usually a larger data structure with blkg as the first entry and
intiailizes it. Note that initialization of blkg proper, including
percpu stats, is responsibility of blk-cgroup proper.
Note that default config (weight, bps...) initialization is done
from this method; otherwise, we end up violating locking order
between blkcg and q locks via blkcg_get_CONF() functions.
* blkio_link_group_fn() is called under queue_lock and responsible for
linking the blkg to the queue. blkcg side is handled by blk-cgroup
proper.
* The common blkg creation function is named blkg_lookup_create() and
blkiocg_lookup_group() is renamed to blkg_lookup() for consistency.
Also, throtl / cfq related functions are similarly [re]named for
consistency.
This simplifies blkcg policy implementations and enables further
cleanup.
-v2: Vivek noticed that blkg_lookup_create() incorrectly tested
blk_queue_dead() instead of blk_queue_bypass() leading a user of
the function ending up creating a new blkg on bypassing queue.
This is a bug introduced while relocating bypass patches before
this one. Fixed.
-v3: ERR_PTR patch folded into this one. @for_root added to
blkg_lookup_create() to allow creating root group on a bypassed
queue during elevator switch.
Tejun Heo [Mon, 5 Mar 2012 21:15:05 +0000 (13:15 -0800)]
blkcg: use the usual get blkg path for root blkio_group
For root blkg, blk_throtl_init() was using throtl_alloc_tg()
explicitly and cfq_init_queue() was manually initializing embedded
cfqd->root_group, adding unnecessarily different code paths to blkg
handling.
Make both use the usual blkio_group get functions - throtl_get_tg()
and cfq_get_cfqg() - for the root blkio_group too. Note that
blk_throtl_init() callsite is pushed downwards in
blk_alloc_queue_node() so that @q is sufficiently initialized for
throtl_get_tg().
This simplifies root blkg handling noticeably for cfq and will allow
further modularization of blkcg API.
-v2: Vivek pointed out that using cfq_get_cfqg() won't work if
CONFIG_CFQ_GROUP_IOSCHED is disabled. Fix it by factoring out
initialization of base part of cfqg into cfq_init_cfqg_base() and
alloc/init/free explicitly if !CONFIG_CFQ_GROUP_IOSCHED.
Tejun Heo [Mon, 5 Mar 2012 21:15:04 +0000 (13:15 -0800)]
blkcg: add blkio_policy[] array and allow one policy per policy ID
Block cgroup policies are maintained in a linked list and,
theoretically, multiple policies sharing the same policy ID are
allowed.
This patch temporarily restricts one policy per plid and adds
blkio_policy[] array which indexes registered policy types by plid.
Both the restriction and blkio_policy[] array are transitional and
will be removed once API cleanup is complete.
Tejun Heo [Mon, 5 Mar 2012 21:15:03 +0000 (13:15 -0800)]
blkcg: use q and plid instead of opaque void * for blkio_group association
blkgio_group is association between a block cgroup and a queue for a
given policy. Using opaque void * for association makes things
confusing and hinders factoring of common code. Use request_queue *
and, if necessary, policy id instead.
Tejun Heo [Mon, 5 Mar 2012 21:15:02 +0000 (13:15 -0800)]
blkcg: update blkg get functions take blkio_cgroup as parameter
In both blkg get functions - throtl_get_tg() and cfq_get_cfqg(),
instead of obtaining blkcg of %current explicitly, let the caller
specify the blkcg to use as parameter and make both functions hold on
to the blkcg.
This is part of block cgroup interface cleanup and will help making
blkcg API more modular.
Tejun Heo [Mon, 5 Mar 2012 21:15:01 +0000 (13:15 -0800)]
blkcg: move rcu_read_lock() outside of blkio_group get functions
rcu_read_lock() in throtl_get_tb() and cfq_get_cfqg() holds onto
@blkcg while looking up blkg. For API cleanup, the next patch will
make the caller responsible for determining @blkcg to look blkg from
and let them specify it as a parameter. Move rcu read locking out to
the callers to prepare for the change.
-v2: Originally this patch was described as a fix for RCU read locking
bug around @blkg, which Vivek pointed out to be incorrect. It
was from misunderstanding the role of rcu locking as protecting
@blkg not @blkcg. Patch description updated.
Tejun Heo [Mon, 5 Mar 2012 21:15:00 +0000 (13:15 -0800)]
blkcg: shoot down blkio_groups on elevator switch
Elevator switch may involve changes to blkcg policies. Implement
shoot down of blkio_groups.
Combined with the previous bypass updates, the end goal is updating
blkcg core such that it can ensure that blkcg's being affected become
quiescent and don't have any per-blkg data hanging around before
commencing any policy updates. Until queues are made aware of the
policies that applies to them, as an interim step, all per-policy blkg
data will be shot down.
* blk-throtl doesn't need this change as it can't be disabled for a
live queue; however, update it anyway as the scheduled blkg
unification requires this behavior change. This means that
blk-throtl configuration will be unnecessarily lost over elevator
switch. This oddity will be removed after blkcg learns to associate
individual policies with request_queues.
* blk-throtl dosen't shoot down root_tg. This is to ease transition.
Unified blkg will always have persistent root group and not shooting
down root_tg for now eases transition to that point by avoiding
having to update td->root_tg and is safe as blk-throtl can never be
disabled
-v2: Vivek pointed out that group list is not guaranteed to be empty
on return from clear function if it raced cgroup removal and
lost. Fix it by waiting a bit and retrying. This kludge will
soon be removed once locking is updated such that blkg is never
in limbo state between blkcg and request_queue locks.
blk-throtl no longer shoots down root_tg to avoid breaking
td->root_tg.
Also, Nest queue_lock inside blkio_list_lock not the other way
around to avoid introduce possible deadlock via blkcg lock.
-v3: blkcg_clear_queue() repositioned and renamed to
blkg_destroy_all() to increase consistency with later changes.
cfq_clear_queue() updated to check q->elevator before
dereferencing it to avoid NULL dereference on not fully
initialized queues (used by later change).
Tejun Heo [Mon, 5 Mar 2012 21:14:59 +0000 (13:14 -0800)]
block: extend queue bypassing to cover blkcg policies
Extend queue bypassing such that dying queue is always bypassing and
blk-throttle is drained on bypass. With blkcg policies updated to
test blk_queue_bypass() instead of blk_queue_dead(), this ensures that
no bio or request is held by or going through blkcg policies on a
bypassing queue.
This will be used to implement blkg cleanup on elevator switches and
policy changes.
Tejun Heo [Mon, 5 Mar 2012 21:14:58 +0000 (13:14 -0800)]
block: implement blk_queue_bypass_start/end()
Rename and extend elv_queisce_start/end() to
blk_queue_bypass_start/end() which are exported and supports nesting
via @q->bypass_depth. Also add blk_queue_bypass() to test bypass
state.
This will be further extended and used for blkio_group management.
Tejun Heo [Mon, 5 Mar 2012 21:14:57 +0000 (13:14 -0800)]
elevator: make elevator_init_fn() return 0/-errno
elevator_ops->elevator_init_fn() has a weird return value. It returns
a void * which the caller should assign to q->elevator->elevator_data
and %NULL return denotes init failure.
Update such that it returns integer 0/-errno and sets elevator_data
directly as necessary.
This makes the interface more conventional and eases further cleanup.
Tejun Heo [Mon, 5 Mar 2012 21:14:56 +0000 (13:14 -0800)]
elevator: clear auxiliary data earlier during elevator switch
Elevator switch tries hard to keep as much as context until new
elevator is ready so that it can revert to the original state if
initializing the new elevator fails for some reason. Unfortunately,
with more auxiliary contexts to manage, this makes elevator init and
exit paths too complex and fragile.
This patch makes elevator_switch() unregister the current elevator and
flush icq's before start initializing the new one. As we still keep
the old elevator itself, the only difference is that we lose icq's on
rare occassions of switching failure, which isn't critical at all.
Note that this makes explicit elevator parameter to
elevator_init_queue() and __elv_register_queue() unnecessary as they
always can use the current elevator.
This patch enables block cgroup cleanups.
-v2: blk_add_trace_msg() prints elevator name from @new_e instead of
@e->type as the local variable no longer exists. This caused
build failure on CONFIG_BLK_DEV_IO_TRACE.
Tejun Heo [Mon, 5 Mar 2012 21:14:55 +0000 (13:14 -0800)]
cfq: don't register propio policy if !CONFIG_CFQ_GROUP_IOSCHED
cfq has been registering zeroed blkio_poilcy_cfq if CFQ_GROUP_IOSCHED
is disabled. This fortunately doesn't collide with blk-throtl as
BLKIO_POLICY_PROP is zero but is unnecessary and risky. Just don't
register it if not enabled.
Tejun Heo [Mon, 5 Mar 2012 21:14:54 +0000 (13:14 -0800)]
blkcg: make CONFIG_BLK_CGROUP bool
Block cgroup core can be built as module; however, it isn't too useful
as blk-throttle can only be built-in and cfq-iosched is usually the
default built-in scheduler. Scheduled blkcg cleanup requires calling
into blkcg from block core. To simplify that, disallow building blkcg
as module by making CONFIG_BLK_CGROUP bool.
If building blkcg core as module really matters, which I doubt, we can
revisit it after blkcg API cleanup.
-v2: Vivek pointed out that IOSCHED_CFQ was incorrectly updated to
depend on BLK_CGROUP. Fixed.
Tejun Heo [Tue, 6 Mar 2012 20:24:55 +0000 (21:24 +0100)]
block: blk-throttle should be drained regardless of q->elevator
Currently, blk_cleanup_queue() doesn't call elv_drain_elevator() if
q->elevator doesn't exist; however, bio based drivers don't have
elevator initialized but can still use blk-throttle. This patch moves
q->elevator test inside blk_drain_queue() such that only
elv_drain_elevator() is skipped if !q->elevator.
-v2: loop can have registered queue which has NULL request_fn. Make
sure we don't call into __blk_run_queue() in such cases.
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Vivek Goyal <vgoyal@redhat.com>
Fold in bug fix from Vivek.
Alan Stern [Fri, 2 Mar 2012 09:51:00 +0000 (10:51 +0100)]
Block: use a freezable workqueue for disk-event polling
This patch (as1519) fixes a bug in the block layer's disk-events
polling. The polling is done by a work routine queued on the
system_nrt_wq workqueue. Since that workqueue isn't freezable, the
polling continues even in the middle of a system sleep transition.
Obviously, polling a suspended drive for media changes and such isn't
a good thing to do; in the case of USB mass-storage devices it can
lead to real problems requiring device resets and even re-enumeration.
The patch fixes things by creating a new system-wide, non-reentrant,
freezable workqueue and using it for disk-events polling.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: <stable@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Danny Kukawka [Fri, 2 Mar 2012 09:48:35 +0000 (10:48 +0100)]
drivers/block/DAC960: fix -Wuninitialized warning
Set CommandMailbox with memset before use it. Fix for:
drivers/block/DAC960.c: In function ‘DAC960_V1_EnableMemoryMailboxInterface’:
arch/x86/include/asm/io.h:61:1: warning: ‘CommandMailbox.Bytes[12]’
may be used uninitialized in this function [-Wuninitialized]
drivers/block/DAC960.c:1175:30: note: ‘CommandMailbox.Bytes[12]’
was declared here
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Then we unblock events, when they are suppose to be blocked. This can
trigger events related block/genhd.c warnings, but also can crash in
sd_check_events() or other places.
I'm able to reproduce crashes with the following scripts (with
connected usb dongle as sdb disk).
function stop_me()
{
for i in `jobs -p` ; do kill $i 2> /dev/null ; done
exit
}
trap stop_me SIGHUP SIGINT SIGTERM
for ((i = 0; i < 10; i++)) ; do
while true; do fdisk -l $DEV 2>&1 > /dev/null ; done &
done
while true ; do
echo 1 > $ENABLE
sleep 1
echo 0 > $ENABLE
done
</snip>
I use the script to verify patch fixing oops in sd_revalidate_disk
http://marc.info/?l=linux-scsi&m=132935572512352&w=2
Without Jun'ichi Nomura patch titled "Fix NULL pointer dereference in
sd_revalidate_disk" or this one, script easily crash kernel within
a few seconds. With both patches applied I do not observe crash.
Unfortunately after some time (dozen of minutes), script will hung in:
As drawback, I do not teardown on sysfs file create error, because I do
not know how to nullify disk->ev (since it can be used). However add_disk
error handling practically does not exist too, and things will work
without this sysfs file, except events will not be exported to user
space.
Jun'ichi Nomura [Fri, 2 Mar 2012 09:38:33 +0000 (10:38 +0100)]
block: Fix NULL pointer dereference in sd_revalidate_disk
Since 2.6.39 (1196f8b), when a driver returns -ENOMEDIUM for open(),
__blkdev_get() calls rescan_partitions() to remove
in-kernel partition structures and raise KOBJ_CHANGE uevent.
However it ends up calling driver's revalidate_disk without open
and could cause oops.
In the case of SCSI:
process A process B
----------------------------------------------
sys_open
__blkdev_get
sd_open
returns -ENOMEDIUM
scsi_remove_device
<scsi_device torn down>
rescan_partitions
sd_revalidate_disk
<oops>
Oopses are reported here:
http://marc.info/?l=linux-scsi&m=132388619710052
This patch separates the partition invalidation from rescan_partitions()
and use it for -ENOMEDIUM case.
Tejun Heo [Wed, 15 Feb 2012 08:45:53 +0000 (09:45 +0100)]
block: exit_io_context() should call elevator_exit_icq_fn()
While updating locking, b2efa05265 "block, cfq: unlink
cfq_io_context's immediately" moved elevator_exit_icq_fn() invocation
from exit_io_context() to the final ioc put. While this doesn't cause
catastrophic failure, it effectively removes task exit notification to
elevator and cause noticeable IO performance degradation with CFQ.
On task exit, CFQ used to immediately expire the slice if it was being
used by the exiting task as no more IO would be issued by the task;
however, after b2efa05265, the notification is lost and disk could sit
idle needlessly, leading to noticeable IO performance degradation for
certain workloads.
This patch renames ioc_exit_icq() to ioc_destroy_icq(), separates
elevator_exit_icq_fn() invocation into ioc_exit_icq() and invokes it
from exit_io_context(). ICQ_EXITED flag is added to avoid invoking
the callback more than once for the same icq.
Walking icq_list from ioc side and invoking elevator callback requires
reverse double locking. This may be better implemented using RCU;
unfortunately, using RCU isn't trivial. e.g. RCU protection would
need to cover request_queue and queue_lock switch on cleanup makes
grabbing queue_lock from RCU unsafe. Reverse double locking should
do, at least for now.
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-bisected-by: Shaohua Li <shli@kernel.org>
LKML-Reference: <CANejiEVzs=pUhQSTvUppkDcc2TNZyfohBRLygW5zFmXyk5A-xQ@mail.gmail.com> Tested-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Wed, 15 Feb 2012 08:45:52 +0000 (09:45 +0100)]
block: simplify ioc_release_fn()
Reverse double lock dancing in ioc_release_fn() can be simplified by
just using trylock on the queue_lock and back out from ioc lock on
trylock failure. Simplify it.
Tejun Heo [Wed, 15 Feb 2012 08:45:49 +0000 (09:45 +0100)]
block: replace icq->changed with icq->flags
icq->changed was used for ICQ_*_CHANGED bits. Rename it to flags and
access it under ioc->lock instead of using atomic bitops.
ioc_get_changed() is added so that the changed part can be fetched and
cleared as before.
Linus Torvalds [Tue, 14 Feb 2012 23:26:42 +0000 (15:26 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
One small bug fix from Axel plus a fix for a build failure in unrealistic
but commonly built configs which for some reason manage to survive for
an awfully long time in -next without any reports.
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: Fix getting voltage in max8649_enable_time()
regulator: Fix mc13xxx regulator modular build (again)
Linus Torvalds [Tue, 14 Feb 2012 23:21:25 +0000 (15:21 -0800)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Quoth BenH:
"Here are a few powerpc fixes for 3.3, all pretty trivial. I also
added the patch to define GET_IP/SET_IP so we can use some more
asm-generic goodness."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/pseries/eeh: Fix crash when error happens during device probe
powerpc/pseries: Fix partition migration hang in stop_topology_update
powerpc/powernv: Disable interrupts while taking phb->lock
powerpc: Fix WARN_ON in decrementer_check_overflow
powerpc/wsp: Fix IRQ affinity setting
powerpc: Implement GET_IP/SET_IP
powerpc/wsp: Permanently enable PCI class code workaround
Linus Torvalds [Tue, 14 Feb 2012 23:20:50 +0000 (15:20 -0800)]
Merge tag 'mmc-fixes-for-3.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
MMC fixes for 3.3-rc4:
* The most visible fix here is against a regression introduced in 3.3-rc1
that ran cards in Ultra High Speed mode even when they failed to initialize
in that mode, leading to lower-speed cards failing to mount.
* A lockdep warning introduced in 3.3-rc1 is fixed.
* Various other small driver fixes, most notably for a NULL dereference
when using highmem with dw_mmc.
* tag 'mmc-fixes-for-3.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
mmc: dw_mmc: Fix PIO mode with support of highmem
mmc: atmel-mci: save and restore sdioirq when soft reset is performed
mmc: block: Init ro_lock sysfs attr to fix lockdep warnings
mmc: sh_mmcif: fix late delayed work initialisation
mmc: tmio_mmc: fix card eject during IO with DMA
mmc: core: Fix comparison issue in mmc_compare_ext_csds
mmc: core: Fix PowerOff Notify suspend/resume
mmc: sdhci-pci: set Medfield SDIO as non-removable
mmc: core: add the capability for broken voltage
mmc: core: Fix low speed mmc card detection failure
mmc: esdhc: set the timeout to the max value
mmc: esdhc: add PIO mode support
mmc: core: Ensure clocks are always enabled before host interaction
mmc: of_mmc_spi: fix little endian support
mmc: core: UHS sdio card that fails should not exceed 50MHz
mmc: esdhc: fix errors when booting kernel on Freescale eSDHC version 2.3
Linus Torvalds [Tue, 14 Feb 2012 23:20:11 +0000 (15:20 -0800)]
Merge tag 'stable/for-linus-fixes-3.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Two fixes for VCPU offlining; One to fix the string format exposed
by the xen-pci[front|back] to conform to the one used in majority of
PCI drivers; Two fixes to make the code more resilient to invalid
configurations.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* tag 'stable/for-linus-fixes-3.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xenbus_dev: add missing error check to watch handling
xen/pci[front|back]: Use %d instead of %1x for displaying PCI devfn.
xen pvhvm: do not remap pirqs onto evtchns if !xen_have_vector_callback
xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic.
xen/bootup: During bootup suppress XENBUS: Unable to read cpu state
Linus Torvalds [Tue, 14 Feb 2012 17:09:24 +0000 (09:09 -0800)]
Merge tag 'sound-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
sound fixes for 3.3-rc4
Basically all small fixes suited as rc4: a few HD-audio regression fixes,
a stable fix for an old Dell laptop with intel8x0, and a simple fix for
ASoC fsi.
* tag 'sound-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: intel8x0: Fix default inaudible sound on Gateway M520
ALSA: hda - Fix silent speaker output on Acer Aspire 6935
ALSA: hda - Fix initialization of secondary capture source on VT1705
ASoC: fsi: fixup fsi_pointer() calculation method
ALSA: hda - Fix mute-LED VREF value for new HP laptops
Daniel T Chen [Tue, 14 Feb 2012 04:44:22 +0000 (23:44 -0500)]
ALSA: intel8x0: Fix default inaudible sound on Gateway M520
BugLink: https://bugs.launchpad.net/bugs/930842
The reporter states that audio is inaudible by default without muting
'External Amplifier'. Add a quirk to handle his SSID so that changing
the control is not necessary.
Reported-and-tested-by: Benjamin Carlson <elderbubba0810@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Daniel T Chen <crimsun@ubuntu.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linus Torvalds [Tue, 14 Feb 2012 04:34:44 +0000 (20:34 -0800)]
Merge git://git.samba.org/sfrench/cifs-2.6
* git://git.samba.org/sfrench/cifs-2.6:
cifs: don't return error from standard_receive3 after marking response malformed
cifs: request oplock when doing open on lookup
cifs: fix error handling when cifscreds key payload is an error
This updates the sha512 fix so that it doesn't cause excessive stack
usage on i386. This is done by reverting to the original code, and
avoiding the W duplication by moving its initialisation into the loop.
As the underlying code is in fact the one that we have used for years,
I'm pushing this now instead of postponing to the next cycle.
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: sha512 - Avoid stack bloat on i386
crypto: sha512 - Use binary and instead of modulus
powerpc/pseries/eeh: Fix crash when error happens during device probe
EEH may happen during a PCI driver probe. If the driver is trying to
access some register in a loop, the EEH code will try to print the
driver name. But the driver pointer in struct pci_dev is not set until
probe returns successfully.
Use a function to test if the device and the driver pointer is NULL
before accessing the driver's name.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Brian King [Wed, 11 Jan 2012 06:56:04 +0000 (06:56 +0000)]
powerpc/pseries: Fix partition migration hang in stop_topology_update
This fixes a hang that was observed during live partition migration.
Since stop_topology_update must not be called from an interrupt
context, call it earlier in the migration process. The hang observed
can be seen below:
powerpc: Fix WARN_ON in decrementer_check_overflow
We use __get_cpu_var() which triggers a false positive warning
in smp_processor_id() thinking interrupts are enabled (at this
point, they are soft-enabled but hard-disabled).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We call the cache_hwirq_map() function with a linux IRQ number
but it expects a HW irq number. This triggers a BUG on multic-chip
setups in addition to not doing the right thing.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/wsp: Permanently enable PCI class code workaround
It appears that on the Chroma card, the class code of the root
complex is still wrong even on DD2 or later chips. This could
be a firmware issue, but that breaks resource allocation so let's
unconditionally fix it up.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com> Acked-by: Will Newton <will.newton@imgtec.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Chris Ball <cjb@laptop.org>
mmc: atmel-mci: save and restore sdioirq when soft reset is performed
Sometimes a software reset is needed. Then some registers are saved and
restored but the interrupt mask register is missing. It causes issues
with sdio devices whose interrupts are masked after reset.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Chris Ball <cjb@laptop.org>
mmc: sh_mmcif: fix late delayed work initialisation
If the driver is loaded with a card in the slot, mmc_add_host() will
schedule an immediate card-detection work, which will start IO and wait
for command completion. Usually the kernel first returns to the sh_mmcif
probe function, lets it finish and only then schedules the rescan work.
But sometimes, expecially under heavy system load, the work will be
scheduled immediately before returning to the probe method. In this case
it is important for the driver to be fully prepared for IO. For sh_mmcif
this means, that also the timeout work has to be initialised before
calling mmc_add_host(). It is also better to prepare interrupts
beforehand. Besides, since mmc_add_host() does card-detection itself,
there is no need to do it again immediately afterwards.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Chris Ball <cjb@laptop.org>
When DMA is in use and the card is ejected during IO, DMA transfers have to
be terminated, otherwise the dmaengine driver fails to operate properly,
when the card is re-inserted.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Chris Ball <cjb@laptop.org>
Jurgen Heeks [Wed, 1 Feb 2012 12:30:55 +0000 (13:30 +0100)]
mmc: core: Fix comparison issue in mmc_compare_ext_csds
Found this issue during code review. Actually, there are two issues which
both compensate together in lucky case. In unlucky case the bus width
probing might not work as expected.
Girish K S [Tue, 31 Jan 2012 10:14:03 +0000 (15:44 +0530)]
mmc: core: Fix PowerOff Notify suspend/resume
Modified the mmc_poweroff to resume before sending the poweroff
notification command. In sleep mode only AWAKE and RESET commands are
allowed, so before sending the poweroff notification command resume from
sleep mode and then send the notification command.
PowerOff Notify is tested on a Synopsis Designware Host Controller
(eMMC 4.5). The suspend to RAM and resume works fine.
Signed-off-by: Girish K S <girish.shivananjappa@linaro.org> Tested-by: Girish K S <girish.shivananjappa@linaro.org> Reviewed-by: Saugata Das <saugata.das@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
Jaehoon Chung [Mon, 16 Jan 2012 08:49:01 +0000 (17:49 +0900)]
mmc: core: add the capability for broken voltage
There is an understood mismatch between the voltage the host controller is
set to and the voltage supplied to the card by a fixed voltage regulator.
Teaching the driver to accept the mismatch is overly complicated. Instead
just accept the regulator's voltage.
This patch adds MMC_CAP2_BROKEN_VOLTAGE.
If the voltage didn't satisfy between min_uV and max_uV, try to change
the voltage in core.c. When changing the voltage, maybe use
regulator_set_voltage().
In regulator_set_voltage(), check the below condition.
/* sanity check */
if (!rdev->desc->ops->set_voltage &&
!rdev->desc->ops->set_voltage_sel) {
ret = -EINVAL;
goto out;
}
If some board should use the fixed-regulator, always return -EINVAL.
Then, eMMC didn't initialize always.
So if use a fixed-regulator, we need to add the MMC_CAP2_BROKEN_VOLTAGE.
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Chris Ball <cjb@laptop.org>
Jerry Huang [Mon, 16 Jan 2012 06:13:04 +0000 (14:13 +0800)]
mmc: esdhc: set the timeout to the max value
When accessing the card on some FSL platform boards (e.g p2020, p1010,
mpc8536), the following error is reported with the timeout value calculated:
mmc0: Got data interrupt 0x00000020 even though no data operation was
in progress.
mmc0: Got data interrupt 0x00000020 even though no data operation was
in progress.
So we skip the calculation of timeout and use the max value to fix it.
Signed-off-by: Jerry Huang <Chang-Ming.Huang@freescale.com> Signed-off-by: Gao Guanhua <B22826@freescale.com> Signed-off-by: Xie Xiaobo <X.Xie@freescale.com> Acked-by: Anton Vorontsov <cbouatmailru@gmail.com> Signed-off-by: Chris Ball <cjb@laptop.org>
Jerry Huang [Mon, 16 Jan 2012 06:13:03 +0000 (14:13 +0800)]
mmc: esdhc: add PIO mode support
For some FSL ESDHC controllers (e.g. P2020E, Rev1.0), the SDHC can not
work on DMA mode because of the hardware bug, so we set a broken dma flag
and use PIO mode.
Signed-off-by: Jerry Huang <Chang-Ming.Huang@freescale.com> Signed-off-by: Gao Guanhua <B22826@freescale.com> Acked-by: Anton Vorontsov <cbouatmailru@gmail.com> Signed-off-by: Chris Ball <cjb@laptop.org>
mmc: core: Ensure clocks are always enabled before host interaction
Ensure clocks are always enabled before any interaction with the
host controller driver. This makes sure that there is no race
between host execution and the core layer turning off clocks
in different context with clock gating framework.
Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Per Forlin <per.forlin@stericsson.com> Signed-off-by: Chris Ball <cjb@laptop.org>
Philip Rakity [Thu, 26 Jan 2012 14:57:10 +0000 (06:57 -0800)]
mmc: core: UHS sdio card that fails should not exceed 50MHz
A UHS sdio card that fails initialization at 1.8v signaling is not in
UHS mode. We cannot use the speed in the the cis to reflect the bus
speed as this is the maxiumum value and will not reflect the fact
that the host is operating at a lower (non uhs) bus speed.
Signed-off-by: Philip Rakity <prakity@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Reviewed-by: Aaron Lu <aaron.lu@amd.com> Signed-off-by: Chris Ball <cjb@laptop.org>
Linus Torvalds [Tue, 14 Feb 2012 00:59:53 +0000 (16:59 -0800)]
Merge tag 'for-linus' of git://github.com/rustyrussell/linux
* tag 'for-linus' of git://github.com/rustyrussell/linux:
module: fix broken isapnp handling in file2alias
module: make module param bint handle nul value
Linus Torvalds [Mon, 13 Feb 2012 22:20:43 +0000 (14:20 -0800)]
Merge tag 'battery-fixes-for-v3.3-rc2' of git://git.infradead.org/users/cbou/battery-urgent
Just a few small fixes for a bunch of drivers. Nothing noteworthy.
* tag 'battery-fixes-for-v3.3-rc2' of git://git.infradead.org/users/cbou/battery-urgent:
lp8727_charger: Add terminating entry for i2c_device_id table
power_supply: Fix modalias for charger-manager
lp8727_chager: Fix permissions on a header file
bq27x00_battery: Fix flag register read
Revert "bq27x00_battery: Fix reporting status value for bq27500 battery"
Linus Torvalds [Mon, 13 Feb 2012 22:19:45 +0000 (14:19 -0800)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
Two bugfixes in XFS for 3.3: one fix passes KMEM_SLEEP to kmem_realloc
instead of 0, and the other resolves a possible deadlock in xfs quotas.
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: use a normal shrinker for the dquot freelist
xfs: pass KM_SLEEP flag to kmem_realloc() in xlog_recover_add_to_cnt_trans()
Linus Torvalds [Mon, 13 Feb 2012 22:16:07 +0000 (14:16 -0800)]
Merge branch 'omap-fixes-warnings' of git://git.linaro.org/people/rmk/linux-arm
This set of changes are fixing various section mismatch warnings which
look to be completely valid. Primerily, those which are fixed are those
which can cause oopses by manipulation of driver binding via sysfs. For
example: calling code marked __init from driver probe __devinit
functions.
Some of these changes will be reworked at the next merge window when the
underlying reasons are sorted out. In the mean time, I think it's
important to have this fixed for correctness.
Also included in this set are fixes to various error messages in OMAP -
including making them gramatically correct, fixing a few spelling
errors, and more importantly, making them greppable by unwrapping them.
Tony Lindgren has acked all these patches, put them out for testing a
week ago, and I've tested them on the platforms I have.
Linus Torvalds [Mon, 13 Feb 2012 22:15:22 +0000 (14:15 -0800)]
Merge branch 'omap-fixes-urgent' of git://git.linaro.org/people/rmk/linux-arm
This pull request covers the major oopsing issues with OMAP, caused by
the lack of the TWL driver. Even when the TWL driver is not built in,
we shouldn't oops.
* 'omap-fixes-urgent' of git://git.linaro.org/people/rmk/linux-arm:
ARM: omap: fix broken twl-core dependencies and ifdefs
ARM: omap: fix oops in drivers/video/omap2/dss/dpi.c
ARM: omap: fix oops in arch/arm/mach-omap2/vp.c when pmic is not found
Linus Torvalds [Mon, 13 Feb 2012 21:56:14 +0000 (13:56 -0800)]
i387: make irq_fpu_usable() tests more robust
Some code - especially the crypto layer - wants to use the x86
FP/MMX/AVX register set in what may be interrupt (typically softirq)
context.
That *can* be ok, but the tests for when it was ok were somewhat
suspect. We cannot touch the thread-specific status bits either, so
we'd better check that we're not going to try to save FP state or
anything like that.
Now, it may be that the TS bit is always cleared *before* we set the
USEDFPU bit (and only set when we had already cleared the USEDFP
before), so the TS bit test may actually have been sufficient, but it
certainly was not obviously so.
So this explicitly verifies that we will not touch the TS_USEDFPU bit,
and adds a few related sanity-checks. Because it seems that somehow
AES-NI is corrupting user FP state. The cause is not clear, and this
patch doesn't fix it, but while debugging it I really wanted the code to
be more obviously correct and robust.