]> git.karo-electronics.de Git - karo-tx-linux.git/log
karo-tx-linux.git
11 years agorbd: protect read of snapshot sequence number
Josh Durgin [Mon, 5 Dec 2011 18:47:13 +0000 (10:47 -0800)]
rbd: protect read of snapshot sequence number

(cherry picked from commit 403f24d3d51760a8b9368d595fa5f48c309f1a0f)

This is updated whenever a snapshot is added or deleted, and the
snapc pointer is changed with every refresh of the header.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agorbd: don't hold spinlock during messenger flush
Alex Elder [Wed, 4 Apr 2012 18:35:44 +0000 (13:35 -0500)]
rbd: don't hold spinlock during messenger flush

(cherry picked from commit cd9d9f5df6098c50726200d4185e9e8da32785b3)

A recent change made changes to the rbd_client_list be protected by
a spinlock.  Unfortunately in rbd_put_client(), the lock is taken
before possibly dropping the last reference to an rbd_client, and on
the last reference that eventually calls flush_workqueue() which can
sleep.

The problem was flagged by a debug spinlock warning:
    BUG: spinlock wrong CPU on CPU#3, rbd/27814

The solution is to move the spinlock acquisition and release inside
rbd_client_release(), which is the spot where it's really needed for
protecting the removal of the rbd_client from the client list.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agolibceph: fix messenger retry
Sage Weil [Tue, 10 Jul 2012 18:53:34 +0000 (11:53 -0700)]
libceph: fix messenger retry

(cherry picked from commit 5bdca4e0768d3e0f4efa43d9a2cc8210aeb91ab9)

In ancient times, the messenger could both initiate and accept connections.
An artifact if that was data structures to store/process an incoming
ceph_msg_connect request and send an outgoing ceph_msg_connect_reply.
Sadly, the negotiation code was referencing those structures and ignoring
important information (like the peer's connect_seq) from the correct ones.

Among other things, this fixes tight reconnect loops where the server sends
RETRY_SESSION and we (the client) retries with the same connect_seq as last
time.  This bug pretty easily triggered by injecting socket failures on the
MDS and running some fs workload like workunits/direct_io/test_sync_io.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agolibceph: flush msgr queue during mon_client shutdown
Sage Weil [Mon, 11 Jun 2012 03:43:56 +0000 (20:43 -0700)]
libceph: flush msgr queue during mon_client shutdown

(cherry picked from commit f3dea7edd3d449fe7a6d402c1ce56a294b985261)
(cherry picked from commit 642c0dbde32f34baa7886e988a067089992adc8f)

We need to flush the msgr workqueue during mon_client shutdown to
ensure that any work affecting our embedded ceph_connection is
finished so that we can be safely destroyed.

Previously, we were flushing the work queue after osd_client
shutdown and before mon_client shutdown to ensure that any osd
connection refs to authorizers are flushed.  Remove the redundant
flush, and document in the comment that the mon_client flush is
needed to cover that case as well.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agorbd: Clear ceph_msg->bio_iter for retransmitted message
Yan, Zheng [Thu, 7 Jun 2012 00:35:55 +0000 (19:35 -0500)]
rbd: Clear ceph_msg->bio_iter for retransmitted message

(cherry picked from commit 43643528cce60ca184fe8197efa8e8da7c89a037)
(cherry picked from commit b132cf4c733f91bb4dd2277ea049243cf16e8b66)

The bug can cause NULL pointer dereference in write_partial_msg_pages

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agolibceph: use con get/put ops from osd_client
Sage Weil [Fri, 1 Jun 2012 03:22:18 +0000 (20:22 -0700)]
libceph: use con get/put ops from osd_client

(cherry picked from commit 0d47766f14211a73eaf54cab234db134ece79f49)

There were a few direct calls to ceph_con_{get,put}() instead of the con
ops from osd_client.c.  This is a bug since those ops aren't defined to
be ceph_con_get/put.

This breaks refcounting on the ceph_osd structs that contain the
ceph_connections, and could lead to all manner of strangeness.

The purpose of the ->get and ->put methods in a ceph connection are
to allow the connection to indicate it has a reference to something
external to the messaging system, *not* to indicate something
external has a reference to the connection.

[elder@inktank.com: added that last sentence]

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 88ed6ea0b295f8e2383d599a04027ec596cdf97b)

11 years agolibceph: osd_client: don't drop reply reference too early
Alex Elder [Mon, 4 Jun 2012 19:43:32 +0000 (14:43 -0500)]
libceph: osd_client: don't drop reply reference too early

(cherry picked from commit ab8cb34a4b2f60281a4b18b1f1ad23bc2313d91b)

In ceph_osdc_release_request(), a reference to the r_reply message
is dropped.  But just after that, that same message is revoked if it
was in use to receive an incoming reply.  Reorder these so we are
sure we hold a reference until we're actually done with the message.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 680584fab05efff732b5ae16ad601ba994d7b505)

11 years agolibceph: fix pg_temp updates
Sage Weil [Mon, 21 May 2012 16:45:23 +0000 (09:45 -0700)]
libceph: fix pg_temp updates

(cherry picked from commit 6bd9adbdf9ca6a052b0b7455ac67b925eb38cfad)

Usually, we are adding pg_temp entries or removing them.  Occasionally they
update.  In that case, osdmap_apply_incremental() was failing because the
rbtree entry already exists.

Fix by removing the existing entry before inserting a new one.

Fixes http://tracker.newdream.net/issues/2446

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agolibceph: avoid unregistering osd request when not registered
Sage Weil [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
libceph: avoid unregistering osd request when not registered

(cherry picked from commit 35f9f8a09e1e88e31bd34a1e645ca0e5f070dd5c)

There is a race between two __unregister_request() callers: the
reply path and the ceph_osdc_wait_request().  If we get a reply
*and* the timeout expires at roughly the same time, both callers
will try to unregister the request, and the second one will do bad
things.

Simply check if the request is still already unregistered; if so,
return immediately and do nothing.

Fixes http://tracker.newdream.net/issues/2420

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: add auth buf in prepare_write_connect()
Alex Elder [Wed, 16 May 2012 20:16:39 +0000 (15:16 -0500)]
ceph: add auth buf in prepare_write_connect()

(cherry picked from commit 3da54776e2c0385c32d143fd497a7f40a88e29dd)

Move the addition of the authorizer buffer to a connection's
out_kvec out of get_connect_authorizer() and into its caller.  This
way, the caller--prepare_write_connect()--can avoid adding the
connect header to out_kvec before it has been fully initialized.

Prior to this patch, it was possible for a connect header to be
sent over the wire before the authorizer protocol or buffer length
fields were initialized.  An authorizer buffer associated with that
header could also be queued to send only after the connection header
that describes it was on the wire.

Fixes http://tracker.newdream.net/issues/2424

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: rename prepare_connect_authorizer()
Alex Elder [Wed, 16 May 2012 20:16:39 +0000 (15:16 -0500)]
ceph: rename prepare_connect_authorizer()

(cherry picked from commit dac1e716c60161867a47745bca592987ca3a9cb2)

Change the name of prepare_connect_authorizer().  The next
patch is going to make this function no longer add anything to the
connection's out_kvec, so it will no longer fit the pattern of
the rest of the prepare_connect_*() functions.

In addition, pass the address of a variable that will hold the
authorization protocol to use.  Move the assignment of that to the
connection's out_connect structure into prepare_write_connect().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: return pointer from prepare_connect_authorizer()
Alex Elder [Wed, 16 May 2012 20:16:39 +0000 (15:16 -0500)]
ceph: return pointer from prepare_connect_authorizer()

(cherry picked from commit 729796be9190f57ca40ccca315e8ad34a1eb8fef)

Change prepare_connect_authorizer() so it returns a pointer (or
pointer-coded error).

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: use info returned by get_authorizer
Alex Elder [Wed, 16 May 2012 20:16:39 +0000 (15:16 -0500)]
ceph: use info returned by get_authorizer

(cherry picked from commit 8f43fb53894079bf0caab6e348ceaffe7adc651a)

Rather than passing a bunch of arguments to be filled in with the
content of the ceph_auth_handshake buffer now returned by the
get_authorizer method, just use the returned information in the
caller, and drop the unnecessary arguments.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: have get_authorizer methods return pointers
Alex Elder [Wed, 16 May 2012 20:16:39 +0000 (15:16 -0500)]
ceph: have get_authorizer methods return pointers

(cherry picked from commit a3530df33eb91d787d08c7383a0a9982690e42d0)

Have the get_authorizer auth_client method return a ceph_auth
pointer rather than an integer, pointer-encoding any returned
error value.  This is to pave the way for making use of the
returned value in an upcoming patch.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: ensure auth ops are defined before use
Alex Elder [Wed, 16 May 2012 20:16:39 +0000 (15:16 -0500)]
ceph: ensure auth ops are defined before use

(cherry picked from commit a255651d4cad89f1a606edd36135af892ada4f20)

In the create_authorizer method for both the mds and osd clients,
the auth_client->ops pointer is blindly dereferenced.  There is no
obvious guarantee that this pointer has been assigned.  And
furthermore, even if the ops pointer is non-null there is definitely
no guarantee that the create_authorizer or destroy_authorizer
methods are defined.

Add checks in both routines to make sure they are defined (non-null)
before use.  Add similar checks in a few other spots in these files
while we're at it.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: messenger: reduce args to create_authorizer
Alex Elder [Wed, 16 May 2012 20:16:39 +0000 (15:16 -0500)]
ceph: messenger: reduce args to create_authorizer

(cherry picked from commit 74f1869f76d043bad12ec03b4d5f04a8c3d1f157)

Make use of the new ceph_auth_handshake structure in order to reduce
the number of arguments passed to the create_authorizor method in
ceph_auth_client_ops.  Use a local variable of that type as a
shorthand in the get_authorizer method definitions.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: define ceph_auth_handshake type
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
ceph: define ceph_auth_handshake type

(cherry picked from commit 6c4a19158b96ea1fb8acbe0c1d5493d9dcd2f147)

The definitions for the ceph_mds_session and ceph_osd both contain
five fields related only to "authorizers."  Encapsulate those fields
into their own struct type, allowing for better isolation in some
upcoming patches.

Fix the #includes in "linux/ceph/osd_client.h" to lay out their more
complete canonical path.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: messenger: check return from get_authorizer
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
ceph: messenger: check return from get_authorizer

(cherry picked from commit ed96af646011412c2bf1ffe860db170db355fae5)

In prepare_connect_authorizer(), a connection's get_authorizer
method is called but ignores its return value.  This function can
return an error, so check for it and return it if that ever occurs.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: messenger: rework prepare_connect_authorizer()
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
ceph: messenger: rework prepare_connect_authorizer()

(cherry picked from commit b1c6b9803f5491e94041e6da96bc9dec3870e792)

Change prepare_connect_authorizer() so it returns without dropping
the connection mutex if the connection has no get_authorizer method.

Use the symbolic CEPH_AUTH_UNKNOWN instead of 0 when assigning
authorization protocols.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: messenger: check prepare_write_connect() result
Alex Elder [Thu, 17 May 2012 02:51:59 +0000 (21:51 -0500)]
ceph: messenger: check prepare_write_connect() result

(cherry picked from commit 5a0f8fdd8a0ebe320952a388331dc043d7e14ced)

prepare_write_connect() can return an error, but only one of its
callers checks for it.  All the rest are in functions that already
return errors, so it should be fine to return the error if one
gets returned.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: don't set WRITE_PENDING too early
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
ceph: don't set WRITE_PENDING too early

(cherry picked from commit e10c758e4031a801ea4d2f8fb39bf14c2658d74b)

prepare_write_connect() prepares a connect message, then sets
WRITE_PENDING on the connection.  Then *after* this, it calls
prepare_connect_authorizer(), which updates the content of the
connection buffer already queued for sending.  It's also possible it
will result in prepare_write_connect() returning -EAGAIN despite the
WRITE_PENDING big getting set.

Fix this by preparing the connect authorizer first, setting the
WRITE_PENDING bit only after that is done.

Partially addresses http://tracker.newdream.net/issues/2424

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: drop msgr argument from prepare_write_connect()
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
ceph: drop msgr argument from prepare_write_connect()

(cherry picked from commit e825a66df97776d30a48a187e3a986736af43945)

In all cases, the value passed as the msgr argument to
prepare_write_connect() is just con->msgr.  Just get the msgr
value from the ceph connection and drop the unneeded argument.

The only msgr passed to prepare_write_banner() is also therefore
just the one from con->msgr, so change that function to drop the
msgr argument as well.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: messenger: send banner in process_connect()
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
ceph: messenger: send banner in process_connect()

(cherry picked from commit 41b90c00858129f52d08e6a05c9cfdb0f2bd074d)

prepare_write_connect() has an argument indicating whether a banner
should be sent out before sending out a connection message.  It's
only ever set in one of its callers, so move the code that arranges
to send the banner into that caller and drop the "include_banner"
argument from prepare_write_connect().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: messenger: reset connection kvec caller
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
ceph: messenger: reset connection kvec caller

(cherry picked from commit 84fb3adf6413862cff51d8af3fce5f0b655586a2)

Reset a connection's kvec fields in the caller rather than in
prepare_write_connect().   This ends up repeating a few lines of
code but it's improving the separation between distinct operations
on the connection, which we can take advantage of later.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agolibceph: don't reset kvec in prepare_write_banner()
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
libceph: don't reset kvec in prepare_write_banner()

(cherry picked from commit d329156f16306449c273002486c28de3ddddfd89)

Move the kvec reset for a connection out of prepare_write_banner and
into its only caller.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: messenger: change read_partial() to take "end" arg
Alex Elder [Thu, 10 May 2012 15:29:50 +0000 (10:29 -0500)]
ceph: messenger: change read_partial() to take "end" arg

(cherry picked from commit fd51653f78cf40a0516e521b6de22f329c5bad8d)

Make the second argument to read_partial() be the ending input byte
position rather than the beginning offset it now represents.  This
amounts to moving the addition "to + size" into the caller.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: messenger: update "to" in read_partial() caller
Alex Elder [Thu, 10 May 2012 15:29:50 +0000 (10:29 -0500)]
ceph: messenger: update "to" in read_partial() caller

(cherry picked from commit e6cee71fac27c946a0bbad754dd076e66c4e9dbd)

read_partial() always increases whatever "to" value is supplied by
adding the requested size to it, and that's the only thing it does
with that pointed-to value.

Do that pointer advance in the caller (and then only when the
updated value will be subsequently used), and change the "to"
parameter to be an in-only and non-pointer value.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: messenger: use read_partial() in read_partial_message()
Alex Elder [Thu, 10 May 2012 15:29:50 +0000 (10:29 -0500)]
ceph: messenger: use read_partial() in read_partial_message()

(cherry picked from commit 57dac9d1620942608306d8c17c98a9d1568ffdf4)

There are two blocks of code in read_partial_message()--those that
read the header and footer of the message--that can be replaced by a
call to read_partial().  Do that.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: osd_client: fix endianness bug in osd_req_encode_op()
Alex Elder [Fri, 20 Apr 2012 20:49:43 +0000 (15:49 -0500)]
ceph: osd_client: fix endianness bug in osd_req_encode_op()

(cherry picked from commit 065a68f9167e20f321a62d044cb2c3024393d455)

From Al Viro <viro@zeniv.linux.org.uk>

Al Viro noticed that we were using a non-cpu-encoded value in
a switch statement in osd_req_encode_op().  The result would
clearly not work correctly on a big-endian machine.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agocrush: fix memory leak when destroying tree buckets
Sage Weil [Mon, 7 May 2012 22:37:05 +0000 (15:37 -0700)]
crush: fix memory leak when destroying tree buckets

(cherry picked from commit 6eb43f4b5a2a74599b4ff17a97c03a342327ca65)

Reflects ceph.git commit 46d63d98434b3bc9dad2fc9ab23cbaedc3bcb0e4.

Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agocrush: fix tree node weight lookup
Sage Weil [Mon, 7 May 2012 22:36:49 +0000 (15:36 -0700)]
crush: fix tree node weight lookup

(cherry picked from commit f671d4cd9b36691ac4ef42cde44c1b7a84e13631)

Fix the node weight lookup for tree buckets by using a correct accessor.

Reflects ceph.git commit d287ade5bcbdca82a3aef145b92924cf1e856733.

Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agocrush: be more tolerant of nonsensical crush maps
Sage Weil [Mon, 7 May 2012 22:35:24 +0000 (15:35 -0700)]
crush: be more tolerant of nonsensical crush maps

(cherry picked from commit a1f4895be8bf1ba56c2306b058f51619e9b0e8f8)

If we get a map that doesn't make sense, error out or ignore the badness
instead of BUGging out.  This reflects the ceph.git commits
9895f0bff7dc68e9b49b572613d242315fb11b6c and
8ded26472058d5205803f244c2f33cb6cb10de79.

Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agocrush: adjust local retry threshold
Sage Weil [Mon, 7 May 2012 22:35:09 +0000 (15:35 -0700)]
crush: adjust local retry threshold

(cherry picked from commit c90f95ed46393e29d843686e21947d1c6fcb1164)

This small adjustment reflects a change that was made in ceph.git commit
af6a9f30696c900a2a8bd7ae24e8ed15fb4964bb, about 6 months ago.  An N-1
search is not exhaustive.  Fixed ceph.git bug #1594.

Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agocrush: clean up types, const-ness
Sage Weil [Mon, 7 May 2012 22:38:35 +0000 (15:38 -0700)]
crush: clean up types, const-ness

(cherry picked from commit 8b12d47b80c7a34dffdd98244d99316db490ec58)

Move various types from int -> __u32 (or similar), and add const as
appropriate.

This reflects changes that have been present in the userland implementation
for some time.

Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoselinux: fix sel_netnode_insert() suspicious rcu dereference
Dave Jones [Fri, 9 Nov 2012 00:09:27 +0000 (16:09 -0800)]
selinux: fix sel_netnode_insert() suspicious rcu dereference

commit 88a693b5c1287be4da937699cb82068ce9db0135 upstream.

===============================
[ INFO: suspicious RCU usage. ]
3.5.0-rc1+ #63 Not tainted
-------------------------------
security/selinux/netnode.c:178 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by trinity-child1/8750:
 #0:  (sel_netnode_lock){+.....}, at: [<ffffffff812d8f8a>] sel_netnode_sid+0x16a/0x3e0

stack backtrace:
Pid: 8750, comm: trinity-child1 Not tainted 3.5.0-rc1+ #63
Call Trace:
 [<ffffffff810cec2d>] lockdep_rcu_suspicious+0xfd/0x130
 [<ffffffff812d91d1>] sel_netnode_sid+0x3b1/0x3e0
 [<ffffffff812d8e20>] ? sel_netnode_find+0x1a0/0x1a0
 [<ffffffff812d24a6>] selinux_socket_bind+0xf6/0x2c0
 [<ffffffff810cd1dd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff810cdb55>] ? lock_release_holdtime.part.9+0x15/0x1a0
 [<ffffffff81093841>] ? lock_hrtimer_base+0x31/0x60
 [<ffffffff812c9536>] security_socket_bind+0x16/0x20
 [<ffffffff815550ca>] sys_bind+0x7a/0x100
 [<ffffffff816c03d5>] ? sysret_check+0x22/0x5d
 [<ffffffff810d392d>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff8133b09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff816c03a9>] system_call_fastpath+0x16/0x1b

This patch below does what Paul McKenney suggested in the previous thread.

Signed-off-by: Dave Jones <davej@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoreiserfs: Protect reiserfs_quota_write() with write lock
Jan Kara [Tue, 13 Nov 2012 17:25:38 +0000 (18:25 +0100)]
reiserfs: Protect reiserfs_quota_write() with write lock

commit 361d94a338a3fd0cee6a4ea32bbc427ba228e628 upstream.

Calls into reiserfs journalling code and reiserfs_get_block() need to
be protected with write lock. We remove write lock around calls to high
level quota code in the next patch so these paths would suddently become
unprotected.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoreiserfs: Move quota calls out of write lock
Jan Kara [Tue, 13 Nov 2012 16:05:14 +0000 (17:05 +0100)]
reiserfs: Move quota calls out of write lock

commit 7af11686933726e99af22901d622f9e161404e6b upstream.

Calls into highlevel quota code cannot happen under the write lock. These
calls take dqio_mutex which ranks above write lock. So drop write lock
before calling back into quota code.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoreiserfs: Protect reiserfs_quota_on() with write lock
Jan Kara [Tue, 13 Nov 2012 15:34:17 +0000 (16:34 +0100)]
reiserfs: Protect reiserfs_quota_on() with write lock

commit b9e06ef2e8706fe669b51f4364e3aeed58639eb2 upstream.

In reiserfs_quota_on() we do quite some work - for example unpacking
tail of a quota file. Thus we have to hold write lock until a moment
we call back into the quota code.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoreiserfs: Fix lock ordering during remount
Jan Kara [Tue, 13 Nov 2012 13:55:52 +0000 (14:55 +0100)]
reiserfs: Fix lock ordering during remount

commit 3bb3e1fc47aca554e7e2cc4deeddc24750987ac2 upstream.

When remounting reiserfs dquot_suspend() or dquot_resume() can be called.
These functions take dqonoff_mutex which ranks above write lock so we have
to drop it before calling into quota code.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoNFS: Wait for session recovery to finish before returning
Bryan Schumaker [Tue, 30 Oct 2012 20:06:35 +0000 (16:06 -0400)]
NFS: Wait for session recovery to finish before returning

commit 399f11c3d872bd748e1575574de265a6304c7c43 upstream.

Currently, we will schedule session recovery and then return to the
caller of nfs4_handle_exception.  This works for most cases, but causes
a hang on the following test case:

Client Server
------ ------
Open file over NFS v4.1
Write to file
Expire client
Try to lock file

The server will return NFS4ERR_BADSESSION, prompting the client to
schedule recovery.  However, the client will continue placing lock
attempts and the open recovery never seems to be scheduled.  The
simplest solution is to wait for session recovery to run before retrying
the lock.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agodrm/i915: fix overlay on i830M
Daniel Vetter [Mon, 22 Oct 2012 10:55:55 +0000 (12:55 +0200)]
drm/i915: fix overlay on i830M

commit a9193983f4f292a82a00c72971c17ec0ee8c6c15 upstream.

The overlay on the i830M has a peculiar failure mode: It works the
first time around after boot-up, but consistenly hangs the second time
it's used.

Chris Wilson has dug out a nice errata:

"1.5.12 Clock Gating Disable for Display Register
Address Offset: 06200h–06203h

"Bit 3
Ovrunit Clock Gating Disable.
0 = Clock gating controlled by unit enabling logic
1 = Disable clock gating function
DevALM Errata ALM049: Overlay Clock Gating Must be Disabled:  Overlay
& L2 Cache clock gating must be disabled in order to prevent device
hangs when turning off overlay.SW must turn off Ovrunit clock gating
(6200h) and L2 Cache clock gating (C8h)."

Now I've nowhere found that 0xc8 register and hence couldn't apply the
l2 cache workaround. But I've remembered that part of the magic that
the OVERLAY_ON/OFF commands are supposed to do is to rearrange cache
allocations so that the overlay scaler has some scratch space.

And while pondering how that could explain the hang the 2nd time we
enable the overlay, I've remembered that the old ums overlay code did
_not_ issue the OVERLAY_OFF cmd.

And indeed, disabling the OFF cmd results in the overlay working
flawlessly, so I guess we can workaround the lack of the above
workaround by simply never disabling the overlay engine once it's
enabled.

Note that we have the first part of the above w/a already implemented
in i830_init_clock_gating - leave that as-is to avoid surprises.

v2: Add a comment in the code.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47827
Tested-by: Rhys <rhyspuk@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[bwh: Backported to 3.2:
 - Adjust context
 - s/intel_ring_emit(ring, /OUT_RING(/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agos390/signal: set correct address space control
Martin Schwidefsky [Wed, 7 Nov 2012 09:44:08 +0000 (10:44 +0100)]
s390/signal: set correct address space control

commit fa968ee215c0ca91e4a9c3a69ac2405aae6e5d2f upstream.

If user space is running in primary mode it can switch to secondary
or access register mode, this is used e.g. in the clock_gettime code
of the vdso. If a signal is delivered to the user space process while
it has been running in access register mode the signal handler is
executed in access register mode as well which will result in a crash
most of the time.

Set the address space control bits in the PSW to the default for the
execution of the signal handler and make sure that the previous
address space control is restored on signal return. Take care
that user space can not switch to the kernel address space by
modifying the registers in the signal frame.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agosky2: Fix for interrupt handler
Mirko Lindner [Tue, 3 Jul 2012 23:38:46 +0000 (23:38 +0000)]
sky2: Fix for interrupt handler

commit d663d181b9e92d80c2455e460e932d34e7a2a7ae upstream.

Re-enable interrupts if it is not our interrupt

Signed-off-by: Mirko Lindner <mlindner@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoeCryptfs: check for eCryptfs cipher support at mount
Tim Sally [Thu, 12 Jul 2012 23:10:24 +0000 (19:10 -0400)]
eCryptfs: check for eCryptfs cipher support at mount

commit 5f5b331d5c21228a6519dcb793fc1629646c51a6 upstream.

The issue occurs when eCryptfs is mounted with a cipher supported by
the crypto subsystem but not by eCryptfs. The mount succeeds and an
error does not occur until a write. This change checks for eCryptfs
cipher support at mount time.

Resolves Launchpad issue #338914, reported by Tyler Hicks in 03/2009.
https://bugs.launchpad.net/ecryptfs/+bug/338914

Signed-off-by: Tim Sally <tsally@atomicpeace.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoeCryptfs: Copy up POSIX ACL and read-only flags from lower mount
Tyler Hicks [Mon, 11 Jun 2012 22:42:32 +0000 (15:42 -0700)]
eCryptfs: Copy up POSIX ACL and read-only flags from lower mount

commit 069ddcda37b2cf5bb4b6031a944c0e9359213262 upstream.

When the eCryptfs mount options do not include '-o acl', but the lower
filesystem's mount options do include 'acl', the MS_POSIXACL flag is not
flipped on in the eCryptfs super block flags. This flag is what the VFS
checks in do_last() when deciding if the current umask should be applied
to a newly created inode's mode or not. When a default POSIX ACL mask is
set on a directory, the current umask is incorrectly applied to new
inodes created in the directory. This patch ignores the MS_POSIXACL flag
passed into ecryptfs_mount() and sets the flag on the eCryptfs super
block depending on the flag's presence on the lower super block.

Additionally, it is incorrect to allow a writeable eCryptfs mount on top
of a read-only lower mount. This missing check did not allow writes to
the read-only lower mount because permissions checks are still performed
on the lower filesystem's objects but it is best to simply not allow a
rw mount on top of ro mount. However, a ro eCryptfs mount on top of a rw
mount is valid and still allowed.

https://launchpad.net/bugs/1009207

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Stefan Beller <stefanbeller@googlemail.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agousb: use usb_serial_put in usb_serial_probe errors
Jan Safrata [Tue, 22 May 2012 12:04:50 +0000 (14:04 +0200)]
usb: use usb_serial_put in usb_serial_probe errors

commit 0658a3366db7e27fa32c12e886230bb58c414c92 upstream.

The use of kfree(serial) in error cases of usb_serial_probe
was invalid - usb_serial structure allocated in create_serial()
gets reference of usb_device that needs to be put, so we need
to use usb_serial_put() instead of simple kfree().

Signed-off-by: Jan Safrata <jan.nikitenko@gmail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Cc: Richard Retanubun <richardretanubun@ruggedcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agonetfilter: nf_nat: don't check for port change on ICMP tuples
Ulrich Weber [Thu, 25 Oct 2012 05:34:45 +0000 (05:34 +0000)]
netfilter: nf_nat: don't check for port change on ICMP tuples

commit 38fe36a248ec3228f8e6507955d7ceb0432d2000 upstream.

ICMP tuples have id in src and type/code in dst.
So comparing src.u.all with dst.u.all will always fail here
and ip_xfrm_me_harder() is called for every ICMP packet,
even if there was no NAT.

Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agonetfilter: Mark SYN/ACK packets as invalid from original direction
Jozsef Kadlecsik [Fri, 31 Aug 2012 09:55:53 +0000 (09:55 +0000)]
netfilter: Mark SYN/ACK packets as invalid from original direction

commit 64f509ce71b08d037998e93dd51180c19b2f464c upstream.

Clients should not send such packets. By accepting them, we open
up a hole by wich ephemeral ports can be discovered in an off-path
attack.

See: "Reflection scan: an Off-Path Attack on TCP" by Jan Wrobel,
http://arxiv.org/abs/1201.2074

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agonetfilter: Validate the sequence number of dataless ACK packets as well
Jozsef Kadlecsik [Fri, 31 Aug 2012 09:55:54 +0000 (09:55 +0000)]
netfilter: Validate the sequence number of dataless ACK packets as well

commit 4a70bbfaef0361d27272629d1a250a937edcafe4 upstream.

We spare nothing by not validating the sequence number of dataless
ACK packets and enabling it makes harder off-path attacks.

See: "Reflection scan: an Off-Path Attack on TCP" by Jan Wrobel,
http://arxiv.org/abs/1201.2074

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agor8169: allow multicast packets on sub-8168f chipset.
Nathan Walp [Thu, 1 Nov 2012 12:08:47 +0000 (12:08 +0000)]
r8169: allow multicast packets on sub-8168f chipset.

commit 0481776b7a70f09acf7d9d97c288c3a8403fbfe4 upstream.

RTL_GIGA_MAC_VER_35 includes no multicast hardware filter.

Signed-off-by: Nathan Walp <faceprint@faceprint.com>
Suggested-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agor8169: Fix WoL on RTL8168d/8111d.
Cyril Brulebois [Wed, 31 Oct 2012 14:00:46 +0000 (14:00 +0000)]
r8169: Fix WoL on RTL8168d/8111d.

commit b00e69dee4ccbb3a19989e3d4f1385bc2e3406cd upstream.

This regression was spotted between Debian squeeze and Debian wheezy
kernels (respectively based on 2.6.32 and 3.2). More info about
Wake-on-LAN issues with Realtek's 816x chipsets can be found in the
following thread: http://marc.info/?t=132079219400004

Probable regression from d4ed95d796e5126bba51466dc07e287cebc8bd19;
more chipsets are likely affected.

Tested on top of a 3.2.23 kernel.

Reported-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Tested-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Hinted-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Cyril Brulebois <kibi@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoxen/events: fix RCU warning, or Call idle notifier after irq_enter()
Mojiong Qiu [Tue, 6 Nov 2012 08:08:15 +0000 (16:08 +0800)]
xen/events: fix RCU warning, or Call idle notifier after irq_enter()

commit 772aebcefeff310f80e32b874988af0076cb799d upstream.

exit_idle() should be called after irq_enter(), otherwise it throws:

[ INFO: suspicious RCU usage. ]
3.6.5 #1 Not tainted
-------------------------------
include/linux/rcupdate.h:725 rcu_read_lock() used illegally while idle!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
1 lock held by swapper/0/0:
 #0:  (rcu_read_lock){......}, at: [<ffffffff810e9fe0>] __atomic_notifier_call_chain+0x0/0x140

stack backtrace:
Pid: 0, comm: swapper/0 Not tainted 3.6.5 #1
Call Trace:
 <IRQ>  [<ffffffff811259a2>] lockdep_rcu_suspicious+0xe2/0x130
 [<ffffffff810ea10c>] __atomic_notifier_call_chain+0x12c/0x140
 [<ffffffff810e9fe0>] ? atomic_notifier_chain_unregister+0x90/0x90
 [<ffffffff811216cd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff810ea136>] atomic_notifier_call_chain+0x16/0x20
 [<ffffffff810777c3>] exit_idle+0x43/0x50
 [<ffffffff81568865>] xen_evtchn_do_upcall+0x25/0x50
 [<ffffffff81aa690e>] xen_do_hypervisor_callback+0x1e/0x30
 <EOI>  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
 [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
 [<ffffffff81061540>] ? xen_safe_halt+0x10/0x20
 [<ffffffff81075cfa>] ? default_idle+0xba/0x570
 [<ffffffff810778af>] ? cpu_idle+0xdf/0x140
 [<ffffffff81a4d881>] ? rest_init+0x135/0x144
 [<ffffffff81a4d74c>] ? csum_partial_copy_generic+0x16c/0x16c
 [<ffffffff82520c45>] ? start_kernel+0x3db/0x3e8
 [<ffffffff8252066a>] ? repair_env_string+0x5a/0x5a
 [<ffffffff82520356>] ? x86_64_start_reservations+0x131/0x135
 [<ffffffff82524aca>] ? xen_start_kernel+0x465/0x46

Git commit 98ad1cc14a5c4fd658f9d72c6ba5c86dfd3ce0d5
Author: Frederic Weisbecker <fweisbec@gmail.com>
Date:   Fri Oct 7 18:22:09 2011 +0200

    x86: Call idle notifier after irq_enter()

did this, but it missed the Xen code.

Signed-off-by: Mojiong Qiu <mjqiu@tencent.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agor8169: use unlimited DMA burst for TX
Michal Schmidt [Sun, 9 Sep 2012 13:55:26 +0000 (13:55 +0000)]
r8169: use unlimited DMA burst for TX

commit aee77e4accbeb2c86b1d294cd84fec4a12dde3bd upstream.

The r8169 driver currently limits the DMA burst for TX to 1024 bytes. I have
a box where this prevents the interface from using the gigabit line to its full
potential. This patch solves the problem by setting TX_DMA_BURST to unlimited.

The box has an ASRock B75M motherboard with on-board RTL8168evl/8111evl
(XID 0c900880). TSO is enabled.

I used netperf (TCP_STREAM test) to measure the dependency of TX throughput
on MTU. I did it for three different values of TX_DMA_BURST ('5'=512, '6'=1024,
'7'=unlimited). This chart shows the results:
http://michich.fedorapeople.org/r8169/r8169-effects-of-TX_DMA_BURST.png

Interesting points:
 - With the current DMA burst limit (1024):
   - at the default MTU=1500 I get only 842 Mbit/s.
   - when going from small MTU, the performance rises monotonically with
     increasing MTU only up to a peak at MTU=1076 (908 MBit/s). Then there's
     a sudden drop to 762 MBit/s from which the throughput rises monotonically
     again with further MTU increases.
 - With a smaller DMA burst limit (512):
   - there's a similar peak at MTU=1076 and another one at MTU=564.
 - With unlimited DMA burst:
   - at the default MTU=1500 I get nice 940 Mbit/s.
   - the throughput rises monotonically with increasing MTU with no strange
     peaks.

Notice that the peaks occur at MTU sizes that are multiples of the DMA burst
limit plus 52. Why 52? Because:
  20 (IP header) + 20 (TCP header) + 12 (TCP options) = 52

The Realtek-provided r8168 driver (v8.032.00) uses unlimited TX DMA burst too,
except for CFG_METHOD_1 where the TX DMA burst is set to 512 bytes.
CFG_METHOD_1 appears to be the oldest MAC version of "RTL8168B/8111B",
i.e. RTL_GIGA_MAC_VER_11 in r8169. Not sure if this MAC version really needs
the smaller burst limit, or if any other versions have similar requirements.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agotmpfs: change final i_blocks BUG to WARNING
Hugh Dickins [Fri, 16 Nov 2012 22:15:04 +0000 (14:15 -0800)]
tmpfs: change final i_blocks BUG to WARNING

commit 0f3c42f522dc1ad7e27affc0a4aa8c790bce0a66 upstream.

Under a particular load on one machine, I have hit shmem_evict_inode()'s
BUG_ON(inode->i_blocks), enough times to narrow it down to a particular
race between swapout and eviction.

It comes from the "if (freed > 0)" asymmetry in shmem_recalc_inode(),
and the lack of coherent locking between mapping's nrpages and shmem's
swapped count.  There's a window in shmem_writepage(), between lowering
nrpages in shmem_delete_from_page_cache() and then raising swapped
count, when the freed count appears to be +1 when it should be 0, and
then the asymmetry stops it from being corrected with -1 before hitting
the BUG.

One answer is coherent locking: using tree_lock throughout, without
info->lock; reasonable, but the raw_spin_lock in percpu_counter_add() on
used_blocks makes that messier than expected.  Another answer may be a
further effort to eliminate the weird shmem_recalc_inode() altogether,
but previous attempts at that failed.

So far undecided, but for now change the BUG_ON to WARN_ON: in usual
circumstances it remains a useful consistency check.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agonet-rps: Fix brokeness causing OOO packets
Tom Herbert [Fri, 16 Nov 2012 09:04:15 +0000 (09:04 +0000)]
net-rps: Fix brokeness causing OOO packets

[ Upstream commit baefa31db2f2b13a05d1b81bdf2d20d487f58b0a ]

In commit c445477d74ab3779 which adds aRFS to the kernel, the CPU
selected for RFS is not set correctly when CPU is changing.
This is causing OOO packets and probably other issues.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agonet: correct check in dev_addr_del()
Jiri Pirko [Wed, 14 Nov 2012 02:51:04 +0000 (02:51 +0000)]
net: correct check in dev_addr_del()

[ Upstream commit a652208e0b52c190e57f2a075ffb5e897fe31c3b ]

Check (ha->addr == dev->dev_addr) is always true because dev_addr_init()
sets this. Correct the check to behave properly on addr removal.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoipv6: setsockopt(IPIPPROTO_IPV6, IPV6_MINHOPCOUNT) forgot to set return value
Hannes Frederic Sowa [Sat, 10 Nov 2012 19:52:34 +0000 (19:52 +0000)]
ipv6: setsockopt(IPIPPROTO_IPV6, IPV6_MINHOPCOUNT) forgot to set return value

[ Upstream commit d4596bad2a713fcd0def492b1960e6d899d5baa8 ]

Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoipv4: avoid undefined behavior in do_ip_setsockopt()
Xi Wang [Sun, 11 Nov 2012 11:20:01 +0000 (11:20 +0000)]
ipv4: avoid undefined behavior in do_ip_setsockopt()

[ Upstream commit 0c9f79be295c99ac7e4b569ca493d75fdcc19e4e ]

(1<<optname) is undefined behavior in C with a negative optname or
optname larger than 31.  In those cases the result of the shift is
not necessarily zero (e.g., on x86).

This patch simplifies the code with a switch statement on optname.
It also allows the compiler to generate better code (e.g., using a
64-bit mask).

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agom68k: fix sigset_t accessor functions
Andreas Schwab [Sat, 17 Nov 2012 21:27:04 +0000 (22:27 +0100)]
m68k: fix sigset_t accessor functions

commit 34fa78b59c52d1db3513db4c1a999db26b2e9ac2 upstream.

The sigaddset/sigdelset/sigismember functions that are implemented with
bitfield insn cannot allow the sigset argument to be placed in a data
register since the sigset is wider than 32 bits.  Remove the "d"
constraint from the asm statements.

The effect of the bug is that sending RT signals does not work, the signal
number is truncated modulo 32.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agowireless: allow 40 MHz on world roaming channels 12/13
Johannes Berg [Mon, 12 Nov 2012 09:51:34 +0000 (10:51 +0100)]
wireless: allow 40 MHz on world roaming channels 12/13

commit 43c771a1963ab461a2f194e3c97fded1d5fe262f upstream.

When in world roaming mode, allow 40 MHz to be used
on channels 12 and 13 so that an AP that is, e.g.,
using HT40+ on channel 9 (in the UK) can be used.

Reported-by: Eddie Chapman <eddie@ehuk.net>
Tested-by: Eddie Chapman <eddie@ehuk.net>
Acked-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agomemcg: oom: fix totalpages calculation for memory.swappiness==0
Michal Hocko [Fri, 16 Nov 2012 22:14:49 +0000 (14:14 -0800)]
memcg: oom: fix totalpages calculation for memory.swappiness==0

commit 9a5a8f19b43430752067ecaee62fc59e11e88fa6 upstream.

oom_badness() takes a totalpages argument which says how many pages are
available and it uses it as a base for the score calculation.  The value
is calculated by mem_cgroup_get_limit which considers both limit and
total_swap_pages (resp.  memsw portion of it).

This is usually correct but since fe35004fbf9e ("mm: avoid swapping out
with swappiness==0") we do not swap when swappiness is 0 which means
that we cannot really use up all the totalpages pages.  This in turn
confuses oom score calculation if the memcg limit is much smaller than
the available swap because the used memory (capped by the limit) is
negligible comparing to totalpages so the resulting score is too small
if adj!=0 (typically task with CAP_SYS_ADMIN or non zero oom_score_adj).
A wrong process might be selected as result.

The problem can be worked around by checking mem_cgroup_swappiness==0
and not considering swap at all in such a case.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agottm: Clear the ttm page allocated from high memory zone correctly
Zhao Yakui [Tue, 13 Nov 2012 18:31:55 +0000 (18:31 +0000)]
ttm: Clear the ttm page allocated from high memory zone correctly

commit ac207ed2471150e06af0afc76e4becc701fa2733 upstream.

The TTM page can be allocated from high memory. In such case it is
wrong to use the page_address(page) as the virtual address for the high memory
page.

bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=50241

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agodrm/radeon: fix logic error in atombios_encoders.c
Alex Deucher [Wed, 14 Nov 2012 14:10:39 +0000 (09:10 -0500)]
drm/radeon: fix logic error in atombios_encoders.c

commit b9196395c905edec512dfd6690428084228c16ec upstream.

Fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=50431

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoUSB: option: add Alcatel X220/X500D USB IDs
Dan Williams [Thu, 8 Nov 2012 17:56:53 +0000 (11:56 -0600)]
USB: option: add Alcatel X220/X500D USB IDs

commit c0bc3098871dd9b964f6b45ec1e4d70d87811744 upstream.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoUSB: option: add Novatel E362 and Dell Wireless 5800 USB IDs
Dan Williams [Thu, 8 Nov 2012 17:56:42 +0000 (11:56 -0600)]
USB: option: add Novatel E362 and Dell Wireless 5800 USB IDs

commit fcb21645f1bd86d2be29baf48aa1b298de52ccc7 upstream.

The Dell 5800 appears to be a simple rebrand of the Novatel E362.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agos390/gup: add missing TASK_SIZE check to get_user_pages_fast()
Heiko Carstens [Mon, 22 Oct 2012 13:49:02 +0000 (15:49 +0200)]
s390/gup: add missing TASK_SIZE check to get_user_pages_fast()

commit d55c4c613fc4d4ad2ba0fc6fa2b57176d420f7e4 upstream.

When walking page tables we need to make sure that everything
is within bounds of the ASCE limit of the task's address space.
Otherwise we might calculate e.g. a pud pointer which is not
within a pud and dereference it.
So check against TASK_SIZE (which is the ASCE limit) before
walking page tables.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoRevert "Staging: Android alarm: IOCTL command encoding fix"
Colin Cross [Thu, 8 Nov 2012 02:21:51 +0000 (18:21 -0800)]
Revert "Staging: Android alarm: IOCTL command encoding fix"

commit d38e0e3fed4f58bcddef4dc93a591dfe2f651cb0 upstream.

Commit 6bd4a5d96c08dc2380f8053b1bd4f879f55cd3c9 changed the
ANDROID_ALARM_GET_TIME ioctls from IOW to IOR.  While technically
correct, the _IOC_DIR bits are ignored by alarm_ioctl, so the
commit breaks a userspace ABI used by all existing Android devices
for a purely cosmetic reason.  Revert it.

Cc: Dae S. Kim <dae@velatum.com>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoUBIFS: introduce categorized lprops counter
Artem Bityutskiy [Wed, 10 Oct 2012 07:55:28 +0000 (10:55 +0300)]
UBIFS: introduce categorized lprops counter

commit 98a1eebda3cb2a84ecf1f219bb3a95769033d1bf upstream.

This commit is a preparation for a subsequent bugfix. We introduce a
counter for categorized lprops.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoUBIFS: fix mounting problems after power cuts
Artem Bityutskiy [Tue, 9 Oct 2012 13:20:15 +0000 (16:20 +0300)]
UBIFS: fix mounting problems after power cuts

commit a28ad42a4a0c6f302f488f26488b8b37c9b30024 upstream.

This is a bugfix for a problem with the following symptoms:

1. A power cut happens
2. After reboot, we try to mount UBIFS
3. Mount fails with "No space left on device" error message

UBIFS complains like this:

UBIFS error (pid 28225): grab_empty_leb: could not find an empty LEB

The root cause of this problem is that when we mount, not all LEBs are
categorized. Only those which were read are. However, the
'ubifs_find_free_leb_for_idx()' function assumes that all LEBs were
categorized and 'c->freeable_cnt' is valid, which is a false assumption.

This patch fixes the problem by teaching 'ubifs_find_free_leb_for_idx()'
to always fall back to LPT scanning if no freeable LEBs were found.

This problem was reported by few people in the past, but Brent Taylor
was able to reproduce it and send me a flash image which cannot be mounted,
which made it easy to hunt the bug. Kudos to Brent.

Reported-by: Brent Taylor <motobud@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoASoC: dapm: Use card_list during DAPM shutdown
Misael Lopez Cruz [Thu, 8 Nov 2012 18:03:12 +0000 (12:03 -0600)]
ASoC: dapm: Use card_list during DAPM shutdown

commit 445632ad6dda42f4d3f9df2569a852ca0d4ea608 upstream.

DAPM shutdown incorrectly uses "list" field of codec struct while
iterating over probed components (codec_dev_list). "list" field
refers to codecs registered in the system, "card_list" field is
used for probed components.

Signed-off-by: Misael Lopez Cruz <misael.lopez@ti.com>
Signed-off-by: Liam Girdwood <lrg@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoASoC: wm8978: pll incorrectly configured when codec is master
Eric Millbrandt [Fri, 2 Nov 2012 21:05:44 +0000 (17:05 -0400)]
ASoC: wm8978: pll incorrectly configured when codec is master

commit 55c6f4cb6ef49afbb86222c6a3ff85329199c729 upstream.

When MCLK is supplied externally and BCLK and LRC are configured as outputs
(codec is master), the PLL values are only calculated correctly on the first
transmission.  On subsequent transmissions, at differenct sample rates, the
wrong PLL values are used.  Test for f_opclk instead of f_pllout to determine
if the PLL values are needed.

Signed-off-by: Eric Millbrandt <emillbrandt@dekaresearch.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoALSA: hda - Add a missing quirk entry for iMac 9,1
Takashi Iwai [Mon, 12 Nov 2012 09:07:36 +0000 (10:07 +0100)]
ALSA: hda - Add a missing quirk entry for iMac 9,1

commit 05193639ca977cc889668718adb38db6d585045b upstream.

This is another variant of iMac 9,1 with a different codec SSID.

Reported-and-tested-by: Everaldo Canuto <everaldo.canuto@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoALSA: hda - Add new codec ALC668 and ALC900 (default name ALC1150)
Kailang Yang [Thu, 8 Nov 2012 09:25:37 +0000 (10:25 +0100)]
ALSA: hda - Add new codec ALC668 and ALC900 (default name ALC1150)

commit 19a62823eae453619604636082085812c14ee391 upstream.

Signed-off-by: Kailang Yang <kailang@realtek.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoALSA: hda - Fix invalid connections in VT1802 codec
Takashi Iwai [Wed, 7 Nov 2012 09:37:48 +0000 (10:37 +0100)]
ALSA: hda - Fix invalid connections in VT1802 codec

commit ef4da45828603df57e5e21b8aa21a66ce309f79b upstream.

VT1802 codec provides the invalid connection lists of NID 0x24 and
0x33 containing the routes to a non-exist widget 0x3e.  This confuses
the auto-parser.  Fix it up in the driver by overriding these
connections.

Reported-by: Massimo Del Fedele <max@veneto.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoALSA: hda - Fix empty DAC filling in patch_via.c
Takashi Iwai [Wed, 7 Nov 2012 09:32:47 +0000 (10:32 +0100)]
ALSA: hda - Fix empty DAC filling in patch_via.c

commit 5b3761954dac2d1393beef8210eb8cee81d16b8d upstream.

In via_auto_fill_adc_nids(), the parser tries to fill dac_nids[] at
the point of the current line-out (i).  When no valid path is found
for this output, this results in dac = 0, thus it creates a hole in
dac_nids[].  This confuses is_empty_dac() and trims the detected DAC
in later reference.

This patch fixes the bug by appending DAC properly to dac_nids[] in
via_auto_fill_adc_nids().

Reported-by: Massimo Del Fedele <max@veneto.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoALSA: hda - Force to reset IEC958 status bits for AD codecs
Takashi Iwai [Mon, 5 Nov 2012 11:32:46 +0000 (12:32 +0100)]
ALSA: hda - Force to reset IEC958 status bits for AD codecs

commit ae24c3191ba2ab03ec6b4be323e730e00404b4b6 upstream.

Several bug reports suggest that the forcibly resetting IEC958 status
bits is required for AD codecs to get the SPDIF output working
properly after changing streams.

Original fix credit to Javeed Shaikh.

BugLink: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/359361
Reported-by: Robin Kreis <r.kreis@uni-bremen.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoALSA: HDA: Fix digital microphone on CS420x
Daniel J Blueman [Sun, 4 Nov 2012 05:19:03 +0000 (13:19 +0800)]
ALSA: HDA: Fix digital microphone on CS420x

commit 16337e028a6dae9fbdd718c0d42161540a668ff3 upstream.

Correctly enable the digital microphones with the right bits in the
right coeffecient registers on Cirrus CS4206/7 codecs. It also
prevents misconfiguring ADC1/2.

This fixes the digital mic on the Macbook Pro 10,1/Retina.

Based-on-patch-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoALSA: hda: Cirrus: Fix coefficient index for beep configuration
Alexander Stein [Thu, 1 Nov 2012 12:42:37 +0000 (13:42 +0100)]
ALSA: hda: Cirrus: Fix coefficient index for beep configuration

commit 5a83b4b5a391f07141b157ac9daa51c409e71ab5 upstream.

Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agocrypto: cryptd - disable softirqs in cryptd_queue_worker to prevent data corruption
Jussi Kivilinna [Sun, 21 Oct 2012 17:42:28 +0000 (20:42 +0300)]
crypto: cryptd - disable softirqs in cryptd_queue_worker to prevent data corruption

commit 9efade1b3e981f5064f9db9ca971b4dc7557ae42 upstream.

cryptd_queue_worker attempts to prevent simultaneous accesses to crypto
workqueue by cryptd_enqueue_request using preempt_disable/preempt_enable.
However cryptd_enqueue_request might be called from softirq context,
so add local_bh_disable/local_bh_enable to prevent data corruption and
panics.

Bug report at http://marc.info/?l=linux-crypto-vger&m=134858649616319&w=2

v2:
 - Disable software interrupts instead of hardware interrupts

Reported-by: Gurucharan Shetty <gurucharan.shetty@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agocifs: fix potential buffer overrun in cifs.idmap handling code
Jeff Layton [Sat, 3 Nov 2012 13:37:28 +0000 (09:37 -0400)]
cifs: fix potential buffer overrun in cifs.idmap handling code

commit 36960e440ccf94349c09fb944930d3bfe4bc473f upstream.

The userspace cifs.idmap program generally works with the wbclient libs
to generate binary SIDs in userspace. That program defines the struct
that holds these values as having a max of 15 subauthorities. The kernel
idmapping code however limits that value to 5.

When the kernel copies those values around though, it doesn't sanity
check the num_subauths value handed back from userspace or from the
server. It's possible therefore for userspace to hand us back a bogus
num_subauths value (or one that's valid, but greater than 5) that could
cause the kernel to walk off the end of the cifs_sid->sub_auths array.

Fix this by defining a new routine for copying sids and using that in
all of the places that copy it. If we end up with a sid that's longer
than expected then this approach will just lop off the "extra" subauths,
but that's basically what the code does today already. Better approaches
might be to fix this code to reject SIDs with >5 subauths, or fix it
to handle the subauths array dynamically.

At the same time, change the kernel to check the length of the data
returned by userspace. If it's shorter than struct cifs_sid, reject it
and return -EIO. If that happens we'll end up with fields that are
basically uninitialized.

Long term, it might make sense to redefine cifs_sid using a flexarray at
the end, to allow for variable-length subauth lists, and teach the code
to handle the case where the subauths array being passed in from
userspace is shorter than 5 elements.

Note too, that I don't consider this a security issue since you'd need
a compromised cifs.idmap program. If you have that, you can do all sorts
of nefarious stuff. Still, this is probably reasonable for stable.

Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agomodule: fix out-by-one error in kallsyms
Rusty Russell [Thu, 25 Oct 2012 00:19:25 +0000 (10:49 +1030)]
module: fix out-by-one error in kallsyms

commit 59ef28b1f14899b10d6b2682c7057ca00a9a3f47 upstream.

Masaki found and patched a kallsyms issue: the last symbol in a
module's symtab wasn't transferred.  This is because we manually copy
the zero'th entry (which is always empty) then copy the rest in a loop
starting at 1, though from src[0].  His fix was minimal, I prefer to
rewrite the loops in more standard form.

There are two loops: one to get the size, and one to copy.  Make these
identical: always count entry 0 and any defined symbol in an allocated
non-init section.

This bug exists since the following commit was introduced.
   module: reduce symbol table for loaded modules (v2)
   commit: 4a4962263f07d14660849ec134ee42b63e95ea9a

LKML: http://lkml.org/lkml/2012/10/24/27
Reported-by: Masaki Kimura <masaki.kimura.kz@hitachi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agofanotify: fix missing break
Eric Paris [Thu, 8 Nov 2012 23:53:37 +0000 (15:53 -0800)]
fanotify: fix missing break

commit 848561d368751a1c0f679b9f045a02944506a801 upstream.

Anders Blomdell noted in 2010 that Fanotify lost events and provided a
test case.  Eric Paris confirmed it was a bug and posted a fix to the
list

  https://groups.google.com/forum/?fromgroups=#!topic/linux.kernel/RrJfTfyW2BE

but never applied it.  Repeated attempts over time to actually get him
to apply it have never had a reply from anyone who has raised it

So apply it anyway

Signed-off-by: Alan Cox <alan@linux.intel.com>
Reported-by: Anders Blomdell <anders.blomdell@control.lth.se>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agomac80211: call skb_dequeue/ieee80211_free_txskb instead of __skb_queue_purge
Felix Fietkau [Sat, 10 Nov 2012 02:44:14 +0000 (03:44 +0100)]
mac80211: call skb_dequeue/ieee80211_free_txskb instead of __skb_queue_purge

commit 1f98ab7fef48a2968f37f422c256c9fbd978c3f0 upstream.

Fixes more wifi status skb leaks, leading to hostapd/wpa_supplicant hangs.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agomac80211: don't send null data packet when not associated
Johannes Berg [Thu, 8 Nov 2012 13:06:28 +0000 (14:06 +0100)]
mac80211: don't send null data packet when not associated

commit 20f544eea03db4b498942558b882d463ce575c3e upstream.

On resume or firmware recovery, mac80211 sends a null
data packet to see if the AP is still around and hasn't
disconnected us. However, it always does this even if
it wasn't even connected before, leading to a warning
in the new channel context code. Fix this by checking
that it's associated.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agomac80211: sync acccess to tx_filtered/ps_tx_buf queues
Arik Nemtsov [Mon, 5 Nov 2012 08:27:52 +0000 (10:27 +0200)]
mac80211: sync acccess to tx_filtered/ps_tx_buf queues

commit 987c285c2ae2e4e32aca3a9b3252d28171c75711 upstream.

These are accessed without a lock when ending STA PSM. If the
sta_cleanup timer accesses these lists at the same time, we might crash.

This may fix some mysterious crashes we had during
ieee80211_sta_ps_deliver_wakeup.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Ido Yariv <ido@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoxfs: drop buffer io reference when a bad bio is built
Dave Chinner [Mon, 12 Nov 2012 11:09:46 +0000 (22:09 +1100)]
xfs: drop buffer io reference when a bad bio is built

commit d69043c42d8c6414fa28ad18d99973aa6c1c2e24 upstream.

Error handling in xfs_buf_ioapply_map() does not handle IO reference
counts correctly. We increment the b_io_remaining count before
building the bio, but then fail to decrement it in the failure case.
This leads to the buffer never running IO completion and releasing
the reference that the IO holds, so at unmount we can leak the
buffer. This leak is captured by this assert failure during unmount:

XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 273

This is not a new bug - the b_io_remaining accounting has had this
problem for a long, long time - it's just very hard to get a
zero length bio being built by this code...

Further, the buffer IO error can be overwritten on a multi-segment
buffer by subsequent bio completions for partial sections of the
buffer. Hence we should only set the buffer error status if the
buffer is not already carrying an error status. This ensures that a
partial IO error on a multi-segment buffer will not be lost. This
part of the problem is a regression, however.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agomm: bugfix: set current->reclaim_state to NULL while returning from kswapd()
Takamori Yamaguchi [Thu, 8 Nov 2012 23:53:39 +0000 (15:53 -0800)]
mm: bugfix: set current->reclaim_state to NULL while returning from kswapd()

commit b0a8cc58e6b9aaae3045752059e5e6260c0b94bc upstream.

In kswapd(), set current->reclaim_state to NULL before returning, as
current->reclaim_state holds reference to variable on kswapd()'s stack.

In rare cases, while returning from kswapd() during memory offlining,
__free_slab() and freepages() can access the dangling pointer of
current->reclaim_state.

Signed-off-by: Takamori Yamaguchi <takamori.yamaguchi@jp.sony.com>
Signed-off-by: Aaditya Kumar <aaditya.kumar@ap.sony.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoLinux 3.4.19 v3.4.19
Greg Kroah-Hartman [Sat, 17 Nov 2012 21:16:56 +0000 (13:16 -0800)]
Linux 3.4.19

11 years agoALSA: usb-audio: Fix mutex deadlock at disconnection
Takashi Iwai [Tue, 13 Nov 2012 10:22:48 +0000 (11:22 +0100)]
ALSA: usb-audio: Fix mutex deadlock at disconnection

commit 10e44239f67d0b6fb74006e61a7e883b8075247a upstream.

The recent change for USB-audio disconnection race fixes introduced a
mutex deadlock again.  There is a circular dependency between
chip->shutdown_rwsem and pcm->open_mutex, depicted like below, when a
device is opened during the disconnection operation:

A. snd_usb_audio_disconnect() ->
     card.c::register_mutex ->
       chip->shutdown_rwsem (write) ->
         snd_card_disconnect() ->
           pcm.c::register_mutex ->
             pcm->open_mutex

B. snd_pcm_open() ->
     pcm->open_mutex ->
       snd_usb_pcm_open() ->
         chip->shutdown_rwsem (read)

Since the chip->shutdown_rwsem protection in the case A is required
only for turning on the chip->shutdown flag and it doesn't have to be
taken for the whole operation, we can reduce its window in
snd_usb_audio_disconnect().

Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoALSA: Fix card refcount unbalance
Takashi Iwai [Thu, 8 Nov 2012 13:36:18 +0000 (14:36 +0100)]
ALSA: Fix card refcount unbalance

commit 8bb4d9ce08b0a92ca174e41d92c180328f86173f upstream.

There are uncovered cases whether the card refcount introduced by the
commit a0830dbd isn't properly increased or decreased:
- OSS PCM and mixer success paths
- When lookup function gets NULL

This patch fixes these places.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=50251

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoxfs: fix reading of wrapped log data
Dave Chinner [Fri, 2 Nov 2012 00:38:44 +0000 (11:38 +1100)]
xfs: fix reading of wrapped log data

commit 6ce377afd1755eae5c93410ca9a1121dfead7b87 upstream.

Commit 4439647 ("xfs: reset buffer pointers before freeing them") in
3.0-rc1 introduced a regression when recovering log buffers that
wrapped around the end of log. The second part of the log buffer at
the start of the physical log was being read into the header buffer
rather than the data buffer, and hence recovery was seeing garbage
in the data buffer when it got to the region of the log buffer that
was incorrectly read.

Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoUSB: mos7840: remove unused variable
Johan Hovold [Thu, 8 Nov 2012 17:28:59 +0000 (18:28 +0100)]
USB: mos7840: remove unused variable

Fix warning about unused variable introduced by commit e681b66f2e19fa
("USB: mos7840: remove invalid disconnect handling") upstream.

A subsequent fix which removed the disconnect function got rid of the
warning but that one was only backported to v3.6.

Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agodrm/i915: clear the entire sdvo infoframe buffer
Daniel Vetter [Sun, 21 Oct 2012 10:52:39 +0000 (12:52 +0200)]
drm/i915: clear the entire sdvo infoframe buffer

commit b6e0e543f75729f207b9c72b0162ae61170635b2 upstream.

Like in the case of native hdmi, which is fixed already in

commit adf00b26d18e1b3570451296e03bcb20e4798cdd
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date:   Tue Sep 25 13:23:34 2012 -0300

    drm/i915: make sure we write all the DIP data bytes

we need to clear the entire sdvo buffer to avoid upsetting the
display.

Since infoframe buffer writing is now a bit more elaborate, extract it
into it's own function. This will be useful if we ever get around to
properly update the ELD for sdvo. Also #define proper names for the
two buffer indexes with fixed usage.

v2: Cite the right commit above, spotted by Paulo Zanoni.

v3: I'm too stupid to paste the right commit.

v4: Ben Hutchings noticed that I've failed to handle an underflow in
my loop logic, breaking it for i >= length + 8. Since I've just lost C
programmer license, use his solution. Also, make the frustrated 0-base
buffer size a notch more clear.

Reported-and-tested-by: Jürg Billeter <j@bitron.ch>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=25732
Cc: Paulo Zanoni <przanoni@gmail.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agodrm/i915: fixup infoframe support for sdvo
Daniel Vetter [Sat, 12 May 2012 18:22:00 +0000 (20:22 +0200)]
drm/i915: fixup infoframe support for sdvo

commit 81014b9d0b55fb0b48f26cd2a943359750d532db upstream.

At least the worst offenders:
- SDVO specifies that the encoder should compute the ecc. Testing also
  shows that we must not send the ecc field, so copy the dip_infoframe
  struct to a temporay place and avoid the ecc field. This way the avi
  infoframe is exactly 17 bytes long, which agrees with what the spec
  mandates as a minimal storage capacity (with the ecc field it would
  be 18 bytes).
- Only 17 when sending the avi infoframe. The SDVO spec explicitly
  says that sending more data than what the device announces results
  in undefined behaviour.
- Add __attribute__((packed)) to the avi and spd infoframes, for
  otherwise they're wrongly aligned. Noticed because the avi infoframe
  ended up being 18 bytes large instead of 17. We haven't noticed this
  yet because we don't use the uint16_t fields yet (which are the only
  ones that would be wrongly aligned).

This regression has been introduce by

3c17fe4b8f40a112a85758a9ab2aebf772bdd647 is the first bad commit
commit 3c17fe4b8f40a112a85758a9ab2aebf772bdd647
Author: David Härdeman <david@hardeman.nu>
Date:   Fri Sep 24 21:44:32 2010 +0200

    i915: enable AVI infoframe for intel_hdmi.c [v4]

Patch tested on my g33 with a sdvo hdmi adaptor.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=25732
Tested-by: Peter Ross <pross@xvid.org> (G35 SDVO-HDMI)
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agodrm/radeon/si: add some missing regs to the VM reg checker
Alex Deucher [Thu, 8 Nov 2012 15:13:24 +0000 (10:13 -0500)]
drm/radeon/si: add some missing regs to the VM reg checker

commit f418b88aad0c42b4caf4d79a0cf8d14a5d0a2284 upstream.

This register is needed for streamout to work properly.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agodrm/radeon/cayman: add some missing regs to the VM reg checker
Alex Deucher [Thu, 8 Nov 2012 15:08:04 +0000 (10:08 -0500)]
drm/radeon/cayman: add some missing regs to the VM reg checker

commit 860fe2f05fa2eacac84368e23547ec8cf3cc6652 upstream.

These regs were being wronly rejected leading to rendering
issues.

fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=56876

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agodrm/vmwgfx: Fix a case where the code would BUG when trying to pin GMR memory
Thomas Hellstrom [Fri, 9 Nov 2012 09:45:14 +0000 (10:45 +0100)]
drm/vmwgfx: Fix a case where the code would BUG when trying to pin GMR memory

commit afcc87aa6a233e52df73552dc1dc9ae3881b7cc8 upstream.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dmitry Torokhov <dtor@vmware.com>
Cc: linux-graphics-maintainer@vmware.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agodrm/vmwgfx: Fix hibernation device reset
Thomas Hellstrom [Fri, 9 Nov 2012 09:05:57 +0000 (10:05 +0100)]
drm/vmwgfx: Fix hibernation device reset

commit 95e8f6a21996c4cc2c4574b231c6e858b749dce3 upstream.

The device would not reset properly when resuming from hibernation.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dmitry Torokhov <dtor@vmware.com>
Cc: linux-graphics-maintainer@vmware.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agommc: sdhci: fix NULL dereference in sdhci_request() tuning
Chris Ball [Mon, 5 Nov 2012 19:29:49 +0000 (14:29 -0500)]
mmc: sdhci: fix NULL dereference in sdhci_request() tuning

commit 14efd957209461bbdf285bf0d67e931955d04a4c upstream.

Commit 473b095a72a9 ("mmc: sdhci: fix incorrect command used in tuning")
introduced a NULL dereference at resume-time if an SD 3.0 host controller
raises the SDHCI_NEEDS_TUNING flag while no card is inserted.  Seen on an
OLPC XO-4 with sdhci-pxav3, but presumably affects other controllers too.

Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agofutex: Handle futex_pi OWNER_DIED take over correctly
Thomas Gleixner [Tue, 23 Oct 2012 20:29:38 +0000 (22:29 +0200)]
futex: Handle futex_pi OWNER_DIED take over correctly

commit 59fa6245192159ab5e1e17b8e31f15afa9cff4bf upstream.

Siddhesh analyzed a failure in the take over of pi futexes in case the
owner died and provided a workaround.
See: http://sourceware.org/bugzilla/show_bug.cgi?id=14076

The detailed problem analysis shows:

Futex F is initialized with PTHREAD_PRIO_INHERIT and
PTHREAD_MUTEX_ROBUST_NP attributes.

T1 lock_futex_pi(F);

T2 lock_futex_pi(F);
   --> T2 blocks on the futex and creates pi_state which is associated
       to T1.

T1 exits
   --> exit_robust_list() runs
       --> Futex F userspace value TID field is set to 0 and
           FUTEX_OWNER_DIED bit is set.

T3 lock_futex_pi(F);
   --> Succeeds due to the check for F's userspace TID field == 0
   --> Claims ownership of the futex and sets its own TID into the
       userspace TID field of futex F
   --> returns to user space

T1 --> exit_pi_state_list()
       --> Transfers pi_state to waiter T2 and wakes T2 via
           rt_mutex_unlock(&pi_state->mutex)

T2 --> acquires pi_state->mutex and gains real ownership of the
       pi_state
   --> Claims ownership of the futex and sets its own TID into the
       userspace TID field of futex F
   --> returns to user space

T3 --> observes inconsistent state

This problem is independent of UP/SMP, preemptible/non preemptible
kernels, or process shared vs. private. The only difference is that
certain configurations are more likely to expose it.

So as Siddhesh correctly analyzed the following check in
futex_lock_pi_atomic() is the culprit:

if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {

We check the userspace value for a TID value of 0 and take over the
futex unconditionally if that's true.

AFAICT this check is there as it is correct for a different corner
case of futexes: the WAITERS bit became stale.

Now the proposed change

- if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
+       if (unlikely(ownerdied ||
+                       !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) {

solves the problem, but it's not obvious why and it wreckages the
"stale WAITERS bit" case.

What happens is, that due to the WAITERS bit being set (T2 is blocked
on that futex) it enforces T3 to go through lookup_pi_state(), which
in the above case returns an existing pi_state and therefor forces T3
to legitimately fight with T2 over the ownership of the pi_state (via
pi_state->mutex). Probelm solved!

Though that does not work for the "WAITERS bit is stale" problem
because if lookup_pi_state() does not find existing pi_state it
returns -ERSCH (due to TID == 0) which causes futex_lock_pi() to
return -ESRCH to user space because the OWNER_DIED bit is not set.

Now there is a different solution to that problem. Do not look at the
user space value at all and enforce a lookup of possibly available
pi_state. If pi_state can be found, then the new incoming locker T3
blocks on that pi_state and legitimately races with T2 to acquire the
rt_mutex and the pi_state and therefor the proper ownership of the
user space futex.

lookup_pi_state() has the correct order of checks. It first tries to
find a pi_state associated with the user space futex and only if that
fails it checks for futex TID value = 0. If no pi_state is available
nothing can create new state at that point because this happens with
the hash bucket lock held.

So the above scenario changes to:

T1 lock_futex_pi(F);

T2 lock_futex_pi(F);
   --> T2 blocks on the futex and creates pi_state which is associated
       to T1.

T1 exits
   --> exit_robust_list() runs
       --> Futex F userspace value TID field is set to 0 and
           FUTEX_OWNER_DIED bit is set.

T3 lock_futex_pi(F);
   --> Finds pi_state and blocks on pi_state->rt_mutex

T1 --> exit_pi_state_list()
       --> Transfers pi_state to waiter T2 and wakes it via
           rt_mutex_unlock(&pi_state->mutex)

T2 --> acquires pi_state->mutex and gains ownership of the pi_state
   --> Claims ownership of the futex and sets its own TID into the
       userspace TID field of futex F
   --> returns to user space

This covers all gazillion points on which T3 might come in between
T1's exit_robust_list() clearing the TID field and T2 fixing it up. It
also solves the "WAITERS bit stale" problem by forcing the take over.

Another benefit of changing the code this way is that it makes it less
dependent on untrusted user space values and therefor minimizes the
possible wreckage which might be inflicted.

As usual after staring for too long at the futex code my brain hurts
so much that I really want to ditch that whole optimization of
avoiding the syscall for the non contended case for PI futexes and rip
out the maze of corner case handling code. Unfortunately we can't as
user space relies on that existing behaviour, but at least thinking
about it helps me to preserve my mental sanity. Maybe we should
nevertheless :)

Reported-and-tested-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1210232138540.2756@ionos
Acked-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>