This is to change use of "0x%08x" in favour of "%p" as per ../Documentation/printk-formats.txt,
which also takes care about the following warning during compilation time:
drivers/scsi/aha152x.c: In function ‘get_command’:
drivers/scsi/aha152x.c:2987: warning: cast from pointer to integer of different size
We don't use "dev" any more after 07ec747a5f ("libsas: remove
ata_port.lock management duties from lldds") and it causes a compile
warning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Xiangliang Yu <yuxiangl@marvell.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/leds/leds-lp5521.c: In function `lp5521_load_program':
drivers/leds/leds-lp5521.c:214:21: warning: `mode' may be used uninitialized in this function [-Wuninitialized]
drivers/leds/leds-lp5521.c: In function `lp5521_probe':
drivers/leds/leds-lp5521.c:788:5: warning: `buf' may be used uninitialized in this function [-Wuninitialized]
drivers/leds/leds-lp5521.c:740:6: warning: `ret' may be used uninitialized in this function [-Wuninitialized]
These are real problems if lp5521_read() returns an error. When that
happens we should handle it, instead of ignoring it or doing a bitwise
OR with all the other error codes and continuing.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Milo <Milo.Kim@ti.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Bryan Wu <bryan.wu@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC [M] sound/usb/caiaq/device.o
sound/usb/caiaq/device.c: In function ‘snd_probe’:
sound/usb/caiaq/device.c:500:16: warning: ‘card’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
If remote device sends bogus RFC option with invalid length,
undefined options values are used. Fix this by using defaults when
remote misbehaves.
This also fixes the following warning reported by gcc 4.7.0:
net/bluetooth/l2cap_core.c: In function 'l2cap_config_rsp':
net/bluetooth/l2cap_core.c:3302:13: warning: 'rfc.max_pdu_size' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/bluetooth/l2cap_core.c:3266:24: note: 'rfc.max_pdu_size' was declared here
net/bluetooth/l2cap_core.c:3298:25: warning: 'rfc.monitor_timeout' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/bluetooth/l2cap_core.c:3266:24: note: 'rfc.monitor_timeout' was declared here
net/bluetooth/l2cap_core.c:3297:25: warning: 'rfc.retrans_timeout' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/bluetooth/l2cap_core.c:3266:24: note: 'rfc.retrans_timeout' was declared here
net/bluetooth/l2cap_core.c:3295:2: warning: 'rfc.mode' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/bluetooth/l2cap_core.c:3266:24: note: 'rfc.mode' was declared here
For some reason the declaration of ceph_con_get() and
ceph_con_put() did not get deleted in this commit: d59315ca libceph: drop ceph_con_get/put helpers and nref member
Clean that up.
Signed-off-by: Alex Elder <elder@inktank.com> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As Russell has pointed out, that commit isn't fixing
Software Flow Control at all, and it actually makes
it even more broken.
It was agreed to revert this commit and use Russell's
latest UART patches instead.
Signed-off-by: Felipe Balbi <balbi@ti.com> Cc: Russell King <linux@arm.linux.org.uk> Acked-by: Tony Lindgren <tony@atomide.com> Cc: Andreas Bießmann <andreas.devel@googlemail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are systems where video module known to work fine regardless
of broken _DOD and ignoring returned value here doesn't cause
any issues later. This should fix brightness controls on some laptops.
A pgoff_t is defined (by default) to have type (unsigned long). On
architectures such as i686 that's a 32-bit type. The ceph address
space code was attempting to produce 64 bit offsets by shifting a
page's index by PAGE_CACHE_SHIFT, but the result was not what was
desired because the shift occurred before the result got promoted
to 64 bits.
Fix this by converting all uses of page->index used in this way to
use the page_offset() macro, which ensures the 64-bit result has the
intended value.
This fixes http://tracker.newdream.net/issues/3112
Reported-by: Mohamed Pakkeer <pakkeer.mohideen@realimage.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ceph_on_in_msg_alloc() method drops con->mutex while it allocates a
message. If that races with a timeout that resends a zillion messages and
resets the connection, and the ->alloc_msg() method returns a NULL message,
it will call ceph_msg_put(NULL) and BUG.
Fix by only calling put if msg is non-NULL.
Fixes http://tracker.newdream.net/issues/3142
Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If ceph_fault() is unable to queue work after a delay, it sets the
BACKOFF connection flag so con_work() will attempt to do so.
In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't
result in newly-queued work, it simply ignores this condition and
proceeds as if no backoff delay were desired. There are two
problems with this--one of which is a bug.
The first problem is simply that the intended behavior is to back
off, and if we aren't able queue the work item to run after a delay
we're not doing that.
The only reason queue_delayed_work() won't queue work is if the
provided work item is already queued. In the messenger, this
means that con_work() is already scheduled to be run again. So
if we simply set the BACKOFF flag again when this occurs, we know
the next con_work() call will again attempt to hold off activity
on the connection until after the delay.
The second problem--the bug--is a leak of a reference count. If
queue_delayed_work() returns 0 in con_work(), con->ops->put() drops
the connection reference held on entry to con_work(). However,
processing is (was) allowed to continue, and at the end of the
function a second con->ops->put() is called.
This patch fixes both problems.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In write_partial_msg_pages(), pages need to be kmapped in order to
perform a CRC-32c calculation on them. As an artifact of the way
this code used to be structured, the kunmap() call was separated
from the kmap() call and both were done conditionally. But the
conditions under which the kmap() and kunmap() calls were made
differed, so there was a chance a kunmap() call would be done on a
page that had not been mapped.
The symptom of this was tripping a BUG() in kunmap_high() when
pkmap_count[nr] became 0.
Reported-by: Bryan K. Wright <bryan@virginia.edu> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Because the Ceph client messenger uses a non-blocking connect, it is
possible for the sending of the client banner to race with the
arrival of the banner sent by the peer.
When ceph_sock_state_change() notices the connect has completed, it
schedules work to process the socket via con_work(). During this
time the peer is writing its banner, and arrival of the peer banner
races with con_work().
If con_work() calls try_read() before the peer banner arrives, there
is nothing for it to do, after which con_work() calls try_write() to
send the client's banner. In this case Ceph's protocol negotiation
can complete succesfully.
The server-side messenger immediately sends its banner and addresses
after accepting a connect request, *before* actually attempting to
read or verify the banner from the client. As a result, it is
possible for the banner from the server to arrive before con_work()
calls try_read(). If that happens, try_read() will read the banner
and prepare protocol negotiation info via prepare_write_connect().
prepare_write_connect() calls con_out_kvec_reset(), which discards
the as-yet-unsent client banner. Next, con_work() calls
try_write(), which sends the protocol negotiation info rather than
the banner that the peer is expecting.
The result is that the peer sees an invalid banner, and the client
reports "negotiation failed".
Fix this by moving con_out_kvec_reset() out of
prepare_write_connect() to its callers at all locations except the
one where the banner might still need to be sent.
[elder@inktak.com: added note about server-side behavior]
Signed-off-by: Jim Schutt <jaschut@sandia.gov> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The debugfs directory includes the cluster fsid and our unique global_id.
We need to delay the initialization of the debug entry until we have
learned both the fsid and our global_id from the monitor or else the
second client can't create its debugfs entry and will fail (and multiple
client instances aren't properly reflected in debugfs).
We drop the lock when calling the ->alloc_msg() con op, which means
we need to (a) not clobber con->in_msg without the mutex held, and (b)
we need to verify that we are still in the OPEN state when we retake
it to avoid causing any mayhem. If the state does change, -EAGAIN
will get us back to con_work() and loop.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This function's calling convention is very limiting. In particular,
we can't return any error other than ENOMEM (and only implicitly),
which is a problem (see next patch).
Instead, return an normal 0 or error code, and make the skip a pointer
output parameter. Drop the useless in_hdr argument (we have the con
pointer).
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ceph_fault() function takes the con mutex, so we should avoid
dropping it before calling it. This fixes a potential race with
another thread calling ceph_con_close(), or _open(), or similar (we
don't reverify con->state after retaking the lock).
Add annotation so that lockdep realizes we will drop the mutex before
returning.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We drop the con mutex when delivering a message. When we retake the
lock, we need to verify we are still in the OPEN state before
preparing to read the next tag, or else we risk stepping on a
connection that has been closed.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Revoke all mon_client messages when we shut down the old connection.
This is mostly moot since we are re-using the same ceph_connection,
but it is cleaner.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use a simple set of 6 enumerated values for the socket states (CON_STATE_*)
and use those instead of the state bits. All of the con->state checks are
now under the protection of the con mutex, so this is safe. It also
simplifies many of the state checks because we can check for anything other
than the expected state instead of various bits for races we can think of.
This appears to hold up well to stress testing both with and without socket
failure injection on the server side.
Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We exponentially back off when we encounter connection errors. If several
errors accumulate, we will eventually wait ages before even trying to
reconnect.
Fix this by resetting the backoff counter after a successful negotiation/
connection with the remote node. Fixes ceph issue #2802.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Take the con mutex while we are initiating a ceph open. This is necessary
because the may have previously been in use and then closed, which could
result in a racing workqueue running con_work().
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously, we were opportunistically initializing the bio_iter if it
appeared to be uninitialized in the middle of the read path. The problem
is that a sequence like:
- start reading message
- initialize bio_iter
- read half a message
- messenger fault, reconnect
- restart reading message
- ** bio_iter now non-NULL, not reinitialized **
- read past end of bio, crash
Instead, initialize the bio_iter unconditionally when we allocate/claim
the message for read.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The linger op registration (i.e., watch) modifies the object state. As
such, the OSD will reply with success if it has already applied without
doing the associated side-effects (setting up the watch session state).
If we lose the ACK and resubmit, we will see success but the watch will not
be correctly registered and we won't get notifies.
To fix this, always resubmit the linger op with a new tid. We accomplish
this by re-registering as a linger (i.e., 'registered') if we are not yet
registered. Then the second loop will treat this just like a normal
case of re-registering.
This mirrors a similar fix on the userland ceph.git, commit 5dd68b95, and
ceph bug #2796.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add an atomic variable 'stopping' as flag in struct ceph_messenger,
set this flag to 1 in function ceph_destroy_client(), and add the condition code
in function ceph_data_ready() to test the flag value, if true(1), just return.
Signed-off-by: Guanjun He <gjhe@suse.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is possible to close a socket that is in the OPENING state. For
example, it can happen if ceph_con_close() is called on the con before
the TCP connection is established. con_work() will come around and shut
down the socket.
Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Do not re-initialize the con on every connection attempt. When we
ceph_con_close, there may still be work queued on the socket (e.g., to
close it), and re-initializing will clobber the work_struct state.
Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch gathers a few small changes in "net/ceph/messenger.c":
out_msg_pos_next()
- small logic change that mostly affects indentation
write_partial_msg_pages().
- use a local variable trail_off to represent the offset into
a message of the trail portion of the data (if present)
- once we are in the trail portion we will always be there, so we
don't always need to check against our data position
- avoid computing len twice after we've reached the trail
- get rid of the variable tmpcrc, which is not needed
- trail_off and trail_len never change so mark them const
- update some comments
read_partial_message_bio()
- bio_iovec_idx() will never return an error, so don't bother
checking for it
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently a ceph connection enters a "CONNECTING" state when it
begins the process of (re-)connecting with its peer. Once the two
ends have successfully exchanged their banner and addresses, an
additional NEGOTIATING bit is set in the ceph connection's state to
indicate the connection information exhange has begun. The
CONNECTING bit/state continues to be set during this phase.
Rather than have the CONNECTING state continue while the NEGOTIATING
bit is set, interpret these two phases as distinct states. In other
words, when NEGOTIATING is set, clear CONNECTING. That way only
one of them will be active at a time.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are two phases in the process of linking together the two ends
of a ceph connection. The first involves exchanging a banner and
IP addresses, and if that is successful a second phase exchanges
some detail about each side's connection capabilities.
When initiating a connection, the client side now queues to send
its information for both phases of this process at the same time.
This is probably a bit more efficient, but it is slightly messier
from a layering perspective in the code.
So rearrange things so that the client doesn't send the connection
information until it has received and processed the response in the
initial banner phase (in process_banner()).
Move the code (in the (con->sock == NULL) case in try_write()) that
prepares for writing the connection information, delaying doing that
until the banner exchange has completed. Move the code that begins
the transition to this second "NEGOTIATING" phase out of
process_banner() and into its caller, so preparing to write the
connection information and preparing to read the response are
adjacent to each other.
Finally, preparing to write the connection information now requires
the output kvec to be reset in all cases, so move that into the
prepare_write_connect() and delete it from all callers.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A connection state's NEGOTIATING bit gets set while in CONNECTING
state after we have successfully exchanged a ceph banner and IP
addresses with the connection's peer (the server). But that bit
is not cleared again--at least not until another connection attempt
is initiated.
Instead, clear it as soon as the connection is fully established.
Also, clear it when a socket connection gets prematurely closed
in the midst of establishing a ceph connection (in case we had
reached the point where it was set).
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A connection that is closed will no longer be connecting. So
clear the CONNECTING state bit in ceph_con_close(). Similarly,
if the socket has been closed we no longer are in connecting
state (a new connect sequence will need to be initiated).
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In con_close_socket(), a connection's SOCK_CLOSED flag gets set and
then cleared while its shutdown method is called and its reference
gets dropped.
Previously, that flag got set only if it had not already been set,
so setting it in con_close_socket() might have prevented additional
processing being done on a socket being shut down. We no longer set
SOCK_CLOSED in the socket event routine conditionally, so setting
that bit here no longer provides whatever benefit it might have
provided before.
A race condition could still leave the SOCK_CLOSED bit set even
after we've issued the call to con_close_socket(), so we still clear
that bit after shutting the socket down. Add a comment explaining
the reason for this.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a TCP_CLOSE or TCP_CLOSE_WAIT event occurs, the SOCK_CLOSED
connection flag bit is set, and if it had not been previously set
queue_con() is called to ensure con_work() will get a chance to
handle the changed state.
con_work() atomically checks--and if set, clears--the SOCK_CLOSED
bit if it was set. This means that even if the bit were set
repeatedly, the related processing in con_work() only gets called
once per transition of the bit from 0 to 1.
What's important then is that we ensure con_work() gets called *at
least* once when a socket close event occurs, not that it gets
called *exactly* once.
The work queue mechanism already takes care of queueing work
only if it is not already queued, so there's no need for us
to call queue_con() conditionally.
So this patch just makes it so the SOCK_CLOSED flag gets set
unconditionally in ceph_sock_state_change().
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the socket state change event handler records an error
message on a connection to distinguish a close while connecting from
a close while a connection was already established.
Changing connection information during handling of a socket event is
not very clean, so instead move this assignment inside con_work(),
where it can be done during normal connection-level processing (and
under protection of the connection mutex as well).
Move the handling of a socket closed event up to the top of the
processing loop in con_work(); there's no point in handling backoff
etc. if we have a newly-closed socket to take care of.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Recently a bug was fixed in which the bio_iter field in a ceph
message was not being properly re-initialized when a message got
re-transmitted:
commit 43643528cce60ca184fe8197efa8e8da7c89a037
Author: Yan, Zheng <zheng.z.yan@intel.com>
rbd: Clear ceph_msg->bio_iter for retransmitted message
We are now only initializing the bio_iter field when we are about to
start to write message data (in prepare_write_message_data()),
rather than every time we are attempting to write any portion of the
message data (in write_partial_msg_pages()). This means we no
longer need to use the msg->bio_iter field as a flag.
So just don't do that any more. Trust prepare_write_message_data()
to ensure msg->bio_iter is properly initialized, every time we are
about to begin writing (or re-writing) a message's bio data.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a message has a non-null bio pointer, its bio_iter field is
initialized in write_partial_msg_pages() if this has not been done
already. This is really a one-time setup operation for sending a
message's (bio) data, so move that initialization code into
prepare_write_message_data() which serves that purpose.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a nit, but prepare_write_message() sets the FOOTER_COMPLETE
flag before the CRC for the data portion (recorded in the footer)
has been completely computed. Hold off setting the complete flag
until we've decided it's ready to send.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In write_partial_msg_pages(), once all the data from a page has been
sent we advance to the next one. Put the code that takes care of
this into its own function.
While modifying write_partial_msg_pages(), make its local variable
"in_trail" be Boolean, and use the local variable "msg" (which is
just the connection's current out_msg pointer) consistently.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Once we call ->connect(), we are racing against the actual
connection, and a subsequent transition from CONNECTING ->
CONNECTED. Set the state to CONNECTING before that, under the
protection of the mutex, to avoid the race.
On 32-bit systems, a large `pglen' would overflow `pglen*sizeof(u32)'
and bypass the check ceph_decode_need(p, end, pglen*sizeof(u32), bad).
It would also overflow the subsequent kmalloc() size, leading to
out-of-bounds write.
Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On 32-bit systems, a large `n' would overflow `n * sizeof(u32)' and bypass
the check ceph_decode_need(p, end, n * sizeof(u32), bad). It would also
overflow the subsequent kmalloc() size, leading to out-of-bounds write.
Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
`len' is read from network and thus needs validation. Otherwise a
large `len' would cause out-of-bounds access via the memcpy() call.
In addition, len = 0xffffffff would overflow the kmalloc() size,
leading to out-of-bounds write.
This patch adds a check of `len' via ceph_decode_need(). Also use
kstrndup rather than kmalloc/memcpy.
[elder@inktank.com: added -ENOMEM return for null kstrndup() result]
Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ceph_con_revoke_message() is passed both a message and a ceph
connection. A ceph_msg allocated for incoming messages on a
connection always has a pointer to that connection, so there's no
need to provide the connection when revoking such a message.
Note that the existing logic does not preclude the message supplied
being a null/bogus message pointer. The only user of this interface
is the OSD client, and the only value an osd client passes is a
request's r_reply field. That is always non-null (except briefly in
an error path in ceph_osdc_alloc_request(), and that drops the
only reference so the request won't ever have a reply to revoke).
So we can safely assume the passed-in message is non-null, but add a
BUG_ON() to make it very obvious we are imposing this restriction.
Rename the function ceph_msg_revoke_incoming() to reflect that it is
really an operation on an incoming message.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ceph_con_revoke() is passed both a message and a ceph connection.
Now that any message associated with a connection holds a pointer
to that connection, there's no need to provide the connection when
revoking a message.
This has the added benefit of precluding the possibility of the
providing the wrong connection pointer. If the message's connection
pointer is null, it is not being tracked by any connection, so
revoking it is a no-op. This is supported as a convenience for
upper layers, so they can revoke a message that is not actually
"in flight."
Rename the function ceph_msg_revoke() to reflect that it is really
an operation on a message, not a connection.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are essentially two types of ceph messages: incoming and
outgoing. Outgoing messages are always allocated via ceph_msg_new(),
and at the time of their allocation they are not associated with any
particular connection. Incoming messages are always allocated via
ceph_con_in_msg_alloc(), and they are initially associated with the
connection from which incoming data will be placed into the message.
When an outgoing message gets sent, it becomes associated with a
connection and remains that way until the message is successfully
sent. The association of an incoming message goes away at the point
it is sent to an upper layer via a con->ops->dispatch method.
This patch implements reference counting for all ceph messages, such
that every message holds a reference (and a pointer) to a connection
if and only if it is associated with that connection (as described
above).
For background, here is an explanation of the ceph message
lifecycle, emphasizing when an association exists between a message
and a connection.
Outgoing Messages
An outgoing message is "owned" by its allocator, from the time it is
allocated in ceph_msg_new() up to the point it gets queued for
sending in ceph_con_send(). Prior to that point the message's
msg->con pointer is null; at the point it is queued for sending its
message pointer is assigned to refer to the connection. At that
time the message is inserted into a connection's out_queue list.
When a message on the out_queue list has been sent to the socket
layer to be put on the wire, it is transferred out of that list and
into the connection's out_sent list. At that point it is still owned
by the connection, and will remain so until an acknowledgement is
received from the recipient that indicates the message was
successfully transferred. When such an acknowledgement is received
(in process_ack()), the message is removed from its list (in
ceph_msg_remove()), at which point it is no longer associated with
the connection.
So basically, any time a message is on one of a connection's lists,
it is associated with that connection. Reference counting outgoing
messages can thus be done at the points a message is added to the
out_queue (in ceph_con_send()) and the point it is removed from
either its two lists (in ceph_msg_remove())--at which point its
connection pointer becomes null.
Incoming Messages
When an incoming message on a connection is getting read (in
read_partial_message()) and there is no message in con->in_msg,
a new one is allocated using ceph_con_in_msg_alloc(). At that
point the message is associated with the connection. Once that
message has been completely and successfully read, it is passed to
upper layer code using the connection's con->ops->dispatch method.
At that point the association between the message and the connection
no longer exists.
Reference counting of connections for incoming messages can be done
by taking a reference to the connection when the message gets
allocated, and releasing that reference when it gets handed off
using the dispatch method.
We should never fail to get a connection reference for a
message--the since the caller should already hold one.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a ceph message is queued for sending it is placed on a list of
pending messages (ceph_connection->out_queue). When they are
actually sent over the wire, they are moved from that list to
another (ceph_connection->out_sent). When acknowledgement for the
message is received, it is removed from the sent messages list.
During that entire time the message is "in the possession" of a
single ceph connection. Keep track of that connection in the
message. This will be used in the next patch (and is a helpful
bit of information for debugging anyway).
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function ceph_alloc_msg() is only used to allocate a message
that will be assigned to a connection's in_msg pointer. Rename the
function so this implied usage is more clear.
In addition, make that assignment inside the function (again, since
that's precisely what it's intended to be used for). This allows us
to return what is now provided via the passed-in address of a "skip"
variable. The return type is now Boolean to be explicit that there
are only two possible outcomes.
Make sure the result of an ->alloc_msg method call always sets the
value of *skip properly.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move the initialization of a ceph connection's private pointer,
operations vector pointer, and peer name information into
ceph_con_init(). Rearrange the arguments so the connection pointer
is first. Hide the byte-swapping of the peer entity number inside
ceph_con_init()
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All references to the embedded ceph_connection come from the msgr
workqueue, which is drained prior to mon_client destruction. That
means we can ignore con refcounting entirely.
Signed-off-by: Sage Weil <sage@newdream.net> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A monitor client has a pointer to a ceph connection structure in it.
This is the only one of the three ceph client types that do it this
way; the OSD and MDS clients embed the connection into their main
structures. There is always exactly one ceph connection for a
monitor client, so there is no need to allocate it separate from the
monitor client structure.
So switch the ceph_mon_client structure to embed its
ceph_connection structure.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Once a connection is fully initialized, it is really in a CLOSED
state, so make that explicit by setting the bit in its state field.
It is possible for a connection in NEGOTIATING state to get a
failure, leading to ceph_fault() and ultimately ceph_con_close().
Clear that bits if it is set in that case, to reflect that the
connection truly is closed and is no longer participating in a
connect sequence.
Issue a warning if ceph_con_open() is called on a connection that
is not in CLOSED state.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Start explicitly keeping track of the state of a ceph connection's
socket, separate from the state of the connection itself. Create
placeholder functions to encapsulate the state transitions.
Make the socket state an atomic variable, reinforcing that it's a
distinct transtion with no possible "intermediate/both" states.
This is almost certainly overkill at this point, though the
transitions into CONNECTED and CLOSING state do get called via
socket callback (the rest of the transitions occur with the
connection mutex held). We can back out the atomicity later.
A ceph_connection holds a mixture of connection state (as in "state
machine" state) and connection flags in a single "state" field. To
make the distinction more clear, define a new "flags" field and use
it rather than the "state" field to hold Boolean flag values.
A ceph client has a pointer to a ceph messenger structure in it.
There is always exactly one ceph messenger for a ceph client, so
there is no need to allocate it separate from the ceph client
structure.
Switch the ceph_client structure to embed its ceph_messenger
structure.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The functions ceph_con_out_kvec_reset() and ceph_con_out_kvec_add()
are entirely private functions, so drop the "ceph_" prefix in their
name to make them slightly more wieldy.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change the names of the three socket callback functions to make it
more obvious they're specifically associated with a connection's
socket (not the ceph connection that uses it).
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I got lots of NULL pointer dereference Oops when compiling kernel on ceph.
The bug is because the kernel page migration routine replaces some pages
in the page cache with new pages, these new pages' private can be non-zero.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a device was open at a snapshot, and snapshots were deleted or
added, data from the wrong snapshot could be read. Instead of
assuming the snap context is constant, store the actual snap id when
the device is initialized, and rely on the OSDs to signal an error
if we try reading from a snapshot that was deleted.
A recent change made changes to the rbd_client_list be protected by
a spinlock. Unfortunately in rbd_put_client(), the lock is taken
before possibly dropping the last reference to an rbd_client, and on
the last reference that eventually calls flush_workqueue() which can
sleep.
The problem was flagged by a debug spinlock warning:
BUG: spinlock wrong CPU on CPU#3, rbd/27814
The solution is to move the spinlock acquisition and release inside
rbd_client_release(), which is the spot where it's really needed for
protecting the removal of the rbd_client from the client list.
Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In ancient times, the messenger could both initiate and accept connections.
An artifact if that was data structures to store/process an incoming
ceph_msg_connect request and send an outgoing ceph_msg_connect_reply.
Sadly, the negotiation code was referencing those structures and ignoring
important information (like the peer's connect_seq) from the correct ones.
Among other things, this fixes tight reconnect loops where the server sends
RETRY_SESSION and we (the client) retries with the same connect_seq as last
time. This bug pretty easily triggered by injecting socket failures on the
MDS and running some fs workload like workunits/direct_io/test_sync_io.
Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to flush the msgr workqueue during mon_client shutdown to
ensure that any work affecting our embedded ceph_connection is
finished so that we can be safely destroyed.
Previously, we were flushing the work queue after osd_client
shutdown and before mon_client shutdown to ensure that any osd
connection refs to authorizers are flushed. Remove the redundant
flush, and document in the comment that the mon_client flush is
needed to cover that case as well.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There were a few direct calls to ceph_con_{get,put}() instead of the con
ops from osd_client.c. This is a bug since those ops aren't defined to
be ceph_con_get/put.
This breaks refcounting on the ceph_osd structs that contain the
ceph_connections, and could lead to all manner of strangeness.
The purpose of the ->get and ->put methods in a ceph connection are
to allow the connection to indicate it has a reference to something
external to the messaging system, *not* to indicate something
external has a reference to the connection.
[elder@inktank.com: added that last sentence]
Signed-off-by: Sage Weil <sage@newdream.net> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 88ed6ea0b295f8e2383d599a04027ec596cdf97b)
In ceph_osdc_release_request(), a reference to the r_reply message
is dropped. But just after that, that same message is revoked if it
was in use to receive an incoming reply. Reorder these so we are
sure we hold a reference until we're actually done with the message.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 680584fab05efff732b5ae16ad601ba994d7b505)
Usually, we are adding pg_temp entries or removing them. Occasionally they
update. In that case, osdmap_apply_incremental() was failing because the
rbtree entry already exists.
Fix by removing the existing entry before inserting a new one.
Fixes http://tracker.newdream.net/issues/2446
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a race between two __unregister_request() callers: the
reply path and the ceph_osdc_wait_request(). If we get a reply
*and* the timeout expires at roughly the same time, both callers
will try to unregister the request, and the second one will do bad
things.
Simply check if the request is still already unregistered; if so,
return immediately and do nothing.
Fixes http://tracker.newdream.net/issues/2420
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>