Chanho Min [Tue, 26 Mar 2013 23:26:32 +0000 (10:26 +1100)]
crypto: add lz4 Cryptographic API
Add support for the lz4 and lz4hc compression algorithms using the lib/lz4/*
codebase.
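For illustration, a minimal sketch of driving the new algorithms through the
kernel's compression crypto API ("lz4" and "lz4hc" are the algorithm names
this patch registers; error handling trimmed, so treat it as a sketch rather
than the patch itself):

	#include <linux/crypto.h>
	#include <linux/err.h>

	static int lz4_crypto_compress(const u8 *src, unsigned int slen,
				       u8 *dst, unsigned int *dlen)
	{
		struct crypto_comp *tfm;
		int ret;

		tfm = crypto_alloc_comp("lz4", 0, 0);	/* or "lz4hc" */
		if (IS_ERR(tfm))
			return PTR_ERR(tfm);

		ret = crypto_comp_compress(tfm, src, slen, dst, dlen);

		crypto_free_comp(tfm);
		return ret;
	}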
Signed-off-by: Chanho Min <chanho.min@lge.com> Cc: "Darrick J. Wong" <djwong@us.ibm.com> Cc: Bob Pearson <rpearson@systemfabricworks.com> Cc: Richard Weinberger <richard@nod.at> Cc: Herbert Xu <herbert@gondor.hengli.com.au> Cc: Yann Collet <yann.collet.73@gmail.com> Cc: Kyungsik Lee <kyungsik.lee@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Chanho Min [Tue, 26 Mar 2013 23:26:31 +0000 (10:26 +1100)]
lib: add lz4 compressor module
This patchset is for supporting LZ4 compression and the crypto API using it.
As shown below, the compressed size is slightly larger, but compression
speed is faster when unaligned memory access is enabled. We can use lz4
de/compression through the crypto API as well, and it will be useful for
other potential users of lz4 compression.
lz4 Compression Benchmark:
Compiler: ARM gcc 4.6.4
ARMv7, 1 GHz based board
Kernel: linux 3.4
Uncompressed data size: 101 MB

        Compressed Size    Compression Speed
LZO     72.1 MB            32.1 MB/s, 33.0 MB/s (UA)
LZ4     75.1 MB            30.4 MB/s, 35.9 MB/s (UA)
LZ4HC   59.8 MB             2.4 MB/s,  2.5 MB/s (UA)
- UA: Unaligned memory Access support
- Latest patch set for LZO applied
This patch:
Add support for LZ4 compression in the Linux kernel. The LZ4 compression
APIs for the kernel are based on the LZ4 implementation by Yann Collet and
were adapted to the kernel coding style.
lz4_compress() provides basic lz4 compression, whereas lz4hc_compress()
provides high compression: CPU cost goes up, but the compression ratio
improves. Callers must supply pre-allocated working memory of the defined
size, and the destination buffer must be allocated with the size given by
lz4_compressbound(), as sketched below.
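A minimal sketch of the resulting calling convention (following the lib/lz4
interface described above; the exact signatures may differ in this tree):

	#include <linux/lz4.h>
	#include <linux/vmalloc.h>

	static int lz4_compress_buf(const unsigned char *src, size_t src_len,
				    unsigned char *dst, size_t *dst_len)
	{
		/* pre-allocated working memory of the defined size */
		void *wrkmem = vmalloc(LZ4_MEM_COMPRESS);
		int ret;

		if (!wrkmem)
			return -ENOMEM;

		/* dst must hold at least lz4_compressbound(src_len) bytes */
		ret = lz4_compress(src, src_len, dst, dst_len, wrkmem);

		vfree(wrkmem);
		return ret;
	}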
Signed-off-by: Chanho Min <chanho.min@lge.com> Cc: "Darrick J. Wong" <djwong@us.ibm.com> Cc: Bob Pearson <rpearson@systemfabricworks.com> Cc: Richard Weinberger <richard@nod.at> Cc: Herbert Xu <herbert@gondor.hengli.com.au> Cc: Yann Collet <yann.collet.73@gmail.com> Cc: Kyungsik Lee <kyungsik.lee@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2. ARMv7, 1.7 GHz based board
Kernel: linux 3.7
Uncompressed Kernel Size: 14 MB

        Compressed Size    Decompression Speed
LZO     6.0 MB             34.1 MB/s, 52.2 MB/s (UA)
LZ4     6.5 MB             86.7 MB/s
- UA: Unaligned memory Access support
- Latest patch set for LZO applied
This patch set adds support for an LZ4-compressed kernel. LZ4 is a very
fast lossless compression algorithm and it also features an extremely fast
decoder [1].
But we already have five decompressors, and one question that does arise
is: where do we stop adding new ones? This issue has been discussed, and
the following conclusion was reached [2].
Russell King said that we should have:
- one decompressor which is the fastest
- one decompressor for the highest compression ratio
- one popular decompressor (eg conventional gzip)
If we have a replacement one for one of these, then it should do exactly
that: replace it.
The benchmark shows an 8% increase in image size versus a 66% increase in
decompression speed compared to LZO (which has been known as the fastest
decompressor in the kernel). Therefore the "fast but may not be small"
compression title has clearly been taken by LZ4 [3].
This functionality is needed by any application or package that needs to
reconstruct Linux processes, that is, to start them in any way other than
by means of an "execve()" from an executable file. This includes:
1. Restoring processes from a checkpoint-file (by all potential
user-level checkpointing packages, not only CRIU's).
2. Restarting processes on another node after process migration.
3. Starting duplicated copies of a running process (for reliability
and high-availability).
4. Starting a process from an executable format that is not supported
by Linux, thus requiring a "manual execve" by a user-level utility.
5. Similarly, starting a process from a networked and/or encrypted
executable that, for confidentiality, licensing or other reasons,
may not be written to the local file systems.
The code that does this was already included in the Linux kernel by the
CRIU group, in the form of "prctl(PR_SET_MM)", but until now it was
enclosed within their private "#ifdef CONFIG_CHECKPOINT_RESTORE", which is
normally disabled. This patch removes those ifdefs.
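For illustration, a minimal user-space sketch of the interface in question
(PR_SET_MM_START_BRK is one of the PR_SET_MM options; the caller needs
CAP_SYS_RESOURCE, and the brk value used here is only an example):

	#include <stdio.h>
	#include <unistd.h>
	#include <sys/prctl.h>

	int main(void)
	{
		/* example only: point start_brk at the current program break */
		unsigned long new_start_brk = (unsigned long)sbrk(0);

		if (prctl(PR_SET_MM, PR_SET_MM_START_BRK, new_start_brk, 0, 0))
			perror("prctl(PR_SET_MM)");
		return 0;
	}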
Signed-off-by: Amnon Shiloh <u3557@miso.sublimeip.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
zhangwei(Jovi) [Tue, 26 Mar 2013 23:26:29 +0000 (10:26 +1100)]
init/Kconfig: make EXPERT as config instead of menuconfig
There is no EXPERT menu guard, and no config items are included in an
EXPERT menu, so change it to a config rather than a menuconfig.
This makes things clearer for users running 'make menuconfig'
or 'make nconfig'.
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Randy Dunlap [Tue, 26 Mar 2013 23:26:29 +0000 (10:26 +1100)]
kconfig menu: move Virtualization drivers near other virtualization options
Group virtualization drivers logically together (physically near each
other) in the kconfig menu by moving "Virtualization drivers" to be
near "Virtio drivers", Microsoft Hyper-V, and Xen driver support.
This is just a user-friendly, visual search change.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alexander Graf <agraf@suse.de> Cc: Stuart Yoder <stuart.yoder@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> CC [M] fs/binfmt_misc.o
> In file included from arch/x86/include/asm/uaccess.h:537:0,
> from include/linux/uaccess.h:5,
> from include/linux/highmem.h:8,
> from include/linux/pagemap.h:10,
> from fs/binfmt_misc.c:27:
> arch/x86/include/asm/uaccess_32.h: In function 'parse_command.part.1':
> arch/x86/include/asm/uaccess_32.h:211:26: warning: call to 'copy_from_user_overflow' declared with attribute warning: copy_from_user() buffer size is not provably correct [enabled by default]
Hm.. That's because it's part of lib and not obj, right?
Cc: Arnd Bergmann <arnd@arndb.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The help text for this config is duplicated across the x86, parisc, and
s390 Kconfig.debug files. Arnd Bergmann noted that the help text was
slightly misleading and should be fixed to state that enabling this option
isn't a problem when using a pre-4.4 gcc.
To simplify the rewording, consolidate the text into lib/Kconfig.debug and
modify it there to be more explicit about when you should say N to this
config.
Also, make the text a bit more generic by stating that this option enables
compile time checks so we can cover architectures which emit warnings vs.
ones which emit errors. The details of how an architecture decided to
implement the checks isn't as important as the concept of compile time
checking of copy_from_user() calls.
While we're doing this, remove all the copy_from_user_overflow() code
that's duplicated many times and place it into lib/ so that any
architecture supporting this option can get the function for free.
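The consolidated helper itself is tiny; a sketch of what lands in lib/,
modeled on the duplicated per-arch versions being removed (so approximate,
not the patch verbatim):

	#include <linux/bug.h>
	#include <linux/export.h>

	void copy_from_user_overflow(void)
	{
		WARN(1, "Buffer overflow detected!\n");
	}
	EXPORT_SYMBOL(copy_from_user_overflow);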
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Helge Deller <deller@gmx.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Benjamin LaHaise [Tue, 26 Mar 2013 23:26:28 +0000 (10:26 +1100)]
aio: fix kioctx not being freed after cancellation at exit time
The recent changes overhauling fs/aio.c introduced a bug that results in the
kioctx not being freed when outstanding kiocbs are cancelled at exit_aio()
time. Specifically, a kiocb that is cancelled has its completion events
discarded by batch_complete_aio(), which then fails to wake up the process
stuck in free_ioctx(). Fix this by removing the event suppression in
batch_complete_aio() and modify the wait_event() condition in free_ioctx()
appropriately.
This patch was tested with the cancel operation in the thread based code
posted yesterday.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Josh Boyer <jwboyer@redhat.com> Cc: Zach Brown <zab@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:27 +0000 (10:26 +1100)]
mtip32xx: convert to batch completion
[asamymuthupa@micron.com:
* changes for conversion to bio batch completion from Kent
* fix to apply the above changes cleanly on latest mtip32xx code
* batch bio completion changes in
* mtip_command_cleanup()
* mtip_timeout_function()
* mtip_handle_tfe()]
Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
WARNING: line over 80 characters
#155: FILE: block/blk-core.c:2527:
+ unsigned int nr_bytes, unsigned int bidi_bytes,
WARNING: line over 80 characters
#177: FILE: block/blk-core.c:2643:
+void blk_end_request_all_batch(struct request *rq, int error, struct batch_complete *batch)
WARNING: line over 80 characters
#186: FILE: block/blk-core.c:2651:
+ pending = __blk_end_bidi_request(rq, error, blk_rq_bytes(rq), bidi_bytes, batch);
WARNING: line over 80 characters
#633: FILE: fs/direct-io.c:233:
+static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret, bool is_async,
WARNING: line over 80 characters
#652: FILE: fs/direct-io.c:278:
+static void dio_bio_end_aio(struct bio *bio, int error, struct batch_complete *batch)
total: 0 errors, 5 warnings, 740 lines checked
./patches/block-aio-batch-completion-for-bios-kiocbs.patch has style problems, please review.
If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: Kent Overstreet <koverstreet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:26 +0000 (10:26 +1100)]
block, aio: batch completion for bios/kiocbs
When completing a kiocb, there's some fixed overhead from touching the
kioctx's ring buffer the kiocb belongs to. Some newer high-end block
devices can complete multiple IOs per interrupt, much like many network
interfaces have done for some time.
This plumbs through infrastructure so we can take advantage of multiple
completions at the interrupt level, and complete multiple kiocbs at the
same time.
Drivers have to be converted to take advantage of this, but it's a simple
change and the next patches will convert a few drivers.
To use it, an interrupt handler (or any code that completes bios or
requests) declares and initializes a struct batch_complete:
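(A sketch of that usage; batch_complete_init(), bio_endio_batch() and
batch_complete() are the interfaces this series introduces, and the exact
calling convention may differ.)

	struct batch_complete batch;

	batch_complete_init(&batch);

	/* for each completed bio, instead of bio_endio(bio, error): */
	bio_endio_batch(bio, error, &batch);

	/* at the end: complete all bios at once, then their kiocbs */
	batch_complete(&batch);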
WARNING: line over 80 characters
#116: FILE: drivers/block/drbd/drbd_bitmap.c:940:
+static void bm_async_io_complete(struct bio *bio, int error, struct batch_complete *batch)
WARNING: line over 80 characters
#128: FILE: drivers/block/drbd/drbd_worker.c:67:
+void drbd_md_io_complete(struct bio *bio, int error, struct batch_complete *batch)
WARNING: line over 80 characters
#137: FILE: drivers/block/drbd/drbd_worker.c:169:
+void drbd_peer_request_endio(struct bio *bio, int error, struct batch_complete *batch)
WARNING: line over 80 characters
#146: FILE: drivers/block/drbd/drbd_worker.c:205:
+void drbd_request_endio(struct bio *bio, int error, struct batch_complete *batch)
ERROR: "foo * bar" should be "foo *bar"
#562: FILE: drivers/md/raid5.c:1743:
+static void raid5_end_read_request(struct bio * bi, int error,
WARNING: line over 80 characters
#946: FILE: fs/btrfs/volumes.c:5021:
+static void btrfs_end_bio(struct bio *bio, int err, struct batch_complete *batch)
Kent Overstreet [Tue, 26 Mar 2013 23:26:24 +0000 (10:26 +1100)]
aio: kill ki_retry
Thanks to Zach Brown's work to rip out the retry infrastructure, we don't
need this anymore - ki_retry was only called right after the kiocb was
initialized.
This also refactors and trims some duplicated code, as well as cleaning up
the refcounting/error handling a bit.
[akpm@linux-foundation.org: use fmode_t in aio_run_iocb()] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Tue, 26 Mar 2013 23:26:23 +0000 (10:26 +1100)]
aio-dont-include-aioh-in-schedh-fix
fs/bio.c: In function 'bio_alloc_bioset':
fs/bio.c:394: error: 'UIO_MAXIOV' undeclared (first use in this function)
fs/bio.c:394: error: (Each undeclared identifier is reported only once
fs/bio.c:394: error: for each function it appears in.)
fs/bio.c: In function 'bio_alloc_map_data':
fs/bio.c:963: error: 'UIO_MAXIOV' undeclared (first use in this function)
make[1]: *** [fs/bio.o] Error 1
make: *** [fs/bio.o] Error 2
Cc: Kent Overstreet <koverstreet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:23 +0000 (10:26 +1100)]
aio: use xchg() instead of completion_lock
So, for sticking kiocb completions on the kioctx ringbuffer, we need a
lock - it unfortunately can't be lockless.
When the kioctx is shared between threads on different cpus and the rate
of completions is high, this lock sees quite a bit of contention - in
terms of cacheline contention it's the hottest thing in the aio subsystem.
That means, with a regular spinlock, we're going to take a cache miss to
grab the lock, then another cache miss when we touch the data the lock
protects - if it's on the same cacheline as the lock, other cpus spinning
on the lock are going to be pulling it out from under us as we're using
it.
So, we use an old trick to get rid of this second forced cache miss - make
the data the lock protects be the lock itself, so we grab them both at
once.
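A sketch of that trick (names are illustrative, not the patch verbatim):
the ring tail is itself the lock word, and xchg() swaps in a sentinel to
take it.

	static unsigned kioctx_ring_lock(struct kioctx *ctx)
	{
		unsigned tail;

		/* the tail is the lock: swap in a sentinel and spin
		 * until we read back a real tail value */
		while ((tail = xchg(&ctx->tail, UINT_MAX)) == UINT_MAX)
			cpu_relax();

		return tail;
	}

	static void kioctx_ring_unlock(struct kioctx *ctx, unsigned tail)
	{
		smp_wmb();		/* order event writes before the tail */
		ctx->tail = tail;	/* storing a real tail drops the lock */
	}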
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:22 +0000 (10:26 +1100)]
generic dynamic per cpu refcounting
This implements a refcount with similar semantics to
atomic_get()/atomic_dec_and_test(), that starts out as just an atomic_t
but dynamically switches to per cpu refcounting when the rate of gets/puts
becomes too high.
It also implements two stage shutdown, as we need it to tear down the
percpu counts. Before dropping the initial refcount, you must call
percpu_ref_kill(); this puts the refcount in "shutting down mode" and
switches back to a single atomic refcount with the appropriate barriers
(synchronize_rcu()).
It's also legal to call percpu_ref_kill() multiple times - it only returns
true once, so callers don't have to reimplement shutdown synchronization.
For the sake of simplicity/efficiency, the heuristic is pretty simple - it
just switches to percpu refcounting if there are more than x gets in one
second (completely arbitrarily, 4096).
It'd be more correct to count the number of cache misses or something else
more profile driven, but doing so would require accessing the shared ref
twice per get - by just counting the number of gets(), we can stick that
counter in the high bits of the refcount and increment both with a single
atomic64_add(). But I expect this'll be good enough in practice.
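A usage sketch of the two-stage shutdown (assuming, per the
atomic_dec_and_test() analogy above, that percpu_ref_put() reports when
the count drops to zero):

	static void thing_shutdown(struct thing *t)
	{
		/* percpu_ref_kill() returns true only for the first caller,
		 * so concurrent shutdown needs no extra synchronization */
		if (!percpu_ref_kill(&t->refs))
			return;

		/* drop the initial ref; the final put frees the object */
		if (percpu_ref_put(&t->refs))
			free_thing(t);
	}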
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: coding-style tweak] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:22 +0000 (10:26 +1100)]
aio: percpu reqs_available
See the previous patch ("aio: reqs_active -> reqs_available") for why we
want to do this - this basically implements a per cpu allocator for
reqs_available that doesn't actually allocate anything.
Note that we need to increase the size of the ringbuffer we allocate,
since a single thread won't necessarily be able to use all the
reqs_available slots - some (up to about half) might be on other per cpu
lists, unavailable for the current thread.
We size the ringbuffer based on the nr_events userspace passed to
io_setup(), so this is a slight behaviour change - but nr_events wasn't
being used as a hard limit before; it was being rounded up to the next
page, so this doesn't change the actual semantics.
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:21 +0000 (10:26 +1100)]
aio: reqs_active -> reqs_available
The number of outstanding kiocbs is one of the few shared things left that
has to be touched for every kiocb - it'd be nice to make it percpu.
We can make it per cpu by treating it like an allocation problem: we have
a maximum number of kiocbs that can be outstanding (i.e. slots) - then we
just allocate and free slots, and we know how to write per cpu allocators.
So as prep work for that, we convert reqs_active to reqs_available.
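In allocator terms the conversion looks roughly like this (hypothetical
helpers for this sketch; atomic_dec_if_positive() refuses to take the
count below zero):

	/* allocate one request slot, if any remain */
	static bool get_req_slot(struct kioctx *ctx)
	{
		return atomic_dec_if_positive(&ctx->reqs_available) >= 0;
	}

	/* free a slot once its completion event has been reaped */
	static void put_req_slot(struct kioctx *ctx)
	{
		atomic_inc(&ctx->reqs_available);
	}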
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:21 +0000 (10:26 +1100)]
aio: give shared kioctx fields their own cachelines
[akpm@linux-foundation.org: make reqs_active __cacheline_aligned_in_smp] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:21 +0000 (10:26 +1100)]
aio: kill struct aio_ring_info
struct aio_ring_info was kind of odd, the only place it's used is where
it's embedded in struct kioctx - there's no real need for it.
The next patch rearranges struct kioctx and puts various things on their
own cachelines - getting rid of struct aio_ring_info now makes that
reordering a bit clearer.
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:21 +0000 (10:26 +1100)]
aio: kill batch allocation
Previously, allocating a kiocb required touching quite a few global (well,
per kioctx) cachelines... so batching up allocation to amortize those was
worthwhile. But we've gotten rid of some of those, and in another couple
of patches kiocb allocation won't require writing to any shared
cachelines, so that means we can just rip this code out.
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:20 +0000 (10:26 +1100)]
aio: change reqs_active to include unreaped completions
The aio code tries really hard to avoid having to deal with the completion
ringbuffer overflowing. To do that, it has to keep track of the number of
outstanding kiocbs, and the number of completions currently in the
ringbuffer - and it's got to check that every time we allocate a kiocb.
Ouch.
But - we can improve this quite a bit if we just change reqs_active to
mean "number of outstanding requests and unreaped completions" - that
means kiocb allocation doesn't have to look at the ringbuffer, which is a
fairly significant win.
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:20 +0000 (10:26 +1100)]
aio: use cancellation list lazily
Cancelling kiocbs requires adding them to a per kioctx linked list, which
is one of the few things we need to take the kioctx lock for in the fast
path. But most kiocbs can't be cancelled - so if we just do this lazily,
we can avoid quite a bit of locking overhead.
While we're at it, instead of using a flag bit, switch to using ki_cancel
itself to indicate that a kiocb has been cancelled/completed. This lets
us get rid of ki_flags entirely.
[akpm@linux-foundation.org: remove buggy BUG()] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:19 +0000 (10:26 +1100)]
aio: make aio_read_evt() more efficient, convert to hrtimers
Previously, aio_read_evt() pulled a single completion off the ringbuffer
at a time, locking and unlocking each time. Change it to pull off as many
events as it can at a time, and copy them directly to userspace.
This also fixes a bug where if copying the event to userspace failed,
we'd lose the event.
Also convert it to wait_event_interruptible_hrtimeout(), which
simplifies it quite a bit.
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:19 +0000 (10:26 +1100)]
wait: add wait_event_hrtimeout()
Analogous to wait_event_timeout() and friends, this adds
wait_event_hrtimeout() and wait_event_interruptible_hrtimeout().
Note that unlike the versions that use regular timers, these don't return
the amount of time remaining when they return - instead, they return 0 or
-ETIME if they timed out - because I was uncomfortable with the semantics
of doing it the other way (and not confident I could get it right, anyway).
If the timer expires, there's no real guarantee that expire_time -
current_time would be <= 0 - due to timer slack certainly, and I'm not
sure I want to know the implications of the different clock bases in
hrtimers.
If the timer does expire and the code calculates that the time remaining
is nonnegative, that could be even worse if the calling code then reuses
that timeout. Probably safer to just return 0 then, but I could imagine
weird bugs or at least unintended behaviour arising from that too.
I came to the conclusion that if other users end up actually needing the
amount of time remaining, the sanest thing to do would be to create a
version that uses absolute timeouts instead of relative.
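A usage sketch (kioctx_has_events() is a hypothetical predicate for this
sketch; per the semantics above, 0 means the condition became true, -ETIME
means timeout, and the interruptible variant can also return -ERESTARTSYS
on a signal):

	#include <linux/wait.h>
	#include <linux/ktime.h>

	static int wait_for_events(struct kioctx *ctx)
	{
		/* wait at most 10ms for an event to arrive */
		ktime_t until = ktime_set(0, 10 * NSEC_PER_MSEC);

		return wait_event_interruptible_hrtimeout(ctx->wait,
							  kioctx_has_events(ctx),
							  until);
	}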
[akpm@linux-foundation.org: fix description of `timeout' arg] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:18 +0000 (10:26 +1100)]
aio: refcounting cleanup
The usage of ctx->dead was fubar - it makes no sense to explicitly check
it all over the place, especially when we're already using RCU.
Now, ctx->dead only indicates whether we've dropped the initial
refcount. The new teardown sequence is:
set ctx->dead
hlist_del_rcu();
synchronize_rcu();
Now we know no system calls can take a new ref, and it's safe to drop
the initial ref:
put_ioctx();
We also need to ensure there are no more outstanding kiocbs. This was
done incorrectly - it was being done in kill_ctx(), and before dropping
the initial refcount. At this point, other syscalls may still be
submitting kiocbs!
Now, we cancel and wait for outstanding kiocbs in free_ioctx(), after
kioctx->users has dropped to 0 and we know no more iocbs could be
submitted.
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:18 +0000 (10:26 +1100)]
aio: make aio_put_req() lockless
Freeing a kiocb needed to touch the kioctx for three things:
* Pulling it off the reqs_active list
* Decrementing reqs_active
* Issuing a wakeup, if the kioctx was in the process of being freed.
This patch moves these to aio_complete(), for a couple of reasons:
* aio_complete() already has to issue the wakeup, so if we drop the
kioctx refcount before aio_complete does its wakeup we don't have to
do it twice.
* aio_complete currently has to take the kioctx lock, so it makes sense
for it to pull the kiocb off the reqs_active list too.
* A later patch is going to change reqs_active to include unreaped
completions - this will mean allocating a kiocb doesn't have to look
at the ringbuffer. So taking the decrement of reqs_active out of
kiocb_free() is useful prep work for that patch.
This doesn't really affect cancellation, since existing (usb) code that
implements a cancel function still calls aio_complete() - we just have
to make sure that aio_complete does the necessary teardown for cancelled
kiocbs.
It does affect code paths where we free kiocbs that were never
submitted; they need to decrement reqs_active and pull the kiocb off the
reqs_active list. This occurs in two places: kiocb_batch_free(), which
is going away in a later patch, and the error path in io_submit_one.
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kent Overstreet [Tue, 26 Mar 2013 23:26:17 +0000 (10:26 +1100)]
aio: do fget() after aio_get_req()
aio_get_req() will fail if we have the maximum number of requests
outstanding, which depending on the application may not be uncommon. So
avoid doing an unnecessary fget().
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
WARNING: braces {} are not necessary for any arm of this statement
#527: FILE: fs/aio.c:1289:
+ if (unlikely(kiocbIsCancelled(req))) {
[...]
+ } else {
[...]
WARNING: line over 80 characters
#543: FILE: fs/aio.c:1300:
+ ret == -ERESTARTNOHAND || ret == -ERESTART_RESTARTBLOCK))
total: 0 errors, 3 warnings, 628 lines checked
./patches/aio-remove-retry-based-aio.patch has style problems, please review.
If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: Kent Overstreet <koverstreet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zach Brown [Tue, 26 Mar 2013 23:26:15 +0000 (10:26 +1100)]
aio: remove retry-based AIO
This removes the retry-based AIO infrastructure now that nothing in tree
is using it.
We want to remove retry-based AIO because it is fundamentally unsafe. It
retries IO submission from a kernel thread that has only assumed the mm of
the submitting task. All other task_struct references in the IO
submission path will see the kernel thread, not the submitting task. This
design flaw means that nothing of any meaningful complexity can use
retry-based AIO.
This removes all the code and data associated with the retry machinery.
The most significant benefit of this is the removal of the locking around
the unused run list in the submission path.
This has only been compiled.
Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Zach Brown <zab@redhat.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zach Brown [Tue, 26 Mar 2013 23:26:15 +0000 (10:26 +1100)]
gadget: remove only user of aio retry
This removes the only in-tree user of aio retry. This will let us remove
the retry code from the aio core.
Removing retry is relatively easy as the USB gadget wasn't using it to
retry IOs at all. It always fully submitted the IO in the context of the
initial io_submit() call. It only used the AIO retry facility to get the
submitter's mm context for copying the result of a read back to user
space. This is easy to implement with use_mm() and a work struct, much
like kvm does with async_pf_execute() for get_user_pages().
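A sketch of that pattern (struct gadget_req and its fields are
hypothetical here; the mm reference is assumed to have been taken at
submit time):

	#include <linux/mmu_context.h>
	#include <linux/uaccess.h>
	#include <linux/workqueue.h>

	static void copy_back_work(struct work_struct *work)
	{
		struct gadget_req *req = container_of(work, struct gadget_req, work);

		use_mm(req->mm);		/* adopt the submitter's mm */
		if (copy_to_user(req->ubuf, req->kbuf, req->len))
			req->status = -EFAULT;
		unuse_mm(req->mm);

		aio_complete(req->iocb, req->status, 0);
		mmput(req->mm);			/* drop the submit-time ref */
	}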
Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Akinobu Mita [Tue, 26 Mar 2013 23:26:12 +0000 (10:26 +1100)]
net: rename random32 to prandom
Commit 496f2f93b1cc286f5a4f4f9acdc1e5314978683f ("random32: rename
random32 to prandom") renamed random32() and srandom32() to prandom_u32()
and prandom_seed() respectively.
net_random() and net_srandom() need to be redefined with prandom_* in
order to finish the naming transition.
While I'm at it, enclose the macro argument of net_srandom() in parentheses.
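The resulting wrappers amount to something like (a sketch matching the
description above):

	#define net_random()		prandom_u32()
	#define net_srandom(seed)	prandom_seed((__force u32)(seed))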
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Akinobu Mita [Tue, 26 Mar 2013 23:26:11 +0000 (10:26 +1100)]
net/core: remove duplicate statements by do-while loop
Remove duplicate statements by using a do-while loop instead of a while loop.
- A;
- while (e) {
+ do {
A;
- }
+ } while (e);
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Akinobu Mita [Tue, 26 Mar 2013 23:26:11 +0000 (10:26 +1100)]
net/core: rename random32() to prandom_u32()
Use the preferable function name, which implies the use of a pseudo-random
number generator.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Akinobu Mita [Tue, 26 Mar 2013 23:26:11 +0000 (10:26 +1100)]
net/netfilter: rename random32() to prandom_u32()
Use the preferable function name, which implies the use of a pseudo-random
number generator.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Akinobu Mita [Tue, 26 Mar 2013 23:26:10 +0000 (10:26 +1100)]
net/sched: rename random32() to prandom_u32()
Use the preferable function name, which implies the use of a pseudo-random
number generator.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Akinobu Mita [Tue, 26 Mar 2013 23:26:09 +0000 (10:26 +1100)]
scsi: rename random32() to prandom_u32()
Use the preferable function name, which implies the use of a pseudo-random
number generator.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Robert Love <robert.w.love@intel.com> Cc: James Smart <james.smart@emulex.com> Cc: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Akinobu Mita [Tue, 26 Mar 2013 23:26:09 +0000 (10:26 +1100)]
lguest: rename random32() to prandom_u32()
Use the preferable function name, which implies the use of a pseudo-random
number generator.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Akinobu Mita [Tue, 26 Mar 2013 23:26:08 +0000 (10:26 +1100)]
infiniband: rename random32() to prandom_u32()
Use the preferable function name, which implies the use of a pseudo-random
number generator.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Akinobu Mita [Tue, 26 Mar 2013 23:26:06 +0000 (10:26 +1100)]
x86: rename random32() to prandom_u32()
Use the preferable function name, which implies the use of a pseudo-random
number generator.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Akinobu Mita [Tue, 26 Mar 2013 23:26:06 +0000 (10:26 +1100)]
x86: pageattr-test: remove srandom32 call
pageattr-test calls srandom32() once every test iteration. But calling
srandom32() after late initcalls is not meaningful, because the random
state for random32() has already been mixed with good random numbers by
the late_initcall prandom_reseed().
So this removes the call to srandom32().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Akinobu Mita [Tue, 26 Mar 2013 23:26:05 +0000 (10:26 +1100)]
raid6test: use prandom_bytes()
Use prandom_bytes() to generate random bytes for test data.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Dan Williams <djbw@fb.com> Cc: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kernel/pid.c: improve flow of a loop inside alloc_pidmap.
find_next_offset() searches for an available (zero) bit in the respective
pid bitmap (page); it returns the offset if one is found, and otherwise a
value equal to BITS_PER_PAGE.
Suppose find_next_offset() doesn't find any available bit: then there is
no point in calling mk_pid() (wasted CPU cycles).
It is therefore better to call mk_pid() only after the check (offset <
BITS_PER_PAGE) succeeds. Another point: if (offset < BITS_PER_PAGE)
fails, mk_pid() would otherwise be called again afterwards.
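A sketch of the reordering (simplified from alloc_pidmap(); claiming the
bit is elided):

	offset = find_next_offset(map, offset);
	if (offset < BITS_PER_PAGE) {
		/* only compute the pid once a free bit was actually found */
		pid = mk_pid(pid_ns, map, offset);
		/* then try to claim the bit and return pid on success */
	}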
Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Manfred Spraul [Tue, 26 Mar 2013 23:26:03 +0000 (10:26 +1100)]
ipc/sem.c: alternatives to preempt_disable()
ipc/sem.c uses a custom wakeup scheme that relies on preempt_disable().
On -RT, this causes increased latencies and debug warnings.
The patch adds two additional schemes:
- one built around a completion - could be better for -RT kernels
- one built around a spinlock - unfortunately it's broken
- and the current one
My preferred solution would be the spinlock implementation: RT would use
preemptible spinlocks, mainline normal spinlocks. Thus both would get the
optimal implementation without any special code in ipc/sem.c.
Unfortunately, I don't see how it could be fixed.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Davidlohr Bueso [Tue, 26 Mar 2013 23:26:03 +0000 (10:26 +1100)]
ipc, sem: prevent possible deadlock
In semctl_main(), when cmd == GETALL, we're locking sma->sem_perm.lock
(through sem_lock_and_putref), yet after the conditional, we lock it
again. Unlock sma right after exiting the conditional.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rik van Riel [Tue, 26 Mar 2013 23:26:03 +0000 (10:26 +1100)]
fix for sem_lock
Fix a typo in sem_lock. Of course we need to unlock the local
semaphore lock before jumping to lock_all, in the rare case that
somebody started a complex operation while we were spinning on
the spinlock.
Can be folded into patch 7/7 before merging
Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rik van Riel [Tue, 26 Mar 2013 23:26:03 +0000 (10:26 +1100)]
ipc,sem: fine grained locking for semtimedop
Introduce finer grained locking for semtimedop, to handle the common case
of a program wanting to manipulate one semaphore from an array with
multiple semaphores.
If the call is a semop manipulating just one semaphore in an array with
multiple semaphores, only take the lock for that semaphore itself.
If the call needs to manipulate multiple semaphores, or another caller is
in a transaction that manipulates multiple semaphores, the sem_array lock
is taken, as well as all the locks for the individual semaphores.
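A sketch of that rule (simplified; field names follow ipc/sem.c, and the
real locking must also handle races with in-flight complex operations):

	if (nsops == 1 && !sma->complex_count) {
		/* simple op: take only the lock of the semaphore involved */
		spin_lock(&sma->sem_base[sops->sem_num].lock);
	} else {
		int i;

		/* complex op: take the array lock plus every per-sem lock */
		spin_lock(&sma->sem_perm.lock);
		for (i = 0; i < sma->sem_nsems; i++)
			spin_lock(&sma->sem_base[i].lock);
	}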
On a 24 CPU system, performance numbers with the semop-multi
test, with N threads and N semaphores, look like this:
Rik van Riel [Tue, 26 Mar 2013 23:26:02 +0000 (10:26 +1100)]
ipc,sem: have only one list in struct sem_queue
Having only one list in struct sem_queue, and only queueing simple
semaphore operations on the list for the semaphore involved, allows us to
introduce finer grained locking for semtimedop.
Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Chegu Vinod <chegu_vinod@hp.com> Cc: Emmanuel Benisty <benisty.e@gmail.com> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>