David S. Miller [Wed, 19 Dec 2012 23:19:11 +0000 (15:19 -0800)]
sparc64: Fix unrolled AES 256-bit key loops.
The basic scheme of the block mode assembler is that we start by
enabling the FPU, loading the key into the floating point registers,
then iterate calling the encrypt/decrypt routine for each block.
For the 256-bit key cases, we run short on registers in the unrolled
loops.
So the {ENCRYPT,DECRYPT}_256_2() macros reload the key registers that
get clobbered.
The unrolled macros, {ENCRYPT,DECRYPT}_256(), are not mindful of this.
So if we have a mix of multi-block and single-block calls, the
single-block unrolled 256-bit encrypt/decrypt can run with some
of the key registers clobbered.
Handle this by always explicitly loading those registers before using
the non-unrolled 256-bit macro.
This was discovered thanks to all of the new test cases added by
Jussi Kivilinna.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Dec 2012 00:06:16 +0000 (16:06 -0800)]
sparc64: Define pte_accessible()
We can elide flush_tlb_*() calls when _PAGE_VALID is clear
as that is the test used to determine whether or not to
queue up a TLB flush in set_pte_at().
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Kleikamp [Mon, 17 Dec 2012 17:52:47 +0000 (11:52 -0600)]
sparc: huge_ptep_set_* functions need to call set_huge_pte_at()
Modifying the huge pte's requires that all the underlying pte's be
modified.
Version 2: added missing flush_tlb_page()
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 18 Dec 2012 20:46:37 +0000 (12:46 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull second round of input updates from Dmitry Torokhov:
"As usual, there are a couple of new drivers, input core now supports
managed input devices (devres), a slew of drivers now have device tree
support and a bunch of fixes and cleanups."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (71 commits)
Input: walkera0701 - fix crash on startup
Input: matrix-keymap - provide a proper module license
Input: gpio_keys_polled - switch to using gpio_request_one()
Input: gpio_keys - switch to using gpio_request_one()
Input: wacom - fix touch support for Bamboo Fun CTH-461
Input: xpad - add a few new VID/PID combinations
Input: xpad - minor formatting fixes
Input: gpio-keys-polled - honor 'autorepeat' setting in platform data
Input: tca8418-keypad - switch to using managed resources
Input: tca8418_keypad - increase severity of failures in probe()
Input: tca8418_keypad - move device ID tables closer to where they are used
Input: tca8418_keypad - use dev_get_platdata() to retrieve platform data
Input: tca8418_keypad - use a temporary variable for parent device
Input: tca8418_keypad - add support for shared interrupt
Input: tca8418_keypad - add support for device tree bindings
Input: remove Compaq iPAQ H3600 (Bitsy) touchscreen driver
Input: bu21013_ts - add support for Device Tree booting
Input: bu21013_ts - move GPIO init and exit functions into the driver
Input: bu21013_ts - request regulator that actually exists
ARM: ux500: Strip out duplicate touch screen platform information
...
Linus Torvalds [Tue, 18 Dec 2012 20:34:29 +0000 (12:34 -0800)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull powertool update from Len Brown:
"This updates the tree w/ the latest version of turbostat, which
reports temperature and - on SNB and later - Watts."
Fix up semantic merge conflict as per Len.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools: Allow tools to be installed in a user specified location
tools/power: turbostat: make Makefile a bit more capable
tools/power x86_energy_perf_policy: close /proc/stat in for_every_cpu()
tools/power turbostat: v3.0: monitor Watts and Temperature
tools/power turbostat: fix output buffering issue
tools/power turbostat: prevent infinite loop on migration error path
x86 power: define RAPL MSRs
tools/power/x86/turbostat: share kernel MSR #defines
Linus Torvalds [Tue, 18 Dec 2012 20:28:39 +0000 (12:28 -0800)]
Merge branch 'tip/perf/core-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull minor tracing updates and fixes from Steven Rostedt:
"It seems that one of my old pull requests have slipped through.
The changes are contained to just the files that I maintain, and are
changes from others that I told I would get into this merge window.
They have already been in linux-next for several weeks, and should be
well tested."
* 'tip/perf/core-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Remove unnecessary WARN_ONCE's from tracing_buffers_splice_read
tracing: Remove unneeded checks from the stack tracer
tracing: Add a resize function to make one buffer equivalent to another buffer
Linus Torvalds [Tue, 18 Dec 2012 20:26:54 +0000 (12:26 -0800)]
Merge tag 'stable/for-linus-3.8-rc0-bugfix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen bugfixes from Konrad Rzeszutek Wilk:
"Two fixes. One of them is caused by the recent change introduced by
the 'x86-bsp-hotplug-for-linus' tip tree that inhibited bootup (old
function does not do what it used to do). The other one is just a
vanilla bug.
- Fix to bootup regression introduced by 'x86-bsp-hotplug-for-linus'
tip branch.
- Fix to vcpu hotplug code."
* tag 'stable/for-linus-3.8-rc0-bugfix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/vcpu: Fix vcpu restore path.
xen: Add EVTCHNOP_reset in Xen interface header files.
xen/smp: Use smp_store_boot_cpu_info() to store cpu info for BSP during boot time.
Linus Torvalds [Tue, 18 Dec 2012 18:56:07 +0000 (10:56 -0800)]
Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull SLAB changes from Pekka Enberg:
"This contains preparational work from Christoph Lameter and Glauber
Costa for SLAB memcg and cleanups and improvements from Ezequiel
Garcia and Joonsoo Kim.
Please note that the SLOB cleanup commit from Arnd Bergmann already
appears in your tree but I had also merged it myself which is why it
shows up in the shortlog."
* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
mm/sl[aou]b: Common alignment code
slab: Use the new create_boot_cache function to simplify bootstrap
slub: Use statically allocated kmem_cache boot structure for bootstrap
mm, sl[au]b: create common functions for boot slab creation
slab: Simplify bootstrap
slub: Use correct cpu_slab on dead cpu
mm: fix slab.c kernel-doc warnings
mm/slob: use min_t() to compare ARCH_SLAB_MINALIGN
slab: Ignore internal flags in cache creation
mm/slob: Use free_page instead of put_page for page-size kmalloc allocations
mm/sl[aou]b: Move common kmem_cache_size() to slab.h
mm/slob: Use object_size field in kmem_cache_size()
mm/slob: Drop usage of page->private for storing page-sized allocations
slub: Commonize slab_cache field in struct page
sl[au]b: Process slabinfo_show in common code
mm/sl[au]b: Move print_slabinfo_header to slab_common.c
mm/sl[au]b: Move slabinfo processing to slab_common.c
slub: remove one code path and reduce lock contention in __slab_free()
Linus Torvalds [Tue, 18 Dec 2012 18:55:28 +0000 (10:55 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull (again) user namespace infrastructure changes from Eric Biederman:
"Those bugs, those darn embarrasing bugs just want don't want to get
fixed.
Linus I just updated my mirror of your kernel.org tree and it appears
you successfully pulled everything except the last 4 commits that fix
those embarrasing bugs.
When you get a chance can you please repull my branch"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Fix typo in description of the limitation of userns_install
userns: Add a more complete capability subset test to commit_creds
userns: Require CAP_SYS_ADMIN for most uses of setns.
Fix cap_capable to only allow owners in the parent user namespace to have caps.
Linus Torvalds [Tue, 18 Dec 2012 18:10:36 +0000 (10:10 -0800)]
Merge tag 'disintegrate-alpha-20121217' of git://git.infradead.org/users/dhowells/linux-headers
Pull UAPI disintegration for Alpha from David Howells:
"I've been asked to send the Alpha UAPI disintegration to you directly.
The acks I have been given have been added into the patch."
* tag 'disintegrate-alpha-20121217' of git://git.infradead.org/users/dhowells/linux-headers:
UAPI: (Scripted) Disintegrate arch/alpha/include/asm
Linus Torvalds [Tue, 18 Dec 2012 18:08:47 +0000 (10:08 -0800)]
Merge tag 'for-3.8' of git://openrisc.net/~jonas/linux
Pull OpenRISC update from Jonas Bonn:
"Trivial cleanups for OpenRISC."
* tag 'for-3.8' of git://openrisc.net/~jonas/linux:
openrisc: use kbuild.h instead of defining macros in asm-offset.c
openrisc: Use Kbuild infrastructure for kvm_para.h
Linus Torvalds [Tue, 18 Dec 2012 18:06:00 +0000 (10:06 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 update #2 from Martin Schwidefsky:
"The main patch is the function measurement blocks extension for PCI to
do performance statistics and help with debugging. The other patch is
a small cleanup in ccwdev.h."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ccwdev: Include asm/schid.h.
s390/pci: performance statistics and debug infrastructure
Linus Torvalds [Tue, 18 Dec 2012 17:58:09 +0000 (09:58 -0800)]
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc update from Benjamin Herrenschmidt:
"The main highlight is probably some base POWER8 support. There's more
to come such as transactional memory support but that will wait for
the next one.
Overall it's pretty quiet, or rather I've been pretty poor at picking
things up from patchwork and reviewing them this time around and Kumar
no better on the FSL side it seems..."
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (73 commits)
powerpc+of: Rename and fix OF reconfig notifier error inject module
powerpc: mpc5200: Add a3m071 board support
powerpc/512x: don't compile any platform DIU code if the DIU is not enabled
powerpc/mpc52xx: use module_platform_driver macro
powerpc+of: Export of_reconfig_notifier_[register,unregister]
powerpc/dma/raidengine: add raidengine device
powerpc/iommu/fsl: Add PAMU bypass enable register to ccsr_guts struct
powerpc/mpc85xx: Change spin table to cached memory
powerpc/fsl-pci: Add PCI controller ATMU PM support
powerpc/86xx: fsl_pcibios_fixup_bus requires CONFIG_PCI
drivers/virt: the Freescale hypervisor driver doesn't need to check MSR[GS]
powerpc/85xx: p1022ds: Use NULL instead of 0 for pointers
powerpc: Disable relocation on exceptions when kexecing
powerpc: Enable relocation on during exceptions at boot
powerpc: Move get_longbusy_msecs into hvcall.h and remove duplicate function
powerpc: Add wrappers to enable/disable relocation on exceptions
powerpc: Add set_mode hcall
powerpc: Setup relocation on exceptions for bare metal systems
powerpc: Move initial mfspr LPCR out of __init_LPCR
powerpc: Add relocation on exception vector handlers
...
David Rientjes [Tue, 18 Dec 2012 08:03:23 +0000 (00:03 -0800)]
x86, paravirt: fix build error when thp is disabled
With CONFIG_PARAVIRT=y and CONFIG_TRANSPARENT_HUGEPAGE=n, the build breaks
because set_pmd_at() is undeclared:
mm/memory.c: In function 'do_pmd_numa_page':
mm/memory.c:3520: error: implicit declaration of function 'set_pmd_at'
mm/mprotect.c: In function 'change_pmd_protnuma':
mm/mprotect.c:120: error: implicit declaration of function 'set_pmd_at'
This is because paravirt defines set_pmd_at() only when
CONFIG_TRANSPARENT_HUGEPAGE=y and such a restriction is unneeded. The
fix is to define it for all CONFIG_PARAVIRT configurations.
Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 18 Dec 2012 17:44:44 +0000 (09:44 -0800)]
Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
Pull exofs changes from Boaz Harrosh:
"These are just 3 patches, the last two are bug fixes on the error
paths in exofs.
The important patch is the one to osd_uld which adds sysfs info to osd
devices for use by user-mode clustering discovery software. I'm
already sitting on this patch since before February this year, It is
important for some of the big installation cluster systems, who's been
compiling their own kernel just for that patch."
Ugh. The osd_uld patch already went through the SCSI tree, so this was
kind of pointless. But at least it has the two small error-path fixes..
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
exofs: don't leak io_state and pages on read error
exofs: clean up the correct page collection on write error
osduld: Add osdname & systemid sysfs at scsi_osd class
Linus Torvalds [Tue, 18 Dec 2012 17:42:05 +0000 (09:42 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs update from Chris Mason:
"A big set of fixes and features.
In terms of line count, most of the code comes from Stefan, who added
the ability to replace a single drive in place. This is different
from how btrfs normally replaces drives, and is much much much faster.
Josef is plowing through our synchronous write performance. This pull
request does not include the DIO_OWN_WAITING patch that was discussed
on the list, but it has a number of other improvements to cut down our
latencies and CPU time during fsync/O_DIRECT writes.
Miao Xie has a big series of fixes and is spreading out ordered
operations over more CPUs. This improves performance and reduces
contention.
I've put in fixes for error handling around hash collisions. These
are going back to individual stable kernels as I test against them.
Otherwise we have a lot of fixes and cleanups, thanks everyone!
raid5/6 is being rebased against the device replacement code. I'll
have it posted this Friday along with a nice series of benchmarks."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (115 commits)
Btrfs: fix a bug of per-file nocow
Btrfs: fix hash overflow handling
Btrfs: don't take inode delalloc mutex if we're a free space inode
Btrfs: fix autodefrag and umount lockup
Btrfs: fix permissions of empty files not affected by umask
Btrfs: put raid properties into global table
Btrfs: fix BUG() in scrub when first superblock reading gives EIO
Btrfs: do not call file_update_time in aio_write
Btrfs: only unlock and relock if we have to
Btrfs: use tokens where we can in the tree log
Btrfs: optimize leaf_space_used
Btrfs: don't memset new tokens
Btrfs: only clear dirty on the buffer if it is marked as dirty
Btrfs: move checks in set_page_dirty under DEBUG
Btrfs: log changed inodes based on the extent map tree
Btrfs: add path->really_keep_locks
Btrfs: do not mark ems as prealloc if we are writing to them
Btrfs: keep track of the extents original block length
Btrfs: inline csums if we're fsyncing
Btrfs: don't bother copying if we're only logging the inode
...
Linus Torvalds [Tue, 18 Dec 2012 17:36:34 +0000 (09:36 -0800)]
Merge tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Features include:
- Full audit of BUG_ON asserts in the NFS, SUNRPC and lockd client
code. Remove altogether where possible, and replace with
WARN_ON_ONCE and appropriate error returns where not.
- NFSv4.1 client adds session dynamic slot table management. There
is matching server side code that has been submitted to Bruce for
consideration.
Together, this code allows the server to dynamically manage the
amount of memory it allocates to the duplicate request cache for
each client. It will constantly resize those caches to reserve
more memory for clients that are hot while shrinking caches for
those that are quiescent.
In addition, there are assorted bugfixes for the generic NFS write
code, fixes to deal with the drop_nlink() warnings, and yet another
fix for NFSv4 getacl."
* tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (106 commits)
SUNRPC: continue run over clients list on PipeFS event instead of break
NFS: Don't use SetPageError in the NFS writeback code
SUNRPC: variable 'svsk' is unused in function bc_send_request
SUNRPC: Handle ECONNREFUSED in xs_local_setup_socket
NFSv4.1: Deal effectively with interrupted RPC calls.
NFSv4.1: Move the RPC timestamp out of the slot.
NFSv4.1: Try to deal with NFS4ERR_SEQ_MISORDERED.
NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0
NFS: Fix calls to drop_nlink()
NFS: Ensure that we always drop inodes that have been marked as stale
nfs: Remove unused list nfs4_clientid_list
nfs: Remove duplicate function declaration in internal.h
NFS: avoid NULL dereference in nfs_destroy_server
SUNRPC handle EKEYEXPIRED in call_refreshresult
SUNRPC set gss gc_expiry to full lifetime
nfs: fix page dirtying in NFS DIO read codepath
nfs: don't zero out the rest of the page if we hit the EOF on a DIO READ
NFSv4.1: Be conservative about the client highest slotid
NFSv4.1: Handle NFS4ERR_BADSLOT errors correctly
nfs: don't extend writes to cover entire page if pagecache is invalid
...
Linus Torvalds [Tue, 18 Dec 2012 17:32:44 +0000 (09:32 -0800)]
Merge tag 'md-3.8' of git://neil.brown.name/md
Pull md update from Neil Brown:
"Mostly just little fixes. Probably biggest part is AVX accelerated
RAID6 calculations."
* tag 'md-3.8' of git://neil.brown.name/md:
md/raid5: add blktrace calls
md/raid5: use async_tx_quiesce() instead of open-coding it.
md: Use ->curr_resync as last completed request when cleanly aborting resync.
lib/raid6: build proper files on corresponding arch
lib/raid6: Add AVX2 optimized gen_syndrome functions
lib/raid6: Add AVX2 optimized recovery functions
md: Update checkpoint of resync/recovery based on time.
md:Add place to update ->recovery_cp.
md.c: re-indent various 'switch' statements.
md: close race between removing and adding a device.
md: removed unused variable in calc_sb_1_csm.
Linus Torvalds [Tue, 18 Dec 2012 04:58:12 +0000 (20:58 -0800)]
Merge branch 'akpm' (Andrew's patch-bomb)
Merge misc patches from Andrew Morton:
"Incoming:
- lots of misc stuff
- backlight tree updates
- lib/ updates
- Oleg's percpu-rwsem changes
- checkpatch
- rtc
- aoe
- more checkpoint/restart support
I still have a pile of MM stuff pending - Pekka should be merging
later today after which that is good to go. A number of other things
are twiddling thumbs awaiting maintainer merges."
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (180 commits)
scatterlist: don't BUG when we can trivially return a proper error.
docs: update documentation about /proc/<pid>/fdinfo/<fd> fanotify output
fs, fanotify: add @mflags field to fanotify output
docs: add documentation about /proc/<pid>/fdinfo/<fd> output
fs, notify: add procfs fdinfo helper
fs, exportfs: add exportfs_encode_inode_fh() helper
fs, exportfs: escape nil dereference if no s_export_op present
fs, epoll: add procfs fdinfo helper
fs, eventfd: add procfs fdinfo helper
procfs: add ability to plug in auxiliary fdinfo providers
tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test
breakpoint selftests: print failure status instead of cause make error
kcmp selftests: print fail status instead of cause make error
kcmp selftests: make run_tests fix
mem-hotplug selftests: print failure status instead of cause make error
cpu-hotplug selftests: print failure status instead of cause make error
mqueue selftests: print failure status instead of cause make error
vm selftests: print failure status instead of cause make error
ubifs: use prandom_bytes
mtd: nandsim: use prandom_bytes
...
CC drivers/firmware/efivars.o
drivers/firmware/efivars.c: In function ‘efivarfs_get_inode’:
drivers/firmware/efivars.c:886:31: error: incompatible types when assigning to type ‘kgid_t’ from type ‘int’
make[2]: *** [drivers/firmware/efivars.o] Error 1
make[1]: *** [drivers/firmware/efivars.o] Error 2
Fix the build error by removing the duplicate initialization of i_uid and
i_gid inode_init_always has already initialized them to 0.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Rothwell [Mon, 10 Dec 2012 08:50:57 +0000 (19:50 +1100)]
mm,numa: fix update_mmu_cache_pmd call
This build error is currently hidden by the fact that the x86
implementation of 'update_mmu_cache_pmd()' is a macro that doesn't use
its last argument, but commit b32967ff101a ("mm: numa: Add THP migration
for the NUMA working set scanning fault case") introduced a call with
the wrong third argument.
In the akpm tree, it causes this build error:
mm/migrate.c: In function 'migrate_misplaced_transhuge_page_put':
mm/migrate.c:1666:2: error: incompatible type for argument 3 of 'update_mmu_cache_pmd'
arch/x86/include/asm/pgtable.h:792:20: note: expected 'struct pmd_t *' but argument is of type 'pmd_t'
Fix it.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xen/smp: Use smp_store_boot_cpu_info() to store cpu info for BSP during boot time.
Git commit 30106c174311b8cfaaa3186c7f6f9c36c62d17da
("x86, hotplug: Support functions for CPU0 online/offline") alters what
the call to smp_store_cpu_info() does. For BSP we should use the
smp_store_boot_cpu_info() and for secondary CPU's the old
variant of smp_store_cpu_info() should be used. This fixes
the regression introduced by said commit.
Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cyrill Gorcunov [Tue, 18 Dec 2012 00:05:16 +0000 (16:05 -0800)]
fs, fanotify: add @mflags field to fanotify output
The kernel keeps FAN_MARK_IGNORED_SURV_MODIFY bit separately from
fsnotify_mark::mask|ignored_mask thus put it in @mflags (mark flags)
field so the user-space reader will be able to detect if such bit were
used on mark creation procedure.
Cyrill Gorcunov [Tue, 18 Dec 2012 00:05:06 +0000 (16:05 -0800)]
fs, exportfs: escape nil dereference if no s_export_op present
This routine will be used to generate a file handle in fdinfo output for
inotify subsystem, where if no s_export_op present the general
export_encode_fh should be used. Thus add a test if s_export_op present
inside exportfs_encode_fh itself.
Cyrill Gorcunov [Tue, 18 Dec 2012 00:04:55 +0000 (16:04 -0800)]
procfs: add ability to plug in auxiliary fdinfo providers
This patch brings ability to print out auxiliary data associated with
file in procfs interface /proc/pid/fdinfo/fd.
In particular further patches make eventfd, evenpoll, signalfd and
fsnotify to print additional information complete enough to restore
these objects after checkpoint.
To simplify the code we add show_fdinfo callback inside struct
file_operations (as Al and Pavel are proposing).
Dave Jones [Tue, 18 Dec 2012 00:04:52 +0000 (16:04 -0800)]
tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test
I was curious why sys_kcmp wasn't working, which led me to the testcase.
It turned out I hadn't enabled CHECKPOINT_RESTORE in the kernel I was
testing. Add a decoding of errno to the testcase to make that obvious.
Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Young [Tue, 18 Dec 2012 00:04:50 +0000 (16:04 -0800)]
breakpoint selftests: print failure status instead of cause make error
In case breakpoint test exit non zero value it will cause make error.
Better way is just print the test failure status.
Signed-off-by: Dave Young <dyoung@redhat.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Young [Tue, 18 Dec 2012 00:04:49 +0000 (16:04 -0800)]
kcmp selftests: print fail status instead of cause make error
In case kcmp_test exit non zero value it will cause make error.
Better way is just print the test failure status.
Signed-off-by: Dave Young <dyoung@redhat.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Young [Tue, 18 Dec 2012 00:04:47 +0000 (16:04 -0800)]
kcmp selftests: make run_tests fix
make run_tests need the target is run_tests instead of run-tests
Also gcc output should be kcmp_test. Fix these two issues.
Signed-off-by: Dave Young <dyoung@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After applying the patch:
bash-4.1$ make -C memory-hotplug run_tests
make: Entering directory `/home/dave/git/linux-2.6/tools/testing/selftests/memory-hotplug'
/bin/sh: ./on-off-test.sh: Permission denied
memory-hotplug selftests: [FAIL]
make: Leaving directory `/home/dave/git/linux-2.6/tools/testing/selftests/memory-hotplug'
Signed-off-by: Dave Young <dyoung@redhat.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After applying the patch:
bash-4.1$ make -C cpu-hotplug run_tests
make: Entering directory `/home/dave/git/linux-2.6/tools/testing/selftests/cpu-hotplug'
/bin/sh: ./on-off-test.sh: Permission denied
cpu-hotplug selftests: [FAIL]
make: Leaving directory `/home/dave/git/linux-2.6/tools/testing/selftests/cpu-hotplug'
Signed-off-by: Dave Young <dyoung@redhat.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Young [Tue, 18 Dec 2012 00:04:39 +0000 (16:04 -0800)]
mqueue selftests: print failure status instead of cause make error
Original behavior:
bash-4.1$ make -C mqueue run_tests
make: Entering directory `/home/dave/git/linux-2.6/tools/testing/selftests/mqueue'
./mq_open_tests /test1
Not running as root, but almost all tests require root in order to modify
system settings. Exiting.
make: *** [run_tests] Error 1
make: Leaving directory `/home/dave/git/linux-2.6/tools/testing/selftests/mqueue'
After applying the patch:
bash-4.1$ make -C mqueue run_tests
make: Entering directory `/home/dave/git/linux-2.6/tools/testing/selftests/mqueue'
Not running as root, but almost all tests require root in order to modify
system settings. Exiting.
mq_open_tests: [FAIL]
Not running as root, but almost all tests require root in order to modify
system settings. Exiting.
mq_perf_tests: [FAIL]
make: Leaving directory `/home/dave/git/linux-2.6/tools/testing/selftests/mqueue'
Signed-off-by: Dave Young <dyoung@redhat.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Young [Tue, 18 Dec 2012 00:04:38 +0000 (16:04 -0800)]
vm selftests: print failure status instead of cause make error
Original behavior:
bash-4.1$ make -C vm run_tests
make: Entering directory `/home/dave/git/linux-2.6/tools/testing/selftests/vm'
/bin/sh ./run_vmtests
./run_vmtests: line 24: /proc/sys/vm/nr_hugepages: Permission denied
Please run this test as root
make: *** [run_tests] Error 1
make: Leaving directory `/home/dave/git/linux-2.6/tools/testing/selftests/vm'
After applying the patch:
bash-4.1$ make -C vm run_tests
make: Entering directory `/home/dave/git/linux-2.6/tools/testing/selftests/vm'
./run_vmtests: line 24: /proc/sys/vm/nr_hugepages: Permission denied
Please run this test as root
vmtests: [FAIL]
make: Leaving directory `/home/dave/git/linux-2.6/tools/testing/selftests/vm'
Signed-off-by: Dave Young <dyoung@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Tue, 18 Dec 2012 00:04:25 +0000 (16:04 -0800)]
prandom: introduce prandom_bytes() and prandom_bytes_state()
Add functions to get the requested number of pseudo-random bytes.
The difference from get_random_bytes() is that it generates pseudo-random
numbers by prandom_u32(). It doesn't consume the entropy pool, and the
sequence is reproducible if the same rnd_state is used. So it is suitable
for generating random bytes for testing.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Eilon Greenstein <eilong@broadcom.com> Cc: David Laight <david.laight@aculab.com> Cc: Michel Lespinasse <walken@google.com> Cc: Robert Love <robert.w.love@intel.com> Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Tue, 18 Dec 2012 00:04:23 +0000 (16:04 -0800)]
random32: rename random32 to prandom
This renames all random32 functions to have 'prandom_' prefix as follows:
void prandom_seed(u32 seed); /* rename from srandom32() */
u32 prandom_u32(void); /* rename from random32() */
void prandom_seed_state(struct rnd_state *state, u64 seed);
/* rename from prandom32_seed() */
u32 prandom_u32_state(struct rnd_state *state);
/* rename from prandom32() */
The purpose of this renaming is to prevent some kernel developers from
assuming that prandom32() and random32() might imply that only
prandom32() was the one using a pseudo-random number generator by
prandom32's "p", and the result may be a very embarassing security
exposure. This concern was expressed by Theodore Ts'o.
And furthermore, I'm going to introduce new functions for getting the
requested number of pseudo-random bytes. If I continue to use both
prandom32 and random32 prefixes for these functions, the confusion
is getting worse.
As a result of this renaming, "prandom_" is the common prefix for
pseudo-random number library.
Currently, srandom32() and random32() are preserved because it is
difficult to rename too many users at once.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Robert Love <robert.w.love@intel.com> Cc: Michel Lespinasse <walken@google.com> Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: David Laight <david.laight@aculab.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:04:19 +0000 (16:04 -0800)]
aoe: update internal version number to 81
This version number is printed to the console on module initialization
and is available in sysfs, which is where the userland aoe-version tool
looks for it.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:04:17 +0000 (16:04 -0800)]
aoe: identify source of runt AoE packets
This change only affects experimental AoE storage networks.
It modifies the console message about runt packets detected so that the
AoE major and minor addresses of the AoE target that generated the runt
are mentioned.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:04:15 +0000 (16:04 -0800)]
aoe: allow comma separator in aoe_iflist value
By default, the aoe driver uses any ethernet interface for AoE, but the
aoe_iflist module parameter provides a convenient way to limit AoE
traffic to a specific list of local network interfaces.
This change allows a list to be specified using the comma character as a
separator. For example,
modprobe aoe aoe_iflist=eth2,eth3
Before, it was inconvenient to get the quoting right in shell scripts
when setting aoe_iflist to have more than one network interface.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:04:14 +0000 (16:04 -0800)]
aoe: allow user to disable target failure timeout
With this change, the aoe driver treats the value zero as special for
the aoe_deadsecs module parameter. Normally, this value specifies the
number of seconds during which the driver will continue to attempt
retransmits to an unresponsive AoE target. After aoe_deadsecs has
elapsed, the aoe driver marks the aoe device as "down" and fails all
I/O.
The new meaning of an aoe_deadsecs of zero is for the driver to
retransmit commands indefinitely.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:04:11 +0000 (16:04 -0800)]
aoe: use dynamic number of remote ports for AoE storage target
Many AoE targets have four or fewer network ports, but some existing
storage devices have many, and the AoE protocol sets no limit.
This patch allows the use of more than eight remote MAC addresses per AoE
target, while reducing the amount of memory used by the aoe driver in
cases where there are many AoE targets with fewer than eight MAC addresses
each.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:04:09 +0000 (16:04 -0800)]
aoe: avoid races between device destruction and discovery
This change avoids a race that could result in a NULL pointer derference
following a WARNing from kobject_add_internal, "don't try to register
things with the same name in the same directory."
The problem was found with a test that forgets and discovers an
aoe device in a loop:
while test ! -r /tmp/stop; do
aoe-flush -a
aoe-discover
done
The race was between aoedev_flush taking aoedevs out of the devlist,
allowing a new discovery of the same AoE target to take place before the
driver gets around to calling sysfs_remove_group. Fixing that one
revealed another race between do_open and add_disk, and this patch avoids
that, too.
The fix required some care, because for flushing (forgetting) an aoedev,
some of the steps must be performed under lock and some must be able to
sleep. Also, for discovering a new aoedev, some steps might sleep.
The check for a bad aoedev pointer remains from a time when about half of
this patch was done, and it was possible for the
bdev->bd_disk->private_data to become corrupted. The check should be
removed eventually, but it is not expected to add significant overhead,
occurring in the aoeblk_open routine.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:04:08 +0000 (16:04 -0800)]
aoe: improve handling of misbehaving network paths
An AoE target can have multiple network ports used for AoE, and in the
aoe driver, those are tracked by the aoetgt struct. These changes allow
the aoe driver to handle network paths, or aoetgts, that are not working
well, compared to the others.
Paths that do not get responses despite the retransmission of AoE
commands are marked as "tainted", and non-tainted paths are preferred.
Meanwhile, the aoe driver attempts to "probe" the tainted path in the
background by issuing reads of LBA 0 that are padded out to full
(possibly jumbo-frame) size. If the probes get responses, then the path
is "redeemed", and its taint is removed.
This mechanism has been shown to be helpful in transparently handling
and recovering from real-world network "brown outs" in ways that the
earlier "shoot the help-needing target in the head" mechanism could not.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:04:06 +0000 (16:04 -0800)]
aoe: return real minor number for static minors
The value returned by the static minor device number number allocator is
the real minor number, so it must be multiplied by the supported number
of partitions per aoedev.
Without this fix the support for systems without udev is incomplete, and
the few users of aoe on such systems will have surprising results when
device nodes names do not match the AoE target.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:04:04 +0000 (16:04 -0800)]
aoe: initialize sysminor to avoid compiler warning
Because the minor_get and related functions use the return values for
errors, the compiler doesn't know that sysminor will always either 1) be
initialized in aoedev_by_aoeaddr by the call to minor_get, or 2) be
unused as the "goto out" is executed.
This patch avoids the compiler warning.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:04:03 +0000 (16:04 -0800)]
aoe: make error messages more specific in static minor allocation
For some special-purpose systems where udev isn't present, static
allocation of minor numbers is desirable. This update distinguishes
different failure scenarios, to help the user understand what went
wrong.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:04:02 +0000 (16:04 -0800)]
aoe: remove call to request handler from I/O completion
There is no need to call the request handler function in the I/O
completion routine. The user impact of not doing it is a more "nice" aoe
driver that is less susceptible to causing soft lockups.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:04:00 +0000 (16:04 -0800)]
aoe: increase default cap on outstanding AoE commands in the network
The aoe driver will never be waiting for more than aoe_maxout AoE
commands from a given remote network port on an AoE target. Increasing
the cap increases performance. Users can tighten the setting to reduce
the amount of memory used for handling AoE traffic or the network
bandwidth used for AoE.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:03:58 +0000 (16:03 -0800)]
aoe: remove vestigial request queue allocation
Before the aoe driver was an I/O request handler, it was a
make_request-style block driver. Even so, there was a problem where
sysfs expected a request queue to exist, so one was provided in commit 7135a71b19be ("aoe: allocate unused request_queue for sysfs").
During the transition to the request-handler style, a patch was merged
that was based on a driver without the noop queue, and the noop queue
remained in place after the patch was merged, even though a new
functional queue was introduced by the patch, allocated through
blk_init_queue.
The user impact is a memory leak proportional to the number of AoE
targets discovered. This patch removes the memory leak and cleans up
vestiges of the old do-nothing queue from the aoeblk_gdalloc function.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:03:55 +0000 (16:03 -0800)]
aoe: copy fallback timing information on destination failover
Commit f3b8e07af774 ("aoe: commands in retransmit queue use new
destination on failure") omits the copying of the coarse-grained time
when an AoE command was sent during the failover from one destination
MAC address on the AoE target to another.
The coarse-grained timing is only used when the system time changes or
an unlikely length of time has passed since the sending of the AoE
command. Users will not be impacted unless their system clock is very
inaccurate or something unusual (e.g., 10 GbE link reset) happens during
the period when the aoe driver is handling the failure of a port on the
AoE target. Being effected will mean that an AoE target could be
considered "down" too eagerly.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:03:51 +0000 (16:03 -0800)]
aoe: commands in retransmit queue use new destination on failure
When one remote MAC address isn't working as a destination for AoE
commands, the frames used to track information associated with the AoE
commands are moved to a new aoetgt (defined by the tuple of {AoE major,
AoE minor, target MAC address}).
This patch makes sure that the frames on the queue for retransmits that
need to be done are updated to use the new destination, so that
retransmits will be sent through a working network path.
Without this change, packets on the retransmit queue will be needlessly
retransmitted to the unresponsive destination MAC, possibly causing
premature target failure before there's time for the retransmit timer to
run again, decide to retransmit again, and finally update the destination
to a working MAC address on the AoE target.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:03:49 +0000 (16:03 -0800)]
aoe: use high-resolution RTTs with fallback to low-res
These changes improve the accuracy of the decision about whether it's time
to retransmit an AoE command by using the microsecond-resolution
gettimeofday instead of jiffies.
Because the system time can jump suddenly, the decision reverts to using
jiffies if the high-resolution time difference is relatively large.
Otherwise the AoE targets could be considered failed inappropriately.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:03:47 +0000 (16:03 -0800)]
aoe: manipulate aoedev network stats under lock
With this bugfix in place the calculation of the criterion for "lateness"
is performed under lock. Without the lock, there is a chance that one of
the non-atomic operations performed on the round trip time statistics
could be incomplete, such that an incorrect lateness criterion would be
calculated.
Without this change, the effect of the bug would be rare unecessary but
benign retransmissions.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:03:45 +0000 (16:03 -0800)]
aoe: err device: include MAC addresses for unexpected responses
The /dev/etherd/err character device provides low-level information about
normal but sometimes interesting AoE command retransmits and "unexpected
responses", i.e., responses for packets that have already been
retransmitted.
This change adds MAC addresses to the messages about unexpected responses,
so that when they occur, it's more easy to determine the network paths to
which they belong.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:03:43 +0000 (16:03 -0800)]
aoe: improve network congestion handling
The aoe driver already had some congestion handling, but it was limited in
its ability to cope with the kind of congestion that can arise on more
complex networks such as those involving paths through multiple ethernet
switches.
Some of the lessons from TCP's history of development can be applied to
improving the congestion control and avoidance on AoE storage networks.
These changes use familar concepts from Van Jacobson's "Congestion
Avoidance and Control" paper from '88, without adding significant
overhead.
This patch depends on an upcoming patch that covers the failover case when
AoE commands being retransmitted are transferred from one retransmit queue
to another. Another upcoming patch increases the timing accuracy.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:03:42 +0000 (16:03 -0800)]
aoe: provide ATA identify device content to user on request
Make the aoe driver follow expected behavior when the user uses ioctl to
get the ATA device identify information, allowing access to model, serial
number, etc.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:03:34 +0000 (16:03 -0800)]
aoe: "payload" sysfs file exports per-AoE-command data transfer size
The userland aoetools package includes an "aoe-stat" command that can
display a "payload size" column when the aoe driver exports this
information. Users can quickly see what amount of user data is
transferred inside each AoE command on the network, network headers
excluded.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:03:32 +0000 (16:03 -0800)]
aoe: support larger I/O requests via aoe_maxsectors module param
The GPFS filesystem is an example of an aoe user that requires the aoe
driver to support I/O request sizes larger than the default. Most users
will not need large I/O request sizes, because they would need to be split
up into multiple AoE commands anyway.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:03:30 +0000 (16:03 -0800)]
aoe: support the forgetting (flushing) of a user-specified AoE target
Users sometimes want to cause the aoe driver to forget a particular
previously discovered device when it is no longer online. The aoetools
provide an "aoe-flush" command that users run to perform this
administrative task. The changes below provide the support needed in the
driver.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:03:29 +0000 (16:03 -0800)]
aoe: update cap on outstanding commands based on config query response
The ATA over Ethernet config query response contains a "buffer count"
field reflecting the AoE target's capacity to buffer incoming AoE
commands.
By taking the current value of this field into accound, we increase
performance throughput or avoid network congestion, when the value
has increased or decreased, respectively.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Tue, 18 Dec 2012 00:03:25 +0000 (16:03 -0800)]
Documentation/sparse.txt: document context annotations for lock checking
The context feature of sparse is used with the Linux kernel sources to
check for imbalanced uses of locks. Document the annotations defined in
include/linux/compiler.h that tell sparse what to expect when a lock is
held on function entry, exit, or both.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Christopher Li <sparse@chrisli.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josh Triplett [Tue, 18 Dec 2012 00:03:24 +0000 (16:03 -0800)]
linux/compiler.h: add __must_hold macro for functions called with a lock held
linux/compiler.h has macros to denote functions that acquire or release
locks, but not to denote functions called with a lock held that return
with the lock still held. Add a __must_hold macro to cover that case.
Signed-off-by: Josh Triplett <josh@joshtriplett.org> Reported-by: Ed Cashin <ecashin@coraid.com> Tested-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Tue, 18 Dec 2012 00:03:20 +0000 (16:03 -0800)]
exec: use -ELOOP for max recursion depth
To avoid an explosion of request_module calls on a chain of abusive
scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon
as maximum recursion depth is hit, the error will fail all the way back
up the chain, aborting immediately.
This also has the side-effect of stopping the user's shell from attempting
to reexecute the top-level file as a shell script. As seen in the
dash source:
The above logic was designed for running scripts automatically that lacked
the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC,
things continue to behave as the shell expects.
Additionally, when tracking recursion, the binfmt handlers should not be
involved. The recursion being tracked is the depth of calls through
search_binary_handler(), so that function should be exclusively responsible
for tracking the depth.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: halfdog <me@halfdog.net> Cc: P J P <ppandit@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Artem Bityutskiy [Tue, 18 Dec 2012 00:03:17 +0000 (16:03 -0800)]
proc: pid/status: show all supplementary groups
We display a list of supplementary group for each process in
/proc/<pid>/status. However, we show only the first 32 groups, not all of
them.
Although this is rare, but sometimes processes do have more than 32
supplementary groups, and this kernel limitation breaks user-space apps
that rely on the group list in /proc/<pid>/status.
Number 32 comes from the internal NGROUPS_SMALL macro which defines the
length for the internal kernel "small" groups buffer. There is no
apparent reason to limit to this value.
This patch removes the 32 groups printing limit.
The Linux kernel limits the amount of supplementary groups by NGROUPS_MAX,
which is currently set to 65536. And this is the maximum count of groups
we may possibly print.
Kees Cook [Tue, 18 Dec 2012 00:03:14 +0000 (16:03 -0800)]
/proc/pid/status: add "Seccomp" field
It is currently impossible to examine the state of seccomp for a given
process. While attaching with gdb and attempting "call
prctl(PR_GET_SECCOMP,...)" will work with some situations, it is not
reliable. If the process is in seccomp mode 1, this query will kill the
process (prctl not allowed), if the process is in mode 2 with prctl not
allowed, it will similarly be killed, and in weird cases, if prctl is
filtered to return errno 0, it can look like seccomp is disabled.
When reviewing the state of running processes, there should be a way to
externally examine the seccomp mode. ("Did this build of Chrome end up
using seccomp?" "Did my distro ship ssh with seccomp enabled?")
This adds the "Seccomp" line to /proc/$pid/status.
Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: James Morris <jmorris@namei.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Tue, 18 Dec 2012 00:03:13 +0000 (16:03 -0800)]
procfs: add VmFlags field in smaps output
During c/r sessions we've found that there is no way at the moment to
fetch some VMA associated flags, such as mlock() and madvise().
This leads us to a problem -- we don't know if we should call for mlock()
and/or madvise() after restore on the vma area we're bringing back to
life.
This patch intorduces a new field into "smaps" output called VmFlags,
where all set flags associated with the particular VMA is shown as two
letter mnemonics.
[ Strictly speaking for c/r we only need mlock/madvise bits but it has been
said that providing just a few flags looks somehow inconsistent. So all
flags are here now. ]
This feature is made available on CONFIG_CHECKPOINT_RESTORE=n kernels, as
other applications may start to use these fields.
The data is encoded in a somewhat awkward two letters mnemonic form, to
encourage userspace to be prepared for fields being added or removed in
the future.
[a.p.zijlstra@chello.nl: props to use for_each_set_bit]
[sfr@canb.auug.org.au: props to use array instead of struct]
[akpm@linux-foundation.org: overall redesign and simplification]
[akpm@linux-foundation.org: remove unneeded braces per sfr, avoid using bloaty for_each_set_bit()] Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I suggest to hide non-existent capabilities. Here is two reasons.
* It's logically and easier for using.
* It helps to checkpoint-restore capabilities of tasks, because tasks
can be restored on another kernel, where CAP_LAST_CAP is bigger.
Signed-off-by: Andrew Vagin <avagin@openvz.org> Cc: Andrew G. Morgan <morgan@kernel.org> Reviewed-by: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 18 Dec 2012 00:03:07 +0000 (16:03 -0800)]
ptrace: introduce PTRACE_O_EXITKILL
Ptrace jailers want to be sure that the tracee can never escape
from the control. However if the tracer dies unexpectedly the
tracee continues to run in potentially unsafe mode.
Add the new ptrace option PTRACE_O_EXITKILL. If the tracer exits
it sends SIGKILL to every tracee which has this bit set.
Note that the new option is not equal to the last-option << 1. Because
currently all options have an event, and the new one starts the eventless
group. It uses the random 20 bit, so we have the room for 12 more events,
but we can also add the new eventless options below this one.
Suggested by Amnon Shiloh.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Amnon Shiloh <u3557@miso.sublimeip.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Chris Evans <scarybeasts@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eldad Zack [Tue, 18 Dec 2012 00:03:04 +0000 (16:03 -0800)]
kstrto*: add documentation
As Bruce Fields pointed out, kstrto* is currently lacking kerneldoc
comments. This patch adds kerneldoc comments to common variants of
kstrto*: kstrto(u)l, kstrto(u)ll and kstrto(u)int.
Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Joe Perches <joe@perches.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Reisner [Tue, 18 Dec 2012 00:03:01 +0000 (16:03 -0800)]
fs/fat: strip "cp" prefix from codepage in display
Option parsing code expects an unsigned integer for the codepage option,
but prefixes and stores this option with "cp" before passing to
load_nls(). This makes the displayed option in /proc an invalid one.
Strip the prefix when printing so that the displayed option is valid for
reuse.
Signed-off-by: Dave Reisner <dreisner@archlinux.org> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>