]> git.karo-electronics.de Git - linux-beck.git/log
linux-beck.git
13 years agojbd/jbd2: remove obsolete summarise_journal_usage.
Tao Ma [Thu, 5 May 2011 15:54:19 +0000 (23:54 +0800)]
jbd/jbd2: remove obsolete summarise_journal_usage.

summarise_journal_usage seems to be obsolete for a long time,
so remove it.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
13 years agojbd: Fix forever sleeping process in do_get_write_access()
Jan Kara [Thu, 5 May 2011 11:59:35 +0000 (13:59 +0200)]
jbd: Fix forever sleeping process in do_get_write_access()

In do_get_write_access() we wait on BH_Unshadow bit for buffer to get
from shadow state. The waking code in journal_commit_transaction() has
a bug because it does not issue a memory barrier after the buffer is moved
from the shadow state and before wake_up_bit() is called. Thus a waitqueue
check can happen before the buffer is actually moved from the shadow state
and waiting process may never be woken. Fix the problem by issuing proper
barrier.

CC: stable@kernel.org
Reported-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
13 years agoext2: fix error msg when mounting fs with too-large blocksize
Robin Dong [Thu, 5 May 2011 02:44:04 +0000 (10:44 +0800)]
ext2: fix error msg when mounting fs with too-large blocksize

When ext2 mounts a filesystem, it attempts to set the block device
blocksize with a call to sb_set_blocksize, which can fail for
several reasons.  The current failure message in ext2 prints:

  EXT2-fs (loop1): error: blocksize is too small

which is not correct in all cases.  This can be demonstrated
by creating a filesystem with

  # mkfs.ext2 -b 8192

on a 4k page system, and attempting to mount it.

Change the error message to a more generic:

  EXT2-fs (loop1): bad blocksize 8192

to match the error message in ext3.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Reviewed-by: Coly Li <bosong.ly@taobao.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
13 years agojbd: fix fsync() tid wraparound bug
Ted Ts'o [Sat, 30 Apr 2011 17:17:11 +0000 (13:17 -0400)]
jbd: fix fsync() tid wraparound bug

If an application program does not make any changes to the indirect
blocks or extent tree, i_datasync_tid will not get updated.  If there
are enough commits (i.e., 2**31) such that tid_geq()'s calculations
wrap, and there isn't a currently active transaction at the time of
the fdatasync() call, this can end up triggering a BUG_ON in
fs/jbd/commit.c:

J_ASSERT(journal->j_running_transaction != NULL);

It's pretty rare that this can happen, since it requires the use of
fdatasync() plus *very* frequent and excessive use of fsync().  But
with the right workload, it can.

We fix this by replacing the use of tid_geq() with an equality test,
since there's only one valid transaction id that is valid for us to
start: namely, the currently running transaction (if it exists).

CC: stable@kernel.org
Reported-by: Martin_Zielinski@McAfee.com
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
13 years agoext3: Fix fs corruption when make_indexed_dir() fails
Jan Kara [Wed, 27 Apr 2011 16:20:44 +0000 (18:20 +0200)]
ext3: Fix fs corruption when make_indexed_dir() fails

When make_indexed_dir() fails (e.g. because of ENOSPC) after it has allocated
block for index tree root, we did not properly mark all changed buffers dirty.
This lead to only some of these buffers being written out and thus effectively
corrupting the directory.

Fix the issue by marking all changed data dirty even in the error failure case.

CC: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
13 years agoext3: Fix lock inversion in ext3_symlink()
Jan Kara [Thu, 21 Apr 2011 21:47:16 +0000 (23:47 +0200)]
ext3: Fix lock inversion in ext3_symlink()

ext3_symlink() cannot call __page_symlink() with transaction open.
__page_symlink() calls ext3_write_begin() which gets page lock which ranks
above transaction start (thus lock ordering is violated) and and also
ext3_write_begin() waits for a transaction commit when we run out of space
which never happens if we hold transaction open.

Fix the problem by stopping a transaction before calling __page_symlink()
(we have to be careful and put inode to orphan list so that it gets deleted
in case of crash) and starting another one after __page_symlink() returns
for addition of symlink into a directory.

Signed-off-by: Jan Kara <jack@suse.cz>
13 years agoMerge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied...
Linus Torvalds [Thu, 28 Apr 2011 20:14:02 +0000 (13:14 -0700)]
Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/i915: restore only the mode of this driver on lastclose (v2)
  drm/radeon/kms: add info query for tile pipes
  drm/radeon/kms: add missing safe regs for 6xx/7xx
  drm: select FRAMEBUFFER_CONSOLE_PRIMARY if we have FRAMEBUFFER_CONSOLE

13 years agoMerge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/geert...
Linus Torvalds [Thu, 28 Apr 2011 20:13:44 +0000 (13:13 -0700)]
Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k

* 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k/mm: Set all online nodes in N_NORMAL_MEMORY

13 years agoMerge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
Linus Torvalds [Thu, 28 Apr 2011 20:13:07 +0000 (13:13 -0700)]
Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6

* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  nfs: don't lose MS_SYNCHRONOUS on remount of noac mount
  NFS: Return meaningful status from decode_secinfo()
  NFSv4: Ensure we request the ordinary fileid when doing readdirplus
  NFSv4: Ensure that clientid and session establishment can time out
  SUNRPC: Allow RPC calls to return ETIMEDOUT instead of EIO
  NFSv4.1: Don't loop forever in nfs4_proc_create_session
  NFSv4: Handle NFS4ERR_WRONGSEC outside of nfs4_handle_exception()
  NFSv4.1: Don't update sequence number if rpc_task is not sent
  NFSv4.1: Ensure state manager thread dies on last umount
  SUNRPC: Fix the SUNRPC Kerberos V RPCSEC_GSS module dependencies
  NFS: Use correct variable for page bounds checking
  NFS: don't negotiate when user specifies sec flavor
  NFS: Attempt mount with default sec flavor first
  NFS: flav_array honors NFS_MAX_SECFLAVORS
  NFS: Fix infinite loop in gss_create_upcall()
  Don't mark_inode_dirty_sync() while holding lock
  NFS: Get rid of pointless test in nfs_commit_done
  NFS: Remove unused argument from nfs_find_best_sec()
  NFS: Eliminate duplicate call to nfs_mark_request_dirty
  NFS: Remove dead code from nfs_fs_mount()

13 years agomm: check if PTE is already allocated during page fault
Mel Gorman [Wed, 27 Apr 2011 22:26:56 +0000 (15:26 -0700)]
mm: check if PTE is already allocated during page fault

With transparent hugepage support, handle_mm_fault() has to be careful
that a normal PMD has been established before handling a PTE fault.  To
achieve this, it used __pte_alloc() directly instead of pte_alloc_map as
pte_alloc_map is unsafe to run against a huge PMD.  pte_offset_map() is
called once it is known the PMD is safe.

pte_alloc_map() is smart enough to check if a PTE is already present
before calling __pte_alloc but this check was lost.  As a consequence,
PTEs may be allocated unnecessarily and the page table lock taken.  Thi
useless PTE does get cleaned up but it's a performance hit which is
visible in page_test from aim9.

This patch simply re-adds the check normally done by pte_alloc_map to
check if the PTE needs to be allocated before taking the page table lock.
The effect is noticable in page_test from aim9.

  AIM9
                  2.6.38-vanilla 2.6.38-checkptenone
  creat-clo      446.10 ( 0.00%)   424.47 (-5.10%)
  page_test       38.10 ( 0.00%)    42.04 ( 9.37%)
  brk_test        52.45 ( 0.00%)    51.57 (-1.71%)
  exec_test      382.00 ( 0.00%)   456.90 (16.39%)
  fork_test       60.11 ( 0.00%)    67.79 (11.34%)
  MMTests Statistics: duration
  Total Elapsed Time (seconds)                611.90    612.22

(While this affects 2.6.38, it is a performance rather than a functional
bug and normally outside the rules -stable.  While the big performance
differences are to a microbench, the difference in fork and exec
performance may be significant enough that -stable wants to consider the
patch)

Reported-by: Raz Ben Yehuda <raziebe@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@kernel.org> [2.6.38.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agokernel/watchdog.c: disable nmi perf event in the error path of enabling watchdog
Hillf Danton [Wed, 27 Apr 2011 22:26:55 +0000 (15:26 -0700)]
kernel/watchdog.c: disable nmi perf event in the error path of enabling watchdog

In corner cases where softlockup watchdog is not setup successfully, the
relevant nmi perf event for hardlockup watchdog could be disabled, then
the status of the underlying hardware remains unchanged.

Also, if the kthread doesn't start then the hrtimer won't run and the
hardlockup detector will falsely fire.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoum: adjust current_thread_info() for newer gcc versions
Richard Weinberger [Wed, 27 Apr 2011 22:26:54 +0000 (15:26 -0700)]
um: adjust current_thread_info() for newer gcc versions

In some cases gcc >= 4.5.2 will optimize away current_thread_info().  To
prevent gcc from doing so the stack address has to be obtained via inline
asm.

Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agouml: fix hppfs build
Randy Dunlap [Wed, 27 Apr 2011 22:26:53 +0000 (15:26 -0700)]
uml: fix hppfs build

Make HoneyPot ProcFS depend on CONFIG_PROC_FS so that it will build.
Recommended by Christoph Hellwig.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=33692

Reported-by: Simon Danner <danner.simon@gmail.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoum: mdd support for 64 bit atomic operations
Richard Weinberger [Wed, 27 Apr 2011 22:26:51 +0000 (15:26 -0700)]
um: mdd support for 64 bit atomic operations

This adds support for 64 bit atomic operations on 32 bit UML systems.  XFS
needs them since 2.6.38.

  $ make ARCH=um SUBARCH=i386
  ...
    LD      .tmp_vmlinux1
  fs/built-in.o: In function `xlog_regrant_reserve_log_space':
  xfs_log.c:(.text+0xd8584): undefined reference to `atomic64_read_386'
  xfs_log.c:(.text+0xd85ac): undefined reference to `cmpxchg8b_emu'
  ...

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=32812

Reported-by: Martin Walch <walch.martin@web.de>
Tested-by: Martin Walch <walch.martin@web.de>
Cc: Martin Walch <walch.martin@web.de>
Cc: <stable@kernel.org> [2.6.38.x 084189a: um: disable CONFIG_CMPXCHG_LOCAL]
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agooom: use pte pages in OOM score
KOSAKI Motohiro [Wed, 27 Apr 2011 22:26:50 +0000 (15:26 -0700)]
oom: use pte pages in OOM score

PTE pages eat up memory just like anything else, but we do not account for
them in any way in the OOM scores.  They are also _guaranteed_ to get
freed up when a process is OOM killed, while RSS is not.

Reported-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org> [2.6.36+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMAINTAINERS: add EXYNOS ARM architectures
Kukjin Kim [Wed, 27 Apr 2011 22:26:49 +0000 (15:26 -0700)]
MAINTAINERS: add EXYNOS ARM architectures

Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: update documentation to describe usage_in_bytes
Daisuke Nishimura [Wed, 27 Apr 2011 22:26:48 +0000 (15:26 -0700)]
memcg: update documentation to describe usage_in_bytes

Since 569b846d ("memcg: coalesce uncharge during unmap/truncate"), we do
batched (delayed) uncharge at truncation/unmap.  And since cdec2e42(memcg:
coalesce charging via percpu storage), we have percpu cache for
res_counter.

These changes improved performance of memory cgroup very much, but made
res_counter->usage usually have a bigger value than the actual value of
memory usage.  So, *.usage_in_bytes, which show res_counter->usage, are
not desirable for precise values of memory(and swap) usage anymore.

Instead of removing these files completely(because we cannot know
res_counter->usage without them), this patch updates the meaning of those
files.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMAINTAINERS: update Documentation file entry of GPIO subsystem
Kukjin Kim [Wed, 27 Apr 2011 22:26:47 +0000 (15:26 -0700)]
MAINTAINERS: update Documentation file entry of GPIO subsystem

Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMAINTAINERS: re-alphabetize Xen entries
Ian Campbell [Wed, 27 Apr 2011 22:26:46 +0000 (15:26 -0700)]
MAINTAINERS: re-alphabetize Xen entries

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups
Andrea Arcangeli [Wed, 27 Apr 2011 22:26:45 +0000 (15:26 -0700)]
mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups

The huge_memory.c THP page fault was allowed to run if vm_ops was null
(which would succeed for /dev/zero MAP_PRIVATE, as the f_op->mmap wouldn't
setup a special vma->vm_ops and it would fallback to regular anonymous
memory) but other THP logics weren't fully activated for vmas with vm_file
not NULL (/dev/zero has a not NULL vma->vm_file).

So this removes the vm_file checks so that /dev/zero also can safely use
THP (the other albeit safer approach to fix this bug would have been to
prevent the THP initial page fault to run if vm_file was set).

After removing the vm_file checks, this also makes huge_memory.c stricter
in khugepaged for the DEBUG_VM=y case.  It doesn't replace the vm_file
check with a is_pfn_mapping check (but it keeps checking for VM_PFNMAP
under VM_BUG_ON) because for a is_cow_mapping() mapping VM_PFNMAP should
only be allowed to exist before the first page fault, and in turn when
vma->anon_vma is null (so preventing khugepaged registration).  So I tend
to think the previous comment saying if vm_file was set, VM_PFNMAP might
have been set and we could still be registered in khugepaged (despite
anon_vma was not NULL to be registered in khugepaged) was too paranoid.
The is_linear_pfn_mapping check is also I think superfluous (as described
by comment) but under DEBUG_VM it is safe to stay.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=33682

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Caspar Zhang <bugs@casparzhang.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: <stable@kernel.org> [2.6.38.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agovfs: avoid large kmalloc()s for the fdtable
Andrew Morton [Wed, 27 Apr 2011 22:26:41 +0000 (15:26 -0700)]
vfs: avoid large kmalloc()s for the fdtable

Azurit reports large increases in system time after 2.6.36 when running
Apache.  It was bisected down to a892e2d7dcdfa6c76e6 ("vfs: use kmalloc()
to allocate fdmem if possible").

That patch caused the vfs to use kmalloc() for very large allocations and
this is causing excessive work (and presumably excessive reclaim) within
the page allocator.

Fix it by falling back to vmalloc() earlier - when the allocation attempt
would have been considered "costly" by reclaim.

Reported-by: azurIt <azurit@pobox.sk>
Tested-by: azurIt <azurit@pobox.sk>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6
Linus Torvalds [Wed, 27 Apr 2011 22:20:33 +0000 (15:20 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
  [PARISC] slub: fix panic with DISCONTIGMEM
  [PARISC] set memory ranges in N_NORMAL_MEMORY when onlined

13 years agoMerge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspe...
Linus Torvalds [Wed, 27 Apr 2011 22:20:00 +0000 (15:20 -0700)]
Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  ACPI / PM: Avoid infinite recurrence while registering power resources
  PM / Wakeup: Fix initialization of wakeup-related device sysfs files

13 years agoMerge branch 'spell-fix' of git://git.profusion.mobi/users/lucas/linux-2.6
Linus Torvalds [Wed, 27 Apr 2011 22:18:37 +0000 (15:18 -0700)]
Merge branch 'spell-fix' of git://git.profusion.mobi/users/lucas/linux-2.6

* 'spell-fix' of git://git.profusion.mobi/users/lucas/linux-2.6:
  Revert wrong fixes for common misspellings

13 years agoMerge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Wed, 27 Apr 2011 22:17:52 +0000 (15:17 -0700)]
Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (42 commits)
  [media] media: vb2: correct queue initialization order
  [media] media: vb2: fix incorrect v4l2_buffer->flags handling
  [media] s5p-fimc: Add support for the buffer timestamps and sequence
  [media] s5p-fimc: Fix bytesperline and plane payload setup
  [media] s5p-fimc: Do not allow changing format after REQBUFS
  [media] s5p-fimc: Fix FIMC3 pixel limits on Exynos4
  [media] tda18271: update tda18271c2_rf_cal as per NXP's rev.04 datasheet
  [media] tda18271: update tda18271_rf_band as per NXP's rev.04 datasheet
  [media] tda18271: fix bad calculation of main post divider byte
  [media] tda18271: prog_cal and prog_tab variables should be s32, not u8
  [media] tda18271: fix calculation bug in tda18271_rf_tracking_filters_init
  [media] omap3isp: queue: Don't corrupt buf->npages when get_user_pages() fails
  [media] v4l: Don't register media entities for subdev device nodes
  [media] omap3isp: Don't increment node entity use count when poweron fails
  [media] omap3isp: lane shifter support
  [media] omap3isp: ccdc: support Y10/12, 8-bit bayer fmts
  [media] media: add missing 8-bit bayer formats and Y12
  [media] v4l: add V4L2_PIX_FMT_Y12 format
  cx23885: Fix stv0367 Kconfig dependency
  [media] omap3isp: Use isp xclk defines
  ...

Fix up trivial conflict (spelink errurs) in drivers/media/video/omap3isp/isp.c

13 years agonfs: don't lose MS_SYNCHRONOUS on remount of noac mount
Jeff Layton [Wed, 27 Apr 2011 15:49:09 +0000 (11:49 -0400)]
nfs: don't lose MS_SYNCHRONOUS on remount of noac mount

On a remount, the VFS layer will clear the MS_SYNCHRONOUS bit on the
assumption that the flags on the mount syscall will have it set if the
remounted fs is supposed to keep it.

In the case of "noac" though, MS_SYNCHRONOUS is implied. A remount of
such a mount will lose the MS_SYNCHRONOUS flag since "sync" isn't part
of the mount options.

Reported-by: Max Matveev <makc@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Return meaningful status from decode_secinfo()
Bryan Schumaker [Wed, 27 Apr 2011 19:28:44 +0000 (15:28 -0400)]
NFS: Return meaningful status from decode_secinfo()

When compiling, I was getting this warning:
fs/nfs/nfs4xdr.c: In function ‘decode_secinfo’:
fs/nfs/nfs4xdr.c:4839:6: warning: variable ‘status’ set but not used
[-Wunused-but-set-variable]

We were unconditionally returning 0 as long as there wasn't an error
coming out of xdr_inline_decode().  We probably want to check the error
status coming out of decode_op_hdr() and decode_secinfo_gss(), rather
than assuming that everything is OK all the time.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFSv4: Ensure we request the ordinary fileid when doing readdirplus
Trond Myklebust [Wed, 27 Apr 2011 17:47:52 +0000 (13:47 -0400)]
NFSv4: Ensure we request the ordinary fileid when doing readdirplus

When readdir() returns a directory entry for the root of a mounted
filesystem, Linux follows the old convention of returning the inode
number of the covered directory (despite newer versions of POSIX declaring
that this is a bug).
To ensure this continues to work, the NFSv4 readdir implementation requests
the 'mounted-on-fileid' from the server.

However, readdirplus also needs to instantiate an inode for this entry, and
for that, we also need to request the real fileid as per this patch.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agom68k/mm: Set all online nodes in N_NORMAL_MEMORY
Michael Schmitz [Tue, 26 Apr 2011 02:51:53 +0000 (14:51 +1200)]
m68k/mm: Set all online nodes in N_NORMAL_MEMORY

For m68k, N_NORMAL_MEMORY represents all nodes that have present memory
since it does not support HIGHMEM.  This patch sets the bit at the time
node_present_pages has been set by free_area_init_node.
At the time the node is brought online, the node state would have to be
done unconditionally since information about present memory has not yet
been recorded.

If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it
uses this nodemask to setup per-cache kmem_cache_node data structures.

This pach is an alternative to the one proposed by David Rientjes
<rientjes@google.com> attempting to set node state immediately when
bringing the node online.

Signed-off-by: Michael Schmitz <schmitz@debian.org>
Tested-by: Thorsten Glaser <tg@debian.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
CC: stable@kernel.org
13 years agodrm/i915: restore only the mode of this driver on lastclose (v2)
Dave Airlie [Thu, 21 Apr 2011 21:18:32 +0000 (22:18 +0100)]
drm/i915: restore only the mode of this driver on lastclose (v2)

i915 calls the panic handler function on last close to reset the modes,
however this is a really bad idea for multi-gpu machines, esp shareable
gpus machines. So add a new entry point for the driver to just restore
its own fbcon mode.

v2: move code into fb helper, fix panic code to block mode change on
powered off GPUs.

[airlied: this hits drm core and I wrote it and it was reviewed on intel-gfx
 so really I signed it off twice ;-).]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
13 years agodrm/radeon/kms: add info query for tile pipes
Alex Deucher [Tue, 26 Apr 2011 17:27:43 +0000 (13:27 -0400)]
drm/radeon/kms: add info query for tile pipes

needed by mesa for htile setup.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
13 years agodrm/radeon/kms: add missing safe regs for 6xx/7xx
Alex Deucher [Tue, 26 Apr 2011 17:10:20 +0000 (13:10 -0400)]
drm/radeon/kms: add missing safe regs for 6xx/7xx

needed for HiS in mesa.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
13 years agodrm: select FRAMEBUFFER_CONSOLE_PRIMARY if we have FRAMEBUFFER_CONSOLE
Dave Airlie [Thu, 21 Apr 2011 21:51:33 +0000 (07:51 +1000)]
drm: select FRAMEBUFFER_CONSOLE_PRIMARY if we have FRAMEBUFFER_CONSOLE

Multi-gpu/switcheroo relies on this option to get the console on the
correct GPU at bootup, some distros enable it but it seems some get
it wrong.

cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
13 years agoRevert wrong fixes for common misspellings
Lucas De Marchi [Wed, 27 Apr 2011 06:28:26 +0000 (23:28 -0700)]
Revert wrong fixes for common misspellings

These changes were incorrectly fixed by codespell. They were now
manually corrected.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
13 years agoLinux 2.6.39-rc5
Linus Torvalds [Wed, 27 Apr 2011 03:48:50 +0000 (20:48 -0700)]
Linux 2.6.39-rc5

13 years agoinit/Kconfig: fix EXPERT menu list
Randy Dunlap [Tue, 26 Apr 2011 19:33:21 +0000 (12:33 -0700)]
init/Kconfig: fix EXPERT menu list

The EXPERT menu list was recently broken by the insertion of a
kconfig symbol (EMBEDDED) at the beginning of the EXPERT list of
kconfig items.  Broken by:

  commit 6a108a14fa356ef607be308b68337939e56ea94e
  Author: David Rientjes <rientjes@google.com>
  Date:   Thu Jan 20 14:44:16 2011 -0800
    kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT

Restore the EXPERT menu list -- don't inject a symbol (EMBEDDED)
that does not depend on EXPERT into the list.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Peter Foley <pefoley2@verizon.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Linus Torvalds [Tue, 26 Apr 2011 18:39:37 +0000 (11:39 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  amd64_edac: Erratum #637 workaround
  amd64_edac: Factor in CC6 save area
  amd64_edac: Remove node interleave warning
  EDAC: Remove debugging output in scrub rate handling

13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog
Linus Torvalds [Tue, 26 Apr 2011 18:39:14 +0000 (11:39 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog

* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  watchdog: iTCO_wdt: TCO Watchdog patch for Intel Panther Point PCH

13 years agoMerge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
Linus Torvalds [Tue, 26 Apr 2011 18:38:48 +0000 (11:38 -0700)]
Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6

* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] kvm-390: Let kernel exit SIE instruction on work
  [S390] dasd: check sense type in device change handler
  [S390] pfault: fix token handling
  [S390] qdio: reset error states immediately
  [S390] fix page table walk for changing page attributes
  [S390] prng: prevent access beyond end of stack
  [S390] dasd: fix race between open and offline

13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
Linus Torvalds [Tue, 26 Apr 2011 15:26:58 +0000 (08:26 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: cleanup error handling in inode.c
  Btrfs: put the right bio if we have an error
  Btrfs: free bitmaps properly when evicting the cache
  Btrfs: Free free_space item properly in btrfs_trim_block_group()
  btrfs: add missing spin_unlock to a rare exit path
  Btrfs: check return value of kmalloc()
  btrfs: fix wrong allocating flag when reading page
  Btrfs: fix missing mutex_unlock in btrfs_del_dir_entries_in_log()

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs...
Linus Torvalds [Tue, 26 Apr 2011 15:25:16 +0000 (08:25 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: do some plugging in the submit_bio threads

13 years agoamd64_edac: Erratum #637 workaround
Borislav Petkov [Wed, 30 Mar 2011 13:42:10 +0000 (15:42 +0200)]
amd64_edac: Erratum #637 workaround

F15h CPUs may report a non-DRAM address when reporting an error address
belonging to a CC6 state save area. Add a workaround to detect this
condition and compute the actual DRAM address of the error as documented
in the Revision Guide for AMD Family 15h Models 00h-0Fh Processors.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
13 years agoamd64_edac: Factor in CC6 save area
Borislav Petkov [Mon, 21 Mar 2011 19:45:06 +0000 (20:45 +0100)]
amd64_edac: Factor in CC6 save area

F15h and later use a portion of DRAM as a CC6 storage area. BIOS
programs D18F1x[17C:140,7C:40] DRAM Base/Limit accordingly by
subtracting the storage area from the DRAM limit setting. However, in
order for edac to consider that part of DRAM too, we need to include it
into the per-node range.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
13 years agoamd64_edac: Remove node interleave warning
Borislav Petkov [Fri, 8 Apr 2011 13:05:21 +0000 (15:05 +0200)]
amd64_edac: Remove node interleave warning

This warning was wrongfully added for a normal condition - intlvsel
actually selects the destination node when node interleaving is enabled
and it is not a mismatch. For a detailed example, see section 2.8.10.2
"Node Interleaving" in F10h BKDG.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
13 years agowatchdog: iTCO_wdt: TCO Watchdog patch for Intel Panther Point PCH
Seth Heasley [Wed, 20 Apr 2011 17:56:20 +0000 (10:56 -0700)]
watchdog: iTCO_wdt: TCO Watchdog patch for Intel Panther Point PCH

This patch adds the TCO Watchdog DeviceIDs for the Intel Panther Point PCH.

Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agoACPI / PM: Avoid infinite recurrence while registering power resources
Rafael J. Wysocki [Tue, 26 Apr 2011 09:33:18 +0000 (11:33 +0200)]
ACPI / PM: Avoid infinite recurrence while registering power resources

There is at least one BIOS with a DSDT containing a power resource
object with a _PR0 entry pointing back to that power resource.  In
consequence, while registering that power resource
acpi_bus_get_power_flags() sees that it depends on itself and tries
to register it again, which leads to an infinitely deep recurrence.
This problem was introduced by commit bf325f9538d8c89312be305b9779e
(ACPI / PM: Register power resource devices as soon as they are
needed).

To fix this problem use the observation that power resources cannot
be power manageable and prevent acpi_bus_get_power_flags() from
being called for power resource objects.

References: https://bugzilla.kernel.org/show_bug.cgi?id=31872
Reported-and-tested-by: Pascal Dormeau <pdormeau@free.fr>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Len Brown <lenb@kernel.org>
Cc: stable@kernel.org
13 years agoPM / Wakeup: Fix initialization of wakeup-related device sysfs files
Rafael J. Wysocki [Tue, 26 Apr 2011 09:33:09 +0000 (11:33 +0200)]
PM / Wakeup: Fix initialization of wakeup-related device sysfs files

It turns out that some PCI devices are only found to be
wakeup-capable during registration, in which case, when
device_set_wakeup_capable() is called, device_is_registered() already
returns 'true' for the given device, but dpm_sysfs_add() hasn't been
called for it yet.  This leads to situations in which the device's
power.can_wakeup flag is not set as requested because of failing
wakeup_sysfs_add() and its wakeup-related sysfs files are not
created, although they should be present.  This is a post-2.6.38
regression introduced by commit cb8f51bdadb7969139c2e39c2defd4cde98c1
(PM: Do not create wakeup sysfs files for devices that cannot wake
up).

To work around this problem initialize the device's power.entry
field to an empty list head and make device_set_wakeup_capable()
check if it is still empty before attempting to add the devices
wakeup-related sysfs files with wakeup_sysfs_add().  Namely, if
power.entry is still empty at this point, device_pm_add() hasn't been
called yet for the device and its wakeup-related files will be
created later, so device_set_wakeup_capable() doesn't have to create
them.

Reported-and-tested-by: Tino Keitel <tino.keitel@tikei.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
Linus Torvalds [Tue, 26 Apr 2011 03:38:50 +0000 (20:38 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  CIFS: Fix memory over bound bug in cifs_parse_mount_options

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs...
Linus Torvalds [Tue, 26 Apr 2011 02:01:12 +0000 (19:01 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  eCryptfs: Flush dirty pages in setattr
  eCryptfs: Handle failed metadata read in lookup
  eCryptfs: Add reference counting to lower files
  eCryptfs: dput dentries returned from dget_parent
  eCryptfs: Remove extra d_delete in ecryptfs_rmdir

13 years agoMerge branch 'for-torvalds' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Tue, 26 Apr 2011 02:00:55 +0000 (19:00 -0700)]
Merge branch 'for-torvalds' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson

* 'for-torvalds' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson:
  rtc: fix coh901331 startup crash
  mach-ux500: fix i2c0 device setup regression

13 years agoSELINUX: Make selinux cache VFS RCU walks safe
Eric Paris [Mon, 25 Apr 2011 20:26:29 +0000 (16:26 -0400)]
SELINUX: Make selinux cache VFS RCU walks safe

Now that the security modules can decide whether they support the
dcache RCU walk or not it's possible to make selinux a bit more
RCU friendly.  The SELinux AVC and security server access decision
code is RCU safe.  A specific piece of the LSM audit code may not
be RCU safe.

This patch makes the VFS RCU walk retry if it would hit the non RCU
safe chunk of code.  It will normally just work under RCU.  This is
done simply by passing the VFS RCU state as a flag down into the
avc_audit() code and returning ECHILD there if it would have an issue.

Based-on-patch-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoadd hlist_bl_lock/unlock helpers
Christoph Hellwig [Mon, 25 Apr 2011 18:01:36 +0000 (14:01 -0400)]
add hlist_bl_lock/unlock helpers

Now that the whole dcache_hash_bucket crap is gone, go all the way and
also remove the weird locking layering violations for locking the hash
buckets.  Add hlist_bl_lock/unlock helpers to move the locking into the
list abstraction instead of requiring each caller to open code it.
After all allowing for the bit locks is the whole point of these helpers
over the plain hlist variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agobit_spinlock: don't play preemption games inside the busy loop
Linus Torvalds [Tue, 26 Apr 2011 01:10:58 +0000 (18:10 -0700)]
bit_spinlock: don't play preemption games inside the busy loop

When we are waiting for the bit-lock to be released, and are looping
over the 'cpu_relax()' should not be doing anything else - otherwise we
miss the point of trying to do the whole 'cpu_relax()'.

Do the preemption enable/disable around the loop, rather than inside of
it.

Noticed when I was looking at the code generation for the dcache
__d_drop usage, and the code just looked very odd.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoeCryptfs: Flush dirty pages in setattr
Tyler Hicks [Fri, 22 Apr 2011 18:08:00 +0000 (13:08 -0500)]
eCryptfs: Flush dirty pages in setattr

After 57db4e8d73ef2b5e94a3f412108dff2576670a8a changed eCryptfs to
write-back caching, eCryptfs page writeback updates the lower inode
times due to the use of vfs_write() on the lower file.

To preserve inode metadata changes, such as 'cp -p' does with
utimensat(), we need to flush all dirty pages early in
ecryptfs_setattr() so that the user-updated lower inode metadata isn't
clobbered later in writeback.

https://bugzilla.kernel.org/show_bug.cgi?id=33372

Reported-by: Rocko <rockorequin@hotmail.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
13 years agoeCryptfs: Handle failed metadata read in lookup
Tyler Hicks [Tue, 15 Mar 2011 19:54:00 +0000 (14:54 -0500)]
eCryptfs: Handle failed metadata read in lookup

When failing to read the lower file's crypto metadata during a lookup,
eCryptfs must continue on without throwing an error. For example, there
may be a plaintext file in the lower mount point that the user wants to
delete through the eCryptfs mount.

If an error is encountered while reading the metadata in lookup(), the
eCryptfs inode's size could be incorrect. We must be sure to reread the
plaintext inode size from the metadata when performing an open() or
setattr(). The metadata is already being read in those paths, so this
adds minimal performance overhead.

This patch introduces a flag which will track whether or not the
plaintext inode size has been read so that an incorrect i_size can be
fixed in the open() or setattr() paths.

https://bugs.launchpad.net/bugs/509180

Cc: <stable@kernel.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
13 years agoBtrfs: cleanup error handling in inode.c
Tsutomu Itoh [Mon, 25 Apr 2011 23:43:53 +0000 (19:43 -0400)]
Btrfs: cleanup error handling in inode.c

The error processing of several places is changed like setting the
error number only at the error.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
13 years agoBtrfs: put the right bio if we have an error
Josef Bacik [Mon, 25 Apr 2011 23:43:52 +0000 (19:43 -0400)]
Btrfs: put the right bio if we have an error

In btrfs_submit_direct_hook if the first btrfs_map_block fails we need to put
the orig_bio, not bio.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
13 years agoBtrfs: free bitmaps properly when evicting the cache
Josef Bacik [Mon, 25 Apr 2011 23:43:52 +0000 (19:43 -0400)]
Btrfs: free bitmaps properly when evicting the cache

If our space cache is wrong, we do the right thing and free up everything that
we loaded, however we don't reset the total_bitmaps counter or the thresholds or
anything.  So in btrfs_remove_free_space_cache make sure to call free_bitmap()
if it's a bitmap, this will keep us from panicing when we check to make sure we
don't have too many bitmaps.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
13 years agoBtrfs: Free free_space item properly in btrfs_trim_block_group()
Li Zefan [Mon, 25 Apr 2011 23:43:52 +0000 (19:43 -0400)]
Btrfs: Free free_space item properly in btrfs_trim_block_group()

Since commit dc89e9824464e91fa0b06267864ceabe3186fd8b, we've changed
to use a specific slab for alocation of free_space items.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
13 years agobtrfs: add missing spin_unlock to a rare exit path
David Sterba [Mon, 25 Apr 2011 23:43:52 +0000 (19:43 -0400)]
btrfs: add missing spin_unlock to a rare exit path

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
13 years agoBtrfs: check return value of kmalloc()
Tsutomu Itoh [Mon, 25 Apr 2011 23:43:52 +0000 (19:43 -0400)]
Btrfs: check return value of kmalloc()

The check on the return value of kmalloc() is added to some places.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
13 years agobtrfs: fix wrong allocating flag when reading page
Itaru Kitayama [Mon, 25 Apr 2011 23:43:51 +0000 (19:43 -0400)]
btrfs: fix wrong allocating flag when reading page

the space cache use extent_readpages() to read free space information,
so we can not use GFP_KERNEL flag to allocate memory, or it may lead
to deadlock.

Signed-off-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
13 years agoBtrfs: fix missing mutex_unlock in btrfs_del_dir_entries_in_log()
Tsutomu Itoh [Mon, 25 Apr 2011 23:43:51 +0000 (19:43 -0400)]
Btrfs: fix missing mutex_unlock in btrfs_del_dir_entries_in_log()

It is necessary to unlock mutex_lock before it return an error when
btrfs_alloc_path() fails.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
13 years agoeCryptfs: Add reference counting to lower files
Tyler Hicks [Thu, 14 Apr 2011 20:35:11 +0000 (15:35 -0500)]
eCryptfs: Add reference counting to lower files

For any given lower inode, eCryptfs keeps only one lower file open and
multiplexes all eCryptfs file operations through that lower file. The
lower file was considered "persistent" and stayed open from the first
lookup through the lifetime of the inode.

This patch keeps the notion of a single, per-inode lower file, but adds
reference counting around the lower file so that it is closed when not
currently in use. If the reference count is at 0 when an operation (such
as open, create, etc.) needs to use the lower file, a new lower file is
opened. Since the file is no longer persistent, all references to the
term persistent file are changed to lower file.

Locking is added around the sections of code that opens the lower file
and assign the pointer in the inode info, as well as the code the fputs
the lower file when all eCryptfs users are done with it.

This patch is needed to fix issues, when mounted on top of the NFSv3
client, where the lower file is left silly renamed until the eCryptfs
inode is destroyed.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
13 years agoeCryptfs: dput dentries returned from dget_parent
Tyler Hicks [Tue, 12 Apr 2011 16:23:09 +0000 (11:23 -0500)]
eCryptfs: dput dentries returned from dget_parent

Call dput on the dentries previously returned by dget_parent() in
ecryptfs_rename(). This is needed for supported eCryptfs mounts on top
of the NFSv3 client.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
13 years agoeCryptfs: Remove extra d_delete in ecryptfs_rmdir
Tyler Hicks [Tue, 12 Apr 2011 16:21:36 +0000 (11:21 -0500)]
eCryptfs: Remove extra d_delete in ecryptfs_rmdir

vfs_rmdir() already calls d_delete() on the lower dentry. That was being
duplicated in ecryptfs_rmdir() and caused a NULL pointer dereference
when NFSv3 was the lower filesystem.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
13 years agoNFSv4: Ensure that clientid and session establishment can time out
Trond Myklebust [Sun, 24 Apr 2011 18:29:33 +0000 (14:29 -0400)]
NFSv4: Ensure that clientid and session establishment can time out

The following patch ensures that we do not get permanently trapped in
the RPC layer when trying to establish a new client id or session.
This again ensures that the state manager can finish in a timely
fashion when the last filesystem to reference the nfs_client exits.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoSUNRPC: Allow RPC calls to return ETIMEDOUT instead of EIO
Trond Myklebust [Sun, 24 Apr 2011 18:28:45 +0000 (14:28 -0400)]
SUNRPC: Allow RPC calls to return ETIMEDOUT instead of EIO

On occasion, it is useful for the NFS layer to distinguish between
soft timeouts and other EIO errors due to (say) encoding errors,
or authentication errors.

The following patch ensures that the default behaviour of the RPC
layer remains to return EIO on soft timeouts (until we have
audited all the callers).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFSv4.1: Don't loop forever in nfs4_proc_create_session
Trond Myklebust [Sun, 24 Apr 2011 18:28:18 +0000 (14:28 -0400)]
NFSv4.1: Don't loop forever in nfs4_proc_create_session

If a server for some reason keeps sending NFS4ERR_DELAY errors, we can end
up looping forever inside nfs4_proc_create_session, and so the usual
mechanisms for detecting if the nfs_client is dead don't work.

Fix this by ensuring that we loop inside the nfs4_state_manager thread
instead.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoMerge branch 'dcache-cleanup'
Linus Torvalds [Sun, 24 Apr 2011 15:51:15 +0000 (08:51 -0700)]
Merge branch 'dcache-cleanup'

* dcache-cleanup:
  vfs: get rid of insane dentry hashing rules

13 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzi...
Linus Torvalds [Sun, 24 Apr 2011 15:45:37 +0000 (08:45 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata: ahci_start_engine compliant to AHCI spec
  ata: pata_at91.c bugfix for initial_timing initialisation
  ata: pata_at91.c bugfix for high master clock
  ahci: AHCI-mode SATA patch for Intel Panther Point DeviceIDs
  ata_piix: IDE-mode SATA patch for Intel Panther Point DeviceIDs
  libata: Pioneer DVR-216D can't do SETXFER
  ahci: don't enable port irq before handler is registered
  libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65
  libata: Kill unused ATA_DFLAG_{H|D}IPM flags
  ahci: EM supported message type sysfs attribute

13 years agoMerge branch 'for-linus' of git://git.infradead.org/ubifs-2.6
Linus Torvalds [Sun, 24 Apr 2011 15:42:15 +0000 (08:42 -0700)]
Merge branch 'for-linus' of git://git.infradead.org/ubifs-2.6

* 'for-linus' of git://git.infradead.org/ubifs-2.6:
  UBIFS: fix master node recovery
  UBIFS: fix false assertion warning in case of I/O failures
  UBIFS: fix false space checking failure

13 years agolibata: ahci_start_engine compliant to AHCI spec
Jian Peng [Sat, 23 Apr 2011 06:58:10 +0000 (23:58 -0700)]
libata: ahci_start_engine compliant to AHCI spec

At the end of section 10.1 of AHCI spec (rev 1.3), it states

Software shall not set PxCMD.ST to 1 until it is determined that
a functoinal device is present on the port as determined by
PxTFD.STS.BSY=0, PxTFD.STS.DRQ=0 and PxSSTS.DET=3h

Even though most AHCI host controller works without this check,
specific controller will fail under this condition.

Signed-off-by: Jian Peng <jipeng2005@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agoata: pata_at91.c bugfix for initial_timing initialisation
Igor Plyatov [Mon, 28 Mar 2011 12:56:14 +0000 (16:56 +0400)]
ata: pata_at91.c bugfix for initial_timing initialisation

The "struct ata_timing" must contain 10 members, but ".dmack_hold" member was
forgotten for "initial_timing" initialisation. This patch fixes such a problem.

Signed-off-by: Igor Plyatov <plyatov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agoata: pata_at91.c bugfix for high master clock
Igor Plyatov [Mon, 28 Mar 2011 12:56:15 +0000 (16:56 +0400)]
ata: pata_at91.c bugfix for high master clock

The AT91SAM9 microcontrollers with master clock higher then 105 MHz
and PIO0, have overflow of the NCS_RD_PULSE value in the MSB. This
lead to "NCS_RD_PULSE" pulse longer then "NRD_CYCLE" pulse and driver
does not detect ATA device.

Signed-off-by: Igor Plyatov <plyatov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agoahci: AHCI-mode SATA patch for Intel Panther Point DeviceIDs
Seth Heasley [Wed, 20 Apr 2011 15:45:20 +0000 (08:45 -0700)]
ahci: AHCI-mode SATA patch for Intel Panther Point DeviceIDs

The previously submitted patch was word-wrapped.

This patch adds the AHCI-mode SATA DeviceIDs for the Intel Panther Point PCH.

Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agoata_piix: IDE-mode SATA patch for Intel Panther Point DeviceIDs
Seth Heasley [Wed, 20 Apr 2011 15:43:37 +0000 (08:43 -0700)]
ata_piix: IDE-mode SATA patch for Intel Panther Point DeviceIDs

The previously submitted patch was word-wrapped.

This patch adds the IDE-mode SATA DeviceIDs for the Intel Panther
Point PCH.

Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agolibata: Pioneer DVR-216D can't do SETXFER
Jeff Mahoney [Tue, 19 Apr 2011 15:13:32 +0000 (11:13 -0400)]
libata: Pioneer DVR-216D can't do SETXFER

 Commit 4a5610a04d415ed94af75bb1159d2621d62c8328 fixed an issue with
 the Pioneer DVR-212D not handling SETXFER correctly. An openSUSE user
 reported a similar issue with his DVR-216D that the NOSETXFER horkage
 worked around for him as well.

 This patch adds the DVR-216D (1.08) to the horkage list for NOSETXFER.

 The issue was reported at:
 https://bugzilla.novell.com/show_bug.cgi?id=679143

Reported-by: Volodymyr Kyrychenko <vladimir.kirichenko@gmail.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agoahci: don't enable port irq before handler is registered
Maxime Bizon [Wed, 16 Mar 2011 13:58:32 +0000 (14:58 +0100)]
ahci: don't enable port irq before handler is registered

The ahci_pmp_attach() & ahci_pmp_detach() unmask port irqs, but they
are also called during port initialization, before ahci host irq
handler is registered. On ce4100 platform, this sometimes triggers
"irq 4: nobody cared" message when loading driver.

Fixed this by not touching the register if the port is in frozen
state, and mark all uninitialized port as frozen.

Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agolibata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65
Tejun Heo [Wed, 16 Mar 2011 10:14:55 +0000 (11:14 +0100)]
libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65

NVIDIA mcp65 familiy of controllers cause command timeouts when DIPM
is used.  Implement ATA_FLAG_NO_DIPM and apply it.

This problem was reported by Stefan Bader in the following thread.

 http://thread.gmane.org/gmane.linux.ide/48841

stable: applicable to 2.6.37 and 38.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agolibata: Kill unused ATA_DFLAG_{H|D}IPM flags
Tejun Heo [Wed, 16 Mar 2011 10:14:25 +0000 (11:14 +0100)]
libata: Kill unused ATA_DFLAG_{H|D}IPM flags

ATA_DFLAG_{H|D}IPM flags are no longer used.  Kill them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agoahci: EM supported message type sysfs attribute
Hannes Reinecke [Fri, 4 Mar 2011 08:54:52 +0000 (09:54 +0100)]
ahci: EM supported message type sysfs attribute

This patch adds an sysfs attribute 'em_message_supported' to the
ahci host device which prints out the supported enclosure management
message types.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agokconfig: Avoid buffer underrun in choice input
Ben Hutchings [Sat, 23 Apr 2011 17:42:56 +0000 (18:42 +0100)]
kconfig: Avoid buffer underrun in choice input

Commit 40aee729b350 ('kconfig: fix default value for choice input')
fixed some cases where kconfig would select the wrong option from a
choice with a single valid option and thus enter an infinite loop.

However, this broke the test for user input of the form 'N?', because
when kconfig selects the single valid option the input is zero-length
and the test will read the byte before the input buffer.  If this
happens to contain '?' (as it will in a mips build on Debian unstable
today) then kconfig again enters an infinite loop.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@kernel.org [2.6.17+]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agovfs: get rid of insane dentry hashing rules
Linus Torvalds [Sun, 24 Apr 2011 14:58:46 +0000 (07:58 -0700)]
vfs: get rid of insane dentry hashing rules

The dentry hashing rules have been really quite complicated for a long
while, in odd ways.  That made functions like __d_drop() very fragile
and non-obvious.

In particular, whether a dentry was hashed or not was indicated with an
explicit DCACHE_UNHASHED bit.  That's despite the fact that the hash
abstraction that the dentries use actually have a 'is this entry hashed
or not' model (which is a simple test of the 'pprev' pointer).

The reason that was done is because we used the normal 'is this entry
unhashed' model to mark whether the dentry had _ever_ been hashed in the
dentry hash tables, and that logic goes back many years (commit
b3423415fbc2: "dcache: avoid RCU for never-hashed dentries").

That, in turn, meant that __d_drop had totally different unhashing logic
for the dentry hash table case and for the anonymous dcache case,
because in order to use the "is this dentry hashed" logic as a flag for
whether it had ever been on the RCU hash table, we had to unhash such a
dentry differently so that we'd never think that it wasn't 'unhashed'
and wouldn't be free'd correctly.

That's just insane.  It made the logic really hard to follow, when there
were two different kinds of "unhashed" states, and one of them (the one
that used "list_bl_unhashed()") really had nothing at all to do with
being unhashed per se, but with a very subtle lifetime rule instead.

So turn all of it around, and make it logical.

Instead of having a DENTRY_UNHASHED bit in d_flags to indicate whether
the dentry is on the hash chains or not, use the hash chain unhashed
logic for that.  Suddenly "d_unhashed()" just uses "list_bl_unhashed()",
and everything makes sense.

And for the lifetime rule, just use an explicit DENTRY_RCUACCEES bit.
If we ever insert the dentry into the dentry hash table so that it is
visible to RCU lookup, we mark it DENTRY_RCUACCESS to show that it now
needs the RCU lifetime rules.  Now suddently that test at dentry free
time makes sense too.

And because unhashing now is sane and doesn't depend on where the dentry
got unhashed from (because the dentry hash chain details doesn't have
some subtle side effects), we can re-unify the __d_drop() logic and use
common code for the unhashing.

Also fix one more open-coded hash chain bit_spin_lock() that I missed in
the previous chain locking cleanup commit.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspe...
Linus Torvalds [Sun, 24 Apr 2011 05:35:16 +0000 (22:35 -0700)]
Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: Add missing syscore_suspend() and syscore_resume() calls
  PM: Fix error code paths executed after failing syscore_suspend()

13 years agovfs: get rid of 'struct dcache_hash_bucket' abstraction
Linus Torvalds [Sun, 24 Apr 2011 05:32:03 +0000 (22:32 -0700)]
vfs: get rid of 'struct dcache_hash_bucket' abstraction

It's a useless abstraction for 'hlist_bl_head', and it doesn't actually
help anything - quite the reverse.  All the users end up having to know
about the hlist_bl_head details anyway, using 'struct hlist_bl_node *'
etc. So it just makes the code look confusing.

And the cost of it is extra '&b->head' syntactic noise, but more
importantly it spuriously makes the hash table dentry list look
different from the per-superblock DCACHE_DISCONNECTED dentry list.

As a result, the code ended up using ad-hoc locking for one case and
special helper functions for what is really another totally identical
case in the very same function.

Make it all look and work the same.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge branch 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Fri, 22 Apr 2011 23:19:19 +0000 (16:19 -0700)]
Merge branch 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6

* 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
  tty/n_gsm: fix bug in CRC calculation for gsm1 mode
  serial/imx: read cts state only after acking cts change irq
  parport_pc.c: correctly release the requested region for the IT887x

13 years agoSECURITY: Move exec_permission RCU checks into security modules
Andi Kleen [Fri, 22 Apr 2011 00:23:19 +0000 (17:23 -0700)]
SECURITY: Move exec_permission RCU checks into security modules

Right now all RCU walks fall back to reference walk when CONFIG_SECURITY
is enabled, even though just the standard capability module is active.
This is because security_inode_exec_permission unconditionally fails
RCU walks.

Move this decision to the low level security module. This requires
passing the RCU flags down the security hook. This way at least
the capability module and a few easy cases in selinux/smack work
with RCU walks with CONFIG_SECURITY=y

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
Linus Torvalds [Fri, 22 Apr 2011 21:59:07 +0000 (14:59 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Fix unused warnings when !SND_HDA_NEEDS_RESUME
  ALSA: hda - Add a fix-up for Acer dmic with ALC271x codec
  ASoC: add a module alias to the FSI driver
  ALSA: emu10k1 - Fix "Music" controls to "Synth" controls in documents
  ARM: s3c2440: gta02; Register dfbmcs320 device for BT audio interface
  ASoC: codecs: JZ4740: Fix OOPS
  ASoC: Fix output PGA enabling in wm_hubs CODECs
  ASoC: sn95031: decorate function with __devexit_p()
  ASoC: SAMSUNG: Fix the inverted clocks handling for pcm driver
  ASoC: sst_platform: Fix lock acquring
  ASoC: fsi: driver safely remove for against irq
  ASoC: fsi: modify vague PM control on probe
  ASoC: fsi: take care in failing case of dai register
  MAINTAINERS: Update Samsung ASoC maintainer's id
  ASoC: WM8903: HP and Line out PGA/mixer DAPM fixes
  ASoC: Set left channel volume update bits for WM8994
  ASoC: fix config error path
  ASoC: check channel mismatch between cpu_dai and codec_dai
  ASoC: Tegra: Suspend/resume support

13 years ago[PARISC] slub: fix panic with DISCONTIGMEM
James Bottomley [Tue, 19 Apr 2011 21:29:36 +0000 (16:29 -0500)]
[PARISC] slub: fix panic with DISCONTIGMEM

Slub makes assumptions about page_to_nid() which are violated by
DISCONTIGMEM and !NUMA.  This violation results in a panic because
page_to_nid() can be non-zero for pages in the discontiguous ranges and
this leads to a null return by get_node().  The assertion by the
maintainer is that DISCONTIGMEM should only be allowed when NUMA is also
defined.  However, at least six architectures: alpha, ia64, m32r, m68k,
mips, parisc violate this.  The panic is a regression against slab, so
just mark slub broken in the problem configuration to prevent users
reporting these panics.

Cc: stable@kernel.org
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
13 years agoMerge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 22 Apr 2011 18:31:27 +0000 (11:31 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Update/fix Intel Nehalem cache events
  perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow
  x86, perf event: Turn off unstructured raw event access to offcore registers
  perf: Support Xeon E7's via the Westmere PMU driver

13 years agoMerge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 22 Apr 2011 18:31:21 +0000 (11:31 -0700)]
Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  xtensa: Fixup irq conversion fallout and nmi_count

13 years agoperf, x86: Update/fix Intel Nehalem cache events
Peter Zijlstra [Fri, 22 Apr 2011 11:39:56 +0000 (13:39 +0200)]
perf, x86: Update/fix Intel Nehalem cache events

Change the Nehalem cache events to use retired memory instruction counters
(similar to Westmere), this greatly improves the provided stats.

Using:

main ()
{
        int i;

        for (i = 0; i < 1000000000; i++) {
                asm("mov (%%rsp), %%rbx;"
                    "mov %%rbx, (%%rsp);" : : : "rbx");
        }
}

We find:

 $ perf stat --repeat 10 -e instructions:u -e l1-dcache-loads:u -e l1-dcache-stores:u ./loop_1b_loads+stores
  Performance counter stats for './loop_1b_loads+stores' (10 runs):
      4,000,081,056 instructions:u           #      0.000 IPC ( +-   0.000% )
      4,999,502,846 l1-dcache-loads:u          ( +-   0.008% )
      1,000,034,832 l1-dcache-stores:u         ( +-   0.000% )
         1.565184942  seconds time elapsed   ( +-   0.005% )

The 5b is surprising - we'd expect 1b:

 $ perf stat --repeat 10 -e instructions:u -e r10b:u -e l1-dcache-stores:u ./loop_1b_loads+stores
  Performance counter stats for './loop_1b_loads+stores' (10 runs):
      4,000,081,054 instructions:u           #      0.000 IPC ( +-   0.000% )
      1,000,021,961 r10b:u                     ( +-   0.000% )
      1,000,030,951 l1-dcache-stores:u         ( +-   0.000% )
         1.565055422  seconds time elapsed   ( +-   0.003% )

Which this patch thus fixes.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/n/tip-q9rtru7b7840tws75xzboapv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoperf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow
Cyrill Gorcunov [Thu, 21 Apr 2011 15:03:21 +0000 (11:03 -0400)]
perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow

It's not enough to simply disable event on overflow the
cpuc->active_mask should be cleared as well otherwise counter
may stall in "active" even in real being already disabled (which
potentially may lead to the situation that user may not use this
counter further).

Don pointed out that:

 " I also noticed this patch fixed some unknown NMIs
   on a P4 when I stressed the box".

Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Link: http://lkml.kernel.org/r/1303398203-2918-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agox86, perf event: Turn off unstructured raw event access to offcore registers
Ingo Molnar [Fri, 22 Apr 2011 06:44:38 +0000 (08:44 +0200)]
x86, perf event: Turn off unstructured raw event access to offcore registers

Andi Kleen pointed out that the Intel offcore support patches were merged
without user-space tool support to the functionality:

 |
 | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
 | user space bits were not. This made it impossible to set the extra mask
 | and actually do the OFFCORE profiling
 |

Andi submitted a preliminary patch for user-space support, as an
extension to perf's raw event syntax:

 |
 | Some raw events -- like the Intel OFFCORE events -- support additional
 | parameters. These can be appended after a ':'.
 |
 | For example on a multi socket Intel Nehalem:
 |
 |    perf stat -e r1b7:20ff -a sleep 1
 |
 | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
 | that measures any access to DRAM on another socket.
 |

But this kind of usability is absolutely unacceptable - users should not
be expected to type in magic, CPU and model specific incantations to get
access to useful hardware functionality.

The proper solution is to expose useful offcore functionality via
generalized events - that way users do not have to care which specific
CPU model they are using, they can use the conceptual event and not some
model specific quirky hexa number.

We already have such generalization in place for CPU cache events,
and it's all very extensible.

"Offcore" events measure general DRAM access patters along various
parameters. They are particularly useful in NUMA systems.

We want to support them via generalized DRAM events: either as the
fourth level of cache (after the last-level cache), or as a separate
generalization category.

That way user-space support would be very obvious, memory access
profiling could be done via self-explanatory commands like:

  perf record -e dram ./myapp
  perf record -e dram-remote ./myapp

... to measure DRAM accesses or more expensive cross-node NUMA DRAM
accesses.

These generalized events would work on all CPUs and architectures that
have comparable PMU features.

( Note, these are just examples: actual implementation could have more
  sophistication and more parameter - as long as they center around
  similarly simple usecases. )

Now we do not want to revert *all* of the current offcore bits, as they
are still somewhat useful for generic last-level-cache events, implemented
in this commit:

  e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere

But we definitely do not yet want to expose the unstructured raw events
to user-space, until better generalization and usability is implemented
for these hardware event features.

( Note: after generalization has been implemented raw offcore events can be
  supported as well: there can always be an odd event that is marginally
  useful but not useful enough to generalize. DRAM profiling is definitely
  *not* such a category so generalization must be done first. )

Furthermore, PERF_TYPE_RAW access to these registers was not intended
to go upstream without proper support - it was a side-effect of the above
e994d7d23a0b commit, not mentioned in the changelog.

As v2.6.39 is nearing release we go for the simplest approach: disable
the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
kernel and becomes an ABI.

Once proper structure is implemented for these hardware events and users
are offered usable solutions we can revisit this issue.

Reported-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoperf: Support Xeon E7's via the Westmere PMU driver
Andi Kleen [Thu, 21 Apr 2011 23:48:35 +0000 (16:48 -0700)]
perf: Support Xeon E7's via the Westmere PMU driver

There's a new model number public, 47, for Xeon E7 (aka Westmere EX).

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: a.p.zijlstra@chello.nl
Link: http://lkml.kernel.org/r/1303429715-10202-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
Linus Torvalds [Thu, 21 Apr 2011 17:50:56 +0000 (10:50 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd
  block: don't propagate unlisted DISK_EVENTs to userland
  elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too

13 years agoide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd
Tejun Heo [Thu, 21 Apr 2011 17:43:59 +0000 (19:43 +0200)]
ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd

check_events() implementations in both ide-gd and ide-cd are
inadequate for in-kernel event polling.  Both generate media change
events continuously when certain conditions are met causing infinite
event loop between the driver and userland event handler.

As disk event now supports suppression of unlisted events, simply
de-listing DISK_EVENT_MEDIA_CHANGE from disk->events resolves the
problem.  Internal handling around media revalidation will behave the
same while userland will fall back to userland event polling after
detecting the device doesn't support disk events.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jens Axboe <jaxboe@fusionio.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoblock: don't propagate unlisted DISK_EVENTs to userland
Tejun Heo [Thu, 21 Apr 2011 17:43:58 +0000 (19:43 +0200)]
block: don't propagate unlisted DISK_EVENTs to userland

DISK_EVENT_MEDIA_CHANGE is used for both userland visible event and
internal event for revalidation of removeable devices.  Some legacy
drivers don't implement proper event detection and continuously
generate events under certain circumstances.  For example, ide-cd
generates media changed continuously if there's no media in the drive,
which can lead to infinite loop of events jumping back and forth
between the driver and userland event handler.

This patch updates disk event infrastructure such that it never
propagates events not listed in disk->events to userland.  Those
events are processed the same for internal purposes but uevent
generation is suppressed.

This also ensures that userland only gets events which are advertised
in the @events sysfs node lowering risk of confusion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoelevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too
Jens Axboe [Thu, 21 Apr 2011 17:28:35 +0000 (19:28 +0200)]
elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too

The sort insert is the one that goes to the IO scheduler. With
the SORT_MERGE addition, we could bypass IO scheduler setup
but still ask the IO scheduler to insert the request. This would
cause an oops on switching IO schedulers through the sysfs
interface, unless the disk just happened to be idle while it
occured.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>