perf/x86: Disable LBR support for older Intel Atom processors
The patch adds a restriction for Intel Atom LBR support. Only
steppings 10 (PineView) and more recent are supported. Older models
do not have a functional LBR. Their LBR does not freeze on PMU
interrupt which makes LBR unusable in the context of perf_events.
perf/x86: Sync branch stack sampling with precise_sampling
If precise sampling is enabled on Intel x86 then perf_event uses PEBS.
To correct for the off-by-one error of PEBS, perf_event uses LBR when
precise_sample > 1.
On Intel x86 PERF_SAMPLE_BRANCH_STACK is implemented using LBR,
therefore both features must be coordinated as they may not
configure LBR the same way.
For PEBS, LBR needs to capture all branches at the priv level of
the associated event.
This patch checks that the branch type and priv level of BRANCH_STACK
is compatible with that of the PEBS LBR requirement, thereby allowing:
The Intel LBR on some recent processor is capable
of filtering branches by type. The filter is configurable
via the LBR_SELECT MSR register.
There are limitation on how this register can be used.
On Nehalem/Westmere, the LBR_SELECT is shared by the two HT threads
when HT is on. It is private to each core when HT is off.
On SandyBridge, the LBR_SELECT register is private to each thread
when HT is on. It is private to each core when HT is off.
The kernel must manage the sharing of LBR_SELECT. It allows
multiple users on the same logical CPU to use LBR_SELECT as
long as they program it with the same value. Across sibling
CPUs (HT threads), the same restriction applies on NHM/WSM.
This patch implements this sharing logic by leveraging the
mechanism put in place for managing the offcore_response
shared MSR.
We modify __intel_shared_reg_get_constraints() to cause
x86_get_event_constraint() to be called because LBR may
be associated with events that may be counter constrained.
This patch adds the ability to sample taken branches to the
perf_event interface.
The ability to capture taken branches is very useful for all
sorts of analysis. For instance, basic block profiling, call
counts, statistical call graph.
This new capability requires hardware assist and as such may
not be available on all HW platforms. On Intel x86 it is
implemented on top of the Last Branch Record (LBR) facility.
To enable taken branches sampling, the PERF_SAMPLE_BRANCH_STACK
bit must be set in attr->sample_type.
Sampled taken branches may be filtered by type and/or priv
levels.
The patch adds a new field, called branch_sample_type, to the
perf_event_attr structure. It contains a bitmask of filters
to apply to the sampled taken branches.
Filters may be implemented in HW. If the HW filter does not exist
or is not good enough, some arch may also implement a SW filter.
The following generic filters are currently defined:
- PERF_SAMPLE_USER
only branches whose targets are at the user level
- PERF_SAMPLE_KERNEL
only branches whose targets are at the kernel level
- PERF_SAMPLE_HV
only branches whose targets are at the hypervisor level
- PERF_SAMPLE_ANY
any type of branches (subject to priv levels filters)
- PERF_SAMPLE_ANY_CALL
any call branches (may incl. syscall on some arch)
- PERF_SAMPLE_ANY_RET
any return branches (may incl. syscall returns on some arch)
- PERF_SAMPLE_IND_CALL
indirect call branches
Obviously filter may be combined. The priv level bits are optional.
If not provided, the priv level of the associated event are used. It
is possible to collect branches at a priv level different from the
associated event. Use of kernel, hv priv levels is subject to permissions
and availability (hv).
The number of taken branch records present in each sample may vary based
on HW, the type of sampled branches, the executed code. Therefore
each sample contains the number of taken branches it contains.
perf tools: Handle kernels that don't support attr.exclude_{guest,host}
Just fall back to resetting those fields, if set, warning the user that
that feature is not available.
If guest samples appear they will just be discarded because no struct
machine will be found and thus the event will be accounted as not
handled and dropped, see 0c09571.
Reported-by: Namhyung Kim <namhyung@gmail.com> Tested-by: Joerg Roedel <joerg.roedel@amd.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-vuwxig36mzprl5n7nzvnxxsh@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Joerg Roedel [Fri, 10 Feb 2012 17:05:05 +0000 (18:05 +0100)]
perf tools: Change perf_guest default back to false
Setting perf_guest to true by default makes no sense because the perf
subcommands can not setup guest symbol information and thus not process
and guest samples. The only exception is perf-kvm which changes the
perf_guest value on its own. So change the default for perf_guest back
to false.
Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jason Wang <jasowang@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1328893505-4115-3-git-send-email-joerg.roedel@amd.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Joerg Roedel [Wed, 29 Feb 2012 13:57:32 +0000 (14:57 +0100)]
perf/x86/kvm: Fix Host-Only/Guest-Only counting with SVM disabled
It turned out that a performance counter on AMD does not
count at all when the GO or HO bit is set in the control
register and SVM is disabled in EFER.
This patch works around this issue by masking out the HO bit
in the performance counter control register when SVM is not
enabled.
The GO bit is not touched because it is only set when the
user wants to count in guest-mode only. So when SVM is
disabled the counter should not run at all and the
not-counting is the intended behaviour.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Avi Kivity <avi@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Robert Richter <robert.richter@amd.com> Cc: stable@vger.kernel.org # v3.2 Link: http://lkml.kernel.org/r/1330523852-19566-1-git-send-email-joerg.roedel@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf probe: Ensure offset provided is not greater than function length without DWARF info too
The 'perf probe' command allows kprobe to be inserted at any offset from
a function start, which results in adding kprobes to unintended
location. (example: perf probe do_fork+10000 is allowed even though
size of do_fork is ~904).
My previous patch https://lkml.org/lkml/2012/2/24/42 addressed the case
where DWARF info was available for the kernel. This patch fixes the
case where perf probe is used on a kernel without debuginfo available.
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/4F4C544D.1010909@linux.vnet.ibm.com Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Ahern [Fri, 24 Feb 2012 19:31:38 +0000 (12:31 -0700)]
perf tools: Ensure comm string is properly terminated
If threads in a multi-threaded process have names shorter than the main
thread the comm for the named threads is not properly terminated.
E.g., for the process 'namedthreads' where each thread is named noploop%d
where %d is the thread number:
Before:
perf script -f comm,tid,ip,sym,dso
noploop:4ads 21616 400a49 noploop (/tmp/namedthreads)
The 'ads' in the thread comm bleeds over from the process name.
Namhyung Kim [Mon, 20 Feb 2012 01:47:26 +0000 (10:47 +0900)]
perf evlist: Return first evsel for non-sample event on old kernel
On old kernels that don't support sample_id_all feature,
perf_evlist__id2evsel() returns NULL for non-sampling events.
This breaks perf top when multiple events are given on command line. Fix
it by using first evsel in the evlist. This will also prevent getting
the same (potential) problem in such new tool/ old kernel combo.
Suggested-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1329702447-25045-1-git-send-email-namhyung.kim@lge.com Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jason Baron [Tue, 28 Feb 2012 18:49:01 +0000 (13:49 -0500)]
static keys: Inline the static_key_enabled() function
In the jump label enabled case, calling static_key_enabled()
results in a function call. The function returns the results of
a compare, so it really doesn't need the overhead of a full
function call. Let's make it 'static inline' for both the jump
label enabled and disabled cases.
Namhyung Kim [Tue, 28 Feb 2012 01:19:38 +0000 (10:19 +0900)]
perf/hwbp: Fix a possible memory leak
If kzalloc() for TYPE_DATA failed on a given cpu, previous chunk
of TYPE_INST will be leaked. Fix it.
Thanks to Peter Zijlstra for suggesting this better solution. It
should work as long as the initial value of the region is all
0's and that's the case of static (per-cpu) memory allocation.
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Link: http://lkml.kernel.org/r/1330391978-28070-1-git-send-email-namhyung.kim@lge.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
Here are some trivial NTFS changes (a spelling fix and two use before
NULL check cases found by Coverity as well as an update in MAINTAINERS
for the path to the ntfs git repo) together with a simple LDM fix for
parsing fragmented VBLKs.
* git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs:
NTFS: Update git repo path in MAINTAINERS file.
LDM: Fix reassembly of extended VBLKs.
NTFS: Correct two spelling errors "dealocate" to "deallocate" in mft.c.
NTFS: Do not dereference pointer before checking for NULL.
NTFS: Remove unused variable.
Linus Torvalds [Mon, 27 Feb 2012 15:55:51 +0000 (07:55 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce/AMD: Fix UP build error
x86: Specify a size for the cmp in the NMI handler
x86/nmi: Test saved %cs in NMI to determine nested NMI case
x86/amd: Fix L1i and L2 cache sharing information for AMD family 15h processors
x86/microcode: Remove noisy AMD microcode warning
Linus Torvalds [Mon, 27 Feb 2012 15:54:57 +0000 (07:54 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Handle pending irqs in irq_startup()
genirq: Unmask oneshot irqs when thread was not woken
Linus Torvalds [Mon, 27 Feb 2012 05:03:16 +0000 (21:03 -0800)]
Merge tag 'stable/for-linus-fixes-3.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Two fixes to fix a memory corruption bug when WC pages never get
converted back to WB but end up being recycled in the general memory
pool as WC.
There is a better way of fixing this, but there is not enough time to do
the full benchmarking to pick one of the right options - so picking the
one that favors stability for right now.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* tag 'stable/for-linus-fixes-3.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pat: Disable PAT support for now.
xen/setup: Remove redundant filtering of PTE masks.
mod/file2alias: make modpost compile on darwin again
commit e49ce14150c64b29a8dd211df785576fa19a9858 breaks cross compiling
the linux kernel on darwin hosts.
This fix introduce some minimal glue to adopt linker section handling
for darwin hosts.
Signed-off-by: Andreas BieĂźmann <andreas@biessmann.de> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: Jochen Friedrich <jochen@scram.de> CC: Samuel Ortiz <sameo@linux.intel.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Tested-by: Bernhard Walle <bernhard@bwalle.de>
1) ICMP sockets leave err uninitialized but we try to return it for the
unsupported MSG_OOB case, reported by Dave Jones.
2) Add new Zaurus device ID entries, from Dave Jones.
3) Pointer calculation in hso driver memset is wrong, from Dan
Carpenter.
4) ks8851_probe() checks unsigned value as negative, fix also from Dan
Carpenter.
5) Fix crashes in atl1c driver due to TX queue handling, from Eric
Dumazet. I anticipate some TX side locking fixes coming in the near
future for this driver as well.
6) The inline directive fix in Bluetooth which was breaking the build
only with very new versions of GCC, from Johan Hedberg.
7) Fix crashes in the ATP CLIP code due to ARP cleanups this merge
window, reported by Meelis Roos and fixed by Eric Dumazet.
9) Some ip6_route_output() callers test the return value for NULL, but
this never happens as the convention is to return a dst entry with
dst->error set. Fixes from RonQing Li.
10) Logitech Harmony 900 should be handled by zaurus driver not
cdc_ether, update white lists and black lists accordingly. From
Scott Talbert.
11) Receiving from certain kinds of devices there won't be a MAC header,
so there is no MAC header to fixup in the IPSEC code, and if we try
to do it we'll crash. Fix from Eric Dumazet.
12) Port type array indexing off-by-one in mlx4 driver, fix from Yevgeny
Petrilin.
13) Fix regression in link-down handling in davinci_emac which causes
all RX descriptors to be freed up and therefore RX to wedge
completely, from Christian Riesch.
14) It took two attempts, but ctnetlink soft lockups seem to be
cured now, from Pablo Neira Ayuso.
15) Endianness bug fix in ENIC driver, from Santosh Nayak.
16) The long ago conversion of the PPP fragmentation code over to
abstracted SKB list handling wasn't perfect, once we get an
out of sequence SKB we don't flush the rest of them like we
should. From Ben McKeegan.
17) Fix regression of ->ip_summed initialization in sfc driver.
From Ben Hutchings.
18) Bluetooth timeout mistakenly using msecs instead of jiffies,
from Andrzej Kaczmarek.
19) Using _sync variant of work cancellation results in deadlocks,
use the non _sync variants instead. From Andre Guedes.
20) Bluetooth rfcomm code had reference counting problems leading
to crashes, fix from Octavian Purdila.
21) The conversion of netem over to classful qdisc handling added
two bugs to netem_dequeue(), fixes from Eric Dumazet.
22) Missing pci_iounmap() in ATM Solos driver. Fix from Julia Lawall.
23) b44_pci_exit() should not have __exit tag since it's invoked from
non-__exit code. From Nikola Pajkovsky.
24) The conversion of the neighbour hash tables over to RCU added a
race, fixed here by adding the necessary reread of tbl->nht, fix
from Michel Machado.
25) When we added VF (virtual function) attributes for network device
dumps, this potentially bloats up the size of the dump of one
network device such that the dump size is too large for the buffer
allocated by properly written netlink applications.
In particular, if you add 255 VFs to a network device, parts of
GLIBC stop working.
To fix this, we add an attribute that is used to turn on these
extended portions of the network device dump. Sophisticaed
applications like 'ip' that want to see this stuff will be changed
to set the attribute, whereas things like GLIBC that don't care
about VFs simply will not, and therefore won't be busted by the
mere presence of VFs on a network device.
Thanks to the tireless work of Greg Rose on this fix.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
sfc: Fix assignment of ip_summed for pre-allocated skbs
ppp: fix 'ppp_mp_reconstruct bad seq' errors
enic: Fix endianness bug.
gre: fix spelling in comments
netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
Revert "netfilter: ctnetlink: fix soft lockup when netlink adds new entries"
davinci_emac: Do not free all rx dma descriptors during init
mlx4_core: Fixing array indexes when setting port types
phy: IC+101G and PHY_HAS_INTERRUPT flag
netdev/phy/icplus: Correct broken phy_init code
ipsec: be careful of non existing mac headers
Move Logitech Harmony 900 from cdc_ether to zaurus
hso: memsetting wrong data in hso_get_count()
netfilter: ip6_route_output() never returns NULL.
ethernet/broadcom: ip6_route_output() never returns NULL.
ipv6: ip6_route_output() never returns NULL.
jme: Fix FIFO flush issue
atm: clip: remove clip_tbl
ipv4: ping: Fix recvmsg MSG_OOB error handling.
rtnetlink: Fix problem with buffer allocation
...
Linus Torvalds [Sun, 26 Feb 2012 17:44:55 +0000 (09:44 -0800)]
Fix autofs compile without CONFIG_COMPAT
The autofs compat handling fix caused a compile failure when
CONFIG_COMPAT isn't defined.
Instead of adding random #ifdef'fery in autofs, let's just make the
compat helpers earlier to use: without CONFIG_COMPAT, is_compat_task()
just hardcodes to zero.
We could probably do something similar for a number of other cases where
we have #ifdef's in code, but this is the low-hanging fruit.
Reported-and-tested-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 25 Feb 2012 20:12:08 +0000 (12:12 -0800)]
Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Couple of minor driver fixes.
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (max34440) Fix resetting temperature history
hwmon: (f75375s) Fix register write order when setting fans to full speed
hwmon: (ads1015) Fix file leak in probe function
hwmon: (max6639) Fix PPR register initialization to set both channels
hwmon: (max6639) Fix FAN_FROM_REG calculation
Linus Torvalds [Sat, 25 Feb 2012 20:11:25 +0000 (12:11 -0800)]
Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
three kbuild fixes for 3.3:
- make deb-pkg symlink race fix.
- make coccicheck fix.
- Dropping the check for modutils. This is not a regression, but
allows the module-init-tools replacement kmod work with the 3.3
kernel.
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
coccicheck: change handling of C={1,2} when M= is set
builddeb: Don't create files in /tmp with predictable names
kbuild: do not check for ancient modutils tools
Ian Kent [Wed, 22 Feb 2012 12:45:44 +0000 (20:45 +0800)]
autofs: work around unhappy compat problem on x86-64
When the autofs protocol version 5 packet type was added in commit 5c0a32fc2cd0 ("autofs4: add new packet type for v5 communications"), it
obvously tried quite hard to be word-size agnostic, and uses explicitly
sized fields that are all correctly aligned.
However, with the final "char name[NAME_MAX+1]" array at the end, the
actual size of the structure ends up being not very well defined:
because the struct isn't marked 'packed', doing a "sizeof()" on it will
align the size of the struct up to the biggest alignment of the members
it has.
And despite all the members being the same, the alignment of them is
different: a "__u64" has 4-byte alignment on x86-32, but native 8-byte
alignment on x86-64. And while 'NAME_MAX+1' ends up being a nice round
number (256), the name[] array starts out a 4-byte aligned.
End result: the "packed" size of the structure is 300 bytes: 4-byte, but
not 8-byte aligned.
As a result, despite all the fields being in the same place on all
architectures, sizeof() will round up that size to 304 bytes on
architectures that have 8-byte alignment for u64.
Note that this is *not* a problem for 32-bit compat mode on POWER, since
there __u64 is 8-byte aligned even in 32-bit mode. But on x86, 32-bit
and 64-bit alignment is different for 64-bit entities, and as a result
the structure that has exactly the same layout has different sizes.
So on x86-64, but no other architecture, we will just subtract 4 from
the size of the structure when running in a compat task. That way we
will write the properly sized packet that user mode expects.
Not pretty. Sadly, this very subtle, and unnecessary, size difference
has been encoded in user space that wants to read packets of *exactly*
the right size, and will refuse to touch anything else.
Reported-and-tested-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Hutchings [Sat, 25 Feb 2012 00:03:10 +0000 (00:03 +0000)]
sfc: Fix assignment of ip_summed for pre-allocated skbs
When pre-allocating skbs for received packets, we set ip_summed =
CHECKSUM_UNNCESSARY. We used to change it back to CHECKSUM_NONE when
the received packet had an incorrect checksum or unhandled protocol.
Commit bc8acf2c8c3e43fcc192762a9f964b3e9a17748b ('drivers/net: avoid
some skb->ip_summed initializations') mistakenly replaced the latter
assignment with a DEBUG-only assertion that ip_summed ==
CHECKSUM_NONE. This assertion is always false, but it seems no-one
has exercised this code path in a DEBUG build.
Fix this by moving our assignment of CHECKSUM_UNNECESSARY into
efx_rx_packet_gro().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Linus Torvalds [Sat, 25 Feb 2012 00:08:51 +0000 (16:08 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
SCSI fixes on 20120224:
"This is a set of assorted bug fixes for power management, mpt2sas,
ipr, the rdac device handler and quite a big chunk for qla2xxx (plus a
use after free of scsi_host in scsi_scan.c). "
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
[SCSI] scsi_dh_rdac: Fix for unbalanced reference count
[SCSI] scsi_pm: Fix bug in the SCSI power management handler
[SCSI] scsi_scan: Fix 'Poison overwritten' warning caused by using freed 'shost'
[SCSI] qla2xxx: Update version number to 8.03.07.13-k.
[SCSI] qla2xxx: Proper detection of firmware abort error code for ISP82xx.
[SCSI] qla2xxx: Remove resetting memory during device initialization for ISP82xx.
[SCSI] qla2xxx: Complete mailbox command timedout to avoid initialization failures during next reset cycle.
[SCSI] qla2xxx: Remove check for null fcport from host reset handler.
[SCSI] qla2xxx: Correct out of bounds read of ISP2200 mailbox registers.
[SCSI] qla2xxx: Remove errant clearing of MBX_INTERRUPT flag during CT-IOCB processing.
[SCSI] qla2xxx: Clear options-flags while issuing stop-firmware mbx command.
[SCSI] qla2xxx: Add an "is reset active" helper.
[SCSI] qla2xxx: Add check for null fcport references in qla2xxx_queuecommand.
[SCSI] qla2xxx: Propagate up abort failures.
[SCSI] isci: Fix NULL ptr dereference when no firmware is being loaded
[SCSI] ipr: fix eeh recovery for 64-bit adapters
[SCSI] mpt2sas: Fix mismatch in mpt2sas_base_hard_reset_handler() mutex lock-unlock
Ben McKeegan [Fri, 24 Feb 2012 06:33:56 +0000 (06:33 +0000)]
ppp: fix 'ppp_mp_reconstruct bad seq' errors
This patch fixes a (mostly cosmetic) bug introduced by the patch
'ppp: Use SKB queue abstraction interfaces in fragment processing'
found here: http://www.spinics.net/lists/netdev/msg153312.html
The above patch rewrote and moved the code responsible for cleaning
up discarded fragments but the new code does not catch every case
where this is necessary. This results in some discarded fragments
remaining in the queue, and triggering a 'bad seq' error on the
subsequent call to ppp_mp_reconstruct. Fragments are discarded
whenever other fragments of the same frame have been lost.
This can generate a lot of unwanted and misleading log messages.
This patch also adds additional detail to the debug logging to
make it clearer which fragments were lost and which other fragments
were discarded as a result of losses. (Run pppd with 'kdebug 1'
option to enable debug logging.)
Signed-off-by: Ben McKeegan <ben@netservers.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 24 Feb 2012 20:32:51 +0000 (12:32 -0800)]
Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] hdpvr: update picture controls to support firmware versions > 0.15
[media] wl128x: fix build errors when GPIOLIB is not enabled
[media] hdpvr: fix race conditon during start of streaming
[media] omap3isp: Fix crash caused by subdevs now having a pointer to devnodes
[media] imon: don't wedge hardware after early callbacks
Oleg Nesterov [Fri, 24 Feb 2012 19:07:29 +0000 (20:07 +0100)]
epoll: ep_unregister_pollwait() can use the freed pwq->whead
signalfd_cleanup() ensures that ->signalfd_wqh is not used, but
this is not enough. eppoll_entry->whead still points to the memory
we are going to free, ep_unregister_pollwait()->remove_wait_queue()
is obviously unsafe.
Change ep_poll_callback(POLLFREE) to set eppoll_entry->whead = NULL,
change ep_unregister_pollwait() to check pwq->whead != NULL under
rcu_read_lock() before remove_wait_queue(). We add the new helper,
ep_remove_wait_queue(), for this.
This works because sighand_cachep is SLAB_DESTROY_BY_RCU and because
->signalfd_wqh is initialized in sighand_ctor(), not in copy_sighand.
ep_unregister_pollwait()->remove_wait_queue() can play with already
freed and potentially reused ->sighand, but this is fine. This memory
must have the valid ->signalfd_wqh until rcu_read_unlock().
Oleg Nesterov [Fri, 24 Feb 2012 19:07:11 +0000 (20:07 +0100)]
epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()
This patch is intentionally incomplete to simplify the review.
It ignores ep_unregister_pollwait() which plays with the same wqh.
See the next change.
epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
f_op->poll() needs. In particular it assumes that the wait queue
can't go away until eventpoll_release(). This is not true in case
of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
which is not connected to the file.
This patch adds the special event, POLLFREE, currently only for
epoll. It expects that init_poll_funcptr()'ed hook should do the
necessary cleanup. Perhaps it should be defined as EPOLLFREE in
eventpoll.
__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
->signalfd_wqh is not empty, we add the new signalfd_cleanup()
helper.
ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
This make this poll entry inconsistent, but we don't care. If you
share epoll fd which contains our sigfd with another process you
should blame yourself. signalfd is "really special". I simply do
not know how we can define the "right" semantics if it used with
epoll.
The main problem is, epoll calls signalfd_poll() once to establish
the connection with the wait queue, after that signalfd_poll(NULL)
returns the different/inconsistent results depending on who does
EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
has nothing to do with the file, it works with the current thread.
In short: this patch is the hack which tries to fix the symptoms.
It also assumes that nobody can take tasklist_lock under epoll
locks, this seems to be true.
Note:
- we do not have wake_up_all_poll() but wake_up_poll()
is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.
- signalfd_cleanup() uses POLLHUP along with POLLFREE,
we need a couple of simple changes in eventpoll.c to
make sure it can't be "lost".
Linus Torvalds [Fri, 24 Feb 2012 17:02:53 +0000 (09:02 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Quoth Chris:
"This is later than I wanted because I got backed up running through
btrfs bugs from the Oracle QA teams. But they are all bug fixes that
we've queued and tested since rc1.
Nothing in particular stands out, this just reflects bug fixing and QA
done in parallel by all the btrfs developers. The most user visible
of these is:
Btrfs: clear the extent uptodate bits during parent transid failures
Because that helps deal with out of date drives (say an iscsi disk
that has gone away and come back). The old code wasn't always
properly retrying the other mirror for this type of failure."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits)
Btrfs: fix compiler warnings on 32 bit systems
Btrfs: increase the global block reserve estimates
Btrfs: clear the extent uptodate bits during parent transid failures
Btrfs: add extra sanity checks on the path names in btrfs_mksubvol
Btrfs: make sure we update latest_bdev
Btrfs: improve error handling for btrfs_insert_dir_item callers
Btrfs: be less strict on finding next node in clear_extent_bit
Btrfs: fix a bug on overcommit stuff
Btrfs: kick out redundant stuff in convert_extent_bit
Btrfs: skip states when they does not contain bits to clear
Btrfs: check return value of lookup_extent_mapping() correctly
Btrfs: fix deadlock on page lock when doing auto-defragment
Btrfs: fix return value check of extent_io_ops
btrfs: honor umask when creating subvol root
btrfs: silence warning in raid array setup
btrfs: fix structs where bitfields and spinlock/atomic share 8B word
btrfs: delalloc for page dirtied out-of-band in fixup worker
Btrfs: fix memory leak in load_free_space_cache()
btrfs: don't check DUP chunks twice
Btrfs: fix trim 0 bytes after a device delete
...
Linus Torvalds [Fri, 24 Feb 2012 17:01:46 +0000 (09:01 -0800)]
Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming
This is the arch/c6x part of commit 7c43185138cf ("Kbuild: Use dtc's -d
(dependency) option") which was dropped because c6x had not yet been
merged at the time.
* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
Kbuild: Use dtc's -d (dependency) option
Kyle McMartin [Fri, 24 Feb 2012 15:36:16 +0000 (10:36 -0500)]
MAINTAINERS: drop me from PA-RISC maintenance
I don't even live in the same country as any of my PA-RISC hardware
these days, so the odds of me touching the code are pretty low.
(Also re-order things to ensure jejb gets CC'd since he's been the
primary maintainer for the last few years.)
David Howells [Thu, 23 Feb 2012 13:50:35 +0000 (13:50 +0000)]
NOMMU: Lock i_mmap_mutex for access to the VMA prio list
Lock i_mmap_mutex for access to the VMA prio list to prevent concurrent
access. Currently, certain parts of the mmap handling are protected by
the region mutex, but not all.
Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 24 Feb 2012 16:56:51 +0000 (08:56 -0800)]
Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh
SuperH fixes for 3.3-rc5
* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh:
sh: Fix sh2a build error for CONFIG_CACHE_WRITETHROUGH
sh: modify a resource of sh_eth_giga1_resources in board-sh7757lcr
arch/sh: remove references to cpu_*_map.
sh: Fix typo in pci-sh7780.c
sh: add platform_device for SPI1 in setup-sh7757
sh: modify resource for SPI0 in setup-sh7757
sh: se7724: fix compile breakage
sh: clkfwk: bugfix: use clk_reparent() for div6 clocks
sh: clock-sh7724: fixup sh_fsi clock settings
sh: sh7757lcr: update to the new MMCIF DMA configuration
sh: fix the sh_mmcif_plat_data in board-sh7757lcr
video: pvr2fb: Fix up spurious section mismatch warnings.
sh: Defer to asm-generic/device.h.
Anton Vorontsov [Fri, 24 Feb 2012 01:14:46 +0000 (05:14 +0400)]
mm: memcg: Correct unregistring of events attached to the same eventfd
There is an issue when memcg unregisters events that were attached to
the same eventfd:
- On the first call mem_cgroup_usage_unregister_event() removes all
events attached to a given eventfd, and if there were no events left,
thresholds->primary would become NULL;
- Since there were several events registered, cgroups core will call
mem_cgroup_usage_unregister_event() again, but now kernel will oops,
as the function doesn't expect that threshold->primary may be NULL.
That's a good question whether mem_cgroup_usage_unregister_event()
should actually remove all events in one go, but nowadays it can't
do any better as cftype->unregister_event callback doesn't pass
any private event-associated cookie. So, let's fix the issue by
simply checking for threshold->primary.
FWIW, w/o the patch the following oops may be observed:
Jozsef Kadlecsik [Fri, 24 Feb 2012 10:45:49 +0000 (11:45 +0100)]
netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
Marcell Zambo and Janos Farago noticed and reported that when
new conntrack entries are added via netlink and the conntrack table
gets full, soft lockup happens. This is because the nf_conntrack_lock
is held while nf_conntrack_alloc is called, which is in turn wants
to lock nf_conntrack_lock while evicting entries from the full table.
The patch fixes the soft lockup with limiting the holding of the
nf_conntrack_lock to the minimum, where it's absolutely required.
It required to extend (and thus change) nf_conntrack_hash_insert
so that it makes sure conntrack and ctnetlink do not add the same entry
twice to the conntrack table.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Extended VBLKs (those larger than the preset VBLK size) are divided
into fragments, each with its own VBLK header. Our LDM implementation
generally assumes that each VBLK is contiguous in memory, so these
fragments must be assembled before further processing.
Currently the reassembly seems to be done quite wrongly - no VBLK
header is copied into the contiguous buffer, and the length of the
header is subtracted twice from each fragment. Also the total
length of the reassembled VBLK is calculated incorrectly.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Ingo Molnar [Fri, 24 Feb 2012 07:31:31 +0000 (08:31 +0100)]
static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]()
So here's a boot tested patch on top of Jason's series that does
all the cleanups I talked about and turns jump labels into a
more intuitive to use facility. It should also address the
various misconceptions and confusions that surround jump labels.
Typical usage scenarios:
#include <linux/static_key.h>
struct static_key key = STATIC_KEY_INIT_TRUE;
if (static_key_false(&key))
do unlikely code
else
do likely code
Or:
if (static_key_true(&key))
do likely code
else
do unlikely code
The 'slow' prefix makes it abundantly clear that this is an
expensive operation.
I've updated all in-kernel code to use this everywhere. Note
that I (intentionally) have not pushed through the rename
blindly through to the lowest levels: the actual jump-label
patching arch facility should be named like that, so we want to
decouple jump labels from the static-key facility a bit.
On non-jump-label enabled architectures static keys default to
likely()/unlikely() branches.
Said commit adds a check whether the carrier link is ok. If the link is
not ok, the skb is freed and no new dma descriptor added to the rx dma
channel. This causes trouble during initialization when the carrier
status has not yet been updated. If a lot of packets are received while
netif_carrier_ok returns false, all dma descriptors are freed and the
rx dma transfer is stopped.
The bug occurs when the board is connected to a network with lots of
traffic and the ifconfig down/up is done, e.g., when reconfiguring
the interface with DHCP.
The bug can be reproduced by flood pinging the davinci board while doing
ifconfig eth0 down
ifconfig eth0 up
on the board.
After that, the rx path stops working and the overrun value reported
by ifconfig is counting up.
Rusty Russell [Wed, 15 Feb 2012 04:58:04 +0000 (15:28 +1030)]
arch/sh: remove references to cpu_*_map.
This has been obsolescent for a while; time for the final push.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul Mundt <lethal@linux-sh.org> Cc: linux-sh@vger.kernel.org Signed-off-by: Paul Mundt <lethal@linux-sh.org>
With kernel 3.1, Christoph removed i_alloc_sem and replaced it with
calls (namely inode_dio_wait() and inode_dio_done()) which are
EXPORT_SYMBOL_GPL() thus they cannot be used by non-GPL file systems and
further inode_dio_wait() was pushed from notify_change() into the file
system ->setattr() method but no non-GPL file system can make this call.
That means non-GPL file systems cannot exist any more unless they do not
use any VFS functionality related to reading/writing as far as I can
tell or at least as long as they want to implement direct i/o.
Both Linus and Al (and others) have said on LKML that this breakage of
the VFS API should not have happened and that the change was simply
missed as it was not documented in the change logs of the patches that
did those changes.
This patch changes the two function exports in question to be
EXPORT_SYMBOL() thus restoring the VFS API as it used to be - accessible
for all modules.
Christoph, who introduced the two functions and exported them GPL-only
is CC-ed on this patch to give him the opportunity to object to the
symbols being changed in this manner if he did indeed intend them to be
GPL-only and does not want them to become available to all modules.
Signed-off-by: Anton Altaparmakov <anton@tuxera.com> CC: Christoph Hellwig <hch@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 23 Feb 2012 23:38:57 +0000 (15:38 -0800)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
A fix from Jesper Juhl removes an assignment in an ASSERT when a compare
is intended. Two fixes from Mitsuo Hayasaka address off-by-ones in XFS
quota enforcement.
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: make inode quota check more general
xfs: change available ranges of softlimit and hardlimit in quota check
XFS: xfs_trans_add_item() - don't assign in ASSERT() when compare is intended
This patch adds the PHY_HAS_INTERRUPT flag for IC+101 device series.
Also the patch does a simple dity-up to signal that
the driver actually is for IP101A LF and IP101G devices.
In fact, these are two similar PHYs that have the same IDs
and mainly differ for the EEE capability supported in the
G series.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David McKay [Tue, 21 Feb 2012 21:24:57 +0000 (21:24 +0000)]
netdev/phy/icplus: Correct broken phy_init code
The code for ip1001_config_init() was totally broken if you were not
using RGMII. Instead of returning an error code or zero it actually
returned the value in the IP1001_SPEC_CTRL_STATUS_2 register. It was
also trying to set the IP1001_APS_ON bit , but never actually wrote
back the register.
The error checking was also incorrect in both this function and the
reset function, so this patch fixes that up in a consistent fashion.
Signed-off-by: David McKay <david.mckay@st.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Niccolò Belli <darkbasic@linuxsystems.it> Tested-by: Niccolò Belli <darkbasic@linuxsystems.it> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 23 Feb 2012 19:48:36 +0000 (11:48 -0800)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
BenH says:
'Here are a few more powerpc bits for you. A stupid regression I
introduced with my previous commit to "fix" program check exceptions
(brown paper bag for me), fix the cpuidle default, a bug fix for
something that isn't strictly speaking a regression but some upstream
changes causes it to show in lockdep now while it didn't before, and
finally a trivial one for rusty to make his life easier later on
removing the old cpumask cruft. '
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix various issues with return to userspace
cpuidle: Default y on powerpc pSeries
powerpc: Fix program check handling when lockdep is enabled
powerpc: Remove references to cpu_*_map
Chris Mason [Wed, 22 Feb 2012 17:36:24 +0000 (12:36 -0500)]
Btrfs: clear the extent uptodate bits during parent transid failures
If btrfs reads a block and finds a parent transid mismatch, it clears
the uptodate flags on the extent buffer, and the pages inside it. But
we only clear the uptodate bits in the state tree if the block straddles
more than one page.
This is from an old optimization from to reduce contention on the extent
state tree. But it is buggy because the code that retries a read from
a different copy of the block is going to find the uptodate state bits
set and skip the IO.
The end result of the bug is that we'll never actually read the good
copy (if there is one).
The fix here is to always clear the uptodate state bits, which is safe
because this code is only called when the parent transid fails.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Nikolaus Schulz [Wed, 22 Feb 2012 22:18:44 +0000 (23:18 +0100)]
hwmon: (f75375s) Fix register write order when setting fans to full speed
By hwmon sysfs interface convention, setting pwm_enable to zero sets a fan
to full speed. In the f75375s driver, this need be done by enabling
manual fan control, plus duty mode for the F875387 chip, and then setting
the maximum duty cycle. Fix a bug where the two necessary register writes
were swapped, effectively discarding the setting to full-speed.
Guenter Roeck [Wed, 22 Feb 2012 16:13:52 +0000 (08:13 -0800)]
hwmon: (ads1015) Fix file leak in probe function
An error while creating sysfs attribute files in the driver's probe function
results in an error abort, but already created files are not removed. This patch
fixes the problem.
David Smith [Tue, 7 Feb 2012 16:11:05 +0000 (10:11 -0600)]
tracepoint, vfs, sched: Add exec() tracepoint
Added a minimal exec tracepoint. Exec is an important major event
in the life of a task, like fork(), clone() or exit(), all of
which we already trace.
[ We also do scheduling re-balancing during exec() - so it's useful
from a scheduler instrumentation POV as well. ]
If you want to watch a task start up, when it gets exec'ed is a good place
to start. With the addition of this tracepoint, exec's can be monitored
and better picture of general system activity can be obtained. This
tracepoint will also enable better process life tracking, allowing you to
answer questions like "what process keeps starting up binary X?".
This tracepoint can also be useful in ftrace filtering and trigger
conditions: i.e. starting or stopping filtering when exec is called.
Signed-off-by: David Smith <dsmith@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/4F314D19.7030504@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
Doug Ledford [Mon, 20 Feb 2012 17:19:03 +0000 (12:19 -0500)]
mlx4_core: Exported functions can't be static
At least on powerpc, it breaks the build if exported functions are
static. Fix some static exported functions introduced with the mlx4
SR-IOV support added in 3.3-rc1.
Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Scott Talbert [Tue, 21 Feb 2012 13:06:00 +0000 (13:06 +0000)]
Move Logitech Harmony 900 from cdc_ether to zaurus
In the current kernel implementation, the Logitech Harmony 900 remote
control is matched to the cdc_ether driver through the generic
USB_CDC_SUBCLASS_MDLM entry. However, this device appears to be of the
pseudo-MDLM (Belcarra) type, rather than the standard one. This patch
blacklists the Harmony 900 from the cdc_ether driver and whitelists it for
the pseudo-MDLM driver in zaurus.
Signed-off-by: Scott Talbert <talbert@techie.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 21 Feb 2012 21:30:25 +0000 (21:30 +0000)]
hso: memsetting wrong data in hso_get_count()
The intent was to clear out the icount struct here, but we accidentally
clear stack memory instead. It probably will lead to a NULL dereference
right away.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>