git.karo-electronics.de Git - karo-tx-linux.git/log

drm/ast: deal with bo reserve fail in dirty update path

commit 306373b645d80625335b8e684fa09b14ba460cec upstream.

Port over the mgag200 fix to ast as it suffers the same issue.

    On F19 testing, it was noticed we get a lot of errors in dmesg
    about being unable to reserve the buffer when plymouth starts,
    this is due to the buffer being in the process of migrating,
    so it makes sense we can't reserve it.

    In order to deal with it, this adds delayed updates for the dirty
    updates, when the bo is unreservable, in the normal console case
    this shouldn't ever happen, its just when plymouth or X is
    pushing the console bo to system memory.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/prime: keep a reference from the handle to exported dma-buf (v6)

commit 219b47339ced80ca580bb6ce7d1636166984afa7 upstream.

Currently we have a problem with this:
1. i915: create gem object
2. i915: export gem object to prime
3. radeon: import gem object
4. close prime fd
5. radeon: unref object
6. i915: unref object

i915 has an imported object reference in its file priv, that isn't
cleaned up properly until fd close. The reference gets added at step 2,
but at step 6 we don't have enough info to clean it up.

The solution is to take a reference on the dma-buf when we export it,
and drop the reference when the gem handle goes away.

So when we export a dma_buf from a gem object, we keep track of it
with the handle, we take a reference to the dma_buf. When we close
the handle (i.e. userspace is finished with the buffer), we drop
the reference to the dma_buf, and it gets collected.

This patch isn't meant to fix any other problem or bikesheds, and it doesn't
fix any races with other scenarios.

v1.1: move export symbol line back up.

v2: okay I had to do a bit more, as the first patch showed a leak
on one of my tests, that I found using the dma-buf debugfs support,
the problem case is exporting a buffer twice with the same handle,
we'd add another export handle for it unnecessarily, however
we now fail if we try to export the same object with a different gem handle,
however I'm not sure if that is a case I want to support, and I've
gotten the code to WARN_ON if we hit something like that.

v2.1: rebase this patch, write better commit msg.
v3: cleanup error handling, track import vs export in linked list,
these two patches were separate previously, but seem to work better
like this.
v4: danvet is correct, this code is no longer useful, since the buffer
better exist, so remove it.
v5: always take a reference to the dma buf object, import or export.
(Imre Deak contributed this originally)
v6: square the circle, remove import vs export tracking now
that there is no difference

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/gma500: fix backlight hotkeys behaviour on netbooks

commit e127dc28cc3057575da0216cde85687153ca180f upstream.

Backlight hotkeys weren't working before on certain cedartrail laptops.

The source of this problem is that the hotkeys' ASLE opregion interrupts
were simply ignored. Driver seemed to expect the interrupt to be
associated with a pipe, but it wasn't.

Accepting the ASLE interrupt without an associated pipe event flag fixes
the issue, the backlight code is called when needed, making the
brightness keys work properly.

[patrik: This patch affects irq handling on any netbook with opregion support]

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=833597
Reference: http://lists.freedesktop.org/archives/dri-devel/2012-July/025279.html
Signed-off-by: Anisse Astier <anisse@astier.eu>
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/mgag200: deal with bo reserve fail in dirty update path

commit 641719599528d806e00de8ae8c8453361266a312 upstream.

On F19 testing, it was noticed we get a lot of errors in dmesg
about being unable to reserve the buffer when plymouth starts,
this is due to the buffer being in the process of migrating,
so it makes sense we can't reserve it.

In order to deal with it, this adds delayed updates for the dirty
updates, when the bo is unreservable, in the normal console case
this shouldn't ever happen, its just when plymouth or X is
pushing the console bo to system memory.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/cirrus: deal with bo reserve fail in dirty update path

commit f3b2bbdc8a87a080ccd23d27fca4b87d61340dd4 upstream.

Port over the mgag200 fix to cirrus as it suffers the same issue.

    On F19 testing, it was noticed we get a lot of errors in dmesg
    about being unable to reserve the buffer when plymouth starts,
    this is due to the buffer being in the process of migrating,
    so it makes sense we can't reserve it.

    In order to deal with it, this adds delayed updates for the dirty
    updates, when the bo is unreservable, in the normal console case
    this shouldn't ever happen, its just when plymouth or X is
    pushing the console bo to system memory.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

block: fix max discard sectors limit

commit 871dd9286e25330c8a581e5dacfa8b1dfe1dd641 upstream.

linux-v3.8-rc1 and later support for plug for blkdev_issue_discard with
commit 0cfbcafcae8b7364b5fa96c2b26ccde7a3a296a9
(block: add plug for blkdev_issue_discard )

For example,
1) DISCARD rq-1 with size size 4GB
2) DISCARD rq-2 with size size 1GB

If these 2 discard requests get merged, final request size will be 5GB.

In this case, request's __data_len field may overflow as it can store
max 4GB(unsigned int).

This issue was observed while doing mkfs.f2fs on 5GB SD card:
https://lkml.org/lkml/2013/4/1/292

Info: sector size = 512
Info: total sectors = 11370496 (in 512bytes)
Info: zone aligned segment0 blkaddr: 512
[  257.789764] blk_update_request: bio idx 0 >= vcnt 0

mkfs process gets stuck in D state and I see the following in the dmesg:

[  257.789733] __end_that: dev mmcblk0: type=1, flags=122c8081
[  257.789764]   sector 4194304, nr/cnr 2981888/4294959104
[  257.789764]   bio df3840c0, biotail df3848c0, buffer   (null), len
1526726656
[  257.789764] blk_update_request: bio idx 0 >= vcnt 0
[  257.794921] request botched: dev mmcblk0: type=1, flags=122c8081
[  257.794921]   sector 4194304, nr/cnr 2981888/4294959104
[  257.794921]   bio df3840c0, biotail df3848c0, buffer   (null), len
1526726656

This patch fixes this issue.

Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: Ignore the 'write' ESR flag on cache maintenance faults

commit 0e7f7bcc3fc87489cda5aa6aff8ce40eed912279 upstream.

ESR.WnR bit is always set on data cache maintenance faults even though
the page is not required to have write permission. If a translation
fault (page not yet mapped) happens for read-only user address range,
Linux incorrectly assumes a permission fault. This patch adds the check
of the ESR.CM bit during the page fault handling to ignore the 'write'
flag.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Tim Northover <Tim.Northover@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled

commit 5b0c275926b8149c555da874bb4ec258ea3292aa upstream.

Commit c079c28714e4 ("RDMA/cxgb4: Fix error handling in create_qp()")
broke SQ allocation.  Instead of falling back to host allocation when
on-chip allocation fails, it tries to allocate both.  And when it
does, and we try to free the address from the genpool using the host
address, we hit a BUG and the system crashes as below.

We create a new function that has the previous behavior and properly
propagate the error, as intended.

    kernel BUG at /usr/src/packages/BUILD/kernel-ppc64-3.0.68/linux-3.0/lib/genalloc.c:340!
    Oops: Exception in kernel mode, sig: 5 [#1]
    SMP NR_CPUS=1024 NUMA pSeries
    Modules linked in: rdma_ucm rdma_cm ib_addr ib_cm iw_cm ib_sa ib_mad ib_uverbs iw_cxgb4 ib_core ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables fuse loop dm_mod ipv6 ipv6_lib sr_mod cdrom ibmveth(X) cxgb4 sg ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh_rdac scsi_dh ibmvscsic(X) scsi_transport_srp scsi_tgt scsi_mod
    Supported: Yes
    NIP: c00000000037d41c LR: d000000003913824 CTR: c00000000037d3b0
    REGS: c0000001f350ae50 TRAP: 0700   Tainted: G            X  (3.0.68-0.9-ppc64)
    MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 24042482  XER: 00000001
    TASK = c0000001f6f2a840[3616] 'rping' THREAD: c0000001f3508000 CPU: 0
    GPR00: c0000001f6e875c8 c0000001f350b0d0 c000000000fc9690 c0000001f6e875c0
    GPR04: 00000000000c0000 0000000000010000 0000000000000000 c0000000009d482a
    GPR08: 000000006a170000 0000000000100000 c0000001f350b140 c0000001f6e875c8
    GPR12: d000000003915dd0 c000000003f40000 000000003e3ecfa8 c0000001f350bea0
    GPR16: c0000001f350bcd0 00000000003c0000 0000000000040100 c0000001f6e74a80
    GPR20: d00000000399a898 c0000001f6e74ac8 c0000001fad91600 c0000001f6e74ab0
    GPR24: c0000001f7d23f80 0000000000000000 0000000000000002 000000006a170000
    GPR28: 000000000000000c c0000001f584c8d0 d000000003925180 c0000001f6e875c8
    NIP [c00000000037d41c] .gen_pool_free+0x6c/0xf8
    LR [d000000003913824] .c4iw_ocqp_pool_free+0x8c/0xd8 [iw_cxgb4]
    Call Trace:
    [c0000001f350b0d0] [c0000001f350b180] 0xc0000001f350b180 (unreliable)
    [c0000001f350b170] [d000000003913824] .c4iw_ocqp_pool_free+0x8c/0xd8 [iw_cxgb4]
    [c0000001f350b210] [d00000000390fd70] .dealloc_sq+0x90/0xb0 [iw_cxgb4]
    [c0000001f350b280] [d00000000390fe08] .destroy_qp+0x78/0xf8 [iw_cxgb4]
    [c0000001f350b310] [d000000003912738] .c4iw_destroy_qp+0x208/0x2d0 [iw_cxgb4]
    [c0000001f350b460] [d000000003861874] .ib_destroy_qp+0x5c/0x130 [ib_core]
    [c0000001f350b510] [d0000000039911bc] .ib_uverbs_cleanup_ucontext+0x174/0x4f8 [ib_uverbs]
    [c0000001f350b5f0] [d000000003991568] .ib_uverbs_close+0x28/0x70 [ib_uverbs]
    [c0000001f350b670] [c0000000001e7b2c] .__fput+0xdc/0x278
    [c0000001f350b720] [c0000000001a9590] .remove_vma+0x68/0xd8
    [c0000001f350b7b0] [c0000000001a9720] .exit_mmap+0x120/0x160
    [c0000001f350b8d0] [c0000000000af330] .mmput+0x80/0x160
    [c0000001f350b960] [c0000000000b5d0c] .exit_mm+0x1ac/0x1e8
    [c0000001f350ba10] [c0000000000b8154] .do_exit+0x1b4/0x4b8
    [c0000001f350bad0] [c0000000000b84b0] .do_group_exit+0x58/0xf8
    [c0000001f350bb60] [c0000000000ce9f4] .get_signal_to_deliver+0x2f4/0x5d0
    [c0000001f350bc60] [c000000000017ee4] .do_signal_pending+0x6c/0x3e0
    [c0000001f350bdb0] [c0000000000182cc] .do_signal+0x74/0x78
    [c0000001f350be30] [c000000000009e74] do_work+0x24/0x28

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: Emil Goode <emilgoode@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

r8169: fix 8168evl frame padding.

commit e5195c1f31f399289347e043d6abf3ffa80f0005 upstream.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: hayeswang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: add check for inodes_count overflow in new resize ioctl

commit 3f8a6411fbada1fa482276591e037f3b1adcf55b upstream.

Addresses-Red-Hat-Bugzilla: #913245

Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Lingzhu Xiang <lxiang@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: ip6t_NPT: Fix translation for non-multiple of 32 prefix lengths

commit 906b1c394d0906a154fbdc904ca506bceb515756 upstream.

The bitmask used for the prefix mangling was being calculated
incorrectly, leading to the wrong part of the address being replaced
when the prefix length wasn't a multiple of 32.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: xt_rpfilter: skip locally generated broadcast/multicast, too

commit f83a7ea2075ca896f2dbf07672bac9cf3682ff74 upstream.

Alex Efros reported rpfilter module doesn't match following packets:
IN=br.qemu SRC=192.168.2.1 DST=192.168.2.255 [ .. ]
(netfilter bugzilla #814).

Problem is that network stack arranges for the locally generated broadcasts
to appear on the interface they were sent out, so the IFF_LOOPBACK check
doesn't trigger.

As -m rpfilter is restricted to PREROUTING, we can check for existing
rtable instead, it catches locally-generated broad/multicast case, too.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: ctnetlink: don't permit ct creation with random tuple

commit 442fad9423b78319e0019a7f5047eddf3317afbc upstream.

Userspace can cause kernel panic by not specifying orig/reply
tuple: kernel will create a tuple with random stack values.

Problem is that tuple.dst.dir will be random, too, which
causes nf_ct_tuplehash_to_ctrack() to return garbage.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: nf_ct_helper: don't discard helper if it is actually the same

commit 6e2f0aa8cf8892868bf2c19349cb5d7c407f690d upstream.

commit (32f5376 netfilter: nf_ct_helper: disable automatic helper
re-assignment of different type) broke transparent proxy scenarios.

For example, initial helper lookup might yield "ftp" (dport 21),
while re-lookup after REDIRECT yields "ftp-2121".

This causes the autoassign code to toss the ftp helper, even
though these are just different instances of the same helper.

Change the test to check for the helper function address instead
of the helper address, as suggested by Pablo.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: ipset: "Directory not empty" error message

commit dd82088dab3646ed28e4aa43d1a5b5d5ffc2afba upstream.

When an entry flagged with "nomatch" was tested by ipset, it
returned the error message "Kernel error received:
Directory not empty" instead of "<element> is NOT in set <setname>"
(reported by John Brendler).

The internal error code was not properly transformed before returning
to userspace, fixed.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: nf_ct_sip: don't drop packets with offsets pointing outside the packet

commit 3a7b21eaf4fb3c971bdb47a98f570550ddfe4471 upstream.

Some Cisco phones create huge messages that are spread over multiple packets.
After calculating the offset of the SIP body, it is validated to be within
the packet and the packet is dropped otherwise. This breaks operation of
these phones. Since connection tracking is supposed to be passive, just let
those packets pass unmodified and untracked.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: ipset: list:set: fix reference counter update

commit 02f815cb6d3f57914228be84df9613ee5a01c2e6 upstream.

The last element can be replaced or pushed off and in both
cases the reference counter must be updated.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: nf_nat: fix race when unloading protocol modules

commit c2d421e171868586939c328dfb91bab840fe4c49 upstream.

following oops was reported:
RIP: 0010:[<ffffffffa03227f2>] [<ffffffffa03227f2>] nf_nat_cleanup_conntrack+0x42/0x70 [nf_nat]
RSP: 0018:ffff880202c63d40 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8801ac7bec28 RCX: ffff8801d0eedbe0
RDX: dead000000200200 RSI: 0000000000000011 RDI: ffffffffa03265b8
[..]
Call Trace:
[..]
[<ffffffffa02febed>] destroy_conntrack+0xbd/0x110 [nf_conntrack]

Happens when a conntrack timeout expires right after first part
of the nat cleanup has completed (bysrc hash removal), but before
part 2 has completed (re-initialization of nat area).

[ destroy callback tries to delete bysrc again ]

Patrick suggested to just remove the affected conntracks -- the
connections won't work properly anyway without nat transformation.

So, lets do that.

Reported-by: CAI Qian <caiqian@redhat.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipvs: ip_vs_sip_fill_param() BUG: bad check of return value

commit f7a1dd6e3ad59f0cfd51da29dfdbfd54122c5916 upstream.

The reason for this patch is crash in kmemdup
caused by returning from get_callid with uniialized
matchoff and matchlen.

Removing Zero check of matchlen since it's done by ct_sip_get_header()

BUG: unable to handle kernel paging request at ffff880457b5763f
IP: [<ffffffff810df7fc>] kmemdup+0x2e/0x35
PGD 27f6067 PUD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: xt_state xt_helper nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle xt_connmark xt_conntrack ip6_tables nf_conntrack_ftp ip_vs_ftp nf_nat xt_tcpudp iptable_mangle xt_mark ip_tables x_tables ip_vs_rr ip_vs_lblcr ip_vs_pe_sip ip_vs nf_conntrack_sip nf_conntrack bonding igb i2c_algo_bit i2c_core
CPU 5
Pid: 0, comm: swapper/5 Not tainted 3.9.0-rc5+ #5                  /S1200KP
RIP: 0010:[<ffffffff810df7fc>]  [<ffffffff810df7fc>] kmemdup+0x2e/0x35
RSP: 0018:ffff8803fea03648  EFLAGS: 00010282
RAX: ffff8803d61063e0 RBX: 0000000000000003 RCX: 0000000000000003
RDX: 0000000000000003 RSI: ffff880457b5763f RDI: ffff8803d61063e0
RBP: ffff8803fea03658 R08: 0000000000000008 R09: 0000000000000011
R10: 0000000000000011 R11: 00ffffffff81a8a3 R12: ffff880457b5763f
R13: ffff8803d67f786a R14: ffff8803fea03730 R15: ffffffffa0098e90
FS:  0000000000000000(0000) GS:ffff8803fea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff880457b5763f CR3: 0000000001a0c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/5 (pid: 0, threadinfo ffff8803ee18c000, task ffff8803ee18a480)
Stack:
ffff8803d822a080 000000000000001c ffff8803fea036c8 ffffffffa000937a
ffffffff81f0d8a0 000000038135fdd5 ffff880300000014 ffff880300110000
ffffffff150118ac ffff8803d7e8a000 ffff88031e0118ac 0000000000000000
Call Trace:
<IRQ>

[<ffffffffa000937a>] ip_vs_sip_fill_param+0x13a/0x187 [ip_vs_pe_sip]
[<ffffffffa007b209>] ip_vs_sched_persist+0x2c6/0x9c3 [ip_vs]
[<ffffffff8107dc53>] ? __lock_acquire+0x677/0x1697
[<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d
[<ffffffff8100972e>] ? native_sched_clock+0x3c/0x7d
[<ffffffff810649bc>] ? sched_clock_cpu+0x43/0xcf
[<ffffffffa007bb1e>] ip_vs_schedule+0x181/0x4ba [ip_vs]
...

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xhci: Don't warn on empty ring for suspended devices.

commit a83d6755814e4614ba77e15d82796af0f695c6b8 upstream.

When a device attached to the roothub is suspended, the endpoint rings
are stopped. The host may generate a completion event with the
completion code set to 'Stopped' or 'Stopped Invalid' when the ring is
halted. The current xHCI code prints a warning in that case, which can
be really annoying if the USB device is coming into and out of suspend.

Remove the unnecessary warning.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Stephen Hemminger <stephen@networkplumber.org>
Cc: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e1000e: fix accessing to suspended device

commit e60b22c5b7e59db09a7c9490b1e132c7e49ae904 upstream.

This patch fixes some annoying messages like 'Error reading PHY register' and
'Hardware Erorr' and saves several seconds on reboot.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Tóth Attila <atoth@atoth.sote.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e1000e: fix runtime power management transitions

commit 66148babe728f3e00e13c56f6b0ecf325abd80da upstream.

This patch removes redundant actions from driver and fixes its interaction
with actions in pci-bus runtime power management code.

It removes pci_save_state() from __e1000_shutdown() for normal adapters,
PCI bus callbacks pci_pm_*() will do all this for us. Now __e1000_shutdown()
switches to D3-state only quad-port adapters, because they needs quirk for
clearing false-positive error from downsteam pci-e port.

pci_save_state() now called after clearing bus-master bit, thus __e1000_resume()
and e1000_io_slot_reset() must set it back after restoring configuration space.

This patch set get_link_status before calling pm_runtime_put() in e1000_open()
to allow e1000_idle() get real link status and schedule first runtime suspend.

This patch also enables wakeup for device if management mode is enabled
(like for WoL) as result pci_prepare_to_sleep() would setup wakeup without
special actions like custom 'enable_wakeup' sign.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Tóth Attila <atoth@atoth.sote.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

PCI/PM: Clear state_saved during suspend

commit 82fee4d67ab86d6fe5eb0f9a9e988ca9d654d765 upstream.

This patch clears pci_dev->state_saved at the beginning of suspending.
PCI config state may be saved long before that. Some drivers call
pci_save_state() from the ->probe() callback to get snapshot of sane
configuration space to use in the ->slot_reset() callback.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> # add comment
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Tóth Attila <atoth@atoth.sote.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL

commit 7cc23cd6c0c7d7f4bee057607e7ce01568925717 upstream.

We should always have proper privileges when requesting kernel
data.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20130503121256.230745028@chello.nl
[ Fix build error reported by fengguang.wu@intel.com, propagate error code back. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-v0x9ky3ahzr6nm3c6ilwrili@git.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf/x86/intel/lbr: Fix LBR filter

commit 6e15eb3ba6c0249c9e8c783517d131b47db995ca upstream.

The LBR 'from' adddress is under full userspace control; ensure
we validate it before reading from it.

Note: is_module_text_address() can potentially be quite
expensive; for those running into that with high overhead
in modules optimize it using an RCU backed rb-tree.

Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20130503121256.158211806@chello.nl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/n/tip-mk8i82ffzax01cnqo829iy1q@git.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf/x86/intel: Fix unintended variable name reuse

commit 1b0dac2ac6debdbf1541e15f2cede03613cf4465 upstream.

The variable name events_group is already in used and led to a
compilation error when using clang to build the Linux Kernel .
The fix is just to rename the var. No functional change. Please
apply.

Fix suggested in discussion by PaX Team <pageexec@freemail.hu>

Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de>
Cc: rostedt@goodmis.org
Cc: a.p.zijlstra@chello.nl
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Link: http://lkml.kernel.org/r/1367316153-14808-1-git-send-email-dl9pf@gmx.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf/x86/intel: Add support for IvyBridge model 58 Uncore

commit 9a6bc14350b130427725f33e371e86212fa56c85 upstream.

According to Intel Vol3b 18.9, the IvyBridge model 58 uncore is
the same as that of SandyBridge.

I've done some simple tests and with this patch things seem to
work on my mac-mini.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephane Eranian <eranian@gmail.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1304291549320.15827@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/eth/ibmveth: Fixup retrieval of MAC address

commit 13f85203e1060da83d9ec1c1c5a63343eaab8de4 upstream.

Some ancient pHyp versions used to create a 8 bytes local-mac-address
property in the device-tree instead of a 6 bytes one for veth.

The Linux driver code to deal with that is an insane hack which also
happens to break with some choices of MAC addresses in qemu by testing
for a bit in the address rather than just looking at the size of the
property.

Sanitize this by doing the latter instead.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iommu/amd: Properly initialize irq-table lock

commit 197887f03daecdb3ae21bafeb4155412abad3497 upstream.

Fixes a lockdep warning.

Reviewed-by: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hugetlbfs: fix mmap failure in unaligned size request

commit af73e4d9506d3b797509f3c030e7dcd554f7d9c4 upstream.

The current kernel returns -EINVAL unless a given mmap length is
"almost" hugepage aligned. This is because in sys_mmap_pgoff() the
given length is passed to vm_mmap_pgoff() as it is without being aligned
with hugepage boundary.

This is a regression introduced in commit 40716e29243d ("hugetlbfs: fix
alignment of huge page requests"), where alignment code is pushed into
hugetlb_file_setup() and the variable len in caller side is not changed.

To fix this, this patch partially reverts that commit, and adds
alignment code in caller side. And it also introduces hstate_sizelog()
in order to get proper hstate to specified hugepage size.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881

[akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: <iceman_dvd@yahoo.com>
Cc: Steven Truelove <steven.truelove@utoronto.ca>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

autofs - remove autofs dentry mount check

commit ce8a5dbdf9e709bdaf4618d7ef8cceb91e8adc69 upstream.

When checking if an autofs mount point is busy it isn't sufficient to
only check if it's a mount point.

For example, if the mount of an offset mountpoint in a tree is denied
for this host by its export and the dentry becomes a process working
directory the check incorrectly returns the mount as not in use at
expire.

This can happen since the default when mounting within a tree is
nostrict, which means ingnore mount fails on mounts within the tree and
continue. The nostrict option is meant to allow mounting in this case.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pwm: spear: Fix checking return value of clk_enable() and clk_prepare()

commit 563861cd633ae52932843477bb6ca3f1c9e2f78b upstream.

The logic to check return value of clk_enable() and clk_prepare() is reversed,
fix it.

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc: fix numa distance for form0 device tree

commit 7122beeee7bc1757682049780179d7c216dd1c83 upstream.

The following commit breaks numa distance setup for old powerpc
systems that use form0 encoding in device tree.

commit 41eab6f88f24124df89e38067b3766b7bef06ddb
powerpc/numa: Use form 1 affinity to setup node distance

Device tree node /rtas/ibm,associativity-reference-points would
index into /cpus/PowerPCxxxx/ibm,associativity based on form0 or
form1 encoding detected by ibm,architecture-vec-5 property.

All modern systems use form1 and current kernel code is correct.
However, on older systems with form0 encoding, the numa distance
will get hard coded as LOCAL_DISTANCE for all nodes. This causes
task scheduling anomaly since scheduler will skip building numa
level domain (topmost domain with all cpus) if all numa distances
are same. (value of 'level' in sched_init_numa() will remain 0)

Prior to the above commit:
((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)

Restoring compatible behavior with this patch for old powerpc systems
with device tree where numa distance are encoded as form0.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc: Emulate non privileged DSCR read and write

commit 73d2fb758e678c93bc76d40876c2359f0729b0ef upstream.

POWER8 allows read and write of the DSCR in userspace. We added
kernel emulation so applications could always use the instructions
regardless of the CPU type.

Unfortunately there are two SPRs for the DSCR and we only added
emulation for the privileged one. Add code to match the non
privileged one.

A simple test was created to verify the fix:

http://ozlabs.org/~anton/junkcode/user_dscr_test.c

Without the patch we get a SIGILL and it passes with the patch.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xen/arm: actually pass a non-NULL percpu pointer to request_percpu_irq

commit 2798ba7d19aed645663398a21ec4006bfdbb1ef3 upstream.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Ian Campbell <ian.camjpbell@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux 3.8.12

mfd: adp5520: Restore mode bits on resume

commit c6cc25fda58da8685ecef3f179adc7b99c8253b2 upstream.

The adp5520 unfortunately also clears the BL_EN bit when the nSTNDBY bit is
cleared. So we need to make sure to restore it during resume if it was set
before suspend.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rcutrace: single_open() leaks

commit 7ee2b9e56495c56dcaffa2bab19b39451d9fdc8a upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: atmel-mci: pio hang on block errors

commit bdbc5d0c60f3e9de3eeccf1c1a18bdc11dca62cc upstream.

The driver is doing, by default, multi-block reads. When a block error
occurs, card/block.c instigates a single block read: "mmcblk0: retrying
using single block read". It leaves the sg chain intact and just changes
the length attribute for the first sg entry and the overall sg_len
parameter. When atmci_read_data_pio is called to read the single block
of data it ignores the sg_len and expects to read more than 512 bytes as
it sees there are multiple items in the sg list. No more data comes as
the controller has only been commanded to get one block.

Signed-off-by: Terry Barnaby <terry@beam.ltd.uk>
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: core: Fix bit width test failing on old eMMC cards

commit 836dc2fe89c968c10cada87e0dfae6626f8f9da3 upstream.

PARTITION_SUPPORT needs to be set before doing the compare on version
number so the bit width test does not get invalid data. Before this
patch, a Sandisk iNAND eMMC card would detect 1-bit width although
the hardware supports 4-bit.

Only affects old emmc devices - pre 4.4 devices.

Reported-by: Elad Yi <elad.yi@gmail.com>
Signed-off-by: Philip Rakity <prakity@yahoo.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86: Eliminate irq_mis_count counted in arch_irq_stat

commit f7b0e1055574ce06ab53391263b4e205bf38daf3 upstream.

With the current implementation, kstat_cpu(cpu).irqs_sum is also
increased in case of irq_mis_count increment.

So there is no need to count irq_mis_count in arch_irq_stat,
otherwise irq_mis_count will be counted twice in the sum of
/proc/stat.

Reported-by: Liu Chuansheng <chuansheng.liu@intel.com>
Signed-off-by: Li Fei <fei.li@intel.com>
Acked-by: Liu Chuansheng <chuansheng.liu@intel.com>
Cc: tomoki.sekiyama.qu@hitachi.com
Cc: joe@perches.com
Link: http://lkml.kernel.org/r/1366980611.32469.7.camel@fli24-HP-Compaq-8100-Elite-CMT-PC
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: X86 emulator: fix source operand decoding for 8bit mov[zs]x instructions

commit 660696d1d16a71e15549ce1bf74953be1592bcd3 upstream.

Source operand for one byte mov[zs]x is decoded incorrectly if it is in
high byte register. Fix that.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Give the OID registry file module info to avoid kernel tainting

commit 9e6879460c8edb0cd3c24c09b83d06541b5af0dc upstream.

Give the OID registry file module information so that it doesn't taint the
kernel when compiled as a module and loaded.

Reported-by: Dros Adamson <Weston.Adamson@netapp.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: at91/avr32/atmel-mci: fix DMA-channel leak on module unload

commit 91cf54feecf815bec0b6a8d6d9dbd0e219f2f2cc upstream.

Fix regression introduced by commit 796211b7953 ("mmc: atmel-mci: add
pdc support and runtime capabilities detection") which removed the need
for CONFIG_MMC_ATMELMCI_DMA but kept the Kconfig-entry as well as the
compile guards around dma_release_channel() in remove(). Consequently,
DMA is always enabled (if supported), but the DMA-channel is not
released on module unload unless the DMA-config option is selected.

Remove the no longer used CONFIG_MMC_ATMELMCI_DMA option completely.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG

commit 7f3e3c7cfcec148ccca9c0dd2dbfd7b00b7ac10f upstream.

Fox the Kconfig documentation for CONFIG_EXT4_DEBUG to match the
change made by commit a0b30c1229: ext4: use module parameters instead
of debugfs for mballoc_debug

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: fix online resizing for ext3-compat file systems

commit c5c72d814cf0f650010337c73638b25e6d14d2d4 upstream.

Commit fb0a387dcdc restricts block allocations for indirect-mapped
files to block groups less than s_blockfile_groups. However, the
online resizing code wasn't setting s_blockfile_groups, so the newly
added block groups were not available for non-extent mapped files.

Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: fix big-endian bug in metadata checksum calculations

commit 171a7f21a76a0958c225b97c00a97a10390d40ee upstream.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: fix journal callback list traversal

commit 5d3ee20855e28169d711b394857ee608a5023094 upstream.

It is incorrect to use list_for_each_entry_safe() for journal callback
traversial because ->next may be removed by other task:
->ext4_mb_free_metadata()
  ->ext4_mb_free_metadata()
    ->ext4_journal_callback_del()

This results in the following issue:

WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
Hardware name:
list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
Call Trace:
[<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
[<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
[<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
[<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
[<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
[<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
[<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
[<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
[<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
[<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
[<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
[<ffffffff810ac6be>] kthread+0x10e/0x120
[<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
[<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
[<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70

This patch fix the issue as follows:
- ext4_journal_commit_callback() make list truly traversial safe
  simply by always starting from list_head
- fix race between two ext4_journal_callback_del() and
  ext4_journal_callback_try_del()

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

jbd2: fix race between jbd2_journal_remove_checkpoint and ->j_commit_callback

commit 794446c6946513c684d448205fbd76fa35f38b72 upstream.

The following race is possible:

[kjournald2]                              other_task
jbd2_journal_commit_transaction()
  j_state = T_FINISHED;
  spin_unlock(&journal->j_list_lock);
                                         ->jbd2_journal_remove_checkpoint()
   ->jbd2_journal_free_transaction();
     ->kmem_cache_free(transaction)
  ->j_commit_callback(journal, transaction);
    -> USE_AFTER_FREE

WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
Hardware name:
list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
Call Trace:
[<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
[<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
[<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
[<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
[<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
[<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
[<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
[<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
[<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
[<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
[<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
[<ffffffff810ac6be>] kthread+0x10e/0x120
[<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
[<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
[<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70

In order to demonstrace this issue one should mount ext4 with mount -o
discard option on SSD disk.  This makes callback longer and race
window becomes wider.

In order to fix this we should mark transaction as finished only after
callbacks have completed

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4/jbd2: don't wait (forever) for stale tid caused by wraparound

commit d76a3a77113db020d9bb1e894822869410450bd9 upstream.

In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, that the tid number space might wrap,
causing tid_geq()'s calculations to fail.

Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().

Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time. To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tid's.

As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started. This
should have a small (but probably not measurable) improvement for
ext4's scalability.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Reported-by: George Barnett <gbarnett@atlassian.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ixgbe: fix EICR write in ixgbe_msix_other

commit d87d830720a1446403ed38bfc2da268be0d356d1 upstream.

Previously, the ixgbe_msix_other was writing the full 32bits of the set
interrupts, instead of only the ones which the ixgbe_msix_other is
handling. This resulted in a loss of performance when the X540's PPS feature is
enabled due to sometimes clearing queue interrupts which resulted in the driver
not getting the interrupt for cleaning the q_vector rings often enough. The fix
is to simply mask the lower 16bits off so that this handler does not write them
in the EICR, which causes them to remain high and be properly handled by the
clean_rings interrupt routine as normal.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipc: sysv shared memory limited to 8TiB

commit d69f3bad4675ac519d41ca2b11e1c00ca115cecd upstream.

Trying to run an application which was trying to put data into half of
memory using shmget(), we found that having a shmall value below 8EiB-8TiB
would prevent us from using anything more than 8TiB.  By setting
kernel.shmall greater than 8EiB-8TiB would make the job work.

In the newseg() function, ns->shm_tot which, at 8TiB is INT_MAX.

ipc/shm.c:
458 static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
459 {
...
465         int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT;
...
474         if (ns->shm_tot + numpages > ns->shm_ctlall)
475                 return -ENOSPC;

[akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
Signed-off-by: Robin Holt <holt@sgi.com>
Reported-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wireless: regulatory: fix channel disabling race condition

commit 990de49f74e772b6db5208457b7aa712a5f4db86 upstream.

When a full scan 2.4 and 5 GHz scan is scheduled, but then the 2.4 GHz
part of the scan disables a 5.2 GHz channel due to, e.g. receiving
country or frequency information, that 5.2 GHz channel might already
be in the list of channels to scan next. Then, when the driver checks
if it should do a passive scan, that will return false and attempt an
active scan. This is not only wrong but can also lead to the iwlwifi
device firmware crashing since it checks regulatory as well.

Fix this by not setting the channel flags to just disabled but rather
OR'ing in the disabled flag. That way, even if the race happens, the
channel will be scanned passively which is still (mostly) correct.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: Decode and send 64bit time values

commit bf8d909705e9d9bac31d9b8eac6734d2b51332a7 upstream.

The seconds field of an nfstime4 structure is 64bit, but we are assuming
that the first 32bits are zero-filled. So if the client tries to set
atime to a value before the epoch (touch -t 196001010101), then the
server will save the wrong value on disk.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: don't run get_file if nfs4_preprocess_stateid_op return error

commit b022032e195ffca83d7002d6b84297d796ed443b upstream.

we should return error status directly when nfs4_preprocess_stateid_op
return error.

Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd4: don't close read-write opens too soon

commit 0c7c3e67ab91ec6caa44bdf1fc89a48012ceb0c5 upstream.

Don't actually close any opens until we don't need them at all.

This means being left with write access when it's not really necessary,
but that's better than putting a file that might still have posix locks
held on it, as we have been.

Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSv4: Handle NFS4ERR_DELAY and NFS4ERR_GRACE in nfs4_open_delegation_recall

commit 8b6cc4d6f841d31f72fe7478453759166d366274 upstream.

A server shouldn't normally return NFS4ERR_GRACE if the client holds a
delegation, since no conflicting lock reclaims can be granted, however
the spec does not require the server to grant the open in this
instance

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MD: ignore discard request for hard disks of hybid raid1/raid10 array

commit 32f9f570d04461a41bdcd5c1d93b41ebc5ce182a upstream.

In SSD/hard disk hybid storage, discard request should be ignored for hard
disk. We used to be doing this way, but the unplug path forgets it.

This is suitable for stable tree since v3.6.

Reported-and-tested-by: Markus <M4rkusXXL@web.de>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md: bad block list should default to disabled.

commit 486adf72ccc0c235754923d47a2270c5dcb0c98b upstream.

Maintenance of a bad-block-list currently defaults to 'enabled'
and is then disabled when it cannot be supported.
This is backwards and causes problem for dm-raid which didn't know
to disable it.

So fix the defaults, and only enabled for v1.x metadata which
explicitly has bad blocks enabled.

The problem with dm-raid has been present since badblock support was
added in v3.1, so this patch is suitable for any -stable from 3.1
onwards.

Reported-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

LOCKD: Ensure that nlmclnt_block resets block->b_status after a server reboot

commit 1dfd89af8697a299e7982ae740d4695ecd917eef upstream.

After a server reboot, the reclaimer thread will recover all the existing
locks. For locks that are blocked, however, it will change the value
of block->b_status to nlm_lck_denied_grace_period in order to signal that
they need to wake up and resend the original blocking lock request.

Due to a bug, however, the block->b_status never gets reset after the
blocked locks have been woken up, and so the process goes into an
infinite loop of resends until the blocked lock is satisfied.

Reported-by: Marc Eshel <eshel@us.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

exec: do not abuse ->cred_guard_mutex in threadgroup_lock()

commit e56fb2874015370e3b7f8d85051f6dce26051df9 upstream.

threadgroup_lock() takes signal->cred_guard_mutex to ensure that
thread_group_leader() is stable. This doesn't look nice, the scope of
this lock in do_execve() is huge.

And as Dave pointed out this can lead to deadlock, we have the
following dependencies:

do_execve: cred_guard_mutex -> i_mutex
cgroup_mount: i_mutex -> cgroup_mutex
attach_task_by_pid: cgroup_mutex -> cred_guard_mutex

Change de_thread() to take threadgroup_change_begin() around the
switch-the-leader code and change threadgroup_lock() to avoid
->cred_guard_mutex.

Note that de_thread() can't sleep with ->group_rwsem held, this can
obviously deadlock with the exiting leader if the writer is active, so it
does threadgroup_change_end() before schedule().

Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs/dcache.c: add cond_resched() to shrink_dcache_parent()

commit 421348f1ca0bf17769dee0aed4d991845ae0536d upstream.

Call cond_resched() in shrink_dcache_parent() to maintain interactivity.

Before this patch:

void shrink_dcache_parent(struct dentry * parent)
{
while ((found = select_parent(parent, &dispose)) != 0)
shrink_dentry_list(&dispose);
}

select_parent() populates the dispose list with dentries which
shrink_dentry_list() then deletes.  select_parent() carefully uses
need_resched() to avoid doing too much work at once.  But neither
shrink_dcache_parent() nor its called functions call cond_resched().  So
once need_resched() is set select_parent() will return single dentry
dispose list which is then deleted by shrink_dentry_list().  This is
inefficient when there are a lot of dentry to process.  This can cause
softlockup and hurts interactivity on non preemptable kernels.

This change adds cond_resched() in shrink_dcache_parent().  The benefit
of this is that need_resched() is quickly cleared so that future calls
to select_parent() are able to efficiently return a big batch of dentry.

These additional cond_resched() do not seem to impact performance, at
least for the workload below.

Here is a program which can cause soft lockup if other system activity
sets need_resched().

int main()
{
        struct rlimit rlim;
        int i;
        int f[100000];
        char buf[20];
        struct timeval t1, t2;
        double diff;

        /* cleanup past run */
        system("rm -rf x");

        /* boost nfile rlimit */
        rlim.rlim_cur = 200000;
        rlim.rlim_max = 200000;
        if (setrlimit(RLIMIT_NOFILE, &rlim))
                err(1, "setrlimit");

        /* make directory for files */
        if (mkdir("x", 0700))
                err(1, "mkdir");

        if (gettimeofday(&t1, NULL))
                err(1, "gettimeofday");

        /* populate directory with open files */
        for (i = 0; i < 100000; i++) {
                snprintf(buf, sizeof(buf), "x/%d", i);
                f[i] = open(buf, O_CREAT);
                if (f[i] == -1)
                        err(1, "open");
        }

        /* close some of the files */
        for (i = 0; i < 85000; i++)
                close(f[i]);

        /* unlink all files, even open ones */
        system("rm -rf x");

        if (gettimeofday(&t2, NULL))
                err(1, "gettimeofday");

        diff = (((double)t2.tv_sec * 1000000 + t2.tv_usec) -
                ((double)t1.tv_sec * 1000000 + t1.tv_usec));

        printf("done: %g elapsed\n", diff/1e6);
        return 0;
}

Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

inotify: invalid mask should return a error number but not set it

commit 04df32fa10ab9a6f0643db2949d42efc966bc844 upstream.

When we run the crackerjack testsuite, the inotify_add_watch test is
stalled.

This is caused by the invalid mask 0 - the task is waiting for the event
but it never comes. inotify_add_watch() should return -EINVAL as it did
before commit 676a0675cf92 ("inotify: remove broken mask checks causing
unmount to be EINVAL"). That commit removes the invalid mask check, but
that check is needed.

Check the mask's ALL_INOTIFY_BITS before the inotify_arg_to_mask() call.
If none are set, just return -EINVAL.

Because IN_UNMOUNT is in ALL_INOTIFY_BITS, this change will not trigger
the problem that above commit fixed.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Jim Somerville <Jim.Somerville@windriver.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sata_highbank: Rename proc_name to the module name

commit 2cc1144a31f76d4a9fb48bec5d6ba1359f980813 upstream.

mkinitrd looks at /sys/class/scsi_host/host$hostnum/proc_name to find
the module name of a disk driver. Current name is "highbank-ahci" but
the module is "sata_highbank". Rename it to match the module name.

Signed-off-by: Robert Richter <robert.richter@calxeda.com>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clockevents: Set dummy handler on CPU_DEAD shutdown

commit 6f7a05d7018de222e40ca003721037a530979974 upstream.

Vitaliy reported that a per cpu HPET timer interrupt crashes the
system during hibernation. What happens is that the per cpu HPET timer
gets shut down when the nonboot cpus are stopped. When the nonboot
cpus are onlined again the HPET code sets up the MSI interrupt which
fires before the clock event device is registered. The event handler
is still set to hrtimer_interrupt, which then crashes the machine due
to highres mode not being active.

See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=700333

There is no real good way to avoid that in the HPET code. The HPET
code alrady has a mechanism to detect spurious interrupts when event
handler == NULL for a similar reason.

We can handle that in the clockevent/tick layer and replace the
previous functional handler with a dummy handler like we do in
tick_setup_new_device().

The original clockevents code did this in clockevents_exchange_device(),
but that got removed by commit 7c1e76897 (clockevents: prevent
clockevent event_handler ending up handler_noop) which forgot to fix
it up in tick_shutdown(). Same issue with the broadcast device.

Reported-by: Vitaliy Fillipov <vitalif@yourcmc.ru>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: 700333@bugs.debian.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

localmodconfig: Process source kconfig files as they are found

commit ced9cb1af1e3486cc14dca755a1b3fbadf06e90b upstream.

A bug was reported that caused localmodconfig to not keep all the
dependencies of ATH9K. This was caused by the kconfig file:

In drivers/net/wireless/ath/Kconfig:

cgroup: fix broken file xattrs

commit 712317ad97f41e738e1a19aa0a6392a78a84094e upstream.

We should store file xattrs in struct cfent instead of struct cftype,
because cftype is a type while cfent is object instance of cftype.

For example each cgroup has a tasks file, and each tasks file is
associated with a uniq cfent, but all those files share the same
struct cftype.

Alexey Kodanev reported a crash, which can be reproduced:

  # mount -t cgroup -o xattr /sys/fs/cgroup
  # mkdir /sys/fs/cgroup/test
  # setfattr -n trusted.value -v test_value /sys/fs/cgroup/tasks
  # rmdir /sys/fs/cgroup/test
  # umount /sys/fs/cgroup
  oops!

In this case, simple_xattrs_free() will free the same struct simple_xattrs
twice.

tj: Dropped unused local variable @cft from cgroup_diput().

Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cgroup: fix an off-by-one bug which may trigger BUG_ON()

commit 3ac1707a13a3da9cfc8f242a15b2fae6df2c5f88 upstream.

The 3rd parameter of flex_array_prealloc() is the number of elements,
not the index of the last element.

The effect of the bug is, when opening cgroup.procs, a flex array will
be allocated and all elements of the array is allocated with
GFP_KERNEL flag, but the last one is GFP_ATOMIC, and if we fail to
allocate memory for it, it'll trigger a BUG_ON().

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ACPI / thermal: do not always return THERMAL_TREND_RAISING for active trip points

commit 94a409319561ec1847fd9bf996a2d5843ad00932 upstream.

Commit 4ae46be "Thermal: Introduce thermal_zone_trip_update()"
introduced a regression causing the fan to be always on even when
the system is idle.

My original idea in that commit is that:
- when the current temperature is above the trip point,
   keep the fan on, even if the temperature is dropping.
- when the current temperature is below the trip point,
   turn on the fan when the temperature is raising,
   turn off the fan when the temperature is dropping.

But this is what the code actually does:
- when the current temperature is above the trip point,
   the fan keeps on.
- when the current temperature is below the trip point,
   the fan is always on because thermal_get_trend()
   in driver/acpi/thermal.c returns THERMAL_TREND_RAISING.
Thus the fan keeps running even if the system is idle.

Fix this in drivers/acpi/thermal.c.

[rjw: Changelog]
References: https://bugzilla.kernel.org/show_bug.cgi?id=56591
References: https://bugzilla.kernel.org/show_bug.cgi?id=56601
References: https://bugzilla.kernel.org/show_bug.cgi?id=50041#c45
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Matthias <morpheusxyz123@yahoo.de>
Tested-by: Ville Syrjälä <syrjala@sci.fi>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ACPI: Fix wrong parameter passed to memblock_reserve

commit a6432ded299726f123b93d0132fead200551535c upstream.

Commit 53aac44 (ACPI: Store valid ACPI tables passed via early initrd
in reserved memblock areas) introduced acpi_initrd_override() that
passes a wrong value as the second argument to memblock_reserve().

Namely, the second argument of memblock_reserve() is the size of the
region, not the address of the top of it, so make
acpi_initrd_override() pass the size in there as appropriate.

[rjw: Changelog]
Signed-off-by: Wang YanQing <udknight@gmail.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

libata: acpi: make ata_ap_acpi_handle not block

commit d66af4df0837f21bf267305dc5ccab2d29e24d86 upstream.

Since commit 30dcf76acc, ata_ap_acpi_handle will always do a namespace
walk, which requires acquiring an acpi namespace mutex. This made it
impossible to be used when calling path has held a spinlock.

For example, it can occur in the following code path for pata_acpi:
ata_scsi_queuecmd (ap->lock is acquired)
  __ata_scsi_queuecmd
    ata_scsi_translate
      ata_qc_issue
        pacpi_qc_issue
          ata_acpi_stm
            ata_ap_acpi_handle
              acpi_get_child
                acpi_walk_namespace
                  acpi_ut_acquire_mutex (acquire mutex while holding lock)
This caused scheduling while atomic bug, as reported in bug #56781.

Actually, ata_ap_acpi_handle doesn't have to walk the namespace every
time it is called, it can simply return the bound acpi handle on the
corresponding SCSI host. The reason previously it is not done this way
is, ata_ap_acpi_handle is used in the binding function
ata_acpi_bind_host by ata_acpi_gtm when the handle is not bound to the
SCSI host yet. Since we already have the ATA port's handle in its
binding function, we can simply use it instead of calling
ata_ap_acpi_handle there. So introduce a new function __ata_acpi_gtm,
where it will receive an acpi handle param in addition to the ATA port
which is solely used for debug statement. With this change, we can make
ata_ap_acpi_handle simply return the bound handle for SCSI host instead
of walking the acpi namespace now.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=56781
Reported-and-tested-by: <kenzopl@o2.pl>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drivers/rtc/rtc-cmos.c: don't disable hpet emulation on suspend

commit e005715efaf674660ae59af83b13822567e3a758 upstream.

There's a bug where rtc alarms are ignored after the rtc cmos suspends
but before the system finishes suspend. Since hpet emulation is
disabled and it still handles the interrupts, a wake event is never
registered which is done from the rtc layer.

This patch reverts commit d1b2efa83fbf ("rtc: disable hpet emulation on
suspend") which disabled hpet emulation. To fix the problem mentioned
in that commit, hpet_rtc_timer_init() is called directly on resume.

Signed-off-by: Derek Basehore <dbasehore@chromium.org>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: swap: mark swap pages writeback before queueing for direct IO

commit 0cdc444a67ccdbd58bfbcba865cb17a9f17a7691 upstream.

As pointed out by Andrew Morton, the swap-over-NFS writeback is not
setting PageWriteback before it is queued for direct IO.  While swap
pages do not participate in BDI or process dirty accounting and the IO
is synchronous, the writeback bit is still required and not setting it
in this case was an oversight.  swapoff depends on the page writeback to
synchronoise all pending writes on a swap page before it is reused.
Swapcache freeing and reuse depend on checking the PageWriteback under
lock to ensure the page is safe to reuse.

Direct IO handlers and the direct IO handler for NFS do not deal with
PageWriteback as they are synchronous writes.  In the case of NFS, it
schedules pages (or a page in the case of swap) for IO and then waits
synchronously for IO to complete in nfs_direct_write().  It is
recognised that this is a slowdown from normal swap handling which is
asynchronous and uses a completion handler.  Shoving PageWriteback
handling down into direct IO handlers looks like a bad fit to handle the
swap case although it may have to be dealt with some day if swap is
converted to use direct IO in general and bmap is finally done away
with.  At that point it will be necessary to refit asynchronous direct
IO with completion handlers onto the swap subsystem.

As swapcache currently depends on PageWriteback to protect against
races, this patch sets PageWriteback under the page lock before queueing
it for direct IO.  It is cleared when the direct IO handler returns.  IO
errors are treated similarly to the direct-to-bio case except PageError
is not set as in the case of swap-over-NFS, it is likely to be a
transient error.

It was asked what prevents such a page being reclaimed in parallel.
With this patch applied, such a page will now be skipped (most of the
time) or blocked until the writeback completes.  Reclaim checks
PageWriteback under the page lock before calling try_to_free_swap and
the page lock should prevent the page being requeued for IO before it is
freed.

This and Jerome's related patch should considered for -stable as far
back as 3.6 when swap-over-NFS was introduced.

[akpm@linux-foundation.org: use pr_err_ratelimited()]
[akpm@linux-foundation.org: remove hopefully-unneeded cast in printk]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

swap: redirty page if page write fails on swap file

commit 2d30d31ea3c5be426ce25607b9bd1835acb85e0a upstream.

Since commit 62c230bc1790 ("mm: add support for a filesystem to activate
swap files and use direct_IO for writing swap pages"), swap_writepage()
calls direct_IO on swap files. However, in that case the page isn't
redirtied if I/O fails, and is therefore handled afterwards as if it has
been successfully written to the swap file, leading to memory corruption
when the page is eventually swapped back in.

This patch sets the page dirty when direct_IO() fails. It fixes a
memory corruption that happened while using swap-over-NFS.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hrtimer: Add expiry time overflow check in hrtimer_interrupt

commit 8f294b5a139ee4b75e890ad5b443c93d1e558a8b upstream.

The settimeofday01 test in the LTP testsuite effectively does

        gettimeofday(current time);
        settimeofday(Jan 1, 1970 + 100 seconds);
        settimeofday(current time);

This test causes a stack trace to be displayed on the console during the
setting of timeofday to Jan 1, 1970 + 100 seconds:

[  131.066751] ------------[ cut here ]------------
[  131.096448] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x135/0x140()
[  131.104935] Hardware name: Dinar
[  131.108150] Modules linked in: sg nfsv3 nfs_acl nfsv4 auth_rpcgss nfs dns_resolver fscache lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables kvm_amd kvm sp5100_tco bnx2 i2c_piix4 crc32c_intel k10temp fam15h_power ghash_clmulni_intel amd64_edac_mod pcspkr serio_raw edac_mce_amd edac_core microcode xfs libcrc32c sr_mod sd_mod cdrom ata_generic crc_t10dif pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm ahci pata_atiixp libahci libata usb_storage i2c_core dm_mirror dm_region_hash dm_log dm_mod
[  131.176784] Pid: 0, comm: swapper/28 Not tainted 3.8.0+ #6
[  131.182248] Call Trace:
[  131.184684]  <IRQ>  [<ffffffff810612af>] warn_slowpath_common+0x7f/0xc0
[  131.191312]  [<ffffffff8106130a>] warn_slowpath_null+0x1a/0x20
[  131.197131]  [<ffffffff810b9fd5>] clockevents_program_event+0x135/0x140
[  131.203721]  [<ffffffff810bb584>] tick_program_event+0x24/0x30
[  131.209534]  [<ffffffff81089ab1>] hrtimer_interrupt+0x131/0x230
[  131.215437]  [<ffffffff814b9600>] ? cpufreq_p4_target+0x130/0x130
[  131.221509]  [<ffffffff81619119>] smp_apic_timer_interrupt+0x69/0x99
[  131.227839]  [<ffffffff8161805d>] apic_timer_interrupt+0x6d/0x80
[  131.233816]  <EOI>  [<ffffffff81099745>] ? sched_clock_cpu+0xc5/0x120
[  131.240267]  [<ffffffff814b9ff0>] ? cpuidle_wrap_enter+0x50/0xa0
[  131.246252]  [<ffffffff814b9fe9>] ? cpuidle_wrap_enter+0x49/0xa0
[  131.252238]  [<ffffffff814ba050>] cpuidle_enter_tk+0x10/0x20
[  131.257877]  [<ffffffff814b9c89>] cpuidle_idle_call+0xa9/0x260
[  131.263692]  [<ffffffff8101c42f>] cpu_idle+0xaf/0x120
[  131.268727]  [<ffffffff815f8971>] start_secondary+0x255/0x257
[  131.274449] ---[ end trace 1151a50552231615 ]---

When we change the system time to a low value like this, the value of
timekeeper->offs_real will be a negative value.

It seems that the WARN occurs because an hrtimer has been started in the time
between the releasing of the timekeeper lock and the IPI call (via a call to
on_each_cpu) in clock_was_set() in the do_settimeofday() code.  The end result
is that a REALTIME_CLOCK timer has been added with softexpires = expires =
KTIME_MAX.  The hrtimer_interrupt() fires/is called and the loop at
kernel/hrtimer.c:1289 is executed.  In this loop the code subtracts the
clock base's offset (which was set to timekeeper->offs_real in
do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which
was KTIME_MAX):

KTIME_MAX - (a negative value) = overflow

A simple check for an overflow can resolve this problem.  Using KTIME_MAX
instead of the overflow value will result in the hrtimer function being run,
and the reprogramming of the timer after that.

Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
[jstultz: Tweaked commit subject]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hrtimer: Fix ktime_add_ns() overflow on 32bit architectures

commit 51fd36f3fad8447c487137ae26b9d0b3ce77bb25 upstream.

One can trigger an overflow when using ktime_add_ns() on a 32bit
architecture not supporting CONFIG_KTIME_SCALAR.

When passing a very high value for u64 nsec, e.g. 7881299347898368000
the do_div() function converts this value to seconds (7881299347) which
is still to high to pass to the ktime_set() function as long. The result
in is a negative value.

The problem on my system occurs in the tick-sched.c,
tick_nohz_stop_sched_tick() when time_delta is set to
timekeeping_max_deferment(). The check for time_delta < KTIME_MAX is
valid, thus ktime_add_ns() is called with a too large value resulting in
a negative expire value. This leads to an endless loop in the ticker code:

time_delta: 7881299347898368000
expires = ktime_add_ns(last_update, time_delta)
expires: negative value

This fix caps the value to KTIME_MAX.

This error doesn't occurs on 64bit or architectures supporting
CONFIG_KTIME_SCALAR (e.g. ARM, x86-32).

Signed-off-by: David Engraf <david.engraf@sysgo.com>
[jstultz: Minor tweaks to commit message & header]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: max98088: Fix logging of hardware revision.

commit 98682063549bedd6e2d2b6b7222f150c6fbce68c upstream.

The hardware revision of the codec is based at 0x40. Subtract that
before convering to ASCII. The same as it is done for 98095.

Signed-off-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda - Add the support for ALC286 codec

commit 7fc7d047216aa4923d401c637be2ebc6e3d5bd9b upstream.

It's yet another ALC269-variant.

Signed-off-by: Kailang Yang <kailang@realtek.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: USB: adjust for changed 3.8 USB API

commit c75c5ab575af7db707689cdbb5a5c458e9a034bb upstream.

The recent changes in the USB API ("implement new semantics for
URB_ISO_ASAP") made the former meaning of the URB_ISO_ASAP flag the
default, and changed this flag to mean that URBs can be delayed.
This is not the behaviour wanted by any of the audio drivers because
it leads to discontinuous playback with very small period sizes.
Therefore, our URBs need to be submitted without this flag.

Reported-by: Joe Rayhawk <jrayhawk@fairlystable.org>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: usb-audio: Fix autopm error during probing

commit 60af3d037eb8c670dcce31401501d1271e7c5d95 upstream.

We've got strange errors in get_ctl_value() in mixer.c during
probing, e.g. on Hercules RMX2 DJ Controller:

  ALSA mixer.c:352 cannot get ctl value: req = 0x83, wValue = 0x201, wIndex = 0xa00, type = 4
  ALSA mixer.c:352 cannot get ctl value: req = 0x83, wValue = 0x200, wIndex = 0xa00, type = 4
  ....

It turned out that the culprit is autopm: snd_usb_autoresume() returns
-ENODEV when called during card->probing = 1.

Since the call itself during card->probing = 1 is valid, let's fix the
return value of snd_usb_autoresume() as success.

Reported-and-tested-by: Daniel Schürmann <daschuer@mixxx.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: usb-audio: disable autopm for MIDI devices

commit cbc200bca4b51a8e2406d4b654d978f8503d430b upstream.

Commit 88a8516a2128 (ALSA: usbaudio: implement USB autosuspend)
introduced autopm for all USB audio/MIDI devices. However, many MIDI
devices, such as synthesizers, do not merely transmit MIDI messages but
use their MIDI inputs to control other functions. With autopm, these
devices would get powered down as soon as the last MIDI port device is
closed on the host.

Even some plain MIDI interfaces could get broken: they automatically
send Active Sensing messages while powered up, but as soon as these
messages cease, the receiving device would interpret this as an
accidental disconnection.

Commit f5f165418cab (ALSA: usb-audio: Fix missing autopm for MIDI input)
introduced another regression: some devices (e.g. the Roland GAIA SH-01)
are self-powered but do a reset whenever the USB interface's power state
changes.

To work around all this, just disable autopm for all USB MIDI devices.

Reported-by: Laurens Holst
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: usb: Add quirk for 192KHz recording on E-Mu devices

commit 1539d4f82ad534431cc67935e8e442ccf107d17d upstream.

When recording at 176.2KHz or 192Khz, the device adds a 32-bit length
header to the capture packets, which obviously needs to be ignored for
recording to work properly.

Userspace expected: L0 L1 L2 R0 R1 R2
...but actually got: R2 L0 L1 L2 R0 R1

Also, the last byte of the length header being interpreted as L0 of
the first sample caused spikes every 0.5ms, resulting in a loud 16KHz
tone (about the highest 'B' on a piano) being present throughout
captures.

Tested at all sample rates on an E-Mu 0404USB, and tested for
regressions on a generic USB headset.

Signed-off-by: Calvin Owens <jcalvinowens@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: snd-usb: try harder to find USB_DT_CS_ENDPOINT

commit ebfc594c02148b6a85c2f178cf167a44a3c3ce10 upstream.

The USB_DT_CS_ENDPOINT class-specific endpoint descriptor is usually
stuffed directly after the standard USB endpoint descriptor, and this is
where the driver currently expects it to be.

There are, however, devices in the wild that have it the other way
around in their descriptor sets, so the USB_DT_CS_ENDPOINT comes
*before* the standard enpoint. Devices known to implement it that way
are "Sennheiser BTD-500" and Plantronics USB headsets.

When the driver can't find the USB_DT_CS_ENDPOINT, it won't be able to
change sample rates, as the bitmask for the validity of this command is
storen in bmAttributes of that descriptor.

Fix this by searching the entire interface instead of just the extra
bytes of the first endpoint, in case the latter fails.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-and-tested-by: Torstein Hegge <hegge@resisty.net>
Reported-and-tested-by: Yves G <alsa-user@vivigatt.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: emu10k1: Fix dock firmware loading

commit e08b34e86dfdb72a62196ce0f03d33f48958d8b9 upstream.

The commit [b209c4df: ALSA: emu10k1: cache emu1010 firmware] broke the
firmware loading of the dock, just (mistakenly) ignoring a different
firmware for docks on some models. This patch revives them again.

Bugzilla: https://bugs.archlinux.org/task/34865
Reported-and-tested-by: Tobias Powalowski <tobias.powalowski@googlemail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

TPM: Retry SaveState command in suspend path

commit 32d33b29ba077d6b45de35f2181e0a7411b162f4 upstream.

If the TPM has already been sent a SaveState command before the driver
is loaded it may have problems sending that same command again later.

This issue is seen with the Chromebook Pixel due to a firmware bug in
the legacy mode boot path which is sending the SaveState command
before booting the kernel.  More information is available at
http://crbug.com/203524

This change introduces a retry of the SaveState command in the suspend
path in order to work around this issue.  A future firmware update
should fix this but this is also a trivial workaround in the driver
that has no effect on systems that do not show this problem.

When this does happen the TPM responds with a non-fatal TPM_RETRY code
that is defined in the specification:

  The TPM is too busy to respond to the command immediately, but the
  command could be resubmitted at a later time.  The TPM MAY return
  TPM_RETRY for any command at any time.

It can take several seconds before the TPM will respond again.  I
measured a typical time between 3 and 4 seconds and the timeout is set
at a safe 5 seconds.

It is also possible to reproduce this with commands via /dev/tpm0.
The bug linked above has a python script attached which can be used to
test for this problem.  I tested a variety of TPMs from Infineon,
Nuvoton, Atmel, and STMicro but was only able to reproduce this with
LPC and I2C TPMs from Infineon.

The TPM specification only loosely defines this behavior:

  TPM Main Level 2 Part 3 v1.2 r116, section 3.3. TPM_SaveState:
  The TPM MAY declare all preserved values invalid in response to any
  command other than TPM_Init.

  TCG PC Client BIOS Spec 1.21 section 8.3.1.
  After issuing a TPM_SaveState command, the OS SHOULD NOT issue TPM
  commands before transitioning to S3 without issuing another
  TPM_SaveState command.

  TCG PC Client TIS 1.21, section 4. Power Management:
  The TPM_SaveState command allows a Static OS to indicate to the TPM
  that the platform may enter a low power state where the TPM will be
  required to enter into the D3 power state.  The use of the term "may"
  is significant in that there is no requirement for the platform to
  actually enter the low power state after sending the TPM_SaveState
  command.  The software may, in fact, send subsequent commands after
  sending the TPM_SaveState command.

Change-Id: I52b41e826412688e5b6c8ddd3bb16409939704e9
Signed-off-by: Duncan Laurie <dlaurie@chromium.org>
Signed-off-by: Kent Yoder <key@linux.vnet.ibm.com>
Cc: Dirk Hohndel <dirk@hohndel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: allow arch code to control the user page table ceiling

commit 6ee8630e02be6dd89926ca0fbc21af68b23dc087 upstream.

On architectures where a pgd entry may be shared between user and kernel
(e.g. ARM+LPAE), freeing page tables needs a ceiling other than 0.
This patch introduces a generic USER_PGTABLES_CEILING that arch code can
override. It is the responsibility of the arch code setting the ceiling
to ensure the complete freeing of the page tables (usually in
pgd_free()).

[catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs/fscache/stats.c: fix memory leak

commit ec686c9239b4d472052a271c505d04dae84214cc upstream.

There is a kernel memory leak observed when the proc file
/proc/fs/fscache/stats is read.

The reason is that in fscache_stats_open, single_open is called and the
respective release function is not called during release. Hence fix
with correct release function - single_release().

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=57101

Signed-off-by: Anurup m <anurup.m@huawei.com>
Cc: shyju pv <shyju.pv@huawei.com>
Cc: Sanil kumar <sanil.kumar@huawei.com>
Cc: Nataraj m <nataraj.m@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Wrong asm register contraints in the kvm implementation

commit de53e9caa4c6149ef4a78c2f83d7f5b655848767 upstream.

The Linux Kernel contains some inline assembly source code which has
wrong asm register constraints in arch/ia64/kvm/vtlb.c.

I observed this on Kernel 3.2.35 but it is also true on the most
recent Kernel 3.9-rc1.

File arch/ia64/kvm/vtlb.c:

u64 guest_vhpt_lookup(u64 iha, u64 *pte)
{
u64 ret;
struct thash_data *data;

data = __vtr_lookup(current_vcpu, iha, D_TLB);
if (data != NULL)
thash_vhpt_insert(current_vcpu, data->page_flags,
data->itir, iha, D_TLB);

asm volatile (
"rsm psr.ic|psr.i;;"
"srlz.d;;"
"ld8.s r9=[%1];;"
"tnat.nz p6,p7=r9;;"
"(p6) mov %0=1;"
"(p6) mov r9=r0;"
"(p7) extr.u r9=r9,0,53;;"
"(p7) mov %0=r0;"
"(p7) st8 [%2]=r9;;"
"ssm psr.ic;;"
"srlz.d;;"
"ssm psr.i;;"
"srlz.d;;"
: "=r"(ret) : "r"(iha), "r"(pte):"memory");

return ret;
}

The list of output registers is
: "=r"(ret) : "r"(iha), "r"(pte):"memory");
The constraint "=r" means that the GCC has to maintain that these vars
are in registers and contain valid info when the program flow leaves
the assembly block (output registers).
But "=r" also means that GCC can put them in registers that are used
as input registers. Input registers are iha, pte on the example.
If the predicate p7 is true, the 8th assembly instruction
"(p7) mov %0=r0;"
is the first one which writes to a register which is maintained by the
register constraints; it sets %0. %0 means the first register operand;
it is ret here.
This instruction might overwrite the %2 register (pte) which is needed
by the next instruction:
"(p7) st8 [%2]=r9;;"
Whether it really happens depends on how GCC decides what registers it
uses and how it optimizes the code.

The attached patch fixes the register operand constraints in
arch/ia64/kvm/vtlb.c.
The register constraints should be
: "=&r"(ret) : "r"(iha), "r"(pte):"memory");
The & means that GCC must not use any of the input registers to place
this output register in.

This is Debian bug#702639
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702639).

The patch is applicable on Kernel 3.9-rc1, 3.2.35 and many other versions.

Signed-off-by: Stephan Schreiber <info@fs-driver.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Wrong asm register contraints in the futex implementation

commit 136f39ddc53db3bcee2befbe323a56d4fbf06da8 upstream.

The Linux Kernel contains some inline assembly source code which has
wrong asm register constraints in arch/ia64/include/asm/futex.h.

I observed this on Kernel 3.2.23 but it is also true on the most
recent Kernel 3.9-rc1.

File arch/ia64/include/asm/futex.h:

static inline int
futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
      u32 oldval, u32 newval)
{
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;

{
register unsigned long r8 __asm ("r8");
unsigned long prev;
__asm__ __volatile__(
" mf;; \n"
" mov %0=r0 \n"
" mov ar.ccv=%4;; \n"
"[1:] cmpxchg4.acq %1=[%2],%3,ar.ccv \n"
" .xdata4 \"__ex_table\", 1b-., 2f-. \n"
"[2:]"
: "=r" (r8), "=r" (prev)
: "r" (uaddr), "r" (newval),
  "rO" ((long) (unsigned) oldval)
: "memory");
*uval = prev;
return r8;
}
}

The list of output registers is
: "=r" (r8), "=r" (prev)
The constraint "=r" means that the GCC has to maintain that these vars
are in registers and contain valid info when the program flow leaves
the assembly block (output registers).
But "=r" also means that GCC can put them in registers that are used
as input registers. Input registers are uaddr, newval, oldval on the
example.
The second assembly instruction
" mov %0=r0 \n"
is the first one which writes to a register; it sets %0 to 0. %0 means
the first register operand; it is r8 here. (The r0 is read-only and
always 0 on the Itanium; it can be used if an immediate zero value is
needed.)
This instruction might overwrite one of the other registers which are
still needed.
Whether it really happens depends on how GCC decides what registers it
uses and how it optimizes the code.

The objdump utility can give us disassembly.
The futex_atomic_cmpxchg_inatomic() function is inline, so we have to
look for a module that uses the funtion. This is the
cmpxchg_futex_value_locked() function in
kernel/futex.c:

static int cmpxchg_futex_value_locked(u32 *curval, u32 __user *uaddr,
      u32 uval, u32 newval)
{
int ret;

pagefault_disable();
ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
pagefault_enable();

return ret;
}

Now the disassembly. At first from the Kernel package 3.2.23 which has
been compiled with GCC 4.4, remeber this Kernel seemed to work:
objdump -d linux-3.2.23/debian/build/build_ia64_none_mckinley/kernel/futex.o

0000000000000230 <cmpxchg_futex_value_locked>:
      230: 0b 18 80 1b 18 21 [MMI]       adds r3=3168,r13;;
      236: 80 40 0d 00 42 00             adds r8=40,r3
      23c: 00 00 04 00                    nop.i 0x0;;
      240: 0b 50 00 10 10 10 [MMI]       ld4 r10=[r8];;
      246: 90 08 28 00 42 00             adds r9=1,r10
      24c: 00 00 04 00                    nop.i 0x0;;
      250: 09 00 00 00 01 00 [MMI]       nop.m 0x0
      256: 00 48 20 20 23 00             st4 [r8]=r9
      25c: 00 00 04 00                    nop.i 0x0;;
      260: 08 10 80 06 00 21 [MMI]       adds r2=32,r3
      266: 00 00 00 02 00 00             nop.m 0x0
      26c: 02 08 f1 52                    extr.u r16=r33,0,61
      270: 05 40 88 00 08 e0 [MLX]       addp4 r8=r34,r0
      276: ff ff 0f 00 00 e0             movl r15=0xfffffffbfff;;
      27c: f1 f7 ff 65
      280: 09 70 00 04 18 10 [MMI]       ld8 r14=[r2]
      286: 00 00 00 02 00 c0             nop.m 0x0
      28c: f0 80 1c d0                    cmp.ltu p6,p7=r15,r16;;
      290: 08 40 fc 1d 09 3b [MMI]       cmp.eq p8,p9=-1,r14
      296: 00 00 00 02 00 40             nop.m 0x0
      29c: e1 08 2d d0                    cmp.ltu p10,p11=r14,r33
      2a0: 56 01 10 00 40 10 [BBB] (p10) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
      2a6: 02 08 00 80 21 03       (p08) br.cond.dpnt.few 2b0
<cmpxchg_futex_value_locked+0x80>
      2ac: 40 00 00 41              (p06) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
      2b0: 0a 00 00 00 22 00 [MMI]       mf;;
      2b6: 80 00 00 00 42 00             mov r8=r0
      2bc: 00 00 04 00                    nop.i 0x0
      2c0: 0b 00 20 40 2a 04 [MMI]       mov.m ar.ccv=r8;;
      2c6: 10 1a 85 22 20 00             cmpxchg4.acq r33=[r33],r35,ar.ccv
      2cc: 00 00 04 00                    nop.i 0x0;;
      2d0: 10 00 84 40 90 11 [MIB]       st4 [r32]=r33
      2d6: 00 00 00 02 00 00             nop.i 0x0
      2dc: 20 00 00 40                    br.few 2f0
<cmpxchg_futex_value_locked+0xc0>
      2e0: 09 40 c8 f9 ff 27 [MMI]       mov r8=-14
      2e6: 00 00 00 02 00 00             nop.m 0x0
      2ec: 00 00 04 00                    nop.i 0x0;;
      2f0: 0b 58 20 1a 19 21 [MMI]       adds r11=3208,r13;;
      2f6: 20 01 2c 20 20 00             ld4 r18=[r11]
      2fc: 00 00 04 00                    nop.i 0x0;;
      300: 0b 88 fc 25 3f 23 [MMI]       adds r17=-1,r18;;
      306: 00 88 2c 20 23 00             st4 [r11]=r17
      30c: 00 00 04 00                    nop.i 0x0;;
      310: 11 00 00 00 01 00 [MIB]       nop.m 0x0
      316: 00 00 00 02 00 80             nop.i 0x0
      31c: 08 00 84 00                    br.ret.sptk.many b0;;

The lines
      2b0: 0a 00 00 00 22 00 [MMI]       mf;;
      2b6: 80 00 00 00 42 00             mov r8=r0
      2bc: 00 00 04 00                    nop.i 0x0
      2c0: 0b 00 20 40 2a 04 [MMI]       mov.m ar.ccv=r8;;
      2c6: 10 1a 85 22 20 00             cmpxchg4.acq r33=[r33],r35,ar.ccv
      2cc: 00 00 04 00                    nop.i 0x0;;
are the instructions of the assembly block.
The line
      2b6: 80 00 00 00 42 00             mov r8=r0
sets the r8 register to 0 and after that
      2c0: 0b 00 20 40 2a 04 [MMI]       mov.m ar.ccv=r8;;
prepares the 'oldvalue' for the cmpxchg but it takes it from r8. This
is wrong.
What happened here is what I explained above: An input register is
overwritten which is still needed.
The register operand constraints in futex.h are wrong.

(The problem doesn't occur when the Kernel is compiled with GCC 4.6.)

The attached patch fixes the register operand constraints in futex.h.
The code after patching of it:

static inline int
futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
      u32 oldval, u32 newval)
{
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;

{
register unsigned long r8 __asm ("r8") = 0;
unsigned long prev;
__asm__ __volatile__(
" mf;; \n"
" mov ar.ccv=%4;; \n"
"[1:] cmpxchg4.acq %1=[%2],%3,ar.ccv \n"
" .xdata4 \"__ex_table\", 1b-., 2f-. \n"
"[2:]"
: "+r" (r8), "=&r" (prev)
: "r" (uaddr), "r" (newval),
  "rO" ((long) (unsigned) oldval)
: "memory");
*uval = prev;
return r8;
}
}

I also initialized the 'r8' var with the C programming language.
The _asm qualifier on the definition of the 'r8' var forces GCC to use
the r8 processor register for it.
I don't believe that we should use inline assembly for zeroing out a
local variable.
The constraint is
"+r" (r8)
what means that it is both an input register and an output register.
Note that the page fault handler will modify the r8 register which
will be the return value of the function.
The real fix is
"=&r" (prev)
The & means that GCC must not use any of the input registers to place
this output register in.

Patched the Kernel 3.2.23 and compiled it with GCC4.4:

0000000000000230 <cmpxchg_futex_value_locked>:
      230: 0b 18 80 1b 18 21 [MMI]       adds r3=3168,r13;;
      236: 80 40 0d 00 42 00             adds r8=40,r3
      23c: 00 00 04 00                    nop.i 0x0;;
      240: 0b 50 00 10 10 10 [MMI]       ld4 r10=[r8];;
      246: 90 08 28 00 42 00             adds r9=1,r10
      24c: 00 00 04 00                    nop.i 0x0;;
      250: 09 00 00 00 01 00 [MMI]       nop.m 0x0
      256: 00 48 20 20 23 00             st4 [r8]=r9
      25c: 00 00 04 00                    nop.i 0x0;;
      260: 08 10 80 06 00 21 [MMI]       adds r2=32,r3
      266: 20 12 01 10 40 00             addp4 r34=r34,r0
      26c: 02 08 f1 52                    extr.u r16=r33,0,61
      270: 05 40 00 00 00 e1 [MLX]       mov r8=r0
      276: ff ff 0f 00 00 e0             movl r15=0xfffffffbfff;;
      27c: f1 f7 ff 65
      280: 09 70 00 04 18 10 [MMI]       ld8 r14=[r2]
      286: 00 00 00 02 00 c0             nop.m 0x0
      28c: f0 80 1c d0                    cmp.ltu p6,p7=r15,r16;;
      290: 08 40 fc 1d 09 3b [MMI]       cmp.eq p8,p9=-1,r14
      296: 00 00 00 02 00 40             nop.m 0x0
      29c: e1 08 2d d0                    cmp.ltu p10,p11=r14,r33
      2a0: 56 01 10 00 40 10 [BBB] (p10) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
      2a6: 02 08 00 80 21 03       (p08) br.cond.dpnt.few 2b0
<cmpxchg_futex_value_locked+0x80>
      2ac: 40 00 00 41              (p06) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
      2b0: 0b 00 00 00 22 00 [MMI]       mf;;
      2b6: 00 10 81 54 08 00             mov.m ar.ccv=r34
      2bc: 00 00 04 00                    nop.i 0x0;;
      2c0: 09 58 8c 42 11 10 [MMI]       cmpxchg4.acq r11=[r33],r35,ar.ccv
      2c6: 00 00 00 02 00 00             nop.m 0x0
      2cc: 00 00 04 00                    nop.i 0x0;;
      2d0: 10 00 2c 40 90 11 [MIB]       st4 [r32]=r11
      2d6: 00 00 00 02 00 00             nop.i 0x0
      2dc: 20 00 00 40                    br.few 2f0
<cmpxchg_futex_value_locked+0xc0>
      2e0: 09 40 c8 f9 ff 27 [MMI]       mov r8=-14
      2e6: 00 00 00 02 00 00             nop.m 0x0
      2ec: 00 00 04 00                    nop.i 0x0;;
      2f0: 0b 88 20 1a 19 21 [MMI]       adds r17=3208,r13;;
      2f6: 30 01 44 20 20 00             ld4 r19=[r17]
      2fc: 00 00 04 00                    nop.i 0x0;;
      300: 0b 90 fc 27 3f 23 [MMI]       adds r18=-1,r19;;
      306: 00 90 44 20 23 00             st4 [r17]=r18
      30c: 00 00 04 00                    nop.i 0x0;;
      310: 11 00 00 00 01 00 [MIB]       nop.m 0x0
      316: 00 00 00 02 00 80             nop.i 0x0
      31c: 08 00 84 00                    br.ret.sptk.many b0;;

Much better.
There is a
      270: 05 40 00 00 00 e1 [MLX]       mov r8=r0
which was generated by C code r8 = 0. Below
      2b6: 00 10 81 54 08 00             mov.m ar.ccv=r34
what means that oldval is no longer overwritten.

This is Debian bug#702641
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702641).

The patch is applicable on Kernel 3.9-rc1, 3.2.23 and many other versions.

Signed-off-by: Stephan Schreiber <info@fs-driver.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rt2x00: Fix transmit power troubles on some Ralink RT30xx cards

commit 7e9dafd873034dd64ababcb858be424c4780ae13 upstream.

Some cards on Ralink RT30xx chipset not have correctly TX_MIXER_GAIN
value in them EEPROM/EFUSE. In this case, we must use default value,
but always used EEPROM/EFUSE value. As result we have tranmitt power
range from -10dBm to +6dBm instead 0dBm to +16dBm.

Correctly value in EEPROM/EFUSE is one or more for RT3070 and two or
more for other RT30xx chips.

Tested on Canyon CNP-WF518N1 usb Wi-Fi dongle and Jorjin WN8020 usb
embedded Wi-Fi module.

Signed-off-by: Alex A. Mihaylov <minimumlaw@rambler.ru>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

PCI/PM: Fix fallback to PCI_D0 in pci_platform_power_transition()

commit 769ba7212f2059ca9fe0c73371e3d415c8c1c529 upstream.

Commit b51306c (PCI: Set device power state to PCI_D0 for device
without native PM support) modified pci_platform_power_transition()
by adding code causing dev->current_state for devices that don't
support native PCI PM but are power-manageable by the platform to be
changed to PCI_D0 regardless of the value returned by the preceding
platform_pci_set_power_state(). In particular, that also is done
if the platform_pci_set_power_state() has been successful, which
causes the correct power state of the device set by
pci_update_current_state() in that case to be overwritten by PCI_D0.

Fix that mistake by making the fallback to PCI_D0 only happen if
the platform_pci_set_power_state() has returned an error.

[bhelgaas: folded in Yinghai's simplification, added URL & stable info]
Reference: http://lkml.kernel.org/r/27806FC4E5928A408B78E88BBC67A2306F466BBA@ORSMSX101.amr.corp.intel.com
Reported-by: Chris J. Benenati <chris.j.benenati@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

PCI / ACPI: Don't query OSC support with all possible controls

commit 545d6e189a41c94c11f55045a771118eccc9d9eb upstream.

Found problem on system that firmware that could handle pci aer.
Firmware get error reporting after pci injecting error, before os boots.
But after os boots, firmware can not get report anymore, even pci=noaer
is passed.

Root cause: BIOS _OSC has problem with query bit checking.
It turns out that BIOS vendor is copying example code from ACPI Spec.
In ACPI Spec 5.0, page 290:

If (Not(And(CDW1,1))) // Query flag clear?
{ // Disable GPEs for features granted native control.
If (And(CTRL,0x01)) // Hot plug control granted?
{
Store(0,HPCE) // clear the hot plug SCI enable bit
Store(1,HPCS) // clear the hot plug SCI status bit
}
...
}

When Query flag is set, And(CDW1,1) will be 1, Not(1) will return 0xfffffffe.
So it will get into code path that should be for control set only.
BIOS acpi code should be changed to "If (LEqual(And(CDW1,1), 0)))"

Current kernel code is using _OSC query to notify firmware about support
from OS and then use _OSC to set control bits.
During query support, current code is using all possible controls.
So will execute code that should be only for control set stage.

That will have problem when pci=noaer or aer firmware_first is used.
As firmware have that control set for os aer already in query support stage,
but later will not os aer handling.

We should avoid passing all possible controls, just use osc_control_set
instead.
That should workaround BIOS bugs with affected systems on the field
as more bios vendors are copying sample code from ACPI spec.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Fix initialization of CMCI/CMCP interrupts

commit d303e9e98fce56cdb3c6f2ac92f626fc2bd51c77 upstream.

Back 2010 during a revamp of the irq code some initializations
were moved from ia64_mca_init() to ia64_mca_late_init() in

commit c75f2aa13f5b268aba369b5dc566088b5194377c
Cannot use register_percpu_irq() from ia64_mca_init()

But this was hideously wrong. First of all these initializations
are now down far too late. Specifically after all the other cpus
have been brought up and initialized their own CMC vectors from
smp_callin(). Also ia64_mca_late_init() may be called from any cpu
so the line:
ia64_mca_cmc_vector_setup(); /* Setup vector on BSP */
is generally not executed on the BSP, and so the CMC vector isn't
setup at all on that processor.

Make use of the arch_early_irq_init() hook to get this code executed
at just the right moment: not too early, not too late.

Reported-by: Fred Hartnett <fred.hartnett@hp.com>
Tested-by: Fred Hartnett <fred.hartnett@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sysfs: fix use after free in case of concurrent read/write and readdir

commit f7db5e7660b122142410dcf36ba903c73d473250 upstream.

The inode->i_mutex isn't hold when updating filp->f_pos
in read()/write(), so the filp->f_pos might be read as
0 or 1 in readdir() when there is concurrent read()/write()
on this same file, then may cause use after free in readdir().

The bug can be reproduced with Li Zefan's test code on the
link:

https://patchwork.kernel.org/patch/2160771/

This patch fixes the use after free under this situation.

Reported-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: crc32-pclmul - Use gas macro for pclmulqdq

commit 57ae1b0532977b30184aaba04b6cafe0a284c21f upstream.

Occurs when CONFIG_CRYPTO_CRC32C_INTEL=y and CONFIG_CRYPTO_CRC32C_INTEL=y.
Older versions of bintuils do not support the pclmulqdq instruction. The
PCLMULQDQ gas macro is used instead.

Signed-off-by: Sandy Wu <sandyw@twitter.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i2c: xiic: must always write 16-bit words to TX_FIFO

commit c39e8e4354ce4daf23336de5daa28a3b01f00aa6 upstream.

The TX_FIFO register is 10 bits wide. The lower 8 bits are the data to be
written, while the upper two bits are flags to indicate stop/start.

The driver apparently attempted to optimize write access, by only writing a
byte in those cases where the stop/start bits are zero. However, we have
seen cases where the lower byte is duplicated onto the upper byte by the
hardware, which causes inadvertent stop/starts.

This patch changes the write access to the transmit FIFO to always be 16 bits
wide.

Signed off by: Steven A. Falco <sfalco@harris.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing: Reset ftrace_graph_filter_enabled if count is zero

commit 9f50afccfdc15d95d7331acddcb0f7703df089ae upstream.

The ftrace_graph_count can be decreased with a "!" pattern, so that
the enabled flag should be updated too.

Link: http://lkml.kernel.org/r/1365663698-2413-1-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing: Check return value of tracing_init_dentry()

commit ed6f1c996bfe4b6e520cf7a74b51cd6988d84420 upstream.

Check return value and bail out if it's NULL.

Link: http://lkml.kernel.org/r/1365553093-10180-2-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing: Fix off-by-one on allocating stat->pages

commit 39e30cd1537937d3c00ef87e865324e981434e5b upstream.

The first page was allocated separately, so no need to start from 0.

Link: http://lkml.kernel.org/r/1364820385-32027-2-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing: Remove most or all of stack tracer stack size from stack_max_size

commit 4df297129f622bdc18935c856f42b9ddd18f9f28 upstream.

Currently, the depth reported in the stack tracer stack_trace file
does not match the stack_max_size file. This is because the stack_max_size
includes the overhead of stack tracer itself while the depth does not.

The first time a max is triggered, a calculation is not performed that
figures out the overhead of the stack tracer and subtracts it from
the stack_max_size variable. The overhead is stored and is subtracted
from the reported stack size for comparing for a new max.

Now the stack_max_size corresponds to the reported depth:

# cat stack_max_size
4640

# cat stack_trace
        Depth    Size   Location    (48 entries)
        -----    ----   --------
  0)     4640      32   _raw_spin_lock+0x18/0x24
  1)     4608     112   ____cache_alloc+0xb7/0x22d
  2)     4496      80   kmem_cache_alloc+0x63/0x12f
  3)     4416      16   mempool_alloc_slab+0x15/0x17
[...]

While testing against and older gcc on x86 that uses mcount instead
of fentry, I found that pasing in ip + MCOUNT_INSN_SIZE let the
stack trace show one more function deep which was missing before.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>