git.karo-electronics.de Git - karo-tx-linux.git/log

ARM: dts: imx6sl: Adding cpu frequency and VDDSOC/PU table.

Device tree for iMX6SL doesn't have an existing cpu frequency table
as well as the VDDSOC/PU.

Signed-off-by: John Tobias <john.tobias.ph@gmail.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit b0d300d3a22c from upstream]

ARM: dts: imx6dl: enable cpufreq support

This patch adds cpufreq dts for i.mx6dl to support cpufreq driver.

Signed-off-by: Anson Huang <b20788@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 978ed904c17c from upstream]

ARM: dts: imx6q: add 852MHz setpoint for CPU freq

According to datasheet, i.MX6Q has setpoint of 852MHz
which is exclusive with 996MHz, the fuse map of speed_grading
defines the max speed of ARM, here we add this 852MHz
setpoint opp info, kernel will check the speed_grading
fuse and remove all illegal setpoints.

Signed-off-by: Anson Huang <b20788@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 89ef8ef45e96 from upstream]

ARM: dts: imx6q: add vddsoc/pu setpoint info

i.MX6Q needs to update vddarm, vddsoc/pu regulators when cpu freq
is changed, each setpoint has different voltage, so we need to
pass vddarm, vddsoc/pu's freq-voltage info from dts together.

Signed-off-by: Anson Huang <b20788@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 69171eda49ae from upstream]

ARM: dts: imx6q: update setting of VDDARM_CAP voltage

According to datasheet, VDD_CACHE_CAP must not exceed VDDARM_CAP
by more than 200mV, as all of i.MX6Q boards' VDD_CACHE_CAP currently
are connected to VDDSOC_CAP, so we need to follow this rule by
increasing VDDARM_CAP's voltage.

Signed-off-by: Anson Huang <b20788@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 26ea58019eda from upstream]

ARM: imx: add shared gate clock support

It's quite common on i.MX that one gate bit controls the gating of
multiple clocks, i.e. this is a shared gate. The patch adds the
function imx_clk_gate2_shared() for such case. The clocks controlled
by the same gate bits should call this function with a pointer to a
single share count variable, so that the gate bits will only be
operated on the first enabling and the last disabling of these shared
gate clocks.

Thanks to Gerhard Sittig <gsi@denx.de> for this idea.

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
[shawn.guo: cherry-pick commit f9f28cdf2167 from upstream]

ARM: imx: lock is always valid for clk_gate2

The imx specific clk_gate2 always has a valid lock with the clock. So
the validation on gate->lock is not really needed. Remove it.

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
[shawn.guo: cherry-pick commit 94b5c0288299 from upstream]

ARM: imx: define struct clk_gate2 on our own

The imx clk-gate2 driver implements an i.MX specific gate clock, which
has two bits controlling the gate states.  While this is a completely
separate gate driver from the common clk-gate one, it reuses the common
clk_gate structure.  Such reusing makes the extending of clk_gate2
clumsy.  Let's define struct clk_gate2 on our own to make the driver
independent of the common clk-gate one, and ease the clk_gate2 extending
at a later time.

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
[shawn.guo: cherry-pick commit 54ee1471f820 from upstream]

ARM: i.MX6: ipu_di_sel clocks can set parent rates

To obtain exact pixel clocks, allow the DI clock selectors to influence
the PLLs that they are derived from.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
[shawn.guo: cherry-pick commit 4591b13289b5 from upstream]

ARM: imx6q-clk: parent lvds_gate from lvds_sel

Allows fror proper refcounting of the parent clocks
when enabling the clock output on CLK1/2 pads.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marek Vasut <marex@denx.de>
Acked-by: Richard Zhu <r65037@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
[shawn.guo: cherry-pick commit c2bece3cb121 from upstream]

ARM: imx6: drop .text.head section annotation from headsmp.S

The function v7_secondary_startup() works just fine in .text section, so
there is no need to have .text.head section annotation at all. Drop it.

Suggested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit c8ae7e9bfc8c from upstream]

ARM: imx6: build suspend-imx6.o with CONFIG_SOC_IMX6

Even when CONFIG_SUSPEND is enabled, it makes no sense to build
suspend-imx6.o if none of i.MX6 support is built in. Let's build
suspend-imx6.o only when both CONFIG_SUSPEND and CONFIG_SOC_IMX6 are
enabled.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 6a5637e52eca from upstream]

ARM: imx6: rename pm-imx6q.c to pm-imx6.c

The pm-imx6q.c works for all i.MX6 SoCs, so let's rename it to pm-imx6.c
and have the build controlled by option CONFIG_SOC_IMX6.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 9cdde7217e92 from upstream]

ARM: imx6: introduce CONFIG_SOC_IMX6 for i.MX6 common stuff

The i.MX6 SoCs have something in common, so let's introduce
CONFIG_SOC_IMX6 for those stuff.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 94f890ec91c8 from upstream]

Conflicts:
arch/arm/mach-imx/Kconfig

ARM: imx6: do not call imx6q_suspend_init() with !CONFIG_SUSPEND

When CONFIG_SUSPEND is not enabled, we should reasonably skip the call
to imx6q_suspend_init().

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 110666dc655a from upstream]

ARM: imx6: call suspend_set_ops() from suspend routine

Rename function imx6q_ocram_suspend_init() to imx6q_suspend_init() and
call suspend_set_ops() from there. Now we get a centralized function
for suspend initialization.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit afc51f464306 from upstream]

ARM: imx6: build headsmp.o only on CONFIG_SMP

With v7_cpu_resume() being moved out of headsmp.S, all the remaining
code in the file is only needed by CONFIG_SMP build. So we can control
the build of headsmp.o with only obj-$(CONFIG_SMP) now.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit facadba6a128 from upstream]

ARM: imx6: move v7_cpu_resume() into suspend-imx6.S

The suspend-imx6.S is introduced recently for suspend low-level assembly
code. Since function v7_cpu_resume() is only used by suspend support,
it makes sense to move the function into suspend-imx6.S, and control the
build of the file with CONFIG_SUSPEND option.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit c356bdb407ba from upstream]

ARM: mach-imx: Select CONFIG_SRAM at ARCH_MXC level

Booting a mx6q system built with multi_v7_defconfig leads to the following
error messages on boot:

[ 0.037758] imx6q_ocram_suspend_init: ocram pool unavailable!
[ 0.037768] imx6_pm_common_init: failed to initialize ocram suspend -19!

This happens because CONFIG_SRAM is not selected by default in
multi_v7_defconfig.

Fix this by selecting CONFIG_SRAM at ARCH_MXC level, so that other SoCs could
use the SRAM driver as well.

As SRAM automatically selects GENERIC_ALLOCATOR, just drop it from the Kconfig
entry.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 67f5b30875af from upstream]

Conflicts:
arch/arm/mach-imx/Kconfig

ARM: imx: add speed grading check for i.mx6 soc

The fuse map of speed_grading[1:0] defines the max speed
of ARM, see below the definition:

2b'11: 1200000000Hz;
2b'10: 996000000Hz;
2b'01: 852000000Hz; -- i.MX6Q Only, exclusive with 996MHz.
2b'00: 792000000Hz;

Need to remove all illegal setpoints according to fuse
map.

Signed-off-by: Anson Huang <b20788@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit c962a0996335 from upstream]

ARM: imx: avoid calling clk APIs in idle thread which may cause schedule

As clk_pllv3_wait_lock will call usleep_range, and the clk APIs
mutex lock may be held when CPU entering idle, so calling clk
APIs must be avoided in cpu idle thread, this is to avoid reschedule
warning in cpu idle, just access register directly to achieve that.

bad: scheduling from the idle thread!
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc1+ #657
Backtrace:
[<80012188>] (dump_backtrace) from [<8001246c>] (show_stack+0x18/0x1c)
r6:808c0038 r5:00000000 r4:808e5a1c r3:00000000
[<80012454>] (show_stack) from [<8064b2ec>] (dump_stack+0x84/0x9c)
[<8064b268>] (dump_stack) from [<80055ee0>] (dequeue_task_idle+0x20/0x30)
r5:808bef40 r4:bf7dff40
[<80055ec0>] (dequeue_task_idle) from [<8004f028>] (dequeue_task+0x30/0x50)
r4:bf7dff40 r3:80055ec0
[<8004eff8>] (dequeue_task) from [<800503c0>] (deactivate_task+0x30/0x34)
r4:bf7dff40
[<80050390>] (deactivate_task) from [<8064d8e4>] (__schedule+0x2c8/0x5c0)
[<8064d61c>] (__schedule) from [<8064dc14>] (schedule+0x38/0x88)
r10:80912964 r9:808c1e50 r8:808c0038 r7:808cbf30 r6:80e128ec r5:60000093
r4:80912968
[<8064dbdc>] (schedule) from [<8064dfec>] (schedule_preempt_disabled+0x10/0x14)
[<8064dfdc>] (schedule_preempt_disabled) from [<8064ebc0>] (mutex_lock_nested+0x1c0/0x3c0)
[<8064ea00>] (mutex_lock_nested) from [<804ae71c>] (clk_prepare_lock+0x44/0xe4)
r10:806554cc r9:bf7df1bc r8:808cf4f8 r7:808cf544 r6:bf7df1b8 r5:808c0010
r4:80e69750
[<804ae6d8>] (clk_prepare_lock) from [<804af214>] (clk_get_rate+0x14/0x64)
r6:bf7df1b8 r5:00000002 r4:bf017000 r3:80922ad0
[<804af200>] (clk_get_rate) from [<80025d30>] (imx6sl_set_wait_clk+0x18/0x20)
r5:00000002 r4:00000001
[<80025d18>] (imx6sl_set_wait_clk) from [<80023454>] (imx6sl_enter_wait+0x20/0x48)
[<80023434>] (imx6sl_enter_wait) from [<80477c24>] (cpuidle_enter_state+0x44/0xfc)
r4:3c386e48 r3:80023434
[<80477be0>] (cpuidle_enter_state) from [<80477dd8>] (cpuidle_idle_call+0xfc/0x160)
r8:808cf4f8 r7:00000001 r6:80e69534 r5:00000000 r4:bf7df1b8
[<80477cdc>] (cpuidle_idle_call) from [<8000f61c>] (arch_cpu_idle+0x10/0x50)
r9:808c0000 r8:00000000 r7:80921a89 r6:808c8938 r5:808c899c r4:808c0000
[<8000f60c>] (arch_cpu_idle) from [<8006fa94>] (cpu_startup_entry+0x108/0x160)
[<8006f98c>] (cpu_startup_entry) from [<806452ac>] (rest_init+0xb4/0xdc)
r7:808afae0
[<806451f8>] (rest_init) from [<8086fb58>] (start_kernel+0x328/0x38c)
r6:ffffffff r5:808c8880 r4:808c8a30
[<8086f830>] (start_kernel) from [<80008074>] (0x80008074)

Signed-off-by: Anson Huang <b20788@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 6e6cdf665630 from upstream]

ARM: imx6q: support ptp and rmii clock from pad

On imx6qdl, the ENET RMII and PTP clock can come from either internal
ANATOP/CCM or external clock source through pad GPIO_16. But in case
of the external clock source, bit IOMUXC_GPR1[21] needs to be cleared.

The patch adds the support for systems that use an external clock source
and distinguishes above two cases by checking if the PTP clock specified
in device tree is the one coming from the internal ANATOP/CCM.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 810c0ca87909 from upstream]

ARM: imx6q: remove unneeded clk lookups

Since commit (a94f8ec ARM: imx6q: remove board specific CLKO setup),
a number of clk lookups in imx6q clock driver is no longer needed.
Let's remove them.

The cpu0 lookup is also removed since we are now running imx6 cpufreq
driver and looking up clocks from device tree.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit b30c6d018094 from upstream]

ARM: imx: enable delaytimer on the imx timer

The imx can support timer-based delays, so implement this.
Skips past jiffy calibration.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 1119c84aa305 from upstream]

ARM: imx: add always-on clock array for i.mx6sl to maintain correct usecount

IPG, ARM and MMDC's clock should be enabled during kernel boot up,
so we need to maintain their usecount, otherwise, they may be
disabled unexpectedly if their children's clock are turned off, and
caused their parent PLLs also get disabled, which is incorrect.

Signed-off-by: Anson Huang <b20788@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 17626b7cfc56 from upstream]

ARM: imx: add suspend in ocram support for i.mx6sl

i.MX6SL's suspend in ocram function is derived from i.MX6Q,
it can lower the DDR IO power from ~10mA@1.2V to ~1mA@1.2V,
measured on i.MX6SL EVK board, SH5.

Signed-off-by: Anson Huang <b20788@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 64b08681398a from upstream]

ARM: imx: add suspend in ocram support for i.mx6dl

i.MX6DL's suspend in ocram function is derived from i.MX6Q,
it can lower the DDR IO power from ~26mA@1.5V to ~15mA@1.5V,
measured on i.MX6DL SabreSD board, R25.

Signed-off-by: Anson Huang <b20788@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit da9e92613526 from upstream]

ARM: imx: add suspend in ocram support for i.mx6q

When system enter suspend, we can set the DDR IO to
high-Z state to save DDR IOs' power consumption, this
operation can save many power(from ~26mA@1.5V to ~15mA@1.5V,
measured on i.MX6Q SabreSD board, R25) of DDR IOs. To
achieve that, we need to copy the suspend code to ocram
and run the low level hardware related code(set DDR IOs
to high-Z state) in ocram.

If there is no ocram space available, then system will
still do suspend in external DDR, hence no DDR IOs will
be set to high-Z.

The OCRAM usage layout is as below,

ocram suspend region(4K currently):
======================== high address ======================
                              .
                              .
                              .
                              ^
                              ^
                              ^
                      imx6_suspend code
             PM_INFO structure(imx6_cpu_pm_info)
======================== low address =======================

Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Anson Huang <b20788@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit df595746fa69 from upstream]

ARM: imx: clk-imx6sl: Suppress duplicate const sparse warning

There should be no duplicate const specifiers for those static
constant character string arrays defined for clock mux options.
Also, the arrays are only taken as the 5th argument for the
imx_clk_mux() function, which is in the type of 'const char
**parents'. So, let's remove the 2nd const specifier right
after 'char'.

This patch fixes these sparse warnings:
arch/arm/mach-imx/clk-imx6sl.c:21:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:22:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:23:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:24:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:25:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:26:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:27:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:28:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:29:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:30:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:31:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:32:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:33:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:34:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:35:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:36:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:37:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:38:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:39:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:40:25: warning: duplicate const
arch/arm/mach-imx/clk-imx6sl.c:41:25: warning: duplicate const

Signed-off-by: Liu Ying <Ying.Liu@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit b21c22e3a900 from upstream]

ARM: imx: add select on ARCH_MXC for cpufreq support

Move ARCH_HAS_CPUFREQ, ARCH_HAS_OPP and PM_OPP on ARCH_MXC so that
the user can enable the cpufreq support for iMX6Q and/or iMX6SL.

Signed-off-by: John Tobias <john.tobias.ph@gmail.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 5a1513f60fa7 from upstream]

ARM: imx: add cpuidle support for i.mx6sl

Add cpuidle support for i.MX6SL, currently only support
two cpuidle levels(ARM wfi and WAIT mode), and add software
workaround for WAIT mode errata as below:

ERR005311 CCM: After exit from WAIT mode, unwanted interrupt(s) taken
          during WAIT mode entry process could cause cache memory
          corruption.

Software workaround:
    To prevent this issue from occurring, software should ensure that
the ARM to IPG clock ratio is less than 12:5 (that is < 2.4x), before
entering WAIT mode.

Signed-off-by: Anson Huang <b20788@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 751f7e999afc from upstream]

ARM: imx: AHB rate must be set to 132MHz on i.mx6sl

The reset value of AHB divider is 3, so current AHB rate
is 99MHz which is not correct for kernel, need to ensure
AHB rate is 132MHz in clk driver, as ipg is sourcing from
AHB, and it should be 66MHz by default.

Signed-off-by: Anson Huang <b20788@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit 848db4a0a17a from upstream]

ARM: imx: Use INT_MEM_CLK_LPM as the bit name

Bit 17 of register CCM_CGPR is called INT_MEM_CLK_LPM as per the mx6
reference manual, so use this name instead.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[shawn.guo: cherry-pick commit fa6be65ed4d6 from upstream]

Linux 3.14.28

Btrfs: fix fs corruption on transaction abort if device supports discard

commit 678886bdc6378c1cbd5072da2c5a3035000214e3 upstream.

When we abort a transaction we iterate over all the ranges marked as dirty
in fs_info->freed_extents[0] and fs_info->freed_extents[1], clear them
from those trees, add them back (unpin) to the free space caches and, if
the fs was mounted with "-o discard", perform a discard on those regions.
Also, after adding the regions to the free space caches, a fitrim ioctl call
can see those ranges in a block group's free space cache and perform a discard
on the ranges, so the same issue can happen without "-o discard" as well.

This causes corruption, affecting one or multiple btree nodes (in the worst
case leaving the fs unmountable) because some of those ranges (the ones in
the fs_info->pinned_extents tree) correspond to btree nodes/leafs that are
referred by the last committed super block - breaking the rule that anything
that was committed by a transaction is untouched until the next transaction
commits successfully.

I ran into this while running in a loop (for several hours) the fstest that
I recently submitted:

  [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim

The corruption always happened when a transaction aborted and then fsck complained
like this:

   _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
   *** fsck.btrfs output ***
   Check tree block failed, want=94945280, have=0
   Check tree block failed, want=94945280, have=0
   Check tree block failed, want=94945280, have=0
   Check tree block failed, want=94945280, have=0
   Check tree block failed, want=94945280, have=0
   read block failed check_tree_block
   Couldn't open file system

In this case 94945280 corresponded to the root of a tree.
Using frace what I observed was the following sequence of steps happened:

   1) transaction N started, fs_info->pinned_extents pointed to
      fs_info->freed_extents[0];

   2) node/eb 94945280 is created;

   3) eb is persisted to disk;

   4) transaction N commit starts, fs_info->pinned_extents now points to
      fs_info->freed_extents[1], and transaction N completes;

   5) transaction N + 1 starts;

   6) eb is COWed, and btrfs_free_tree_block() called for this eb;

   7) eb range (94945280 to 94945280 + 16Kb) is added to
      fs_info->pinned_extents (fs_info->freed_extents[1]);

   8) Something goes wrong in transaction N + 1, like hitting ENOSPC
      for example, and the transaction is aborted, turning the fs into
      readonly mode. The stack trace I got for example:

      [112065.253935]  [<ffffffff8140c7b6>] dump_stack+0x4d/0x66
      [112065.254271]  [<ffffffff81042984>] warn_slowpath_common+0x7f/0x98
      [112065.254567]  [<ffffffffa0325990>] ? __btrfs_abort_transaction+0x50/0x10b [btrfs]
      [112065.261674]  [<ffffffff810429e5>] warn_slowpath_fmt+0x48/0x50
      [112065.261922]  [<ffffffffa032949e>] ? btrfs_free_path+0x26/0x29 [btrfs]
      [112065.262211]  [<ffffffffa0325990>] __btrfs_abort_transaction+0x50/0x10b [btrfs]
      [112065.262545]  [<ffffffffa036b1d6>] btrfs_remove_chunk+0x537/0x58b [btrfs]
      [112065.262771]  [<ffffffffa033840f>] btrfs_delete_unused_bgs+0x1de/0x21b [btrfs]
      [112065.263105]  [<ffffffffa0343106>] cleaner_kthread+0x100/0x12f [btrfs]
      (...)
      [112065.264493] ---[ end trace dd7903a975a31a08 ]---
      [112065.264673] BTRFS: error (device sdc) in btrfs_remove_chunk:2625: errno=-28 No space left
      [112065.264997] BTRFS info (device sdc): forced readonly

   9) The clear kthread sees that the BTRFS_FS_STATE_ERROR bit is set in
      fs_info->fs_state and calls btrfs_cleanup_transaction(), which in
      turn calls btrfs_destroy_pinned_extent();

   10) Then btrfs_destroy_pinned_extent() iterates over all the ranges
       marked as dirty in fs_info->freed_extents[], and for each one
       it calls discard, if the fs was mounted with "-o discard", and
       adds the range to the free space cache of the respective block
       group;

   11) btrfs_trim_block_group(), invoked from the fitrim ioctl code path,
       sees the free space entries and performs a discard;

   12) After an umount and mount (or fsck), our eb's location on disk was full
       of zeroes, and it should have been untouched, because it was marked as
       dirty in the fs_info->pinned_extents tree, and therefore used by the
       trees that the last committed superblock points to.

Fix this by not performing a discard and not adding the ranges to the free space
caches - it's useless from this point since the fs is now in readonly mode and
we won't write free space caches to disk anymore (otherwise we would leak space)
nor any new superblock. By not adding the ranges to the free space caches, it
prevents other code paths from allocating that space and write to it as well,
therefore being safer and simpler.

This isn't a new problem, as it's been present since 2011 (git commit
acce952b0263825da32cf10489413dec78053347).

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Btrfs: do not move em to modified list when unpinning

commit a28046956c71985046474283fa3bcd256915fb72 upstream.

We use the modified list to keep track of which extents have been modified so we
know which ones are candidates for logging at fsync() time.  Newly modified
extents are added to the list at modification time, around the same time the
ordered extent is created.  We do this so that we don't have to wait for ordered
extents to complete before we know what we need to log.  The problem is when
something like this happens

log extent 0-4k on inode 1
copy csum for 0-4k from ordered extent into log
sync log
commit transaction
log some other extent on inode 1
ordered extent for 0-4k completes and adds itself onto modified list again
log changed extents
see ordered extent for 0-4k has already been logged
at this point we assume the csum has been copied
sync log
crash

On replay we will see the extent 0-4k in the log, drop the original 0-4k extent
which is the same one that we are replaying which also drops the csum, and then
we won't find the csum in the log for that bytenr.  This of course causes us to
have errors about not having csums for certain ranges of our inode.  So remove
the modified list manipulation in unpin_extent_cache, any modified extents
should have been added well before now, and we don't want them re-logged.  This
fixes my test that I could reliably reproduce this problem with.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eCryptfs: Remove buggy and unnecessary write in file name decode routine

commit 942080643bce061c3dd9d5718d3b745dcb39a8bc upstream.

Dmitry Chernenkov used KASAN to discover that eCryptfs writes past the
end of the allocated buffer during encrypted filename decoding. This
fix corrects the issue by getting rid of the unnecessary 0 write when
the current bit offset is 2.

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Reported-by: Dmitry Chernenkov <dmitryc@google.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eCryptfs: Force RO mount when encrypted view is enabled

commit 332b122d39c9cbff8b799007a825d94b2e7c12f2 upstream.

The ecryptfs_encrypted_view mount option greatly changes the
functionality of an eCryptfs mount. Instead of encrypting and decrypting
lower files, it provides a unified view of the encrypted files in the
lower filesystem. The presence of the ecryptfs_encrypted_view mount
option is intended to force a read-only mount and modifying files is not
supported when the feature is in use. See the following commit for more
information:

e77a56d [PATCH] eCryptfs: Encrypted passthrough

This patch forces the mount to be read-only when the
ecryptfs_encrypted_view mount option is specified by setting the
MS_RDONLY flag on the superblock. Additionally, this patch removes some
broken logic in ecryptfs_open() that attempted to prevent modifications
of files when the encrypted view feature was in use. The check in
ecryptfs_open() was not sufficient to prevent file modifications using
system calls that do not operate on a file descriptor.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Priya Bansal <p.bansal@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

udf: Verify symlink size before loading it

commit a1d47b262952a45aae62bd49cfaf33dd76c11a2c upstream.

UDF specification allows arbitrarily large symlinks. However we support
only symlinks at most one block large. Check the length of the symlink
so that we don't access memory beyond end of the symlink block.

Reported-by: Carl Henrik Lunde <chlunde@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting

commit 24c037ebf5723d4d9ab0996433cee4f96c292a4d upstream.

alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it
fails because disable_pid_allocation() was called by the exiting
child_reaper.

We could simply move get_pid_ns() down to successful return, but this fix
tries to be as trivial as possible.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Sterling Alexander <stalexan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ncpfs: return proper error from NCP_IOC_SETROOT ioctl

commit a682e9c28cac152e6e54c39efcf046e0c8cfcf63 upstream.

If some error happens in NCP_IOC_SETROOT ioctl, the appropriate error
return value is then (in most cases) just overwritten before we return.
This can result in reporting success to userspace although error happened.

This bug was introduced by commit 2e54eb96e2c8 ("BKL: Remove BKL from
ncpfs"). Propagate the errors correctly.

Coverity id: 1226925.

Fixes: 2e54eb96e2c80 ("BKL: Remove BKL from ncpfs")
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: af_alg - fix backlog handling

commit 7e77bdebff5cb1e9876c561f69710b9ab8fa1f7e upstream.

If a request is backlogged, it's complete() handler will get called
twice: once with -EINPROGRESS, and once with the final error code.

af_alg's complete handler, unlike other users, does not handle the
-EINPROGRESS but instead always completes the completion that recvmsg()
is waiting on.  This can lead to a return to user space while the
request is still pending in the driver.  If userspace closes the sockets
before the requests are handled by the driver, this will lead to
use-after-frees (and potential crashes) in the kernel due to the tfm
having been freed.

The crashes can be easily reproduced (for example) by reducing the max
queue length in cryptod.c and running the following (from
http://www.chronox.de/libkcapi.html) on AES-NI capable hardware:

$ while true; do kcapi -x 1 -e -c '__ecb-aes-aesni' \
    -k 00000000000000000000000000000000 \
    -p 00000000000000000000000000000000 >/dev/null & done

Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

audit: restore AUDIT_LOGINUID unset ABI

commit 041d7b98ffe59c59fdd639931dea7d74f9aa9a59 upstream.

A regression was caused by commit 780a7654cee8:
audit: Make testing for a valid loginuid explicit.
(which in turn attempted to fix a regression caused by e1760bd)

When audit_krule_to_data() fills in the rules to get a listing, there was a
missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID.

This broke userspace by not returning the same information that was sent and
expected.

The rule:
auditctl -a exit,never -F auid=-1
gives:
auditctl -l
LIST_RULES: exit,never f24=0 syscall=all
when it should give:
LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all

Tag it so that it is reported the same way it was set. Create a new
private flags audit_krule field (pflags) to store it that won't interact with
the public one from the API.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

userns: Unbreak the unprivileged remount tests

commit db86da7cb76f797a1a8b445166a15cb922c6ff85 upstream.

A security fix in caused the way the unprivileged remount tests were
using user namespaces to break. Tweak the way user namespaces are
being used so the test works again.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

userns: Allow setting gid_maps without privilege when setgroups is disabled

commit 66d2f338ee4c449396b6f99f5e75cd18eb6df272 upstream.

Now that setgroups can be disabled and not reenabled, setting gid_map
without privielge can now be enabled when setgroups is disabled.

This restores most of the functionality that was lost when unprivileged
setting of gid_map was removed. Applications that use this functionality
will need to check to see if they use setgroups or init_groups, and if they
don't they can be fixed by simply disabling setgroups before writing to
gid_map.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

userns: Add a knob to disable setgroups on a per user namespace basis

commit 9cc46516ddf497ea16e8d7cb986ae03a0f6b92f8 upstream.

- Expose the knob to user space through a proc file /proc/<pid>/setgroups

  A value of "deny" means the setgroups system call is disabled in the
  current processes user namespace and can not be enabled in the
  future in this user namespace.

  A value of "allow" means the segtoups system call is enabled.

- Descendant user namespaces inherit the value of setgroups from
  their parents.

- A proc file is used (instead of a sysctl) as sysctls currently do
  not allow checking the permissions at open time.

- Writing to the proc file is restricted to before the gid_map
  for the user namespace is set.

  This ensures that disabling setgroups at a user namespace
  level will never remove the ability to call setgroups
  from a process that already has that ability.

  A process may opt in to the setgroups disable for itself by
  creating, entering and configuring a user namespace or by calling
  setns on an existing user namespace with setgroups disabled.
  Processes without privileges already can not call setgroups so this
  is a noop.  Prodcess with privilege become processes without
  privilege when entering a user namespace and as with any other path
  to dropping privilege they would not have the ability to call
  setgroups.  So this remains within the bounds of what is possible
  without a knob to disable setgroups permanently in a user namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

userns: Rename id_map_mutex to userns_state_mutex

commit f0d62aec931e4ae3333c797d346dc4f188f454ba upstream.

Generalize id_map_mutex so it can be used for more state of a user namespace.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

userns: Only allow the creator of the userns unprivileged mappings

commit f95d7918bd1e724675de4940039f2865e5eec5fe upstream.

If you did not create the user namespace and are allowed
to write to uid_map or gid_map you should already have the necessary
privilege in the parent user namespace to establish any mapping
you want so this will not affect userspace in practice.

Limiting unprivileged uid mapping establishment to the creator of the
user namespace makes it easier to verify all credentials obtained with
the uid mapping can be obtained without the uid mapping without
privilege.

Limiting unprivileged gid mapping establishment (which is temporarily
absent) to the creator of the user namespace also ensures that the
combination of uid and gid can already be obtained without privilege.

This is part of the fix for CVE-2014-8989.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

userns: Check euid no fsuid when establishing an unprivileged uid mapping

commit 80dd00a23784b384ccea049bfb3f259d3f973b9d upstream.

setresuid allows the euid to be set to any of uid, euid, suid, and
fsuid. Therefor it is safe to allow an unprivileged user to map
their euid and use CAP_SETUID privileged with exactly that uid,
as no new credentials can be obtained.

I can not find a combination of existing system calls that allows setting
uid, euid, suid, and fsuid from the fsuid making the previous use
of fsuid for allowing unprivileged mappings a bug.

This is part of a fix for CVE-2014-8989.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

userns: Don't allow unprivileged creation of gid mappings

commit be7c6dba2332cef0677fbabb606e279ae76652c3 upstream.

As any gid mapping will allow and must allow for backwards
compatibility dropping groups don't allow any gid mappings to be
established without CAP_SETGID in the parent user namespace.

For a small class of applications this change breaks userspace
and removes useful functionality. This small class of applications
includes tools/testing/selftests/mount/unprivilged-remount-test.c

Most of the removed functionality will be added back with the addition
of a one way knob to disable setgroups. Once setgroups is disabled
setting the gid_map becomes as safe as setting the uid_map.

For more common applications that set the uid_map and the gid_map
with privilege this change will have no affect.

This is part of a fix for CVE-2014-8989.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

userns: Don't allow setgroups until a gid mapping has been setablished

commit 273d2c67c3e179adb1e74f403d1e9a06e3f841b5 upstream.

setgroups is unique in not needing a valid mapping before it can be called,
in the case of setgroups(0, NULL) which drops all supplemental groups.

The design of the user namespace assumes that CAP_SETGID can not actually
be used until a gid mapping is established. Therefore add a helper function
to see if the user namespace gid mapping has been established and call
that function in the setgroups permission check.

This is part of the fix for CVE-2014-8989, being able to drop groups
without privilege using user namespaces.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

userns: Document what the invariant required for safe unprivileged mappings.

commit 0542f17bf2c1f2430d368f44c8fcf2f82ec9e53e upstream.

The rule is simple.  Don't allow anything that wouldn't be allowed
without unprivileged mappings.

It was previously overlooked that establishing gid mappings would
allow dropping groups and potentially gaining permission to files and
directories that had lesser permissions for a specific group than for
all other users.

This is the rule needed to fix CVE-2014-8989 and prevent any other
security issues with new_idmap_permitted.

The reason for this rule is that the unix permission model is old and
there are programs out there somewhere that take advantage of every
little corner of it.  So allowing a uid or gid mapping to be
established without privielge that would allow anything that would not
be allowed without that mapping will result in expectations from some
code somewhere being violated.  Violated expectations about the
behavior of the OS is a long way to say a security issue.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

groups: Consolidate the setgroups permission checks

commit 7ff4d90b4c24a03666f296c3d4878cd39001e81e upstream.

Today there are 3 instances of setgroups and due to an oversight their
permission checking has diverged. Add a common function so that
they may all share the same permission checking code.

This corrects the current oversight in the current permission checks
and adds a helper to avoid this in the future.

A user namespace security fix will update this new helper, shortly.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

umount: Disallow unprivileged mount force

commit b2f5d4dc38e034eecb7987e513255265ff9aa1cf upstream.

Forced unmount affects not just the mount namespace but the underlying
superblock as well. Restrict forced unmount to the global root user
for now. Otherwise it becomes possible a user in a less privileged
mount namespace to force the shutdown of a superblock of a filesystem
in a more privileged mount namespace, allowing a DOS attack on root.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mnt: Update unprivileged remount test

commit 4a44a19b470a886997d6647a77bb3e38dcbfa8c5 upstream.

- MNT_NODEV should be irrelevant except when reading back mount flags,
  no longer specify MNT_NODEV on remount.

- Test MNT_NODEV on devpts where it is meaningful even for unprivileged mounts.

- Add a test to verify that remount of a prexisting mount with the same flags
  is allowed and does not change those flags.

- Cleanup up the definitions of MS_REC, MS_RELATIME, MS_STRICTATIME that are used
  when the code is built in an environment without them.

- Correct the test error messages when tests fail.  There were not 5 tests
  that tested MS_RELATIME.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount

commit 3e1866410f11356a9fd869beb3e95983dc79c067 upstream.

Now that remount is properly enforcing the rule that you can't remove
nodev at least sandstorm.io is breaking when performing a remount.

It turns out that there is an easy intuitive solution implicitly
add nodev on remount when nodev was implicitly added on mount.

Tested-by: Cedric Bosdonnat <cbosdonnat@suse.com>
Tested-by: Richard Weinberger <richard@nod.at>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

thermal: Fix error path in thermal_init()

commit 9d367e5e7b05c71a8c1ac4e9b6e00ba45a79f2fc upstream.

thermal_unregister_governors() and class_unregister() were being called in
the wrong order.

Fixes: 80a26a5c22b9 ("Thermal: build thermal governors into thermal_sys module")
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mnt: Fix a memory stomp in umount

commit c297abfdf15b4480704d6b566ca5ca9438b12456 upstream.

While reviewing the code of umount_tree I realized that when we append
to a preexisting unmounted list we do not change pprev of the former
first item in the list.

Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on
the former first item of the list will stomp unmounted.first leaving
it set to some random mount point which we are likely to free soon.

This isn't likely to hit, but if it does I don't know how anyone could
track it down.

[ This happened because we don't have all the same operations for
  hlist's as we do for normal doubly-linked lists. In particular,
  list_splice() is easy on our standard doubly-linked lists, while
  hlist_splice() doesn't exist and needs both start/end entries of the
  hlist.  And commit 38129a13e6e7 incorrectly open-coded that missing
  hlist_splice().

  We should think about making these kinds of "mindless" conversions
  easier to get right by adding the missing hlist helpers   - Linus ]

Fixes: 38129a13e6e71f666e0468e99fdd932a687b4d7e switch mnt_hash to hlist
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mac80211: free management frame keys when removing station

commit 28a9bc68124c319b2b3dc861e80828a8865fd1ba upstream.

When writing the code to allow per-station GTKs, I neglected to
take into account the management frame keys (index 4 and 5) when
freeing the station and only added code to free the first four
data frame keys.

Fix this by iterating the array of keys over the right length.

Fixes: e31b82136d1a ("cfg80211/mac80211: allow per-station GTKs")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mac80211: fix multicast LED blinking and counter

commit d025933e29872cb1fe19fc54d80e4dfa4ee5779c upstream.

As multicast-frames can't be fragmented, "dot11MulticastReceivedFrameCount"
stopped being incremented after the use-after-free fix. Furthermore, the
RX-LED will be triggered by every multicast frame (which wouldn't happen
before) which wouldn't allow the LED to rest at all.

Fixes https://bugzilla.kernel.org/show_bug.cgi?id=89431 which also had the
patch.

Fixes: b8fff407a180 ("mac80211: fix use-after-free in defragmentation")
Signed-off-by: Andreas Müller <goo@stapelspeicher.org>
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KEYS: Fix stale key registration at error path

commit b26bdde5bb27f3f900e25a95e33a0c476c8c2c48 upstream.

When loading encrypted-keys module, if the last check of
aes_get_sizes() in init_encrypted() fails, the driver just returns an
error without unregistering its key type. This results in the stale
entry in the list. In addition to memory leaks, this leads to a kernel
crash when registering a new key type later.

This patch fixes the problem by swapping the calls of aes_get_sizes()
and register_key_type(), and releasing resources properly at the error
paths.

Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=908163
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

isofs: Fix unchecked printing of ER records

commit 4e2024624e678f0ebb916e6192bd23c1f9fdf696 upstream.

We didn't check length of rock ridge ER records before printing them.
Thus corrupted isofs image can cause us to access and print some memory
behind the buffer with obvious consequences.

Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/tls: Don't validate lm in set_thread_area() after all

commit 3fb2f4237bb452eb4e98f6a5dbd5a445b4fed9d0 upstream.

It turns out that there's a lurking ABI issue.  GCC, when
compiling this in a 32-bit program:

struct user_desc desc = {
.entry_number    = idx,
.base_addr       = base,
.limit           = 0xfffff,
.seg_32bit       = 1,
.contents        = 0, /* Data, grow-up */
.read_exec_only  = 0,
.limit_in_pages  = 1,
.seg_not_present = 0,
.useable         = 0,
};

will leave .lm uninitialized.  This means that anything in the
kernel that reads user_desc.lm for 32-bit tasks is unreliable.

Revert the .lm check in set_thread_area().  The value never did
anything in the first place.

Fixes: 0e58af4e1d21 ("x86/tls: Disallow unusual TLS segments")
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: mvebu: fix ordering in Armada 370 .dtsi

commit ab1e85372168892387dd1ac171158fc8c3119be4 upstream.

Commit a095b1c78a35 ("ARM: mvebu: sort DT nodes by address")
missed placing the system-controller in the correct order.

Fixes: a095b1c78a35 ("ARM: mvebu: sort DT nodes by address")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lkml.kernel.org/r/20141114204333.GS27002@pengutronix.de
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: tegra: Re-add removed SoC id macro to tegra_resume()

commit e4a680099a6e97ecdbb81081cff9e4a489a4dc44 upstream.

Commit d127e9c ("ARM: tegra: make tegra_resume can work with current and later
chips") removed tegra_get_soc_id macro leaving used cpu register corrupted after
branching to v7_invalidate_l1() and as result causing execution of unintended
code on tegra20. Possibly it was expected that r6 would be SoC id func argument
since common cpu reset handler is setting r6 before branching to tegra_resume(),
but neither tegra20_lp1_reset() nor tegra30_lp1_reset() aren't setting r6
register before jumping to resume function. Fix it by re-adding macro.

Fixes: d127e9c (ARM: tegra: make tegra_resume can work with current and later chips)
Reviewed-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: Add COMPAT_HWCAP_LPAE

commit 7d57511d2dba03a8046c8b428dd9192a4bfc1e73 upstream.

Commit a469abd0f868 (ARM: elf: add new hwcap for identifying atomic
ldrd/strd instructions) introduces HWCAP_ELF for 32-bit ARM
applications. As LPAE is always present on arm64, report the
corresponding compat HWCAP to user space.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dm thin: fix missing out-of-data-space to write mode transition if blocks are released

commit 2c43fd26e46734430122b8d2ad3024bb532df3ef upstream.

Discard bios and thin device deletion have the potential to release data
blocks. If the thin-pool is in out-of-data-space mode, and blocks were
released, transition the thin-pool back to full write mode.

The correct time to do this is just after the thin-pool metadata commit.
It cannot be done before the commit because the space maps will not
allow immediate reuse of the data blocks in case there's a rollback
following power failure.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dm thin: fix inability to discard blocks when in out-of-data-space mode

commit 45ec9bd0fd7abf8705e7cf12205ff69fe9d51181 upstream.

When the pool was in PM_OUT_OF_SPACE mode its process_prepared_discard
function pointer was incorrectly being set to
process_prepared_discard_passdown rather than process_prepared_discard.

This incorrect function pointer meant the discard was being passed down,
but not effecting the mapping. As such any discard that was issued, in
an attempt to reclaim blocks, would not successfully free data space.

Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dm space map metadata: fix sm_bootstrap_get_nr_blocks()

commit c1c6156fe4d4577444b769d7edd5dd503e57bbc9 upstream.

This function isn't right and it causes a static checker warning:

drivers/md/dm-thin.c:3016 maybe_resize_data_dev()
error: potentially using uninitialized 'sb_data_size'.

It should set "*count" and return zero on success the same as the
sm_metadata_get_nr_blocks() function does earlier.

Fixes: 3241b1d3e0aa ('dm: add persistent data library')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dm cache: dirty flag was mistakenly being cleared when promoting via overwrite

commit 1e32134a5a404e80bfb47fad8a94e9bbfcbdacc5 upstream.

If the incoming bio is a WRITE and completely covers a block then we
don't bother to do any copying for a promotion operation. Once this is
done the cache block and origin block will be different, so we need to
set it to 'dirty'.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dm cache: only use overwrite optimisation for promotion when in writeback mode

commit f29a3147e251d7ae20d3194ff67f109d71e501b4 upstream.

Overwrite causes the cache block and origin blocks to diverge, which
is only allowed in writeback mode.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dm crypt: use memzero_explicit for on-stack buffer

commit 1a71d6ffe18c0d0f03fc8531949cc8ed41d702ee upstream.

Use memzero_explicit to cleanup sensitive data allocated on stack
to prevent the compiler from optimizing and removing memset() calls.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dm bufio: fix memleak when using a dm_buffer's inline bio

commit 445559cdcb98a141f5de415b94fd6eaccab87e6d upstream.

When dm-bufio sets out to use the bio built into a struct dm_buffer to
issue an IO, it needs to call bio_reset after it's done with the bio
so that we can free things attached to the bio such as the integrity
payload. Therefore, inject our own endio callback to take care of
the bio_reset after calling submit_io's end_io callback.

Test case:
1. modprobe scsi_debug delay=0 dif=1 dix=199 ato=1 dev_size_mb=300
2. Set up a dm-bufio client, e.g. dm-verity, on the scsi_debug device
3. Repeatedly read metadata and watch kmalloc-192 leak!

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfs41: fix nfs4_proc_layoutget error handling

commit 4bd5a980de87d2b5af417485bde97b8eb3d6cf6a upstream.

nfs4_layoutget_release() drops layout hdr refcnt. Grab the refcnt
early so that it is safe to call .release in case nfs4_alloc_pages
fails.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Fixes: a47970ff78147 ("NFSv4.1: Hold reference to layout hdr in layoutget")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: correct return values for .eh_abort_handler implementations

commit b6c92b7e0af575e2b8b05bdf33633cf9e1661cbf upstream.

The .eh_abort_handler needs to return SUCCESS, FAILED, or
FAST_IO_FAIL. So fixup all callers to adhere to this requirement.

Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

megaraid_sas: corrected return of wait_event from abort frame path

commit 170c238701ec38b1829321b17c70671c101bac55 upstream.

Corrected wait_event() call which was waiting for wrong completion
status (0xFF).

Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: sdhci-pci-o2micro: Fix Dell E5440 issue

commit 6380ea099cdd46d7377b6fbec0291cf2aa387bad upstream.

Fix Dell E5440 when reboot Linux, can't find o2micro sd host chip issue.

Fixes: 01acf6917aed (mmc: sdhci-pci: add support of O2Micro/BayHubTech SD hosts)
Signed-off-by: Peter Guo <peter.guo@bayhubtech.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: block: add newline to sysfs display of force_ro

commit 0031a98a85e9fca282624bfc887f9531b2768396 upstream.

Make force_ro consistent with other sysfs entries.

Fixes: 371a689f64b0d ('mmc: MMC boot partitions support')
Cc: Andrei Warkentin <andrey.warkentin@gmail.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: dw_mmc: avoid write to CDTHRCTL on older versions

commit 66dfd10173159cafa9cb0d39936b8daeaab8e3e0 upstream.

Commit f1d2736c8156 (mmc: dw_mmc: control card read threshold) added
dw_mci_ctrl_rd_thld() with an unconditional write to the CDTHRCTL
register at offset 0x100. However before version 240a, the FIFO region
started at 0x100, so the write messes with the FIFO and completely
breaks the driver.

If the version id < 240A, return early from dw_mci_ctl_rd_thld() so as
not to hit this problem.

Fixes: f1d2736c8156 (mmc: dw_mmc: control card read threshold)
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mfd: tc6393xb: Fail ohci suspend if full state restore is required

commit 1a5fb99de4850cba710d91becfa2c65653048589 upstream.

Some boards with TC6393XB chip require full state restore during system
resume thanks to chip's VCC being cut off during suspend (Sharp SL-6000
tosa is one of them). Failing to do so would result in ohci Oops on
resume due to internal memory contentes being changed. Fail ohci suspend
on tc6393xb is full state restore is required.

Recommended workaround is to unbind tmio-ohci driver before suspend and
rebind it after resume.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

md/bitmap: always wait for writes on unplug.

commit 4b5060ddae2b03c5387321fafc089d242225697a upstream.

If two threads call bitmap_unplug at the same time, then
one might schedule all the writes, and the other might
decide that it doesn't need to wait.  But really it does.

It rarely hurts to wait when it isn't absolutely necessary,
and the current code doesn't really focus on 'absolutely necessary'
anyway.  So just wait always.

This can potentially lead to data corruption if a crash happens
at an awkward time and data was written before the bitmap was
updated.  It is very unlikely, but this should go to -stable
just to be safe.  Appropriate for any -stable.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit

commit 29fa6825463c97e5157284db80107d1bfac5d77b upstream.

paravirt_enabled has the following effects:

- Disables the F00F bug workaround warning.  There is no F00F bug
   workaround any more because Linux's standard IDT handling already
   works around the F00F bug, but the warning still exists.  This
   is only cosmetic, and, in any event, there is no such thing as
   KVM on a CPU with the F00F bug.

- Disables 32-bit APM BIOS detection.  On a KVM paravirt system,
   there should be no APM BIOS anyway.

- Disables tboot.  I think that the tboot code should check the
   CPUID hypervisor bit directly if it matters.

- paravirt_enabled disables espfix32.  espfix32 should *not* be
   disabled under KVM paravirt.

The last point is the purpose of this patch.  It fixes a leak of the
high 16 bits of the kernel stack address on 32-bit KVM paravirt
guests.  Fixes CVE-2014-8134.

Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86_64, switch_to(): Load TLS descriptors before switching DS and ES

commit f647d7c155f069c1a068030255c300663516420e upstream.

Otherwise, if buggy user code points DS or ES into the TLS
array, they would be corrupted after a context switch.

This also significantly improves the comments and documents some
gotchas in the code.

Before this patch, the both tests below failed.  With this
patch, the es test passes, although the gsbase test still fails.

----- begin es test -----

/*
* Copyright (c) 2014 Andy Lutomirski
* GPL v2
*/

static unsigned short GDT3(int idx)
{
return (idx << 3) | 3;
}

static int create_tls(int idx, unsigned int base)
{
struct user_desc desc = {
.entry_number    = idx,
.base_addr       = base,
.limit           = 0xfffff,
.seg_32bit       = 1,
.contents        = 0, /* Data, grow-up */
.read_exec_only  = 0,
.limit_in_pages  = 1,
.seg_not_present = 0,
.useable         = 0,
};

if (syscall(SYS_set_thread_area, &desc) != 0)
err(1, "set_thread_area");

return desc.entry_number;
}

int main()
{
int idx = create_tls(-1, 0);
printf("Allocated GDT index %d\n", idx);

unsigned short orig_es;
asm volatile ("mov %%es,%0" : "=rm" (orig_es));

int errors = 0;
int total = 1000;
for (int i = 0; i < total; i++) {
asm volatile ("mov %0,%%es" : : "rm" (GDT3(idx)));
usleep(100);

unsigned short es;
asm volatile ("mov %%es,%0" : "=rm" (es));
asm volatile ("mov %0,%%es" : : "rm" (orig_es));
if (es != GDT3(idx)) {
if (errors == 0)
printf("[FAIL]\tES changed from 0x%hx to 0x%hx\n",
       GDT3(idx), es);
errors++;
}
}

if (errors) {
printf("[FAIL]\tES was corrupted %d/%d times\n", errors, total);
return 1;
} else {
printf("[OK]\tES was preserved\n");
return 0;
}
}

----- end es test -----

----- begin gsbase test -----

/*
* gsbase.c, a gsbase test
* Copyright (c) 2014 Andy Lutomirski
* GPL v2
*/

static unsigned char *testptr, *testptr2;

static unsigned char read_gs_testvals(void)
{
unsigned char ret;
asm volatile ("movb %%gs:%1, %0" : "=r" (ret) : "m" (*testptr));
return ret;
}

int main()
{
int errors = 0;

testptr = mmap((void *)0x200000000UL, 1, PROT_READ | PROT_WRITE,
       MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
if (testptr == MAP_FAILED)
err(1, "mmap");

testptr2 = mmap((void *)0x300000000UL, 1, PROT_READ | PROT_WRITE,
       MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
if (testptr2 == MAP_FAILED)
err(1, "mmap");

*testptr = 0;
*testptr2 = 1;

if (syscall(SYS_arch_prctl, ARCH_SET_GS,
    (unsigned long)testptr2 - (unsigned long)testptr) != 0)
err(1, "ARCH_SET_GS");

usleep(100);

if (read_gs_testvals() == 1) {
printf("[OK]\tARCH_SET_GS worked\n");
} else {
printf("[FAIL]\tARCH_SET_GS failed\n");
errors++;
}

asm volatile ("mov %0,%%gs" : : "r" (0));

if (read_gs_testvals() == 0) {
printf("[OK]\tWriting 0 to gs worked\n");
} else {
printf("[FAIL]\tWriting 0 to gs failed\n");
errors++;
}

usleep(100);

if (read_gs_testvals() == 0) {
printf("[OK]\tgsbase is still zero\n");
} else {
printf("[FAIL]\tgsbase was corrupted\n");
errors++;
}

return errors == 0 ? 0 : 1;
}

----- end gsbase test -----

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/509d27c9fec78217691c3dad91cec87e1006b34a.1418075657.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/tls: Disallow unusual TLS segments

commit 0e58af4e1d2166e9e33375a0f121e4867010d4f8 upstream.

Users have no business installing custom code segments into the
GDT, and segments that are not present but are otherwise valid
are a historical source of interesting attacks.

For completeness, block attempts to set the L bit.  (Prior to
this patch, the L bit would have been silently dropped.)

This is an ABI break.  I've checked glibc, musl, and Wine, and
none of them look like they'll have any trouble.

Note to stable maintainers: this is a hardening patch that fixes
no known bugs.  Given the possibility of ABI issues, this
probably shouldn't be backported quickly.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/tls: Validate TLS entries to protect espfix

commit 41bdc78544b8a93a9c6814b8bbbfef966272abbe upstream.

Installing a 16-bit RW data segment into the GDT defeats espfix.
AFAICT this will not affect glibc, Wine, or dosemu at all.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

isofs: Fix infinite looping over CE entries

commit f54e18f1b831c92f6512d2eedb224cd63d607d3d upstream.

Rock Ridge extensions define so called Continuation Entries (CE) which
define where is further space with Rock Ridge data. Corrupted isofs
image can contain arbitrarily long chain of these, including a one
containing loop and thus causing kernel to end in an infinite loop when
traversing these entries.

Limit the traversal to 32 entries which should be more than enough space
to store all the Rock Ridge data.

Reported-by: P J P <ppandit@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux 3.14.27

ALSA: usb-audio: Don't resubmit pending URBs at MIDI error recovery

commit 66139a48cee1530c91f37c145384b4ee7043f0b7 upstream.

In snd_usbmidi_error_timer(), the driver tries to resubmit MIDI input
URBs to reactivate the MIDI stream, but this causes the error when
some of URBs are still pending like:

WARNING: CPU: 0 PID: 0 at ../drivers/usb/core/urb.c:339 usb_submit_urb+0x5f/0x70()
URB ef705c40 submitted while active
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.6-2-desktop #1
Hardware name: FOXCONN TPS01/TPS01, BIOS 080015  03/23/2010
  c0984bfa f4009ed4 c078deaf f4009ee4 c024c884 c09a135c f4009f00 00000000
  c0984bfa 00000153 c061ac4f c061ac4f 00000009 00000001 ef705c40 e854d1c0
  f4009eec c024c8d3 00000009 f4009ee4 c09a135c f4009f00 f4009f04 c061ac4f
Call Trace:
  [<c0205df6>] try_stack_unwind+0x156/0x170
  [<c020482a>] dump_trace+0x5a/0x1b0
  [<c0205e56>] show_trace_log_lvl+0x46/0x50
  [<c02049d1>] show_stack_log_lvl+0x51/0xe0
  [<c0205eb7>] show_stack+0x27/0x50
  [<c078deaf>] dump_stack+0x45/0x65
  [<c024c884>] warn_slowpath_common+0x84/0xa0
  [<c024c8d3>] warn_slowpath_fmt+0x33/0x40
  [<c061ac4f>] usb_submit_urb+0x5f/0x70
  [<f7974104>] snd_usbmidi_submit_urb+0x14/0x60 [snd_usbmidi_lib]
  [<f797483a>] snd_usbmidi_error_timer+0x6a/0xa0 [snd_usbmidi_lib]
  [<c02570c0>] call_timer_fn+0x30/0x130
  [<c0257442>] run_timer_softirq+0x1c2/0x260
  [<c0251493>] __do_softirq+0xc3/0x270
  [<c0204732>] do_softirq_own_stack+0x22/0x30
  [<c025186d>] irq_exit+0x8d/0xa0
  [<c0795228>] smp_apic_timer_interrupt+0x38/0x50
  [<c0794a3c>] apic_timer_interrupt+0x34/0x3c
  [<c0673d9e>] cpuidle_enter_state+0x3e/0xd0
  [<c028bb8d>] cpu_idle_loop+0x29d/0x3e0
  [<c028bd23>] cpu_startup_entry+0x53/0x60
  [<c0bfac1e>] start_kernel+0x415/0x41a

For avoiding these errors, check the pending URBs and skip
resubmitting such ones.

Reported-and-tested-by: Stefan Seyfried <stefan.seyfried@googlemail.com>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda - Fix built-in mic at resume on Lenovo Ideapad S210

commit fedb2245cbb8d823e449ebdd48ba9bb35c071ce0 upstream.

The built-in mic boost volume gets almost muted after suspend/resume
on Lenovo Ideapad S210.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88121
Reported-and-tested-by: Roman Kagan <rkagan@mail.ru>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda - Add EAPD fixup for ASUS Z99He laptop

commit f62f5eff3d40a56ad1cf0d81a6cac8dd8743e8a1 upstream.

The same fixup to enable EAPD is needed for ASUS Z99He with AD1986A
codec like another ASUS machine.

Reported-and-tested-by: Dmitry V. Zimin <pfzim@mail.ru>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mac80211: Fix regression that triggers a kernel BUG with CCMP

commit 4f031fa9f188b2b0641ac20087d9e16bcfb4e49d upstream.

Commit 7ec7c4a9a686c608315739ab6a2b0527a240883c (mac80211: port CCMP to
cryptoapi's CCM driver) introduced a regression when decrypting empty
packets (data_len == 0). This will lead to backtraces like:

(scatterwalk_start) from [<c01312f4>] (scatterwalk_map_and_copy+0x2c/0xa8)
(scatterwalk_map_and_copy) from [<c013a5a0>] (crypto_ccm_decrypt+0x7c/0x25c)
(crypto_ccm_decrypt) from [<c032886c>] (ieee80211_aes_ccm_decrypt+0x160/0x170)
(ieee80211_aes_ccm_decrypt) from [<c031c628>] (ieee80211_crypto_ccmp_decrypt+0x1ac/0x238)
(ieee80211_crypto_ccmp_decrypt) from [<c032ef28>] (ieee80211_rx_handlers+0x870/0x1d24)
(ieee80211_rx_handlers) from [<c0330c7c>] (ieee80211_prepare_and_rx_handle+0x8a0/0x91c)
(ieee80211_prepare_and_rx_handle) from [<c0331260>] (ieee80211_rx+0x568/0x730)
(ieee80211_rx) from [<c01d3054>] (__carl9170_rx+0x94c/0xa20)
(__carl9170_rx) from [<c01d3324>] (carl9170_rx_stream+0x1fc/0x320)
(carl9170_rx_stream) from [<c01cbccc>] (carl9170_usb_tasklet+0x80/0xc8)
(carl9170_usb_tasklet) from [<c00199dc>] (tasklet_hi_action+0x88/0xcc)
(tasklet_hi_action) from [<c00193c8>] (__do_softirq+0xcc/0x200)
(__do_softirq) from [<c0019734>] (irq_exit+0x80/0xe0)
(irq_exit) from [<c0009c10>] (handle_IRQ+0x64/0x80)
(handle_IRQ) from [<c000c3a0>] (__irq_svc+0x40/0x4c)
(__irq_svc) from [<c0009d44>] (arch_cpu_idle+0x2c/0x34)

Such packets can appear for example when using the carl9170 wireless driver
because hardware sometimes generates garbage when the internal FIFO overruns.

This patch adds an additional length check.

Fixes: 7ec7c4a9a686 ("mac80211: port CCMP to cryptoapi's CCM driver")
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc: 32 bit getcpu VDSO function uses 64 bit instructions

commit 152d44a853e42952f6c8a504fb1f8eefd21fd5fd upstream.

I used some 64 bit instructions when adding the 32 bit getcpu VDSO
function. Fix it.

Fixes: 18ad51dd342a ("powerpc: Add VDSO version of getcpu")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

igb: bring link up when PHY is powered up

commit aec653c43b0c55667355e26d7de1236bda9fb4e3 upstream.

Call igb_setup_link() when the PHY is powered up.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Reported-by: Jeff Westfahl <jeff.westfahl@ni.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Vincent Donnefort <vdonnefort@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf/x86/intel: Protect LBR and extra_regs against KVM lying

commit 338b522ca43cfd32d11a370f4203bcd089c6c877 upstream.

With -cpu host, KVM reports LBR and extra_regs support, if the host has
support.

When the guest perf driver tries to access LBR or extra_regs MSR,
it #GPs all MSR accesses,since KVM doesn't handle LBR and extra_regs support.
So check the related MSRs access right once at initialization time to avoid
the error access at runtime.

For reproducing the issue, please build the kernel with CONFIG_KVM_INTEL = y
(for host kernel).
And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest kernel).
Start the guest with -cpu host.
Run perf record with --branch-any or --branch-filter in guest to trigger LBR
Run perf stat offcore events (E.g. LLC-loads/LLC-load-misses ...) in guest to
trigger offcore_rsp #GP

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Mark Davies <junk@eslaf.co.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1405365957-20202-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dongsu Park <dongsu.park@profitbricks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: sctp: use MAX_HEADER for headroom reserve in output path

[ Upstream commit 9772b54c55266ce80c639a80aa68eeb908f8ecf5 ]

To accomodate for enough headroom for tunnels, use MAX_HEADER instead
of LL_MAX_HEADER. Robert reported that he has hit after roughly 40hrs
of trinity an skb_under_panic() via SCTP output path (see reference).
I couldn't reproduce it from here, but not using MAX_HEADER as elsewhere
in other protocols might be one possible cause for this.

In any case, it looks like accounting on chunks themself seems to look
good as the skb already passed the SCTP output path and did not hit
any skb_over_panic(). Given tunneling was enabled in his .config, the
headroom would have been expanded by MAX_HEADER in this case.

Reported-by: Robert Święcki <robert@swiecki.net>
Reference: https://lkml.org/lkml/2014/12/1/507
Fixes: 594ccc14dfe4d ("[SCTP] Replace incorrect use of dev_alloc_skb with alloc_skb in sctp_packet_transmit().")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: mvneta: fix race condition in mvneta_tx()

[ Upstream commit 5f478b41033606d325e420df693162e2524c2b94 ]

mvneta_tx() dereferences skb to get skb->len too late,
as hardware might have completed the transmit and TX completion
could have freed the skb from another cpu.

Fixes: 71f6d1b31fb1 ("net: mvneta: replace Tx timer with a real interrupt")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: mvneta: fix Tx interrupt delay

[ Upstream commit aebea2ba0f7495e1a1c9ea5e753d146cb2f6b845 ]

The mvneta driver sets the amount of Tx coalesce packets to 16 by
default. Normally that does not cause any trouble since the driver
uses a much larger Tx ring size (532 packets). But some sockets
might run with very small buffers, much smaller than the equivalent
of 16 packets. This is what ping is doing for example, by setting
SNDBUF to 324 bytes rounded up to 2kB by the kernel.

The problem is that there is no documented method to force a specific
packet to emit an interrupt (eg: the last of the ring) nor is it
possible to make the NIC emit an interrupt after a given delay.

In this case, it causes trouble, because when ping sends packets over
its raw socket, the few first packets leave the system, and the first
15 packets will be emitted without an IRQ being generated, so without
the skbs being freed. And since the socket's buffer is small, there's
no way to reach that amount of packets, and the ping ends up with
"send: no buffer available" after sending 6 packets. Running with 3
instances of ping in parallel is enough to hide the problem, because
with 6 packets per instance, that's 18 packets total, which is enough
to grant a Tx interrupt before all are sent.

The original driver in the LSP kernel worked around this design flaw
by using a software timer to clean up the Tx descriptors. This timer
was slow and caused terrible network performance on some Tx-bound
workloads (such as routing) but was enough to make tools like ping
work correctly.

Instead here, we simply set the packet counts before interrupt to 1.
This ensures that each packet sent will produce an interrupt. NAPI
takes care of coalescing interrupts since the interrupt is disabled
once generated.

No measurable performance impact nor CPU usage were observed on small
nor large packets, including when saturating the link on Tx, and this
fixes tools like ping which rely on too small a send buffer. If one
wants to increase this value for certain workloads where it is safe
to do so, "ethtool -C $dev tx-frames" will override this default
setting.

This fix needs to be applied to stable kernels starting with 3.10.

Tested-By: Maggie Mae Roxas <maggie.mae.roxas@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

gre: Set inner mac header in gro complete

[ Upstream commit 6fb2a756739aa507c1fd5b8126f0bfc2f070dc46 ]

Set the inner mac header to point to the GRE payload when
doing GRO. This is needed if we proceed to send the packet
through GRE GSO which now uses the inner mac header instead
of inner network header to determine the length of encapsulation
headers.

Fixes: 14051f0452a2 ("gre: Use inner mac length when computing tunnel length")
Reported-by: Wolfgang Walter <linux@stwm.de>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rtnetlink: release net refcnt on error in do_setlink()

[ Upstream commit e0ebde0e131b529fd721b24f62872def5ec3718c ]

rtnl_link_get_net() holds a reference on the 'struct net', we need to release
it in case of error.

CC: Eric W. Biederman <ebiederm@xmission.com>
Fixes: b51642f6d77b ("net: Enable a userns root rtnl calls that are safe for unprivilged users")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/mlx4_core: Limit count field to 24 bits in qp_alloc_res

[ Upstream commit 2d5c57d7fbfaa642fb7f0673df24f32b83d9066c ]

Some VF drivers use the upper byte of "param1" (the qp count field)
in mlx4_qp_reserve_range() to pass flags which are used to optimize
the range allocation.

Under the current code, if any of these flags are set, the 32-bit
count field yields a count greater than 2^24, which is out of range,
and this VF fails.

As these flags represent a "best-effort" allocation hint anyway, they may
safely be ignored. Therefore, the PF driver may simply mask out the bits.

Fixes: c82e9aa0a8 "mlx4_core: resource tracking for HCA resources used by guests"
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>