sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems
Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
have no load at all.
Uptime and /proc/loadavg on all systems with kernels released during the
last five years up until kernel version 4.6-rc5, show a 5- and 15-minute
minimum loadavg of 0.01 and 0.05 respectively. This should be 0.00 on
idle systems, but the way the kernel calculates this value prevents it
from getting lower than the mentioned values.
Likewise but not as obviously noticeable, a fully loaded system with no
processes waiting, shows a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95
(multiplied by number of cores).
Once the (old) load becomes 93 or higher, it mathematically can never
get lower than 93, even when the active (load) remains 0 forever.
This results in the strange 0.00, 0.01, 0.05 uptime values on idle
systems. Note: 93/2048 = 0.0454..., which rounds up to 0.05.
It is not correct to add a 0.5 rounding (=1024/2048) here, since the
result from this function is fed back into the next iteration again,
so the result of that +0.5 rounding value then gets multiplied by
(2048-2037), and then rounded again, so there is a virtual "ghost"
load created, next to the old and active load terms.
By changing the way the internally kept value is rounded, that internal
value equivalent now can reach 0.00 on idle, and 1.00 on full load. Upon
increasing load, the internally kept load value is rounded up, when the
load is decreasing, the load value is rounded down.
The modified code was tested on nohz=off and nohz kernels. It was tested
on vanilla kernel 4.6-rc5 and on centos 7.1 kernel 3.10.0-327. It was
tested on single, dual, and octal cores system. It was tested on virtual
hosts and bare hardware. No unwanted effects have been observed, and the
problems that the patch intended to fix were indeed gone.
Tested-by: Damien Wyart <damien.wyart@free.fr> Signed-off-by: Vik Heyndrickx <vik.heyndrickx@veribox.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Doug Smythies <dsmythies@telus.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 0f004f5a696a ("sched: Cure more NO_HZ load average woes") Link: http://lkml.kernel.org/r/e8d32bff-d544-7748-72b5-3c86cc71f09f@veribox.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
In calculate_imbalance() load_above_capacity currently has the unit
[capacity] while it is used as being [load/capacity]. Not only is it
wrong it also makes it unlikely that load_above_capacity is ever used
as the subsequent code picks the smaller of load_above_capacity and
the avg_load
This patch ensures that load_above_capacity has the right unit
[load/capacity].
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
[ Changed changelog to note it was in capacity unit; +rebase. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 6 May 2016 10:21:23 +0000 (12:21 +0200)]
sched/fair: Clean up scale confusion
Wanpeng noted that the scale_load_down() in calculate_imbalance() was
weird. I agree, it should be SCHED_CAPACITY_SCALE, since we're going
to compare against busiest->group_capacity, which is in [capacity]
units.
Reported-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yuyang Du <yuyang.du@intel.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Wanpeng Li [Wed, 4 May 2016 06:45:34 +0000 (14:45 +0800)]
sched/nohz: Fix affine unpinned timers mess
The following commit:
9642d18eee2c ("nohz: Affine unpinned timers to housekeepers")'
intended to affine unpinned timers to housekeepers:
unpinned timers(full dynaticks, idle) => nearest busy housekeepers(otherwise, fallback to any housekeepers)
unpinned timers(full dynaticks, busy) => nearest busy housekeepers(otherwise, fallback to any housekeepers)
unpinned timers(houserkeepers, idle) => nearest busy housekeepers(otherwise, fallback to itself)
However, the !idle_cpu(i) && is_housekeeping_cpu(cpu) check modified the
intention to:
unpinned timers(full dynaticks, idle) => any housekeepers(no mattter cpu topology)
unpinned timers(full dynaticks, busy) => any housekeepers(no mattter cpu topology)
unpinned timers(housekeepers, idle) => any busy cpus(otherwise, fallback to any housekeepers)
This patch fixes it by checking if there are busy housekeepers nearby,
otherwise falls to any housekeepers/itself. After the patch:
unpinned timers(full dynaticks, idle) => nearest busy housekeepers(otherwise, fallback to any housekeepers)
unpinned timers(full dynaticks, busy) => nearest busy housekeepers(otherwise, fallback to any housekeepers)
unpinned timers(housekeepers, idle) => nearest busy housekeepers(otherwise, fallback to itself)
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ Fixed the changelog. ] Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: 'commit 9642d18eee2c ("nohz: Affine unpinned timers to housekeepers")' Link: http://lkml.kernel.org/r/1462344334-8303-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 11 May 2016 17:27:56 +0000 (19:27 +0200)]
sched/fair: Fix fairness issue on migration
Pavan reported that in the presence of very light tasks (or cgroups)
the placement of migrated tasks can cause severe fairness issues.
The problem is that enqueue_entity() places the task before it updates
time, thereby it can place the task far in the past (remember that
light tasks will shoot virtual time forward at a high speed, so in
relation to the pre-existing light task, we can land far in the past).
This is done because update_curr() needs the current task, and we
might be placing the current task.
The obvious solution is to differentiate between the current and any
other task; placing the current before we update time, and placing any
other task after, such that !curr tasks end up at the current moment
in time, and not in the past.
This commit re-introduces the previously reverted commit:
3a47d5124a95 ("sched/fair: Fix fairness issue on migration")
... which is now safe to do, after we've also fixed another
underlying bug first, in:
sched/fair: Prepare to fix fairness problems on migration
and cleaned up other details in the migration code:
sched/core: Kill sched_class::task_waking
Reported-by: Pavan Kondeti <pkondeti@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 10 May 2016 16:24:37 +0000 (18:24 +0200)]
sched/core: Kill sched_class::task_waking to clean up the migration logic
With sched_class::task_waking being called only when we do
set_task_cpu(), we can make sched_class::migrate_task_rq() do the work
and eliminate sched_class::task_waking entirely.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Hunter <ahh@google.com> Cc: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mike Galbraith <efault@gmx.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Pavan Kondeti <pkondeti@codeaurora.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: byungchul.park@lge.com Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 11 May 2016 14:10:34 +0000 (16:10 +0200)]
sched/fair: Prepare to fix fairness problems on migration
Mike reported that our recent attempt to fix migration problems:
3a47d5124a95 ("sched/fair: Fix fairness issue on migration")
broke interactivity and the signal starve test. We reverted that
commit and now let's try it again more carefully, with some other
underlying problems fixed first.
One problem is that I assumed ENQUEUE_WAKING was only set when we do a
cross-cpu wakeup (migration), which isn't true. This means we now
destroy the vruntime history of tasks and wakeup-preemption suffers.
Cure this by making my assumption true, only call
sched_class::task_waking() when we do a cross-cpu wakeup. This avoids
the indirect call in the case we do a local wakeup.
Reported-by: Mike Galbraith <mgalbraith@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Hunter <ahh@google.com> Cc: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Pavan Kondeti <pkondeti@codeaurora.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: byungchul.park@lge.com Cc: linux-kernel@vger.kernel.org Fixes: 3a47d5124a95 ("sched/fair: Fix fairness issue on migration") Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Thu, 12 May 2016 07:19:59 +0000 (09:19 +0200)]
sched/fair: Move record_wakee()
Since I want to make ->task_woken() conditional on the task getting
migrated, we cannot use it to call record_wakee().
Move it to select_task_rq_fair(), which gets called in almost all the
same conditions. The only exception is if the woken task (@p) is
CPU-bound (as per the nr_cpus_allowed test in select_task_rq()).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Hunter <ahh@google.com> Cc: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mike Galbraith <efault@gmx.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Pavan Kondeti <pkondeti@codeaurora.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: byungchul.park@lge.com Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Yazen Ghannam [Wed, 11 May 2016 12:58:24 +0000 (14:58 +0200)]
x86/mce/AMD: Disable LogDeferredInMcaStat for SMCA systems
Disable Deferred Error logging in MCA_{STATUS,ADDR} additionally for
SMCA systems as this information will retrieved from MCA_DE{STAT,ADDR}
on those systems.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Simplify, drop SMCA_MCAX_EN_OFF define too. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1462971509-3856-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
Yazen Ghannam [Wed, 11 May 2016 12:58:23 +0000 (14:58 +0200)]
x86/mce/AMD: Log Deferred Errors using SMCA MCA_DE{STAT,ADDR} registers
Scalable MCA provides new registers for all banks for logging deferred
errors: MCA_DESTAT and MCA_DEADDR. Deferred errors are always logged to
these registers.
Update the AMD deferred error handler to use these registers, if
available.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Sanity-check __log_error() args, massage a bit. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1462971509-3856-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
Michael Chan [Tue, 10 May 2016 23:18:00 +0000 (19:18 -0400)]
bnxt_en: Add workaround to detect bad opaque in rx completion (part 2)
Add detection and recovery code when the hardware returned opaque value
does not match the expected consumer index. Once the issue is detected,
we skip the processing of all RX and LRO/GRO packets. These completion
entries are discarded without sending the SKB to the stack and without
producing new buffers. The function will be reset from a workqueue.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Tue, 10 May 2016 23:17:59 +0000 (19:17 -0400)]
bnxt_en: Add workaround to detect bad opaque in rx completion (part 1)
There is a rare hardware bug that can cause a bad opaque value in the RX
or TPA completion. When this happens, the hardware may have used the
same buffer twice for 2 rx packets. In addition, the driver will also
crash later using the bad opaque as the index into the ring.
The rx opaque value is predictable and is always monotonically increasing.
The workaround is to keep track of the expected next opaque value and
compare it with the one returned by hardware during RX and TPA start
completions. If they miscompare, we will not process any more RX and
TPA completions and exit NAPI. We will then schedule a workqueue to
reset the function.
This patch adds the logic to keep track of the next rx consumer index.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 10 May 2016 19:20:04 +0000 (22:20 +0300)]
qlcnic: potential NULL dereference in qlcnic_83xx_get_minidump_template()
If qlcnic_fw_cmd_get_minidump_temp() fails then "fw_dump->tmpl_hdr" is
NULL or possibly freed. It can lead to an oops later.
Fixes: d01a6d3c8ae1 ('qlcnic: Add support to enable capability to extend minidump for iSCSI') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Airlie [Thu, 12 May 2016 00:05:36 +0000 (10:05 +1000)]
Merge branch 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Two some radeon display fixes.
* 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: fix PLL sharing on DCE6.1 (v2)
drm/radeon: fix DP link training issue with second 4K monitor
Dave Airlie [Thu, 12 May 2016 00:05:06 +0000 (10:05 +1000)]
Merge tag 'drm-intel-fixes-2016-05-11' of git://anongit.freedesktop.org/drm-intel into drm-fixes
Misc intel fixes, reverting MST audio which was causing oops for now.
* tag 'drm-intel-fixes-2016-05-11' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Bail out of pipe config compute loop on LPT
Revert "drm/i915: start adding dp mst audio"
drm/i915/bdw: Add missing delay during L3 SQC credit programming
drm/i915/lvds: separate border enable readout from panel fitter
drm/i915: Update CDCLK_FREQ register on BDW after changing cdclk frequency
Linus Torvalds [Wed, 11 May 2016 20:17:12 +0000 (13:17 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a couple of small fixes: one is a potential uninitialised
error variable in the alua code, potentially causing spurious failures
and the other is a problem caused by the conversion of SCSI to
hostwide tags which resulted in the qla1280 driver always failing in
host initialisation"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
qla1280: Don't allocate 512kb of host tags
scsi_dh_alua: uninitialized variable in alua_rtpg()
Pull networking fixes from David Miller:
"Hopefully the last round of fixes this release, fingers crossed :)
1) Initialize static nf_conntrack_locks_all_lock properly, from
Florian Westphal.
2) Need to cancel pending work when destroying IDLETIMER entries,
from Liping Zhang.
3) Fix TX param usage when sending TSO over iwlwifi devices, from
Emmanuel Grumbach.
4) NFACCT quota params not validated properly, from Phil Turnbull.
5) Resolve more glibc vs. kernel header conflicts, from Mikko
Tapeli.
6) Missing IRQ free in ravb_close(), from Geert Uytterhoeven.
7) Fix infoleak in x25, from Kangjie Lu.
8) Similarly in thunderx driver, from Heinrich Schuchardt.
9) tc_ife.h uapi header not exported properly, from Jamal Hadi Salim.
10) Don't reenable PHY interreupts if device is in polling mode, from
Shaohui Xie.
11) Packet scheduler actions late binding was not being handled
properly at all, from Jamal Hadi Salim.
12) Fix binding of conntrack entries to helpers in openvswitch, from
Joe Stringer"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
gre: do not keep the GRE header around in collect medata mode
openvswitch: Fix cached ct with helper.
net sched: ife action fix late binding
net sched: skbedit action fix late binding
net sched: simple action fix late binding
net sched: mirred action fix late binding
net sched: ipt action fix late binding
net sched: vlan action fix late binding
net: phylib: fix interrupts re-enablement in phy_start
tcp: refresh skb timestamp at retransmit time
net: nps_enet: bug fix - handle lost tx interrupts
net: nps_enet: Tx handler synchronization
export tc ife uapi header
net: thunderx: avoid exposing kernel stack
net: fix a kernel infoleak in x25 module
ravb: Add missing free_irq() call to ravb_close()
uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h
netfilter: nfnetlink_acct: validate NFACCT_QUOTA parameter
iwlwifi: mvm: don't override the rate with the AMSDU len
netfilter: IDLETIMER: fix race condition when destroy the target
...
Jiri Benc [Wed, 11 May 2016 13:53:57 +0000 (15:53 +0200)]
gre: do not keep the GRE header around in collect medata mode
For ipgre interface in collect metadata mode, it doesn't make sense for the
interface to be of ARPHRD_IPGRE type. The outer header of received packets
is not needed, as all the information from it is present in metadata_dst. We
already don't set ipgre_header_ops for collect metadata interfaces, which is
the only consumer of mac_header pointing to the outer IP header.
Just set the interface type to ARPHRD_NONE in collect metadata mode for
ipgre (not gretap, that still correctly stays ARPHRD_ETHER) and reset
mac_header.
Fixes: a64b04d86d14 ("gre: do not assign header_ops in collect metadata mode") Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Wed, 11 May 2016 17:29:26 +0000 (10:29 -0700)]
openvswitch: Fix cached ct with helper.
When using conntrack helpers from OVS, a common configuration is to
perform a lookup without specifying a helper, then go through a
firewalling policy, only to decide to attach a helper afterwards.
In this case, the initial lookup will cause a ct entry to be attached to
the skb, then the later commit with helper should attach the helper and
confirm the connection. However, the helper attachment has been missing.
If the user has enabled automatic helper attachment, then this issue
will be masked as it will be applied in init_conntrack(). It is also
masked if the action is executed from ovs_packet_cmd_execute() as that
will construct a fresh skb.
This patch fixes the issue by making an explicit call to try to assign
the helper if there is a discrepancy between the action's helper and the
current skb->nfct.
Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action") Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Mathias Krause [Tue, 10 May 2016 21:07:02 +0000 (23:07 +0200)]
x86/extable: ensure entries are swapped completely when sorting
The x86 exception table sorting was changed in commit 29934b0fb8ff
("x86/extable: use generic search and sort routines") to use the arch
independent code in lib/extable.c. However, the patch was mangled
somehow on its way into the kernel from the last version posted at [1].
The committed version kind of attempted to incorporate the changes of
commit 548acf19234d ("x86/mm: Expand the exception table logic to allow
new handling options") as in _completely_ _ignoring_ the x86 specific
'handler' member of struct exception_table_entry. This effectively
broke the sorting as entries will only partly be swapped now.
Fortunately, the x86 Kconfig selects BUILDTIME_EXTABLE_SORT, so the
exception table doesn't need to be sorted at runtime. However, in case
that ever changes, we better not break the exception table sorting just
because of that.
[ Ard Biesheuvel points out that BUILDTIME_EXTABLE_SORT applies to the
core image only, but we still rely on the sorting routines for modules
in that case - Linus ]
Fix this by providing a swap_ex_entry_fixup() macro that takes care of
the 'handler' member.
[1] https://lkml.org/lkml/2016/1/27/232
Signed-off-by: Mathias Krause <minipli@googlemail.com> Fixes: 29934b0fb8f ("x86/extable: use generic search and sort routines") Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 11 May 2016 17:21:16 +0000 (10:21 -0700)]
Merge tag 'spi-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A bunch of small driver specific fixes that have come up, none of them
remarkable in themselves. One fixes a regression introduced in the
merge window and another two are targetted at stable"
* tag 'spi-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: pxa2xx: Do not detect number of enabled chip selects on Intel SPT
spi: spi-ti-qspi: Handle truncated frames properly
spi: spi-ti-qspi: Fix FLEN and WLEN settings if bits_per_word is overridden
spi: omap2-mcspi: Undo broken fix for dma transfer of vmalloced buffer
spi: spi-fsl-dspi: Fix cs_change handling in message transfer
Linus Torvalds [Wed, 11 May 2016 17:11:44 +0000 (10:11 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Two small x86 patches, improving "make kvmconfig" and fixing an
objtool warning for CONFIG_PROFILE_ALL_BRANCHES"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvmconfig: add more virtio drivers
x86/kvm: Add stack frame dependency to fastop() inline asm
Masami Hiramatsu [Wed, 11 May 2016 13:52:08 +0000 (22:52 +0900)]
perf symbols: Use lsdir() for the search in kcore cache directory
Use lsdir() to search in kcore cache directory. This also avoids
checking hidden dot directory entries, because kcore cache directories
must always have the name from timestamps when taking the kcore
snapshots, and it never start with dot.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160511135208.23943.68071.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Wed, 11 May 2016 13:51:27 +0000 (22:51 +0900)]
perf tools: Fix lsdir to set errno correctly
Fix lsdir() to set correct positive error number (ENOMEM). Since
"errno" must have a positive error number instead of negative number,
fix lsdir to set it correctly.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: e1ce726e1db2 ("perf tools: Add lsdir() helper to read a directory") Link: http://lkml.kernel.org/r/20160511135127.23943.40644.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
He Kuang [Tue, 10 May 2016 07:40:31 +0000 (07:40 +0000)]
perf build: Add build-test for libunwind cross-platforms support
Currently only test for local libunwind. We should check all supported
platforms so we can use them to parse perf.data with callchain info on
different machines.
Signed-off-by: He Kuang <hekuang@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1462866037-30382-4-git-send-email-hekuang@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Chris Phlipot [Wed, 11 May 2016 03:26:49 +0000 (20:26 -0700)]
perf script: Fix export of callchains with recursion in db-export
When an IP with an unresolved symbol occurs in the callchain more than
once (ie. recursion), then duplicate symbols can be created because
the callchain nodes are never updated after they are first created.
To fix this issue we call dso__find_symbol whenever we encounter a NULL
symbol, in case we already added a symbol at that IP since we started
traversing the callchain.
This change prevents duplicate symbols from being exported when duplicate
IPs are present in the callchain.
Calling it a second time can result in incorrect addresses being used.
This can have effects such as duplicate symbols being created and
exported.
Signed-off-by: Chris Phlipot <cphlipot0@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1462937209-6032-4-git-send-email-cphlipot0@gmail.com
[ Show the callchain where it is done, to help reviewing this change down the line ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Chris Phlipot [Wed, 11 May 2016 03:26:46 +0000 (20:26 -0700)]
perf symbols: Add dso__insert_symbol function
The current method for inserting symbols is to use the symbols__insert()
function. However symbols__insert() does not update the dso symbol
cache. This causes problems in the following scenario:
1. symbol not found at addr using dso__find_symbol
2. symbol inserted at addr using the existing symbols__insert function
3. symbol still not found at addr using dso__find_symbol() because cache isn't
updated. This is undesired behavior.
The undesired behavior in (3) is addressed by creating a new function,
dso__insert_symbol() to both insert the symbol and update the symbol
cache if necessary.
If dso__insert_symbol() is used in (2) instead of symbols__insert(),
then the undesired behavior in (3) is avoided.
perf scripting python: Use Py_FatalError instead of die()
It probably is equivalent, but that seems to be the "pythonic" way of
dieing? Anyway, one less die() in the tools/perf codebase.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Chris Phlipot <cphlipot0@gmail.com> Link: http://lkml.kernel.org/n/tip-nlzgepdv2818zs4e7faif9tu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 10 May 2016 14:26:24 +0000 (11:26 -0300)]
perf diff: Fix duplicated output column
The commit b97511c5bc94 ("perf tools: Add overhead/overhead_children
keys defaults via string") moved initialization of column headers but it
missed to check the sort__mode. As 'perf diff' doesn't call
perf_hpp__init(), the setup_overhead() also should not be called.
Boris Brezillon [Wed, 11 May 2016 09:00:02 +0000 (11:00 +0200)]
ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
The memory range assigned to the PMC (Power Management Controller) was
not including the PMC_PCR register which are used to control peripheral
clocks.
This was working fine thanks to the page granularity of ioremap(), but
started to fail when we switched to syscon/regmap, because regmap is
making sure that all accesses are falling into the reserved range.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Reported-by: Richard Genoud <richard.genoud@gmail.com> Tested-by: Richard Genoud <richard.genoud@gmail.com> Fixes: 863a81c3be1d ("clk: at91: make use of syscon to share PMC registers in several drivers") Cc: <stable@vger.kernel.org> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Takashi Iwai [Wed, 11 May 2016 12:56:12 +0000 (14:56 +0200)]
ALSA: hda - Fix regression on ATI HDMI audio
The HDMI/DP audio output on ATI/AMD chips got broken due to the recent
restructuring of chmap. Fortunately, Daniel Exner could bisect, and
pointed the culprit commit [739ffee97ed5: ALSA: hda - Add hdmi chmap
verb programming ops to chmap object].
This commit moved some ops from hdmi_ops to chmap_ops, and reassigned
the ops in the embedded chmap object in hdmi_spec instead.
Unfortunately, the reassignment of these ops in patch_atihdmi() were
moved into an if block that is performed only for old chips. Thus, on
newer chips, the generic ops is still used, which doesn't work for
such ATI/AMD chips.
This patch addresses the regression, simply by moving the assignment
of chmap ops to the right place.
Miklos Szeredi [Tue, 10 May 2016 23:16:37 +0000 (01:16 +0200)]
ovl: ignore permissions on underlying lookup
Generally permission checking is not necessary when overlayfs looks up a
dentry on one of the underlying layers, since search permission on base
directory was already checked in ovl_permission().
More specifically using lookup_one_len() causes a problem when the lower
directory lacks search permission for a specific user while the upper
directory does have search permission. Since lookups are cached, this
causes inconsistency in behavior: success depends on who did the first
lookup.
So instead use lookup_hash() which doesn't do the permission check.
Miklos Szeredi [Tue, 10 May 2016 23:16:37 +0000 (01:16 +0200)]
vfs: add lookup_hash() helper
Overlayfs needs lookup without inode_permission() and already has the name
hash (in form of dentry->d_name on overlayfs dentry). It also doesn't
support filesystems with d_op->d_hash() so basically it only needs
the actual hashed lookup from lookup_one_len_unlocked()
So add a new helper that does unlocked lookup of a hashed name.
Miklos Szeredi [Tue, 10 May 2016 23:16:37 +0000 (01:16 +0200)]
vfs: rename: check backing inode being equal
If a file is renamed to a hardlink of itself POSIX specifies that rename(2)
should do nothing and return success.
This condition is checked in vfs_rename(). However it won't detect hard
links on overlayfs where these are given separate inodes on the overlayfs
layer.
Overlayfs itself detects this condition and returns success without doing
anything, but then vfs_rename() will proceed as if this was a successful
rename (detach_mounts(), d_move()).
The correct thing to do is to detect this condition before even calling
into overlayfs. This patch does this by calling vfs_select_inode() to get
the underlying inodes.
David S. Miller [Wed, 11 May 2016 03:50:16 +0000 (23:50 -0400)]
Merge branch 'net-sched-fixes'
Jamal Hadi Salim says:
====================
Some actions were broken in allowing for late binding of actions.
Late binding workflow is as follows:
a) create an action and provide all necessary parameters for it
Optionally provide an index or let the kernel give you one.
Example:
sudo tc actions add action police rate 1kbit burst 90k drop index 1
b) later on bind to the pre-created action from a filter definition
by merely specifying the index.
Example:
sudo tc filter add dev lo parent ffff: protocol ip prio 8 \
u32 match ip src 127.0.0.8/32 flowid 1:8 action police index 1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Tue, 10 May 2016 20:49:31 +0000 (16:49 -0400)]
net sched: ife action fix late binding
The process below was broken and is fixed with this patch.
//add an ife action and give it an instance id of 1
sudo tc actions add action ife encode \
type 0xDEAD allow mark dst 02:15:15:15:15:15 index 1
//create a filter which binds to ife action id 1
sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\
match ip dst 17.0.0.1/32 flowid 1:11 action ife index 1
Message before fix was:
RTNETLINK answers: Invalid argument
We have an error talking to the kernel
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Tue, 10 May 2016 20:49:30 +0000 (16:49 -0400)]
net sched: skbedit action fix late binding
The process below was broken and is fixed with this patch.
//add a skbedit action and give it an instance id of 1
sudo tc actions add action skbedit mark 10 index 1
//create a filter which binds to skbedit action id 1
sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\
match ip dst 17.0.0.1/32 flowid 1:10 action skbedit index 1
Message before fix was:
RTNETLINK answers: Invalid argument
We have an error talking to the kernel
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Tue, 10 May 2016 20:49:29 +0000 (16:49 -0400)]
net sched: simple action fix late binding
The process below was broken and is fixed with this patch.
//add a simple action and give it an instance id of 1
sudo tc actions add action simple sdata "foobar" index 1
//create a filter which binds to simple action id 1
sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\
match ip dst 17.0.0.1/32 flowid 1:10 action simple index 1
Message before fix was:
RTNETLINK answers: Invalid argument
We have an error talking to the kernel
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Tue, 10 May 2016 20:49:28 +0000 (16:49 -0400)]
net sched: mirred action fix late binding
The process below was broken and is fixed with this patch.
//add an mirred action and give it an instance id of 1
sudo tc actions add action mirred egress mirror dev $MDEV index 1
//create a filter which binds to mirred action id 1
sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\
match ip dst 17.0.0.1/32 flowid 1:10 action mirred index 1
Message before bug fix was:
RTNETLINK answers: Invalid argument
We have an error talking to the kernel
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Tue, 10 May 2016 20:49:27 +0000 (16:49 -0400)]
net sched: ipt action fix late binding
This was broken and is fixed with this patch.
//add an ipt action and give it an instance id of 1
sudo tc actions add action ipt -j mark --set-mark 2 index 1
//create a filter which binds to ipt action id 1
sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\
match ip dst 17.0.0.1/32 flowid 1:10 action ipt index 1
Message before bug fix was:
RTNETLINK answers: Invalid argument
We have an error talking to the kernel
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Tue, 10 May 2016 20:49:26 +0000 (16:49 -0400)]
net sched: vlan action fix late binding
Late vlan action binding was broken and is fixed with this patch.
//add a vlan action to pop and give it an instance id of 1
sudo tc actions add action vlan pop index 1
//create filter which binds to vlan action id 1
sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32 \
match ip dst 17.0.0.1/32 flowid 1:1 action vlan index 1
current message(before bug fix) was:
RTNETLINK answers: Invalid argument
We have an error talking to the kernel
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shaohui Xie [Tue, 10 May 2016 09:42:26 +0000 (17:42 +0800)]
net: phylib: fix interrupts re-enablement in phy_start
If phy was suspended and is starting, current driver always enable
phy's interrupts, if phy works in polling, phy can raise unexpected
interrupt which will not be handled, the interrupt will block system
enter suspend again. So interrupts should only be re-enabled if phy
works in interrupt.
Signed-off-by: Shaohui Xie <Shaohui.Xie@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 10 May 2016 03:55:16 +0000 (20:55 -0700)]
tcp: refresh skb timestamp at retransmit time
In the very unlikely case __tcp_retransmit_skb() can not use the cloning
done in tcp_transmit_skb(), we need to refresh skb_mstamp before doing
the copy and transmit, otherwise TCP TS val will be an exact copy of
original transmit.
Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
arm64/sunxi: 4.6-rc1: Add dependency on generic irq chip
Commit ce3dd55b99b1 ("arm64: Introduce Allwinner SoC config option"),
added support for ARCH_SUNXI on arm64, but failed to select
GENERIC_IRQ_CHIP, which is required for drivers/irqchip/irq-sunxi-nmi.c
and causes build failures like :
UPD include/generated/compile.h
CC init/version.o
LD init/built-in.o
drivers/built-in.o: In function `sunxi_sc_nmi_set_type':
drivers/irqchip/irq-sunxi-nmi.c:114: undefined reference to `irq_setup_alt_chip'
drivers/built-in.o: In function `irq_domain_add_linear':
include/linux/irqdomain.h:253: undefined reference to `irq_generic_chip_ops'
include/linux/irqdomain.h:253: undefined reference to `irq_generic_chip_ops'
drivers/built-in.o: In function `sunxi_sc_nmi_irq_init':
drivers/irqchip/irq-sunxi-nmi.c:146: undefined reference to `irq_alloc_domain_generic_chips'
drivers/irqchip/irq-sunxi-nmi.c:161: undefined reference to `irq_get_domain_generic_chip'
drivers/irqchip/irq-sunxi-nmi.c:170: undefined reference to `irq_gc_mask_clr_bit'
drivers/irqchip/irq-sunxi-nmi.c:171: undefined reference to `irq_gc_mask_set_bit'
drivers/irqchip/irq-sunxi-nmi.c:172: undefined reference to `irq_gc_ack_set_bit'
drivers/irqchip/irq-sunxi-nmi.c:170: undefined reference to `irq_gc_mask_clr_bit'
Fixes: commit ce3dd55b99b1 ("arm64: Introduce Allwinner SoC config option") Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
David S. Miller [Tue, 10 May 2016 19:04:57 +0000 (15:04 -0400)]
Merge branch 'nps_enet-fixes'
Elad Kanfi says:
====================
nps_enet: Net driver bugs fix
v3:
tx_packet_sent flag is not necessary, use socket buffer pointer
instead.
Use wmb() instead of smp_wmb().
v2:
Remove code style commit for now.
Code style commit will be added after the bugs fix will be approved.
Summary:
1. Bug description: TX done interrupts that arrives while interrupts
are masked, during NAPI poll, will not trigger an interrupt handling.
Since TX interrupt is of level edge we will lose the TX done interrupt.
As a result all pending tx frames will get no service.
Solution: Check if there is a pending tx request after unmasking the
interrupt and if answer is yes then re-add ourselves to
the NAPI poll list.
2. Bug description: CPU-A before sending a frame will set a variable
to true. CPU-B that executes the tx done interrupt service routine
might read a non valid value of that variable.
Solution: Use the socket buffer pointer instead of the variable,
and add a write memory barrier at the tx sending function after
the pointer is set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Kanfi [Mon, 9 May 2016 17:13:20 +0000 (20:13 +0300)]
net: nps_enet: bug fix - handle lost tx interrupts
The tx interrupt is of edge type, and in case such interrupt is triggered
while it is masked it will not be handled even after tx interrupts are
re-enabled in the end of NAPI poll.
This will cause tx network to stop in the following scenario:
* Rx is being handled, hence interrupts are masked.
* Tx interrupt is triggered after checking if there is some tx to handle
and before re-enabling the interrupts.
In this situation only rx transaction will release tx requests.
In order to handle the tx that was missed( if there was one ),
a NAPI reschdule was added after enabling the interrupts.
Signed-off-by: Elad Kanfi <eladkan@mellanox.com> Acked-by: Noam Camus <noamca@mellanox.com> Acked-by: Gilad Ben-Yossef <giladby@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Kanfi [Mon, 9 May 2016 17:13:19 +0000 (20:13 +0300)]
net: nps_enet: Tx handler synchronization
Below is a description of a possible problematic
sequence. CPU-A is sending a frame and CPU-B handles
the interrupt that indicates the frame was sent. CPU-B
reads an invalid value of tx_packet_sent.
end memory transaction
(tx_packet_sent actually
written)
Furthermore there is a dependency between tx_skb and tx_packet_sent.
There is no assurance that tx_skb contains a valid pointer at CPU B
when it sees tx_packet_sent == true.
Solution:
Initialize tx_skb to NULL and use it to indicate that packet was sent,
in this way tx_packet_sent can be removed.
Add a write memory barrier after setting tx_skb in order to make sure
that it is valid before HW is informed and IRQ is fired.
Fixed sequence will be:
CPU-A CPU-B
----- -----
tx_skb = skb
wmb()
.
.
order HW to start tx
.
.
HW complete tx
------> get tx complete interrupt
.
.
if(tx_skb != NULL)
handle tx_skb
tx_skb = NULL
Signed-off-by: Elad Kanfi <eladkan@mellanox.com> Acked-by: Noam Camus <noamca@mellanox.com> Acked-by: Gilad Ben-Yossef <giladby@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 10 May 2016 19:04:40 +0000 (12:04 -0700)]
Merge tag 'pci-v4.6-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Since v4.5, we've WARNed during resume if a PCI device, including a
Thunderbolt device, was added while we were suspended. A change we
merged for v4.6-rc1 turned that warning into a system hang. These
enumeration patches from Lukas Wunner fix this issue:
- Fix BUG on device attach failure
- Do not treat EPROBE_DEFER as device attach failure"
* tag 'pci-v4.6-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Do not treat EPROBE_DEFER as device attach failure
PCI: Fix BUG on device attach failure
Linus Torvalds [Tue, 10 May 2016 18:41:05 +0000 (11:41 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two topology corner case fixes, and a MAINTAINERS file update for
mmiotrace maintenance"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology: Set x86_max_cores to 1 for CONFIG_SMP=n
MAINTAINERS: Add mmiotrace entry
x86/topology: Handle CPUID bogosity gracefully
Andrey Utkin [Mon, 9 May 2016 15:04:19 +0000 (18:04 +0300)]
kvmconfig: add more virtio drivers
"make defconfig kvmconfig" is supposed to end up with usable kernel for
KVM guest. In practice, it won't work for e.g. Hetzner VPS (KVM-based)
unless you add these options.
Signed-off-by: Andrey Utkin <andrey_utkin@fastmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Josh Poimboeuf [Wed, 9 Mar 2016 18:59:50 +0000 (12:59 -0600)]
x86/kvm: Add stack frame dependency to fastop() inline asm
The kbuild test robot reported this objtool warning [1]:
arch/x86/kvm/emulate.o: warning: objtool: fastop()+0x69: call without frame pointer save/setup
The issue seems to be caused by CONFIG_PROFILE_ALL_BRANCHES. With that
option, for some reason gcc decides not to create a stack frame in
fastop() before doing the inline asm call, which can result in a bad
stack trace.
Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by
listing the stack pointer as an output operand for the inline asm
statement.
This change has no effect for !CONFIG_PROFILE_ALL_BRANCHES.
Masami Hiramatsu [Tue, 10 May 2016 05:47:26 +0000 (14:47 +0900)]
perf tools: Make alias handler to check return value of strbuf
Make alias handler and sq_quote_argv to check the return value of strbuf
APIs.
In sq_quote_argv() calls die(), but this fix handles strbuf failure as a
special case and returns to caller, since the caller - handle_alias()
also has to check the return value of other strbuf APIs and those checks
can be merged to one if() statement.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160510054725.6158.84597.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yura Pakhuchiy [Sat, 7 May 2016 16:53:36 +0000 (23:53 +0700)]
ALSA: hda - Fix subwoofer pin on ASUS N751 and N551
Subwoofer does not work out of the box on ASUS N751/N551 laptops. This
patch fixes it. Patch tested on N751 laptop. N551 part is not tested,
but according to [1] and [2] this laptop requires similar changes, so I
included them in the patch.
Arnd Bergmann [Tue, 10 May 2016 09:27:45 +0000 (11:27 +0200)]
Merge tag 'at91-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91 into fixes
Merge "at91: fixes for 4.6 #1" from Nicolas Ferre:
Here is a late fix for AT91. Sorry to have figure it out so late in the
development cycle but we had to confirm it was an error with the documentation
of two products.
So, as the compatibility string is in since 4.6-rc1 and that the previous one
works okay, it's a good opportunity to switch back to the one that works without
introducing a intermediary bug.
The revert on driver code and the removal of the useless additional
compatibility string will be queued for 4.7 through NAND/MTD.
* tag 'at91-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91:
ARM: dts: at91: sama5d2: use "atmel,sama5d3-nfc" compatible for nfc
Takashi Iwai [Tue, 10 May 2016 08:24:02 +0000 (10:24 +0200)]
ALSA: hda - Fix broken reconfig
The HD-audio reconfig function got broken in the recent kernels,
typically resulting in a failure like:
snd_hda_intel 0000:00:1b.0: control 3:0:0:Playback Channel Map:0 is already present
This is because of the code restructuring to move the PCM and control
instantiation into the codec drive probe, by the commit [bcd96557bd0a:
ALSA: hda - Build PCMs and controls at codec driver probe]. Although
the commit above removed the calls of snd_hda_codec_build_pcms() and
*_build_controls() at the controller driver probe, the similar calls
in the reconfig were still left forgotten. This caused the
conflicting and duplicated PCMs and controls.
The fix is trivial: just remove these superfluous calls from
reconfig_codec().
Kees Cook [Mon, 9 May 2016 20:22:09 +0000 (13:22 -0700)]
x86/KASLR: Clarify purpose of each get_random_long()
KASLR will be calling get_random_long() twice, but the debug output
won't distinguishing between them. This patch adds a report on when it
is fetching the physical vs virtual address. With this, once the virtual
offset is separate, the report changes from:
KASLR using RDTSC...
KASLR using RDTSC...
into:
Physical KASLR using RDTSC...
Virtual KASLR using RDTSC...
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462825332-10505-7-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Baoquan He [Mon, 9 May 2016 20:22:08 +0000 (13:22 -0700)]
x86/KASLR: Add virtual address choosing function
To support randomizing the kernel virtual address separately from the
physical address, this patch adds find_random_virt_addr() to choose
a slot anywhere between LOAD_PHYSICAL_ADDR and KERNEL_IMAGE_SIZE.
Since this address is virtual, not physical, we can place the kernel
anywhere in this region, as long as it is aligned and (in the case of
kernel being larger than the slot size) placed with enough room to load
the entire kernel image.
For clarity and readability, find_random_addr() is renamed to
find_random_phys_addr() and has "size" renamed to "image_size" to match
find_random_virt_addr().
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote changelog, refactored slot calculation for readability. ]
[ Renamed find_random_phys_addr() and size argument. ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462825332-10505-6-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kees Cook [Mon, 9 May 2016 20:22:07 +0000 (13:22 -0700)]
x86/KASLR: Return earliest overlap when avoiding regions
In preparation for being able to detect where to split up contiguous
memory regions that overlap with memory regions to avoid, we need to
pass back what the earliest overlapping region was. This modifies the
overlap checker to return that information.
Based on a separate mem_min_overlap() implementation by Baoquan He.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462825332-10505-5-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Baoquan He [Mon, 9 May 2016 20:22:06 +0000 (13:22 -0700)]
x86/KASLR: Add 'struct slot_area' to manage random_addr slots
In order to support KASLR moving the kernel anywhere in physical memory
(which could be up to 64TB), we need to handle counting the potential
randomization locations in a more efficient manner.
In the worst case with 64TB, there could be roughly 32 * 1024 * 1024
randomization slots if CONFIG_PHYSICAL_ALIGN is 0x1000000. Currently
the starting address of candidate positions is stored into the slots[]
array, one at a time. This method would cost too much memory and it's
also very inefficient to get and save the slot information into the slot
array one by one.
This patch introduces 'struct slot_area' to manage each contiguous region
of randomization slots. Each slot_area will contain the starting address
and how many available slots are in this area. As with the original code,
the slot_areas[] will avoid the mem_avoid[] regions.
Since setup_data is a linked list, it could contain an unknown number
of memory regions to be avoided, which could cause us to fragment
the contiguous memory that the slot_area array is tracking. In normal
operation this level of fragmentation will be extremely rare, but we
choose a suitably large value (100) for the array. If setup_data forces
the slot_area array to become highly fragmented and there are more
slots available beyond the first 100 found, the rest will be ignored
for KASLR selection.
The function store_slot_info() is used to calculate the number of slots
available in the passed-in memory region and stores it into slot_areas[]
after adjusting for alignment and size requirements.
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote changelog, squashed with new functions. ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462825332-10505-4-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kees Cook [Mon, 9 May 2016 20:22:05 +0000 (13:22 -0700)]
x86/boot: Add missing file header comments
There were some files with missing header comments. Since they are
included from both compressed and regular kernels, make note of that.
Also corrects a typo in the mem_avoid comments.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462825332-10505-3-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kees Cook [Mon, 9 May 2016 20:22:04 +0000 (13:22 -0700)]
x86/KASLR: Initialize mapping_info every time
As it turns out, mapping_info DOES need to be initialized every
time, because pgt_data address could be changed during kernel
relocation. So it can not be build time assigned.
Without this, page tables were not being corrected updated, which
could cause reboots when a physical address beyond 2G was chosen.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462825332-10505-2-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Borislav Petkov [Sat, 7 May 2016 09:59:40 +0000 (11:59 +0200)]
x86/boot: Comment what finalize_identity_maps() does
So it is not really obvious that finalize_identity_maps() doesn't do any
finalization but it *actually* writes CR3 with the ident PGD. Comment
that at the call site.
It happens because in find_lock_later_rq(), the task whose scheduling
class was changed to fair class is still pushed away as if it were
a deadline task ...
So, check in find_lock_later_rq() after double_lock_balance(), if the
scheduling class of the deadline task was changed, break and retry.
Apply the same logic to RT tasks.
Signed-off-by: Xunlei Pang <xlpang@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Juri Lelli <juri.lelli@arm.com> Link: http://lkml.kernel.org/r/1462767091-1215-1-git-send-email-xlpang@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Daniel Vetter [Tue, 3 May 2016 08:33:01 +0000 (10:33 +0200)]
drm/i915: Bail out of pipe config compute loop on LPT
LPT is pch, so might run into the fdi bandwidth constraint (especially
since it has only 2 lanes). But right now we just force pipe_bpp back
to 24, resulting in a nice loop (which we bail out with a loud
WARN_ON). Fix this.
Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=93477 Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1462264381-7573-1-git-send-email-daniel.vetter@ffwll.ch
(cherry picked from commit f58a1acc7e4a1f37d26124ce4c875c647fbcc61f) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Thomas Gleixner [Tue, 10 May 2016 07:20:33 +0000 (09:20 +0200)]
x86/topology: Set x86_max_cores to 1 for CONFIG_SMP=n
Josef reported that the uncore driver trips over with CONFIG_SMP=n because
x86_max_cores is 16 instead of 12.
The reason is, that for SMP=n the extended topology detection is a NOOP and
the cache leaf is used to determine the number of cores. That's wrong in two
aspects:
1) The cache leaf enumerates the maximum addressable number of cores in the
package, which is obviously not correct
2) UP has no business with topology bits at all.
Make intel_num_cpu_cores() return 1 for CONFIG_SMP=n
Reported-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team <Kernel-team@fb.com> Cc: Kan Liang <kan.liang@intel.com> Link: http://lkml.kernel.org/r/761b4a2a-0332-7954-f030-c6639f949612@fb.com
Wenyou Yang [Mon, 9 May 2016 06:51:19 +0000 (14:51 +0800)]
ARM: dts: at91: sama5d2: use "atmel,sama5d3-nfc" compatible for nfc
An error in documentation of the NAND Flash Controller (NFC) led to choose
another compatibility string for sama5d2 with an impact on the NAND flash
ready/busy information. It was producing the error message:
atmel_nand 80000000.nand: Time out to wait for interrupt: 0x08000000
and had an impact on performance.
So, switch back to the classical "atmel,sama5d3-nfc" compatibility string for
this SoC which gives the proper ready/busy bit information. The NAND flash
driver will be updated to remove the support for this different
implementation.
Signed-off-by: Wenyou Yang <wenyou.yang@atmel.com> Acked-by: Romain Izard <romain.izard.pro@gmail.com>
[nicolas.ferre@atmel.com: change commit message] Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>