Chao Yu [Wed, 23 Dec 2015 09:11:43 +0000 (17:11 +0800)]
f2fs: reduce covered region of sbi->cp_rwsem in f2fs_map_blocks
Only cover sbi->cp_rwsem on one dnode page's allocation and modification
instead of multiple's in f2fs_map_blocks, it can reduce the covered region
of cp_rwsem, then we can avoid potential long time delay for concurrent
checkpointer.
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 22 Dec 2015 20:59:54 +0000 (12:59 -0800)]
f2fs: record node block allocation in dnode_of_data
This patch introduces recording node block allocation in dnode_of_data.
This information helps to figure out whether any node block is allocated during
specific file operations.
Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 17 Dec 2015 09:13:28 +0000 (17:13 +0800)]
f2fs: support data flush in background
Previously, when finishing a checkpoint, we have persisted all fs meta
info including meta inode, node inode, dentry page of directory inode, so,
after a sudden power cut, f2fs can recover from last checkpoint with full
directory structure.
But during checkpoint, we didn't flush dirty pages of regular and symlink
inode, so such dirty datas still in memory will be lost in that moment of
power off.
In order to reduce the chance of lost data, this patch enables
f2fs_balance_fs_bg with the ability of data flushing. It will try to flush
user data before starting a checkpoint. So user's data written after last
checkpoint which may not be fsynced could be saved.
When we mount with data_flush option, after every period of cp_interval
(could be configured in sysfs: /sys/fs/f2fs/device/cp_interval) seconds
user data could be flushed into device once f2fs_balance_fs_bg was called
in kworker thread or gc thread.
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 16 Dec 2015 05:09:20 +0000 (13:09 +0800)]
f2fs: record dirty status of regular/symlink inode
Maintain regular/symlink inode which has dirty pages in global dirty list
and record their total dirty pages count like the way of handling directory
inode.
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fan Li [Tue, 15 Dec 2015 09:02:41 +0000 (17:02 +0800)]
f2fs: fix to reset variable correctlly
f2fs_map_blocks will set m_flags and m_len to 0, so we don't need to
reset m_flags ourselves, but have to reset m_len to correct value
before use it again.
Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 15 Dec 2015 05:30:45 +0000 (13:30 +0800)]
f2fs: introduce dirty list node in inode info
Add a new dirt list node member in inode info for linking the inode to
global dirty list in superblock, instead of old implementation which
allocate slab cache memory as an entry to inode.
It avoids memory pressure due to slab cache allocation, and also makes
codes more clean.
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 15 Dec 2015 05:29:47 +0000 (13:29 +0800)]
f2fs: rename {add,remove,release}_dirty_inode to {add,remove,release}_ino_entry
remove_dirty_dir_inode will be renamed to remove_dirty_inode as a generic
function in following patch for removing directory/regular/symlink inode
in global dirty list.
Here rename ino management related functions for readability, also in
order to avoid name conflict.
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fan Li [Mon, 14 Dec 2015 05:34:00 +0000 (13:34 +0800)]
f2fs: write only the pages in range during defragment
@lend of filemap_write_and_wait_range is supposed to be a "offset
in bytes where the range ends (inclusive)". Subtract 1 to avoid
writing an extra page.
Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 11 Dec 2015 08:08:22 +0000 (16:08 +0800)]
f2fs: clean up node page updating flow
If read_node_page return LOCKED_PAGE, in its caller it's better a) skip
unneeded 'Update' flag and mapping info verfication; b) check nid value
stored in footer structure of node page.
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 7 Dec 2015 18:16:58 +0000 (10:16 -0800)]
f2fs: refactor f2fs_commit_super
Previously, f2fs_commit_super hacks the bh->blocknr to write the broken
alternate superblock.
Instead of it, we should use the correct logic to retrieve its buffer head
with locking it appropriately.
Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 1 Dec 2015 03:36:16 +0000 (11:36 +0800)]
f2fs: fix to convert inline inode in ->setattr
In commit 3c4541452748 ("f2fs: do not trim preallocated blocks when
truncating after i_size"), in order to follow the regulation: "truncate(x)
where x > i_size will not trim all blocks past i_size." like other file
systems, in ->setattr we invoked truncate_setsize instead of f2fs_truncate
to avoid unneeded block trimming in such case, but forgot to call
f2fs_convert_inline_inode keep consistency of inline data conversion rule.
This patch fixes to convert inline data if necessary.
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 1 Dec 2015 03:47:33 +0000 (11:47 +0800)]
f2fs: kill f2fs_drop_largest_extent
For direct IO, f2fs only allocate new address for the block which is not
exist in the disk before, its mapping info should not exist in extent
cache previously, so here we do not need to call f2fs_drop_largest_extent
to drop related cache.
Due to no more callers for f2fs_drop_largest_extent now, kill it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 1 Dec 2015 00:26:44 +0000 (16:26 -0800)]
f2fs: avoid deadlock in f2fs_shrink_extent_tree
While handling extent trees, we can enter into a reclaiming path anytime.
If it tries to release some extent nodes in the same extent tree,
write_lock(&et->lock) would be hanged.
In order to avoid the deadlock, we can just skip it.
Note that, if it is an unreferenced tree, we should get write_lock(&et->lock)
successfully and release all of therein nodes.
Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 19 Nov 2015 08:09:07 +0000 (16:09 +0800)]
f2fs: fix to report error in f2fs_readdir
get_lock_data_page in f2fs_readdir can fail due to a lot of reasons (i.e.
no memory or IO error...), it's better to report this kind of error to
user rather than ignoring it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 13 Nov 2015 10:27:35 +0000 (18:27 +0800)]
f2fs: clear page uptodate when dropping cache for atomic write
We should clear uptodate flag for all pages atomic written when we drop
them, otherwise before these cached pages were reclaimed or invalidated
eventually, we will see invalid data when hitting them again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fan Li [Thu, 12 Nov 2015 00:43:04 +0000 (08:43 +0800)]
f2fs: optimize __find_rev_next_bit
1. Skip __reverse_ulong if the bitmap is empty.
2. Reduce branches and codes.
According to my test, the performance of this new version is 5% higher on
an empty bitmap of 64bytes, and remains about the same in the worst scenario.
Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 10 Nov 2015 10:45:07 +0000 (18:45 +0800)]
f2fs: fix to remove directory inode from dirty list
If last dirty dentry page was writebacked in reclaim path, we should
remove its directory inode from global dirty list to avoid unnecessary
flush for this inode when doing checkpoint.
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 10 Nov 2015 10:44:20 +0000 (18:44 +0800)]
f2fs: fix to enable missing ioctl interfaces in ->compat_ioctl
In 64-bit kernel f2fs can supports 32-bit ioctl system call by identifying
encoded code which is converted from 32-bit one to 64-bit one in
->compat_ioctl.
When we introduced new interfaces in ->ioctl, we forgot to enable them in
->compat_ioctl, so enable them for fixing.
Chao Yu [Tue, 27 Oct 2015 01:53:45 +0000 (09:53 +0800)]
f2fs: support file defragment
This patch introduces a new ioctl F2FS_IOC_DEFRAGMENT to support file
defragment in a specified range of regular file.
This ioctl can be used in very limited workload: if user expects high
sequential read performance in randomly written file, this interface
can be used for defragmentation, after that file can be written as
continuous as possible in the device.
Meanwhile, it has side-effect, it will make holes in segments where
blocks located originally, so it's better to trigger GC to eliminate
fragment in segments.
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 28 Oct 2015 09:56:14 +0000 (17:56 +0800)]
f2fs: commit atomic written page in LFS mode
We should always commit atomic written pages in LFS mode, otherwise data
will become corrupted if we encounter suddent power cut after partial
pages committed in IPU mode.
Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Linus Torvalds [Fri, 4 Dec 2015 19:30:45 +0000 (11:30 -0800)]
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
- NFIT parsing regression fixes from Linda. The nvdimm hot-add
implementation merged in 4.4-rc1 interpreted the specification in a
way that breaks actual HPE platforms. We are also closing the loop
with the ACPI Working Group to get this clarification added to the
spec.
- Andy pointed out that his laptop without nvdimm resources is loading
the e820-nvdimm module by default, fix that up to only load the
module when an e820-type-12 range is present.
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nfit: Adjust for different _FIT and NFIT headers
nfit: Fix the check for a successful NFIT merge
nfit: Account for table size length variation
libnvdimm, e820: skip module loading when no type-12
Linus Torvalds [Fri, 4 Dec 2015 18:17:20 +0000 (10:17 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull ARM KVM fixes from Paolo Bonzini:
- a series of fixes to deal with the aliasing between the sp and xzr
register
- a fix for the cache flush fix that went in -rc3
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
ARM/arm64: KVM: correct PTE uncachedness check
arm64: KVM: Get rid of old vcpu_reg()
arm64: KVM: Correctly handle zero register in system register accesses
arm64: KVM: Remove const from struct sys_reg_params
arm64: KVM: Correctly handle zero register during MMIO
Linus Torvalds [Fri, 4 Dec 2015 17:16:26 +0000 (09:16 -0800)]
Merge tag 'sound-4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This time we've got a larger number of updates, mainly from ASoC
world. The only significant LOCs found here are for Realtek codecs,
where most of changes are quite systematic replacements.
There are also a few fixes in ASoC core side: one is the PM call order
fix to ensure the DPAM resume working properly. Another is the proper
cleanup call after freeing DAPM widgets, and the correction of the
wrong callback set in topology API.
The rest are a wide range of driver-specific small fixes, including
HD-audio"
* tag 'sound-4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (35 commits)
ALSA: hda - Add Conexant CX8200 (14f1:2008) codec entry
ALSA: hda - Correct codec names for 14f1:50f1 and 14f1:50f3
ALSA: hda - Skip ELD notification during system suspend
ASoC: core: Change power state before rechecking endpoint
ASoC: fix kernel-doc warnings in sound/soc/soc-ops.c
ASoC: rt5645: Add dmi_system_id "Google Terra"
ASoC: rockchip: Fix incorrect VDW value for 24 bit
ASoC: fsl: clarify ac97 dependency
ASoC: Intel: Skylake: fix memory leak
ASoC: davinci-mcasp: Fix master capture only mode
ASoC: es8328: Fix shifts for mixer switches
ASoC: rt5645: Add dmi_system_id "Google Wizpig"
ASoC: sti: set player private data
ASoC: sti: rename ST proprietary DT properties
ASoC: sti: remove wrong error message
ASoC: Intel: Skylake: Add I2C depends for SKL machine
ASoC: topology: fix info callback for TLV byte control
ASoC: rt5670: fix wrong bit def for pll src
ASoC: nau8825: add pm function
ASoC: rt5645: Add struct dmi_system_id "Google Edgar" for Chrome OS
...
Linus Torvalds [Fri, 4 Dec 2015 16:59:10 +0000 (08:59 -0800)]
Merge tag 'pm+acpi-4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These fix a recent regression in the ACPI PCI host bridge
initialization code, clean up some recent changes (generic power
domains framework, ACPI AML debugger support), fix three older but
annoying bugs (PCI power management. generic power domains framework,
cpufreq) and a build problem (device properties framework), and update
a stale MAINTAINERS entry (ACPI backlight driver).
Specifics:
- Fix a regression in the ACPI PCI host bridge initialization code
introduced by the recent consolidation of the host bridge handling
on x86 and ia64 that forgot to take one special piece of code
related to NUMA on x86 into account (Liu Jiang).
- Improve the Kconfig help description of the new ACPI AML debugger
support option to avoid possible confusion (Peter Zijlstra).
- Remove a piece of code in the generic power domains framework that
should have been removed by one of the recent commits modifying
that code (Ulf Hansson).
- Reduce the log level of a PCI PM message that generates a lot of
false-positive log noise for some drivers and improve the message
itself while at it (Imre Deak).
- Fix the OF-based domain lookup code in the generic power domains
framework to make it drop references to DT nodes correctly (Eric
Anholt).
- Prevent the cpufreq core from setting the policy back to the
default after a CPU offline/online cycle for cpufreq drivers
providing the ->setpolicy callback (Srinivas Pandruvada).
- Fix a build problem for CONFIG_ACPI unset in the device properties
framework (Hanjun Guo).
- Fix a stale file path in the ACPI backlight driver entry in
MAINTAINERS (Dan Carpenter)"
* tag 'pm+acpi-4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / Domains: Fix bad of_node_put() in failure paths of genpd_dev_pm_attach()
cpufreq: use last policy after online for drivers with ->setpolicy
PCI / PM: Tune down retryable runtime suspend error messages
PM / Domains: Validate cases of a non-bound driver in genpd governor
MAINTAINERS: ACPI / video: update a file name in drivers/acpi/
ACPI / property: fix compile error for acpi_node_get_property_reference() when CONFIG_ACPI=n
x86/PCI/ACPI: Fix regression caused by commit 4d6b4e69a245
ACPI: Better describe ACPI_DEBUGGER
Ard Biesheuvel [Thu, 3 Dec 2015 08:25:22 +0000 (09:25 +0100)]
ARM/arm64: KVM: correct PTE uncachedness check
Commit e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's
uncachedness") modified the logic to test whether a HYP or stage-2
mapping needs flushing, from [incorrectly] interpreting the page table
attributes to [incorrectly] checking whether the PFN that backs the
mapping is covered by host system RAM. The PFN number is part of the
output of the translation, not the input, so we have to use pte_pfn()
on the contents of the PTE, not __phys_to_pfn() on the HYP virtual
address or stage-2 intermediate physical address.
Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness") Cc: stable@vger.kernel.org Tested-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Pavel Fedin [Fri, 4 Dec 2015 12:03:14 +0000 (15:03 +0300)]
arm64: KVM: Get rid of old vcpu_reg()
Using oldstyle vcpu_reg() accessor is proven to be inappropriate and
unsafe on ARM64. This patch converts the rest of use cases to new
accessors and completely removes vcpu_reg() on ARM64.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Pavel Fedin [Fri, 4 Dec 2015 12:03:13 +0000 (15:03 +0300)]
arm64: KVM: Correctly handle zero register in system register accesses
System register accesses also use zero register for Rt == 31, and
therefore using it will also result in getting SP value instead. This
patch makes them also using new accessors, introduced by the previous
patch. Since register value is no longer directly associated with storage
inside vCPU context structure, we introduce a dedicated storage for it in
struct sys_reg_params.
This refactor also gets rid of "massive hack" in kvm_handle_cp_64().
Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Pavel Fedin [Fri, 4 Dec 2015 12:03:12 +0000 (15:03 +0300)]
arm64: KVM: Remove const from struct sys_reg_params
Further rework is going to introduce a dedicated storage for transfer
register value in struct sys_reg_params. Before doing this we have to
remove 'const' modifiers from it in all accessor functions and their
callers.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Pavel Fedin [Fri, 4 Dec 2015 12:03:11 +0000 (15:03 +0300)]
arm64: KVM: Correctly handle zero register during MMIO
On ARM64 register index of 31 corresponds to both zero register and SP.
However, all memory access instructions, use ZR as transfer register. SP
is used only as a base register in indirect memory addressing, or by
register-register arithmetics, which cannot be trapped here.
Correct emulation is achieved by introducing new register accessor
functions, which can do special handling for reg_num == 31. These new
accessors intentionally do not rely on old vcpu_reg() on ARM64, because
it is to be removed. Since the affected code is shared by both ARM
flavours, implementations of these accessors are also added to ARM32 code.
This patch fixes setting MMIO register to a random value (actually SP)
instead of zero by something like:
*((volatile int *)reg) = 0;
compilers tend to generate "str wzr, [xx]" here
[Marc: Fixed 32bit splat]
Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* pm-domains:
PM / Domains: Fix bad of_node_put() in failure paths of genpd_dev_pm_attach()
PM / Domains: Validate cases of a non-bound driver in genpd governor
* pm-cpufreq:
cpufreq: use last policy after online for drivers with ->setpolicy
Eric Anholt [Tue, 1 Dec 2015 17:39:31 +0000 (09:39 -0800)]
PM / Domains: Fix bad of_node_put() in failure paths of genpd_dev_pm_attach()
It looks like these meant to be unreffing the
of_parse_phandle_with_args() node, since the error paths above it
don't do of_node_put. That function returns a new ref in pd_args.np,
though, not a new ref on dev->of_node. Also, it would have leaked the
ref in the success case.
Fixes "ERROR: Bad of_node_put()" on bcm2835 in the -EPROBE_DEFER case.
Fixes: aa42240ab254 (PM / Domains: Add generic OF-based PM domain look-up) Signed-off-by: Eric Anholt <eric@anholt.net> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Kevin Hilman <khilman@linaro.org> Cc: 3.18+ <stable@vger.kernel.org> # 3.18+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Thu, 3 Dec 2015 23:45:16 +0000 (15:45 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes from this series. The most important here is a
regression fix for an issue that some folks would hit in blk-merge.c,
and the NVMe queue depth limit for the screwed up Apple "nvme"
controller.
In more detail, this pull request contains:
- a set of fixes for null_blk, including a fix for a few corner cases
where we could hang the device. From Arianna and Paolo.
- lightnvm:
- A build improvement from Keith.
- Update the qemu pci id detection from Matias.
- Error handling fixes for leaks and other little fixes from
Sudip and Wenwei.
- fix from Eric where BLKRRPART would not return EBUSY for whole
device mounts, only when partitions were mounted.
- fix from Jan Kara, where EOF O_DIRECT reads would return
negatively.
- remove check for rq_mergeable() when checking limits for cloned
requests. The check doesn't make any sense. It's assuming that
since NOMERGE is set on the request that we don't have to
recalculate limits since the request didn't change, but that's not
true if the request has been redirected. From Hannes.
- correctly get the bio front segment value set for single segment
bio's, fixing a BUG() in blk-merge. From Ming"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: temporary fix for Apple controller reset
null_blk: change type of completion_nsec to unsigned long
null_blk: guarantee device restart in all irq modes
null_blk: set a separate timer for each command
blk-merge: fix computing bio->bi_seg_front_size in case of single segment
direct-io: Fix negative return from dio read beyond eof
block: Always check queue limits for cloned requests
lightnvm: missing nvm_lock acquire
lightnvm: unconverted ppa returned in get_bb_tbl
lightnvm: refactor and change vendor id for qemu
lightnvm: do device max sectors boundary check first
lightnvm: fix ioctl memory leaks
lightnvm: free memory when gennvm register fails
lightnvm: Simplify config when disabled
Return EBUSY from BLKRRPART for mounted whole-dev fs
Linus Torvalds [Thu, 3 Dec 2015 23:23:17 +0000 (15:23 -0800)]
Merge tag 'trace-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"During the merge window I added a new file that is used to filter
trace events on pids. It filters all events where only tasks with
their pid in that file exists. It also handles the sched_switch and
sched_wakeup trace events where the current task does not have its pid
in the file, but the task either being switched to or awaken does.
Unfortunately, I forgot about sched_wakeup_new and sched_waking. Both
of these tracepoints use the same class as the sched_wakeup
tracepoint, and they too should be included in what gets filtered by
the set_event_pid file"
* tag 'trace-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add sched_wakeup_new and sched_waking tracepoints for pid filter
David S. Miller [Thu, 3 Dec 2015 20:56:22 +0000 (15:56 -0500)]
Merge tag 'mac80211-for-davem-2015-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A small set of fixes for 4.4:
* fix scanning in mac80211 to not actively scan radar
channels (from Antonio)
* fix uninitialized variable in remain-on-channel that
could lead to treating frame TX as remain-on-channel
and not sending the frame at all
* remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it
was broken and needs more work, we'll enable it later
* fix call_rcu() induced use-after-reset/free in mesh
(that was suddenly causing issues in certain tests)
* always request block-ack window size 64 as we found
some APs will otherwise crash (really ...)
* fix P2P-Device teardown sequence to avoid restarting
with uninitialized data
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible to address another chip on same MDIO bus. The case is
correctly handled for media advertising. It is taken into account
only if mii_data->phy_id == phydev->addr. However, this condition
was missing for reset case.
Michael Chan [Wed, 2 Dec 2015 06:54:08 +0000 (01:54 -0500)]
bnxt_en: Setup uc_list mac filters after resetting the chip.
Call bnxt_cfg_rx_mode() in bnxt_init_chip() to setup uc_list and
mc_list mac address filters. Before the patch, uc_list is not
setup again after chip reset (such as ethtool ring size change)
and macvlans don't work any more after that.
Modify bnxt_cfg_rx_mode() to return error codes appropriately so
that the init chip sequence can detect any failures.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeffrey Huang [Wed, 2 Dec 2015 06:54:07 +0000 (01:54 -0500)]
bnxt_en: enforce proper storing of MAC address
For PF, the bp->pf.mac_addr always holds the permanent MAC
addr assigned by the HW. For VF, the bp->vf.mac_addr always
holds the administrator assigned VF MAC addr. The random
generated VF MAC addr should never get stored to bp->vf.mac_addr.
This way, when the VF wants to change the MAC address, we can tell
if the adminstrator has already set it and disallow the VF from
changing it.
v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeffrey Huang [Wed, 2 Dec 2015 06:54:06 +0000 (01:54 -0500)]
bnxt_en: Fixed incorrect implementation of ndo_set_mac_address
The existing ndo_set_mac_address only copies the new MAC addr
and didn't set the new MAC addr to the HW. The correct way is
to delete the existing default MAC filter from HW and add
the new one. Because of RFS filters are also dependent on the
default mac filter l2 context, the driver must go thru
close_nic() to delete the default MAC and RFS filters, then
open_nic() to set the default MAC address to HW.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: lpc_eth: remove irq > NR_IRQS check from probe()
If the driver is used on an ARM platform with SPARSE_IRQ defined,
semantics of NR_IRQS is different (minimal value of virtual irqs) and
by default it is set to 16, see arch/arm/include/asm/irq.h.
This value may be less than the actual number of virtual irqs, which
may break the driver initialization. The check removal allows to use
the driver on such a platform, and, if irq controller driver works
correctly, the check is not needed on legacy platforms.
Fixes a runtime problem:
lpc-eth 31060000.ethernet: error getting resources.
lpc_eth: lpc-eth: not found (-6).
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mq/mqprio have their own ways to report qlen/drops by folding stats on
all their queues, with appropriate locking.
A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
without proper locking : concurrent qdisc updates could corrupt the list
that qdisc_match_from_root() parses to find a qdisc given its handle.
Fix first problem adding a TCQ_F_NOPARENT qdisc flag that
qdisc_tree_decrease_qlen() can use to abort its tree traversal,
as soon as it meets a mq/mqprio qdisc children.
Second problem can be fixed by RCU protection.
Qdisc are already freed after RCU grace period, so qdisc_list_add() and
qdisc_list_del() simply have to use appropriate rcu list variants.
A future patch will add a per struct netdev_queue list anchor, so that
qdisc_tree_decrease_qlen() can have more efficient lookups.
Reported-by: Daniele Fucini <dfucini@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cwang@twopensource.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Tue, 1 Dec 2015 17:33:36 +0000 (18:33 +0100)]
openvswitch: fix hangup on vxlan/gre/geneve device deletion
Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
to the underlying tunnel device, but never released it when such
device is deleted.
Deleting the underlying device via the ip tool cause the kernel to
hangup in the netdev_wait_allrefs() loop.
This commit ensure that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport") Fixes: 6b001e682e90 ("openvswitch: Use Geneve device.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 1 Dec 2015 15:31:08 +0000 (16:31 +0100)]
ipv4: igmp: Allow removing groups from a removed interface
When a multicast group is joined on a socket, a struct ip_mc_socklist
is appended to the sockets mc_list containing information about the
joined group.
If the interface is hot unplugged, this entry becomes stale. Prior to
commit 52ad353a5344f ("igmp: fix the problem when mc leave group") it
was possible to remove the stale entry by performing a
IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on
the interface. However, this fix enforces that the interface must
still exist. Thus with time, the number of stale entries grows, until
sysctl_igmp_max_memberships is reached and then it is not possible to
join and more groups.
The previous patch fixes an issue where a IP_DROP_MEMBERSHIP is
performed without specifying the interface, either by ifindex or ip
address. However here we do supply one of these. So loosen the
restriction on device existence to only apply when the interface has
not been specified. This then restores the ability to clean up the
stale entries.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Fixes: 52ad353a5344f "(igmp: fix the problem when mc leave group") Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 1 Dec 2015 15:20:07 +0000 (07:20 -0800)]
ipv6: sctp: implement sctp_v6_destroy_sock()
Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets.
We need to call inet6_destroy_sock() to properly release
inet6 specific fields.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Shi [Mon, 30 Nov 2015 22:24:07 +0000 (14:24 -0800)]
arm64: bpf: add 'store immediate' instruction
aarch64 doesn't have native store immediate instruction, such operation
has to be implemented by the below instruction sequence:
Load immediate to register
Store register
Signed-off-by: Yang Shi <yang.shi@linaro.org> CC: Zi Shen Lim <zlim.lnx@gmail.com> CC: Xi Wang <xi.wang@gmail.com> Reviewed-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 3 Dec 2015 05:53:57 +0000 (21:53 -0800)]
ipv6: kill sk_dst_lock
While testing the np->opt RCU conversion, I found that UDP/IPv6 was
using a mixture of xchg() and sk_dst_lock to protect concurrent changes
to sk->sk_dst_cache, leading to possible corruptions and crashes.
ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
way to fix the mess is to remove sk_dst_lock completely, as we did for
IPv4.
__ip6_dst_store() and ip6_dst_store() share same implementation.
sk_setup_caps() being called with socket lock being held or not,
we have to use sk_dst_set() instead of __sk_dst_set()
Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
in ip6_dst_store() before the sk_setup_caps(sk, dst) call.
This is because ip6_dst_store() can be called from process context,
without any lock held.
As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
from another cpu doing a concurrent ip6_dst_store()
Doing the dst dereference before doing the install is needed to make
sure no use after free would trigger.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/neighbour: fix crash at dumping device-agnostic proxy entries
Proxy entries could have null pointer to net-device.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Fixes: 84920c1420e2 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2") Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Vyukov reported that the user could trigger a kernel warning by
using a large len value for getsockopt SCTP_GET_LOCAL_ADDRS, as that
value directly affects the value used as a kmalloc() parameter.
This patch thus switches the allocation flags from all user-controllable
kmalloc size to GFP_USER to put some more restrictions on it and also
disables the warn, as they are not necessary.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
sctp: convert sack_needed and sack_generation to bits
They don't need to be any bigger than that and with this we start a new
bitfield for tracking association runtime stuff, like zero window
situation.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 30 Nov 2015 03:37:57 +0000 (19:37 -0800)]
ipv6: add complete rcu protection around np->opt
This patch addresses multiple problems :
UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.
Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())
This patch adds full RCU protection to np->opt
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid never succeeding kmalloc with order >= MAX_ORDER check that
elem->value_size and computed elem_size are within limits for both hash and
array type maps.
Also add __GFP_NOWARN to kmalloc(value_size | elem_size) to avoid OOM warnings.
Note kmalloc(key_size) is highly unlikely to trigger OOM, since key_size <= 512,
so keep those kmalloc-s as-is.
Large value_size can cause integer overflows in elem_size and map.pages
formulas, so check for that as well.
Fixes: aaac3ba95e4c ("bpf: charge user for creation of BPF maps and programs") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Dec 2015 04:35:22 +0000 (23:35 -0500)]
Merge branch 'mvneta-fixes'
Marcin Wojtas says:
====================
Marvell Armada XP/370/38X Neta fixes
I'm sending v4 with corrected commit log of the last patch, in order to
avoid possible conflicts between the branches as suggested by Gregory
Clement.
Best regards,
Marcin Wojtas
Changes from v4:
* Correct commit log of patch 6/6
Changes from v2:
* Style fixes in patch updating mbus protection
* Remove redundant stable notifications except for patch 4/6
Changes from v1:
* update MBUS windows access protection register once, after whole loop
* add fixing value of MVNETA_RXQ_INTR_ENABLE_ALL_MASK
* add fixing error path for skb_build()
* add possibility of setting custom TX IP checksum limit in DT property
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 30 Nov 2015 12:27:46 +0000 (13:27 +0100)]
mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0
The Ethernet controller found in the Armada 38x SoC's family support
TCP/IP checksumming with frame sizes larger than 1600 bytes, however
only on port 0.
This commit enables it by setting 'tx-csum-limit' to 9800B in
'ethernet@70000' node.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 30 Nov 2015 12:27:45 +0000 (13:27 +0100)]
net: mvneta: enable setting custom TX IP checksum limit
Since Armada 38x SoC can support IP checksum for jumbo frames only on
a single port, it means that this feature should be enabled per-port,
rather than for the whole SoC.
This patch enables setting custom TX IP checksum limit by adding new
optional property to the mvneta device tree node. If not used, by
default 1600B is set for "marvell,armada-370-neta" and 9800B for other
strings, which ensures backward compatibility. Binding documentation
is updated accordingly.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 30 Nov 2015 12:27:44 +0000 (13:27 +0100)]
net: mvneta: fix error path for building skb
In the actual RX processing, there is same error path for both descriptor
ring refilling and building skb fails. This is not correct, because after
successful refill, the ring is already updated with newly allocated
buffer. Then, in case of build_skb() fail, hitherto code left the original
buffer unmapped.
This patch fixes above situation by swapping error check of skb build with
DMA-unmap of original buffer.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Acked-by: Simon Guinot <simon.guinot@sequanux.org> Cc: <stable@vger.kernel.org> # v4.2+
Fixes a84e32894191 ("net: mvneta: fix refilling for Rx DMA buffers") Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 30 Nov 2015 12:27:43 +0000 (13:27 +0100)]
net: mvneta: fix bit assignment for RX packet irq enable
A value originally defined in the driver was inappropriate. Even though
the ingress was somehow working, writing MVNETA_RXQ_INTR_ENABLE_ALL_MASK
to MVNETA_INTR_ENABLE didn't make any effect, because the bits [31:16]
are reserved and read-only.
This commit updates MVNETA_RXQ_INTR_ENABLE_ALL_MASK to be compliant with
the controller's documentation.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network
unit") Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 30 Nov 2015 12:27:42 +0000 (13:27 +0100)]
net: mvneta: fix bit assignment in MVNETA_RXQ_CONFIG_REG
MVNETA_RXQ_HW_BUF_ALLOC bit which controls enabling hardware buffer
allocation was mistakenly set as BIT(1). This commit fixes the assignment.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network
unit") Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 30 Nov 2015 12:27:41 +0000 (13:27 +0100)]
net: mvneta: add configuration for MBUS windows access protection
This commit adds missing configuration of MBUS windows access protection
in mvneta_conf_mbus_windows function - a dedicated variable for that
purpose remained there unused since v3.8 initial mvneta support. Because
of that the register contents were inherited from the bootloader.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network
unit") Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 3 Dec 2015 00:45:56 +0000 (16:45 -0800)]
Merge tag 'spi-fix-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"There's one fix for the core here, we weren't reinitialising the
actual transferred length in messages when they get reused which meant
that we'd just keep adding to the length if a message is reused. This
has limited impact since it's only used in error handling cases but
will really mess anything that tries to use it up when it triggers.
As ever there's a small collection of driver specific fixes too"
* tag 'spi-fix-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: bugfix: spi_message.transfer_length does not get reset
spi: pl022: handle EPROBE_DEFER for dma
spi: bcm63xx: use correct format string for printing a resource
spi: mediatek: single device does not require cs_gpios
spi: Add missing kerneldoc description for parameter
cpufreq: use last policy after online for drivers with ->setpolicy
For cpufreq drivers which use setpolicy interface, after offline->online
the policy is set to default. This can be reproduced by setting the
default policy of intel_pstate or longrun to ondemand and then change to
"performance". After offline and online, the setpolicy will be called with
the policy=ondemand.
For drivers using governors this condition is handled by storing
last_governor, during offline and restoring during online. The same should
be done for drivers using setpolicy interface. Storing last_policy during
offline and restoring during online.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the last change here, I neglected to update the cookie in one code
path: when a mgmt-tx has no real cookie sent to userspace as it doesn't
wait for a response, but is off-channel. The original code used the SKB
pointer as the cookie and always assigned the cookie to the TX SKB in
ieee80211_start_roc_work(), but my change turned this around and made
the code rely on a valid cookie being passed in.
Unfortunately, the off-channel no-wait TX path wasn't assigning one at
all, resulting in an uninitialized stack value being used. This wasn't
handed back to userspace as a cookie (since in the no-wait case there
isn't a cookie), but it was tested for non-zero to distinguish between
mgmt-tx and off-channel.
Fix this by assigning a dummy non-zero cookie unconditionally, and get
rid of a misleading comment and some dead code while at it. I'll clean
up the ACK SKB handling separately later.
Fixes: 3b79af973cf4 ("mac80211: stop using pointers as userspace cookies") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
DFS channels should not be actively scanned as we can't be sure
if we are allowed or not.
If the current channel is in the DFS band, active scan might be
performed after CSA, but we have no guarantee about other channels,
therefore it is safer to prevent active scanning at all.
Cc: stable@vger.kernel.org Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Eliad Peller [Tue, 17 Nov 2015 08:24:40 +0000 (10:24 +0200)]
mac80211: don't teardown sdata on sdata stop
Interfaces are being initialized (setup) on addition,
and torn down on removal.
However, p2p device is being torn down when stopped,
resulting in the next p2p start operation being done
on uninitialized interface.
Solve it by calling ieee80211_teardown_sdata() only
on interface removal (for the non-netdev case).
Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[squashed in fix to call teardown after unregister] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch series contains fixes for various issues observed
with BGX and NIC drivers.
Changes from v1:
- Fixed comment syle in the first patch of the series
- Removed 'Increase transmit queue length' patch from the series,
will recheck if it's a driver or system issue and resubmit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Wed, 2 Dec 2015 10:06:17 +0000 (15:36 +0530)]
net: thunderx: Enable BGX LMAC's RX/TX only after VF is up
Enable or disable BGX LMAC's RX/TX based on corresponding VF's
status. If otherwise, when multiple LMAC's physical link is up
then packets from all LMAC's whose corresponding VF is not yet
initialized will get forwarded to VF0. This is due to VNIC's default
configuration where CPI, RSSI e.t.c point to VF0/QSET0/RQ0.
This patch will prevent multiple copies of packets on VF0.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Wed, 2 Dec 2015 10:06:16 +0000 (15:36 +0530)]
net: thunderx: Switchon carrier only upon interface link up
Call netif_carrier_on() only if interface's link is up. Switching this on
upon IFF_UP by default, is causing issues with ethernet channel bonding
in LACP mode. Initial NETDEV_CHANGE notification was being skipped.
Also fixed some issues with link/speed/duplex reporting via ethtool.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>