There is a disagreement between drivers/tty/vt/keyboard.c and
drivers/input/input-leds.c with regard to what is a Scroll Lock LED
trigger name: input calls it "kbd-scrolllock", but vt calls it
"kbd-scrollock" (two l's).
This prevents Scroll Lock LED trigger from binding to this LED by default.
Since it is a scroLL Lock LED, this interface was introduced only about a
year ago and in an Internet search people seem to reference this trigger
only to set it to this LED let's simply rename it to "kbd-scrolllock".
Also, it looks like this was supposed to be changed before this code was
merged: https://lkml.org/lkml/2015/6/9/697 but it was done only on
the input side.
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a block device is closed while iterate_bdevs() is handling it, the
following NULL pointer dereference occurs because bdev->b_disk is NULL
in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
turn called by the mapping_cap_writeback_dirty() call in
__filemap_fdatawrite_range()):
The crash is easily reproducible by running the following command, if an
msleep(100) is inserted before the call to func() in iterate_devs():
while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done
Fix it by holding the bd_mutex across the func() call and only calling
func() if the bdev is opened.
Fixes: 5c0d6b60a0ba ("vfs: Create function for iterating over block devices") Reported-and-tested-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pm_runtime_autosuspend can take synchronous or asynchronous
paths, Because we are calling pm_runtime_mark_last_busy just before
this most of the cases it takes the asynchronous way. However,
when the FW or driver resets during already running runtime suspend,
the call will result in calling to the driver's rpm callback and results
in a deadlock on device_lock.
The simplest fix is to replace pm_runtime_autosuspend with
asynchronous pm_request_autosuspend.
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ast_get_dram_info() configures a window in order to access BMC memory.
A BMC register can be configured to disallow this, and if so, causes
an infinite loop in the ast driver which renders the system unusable.
Fix this by erroring out if an error is detected. On powerpc systems with
EEH, this leads to the device being fenced and the system continuing to
operate.
If vBIOS noFan bit is set, the fan table parameters in thermal controller
will not get initialized. The driver should avoid to use these uninitialized
parameter to do calculation. Otherwise, it may trigger divide 0 error.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hook up drm_compat_ioctl to support 32-bit userspace on 64-bit kernels.
It turns out that N2600 and N2800 comes with 64-bit enabled. We
previously assumed there where no such systems out there.
This avoids an issue that occurs when we're attempting to preempt multiple
channels simultaneously. HW seems to ignore preempt requests while it's
still processing a previous one, which, well, makes sense.
Fixes random "fifo: SCHED_ERROR 0d []" + GPCCS page faults during parallel
piglit runs on (at least) GM107.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
TTM was changed a while back to allow for pipelining of buffer moves, and
part of this was the removal of waiting for a BO to idle before calling
move(), placing the responsibility on the driver to do this if required.
That's all well and good, except, we make use of move_notify() to handle
mapping/unmapping from the GPU VMM as move() isn't called on all paths.
This commit adds a wait before unmapping from a VMM in move_notify(), to
prevent GPU page faults where a buffer is still being accessed.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Look for firmware files using the legacy ("nouveau/nvxx_fucxxxx") path
if they cannot be found in the new, "official" path. User setups were
broken by the switch, which is bad.
There are only 4 firmware files we may want to look up that way, so
hardcode them into the lookup function. All new firmware files should
use the standard "nvidia/<chip>/gr/" path.
Fixes: 8539b37acef7 ("drm/nouveau/gr: use NVIDIA-provided external firmwares") Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
GUI idle interrupts should be enabled only after we
have enabled coarse grain clock gating (CGCG). This
prevents GFX engine generating idle interrupt even
though CGCG is not completely enabled.
Most of the time this goes un-noticed, but on some
Stoney ASICs this results in GFX engine hang after
system resumes from suspend. The issue is not
particular to Stoney though and could have occured
on any ASIC. The patch fixes this issue.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reported-by: Sunil Uttarwar <Sunil.Uttarwar1@amd.com> Signed-off-by: Arindam Nath <arindam.nath@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We were storing viewport relative coordinates. However, crtc_cursor_set2
and cursor_reset pass amdgpu_crtc->cursor_x/y as the x/y parameters of
cursor_move_locked, which would break if the CRTC isn't located at
(0, 0).
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The HP Pavilion dv6 has a non-working acpi_video0 backlight interface
and an intel_backlight interface which works fine. Add a force_native
quirk for it so that the non-working acpi_video0 interface does not get
registered.
Note that there are quite a few HP Pavilion dv6 variants, some
woth ATI and some with NVIDIA hybrid gfx, both seem to need this
quirk to have working backlight control. There are also some versions
with only Intel integrated gfx, these may not need this quirk, but it
should not hurt there.
The Dell XPS 17 L702X has a non-working acpi_video0 backlight interface
and an intel_backlight interface which works fine. Add a force_native
quirk for it so that the non-working acpi_video0 interface does not get
registered.
Note that there also is an issue with the brightnesskeys on this laptop,
they do not generate key-press events in anyway. That is not solved by
this patch.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1123661 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 0557344e2149 ("staging: comedi: ni_mio_common: fix local var for
32-bit read") changed the type of local variable `d` from `unsigned
short` to `unsigned int` to fix a bug introduced in
commit 9c340ac934db ("staging: comedi: ni_stc.h: add read/write
callbacks to struct ni_private") when reading AI data for NI PCI-6110
and PCI-6111 cards. Unfortunately, other parts of the function rely on
the variable being `unsigned short` when an offset value in local
variable `signbits` is added to `d` before writing the value to the
`data` array:
d += signbits;
data[n] = d;
The `signbits` variable will be non-zero in bipolar mode, and is used to
convert the hardware's 2's complement, 16-bit numbers to Comedi's
straight binary sample format (with 0 representing the most negative
voltage). This breaks because `d` is now 32 bits wide instead of 16
bits wide, so after the addition of `signbits`, `data[n]` ends up being
set to values above 65536 for negative voltages. This affects all
supported "E series" cards except PCI-6143 (and PXI-6143). Fix it by
ANDing the value written to the `data[n]` with the mask 0xffff.
Fixes: 0557344e2149 ("staging: comedi: ni_mio_common: fix local var for 32-bit read") Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For NI M Series cards, the Comedi `insn_read` handler for the AI
subdevice is broken due to ANDing the value read from the AI FIFO data
register with an incorrect mask. The incorrect mask clears all but the
most significant bit of the sample data. It should preserve all the
sample data bits. Correct it.
Fixes: 817144ae7fda ("staging: comedi: ni_mio_common: remove unnecessary use of 'board->adbits'") Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
James Simmons reports:
> The ldlm_pool field pl_recalc_time is set to the current
> monotonic clock value but the interval period is calculated
> with the wall clock. This means the interval period will
> always be far larger than the pl_recalc_period, which is
> just a small interval time period. The correct thing to
> do is to use monotomic clock current value instead of the
> wall clocks value when calculating recalc_interval_sec.
This broke when I converted the 32-bit get_seconds() into
ktime_get_{real_,}seconds() inconsistently. Either
one of those two would have worked, but mixing them
does not.
Staying with the original intention of the patch, this
changes the ktime_get_seconds() calls into ktime_get_real_seconds(),
using real time instead of mononic time.
Fixes: 8f83409cf238 ("staging/lustre: use 64-bit time for pl_recalc") Reported-by: James Simmons <jsimmons@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: James Simmons <jsimmons@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I have been having a lot of unexplainable crashes in osc_lru_shrink
lately that I could not see a good explanation for and then I found
this patch that slip under the radar somehow that incorrectly
converted while loop for lru list iteration into
list_for_each_entry_safe totally ignoring that in the body of
the loop we drop spinlocks guarding this list and move list entries
around.
Not sure why it was not showing up right away, perhaps some of the
more recent LRU changes committed caused some extra pressure on this
code that finally highlighted the breakage.
"kernel BUG at drivers/hv/channel_mgmt.c:350!" is observed when hv_vmbus
module is unloaded. BUG_ON() was introduced in commit 85d9aa705184
("Drivers: hv: vmbus: add an API vmbus_hvsock_device_unregister()") as
vmbus_free_channels() codepath was apparently forgotten.
Fixes: 85d9aa705184 ("Drivers: hv: vmbus: add an API vmbus_hvsock_device_unregister()") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In docutils 0.13, the return type of get_column_widths method of the
Table directive has changed [1], which breaks our flat-table directive
and leads to a TypeError when trying to build the docs [2].
This patch adds support for the new return type, while keeping support
for older docutils versions too.
In the critical sysfs entry the thermal hwmon was returning wrong
temperature to the user-space. It was reporting the temperature of the
first trip point instead of the temperature of critical trip point.
For example:
/sys/class/hwmon/hwmon0/temp1_crit:50000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:50000
/sys/class/thermal/thermal_zone0/trip_point_0_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:120000
/sys/class/thermal/thermal_zone0/trip_point_3_type:critical
Since commit e68b16abd91d ("thermal: add hwmon sysfs I/F") the driver
have been registering a sysfs entry if get_crit_temp() callback was
provided. However when accessed, it was calling get_trip_temp() instead
of the get_crit_temp().
bcm2835_pll_divider_off() is resetting the divider field in the A2W reg
to zero when disabling the clock.
Make sure we preserve this value by reading the previous a2w_reg value
first and ORing the result with A2W_PLL_CHANNEL_DISABLE.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Fixes: 41691b8862e2 ("clk: bcm2835: Add support for programming the audio domain clocks") Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add the VDD_GPU regulator (a GPIO-enabled PWM regulator) to the Jetson
TX1 board. This addition allows the GPU to be used provided the
bootloader properly enabled the GPU node.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
[as pointed out by Thierry on IRC, nobody has reported a bug
in the field, but using a new bootloader with a .dtb that
has the incorrect data, it will crash on boot] Fixes: 336f79c7b6d7 ("arm64: tegra: Add NVIDIA Jetson TX1 Developer Kit support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The GPIO chardev is used for management tasks (allocating line and event
handles) and does neither support read() nor write() operations. Hence it
does not make much sense to allow seek operations.
Currently the chardev uses noop_llseek() for its seek implementation. This
function does not move the pointer and simply returns the current position
(always 0 for the GPIO chardev). noop_llseek() is primarily meant for
devices that can not support seek, but where there might be a user that
depends on the seek() operation succeeding. For newly added devices that
can not support seek operations it is recommended to use no_llseek(), which
will return an error. For more information see commit 6038f373a3dc
("llseek: automatically add .llseek fop").
Unfortunately this was overlooked when the GPIO chardev ABI was introduced.
But it is highly unlikely that since then userspace applications have
appeared that rely on being able to perform non-failing seek operations on
a GPIO chardev file descriptor. So it should be safe to change from
noop_llseel() to no_seek(). Also use nonseekable_open() in the chardev
open() callback to clear the FMODE_SEEK, FMODE_PREAD and FMODE_PWRITE flags
from the file. Neither of these should be set on a file that does not
support seek operations.
Fixes: 3c702e9987e2 ("gpio: add a userspace chardev ABI for GPIOs") Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 43db289d00c6 ("gpio: stmpe: Rework registers access")
reworked the STMPE register access so as to use
[STMPE_IDX_*_LSB + i] to access the 8bit register for a
certain bank, assuming the CSB and MSB will follow after
the enumerator. For this to work the index needs to go from
(size-1) to 0 not 0 to (size-1).
However for the GPIO IRQ handler, the status registers we read
register MSB + 3 bytes ahead for the 24 bit GPIOs and index
registers from MSB upwards and run an index i over the
registers UNLESS we are STMPE1600.
This is not working when we get to clearing the interrupt
EDGE status register STMPE_IDX_GPEDR_[LCM]SB: it is indexed
like all other registers [STMPE_IDX_*_LSB + i] but in this
loop we index from 0 to get the right bank index for the
calculations, and we need to just add i to the MSB.
Before this, interrupts on the STMPE2401 were broken, this
patch fixes it so it works again.
The clocksource delta to nanoseconds conversion is using signed math, but
the delta is unsigned. This makes the conversion space smaller than
necessary and in case of a multiplication overflow the conversion can
become negative. The conversion is done with scaled math:
Shifting a signed integer right obvioulsy preserves the sign, which has
interesting consequences:
- Time jumps backwards
- __iter_div_u64_rem() which is used in one of the calling code pathes
will take forever to piecewise calculate the seconds/nanoseconds part.
This has been reported by several people with different scenarios:
David observed that when stopping a VM with a debugger:
"It was essentially the stopped by debugger case. I forget exactly why,
but the guest was being explicitly stopped from outside, it wasn't just
scheduling lag. I think it was something in the vicinity of 10 minutes
stopped."
When lifting the stop the machine went dead.
The stopped by debugger case is not really interesting, but nevertheless it
would be a good thing not to die completely.
But this was also observed on a live system by Liav:
"When the OS is too overloaded, delta will get a high enough value for the
msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
after the shift the nsec variable will gain a value similar to
0xffffffffff000000."
Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8
("time: Add cycles to nanoseconds translation"). It had been fixed a year
ago already in commit 35a4933a8959 ("time: Avoid signed overflow in
timekeeping_get_ns()").
Though it's not surprising that the issue has been reintroduced because the
function itself and the whole call chain uses s64 for the result and the
propagation of it. The change in this recent commit is subtle:
s64 nsec;
- nsec = (d * m + n) >> s:
+ nsec = d * m + n;
+ nsec >>= s;
d being type of cycle_t adds another level of obfuscation.
This wouldn't have happened if the previous change to unsigned computation
would have made the 'nsec' variable u64 right away and a follow up patch
had cleaned up the whole call chain.
There have been patches submitted which basically did a revert of the above
patch leaving everything else unchanged as signed. Back to square one. This
spawned a admittedly pointless discussion about potential users which rely
on the unsigned behaviour until someone pointed out that it had been fixed
before. The changelogs of said patches added further confusion as they made
finally false claims about the consequences for eventual users which expect
signed results.
Despite delta being cycle_t, aka. u64, it's very well possible to hand in
a signed negative value and the signed computation will happily return the
correct result. But nobody actually sat down and analyzed the code which
was added as user after the propably unintended signed conversion.
Though in sensitive code like this it's better to analyze it proper and
make sure that nothing relies on this than hunting the subtle wreckage half
a year later. After analyzing all call chains it stands that no caller can
hand in a negative value (which actually would work due to the s64 cast)
and rely on the signed math to do the right thing.
Change the conversion function to unsigned math. The conversion of all call
chains is done in a follow up patch.
This solves the starvation issue, which was caused by the negative result,
but it does not solve the underlying problem. It merily procrastinates
it. When the timekeeper update is deferred long enough that the unsigned
multiplication overflows, then time going backwards is observable again.
It does neither solve the issue of clocksources with a small counter width
which will wrap around possibly several times and cause random time stamps
to be generated. But those are usually not found on systems used for
virtualization, so this is likely a non issue.
I took the liberty to claim authorship for this simply because
analyzing all callsites and writing the changelog took substantially
more time than just making the simple s/s64/u64/ change and ignore the
rest.
Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation") Reported-by: David Gibson <david@gibson.dropbear.id.au> Reported-by: Liav Rehana <liavr@mellanox.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Parit Bhargava <prarit@redhat.com> Cc: Laurent Vivier <lvivier@redhat.com> Cc: "Christopher S. Hall" <christopher.s.hall@intel.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The mmc_read_ssr() function results in DMA to the raw_ssr member of
struct mmc_card, which is not guaranteed to be cache line aligned & thus
might not meet the requirements set out in Documentation/DMA-API.txt:
Warnings: Memory coherency operates at a granularity called the cache
line width. In order for memory mapped by this API to operate
correctly, the mapped region must begin exactly on a cache line
boundary and end exactly on one (to prevent two separately mapped
regions from sharing a single cache line). Since the cache line size
may not be known at compile time, the API will not enforce this
requirement. Therefore, it is recommended that driver writers who
don't take special care to determine the cache line size at run time
only map virtual regions that begin and end on page boundaries (which
are guaranteed also to be cache line boundaries).
On some systems where DMA is non-coherent this can lead to us losing
data that shares cache lines with the raw_ssr array.
Fix this by kmalloc'ing a temporary buffer to perform DMA into. kmalloc
will ensure the buffer is suitably aligned, allowing the DMA to be
performed without any loss of data.
Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: 5275a652d296 ("mmc: sd: Export SD Status via “ssr” device attribute") Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The regulator has never been properly enabled, it has been
dormant all the time. It's strange that MMC was working
at all, but it likely worked by the signals going through
the levelshifter and reaching the card anyways.
Fixes: 3615a34ea1a6 ("regulator: add STw481x VMMC driver") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Clearing the tuning bits should reset the tuning circuit. However there is
more to do. Reset the command and data lines for good measure, and then
for eMMC ensure the card is not still trying to process a tuning command by
sending a stop command.
Note the JEDEC eMMC specification says the stop command (CMD12) can be used
to stop a tuning command (CMD21) whereas the SD specification is silent on
the subject with respect to the SD tuning command (CMD19). Considering that
CMD12 is not a valid SDIO command, the stop command is sent only when the
tuning command is CMD21 i.e. for eMMC. That addresses cases seen so far
which have been on eMMC.
Note that this replaces the commit fe5fb2e3b58f ("mmc: sdhci: Reset cmd and
data circuits after tuning failure") which is being reverted for v4.9+.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Dan O'Donovan <dan@emutex.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Starting with commit d94a461d7a7d ("ath9k: use ieee80211_tx_status_noskb
where possible") the driver uses rcu_read_lock() && rcu_read_unlock(), yet on
returning early in ath_tx_edma_tasklet() the unlock is missing leading to stalls
and suspicious RCU usage:
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Fixes: d94a461d7a7d ("ath9k: use ieee80211_tx_status_noskb where possible") Acked-by: Felix Fietkau <nbd@nbd.name> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The active_high LED of my Wistron DNMA-92 is still being recognized as
active_low on 4.7.6 mainline. When I was preparing my former commit 0f9edcdd88a9 ("ath9k: Fix LED polarity for some Mini PCI AR9220 MB92
cards.") to fix that I must have somehow messed up with testing, because
I tested the final version of that patch before sending it, and it was
apparently working; but now it is not working on 4.7.6 mainline.
I initially added the PCI_DEVICE_SUB section for 0x0029/0x2096 above the
PCI_VDEVICE section for 0x0029; but then I moved the former below the
latter after seeing how 0x002A sections were sorted in the file.
This turned out to be wrong: if a generic PCI_VDEVICE entry (that has
both subvendor and subdevice IDs set to PCI_ANY_ID) is put before a more
specific one (PCI_DEVICE_SUB), then the generic PCI_VDEVICE entry will
match first and will be used.
With this patch, 0x0029/0x2096 has finally got active_high LED on 4.7.6.
While I'm at it, let's fix 0x002A too by also moving its generic definition
below its specific ones.
Fixes: 0f9edcdd88a9 ("ath9k: Fix LED polarity for some Mini PCI AR9220 MB92 cards.") Signed-off-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
[kvalo@qca.qualcomm.com: improve the commit log based on email discussions] Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b2d70d4944c1 ("ath9k: make GPIO API to support both of WMAC and
SOC") refactored ath9k_hw_gpio_get() to support both WMAC and SOC GPIOs,
changing the return on success from 1 to BIT(gpio). This broke some callers
like ath_is_rfkill_set(). This doesn't fix any known bug in mainline at the
moment, but should be fixed anyway.
Instead of fixing all callers, change ath9k_hw_gpio_get() back to only
return 0 or 1.
Fixes: b2d70d4944c1 ("ath9k: make GPIO API to support both of WMAC and SOC") Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
[kvalo@qca.qualcomm.com: mention that doesn't fix any known bug] Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When mac80211 abandons an association attempt, it may free
all the data structures, but inform cfg80211 and userspace
about it only by sending the deauth frame it received, in
which case cfg80211 has no link to the BSS struct that was
used and will not cfg80211_unhold_bss() it.
Fix this by providing a way to inform cfg80211 of this with
the BSS entry passed, so that it can clean up properly, and
use this ability in the appropriate places in mac80211.
This isn't ideal: some code is more or less duplicated and
tracing is missing. However, it's a fairly small change and
it's thus easier to backport - cleanups can come later.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The H2C MEDIA_STATUS_RPT command for some reason causes 8192eu and
8723bu devices not being able to reconnect.
Reported-by: Barry Day <briselec@gmail.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
'perf report --tui' exits with error when it finds a sample of zero
length symbol (i.e. addr == sym->start == sym->end). Actually these are
valid samples. Don't exit TUI and show report with such symbols.
Reported-and-Tested-by: Anton Blanchard <anton@samba.org> Link: https://lkml.org/lkml/2016/10/8/189 Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Riyder <chris.ryder@arm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1479804050-5028-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An earlier patch allowed enabling PT and LBR at the same
time on Goldmont. However it also allowed enabling BTS and LBR
at the same time, which is still not supported. Fix this by
bypassing the check only for PT.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: alexander.shishkin@intel.com Cc: kan.liang@intel.com Fixes: ccbebba4c6bf ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it") Link: http://lkml.kernel.org/r/20161209001417.4713-1-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit a5ffbe0a1993 ("rtlwifi: Fix scheduling while atomic bug") and
commit a269913c52ad ("rtlwifi: Rework rtl_lps_leave() and rtl_lps_enter()
to use work queue"), an error was introduced in the power-save routines
due to the fact that leaving PS was delayed by the use of a work queue.
This problem is fixed by detecting if the enter or leave routines are
in interrupt mode. If so, the workqueue is used to place the request.
If in normal mode, the enter or leave routines are called directly.
Fixes: a269913c52ad ("rtlwifi: Rework rtl_lps_leave() and rtl_lps_enter() to use work queue") Reported-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During firmware crash (or) user requested manual restart
the system gets into a soft lock up state because of the
below root cause.
During user requested hardware restart / firmware crash
the system goes into a soft lockup state as 'napi_synchronize'
is called after 'napi_disable' (which sets 'NAPI_STATE_SCHED'
bit) and it sleeps into infinite loop as it waits for
'NAPI_STATE_SCHED' to be cleared. This condition is hit because
'ath10k_hif_stop' is called twice as below (resulting in calling
'napi_synchronize' after 'napi_disable')
Fix this by calling 'ath10k_halt' in ath10k_core_restart itself
as it makes more sense before informing mac80211 to restart h/w
Also remove 'ath10k_halt' in ath10k_start for the state of 'restarting'
Fixes: 3c97f5de1f28 ("ath10k: implement NAPI support") Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When there is a CRC error in the SPROM read from the device, the code
attempts to handle a fallback SPROM. When this also fails, the driver
returns zero rather than an error code.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 4efca4ed ("kbuild: modversions for EXPORT_SYMBOL() for asm") adds
modversion support for symbols exported from asm files. Architectures
must include C-style declarations for those symbols in asm/asm-prototypes.h
in order for them to be versioned.
Add these declarations for x86, and an architecture-independent file that
can be used for common symbols.
With f27c2f6 reverting 8ab2ae6 ("default exported asm symbols to zero") we
produce a scary warning on x86, this commit fixes that.
Signed-off-by: Adam Borowski <kilobyte@angband.pl> Tested-by: Kalle Valo <kvalo@codeaurora.org> Acked-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Peter Wu <peter@lekensteyn.nl> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Michal Marek <mmarek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Both Debian and kernel archs are "arm64" but UTS_MACHINE and gcc say
"aarch64". Recognizing just the latter should be enough but let's
accept both in case something regresses again or an user sets
UTS_MACHINE=arm64.
Regressed in cfa88c7: arm64: Set UTS_MACHINE in the Makefile.
Signed-off-by: Adam Borowski <kilobyte@angband.pl> Acked-by: Riku Voipio <riku.voipio@linaro.org> Signed-off-by: Michal Marek <mmarek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xlog_recover_clear_agi_bucket didn't set the
type to XFS_BLFT_AGI_BUF, so we got a warning during log
replay (or an ASSERT on a debug build).
XFS (md0): Unknown buffer type 0!
XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1
Fix this, as was done in f19b872b for 2 other locations
with the same problem.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:
As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate. When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.
This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.
Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.
Thanks to dchinner for looking at this with me and spotting the
root cause.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function xen_guest_init is using __alloc_percpu with an alignment
which are not power of two.
However, the percpu allocator never supported alignments which are not power
of two and has always behaved incorectly in thise case.
Commit 3ca45a4 "percpu: ensure requested alignment is power of two"
introduced a check which trigger a warning [1] when booting linux-next
on Xen. But in reality this bug was always present.
This can be fixed by replacing the call to __alloc_percpu with
alloc_percpu. The latter will use an alignment which are a power of two.
Commit 9c17d96500f7 ("xen/gntdev: Grant maps should not be subject to
NUMA balancing") set VM_IO flag to prevent grant maps from being
subjected to NUMA balancing.
It was discovered recently that this flag causes get_user_pages() to
always fail with -EFAULT.
We've got a delay loop waiting for secondary CPUs. That loop uses
loops_per_jiffy. However, loops_per_jiffy doesn't actually mean how
many tight loops make up a jiffy on all architectures. It is quite
common to see things like this in the boot log:
Calibrating delay loop (skipped), value calculated using timer
frequency.. 48.00 BogoMIPS (lpj=24000)
In my case I was seeing lots of cases where other CPUs timed out
entering the debugger only to print their stack crawls shortly after the
kdb> prompt was written.
Elsewhere in kgdb we already use udelay(), so that should be safe enough
to use to implement our timeout. We'll delay 1 ms for 1000 times, which
should give us a full second of delay (just like the old code wanted)
but allow us to notice that we're done every 1 ms.
[akpm@linux-foundation.org: simplifications, per Daniel] Link: http://lkml.kernel.org/r/1477091361-2039-1-git-send-email-dianders@chromium.org Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Brian Norris <briannorris@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes a off-by-one in the "watchdog: qcom: add option for
standalone watchdog not in timer block" patch that causes the
following panic on boot:
Systemd on reboot enables shutdown watchdog that leaves the watchdog
device open to ensure that even if power down process get stuck the
platform reboots nonetheless.
The iamt_wdt is an alarm-only watchdog and can't reboot system, but the
FW will generate an alarm event reboot was completed in time, as the
watchdog is not automatically disabled during power cycle.
So we should request stop watchdog on reboot to eliminate wrong alarm
from the FW.
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NMI handler doesn't call set_irq_regs(), it's set only by normal IRQ.
Thus get_irq_regs() returns NULL or stale registers snapshot with IP/SP
pointing to the code interrupted by IRQ which was interrupted by NMI.
NULL isn't a problem: in this case watchdog calls dump_stack() and
prints full stack trace including NMI. But if we're stuck in IRQ
handler then NMI watchlog will print stack trace without IRQ part at
all.
This patch uses registers snapshot passed into NMI handler as arguments:
these registers point exactly to the instruction interrupted by NMI.
Fixes: 55537871ef66 ("kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup") Link: http://lkml.kernel.org/r/146771764784.86724.6006627197118544150.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With the current code it is possible to lock a mutex twice when
a subsequent reconnects are triggered. On the 1st reconnect we
reconnect sessions and tcons and then persistent file handles.
If the 2nd reconnect happens during the reconnecting of persistent
file handles then the following sequence of calls is observed:
So, we are trying to acquire the same cfile->fh_mutex twice which
is wrong. Fix this by moving reconnecting of persistent handles to
the delayed work (smb2_reconnect_server) and submitting this work
every time we reconnect tcon in SMB2 commands handling codepath.
This can also lead to corruption of a temporary file list in
cifs_reopen_persistent_file_handles() because we can recursively
call this function twice.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We can not unlock/lock cifs_tcp_ses_lock while walking through ses
and tcon lists because it can corrupt list iterator pointers and
a tcon structure can be released if we don't hold an extra reference.
Fix it by moving a reconnect process to a separate delayed work
and acquiring a reference to every tcon that needs to be reconnected.
Also do not send an echo request on newly established connections.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
smbencrypt() points a scatterlist to the stack, which is breaks if
CONFIG_VMAP_STACK=y.
Fix it by switching to crypto_cipher_encrypt_one(). The new code
should be considerably faster as an added benefit.
This code is nearly identical to some code that Eric Biggers
suggested.
Reported-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In dm_sm_metadata_create() we temporarily change the dm_space_map
operations from 'ops' (whose .destroy function deallocates the
sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't).
If dm_sm_metadata_create() fails in sm_ll_new_metadata() or
sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls
dm_sm_destroy() with the intention of freeing the sm_metadata, but it
doesn't (because the dm_space_map operations is still set to
'bootstrap_ops').
Fix this by setting the dm_space_map operations back to 'ops' if
dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit ecbfb9f118 ("dm raid: add raid level takeover support") moved the
configure_discard_support() call from raid_ctr() to raid_preresume().
Enabling/disabling discard _must_ happen during table load (through the
.ctr hook). Fix this regression by moving the
configure_discard_support() call back to raid_ctr().
It is required to hold the queue lock when calling blk_run_queue_async()
to avoid that a race between blk_run_queue_async() and
blk_cleanup_queue() is triggered.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In crypt_set_key(), if a failure occurs while replacing the old key
(e.g. tfm->setkey() fails) the key must not have DM_CRYPT_KEY_VALID flag
set. Otherwise, the crypto layer would have an invalid key that still
has DM_CRYPT_KEY_VALID flag set.
Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix to return error code -EINVAL instead of 0, as is done elsewhere in
this function.
Fixes: e80d1c805a3b ("dm: do not override error code returned from dm_get_device()") Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When dm_table_set_type() is used by a target to establish a DM table's
type (e.g. DM_TYPE_MQ_REQUEST_BASED in the case of DM multipath) the
DM core must go on to verify that the devices in the table are
compatible with the established type.
An earlier DM multipath table could have been build ontop of underlying
devices that were all using blk-mq. In that case, if that active
multipath table is replaced with an empty DM multipath table (that
reflects all paths have failed) then it is important that the
'all_blk_mq' state of the active table is transfered to the new empty DM
table. Otherwise dm-rq.c:dm_old_prep_tio() will incorrectly clone a
request that isn't needed by the DM multipath target when it is to issue
IO to an underlying blk-mq device.
Fixes: e83068a5 ("dm mpath: add optional "queue_mode" feature") Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The meaning of the BLK_MQ_S_STOPPED flag is "do not call
.queue_rq()". Hence modify blk_mq_make_request() such that requests
are queued instead of issued if a queue has been stopped.
Reported-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Joonyoung Shim reported an interesting problem on his ARM octa-core
Odoroid-XU3 platform. During system suspend, dev_pm_opp_put_regulator()
was failing for a struct device for which dev_pm_opp_set_regulator() is
called earlier.
This happened because an earlier call to
dev_pm_opp_of_cpumask_remove_table() function (from cpufreq-dt.c file)
removed all the entries from opp_table->dev_list apart from the last CPU
device in the cpumask of CPUs sharing the OPP.
But both dev_pm_opp_set_regulator() and dev_pm_opp_put_regulator()
routines get CPU device for the first CPU in the cpumask. And so the OPP
core failed to find the OPP table for the struct device.
This patch attempts to fix this problem by returning a pointer to the
opp_table from dev_pm_opp_set_regulator() and using that as the
parameter to dev_pm_opp_put_regulator(). This ensures that the
dev_pm_opp_put_regulator() doesn't fail to find the opp table.
Note that similar design problem also exists with other
dev_pm_opp_put_*() APIs, but those aren't used currently by anyone and
so we don't need to update them for now.
Reported-by: Joonyoung Shim <jy0922.shim@samsung.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ Viresh: Wrote commit log and tested on exynos 5250 ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ep->mult is supposed to be set to Isochronous and
Interrupt Endapoint's multiplier value. This value
is computed from different places depending on the
link speed.
If we're dealing with HighSpeed, then it's part of
bits [12:11] of wMaxPacketSize. This case wasn't
taken into consideration before.
While at that, also make sure the ep->mult defaults
to one so drivers can use it unconditionally and
assume they'll never multiply ep->maxpacket to zero.
Cc: <stable@vger.kernel.org> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vlastimil Babka pointed out that commit 479f854a207c ("mm, page_alloc:
defer debugging checks of pages allocated from the PCP") will allow the
per-cpu list counter to be out of sync with the per-cpu list contents if
a struct page is corrupted.
The consequence is an infinite loop if the per-cpu lists get fully
drained by free_pcppages_bulk because all the lists are empty but the
count is positive. The infinite loop occurs here
do {
batch_free++;
if (++migratetype == MIGRATE_PCPTYPES)
migratetype = 0;
list = &pcp->lists[migratetype];
} while (list_empty(list));
What the user sees is a bad page warning followed by a soft lockup with
interrupts disabled in free_pcppages_bulk().
This patch keeps the accounting in sync.
Fixes: 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP") Link: http://lkml.kernel.org/r/20161202112951.23346-2-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Our system uses significantly more slab memory with memcg enabled with
the latest kernel. With 3.10 kernel, slab uses 2G memory, while with
4.6 kernel, 6G memory is used. The shrinker has problem. Let's see we
have two memcg for one shrinker. In do_shrink_slab:
1. Check cg1. nr_deferred = 0, assume total_scan = 700. batch size
is 1024, then no memory is freed. nr_deferred = 700
2. Check cg2. nr_deferred = 700. Assume freeable = 20, then
total_scan = 10 or 40. Let's assume it's 10. No memory is freed.
nr_deferred = 10.
The deferred share of cg1 is lost in this case. kswapd will free no
memory even run above steps again and again.
The fix makes sure one memcg's deferred share isn't lost.
Link: http://lkml.kernel.org/r/2414be961b5d25892060315fbb56bb19d81d0c07.1476227351.git.shli@fb.com Signed-off-by: Shaohua Li <shli@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When removing a namespace we delete it from the subsystem namespaces
list with list_del_init which allows us to know if it is enabled or
not.
The problem is that list_del_init initialize the list next and does
not respect the RCU list-traversal we do on the IO path for locating
a namespace. Instead we need to use list_del_rcu which is allowed to
run concurrently with the _rcu list-traversal primitives (keeps list
next intact) and guarantees concurrent nvmet_find_naespace forward
progress.
By changing that, we cannot rely on ns->dev_link for knowing if the
namspace is enabled, so add enabled indicator entry to nvmet_ns for
that.
In the last ilen case, i was already increased, resulting in accessing out-
of-boundary entry of do_replace and blkaddr.
Fix to check ilen first to exit the loop.
Fixes: 2aa8fbb9693020 ("f2fs: refactor __exchange_data_block for speed up") Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The struct file_operations instance serving the f2fs/status debugfs file
lacks an initialization of its ->owner.
This means that although that file might have been opened, the f2fs module
can still get removed. Any further operation on that opened file, releasing
included, will cause accesses to unmapped memory.
Indeed, Mike Marshall reported the following:
BUG: unable to handle kernel paging request at ffffffffa0307430
IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90
<...>
Call Trace:
[] __fput+0xdf/0x1d0
[] ____fput+0xe/0x10
[] task_work_run+0x8e/0xc0
[] do_exit+0x2ae/0xae0
[] ? __audit_syscall_entry+0xae/0x100
[] ? syscall_trace_enter+0x1ca/0x310
[] do_group_exit+0x44/0xc0
[] SyS_exit_group+0x14/0x20
[] do_syscall_64+0x61/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25
<...>
---[ end trace f22ae883fa3ea6b8 ]---
Fixing recursive fault but reboot is needed!
Fix this by initializing the f2fs/status file_operations' ->owner with
THIS_MODULE.
This will allow debugfs to grab a reference to the f2fs module upon any
open on that file, thus preventing it from getting removed.
Fixes: 902829aa0b72 ("f2fs: move proc files to debugfs") Reported-by: Mike Marshall <hubcap@omnibond.com> Reported-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently data journalling is incompatible with encryption: enabling both
at the same time has never been supported by design, and would result in
unpredictable behavior. However, users are not precluded from turning on
both features simultaneously. This change programmatically replaces data
journaling for encrypted regular files with ordered data journaling mode.
Background:
Journaling encrypted data has not been supported because it operates on
buffer heads of the page in the page cache. Namely, when the commit
happens, which could be up to five seconds after caching, the commit
thread uses the buffer heads attached to the page to copy the contents of
the page to the journal. With encryption, it would have been required to
keep the bounce buffer with ciphertext for up to the aforementioned five
seconds, since the page cache can only hold plaintext and could not be
used for journaling. Alternatively, it would be required to setup the
journal to initiate a callback at the commit time to perform deferred
encryption - in this case, not only would the data have to be written
twice, but it would also have to be encrypted twice. This level of
complexity was not justified for a mode that in practice is very rarely
used because of the overhead from the data journalling.
Solution:
If data=journaled has been set as a mount option for a filesystem, or if
journaling is enabled on a regular file, do not perform journaling if the
file is also encrypted, instead fall back to the data=ordered mode for the
file.
Rationale:
The intent is to allow seamless and proper filesystem operation when
journaling and encryption have both been enabled, and have these two
conflicting features gracefully resolved by the filesystem.
Fixes: 67cf5b09a46f ("ext4: add the basic function for inline data support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit "ext4: sanity check the block and cluster size at mount
time" should prevent any problems, but in case the superblock is
modified while the file system is mounted, add an extra safety check
to make sure we won't overrun the allocated buffer.
Fix a large number of problems with how we handle mount options in the
superblock. For one, if the string in the superblock is long enough
that it is not null terminated, we could run off the end of the string
and try to interpret superblocks fields as characters. It's unlikely
this will cause a security problem, but it could result in an invalid
parse. Also, parse_options is destructive to the string, so in some
cases if there is a comma-separated string, it would be modified in
the superblock. (Fortunately it only happens on file systems with a
1k block size.)
The number of 'counters' elements needed in 'struct sg' is
super_block->s_blocksize_bits + 2. Presently we have 16 'counters'
elements in the array. This is insufficient for block sizes >= 32k. In
such cases the memcpy operation performed in ext4_mb_seq_groups_show()
would cause stack memory corruption.
Fixes: c9de560ded61f Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
'border' variable is set to a value of 2 times the block size of the
underlying filesystem. With 64k block size, the resulting value won't
fit into a 16-bit variable. Hence this commit changes the data type of
'border' to 'unsigned int'.