Javier González [Fri, 6 May 2016 18:03:21 +0000 (20:03 +0200)]
lightnvm: reserved space calculation incorrect
The nvm_dev->max_pages_per_blk variable was removed in favor of the new
nvm->sec_per_blk variable. The ->max_pages_per_blk variable was still
used in rrpc_capacity, reporting the reserved capacity to zero. Replace
with ->sec_per_blk to calculate the reserved area again.
Signed-off-by: Javier González <javier@cnexlabs.com>
Updated patch description. Was "lightnvm: eliminate redundant variable" Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
Javier González [Fri, 6 May 2016 18:03:20 +0000 (20:03 +0200)]
lightnvm: rename nr_pages to nr_ppas on nvm_rq
The number of ppas contained on a request is not necessarily the number
of pages that it maps to neither on the target nor on the device side.
In order to avoid confusion, rename nr_pages to nr_ppas since it is what
the variable actually contains.
Matias Bjørling [Fri, 6 May 2016 18:03:19 +0000 (20:03 +0200)]
lightnvm: add is_cached entry to struct ppa_addr
A target requires a method to identify PPAs that are either cached in
memory or on disk. This can efficiently be maintained within the PPA.
The target host-side translation table can then lookup a PPA and know
from the PPA if it is cached or on disk. In the case it is cached, it is
the responsibility of the target to maintain this cache.
Matias Bjørling [Fri, 6 May 2016 18:03:18 +0000 (20:03 +0200)]
lightnvm: expose gennvm_mark_blk to targets
Targets can update a block state when having a reference to an
in-memory virtual block. In the case that a target does not keep the
block metadata in memory, it does not have a way to update this
structure.
Therefore, expose gennvm_mark_blk() through the media managers
->mark_blk() callback and let targets update the state structure through
this callback.
Matias Bjørling [Fri, 6 May 2016 18:03:17 +0000 (20:03 +0200)]
lightnvm: remove mgt targets on mgt removal
Targets associated with a device manager are not freed on device
removal. They have to be manually removed before shutdown. Make sure
any outstanding targets are freed upon shutdown.
Arnd Bergmann [Fri, 6 May 2016 18:03:16 +0000 (20:03 +0200)]
lightnvm: pass dma address to hardware rather than pointer
A recent change to lightnvm added code to pass a kernel pointer
to the hardware, which gcc complained about:
drivers/nvme/host/lightnvm.c: In function 'nvme_nvm_rqtocmd':
drivers/nvme/host/lightnvm.c:472:32: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
c->ph_rw.metadata = cpu_to_le64(rqd->meta_list);
It looks like this has no way of working anyway, so this changes
the code to pass the dma_address instead. This was most likely
what was intended here. Neither of the two are currently ever
written to, so the effect is the same for now.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: a34b1eb78e21 ("lightnvm: enable metadata to be sent to device") Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
Javier González [Fri, 6 May 2016 18:03:15 +0000 (20:03 +0200)]
lightnvm: do not assume sequential lun alloc.
When doing GC, rrpc calculates the physical LUN to which the rrpc block
belongs too. This calculation is based on the assumption that LUNs are
assigned sequentially to the LUN list. Use the reference to the LUN
instead. This saves us the calculation and allows us to align LUNs in a
different manner to, for example, take advantage of devide parallelism.
Javier González [Fri, 6 May 2016 18:03:13 +0000 (20:03 +0200)]
lightnvm: rename dma helper functions
Until now, the dma pool have been exclusively used to allocate the ppa
list being sent to the device. In pblk (upcoming), we use these pools to
allocate metadata too. Thus, we generalize the names of some variables
on the dma helper functions to make the code more readable.
Javier González [Fri, 6 May 2016 18:03:12 +0000 (20:03 +0200)]
lightnvm: enable metadata to be sent to device
Enable metadata buffer to be sent to the device through the metadata
field on the physical rw nvme command. The size of the metadata buffer
must follow dev->oob_size * # of PPAs.
Matias Bjørling [Fri, 6 May 2016 18:03:10 +0000 (20:03 +0200)]
lightnvm: fix out of bound ppa lun id on bb tbl
The ppa configured for retrieving the bad block table uses the internal
lun id to setup the get bad block ppa. This increases monotonically
with the number luns available. When configuring a ppa, the channel and
lun must be specified separately, leading to an out of bound memory
access in gennvm_block_bb when lun id goes beyond the luns available
within a channel.
Additional, remove out of bound check in gennvm_block_bb(), as it was a
buggy to begin with.
Matias Bjørling [Fri, 6 May 2016 18:03:09 +0000 (20:03 +0200)]
lightnvm: refactor set_bb_tbl for accepting ppa list
The set_bb_tbl takes struct nvm_rq and only uses its ppa_list and
nr_pages internally. Instead, make these two variables explicit.
This allows a user to call it without initializing a struct nvm_rq
first.
Matias Bjørling [Fri, 6 May 2016 18:03:08 +0000 (20:03 +0200)]
lightnvm: move responsibility for bad blk mgmt to target
We move the responsibility of managing the persistent bad block table to
the target. The target may choose to mark a block bad or retry writing
to it. Never the less, it should be the target that makes the decision
and not the media manager.
Matias Bjørling [Fri, 6 May 2016 18:03:07 +0000 (20:03 +0200)]
lightnvm: make nvm_set_rqd_ppalist() aware of vblks
A virtual block enables a block to identify multiple physical blocks.
This is useful for metadata where a device media supports multiple
planes. In that case, a block, with multiple planes can be managed
as a single vblk. Reducing the metadata required by one forth.
nvm_set_rqd_ppalist() takes care of expanding a ppa_list with vblks
automatically. However, for some use-cases, where only a single physical
block is required, the ppa_list should not be expanded.
Therefore, add a vblk parameter to nvm_set_rqd_ppalist(), and only
expand the ppa_list if vblk is set.
Matias Bjørling [Fri, 6 May 2016 18:03:05 +0000 (20:03 +0200)]
lightnvm: refactor device ops->get_bb_tbl()
The device ops->get_bb_tbl() takes a callback, that allows the caller
to use its own callback function to update its data structures in the
returning function.
This makes it difficult to send parameters to the callback, and usually
is circumvented by small private structures, that both carry the callers
state and any flags needed to fulfill the update.
Refactor ops->get_bb_tbl() to fill a data buffer with the status of the
blocks returned, and let the user call the callback function manually.
That will provide the necessary flags and data structures and simplify
the logic around ops->get_bb_tbl().
Matias Bjørling [Fri, 6 May 2016 18:03:04 +0000 (20:03 +0200)]
lightnvm: introduce nvm_for_each_lun_ppa() macro
Users that wish to iterate all luns on a device. Must create a
struct ppa_addr and separate iterators for channels and luns. To set the
iterators, two loops are required, one to iterate channels, and another
to iterate luns. This leads to decrease in readability.
Introduce nvm_for_each_lun_ppa, which implements the nested loop and
sets ppa, channel, and lun variable for each loop body, eliminating
the boilerplate code.
lightnvm: refactor dev->online_target to global nvm_targets
A target name must be unique. However, a per-device registration of
targets is maintained on a dev->online_targets list, with a per-device
search for targets upon registration.
This results in a name collision when two targets, with the same name,
are created on two different targets, where the per-device list is not
shared.
Signed-off-by: Simon A. F. Lund <slund@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
The functions nvm_register_target(), nvm_unregister_target() and
associated list refers to a target type that is being registered by a
target type module. Rename nvm_*_targets() to nvm_*_tgt_type(), so that
the intension is clear.
This enables target instances to use the _nvm_*_targets() naming.
Signed-off-by: Simon A. F. Lund <slund@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
Wenwei Tao [Fri, 6 May 2016 18:03:01 +0000 (20:03 +0200)]
lightnvm: store rrpc->soffset in device sector size
Since we mainly use soffset in device sector size, we therefore store
this value in rrpc->soffset, instead of the offset in 512byte sector
size. This eliminates the "(ilog2(dev->sec_size) - 9)" calculation on
each I/O.
Wenwei Tao [Fri, 6 May 2016 18:03:00 +0000 (20:03 +0200)]
lightnvm: calculate rrpc total blocks and sectors up front
Calculate rrpc total blocks and sectors up front, make sense
to use them. For example, we use rrpc->nr_sects to calculate rrpc
area size, but it makes no sense if we don't initialize it up front,
since it would be zero until we finish rrpc luns init.
Matias Bjørling [Fri, 6 May 2016 18:02:58 +0000 (20:02 +0200)]
lightnvm: move block fold outside of get_bb_tbl()
The get block table command returns a list of blocks and planes
with their associated state. Users, such as gennvm and sysblk,
manages all planes as a single virtual block.
It was therefore natural to fold the bad block list before it is
returned. However, to allow users, which manages on a per-plane
block level, to also use the interface, the get_bb_tbl interface is
changed to not fold by default and instead let the caller fold if
necessary.
Matias Bjørling [Fri, 6 May 2016 18:02:57 +0000 (20:02 +0200)]
lightnvm: add fpg_size and pfpg_size to struct nvm_dev
The flash page size (fpg) and size across planes (pfpg) are convenient
to know when allocating buffer sizes. This has previously been a
calculated in various places. Replace with the pre-calculated values.
Matias Bjørling [Fri, 6 May 2016 18:02:56 +0000 (20:02 +0200)]
lightnvm: implement nvm_submit_ppa_list
The nvm_submit_ppa function assumes that users manage all plane
blocks as a single block. Extend the API with nvm_submit_ppa_list
to allow the user to send its own ppa list. If the user submits more
than a single PPA, the user must take care to allocate and free
the corresponding ppa list.
Matias Bjørling [Fri, 6 May 2016 18:02:55 +0000 (20:02 +0200)]
lightnvm: handle submit_io failure
The device ->submit_io() callback might fail to submit I/O to device.
In that case, the nvm_submit_ppa function should not wait for
completion. Instead return the ->submit_io() error.
Keith Busch [Fri, 8 Apr 2016 22:11:02 +0000 (16:11 -0600)]
NVMe: Fix reset/remove race
This fixes a scenario where device is present and being reset, but a
request to unbind the driver occurs.
A previous patch series addressing a device failure removal scenario
flushed reset_work after controller disable to unblock reset_work waiting
on a completion that wouldn't occur. This isn't safe as-is. The broken
scenario can potentially be induced with:
modprobe nvme && modprobe -r nvme
To fix, the reset work is flushed immediately after setting the controller
removing flag, and any subsequent reset will not proceed with controller
initialization if the flag is set.
The controller status must be polled while active, so the watchdog timer
is also left active until the controller is disabled to cleanup requests
that may be stuck during namespace removal.
[Fixes: ff23a2a15a2117245b4599c1352343c8b8fb4c43] Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lin [Mon, 25 Apr 2016 21:20:19 +0000 (14:20 -0700)]
nvme: fix nvme_ns_remove() deadlock
On receipt of a namespace attribute changed AER, we acquire the
namespace mutex lock before proceeding to scan and validate the
namespace list. In case of namespace detach/delete command,
nvme_ns_remove function deadlocks trying to acquire the already held
lock.
All callers, except nvme_remove_namespaces(), of nvme_ns_remove()
already held namespaces_mutex. So we can simply fix the deadlock by
not acquiring the mutex in nvme_ns_remove() and acquiring it in
nvme_remove_namespaces().
Reported-by: Sunad Bhandary S <sunad.s@samsung.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimerg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lin [Mon, 25 Apr 2016 21:20:18 +0000 (14:20 -0700)]
nvme: switch to RCU freeing the namespace
Switch to RCU freeing the namespace structure so that
nvme_start_queues, nvme_stop_queues and nvme_kill_queues would
be able to get away with only a RCU read side critical section.
Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimerg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lin [Mon, 25 Apr 2016 21:33:20 +0000 (14:33 -0700)]
nvme: add helper nvme_cleanup_cmd()
This hides command cleanup into nvme.h and fabrics drivers will
also use it.
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
Move the scan work item and surrounding code to the common code. For now
we need a new finish_scan method to allow the PCI driver to set the
irq affinity hints, but I have plans in the works to obsolete this as well.
Note that this moves the namespace scanning from nvme_wq to the system
workqueue, but as we don't rely on namespace scanning to finish from reset
or I/O this should be fine.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
nvme: tighten up state check for namespace scanning
We only should be scanning namespaces if the controller is live. Currently
we call the function just before setting it live, so fix the code up to
move the call to nvme_queue_scan to just below the state change.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Keith Busch [Wed, 27 Apr 2016 21:51:18 +0000 (15:51 -0600)]
NVMe: Fix check_flush_dependency warning
If the controller fails and is degraded after a reset, we need to kill
off all requests queues before removing the inaccessble namespaces. This
will prevent del_gendisk from syncing dirty data, which we can't due
from a WQ_MEM_RECLAIM work queue.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
Controller IDs in NVMe are unsigned 16-bit types. In the Fabrics driver we
actually pass ctrl->id by reference, so we need it to have the correct type.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
Jeff Moyer [Tue, 26 Apr 2016 01:12:38 +0000 (19:12 -0600)]
skd: remove broken discard support
Simply creating a file system on an skd device, followed by mount and
fstrim will result in errors in the logs and then a BUG(). Let's remove
discard support from that driver. As far as I can tell, it hasn't
worked right since it was merged. This patch also has a side-effect of
cleaning up an unintentional shadowed declaration inside of
skd_end_request.
I tested to ensure that I can still do I/O to the device using xfstests
./check -g quick. I didn't do anything more extensive than that,
though.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
nvme: Avoid reset work on watchdog timer function during error recovery
This patch adds a check on nvme_watchdog_timer() function to avoid the
call to reset_work() when an error recovery process is ongoing on
controller. The check is made by looking at pci_channel_offline()
result.
If we don't check for this on nvme_watchdog_timer(), error recovery
mechanism can't recover well, because reset_work() won't be able to
do its job (since we're in the middle of an error) and so the
controller is removed from the system before error recovery mechanism
can perform slot reset (which would allow the adapter to recover).
In this patch we also have split the huge condition expression on
nvme_watchdog_timer() by introducing an auxiliary function to help
make the code more readable.
Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com>
block: add ability to flag write back caching on a device
Add an internal helper and flag for setting whether a queue has
write back caching, or write through (or none). Add a sysfs file
to show this as well, and make it changeable from user space.
This will replace the (awkward) blk_queue_flush() interface that
drivers currently use to inform the block layer of write cache state
and capabilities.
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Keith Busch [Tue, 12 Apr 2016 17:13:11 +0000 (11:13 -0600)]
NVMe: Skip async events for degraded controllers
If the controller is degraded, the driver should stay out of the way so
the user can recover the drive. This patch skips driver initiated async
event requests when the drive is in this state.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lin [Tue, 12 Apr 2016 19:10:14 +0000 (13:10 -0600)]
nvme: add helper nvme_setup_cmd()
This moves nvme_setup_{flush,discard,rw} calls into a common
nvme_setup_cmd() helper. So we can eventually hide all the command
setup in the core module and don't even need to update the fabrics
drivers for any specific command type.
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lin [Tue, 5 Apr 2016 17:32:04 +0000 (10:32 -0700)]
nvme: add missing lock nesting notation
When unloading driver, nvme_disable_io_queues() calls nvme_delete_queue()
that sends nvme_admin_delete_cq command to admin sq. So when the command
completed, the lock acquired by nvme_irq() actually belongs to admin queue.
While the lock that nvme_del_cq_end() trying to acquire belongs to io queue.
So it will not deadlock.
This patch adds lock nesting notation to fix following report.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
Keith Busch [Fri, 8 Apr 2016 22:09:10 +0000 (16:09 -0600)]
NVMe: Always use MSI/MSI-x interrupts
Multiple users have reported device initialization failure due the driver
not receiving legacy PCI interrupts. This is not unique to any particular
controller, but has been observed on multiple platforms.
There have been no issues reported or observed when with message signaled
interrupts, so this patch attempts to use MSI-x during initialization,
falling back to MSI. If that fails, legacy would become the default.
The setup_io_queues error handling had to change as a result: the admin
queue's msix_entry used to be initialized to the legacy IRQ. The case
where nr_io_queues is 0 would fail request_irq when setting up the admin
queue's interrupt since re-enabling MSI-x fails with 0 vectors, leaving
the admin queue's msix_entry invalid. Instead, return success immediately.
Reported-by: Tim Muhlemmer <muhlemmer@gmail.com> Reported-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Howard Cochran [Thu, 10 Mar 2016 06:12:39 +0000 (01:12 -0500)]
writeback: Fix performance regression in wb_over_bg_thresh()
Commit 947e9762a8dd ("writeback: update wb_over_bg_thresh() to use
wb_domain aware operations") unintentionally changed this function's
meaning from "are there more dirty pages than the background writeback
threshold" to "are there more dirty pages than the writeback threshold".
The background writeback threshold is typically half of the writeback
threshold, so this had the effect of raising the number of dirty pages
required to cause a writeback worker to perform background writeout.
This can cause a very severe performance regression when a BDI uses
BDI_CAP_STRICTLIMIT because balance_dirty_pages() and the writeback worker
can now disagree on whether writeback should be initiated.
For example, in a system having 1GB of RAM, a single spinning disk, and
a "pass-through" FUSE filesystem mounted over the disk, application code
mmapped a 128MB file on the disk and was randomly dirtying pages in that
mapping.
Because FUSE uses strictlimit and has a default max_ratio of only 1%,
in balance_dirty_pages, thresh is ~200, bg_thresh is ~100, and the
dirty_freerun_ceiling is the average of those, ~150. So, it pauses the
dirtying processes when we have 151 dirty pages and wakes up a
background writeback worker. But the worker tests the wrong threshold
(200 instead of 100), so it does not initiate writeback and just
returns.
Thus, balance_dirty_pages keeps looping, sleeping and then waking up the
worker who will do nothing. It remains stuck in this state until the few
dirty pages that we have finally expire and we write them back for that
reason. Then the whole process repeats, resulting in near-zero
throughput through the FUSE BDI.
The fix is to call the parameterized variant of wb_calc_thresh, so that
the worker will do writeback if the bg_thresh is exceeded which was the
bahavior before the referenced commit.
Fixes: 947e9762a8dd ("writeback: update wb_over_bg_thresh() to use
wb_domain aware operations") Signed-off-by: Howard Cochran <hcochran@kernelspring.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
Merge tag 'mmc-v4.6-rc1' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC fixes from Ulf Hansson:
"Here are a couple of mmc fixes intended for v4.6 rc3:
MMC host:
- sdhci: Fix regression setting power on Trats2 board
- sdhci-pci: Add support and PCI IDs for more Broxton host controllers"
* tag 'mmc-v4.6-rc1' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: sdhci-pci: Add support and PCI IDs for more Broxton host controllers
mmc: sdhci: Fix regression setting power on Trats2 board
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Some bugfixes from I2C:
- fix a uevent triggered boot problem by removing a useless debug
print
- fix sysfs-attributes of the new i2c-demux-pinctrl driver to follow
standard kernel behaviour
- fix a potential division-by-zero error (needed two takes)"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: jz4780: really prevent potential division by zero
Revert "i2c: jz4780: prevent potential division by zero"
i2c: jz4780: prevent potential division by zero
i2c: mux: demux-pinctrl: Update docs to new sysfs-attributes
i2c: mux: demux-pinctrl: Clean up sysfs attributes
i2c: prevent endless uevent loop with CONFIG_I2C_DEBUG_CORE
It's broken: it makes ext4 return an error at an invalid point, causing
the readdir wrappers to write the the position of the last successful
directory entry into the position field, which means that the next
readdir will now return that last successful entry _again_.
You can only return fatal errors (that terminate the readdir directory
walk) from within the filesystem readdir functions, the "normal" errors
(that happen when the readdir buffer fills up, for example) happen in
the iterorator where we know the position of the actual failing entry.
I do have a very different patch that does the "signal_pending()"
handling inside the iterator function where it is allowable, but while
that one passes all the sanity checks, I screwed up something like four
times while emailing it out, so I'm not going to commit it today.
So my track record is not good enough, and the stars will have to align
better before that one gets committed. And it would be good to get some
review too, of course, since celestial alignments are always an iffy
debugging model.
IOW, let's just revert the commit that caused the problem for now.
Merge branch 'parisc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"Since commit 0de798584bde ("parisc: Use generic extable search and
sort routines") module loading is boken on parisc, because the parisc
module loader wasn't prepared for the new R_PARISC_PCREL32 relocations.
In addition, due to that breakage, Mikulas Patocka noticed that
handling exceptions from modules probably never worked on parisc. It
was just masked by the fact that exceptions from modules don't happen
during normal use.
This patch series fixes those issues and survives the tests of the
lib/test_user_copy kernel module test. Some patches are tagged for
stable"
* 'parisc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Update comment regarding relative extable support
parisc: Unbreak handling exceptions from kernel modules
parisc: Fix kernel crash with reversed copy_from_user()
parisc: Avoid function pointers for kernel exception routines
parisc: Handle R_PARISC_PCREL32 relocations in kernel modules
Merge tag 'gpio-v4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Here is a set of four GPIO fixes. The two fixes to the core are
serious as they are regressing minor architectures.
Core fixes:
- Defer GPIO device setup until after gpiolib is initialized.
It turns out that a few very tightly integrated GPIO platform
drivers initialize so early (befor core_initcall()) so that the
gpiolib isn't even initialized itself. That limits what the
library can do, and we cannot reference uninitialized fields until
later.
Defer some of the initialization until right after the gpiolib is
initialized in these (rare) cases.
- As a consequence: do not use devm_* resources when allocating the
states in the initial set-up of the gpiochip.
Driver fixes:
- In ACPI retrieveal: ignore GpioInt when looking for output GPIOs.
- Fix legacy builds on the PXA without a backing pin controller.
- Use correct datatype on pca953x register writes"
* tag 'gpio-v4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: pca953x: Use correct u16 value for register word write
gpiolib: Defer gpio device setup until after gpiolib initialization
gpiolib: Do not use devm functions when registering gpio chip
gpio: pxa: fix legacy non pinctrl aware builds
gpio / ACPI: ignore GpioInt() GPIOs when requesting GPIO_OUT_*
Merge tag 'usb-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB fixes and new device ids for 4.6-rc3.
Nothing major, the normal USB gadget fixes and usb-serial driver ids,
along with some other fixes mixed in. All except the USB serial ids
have been tested in linux-next, the id additions should be fine as
they are 'trivial'"
* tag 'usb-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
USB: option: add "D-Link DWM-221 B1" device id
USB: serial: cp210x: Adding GE Healthcare Device ID
USB: serial: ftdi_sio: Add support for ICP DAS I-756xU devices
usb: dwc3: keystone: drop dma_mask configuration
usb: gadget: udc-core: remove manual dma configuration
usb: dwc3: pci: add ID for one more Intel Broxton platform
usb: renesas_usbhs: fix to avoid using a disabled ep in usbhsg_queue_done()
usb: dwc2: do not override forced dr_mode in gadget setup
usb: gadget: f_midi: unlock on error
USB: digi_acceleport: do sanity checking for the number of ports
USB: cypress_m8: add endpoint sanity check
USB: mct_u232: add sanity checking in probe
usb: fix regression in SuperSpeed endpoint descriptor parsing
USB: usbip: fix potential out-of-bounds write
usb: renesas_usbhs: disable TX IRQ before starting TX DMAC transfer
usb: renesas_usbhs: avoid NULL pointer derefernce in usbhsf_pkt_handler()
usb: gadget: f_midi: Fixed a bug when buflen was smaller than wMaxPacketSize
usb: phy: qcom-8x16: fix regulator API abuse
usb: ch9: Fix SSP Device Cap wFunctionalitySupport type
usb: gadget: composite: Access SSP Dev Cap fields properly
...
Merge tag 'staging-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO driver fixes from Greg KH:
"Here are some IIO driver fixes, along with two staging driver fixes
for 4.6-rc3.
One staging driver patch reverts the deletion of a driver that
happened in 4.6-rc1. We thought that laptop.org was dead, but it's
still alive and kicking, and has users that were mad we broke their
hardware by deleting a driver for their machines. So that driver is
added back and everyone is happy again.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
Revert "Staging: olpc_dcon: Remove obsolete driver"
staging/rdma/hfi1: select CRC32
iio: gyro: bmg160: fix buffer read values
iio: gyro: bmg160: fix endianness when reading axes
iio: accel: bmc150: fix endianness when reading axes
iio: st_magn: always define ST_MAGN_TRIGGER_SET_STATE
iio: fix config watermark initial value
iio: health: max30100: correct FIFO check condition
iio: imu: Fix inv_mpu6050 dependencies
iio: adc: Fix build error of missing devm_ioremap_resource on UM
iio: light: apds9960: correct FIFO check condition
iio: adc: max1363: correct reference voltage
iio: adc: max1363: add missing adc to max1363_id
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of eight fixes.
Two are trivial gcc-6 updates (brace additions and unused variable
removal). There's a couple of cxlflash regressions, a correction for
sd being overly chatty on revalidation (causing excess log increases).
A VPD issue which could crash USB devices because they seem very
intolerant to VPD inquiries, an ALUA deadlock fix and a mpt3sas buffer
overrun fix"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: Do not attach VPD to devices that don't support it
sd: Fix excessive capacity printing on devices with blocks bigger than 512 bytes
scsi_dh_alua: Fix a recently introduced deadlock
scsi: Declare local symbols static
cxlflash: Move to exponential back-off when cmd_room is not available
cxlflash: Fix regression issue with re-ordering patch
mpt3sas: Don't overreach ioc->reply_post[] during initialization
aacraid: add missing curly braces
- fix error handling (Guoqing)
- fix a crash when a disk is hotremoved (me)
- fix a dead loop (Wei Fang)"
* tag 'md/4.6-rc2-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/bitmap: clear bitmap if bitmap_create failed
MD: add rdev reference for super write
md: fix a trivial typo in comments
md:raid1: fix a dead loop when read from a WriteMostly disk
Merge tag 'pm+acpi-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"Fixes for some issues discovered after recent changes and for some
that have just been found lately regardless of those changes
(intel_pstate, intel_idle, PM core, mailbox/pcc, turbostat) plus
support for some new CPU models (intel_idle, Intel RAPL driver,
turbostat) and documentation updates (intel_pstate, PM core).
Specifics:
- intel_pstate fixes for two issues exposed by the recent switch over
from using timers and for one issue introduced during the 4.4 cycle
plus new comments describing data structures used by the driver
(Rafael Wysocki, Srinivas Pandruvada).
- intel_idle fixes related to CPU offline/online (Richard Cochran).
- intel_idle support (new CPU IDs and state definitions mostly) for
Skylake-X and Kabylake processors (Len Brown).
- PCC mailbox driver fix for an out-of-bounds memory access that may
cause the kernel to panic() (Shanker Donthineni).
- New (missing) CPU ID for one apparently overlooked Haswell model in
the Intel RAPL power capping driver (Srinivas Pandruvada).
- Fix for the PM core's wakeup IRQs framework to make it work after
wakeup settings reconfiguration from sysfs (Grygorii Strashko).
- Runtime PM documentation update to make it describe what needs to
be done during device removal more precisely (Krzysztof Kozlowski).
- Stale comment removal cleanup in the cpufreq-dt driver (Viresh
Kumar).
- turbostat utility fixes and support for Broxton, Skylake-X and
Kabylake processors (Len Brown)"
* tag 'pm+acpi-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (28 commits)
PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
tools/power turbostat: work around RC6 counter wrap
tools/power turbostat: initial KBL support
tools/power turbostat: initial SKX support
tools/power turbostat: decode BXT TSC frequency via CPUID
tools/power turbostat: initial BXT support
tools/power turbostat: print IRTL MSRs
tools/power turbostat: SGX state should print only if --debug
intel_idle: Add KBL support
intel_idle: Add SKX support
intel_idle: Clean up all registered devices on exit.
intel_idle: Propagate hot plug errors.
intel_idle: Don't overreact to a cpuidle registration failure.
intel_idle: Setup the timer broadcast only on successful driver load.
intel_idle: Avoid a double free of the per-CPU data.
intel_idle: Fix dangling registration on error path.
intel_idle: Fix deallocation order on the driver exit path.
intel_idle: Remove redundant initialization calls.
intel_idle: Fix a helper function's return value.
intel_idle: remove useless return from void function.
...
1) Stale SKB data pointer access across pskb_may_pull() calls in L2TP,
from Haishuang Yan.
2) Fix multicast frame handling in mac80211 AP code, from Felix
Fietkau.
3) mac80211 station hashtable insert errors not handled properly, fix
from Johannes Berg.
4) Fix TX descriptor count limit handling in e1000, from Alexander
Duyck.
5) Revert a buggy netdev refcount fix in netpoll, from Bjorn Helgaas.
6) Must assign rtnl_link_ops of the device before registering it, fix
in ip6_tunnel from Thadeu Lima de Souza Cascardo.
7) Memory leak fix in tc action net exit, from WANG Cong.
8) Add missing AF_KCM entries to name tables, from Dexuan Cui.
9) Fix regression in GRE handling of csums wrt. FOU, from Alexander
Duyck.
10) Fix memory allocation alignment and congestion map corruption in
RDS, from Shamir Rabinovitch.
11) Fix default qdisc regression in tuntap driver, from Jason Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
bridge, netem: mark mailing lists as moderated
tuntap: restore default qdisc
mpls: find_outdev: check for err ptr in addition to NULL check
ipv6: Count in extension headers in skb->network_header
RDS: fix congestion map corruption for PAGE_SIZE > 4k
RDS: memory allocated must be align to 8
GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU
net: add the AF_KCM entries to family name tables
MAINTAINERS: intel-wired-lan list is moderated
lib/test_bpf: Add additional BPF_ADD tests
lib/test_bpf: Add test to check for result of 32-bit add that overflows
lib/test_bpf: Add tests for unsigned BPF_JGT
lib/test_bpf: Fix JMP_JSET tests
VSOCK: Detach QP check should filter out non matching QPs.
stmmac: fix adjust link call in case of a switch is attached
af_packet: tone down the Tx-ring unsupported spew.
net_sched: fix a memory leak in tc action
samples/bpf: Enable powerpc support
samples/bpf: Use llc in PATH, rather than a hardcoded value
samples/bpf: Fix build breakage with map_perf_test_user.c
...
Merge branch 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"These are bug fixes, including a really old fsync bug, and a few trace
points to help us track down problems in the quota code"
* 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix file/data loss caused by fsync after rename and new inode
btrfs: Reset IO error counters before start of device replacing
btrfs: Add qgroup tracing
Btrfs: don't use src fd for printk
btrfs: fallback to vmalloc in btrfs_compare_tree
btrfs: handle non-fatal errors in btrfs_qgroup_inherit()
btrfs: Output more info for enospc_debug mount option
Btrfs: fix invalid reference in replace_path
Btrfs: Improve FL_KEEP_SIZE handling in fallocate
Merge tag 'for-linus-4.6-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs fixes from Mike Marshall:
"Orangefs cleanups and a strncpy vulnerability fix.
Cleanups:
- remove an unused variable from orangefs_readdir.
- clean up printk wrapper used for ofs "gossip" debugging.
- clean up truncate ctime and mtime setting in inode.c
- remove a useless null check found by coccinelle.
- optimize some memcpy/memset boilerplate code.
- remove some useless sanity checks from xattr.c
Fix:
- fix a potential strncpy vulnerability"
* tag 'for-linus-4.6-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: remove unused variable
orangefs: Add KERN_<LEVEL> to gossip_<level> macros
orangefs: strncpy -> strscpy
orangefs: clean up truncate ctime and mtime setting
Orangefs: fix ifnullfree.cocci warnings
Orangefs: optimize boilerplate code.
Orangefs: xattr.c cleanup
Merge tag 'iommu-fixes-v4.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
- compile-time fixes (warnings and failures)
- a bug in iommu core code which could cause the group->domain pointer
to be falsly cleared
- fix in scatterlist handling of the ARM common DMA-API code
- stall detection fix for the Rockchip IOMMU driver
* tag 'iommu-fixes-v4.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Silence an uninitialized variable warning
iommu/rockchip: Fix "is stall active" check
iommu: Don't overwrite domain pointer when there is no default_domain
iommu/dma: Restore scatterlist offsets correctly
iommu: provide of_xlate pointer unconditionally