Dave Chinner [Thu, 27 Jun 2013 06:04:56 +0000 (16:04 +1000)]
xfs: Use inode create transaction
Replace the use of buffer based logging of inode initialisation,
uses the new logical form to describe the range to be initialised
in recovery. We continue to "log" the inode buffers to push them
into the AIL and ensure that the inode create transaction is not
removed from the log before the inode buffers are written to disk.
Update the transaction identifier and reservations to match the
changed implementation.
Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Thu, 27 Jun 2013 06:04:55 +0000 (16:04 +1000)]
xfs: Inode create item recovery
When we find a icreate transaction, we need to get and initialise
the buffers in the range that has been passed. Extract and verify
the information in the item record, then loop over the range
initialising and issuing the buffer writes delayed.
Support an arbitrary size range to initialise so that in
future when we allocate inodes in much larger chunks all kernels
that understand this transaction can still recover them.
Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Thu, 27 Jun 2013 06:04:54 +0000 (16:04 +1000)]
xfs: Inode create transaction reservations
Define the log and space transaction sizes. Factor the current
create log reservation macro into the two logical halves and reuse
one half for the new icreate transactions. The icreate transaction
is transparent to all the high level create code - the
pre-calculated reservations will correctly set the reservations
dependent on whether the filesystem supports the icreate
transaction.
Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Thu, 27 Jun 2013 06:04:53 +0000 (16:04 +1000)]
xfs: Inode create log items
Introduce the inode create log item type for logical inode create logging.
Instead of logging the changes in buffers, pass the range to be
initialised through the log by a new transaction type. This reduces
the amount of log space required to record initialisation during
allocation from about 128 bytes per inode to a small fixed amount
per inode extent to be initialised.
This requires a new log item type to track it through the log
and the AIL. This is a relatively simple item - most callbacks are
noops as this item has the same life cycle as the transaction.
Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Thu, 27 Jun 2013 06:04:52 +0000 (16:04 +1000)]
xfs: Introduce an ordered buffer item
If we have a buffer that we have modified but we do not wish to
physically log in a transaction (e.g. we've logged a logical
change), we still need to ensure that transactional integrity is
maintained. Hence we must not move the tail of the log past the
transaction that the buffer is associated with before the buffer is
written to disk.
This means these special buffers still need to be included in the
transaction and added to the AIL just like a normal buffer, but we
do not want the modifications to the buffer written into the
transaction. IOWs, what we want is an "ordered buffer" that
maintains the same transactional life cycle as a physically logged
buffer, just without the transcribing of the modifications to the
log.
Hence we need to flag the buffer as an "ordered buffer" to avoid
including it in vector size calculations or formatting during the
transaction. Once the transaction is committed, the buffer appears
for all intents to be the same as a physically logged buffer as it
transitions through the log and AIL.
Relogging will also work just fine for such an ordered buffer - the
logical transaction will be replayed before the subsequent
modifications that relog the buffer, so everything will be
reconstructed correctly by recovery.
Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Thu, 27 Jun 2013 06:04:51 +0000 (16:04 +1000)]
xfs: Introduce ordered log vector support
And "ordered log vector" is a log vector that is used for
tracking a log item through the CIL and into the AIL as part of the
log checkpointing. These ordered log vectors are special in that
they are not written to to journal in any way, and are not accounted
to the checkpoint being written.
The reason for this behaviour is to allow operations to attach items
to transactions and have them follow the normal transactional
lifecycle without actually having to write them to the journal. This
allows logging of items that track high level logical changes and
writing them to the log, while the physical items being modified
pass through into the AIL and pin the tail of the log (and therefore
the logical item in the log) until all the modified items are
physically written to disk.
IOWs, it allows us to write metadata without physically logging
every individual change but still maintain the full transactional
integrity guarantees we currently have w.r.t. crash recovery.
This change modifies some of the CIL item insertion loops, as
ordered log vectors introduce some new constraints as they don't
track any data. One advantage of this change is that it combines
two log vector chain walks into a single pass, so there is less
overhead in the transaction commit pass as well. It also kills some
unused code in the log vector walk loop when committing the CIL.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Thu, 27 Jun 2013 06:04:50 +0000 (16:04 +1000)]
xfs: xfs_ifree doesn't need to modify the inode buffer
Long ago, bulkstat used to read inodes directly from the backing
buffer for speed. This had the unfortunate problem of being cache
incoherent with unlinks, and so xfs_ifree() had to mark the inode
as free directly in the backing buffer. bulkstat was changed some
time ago to use inode cache coherent lookups, and so will never see
unlinked inodes in it's lookups. Hence xfs_ifree() does not need to
touch the inode backing buffer anymore.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Thu, 27 Jun 2013 06:04:49 +0000 (16:04 +1000)]
xfs: don't do IO when creating an new inode
When we are allocating a new inode, we read the inode cluster off
disk to increment the generation number. We are already using a
random generation number for newly allocated inodes, so if we are not
using the ikeep mode, we can just generate a new generation number
when we initialise the newly allocated inode.
This avoids the need for reading the inode buffer during inode
creation. This will speed up allocation of inodes in cold, partially
allocated clusters as they will no longer need to be read from disk
during allocation. It will also reduce the CPU overhead of inode
allocation by not having the process the buffer read, even on cache
hits.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Thu, 27 Jun 2013 06:04:48 +0000 (16:04 +1000)]
xfs: don't use speculative prealloc for small files
Dedicated small file workloads have been seeing significant free
space fragmentation causing premature inode allocation failure
when large inode sizes are in use. A particular test case showed
that a workload that runs to a real ENOSPC on 256 byte inodes would
fail inode allocation with ENOSPC about about 80% full with 512 byte
inodes, and at about 50% full with 1024 byte inodes.
The same workload, when run with -o allocsize=4096 on 1024 byte
inodes would run to being 100% full before giving ENOSPC. That is,
no freespace fragmentation at all.
The issue was caused by the specific IO pattern the application had
- the framework it was using did not support direct IO, and so it
was emulating it by using fadvise(DONT_NEED). The result was that
the data was getting written back before the speculative prealloc
had been trimmed from memory by the close(), and so small single
block files were being allocated with 2 blocks, and then having one
truncated away. The result was lots of small 4k free space extents,
and hence each new 8k allocation would take another 8k from
contiguous free space and turn it into 4k of allocated space and 4k
of free space.
Hence inode allocation, which requires contiguous, aligned
allocation of 16k (256 byte inodes), 32k (512 byte inodes) or 64k
(1024 byte inodes) can fail to find sufficiently large freespace and
hence fail while there is still lots of free space available.
There's a simple fix for this, and one that has precendence in the
allocator code already - don't do speculative allocation unless the
size of the file is larger than a certain size. In this case, that
size is the minimum default preallocation size:
mp->m_writeio_blocks. And to keep with the concept of being nice to
people when the files are still relatively small, cap the prealloc
to mp->m_writeio_blocks until the file goes over a stripe unit is
size, at which point we'll fall back to the current behaviour based
on the last extent size.
This will effectively turn off speculative prealloc for very small
files, keep preallocation low for small files, and behave as it
currently does for any file larger than a stripe unit. This
completely avoids the freespace fragmentation problem this
particular IO pattern was causing.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Thu, 27 Jun 2013 06:04:47 +0000 (16:04 +1000)]
xfs: plug directory buffer readahead
Similar to bulkstat inode chunk readahead, we need to plug directory
data buffer readahead during getdents to ensure that we can merge
adjacent readahead requests and sort out of order requests optimally
before they are dispatched. This improves the readahead efficiency
and reduces the IO load it generates as the IO patterns are
significantly better for both contiguous and fragmented directories.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Thu, 27 Jun 2013 06:04:46 +0000 (16:04 +1000)]
xfs: add pluging for bulkstat readahead
I was running some tests on bulkstat on CRC enabled filesystems when
I noticed that all the IO being issued was 8k in size, regardless of
the fact taht we are issuing sequential 8k buffers for inodes
clusters. The IO size should be 16k for 256 byte inodes, and 32k for
512 byte inodes, but this wasn't happening.
blktrace showed that there was an explict plug and unplug happening
around each readahead IO from _xfs_buf_ioapply, and the unplug was
causing the IO to be issued immediately. Hence no opportunity was
being given to the elevator to merge adjacent readahead requests and
dispatch them as a single IO.
Add plugging around the inode chunk readahead dispatch loop in
bulkstat to ensure that we don't unplug the queue between adjacent
inode buffer readahead IOs and so we get fewer, larger IO requests
hitting the storage subsystem for bulkstat.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Bob Peterson [Thu, 27 Jun 2013 16:47:51 +0000 (12:47 -0400)]
GFS2: Reserve journal space for quota change in do_grow
If a GFS2 file system is mounted with quotas and a file is grown
in such a way that its free blocks for the allocation are represented
in a secondary bitmap, GFS2 ran out of blocks in the transaction.
That resulted in "fatal: assertion "tr->tr_num_buf <= tr->tr_blocks".
This patch reserves extra blocks for the quota change so the
transaction has enough space.
Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Kevin Cernekee [Tue, 18 Jun 2013 08:34:31 +0000 (08:34 +0000)]
MIPS: BCM63xx: Add SMP support to prom.c
This involves two changes to the BSP code:
1) register_smp_ops() for BMIPS SMP
2) The CPU1 boot vector on some of the BCM63xx platforms conflicts with
the special interrupt vector (IV). Move it to 0x8000_0380 at boot time,
to resolve the conflict.
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
[jogo@openwrt.org: moved SMP ops registration into ifdef guard,
changed ifdef guards to if (IS_ENABLED())] Signed-off-by: Jonas Gorski <jogo@openwrt.org> Cc: linux-mips@linux-mips.org Cc: John Crispin <blogic@openwrt.org> Cc: Maxime Bizon <mbizon@freebox.fr> Cc: Florian Fainelli <florian@openwrt.org>
Patchwork: https://patchwork.linux-mips.org/patch/5489/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Florian Fainelli [Fri, 31 May 2013 13:07:44 +0000 (13:07 +0000)]
MIPS: define write{b,w,l,q}_relaxed
MIPS does define read{b,w,l,q}_relaxed but does not define their write
counterparts: write{b,w,l,q}_relaxed. This patch adds the missing
definitions for the write*_relaxed I/O accessors.
The GENERIC_PCI_IOMAP does not depend on CONFIG_PCI so move
it to the CONFIG_MIPS symbol so it's always selected for MIPS.
This fixes the missing pci_iomap declaration for MIPS.
Moreover, the pci_iounmap function was not defined in the
io.h header file if the CONFIG_PCI symbol is not set,
but it should since MIPS is not using CONFIG_GENERIC_IOMAP.
This fixes the following problem on a allyesconfig:
drivers/net/ethernet/3com/3c59x.c:1031:2: error: implicit declaration of
function 'pci_iomap' [-Werror=implicit-function-declaration]
drivers/net/ethernet/3com/3c59x.c:1044:3: error: implicit declaration of
function 'pci_iounmap' [-Werror=implicit-function-declaration]
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Acked-by: Steven J. Hill <Steven.Hill@imgtec.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5478/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
James Hogan [Mon, 24 Jun 2013 10:12:38 +0000 (11:12 +0100)]
metag: don't check for cache aliasing on smp cpu boot
The cache configuration of the boot cpu is now duplicated to secondary
cpus, so there's no need to check for cache aliasing again when a
secondary cpu is booted. Therefore remove the check from
secondary_start_kernel().
Signed-off-by: James Hogan <james.hogan@imgtec.com>
James Hogan [Mon, 24 Jun 2013 10:05:19 +0000 (11:05 +0100)]
metag: panic if cache aliasing possible
If the cache and page size configuration allows for cache aliasing to
occur we warn on boot, but the log messages are easy to miss and will
result is random crashes occuring in userland. Let's panic too in this
case so that the user immediately knows they need to fix the cache
configuration or configured page size.
Also fix the warning messages which display the cache and page sizes to
include newlines, and add the word "Potential" since an actual cache
alias hasn't been detected.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
James Hogan [Fri, 31 May 2013 15:12:54 +0000 (16:12 +0100)]
metag: *.dts: include using preprocessor
Include *.dtsi files from *.dts using the preprocessor to set a good
example for future device tree files. Files included in the old way
don't get pre-processed.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Grant Likely <grant.likely@linaro.org> Cc: devicetree-discuss@lists.ozlabs.org
James Hogan [Fri, 31 May 2013 15:17:24 +0000 (16:17 +0100)]
metag: add <dt-bindings/> symlink
Add symlink to include/dt-bindings from arch/metag/boot/dts/include/ to
match the one in arch/arm/... (see the commit below) so that
preprocessed device tree files can include various useful constant
definitions.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Grant Likely <grant.likely@linaro.org> Reviewed-by: Stephen Warren <swarren@nvidia.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: Rob Herring <rob.herring@calxeda.com> Cc: linux-kbuild@vger.kernel.org Cc: devicetree-discuss@lists.ozlabs.org
Leonid Yegoshin [Thu, 20 Jun 2013 14:36:42 +0000 (14:36 +0000)]
MIPS: Malta: Update GCMP detection.
Add GCMP detection for IASim Marvell chip emulation support.
Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com> Acked-by: Steven J. Hill <Steven.Hill@imgtec.com> Cc: linux-mips@linux-mips.org Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Patchwork: https://patchwork.linux-mips.org/patch/5529/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Leonid Yegoshin [Thu, 20 Jun 2013 14:36:30 +0000 (14:36 +0000)]
Revert "MIPS: make CAC_ADDR and UNCAC_ADDR account for PHYS_OFFSET"
This reverts commit 3f4579252aa166641861a64f1c2883365ca126c2. It is
invalid because the macros CAC_ADDR and UNCAC_ADDR have a kernel
virtual address as an argument and also returns a kernel virtual
address. Using and physical address PHYS_OFFSET is blatantly wrong
for a macro common to multiple platforms.
Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com> Acked-by: Steven J. Hill <Steven.Hill@imgtec.com> Cc: linux-mips@linux-mips.org Cc: Florian Fainelli <florian@openwrt.org>
Patchwork: https://patchwork.linux-mips.org/patch/5528/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Markos Chandras [Mon, 17 Jun 2013 13:00:40 +0000 (13:00 +0000)]
SSB: Kconfig: Amend SSB_EMBEDDED dependencies
SSB_EMBEDDED needs functions from driver_pcicore which are only
available if SSD_DRIVER_HOSTMODE is selected so make it
depend on that symbol.
Fixes the following linking problem:
drivers/ssb/embedded.c:202:
undefined reference to `ssb_pcicore_plat_dev_init'
drivers/built-in.o: In function `ssb_pcibios_map_irq':
drivers/ssb/embedded.c:247:
undefined reference to `ssb_pcicore_pcibios_map_irq'
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Acked-by: Steven J. Hill <Steven.Hill@imgtec.com> Cc: sibyte-users@bitmover.com Cc: netdev@vger.kernel.org Cc: Michael Buesch <m@bues.ch> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5484/ Acked-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Steven J. Hill [Wed, 5 Jun 2013 21:25:17 +0000 (21:25 +0000)]
MIPS: microMIPS: Fix improper definition of ISA exception bit.
The ISA exception bit selects whether exceptions are taken in classic
or microMIPS mode. This bit is Config3.ISAOnExc and was improperly
defined as bits 16 and 17 instead of just bit 16. A new function was
added so that platforms could set this bit when running a kernel
compiled with only microMIPS instructions.
Signed-off-by: Steven J. Hill <Steven.Hill@imgtec.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5377/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Fri, 24 May 2013 20:54:09 +0000 (20:54 +0000)]
MIPS: Don't try to decode microMIPS branch instructions where they cannot exist.
In mm_isBranchInstr() we can short circuit the entire function if
!cpu_has_mmips.
Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Steven J. Hill <Steven.Hill@imgtec.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5326/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Fri, 24 May 2013 20:54:08 +0000 (20:54 +0000)]
MIPS: Declare emulate_load_store_microMIPS as a static function.
It is only used from within a single file, it should not be globally
visible.
Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Steven J. Hill <Steven.Hill@imgtec.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5325/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Paul Gortmaker [Mon, 24 Jun 2013 19:30:15 +0000 (15:30 -0400)]
arc: delete __cpuinit usage from all arc files
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
are flagged as __cpuinit -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings. In any case, they are temporary and harmless.
This removes all the arch/arc uses of the __cpuinit macros from
all C files. Currently arc does not have any __CPUINIT used in
assembly files.
Vineet Gupta [Mon, 17 Jun 2013 09:03:15 +0000 (14:33 +0530)]
ARC: [tlb-miss] Fix bug with CONFIG_ARC_DBG_TLB_MISS_COUNT
LOAD_FAULT_PTE macro is expected to set r2 with faulting vaddr.
However in case of CONFIG_ARC_DBG_TLB_MISS_COUNT, it was getting
clobbered with statistics collection code.
Fix latter by using a different register.
Note that only I-TLB Miss handler was potentially affected.
Vineet Gupta [Mon, 17 Jun 2013 06:05:15 +0000 (11:35 +0530)]
ARC: [tlb-miss] Extraneous PTE bit testing/setting
* No need to check for READ access in I-TLB Miss handler
* Redundant PAGE_PRESENT update in PTE
Post TLB entry installation, in updating PTE for software accessed/dity
bits, no need to update PAGE_PRESENT since it will already be set.
Infact the entry won't have installed if !PAGE_PRESENT.
Vineet Gupta [Sat, 25 May 2013 08:33:25 +0000 (14:03 +0530)]
ARC: Adjustments for gcc 4.8
* DWARF unwinder related
+ Force DWARF2 compliant .debug_frame (gcc 4.8 defaults to DWARF4
which kernel unwinder can't grok).
+ Discard the additional .eh_frame generated
+ Discard the dwarf4 debug info generated by -gdwarf-2 for normal
no debug case
* 4.8 already uses arc600 multilibs for -mno-mpy
* switch to using uclibc compiler (to get -mmedium-calls and -mno-sdata)
and also since buildroot can only use 1 toolchain
Florian Fainelli [Wed, 26 Jun 2013 18:11:56 +0000 (18:11 +0000)]
MIPS: BMIPS: support booting from physical CPU other than 0
BMIPS43xx CPUs have two hardware threads, and on some SoCs such as 3368,
the bootloader has configured the system to boot from TP1 instead of the
more usual TP0. Create the physical to logical CPU mapping to cope with
that, do not remap the software interrupts to be cross CPUs such that we
do not have to do use the logical CPU mapping further down the code, and
finally, reset the slave TP1 only if booted from TP0.
David Daney [Fri, 24 May 2013 20:54:10 +0000 (20:54 +0000)]
MIPS: Only set cpu_has_mmips if SYS_SUPPORTS_MICROMIPS
As Jonas Gorske said in his patch:
Disable cpu_has_mmips for everything but SEAD3 and MALTA. Most of
these platforms are from before the micromips introduction, so they
are very unlikely to implement it.
Reduces an -Os compiled, uncompressed kernel image by 8KiB for
BCM63XX.
This patch taks a different approach than his, we gate the runtime
test for microMIPS by the config symbol SYS_SUPPORTS_MICROMIPS.
Signed-off-by: David Daney <david.daney@cavium.com> Cc: Jonas Gorski <jogo@openwrt.org> Cc: Steven J. Hill <Steven.Hill@imgtec.com> Acked-by: Steven J. Hill <Steven.Hill@imgtec.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5327/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Tony Wu [Fri, 21 Jun 2013 10:13:08 +0000 (10:13 +0000)]
MIPS: GIC: Fix gic_set_affinity infinite loop
There is an infinite loop in gic_set_affinity. When irq_set_affinity
gets called on gic controller, it blocks forever.
Signed-off-by: Tony Wu <tung7970@gmail.com> Cc: Steven J. Hill <Steven.Hill@imgtec.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5537/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
If you connect up an ipv6 socket to an ipv4 mapped address then an
ipv6 one, sendmsg() can croak because ip6_sk_dst_check() assumes the
route cached in the socket is an ipv6 one. In this case there is an
ipv4 route attached, so it gets stomped on.
Reported by Dave Jones and Hannes Frederic Sowa, fixed by Eric
Dumazet.
2) AF_KEY notifications leak some kernel memory to userspace, fix from
Mathias Krause.
3) DLCI calls __dev_get_by_name() without proper locking, and dlci_del
doesn't validate that the device being deleted is actually a DLCI
one. Fixes from Li Zefan.
4) Length check on bluetooth l2cap information responses is wrong, each
response type has a different lenth, so we should make sure it's in
a given range rather than enforce one single valid length. From
Jaganath Kanakkassery.
5) Receive FIFO overflow is really easy to trigger in stress scenerios
in the sh_eth driver, but the event isn't being handled properly at
all. Specifically, the mask of error interrupts doesn't include the
event so we never clear it, resulting in the driver becomming wedged
processing an interrupt that never gets cleared.
Fix from Sergei Shtylyov.
6) qlcnic sleeps while holding a spinlock, use mdelay() instead of
msleep(). From Shahed Shaikh.
7) Missing curly braces causes SIP netfilter NAT module to always drop
packets. Fix from Balazs Peter Odor.
8) ipt_ULOG in netfilter passes the wrong value to timer setup, causing
the timer to dereference crap when it fires. Fix from Gao Feng.
9) Missing RCU protection around txq->axq_acq traversal in
ath_txq_schedule(). Fix from Felix Fietkau.
10) Idle state transition test in ath9k_htc_config() is reversed, fix
from Sujith Manoharan.
11) IPV6 forwarding handles unicast Router Alert packets incorrectly.
It tests the wrong option state. Previously opt->ra being non-zero
indicated a router alert marking in the SKB, but now it's indicated
by a bit in opt->flags. Fix from YOSHIFUJI Hideaki.
12) SKB leak in GRE tunnel GSO handling, from Eric Dumazet.
13) get_user_pages_fast() error handling in TUN and MACVTAP use the same
local variable for the base index and the loop iterator for page
traversal, oops! Fix from Michael S Tsirkin.
14) ipv6_get_lladdr() can fail, and we must therefore check it's return
value in inet6_set_iftoken(). For from Hannes Frederic Sowa.
15) If you change an interface name and meanwhile can sneak in something
that looks up the name (like SO_BINDTODEVICE or SIOCGIFNAME) we can
deadlock with CONFIG_PREEMPT=n. Fix this by providing a helper
function that properly uses raw_seqcount_begin(). From Nicolas
Schichan.
16) Chain noise calibration test is inverted in iwlwifi, fix from
Nikolay Martynov.
17) Properly set TX iwlwifi descriptor flags for back requests. Fix
from Emmanuel Grumbach.
18) We can't assume skb_transport_header() is set in xt_TCPOPTSTRAP
module, fix from Pablo Neira Ayuso.
19) Some crummy APs don't provide the proper High Throughput info in
association response frames. Add a workaround by assume we'll use
whatever is in the beacon/probe. Fix from Johannes Berg.
20) mac80211 call to rate_idx_match_mask() swaps two arguments (mask and
channel width). Fix from Simon Wunderlich.
21) xt_TCPMSS (like xt_TCPOPTSTRAP) must not try to handle fragmented
frames. Fix from Phil Oester.
22) Fix rate control regression causing iwlwifi/iwlegacy chips to use
1Mbit/s on pre-11n networks. From Moshe Benji and Stanslaw Gruszka.
23) Disable brcmsmac power-save functions, they cause regressions. From
Arend van Spriel.
24) Enforce a sane minimum MTU in l2cap_build_cmd() otherwise we can
easily crash. Fix from Anderson Lizardo.
25) If a learning packet arrives during vxlan_stop() we crash, easily
fixed by checking netif_running(). From Stephen Hemminger.
26) Static vxlan FDB entries should not be migrated, also from Stephen.
27) skb_clone() failures not handled in vxlan_xmit(), oops. Also from
Stephen.
28) Add minimal driver for AR816x/AR817x ethernet chips, from Johannes
Berg.
29) Fix regression in userspace VLAN acceleration control, added by the
802.1ad support changes. Fix from Fernando Luis Vazquez Cao.
30) Interval selection for MLD queries in the bridging code was
reversed. Fix from Linus Lüssing.
31) ipv6's ndisc_send_redirect() erroneously writes to the packet we
received not the packet we are building to send out. Fix from
Matthias Schiffer.
32) Don't free netdev before unregistering it, in usb_8dev can driver.
From Marc Kleine-Budde.
33) Fix nl80211 attribute buffer races, from Johannes Berg.
34) Although netlink_diag.h is under uapi/ it isn't present in Kbuild.
From Stephen Hemminger.
35) Wrong address and family passed to MD5 key lookups in TCP, from
Aydin Arik.
36) phy_type attribute created by SFC driver should not be writable.
From Ben Hutchings.
37) Receive/Transmit queue allocations in pxa168_eth and mv643xx_eth
should use kzalloc(). Otherwise if setup fails half-way, we'll
dereference garbage when trying to teardown the rings. From Lubomir
Rintel.
38) Fix double-allocation of dst (resulting in unfreeable net device) in
ipv6's init_loopback(). From Gao Feng.
39) Fix fragmentation handling SKB leak in netfilter conntrack, we were
freeing the wrong skb pointer. From Phil Oester.
40) Don't report "-1" (SPEED_UNKNOWN) in bond_miimon_commit(), from
Nikolay Aleksandrov.
41) davinci_cpdma doesn't check for DMA mapping errors, letting the
device scribble to random addresses. From Sebastian Siewior.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
dlci: validate the net device in dlci_del()
dlci: acquire rtnl_lock before calling __dev_get_by_name()
af_key: fix info leaks in notify messages
ipv6: ip6_sk_dst_check() must not assume ipv6 dst
net: fix kernel deadlock with interface rename and netdev name retrieval.
net/tg3: Avoid delay during MMIO access
ipv6: check return value of ipv6_get_lladdr
macvtap: fix recovery from gup errors
tun: fix recovery from gup errors
gre: fix a possible skb leak
ipv6: Process unicast packet with Router Alert by checking flag in skb.
ath9k_htc: Handle IDLE state transition properly
ath9k: fix an RCU issue in calling ieee80211_get_tx_rates
netfilter: ipt_ULOG: fix incorrect setting of ulog timer
netfilter: ctnetlink: send event when conntrack label was modified
netfilter: nf_nat_sip: fix mangling
qlcnic: Do not sleep while holding spinlock
drivers: net: cpsw: fix compilation error with cpsw driver
tcp: doc : fix the syncookies default value
sh_eth: fix misreporting of transmit abort
...
Linus Torvalds [Thu, 27 Jun 2013 05:23:15 +0000 (19:23 -1000)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull i915 drm fixes from Dave Airlie:
"These should be the last two fixes for i915, one is for a fence leak
killing X on some older GPUs, and one is a late regression partial
revert for an swiotlb/xen/i915 interaction, Konrad has promised to
figure out the proper answer, and this patch is the best thing to do
at this stage to avoid regressing"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: make compact dma scatter lists creation work with SWIOTLB backend.
drm/i915: Restore fences after resume and GPU resets
Steve French [Thu, 27 Jun 2013 04:45:05 +0000 (23:45 -0500)]
[CIFS] SMB3 Signing enablement
SMB3 uses a much faster method of signing (which is also better in other ways),
AES-CMAC. With the kernel now supporting AES-CMAC since last release, we
are overdue to allow SMB3 signing (today only CIFS and SMB2 and SMB2.1,
but not SMB3 and SMB3.1 can sign) - and we need this also for checking
secure negotation and also per-share encryption (two other new SMB3 features
which we need to implement).
This patch needs some work in a few areas - for example we need to
move signing for SMB2/SMB3 from per-socket to per-user (we may be able to
use the "nosharesock" mount option in the interim for the multiuser case),
and Shirish found a bug in the earlier authentication overhaul
(setting signing flags properly) - but those can be done in followon
patches.
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>
Jaegeuk Kim [Thu, 27 Jun 2013 04:04:08 +0000 (13:04 +0900)]
f2fs: fix to recover i_size from roll-forward
If user requests many data writes and fsync together, the last updated i_size
should be stored to the inode block consistently.
But, previous write_end just marks the inode as dirty and doesn't update its
metadata into its inode block.
After that, fsync just writes the inode block with newly updated data index
excluding inode metadata updates.
So, this patch introduces write_end in which updates inode block too when the
i_size is changed.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Alexey Brodkin [Wed, 26 Jun 2013 10:02:18 +0000 (14:02 +0400)]
ARC: [plat-arcfpga] Add arc_emac to AA4 board DT description
With this patch we introduce DT "arc_emac" description for ARCangel4.
Description is based on info/example in
"Documentation/devicetree/bindings/net/arc_emac.txt".
Jaegeuk Kim [Thu, 27 Jun 2013 00:59:40 +0000 (09:59 +0900)]
f2fs: remove reusing any prefree segments
This patch removes check_prefree_segments initially designed to enhance the
performance by narrowing the range of LBA usage across the whole block device.
When allocating a new segment, previous f2fs tries to find proper prefree
segments, and then, if finds a segment, it reuses the segment for further
data or node block allocation.
However, I found that this was totally wrong approach since the prefree segments
have several data or node blocks that will be used by the roll-forward mechanism
operated after sudden-power-off.
Let's assume the following scenario.
/* write 8MB with fsync */
for (i = 0; i < 2048; i++) {
offset = i * 4096;
write(fd, offset, 4KB);
fsync(fd);
}
In this case, naive segment allocation sequence will be like:
data segment: x, x+1, x+2, x+3
node segment: y, y+1, y+2, y+3.
But, if we can reuse prefree segments, the sequence can be like:
data segment: x, x+1, y, y+1
node segment: y, y+1, y+2, y+3.
Because, y, y+1, and y+2 became prefree segments one by one, and those are
reused by data allocation.
After conducting this workload, we should consider how to recover the latest
inode with its data.
If we reuse the prefree segments such as y or y+1, we lost the old node blocks
so that f2fs even cannot start roll-forward recovery.
Therefore, I suggest that we should remove reusing prefree segments.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Steve French [Thu, 27 Jun 2013 00:14:55 +0000 (19:14 -0500)]
[CIFS] Do not set DFS flag on SMB2 open
If we would set SMB2_FLAGS_DFS_OPERATIONS on open we also would have
to pass the path on the Open SMB prefixed by \\server\share.
Not sure when we would need to do the augmented path (if ever) and
setting this flag breaks the SMB2 open operation since it is
illegal to send an empty path name (without \\server\share prefix)
when the DFS flag is set in the SMB open header. We could
consider setting the flag on all operations other than open
but it is safer to net set it for now.
Steve French [Wed, 26 Jun 2013 22:52:17 +0000 (17:52 -0500)]
[CIFS] fix static checker warning
Dan Carpenter wrote:
The patch 7f420cee8bd6: "[CIFS] Charge at least one credit, if server
says that it supports multicredit" from Jun 23, 2013, leads to the
following Smatch complaint:
fs/cifs/smb2pdu.c:120 smb2_hdr_assemble()
warn: variable dereferenced before check 'tcon->ses' (see line 115)
CC: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <smfrench@gmail.com>
Reported-by: Li Jinyue <lijinyue@huawei.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Layton [Sun, 26 May 2013 11:01:02 +0000 (07:01 -0400)]
cifs: try to handle the MUST SecurityFlags sanely
The cifs.ko SecurityFlags interface wins my award for worst-designed
interface ever, but we're sort of stuck with it since it's documented
and people do use it (even if it doesn't work correctly).
Case in point -- you can specify multiple sets of "MUST" flags. It makes
absolutely no sense, but you can do it.
What should the effect be in such a case? No one knows or seems to have
considered this so far, so let's define it now. If you try to specify
multiple MUST flags, clear any other MAY or MUST bits except for the
ones that involve signing.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
Steve French [Tue, 25 Jun 2013 20:33:41 +0000 (15:33 -0500)]
When server doesn't provide SecurityBuffer on SMB2Negotiate pick default
According to MS-SMB2 section 2.2.4: if no blob, client picks default which
for us will be
ses->sectype = RawNTLMSSP;
but for time being this is also our only auth choice so doesn't matter
as long as we include this fix (which does not treat the empty
SecurityBuffer as an error as the code had been doing).
We just found a server which sets blob length to zero expecting raw so
this fixes negotiation with that server.
Steve French [Tue, 25 Jun 2013 19:03:16 +0000 (14:03 -0500)]
Handle big endianness in NTLM (ntlmv2) authentication
This is RH bug 970891
Uppercasing of username during calculation of ntlmv2 hash fails
because UniStrupr function does not handle big endian wchars.
Also fix a comment in the same code to reflect its correct usage.
[To make it easier for stable (rather than require 2nd patch) fixed
this patch of Shirish's to remove endian warning generated
by sparse -- steve f.]
Reported-by: steve <sanpatr1@in.ibm.com> Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Cc: <stable@kernel.org> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
Jeff Layton [Tue, 25 Jun 2013 06:32:17 +0000 (01:32 -0500)]
revalidate directories instiantiated via FIND_* in order to handle DFS referrals
We've had a long-standing problem with DFS referral points. CIFS servers
generally try to make them look like directories in FIND_FIRST/NEXT
responses. When you go to try to do a FIND_FIRST on them though, the
server will then (correctly) return STATUS_PATH_NOT_COVERED. Mostly this
manifests as spurious EREMOTE errors back to userland.
This patch attempts to fix this by marking directories that are
discovered via FIND_FIRST/NEXT for revaldiation. When the lookup code
runs across them again, we'll reissue a QPathInfo against them and that
will make it chase the referral properly.
There is some performance penalty involved here and no I haven't
measured it -- it'll be highly dependent upon the workload and contents
of the mounted share. To try and mitigate that though, the code only
marks the inode for revalidation when it's possible to run across a DFS
referral. i.e.: when the kernel has DFS support built in and the share
is "in DFS"
[At the Microsoft plugfest we noted that usually the DFS links had
the REPARSE attribute tag enabled - DFS junctions are reparse points
after all - so I just added a check for that flag too so the
performance impact should be smaller - Steve]
Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
Steve French [Tue, 25 Jun 2013 05:20:49 +0000 (00:20 -0500)]
SMB2 FSCTL and IOCTL worker function
This worker function is needed to send SMB2 fsctl
(and ioctl) requests including:
validating negotiation info (secure negotiate)
querying the servers network interfaces
copy offload (refcopy)
Followon patches for the above three will use this.
This patch also does general validation of the response.
In the future, as David Disseldorp notes, for the copychunk ioctl
case, we will want to enhance the response processing to allow
returning the chunk request limits to the caller (even
though the server returns an error, in that case we would
return data that the caller could use - see 2.2.32.1).
See MS-SMB2 Section 2.2.31 for more details on format of fsctl.
Acked-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
Steve French [Sun, 23 Jun 2013 23:43:37 +0000 (18:43 -0500)]
Charge at least one credit, if server says that it supports multicredit
In SMB2.1 and later the server will usually set the large MTU flag, and
we need to charge at least one credit, if server says that since
it supports multicredit. Windows seems to let us get away with putting
a zero there, but they confirmed that it is wrong and the spec says
to put one there (if the request is under 64K and the CAP_LARGE_MTU
was returned during protocol negotiation by the server.
CC: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>