Jayachandran C [Tue, 24 Jul 2012 16:17:47 +0000 (18:17 +0200)]
MIPS: Netlogic: Fix indentation of smpboot.S
[ralf@linux-mips.org: I've folded most segments of this patch into those
patches in -next that originally were causing the whitespace damage.
This is just what's left over]
Jayachandran C [Tue, 24 Jul 2012 15:28:55 +0000 (17:28 +0200)]
MIPS: Netlogic: remove cpu_has_dc_aliases define for XLP
On XLP, the dcache size depends on the number of enabled threads in
core. There are no dcache aliases if the pagesize is large enough or
if enough threads are enabled in the core.
Remove the #define for cpu_has_dc_aliases and leave it to be computed
at runtime.
Probe and add devices on SoC "simple-bus" on startup. This will
in turn add devices like I2C controller that are specified in the
device tree under 'soc'.
The XLP USB controller appears as a device on the internal SoC PCIe
bus, the block has 2 EHCI blocks and 4 OHCI blocks. Change are to:
* Add files netlogic/xlp/usb-init.c and asm/netlogic/xlp-hal/usb.h
to initialize the USB controller and define PCI fixups. The PCI
fixups are to setup interrupts and DMA mask.
* Update include/asm/xlp-hal/{iomap.h,pic.h,xlp.h} to add interrupt
mapping for EHCI/OHCI interrupts.
Adds support for the XLP on-chip PCIe controller. On XLP, the
on-chip devices(including the 4 PCIe links) appear in the PCIe
configuration space of the XLP as PCI devices.
The changes are to initialize and register the PCIe controller,
enable hardware byte swap in the PCIe IO and MEM space, and to
enable PCIe interrupts.
Jayachandran C [Tue, 24 Jul 2012 15:28:53 +0000 (17:28 +0200)]
MIPS: Netlogic: Platform changes for XLS USB
Add USB initialization code, setup resources and add USB platform
driver in mips/netlogic/xlr/platform.c.
Add USB support for XLR/XLS platform in Kconfig.
Jayachandran C [Tue, 24 Jul 2012 15:28:47 +0000 (17:28 +0200)]
MIPS: Netlogic: SMP wakeup code update
Update for core intialization code. Initialize status register
after receiving NMI for CPU wakeup. Add the low level L1D flush
code before enabling threads in core.
Also convert the ehb to _ehb so that it works under more GCC
versions.
Jayachandran C [Tue, 24 Jul 2012 15:26:34 +0000 (17:26 +0200)]
MIPS: Netlogic: Update comments in smpboot.S
No change in logic, comments update and whitespace cleanup.
* A few comments in the file were in assembler style and the rest
int C style, convert all of them to C style.
* Mark workarounds for Ax silicon with a macro XLP_AX_WORKAROUND
* Whitespace fixes - use tabs consistently
* rename __config_lsu macro to xlp_config_lsu
Jonas Gorski [Tue, 24 Jul 2012 14:33:12 +0000 (16:33 +0200)]
MIPS: BCM63XX: Use the Chip ID register for identifying the SoC
Newer BCM63XX SoCs use virtually the same CPU ID, differing only in the
revision bits. But since they all have the Chip ID register at the same
location, we can use that to identify the SoC we are running on.
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Cc: linux-mips@linux-mips.org Cc: Maxime Bizon <mbizon@freebox.fr> Cc: Florian Fainelli <florian@openwrt.org> Cc: Kevin Cernekee <cernekee@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/3955/ Reviewed-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Jonas Gorski [Tue, 24 Jul 2012 14:33:11 +0000 (16:33 +0200)]
MIPS: BCM63XX: Add flash type detection
On BCM6358 and BCM6368 the attached flash type is exposed through a
bootstrapping register. Use it for auto detecting the flash type on
those and default to parallel flash for earlier SoCs.
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Cc: linux-mips@linux-mips.org Cc: Maxime Bizon <mbizon@freebox.fr> Cc: Florian Fainelli <florian@openwrt.org> Cc: Kevin Cernekee <cernekee@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/3954/ Reviewed-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Kelvin Cheung [Wed, 20 Jun 2012 19:05:32 +0000 (20:05 +0100)]
MIPS: Add CPU support for Loongson1B
Loongson 1B is a 32-bit SoC designed by Institute of Computing Technology
(ICT) and the Chinese Academy of Sciences (CAS), which implements the
MIPS32 release 2 instruction set.
[ralf@linux-mips.org: But which is not strictly a MIPS32 compliant device
which also is why it identifies itself with the Legacy Vendor ID in the
PrID register. When applying the patch I shoveled some code around to
keep things in alphabetical order and avoid forward declarations.]
Thomas Langer [Sun, 20 May 2012 13:46:19 +0000 (15:46 +0200)]
SPI: MIPS: lantiq: add FALCON spi driver
The external bus unit (EBU) found on the FALCON SoC has spi emulation that is
designed for serial flash access. This driver has only been tested with m25p80
type chips. The hardware has no support for other types of spi peripherals.
Signed-off-by: Thomas Langer <thomas.langer@lantiq.com> Signed-off-by: John Crispin <blogic@openwrt.org> Cc: spi-devel-general@lists.sourceforge.net Cc: linux-mips@linux-mips.org Acked-by: Grant Likely <grant.likely@secretlab.ca>
Patchwork: https://patchwork.linux-mips.org/patch/3844/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
In hibernation mode only the wakeup logic and the RTC are left running,
so this is what users perceive as power down.
If the counters are not initialized, the corresponding pin (typically
connected to the power button) has to be asserted for two seconds
before the device wakes up. Most users expect a shorter wakeup time.
I took the timing values of 100 ms and 60 ms from BouKiCHi's patch for
the Dingoo A320 kernel.
MTD: NAND: JZ4740: Multi-bank support with autodetection
The platform data can now specify which external memory banks to probe
for NAND chips, and in which order. Banks that contain a NAND are used
and the other banks are freed.
Squashed version of development done in jz-2.6.38 branch.
Original patch by Lars-Peter Clausen with some bug fixes from me.
Thanks to Paul Cercueil for the initial autodetection patch.
Signed-off-by: Maarten ter Huurne <maarten@treewalker.org> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3560/ Acked-By: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Steven J. Hill [Tue, 19 Jun 2012 18:59:29 +0000 (19:59 +0100)]
MIPS: Fixup ordering of micro assembler instructions.
A number of new instructions have been added to the micro assembler causing
the list to no longer be in alphabetical order. This patch fixes up the name
ordering.
Signed-off-by: Steven J. Hill <sjhill@mips.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3789/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Wed, 6 Jun 2012 22:00:31 +0000 (23:00 +0100)]
MIPS: Unify memcpy.S and memcpy-inatomic.S
We can save the 451 lines of code that comprise memcpy-inatomic.S at the
expense of a single instruction in the memcpy prolog. We also use an
additional register (t6), so this may cause increased register pressure in
some places as well. But I think the reduced maintenance burden, of not
having two nearly identical implementations, makes it worth it.
Signed-off-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Steven J. Hill [Mon, 9 Apr 2012 15:58:39 +0000 (10:58 -0500)]
MIPS: SMTC: Support for Multi-threaded FPUs
Signed-off-by: Steven J. Hill <sjhill@mips.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3603/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Steven J. Hill [Mon, 21 May 2012 15:33:16 +0000 (10:33 -0500)]
MIPS: Remove dead code related to 1004K oprofile support.
Signed-off-by: Steven J. Hill <sjhill@mips.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3854/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Thu, 5 Jul 2012 16:12:40 +0000 (18:12 +0200)]
MIPS: Octeon: Use device tree to register serial ports.
Switch to using the device tree to register serial ports.
Add all the ports with compatible = "cavium,octeon-3860-uart". Octeon serial
ports have their own device type, required port flags, and I/O
functions, so using of_serial.c is not indicated.
We need to do this as late_initcall, as the 8250 driver must be
initialized before we add any ports. 8250 initialization is done at
device_initcall time.
The OCTEON_IRQ_UART{0,1,2} symbols are removed as they are now unused
and interfere with irq_domain used by the device tree code.
Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Cc: linux-mips@linux-mips.org Cc: devicetree-discuss@lists.ozlabs.org Cc: Rob Herring <rob.herring@calxeda.com> Cc: linux-kernel@vger.kernel.org Cc: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3942/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Thu, 5 Jul 2012 16:12:39 +0000 (18:12 +0200)]
netdev: octeon_mgmt: Convert to use device tree.
The device tree will supply the register bank base addresses, make
register addressing relative to those. PHY connection is now
described by the device tree.
The OCTEON_IRQ_MII{0,1} symbols are also removed as they are now
unused and interfere with the irq_domain used for device tree irq
mapping.
Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: linux-mips@linux-mips.org Cc: devicetree-discuss@lists.ozlabs.org Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Rob Herring <rob.herring@calxeda.com> Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/3941/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Thu, 5 Jul 2012 16:12:39 +0000 (18:12 +0200)]
netdev: mdio-octeon.c: Convert to use device tree.
Get the MDIO bus controller addresses from the device tree, small
clean up in use of devm_*
Remove, now unused, platform device setup code.
Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: linux-mips@linux-mips.org Cc: devicetree-discuss@lists.ozlabs.org Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Rob Herring <rob.herring@calxeda.com> Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/3938/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Thu, 5 Jul 2012 16:12:39 +0000 (18:12 +0200)]
i2c: Convert i2c-octeon.c to use device tree.
There are three parts to this:
1) Remove the definitions of OCTEON_IRQ_TWSI and OCTEON_IRQ_TWSI2.
The interrupts are specified by the device tree and these hard
coded irq numbers block the used of the irq lines by the irq_domain
code.
2) Remove platform device setup code from octeon-platform.c, it is
now unused.
3) Convert i2c-octeon.c to use device tree. Part of this includes
using the devm_* functions instead of the raw counterparts, thus
simplifying error handling. No functionality is changed.
Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Wolfram Sang <w.sang@pengutronix.de> Cc: linux-mips@linux-mips.org Cc: devicetree-discuss@lists.ozlabs.org Cc: Grant Likely <grant.likely@secretlab.ca> Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/3939/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Thu, 5 Jul 2012 16:12:38 +0000 (18:12 +0200)]
MIPS: Prune some target specific code out of prom.c
This code is not common enough to be in a shared file. It is also not
used by any existing boards, so just remove it.
[ralf@linux-mips.org: Dropped removal of irq_create_of_mapping which was
already removed by abd2363f6a5f1030b935e0bdc15cf917313b3b10
[irq_domain/mips: Allow irq_domain on MIPS]. Moved device_tree_init() and
dependencies to its sole user, the XLP code.]
David Daney [Thu, 5 Jul 2012 16:12:38 +0000 (18:12 +0200)]
MIPS: Octeon: Add device tree source files.
The two device tree files octeon_3xxx.dts and octeon_68xx.dts are
trimmed by code in a subsequent patch to reflect the hardware actually
present on the board. To this end several properties that are not
part of the declared bindings are added to aid in trimming off
unwanted nodes. Since the device tree and the code that trims it are
bound into the kernel binary, these 'marker' properties never escape
into the wild, and are purely an implementation detail of the kernel
early boot process. This is done for backwards compatibility with
existing boards (identified by a board type enumeration value by their
bootloaders). New boards will always pass a device tree from the
bootloader, the built-in trees are ignored in this case.
Signed-off-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Cc: devicetree-discuss@lists.ozlabs.org Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Rob Herring <rob.herring@calxeda.com> Cc: linux-kernel@vger.kernel.org Cc: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3937/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Thu, 5 Jul 2012 16:12:38 +0000 (18:12 +0200)]
of/lib: Allow scripts/dtc/libfdt to be used from kernel code
libfdt is part of the device tree support in scripts/dtc/libfdt. For
some platforms that use the Device Tree, we want to be able to edit
the flattened device tree form.
We don't want to burden kernel builds that do not require it, so we
gate compilation of libfdt files with CONFIG_LIBFDT. So if it is
needed, you need to do this in your Kconfig:
select LIBFDT
And in the Makefile of the code using libfdt something like:
ccflags-y := -I$(src)/../../../scripts/dtc/libfdt
Signed-off-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Cc: devicetree-discuss@lists.ozlabs.org Cc: Grant Likely <grant.likely@secretlab.ca> Cc: linux-kernel@vger.kernel.org Acked-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Thu, 5 Jul 2012 16:12:38 +0000 (18:12 +0200)]
MIPS: OCTEON: Consolidate the edge and level irq_chip structures.
We can consolidate octeon_irq_chip_ciu_edge and octeon_irq_chip_ciu as
they only differ in the .irq_ack element, and that is unused by the
level handler. This gets rid of a bunch of duplicate definitions.
The follow-on patch to add irq_domain support will be the supported
method for using these irq lines, so get these defines out of the way
in preperation for that.
MIPS: BCM63xx: Add stub to register the SPI platform driver
This patch adds the necessary stub to register the SPI platform driver.
Since the registers are shuffled between the 4 BCM63xx CPUs supported by
this SPI driver we also need to generate the internal register layout and
export this layout for the driver to use it properly.
MIPS: BCM63xx: Define internal registers offsets of the SPI controller
BCM6338, BCM6348, BCM6358 and BCM6368 basically use the same SPI controller
though the internal registers are shuffled, which still allows a common
driver to drive that IP block.
This register was introduced with the support of the BCM6368 CPU in the idea
that its internal layout was different from the other CPUs SPI controller.
The controller is actually the same as the one present on BCM6358 so we can
remove this register and use the usual SPI register instead.
Manuel Lauss [Sat, 21 Jan 2012 17:13:15 +0000 (18:13 +0100)]
MIPS: Alchemy: handle db1200 cpld ints as they come in
Remove the loop in the cascade handler and instead unconditionally
handle just the first set interrupt coming from the CPLD.
This gets rid of a lot of spurious interrupts being triggered for
the SMSC91111 ethernet chip especially under high(er) IDE load:
"eth0: spurious interrupt (mask = 0xb3)"
Merge emailed kgdb dmesg fixups patches from Anton Vorontsov:
"The dmesg command appears to be broken after the printk rework. The
old logic in the kdb code makes no sense in terms of current
printk/logging storage format, and KDB simply hangs forever upon
entering 'dmesg' command.
The first patch revives the command by switching to kmsg_dumper
iterator. As a side-effect, the code is now much more simpler.
A few changes were needed in the printk.c: we needed unlocked variant
of the kmsg_dumper iterator, but these can surely wait for 3.6.
It's probably too late even for the first patch to go to 3.5, but I'll
try to convince otherwise. :-) Here we go:
- The current code is broken for sure, and has no hope to work at
all. It is a regression
- The new code works for me, and probably works for everyone else;
- If it compiles (and I urge everyone to compile-test it on your
setup), it hardly can make things worse."
* Merge emailed patches from Anton Vorontsov: (4 commits)
kdb: Switch to nolock variants of kmsg_dump functions
printk: Implement some unlocked kmsg_dump functions
printk: Remove kdb_syslog_data
kdb: Revive dmesg command
Anton Vorontsov [Sat, 21 Jul 2012 00:27:37 +0000 (17:27 -0700)]
kdb: Revive dmesg command
The kgdb dmesg command is broken after the printk rework. The old logic
in kdb code makes no sense in terms of current printk/logging storage
format, and KDB simply hangs forever.
This patch revives the command by switching to kmsg_dumper iterator.
The code is now much more simpler and shorter.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull late MIPS fixes from Ralf Baechle:
"This fixes a number of lose ends in the MIPS code and various bug
fixes.
Aside of dropping some patch that should not be in this pull request
everything has sat in -next for quite a while and there are no known
issues.
The biggest patch in this patch set moves the allocation of an array
that is aliased to a function (for runtime generated code) to
assembler code. This avoids an issue with certain toolchains when
building for microMIPS."
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (35 commits)
MIPS: PCI: Move fixups from __init to __devinit.
MIPS: Fix bug.h MIPS build regression
MIPS: sync-r4k: remove redundant irq operation
MIPS: smp: Warn on too early irq enable
MIPS: call set_cpu_online() on cpu being brought up with irq disabled
MIPS: call ->smp_finish() a little late
MIPS: Yosemite: delay irq enable to ->smp_finish()
MIPS: SMTC: delay irq enable to ->smp_finish()
MIPS: BMIPS: delay irq enable to ->smp_finish()
MIPS: Octeon: delay enable irq to ->smp_finish()
MIPS: Oprofile: Fix build as a module.
MIPS: BCM63XX: Fix BCM6368 IPSec clock bit
MIPS: perf: Fix build error caused by unused counters_per_cpu_to_total()
MIPS: Fix Magic SysRq L kernel crash.
MIPS: BMIPS: Fix duplicate header inclusion.
mips: mark const init data with __initconst instead of __initdata
MIPS: cmpxchg.h: Add missing include
MIPS: Malta may also be equipped with MIPS64 R2 processors.
MIPS: Fix typo multipy -> multiply
MIPS: Cavium: Fix duplicate ARCH_SPARSEMEM_ENABLE in kconfig.
...
Merge tag 'dm-3.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Pull device-mapper discard fixes from Alasdair G Kergon:
- avoid a crash in dm-raid1 when discards coincide with mirror
recovery;
- avoid discarding shared data that's still needed in dm-thin;
- don't guarantee that discarded blocks will be wiped in dm-raid1.
* tag 'dm-3.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
dm raid1: set discard_zeroes_data_unsupported
dm thin: do not send discards to shared blocks
dm raid1: fix crash with mirror recovery and discard
Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
Pull pnfs/ore fixes from Boaz Harrosh:
"These are catastrophic fixes to the pnfs objects-layout that were just
discovered. They are also destined for @stable.
I have found these and worked on them at around RC1 time but
unfortunately went to the hospital for kidney stones and had a very
slow recovery. I refrained from sending them as is, before proper
testing, and surly I have found a bug just yesterday.
So now they are all well tested, and have my sign-off. Other then
fixing the problem at hand, and assuming there are no bugs at the new
code, there is low risk to any surrounding code. And in anyway they
affect only these paths that are now broken. That is RAID5 in pnfs
objects-layout code. It does also affect exofs (which was not broken)
but I have tested exofs and it is lower priority then objects-layout
because no one is using exofs, but objects-layout has lots of users."
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
pnfs-obj: don't leak objio_state if ore_write/read fails
ore: Unlock r4w pages in exact reverse order of locking
ore: Remove support of partial IO request (NFS crash)
ore: Fix NFS crash by supporting any unaligned RAID IO
and we finally have the fix. I am quite confident the fix is correct
because I could reproduce the problem with nandsim and verify the fix.
It was also verified by Iwo (the reporter).
I am also confident that this is OK to merge the fix so late because
this patch affects only the fixup functionality, which is not used by
most users."
* tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs:
UBIFS: fix a bug in empty space fix-up
We can't guarantee that REQ_DISCARD on dm-mirror zeroes the data even if
the underlying disks support zero on discard. So this patch sets
ti->discard_zeroes_data_unsupported.
For example, if the mirror is in the process of resynchronizing, it may
happen that kcopyd reads a piece of data, then discard is sent on the
same area and then kcopyd writes the piece of data to another leg.
Consequently, the data is not zeroed.
When process_discard receives a partial discard that doesn't cover a
full block, it sends this discard down to that block. Unfortunately, the
block can be shared and the discard would corrupt the other snapshots
sharing this block.
This patch detects block sharing and ends the discard with success when
sending it to the shared block.
The above change means that if the device supports discard it can't be
guaranteed that a discard request zeroes data. Therefore, we set
ti->discard_zeroes_data_unsupported.
dm raid1: fix crash with mirror recovery and discard
This patch fixes a crash when a discard request is sent during mirror
recovery.
Firstly, some background. Generally, the following sequence happens during
mirror synchronization:
- function do_recovery is called
- do_recovery calls dm_rh_recovery_prepare
- dm_rh_recovery_prepare uses a semaphore to limit the number
simultaneously recovered regions (by default the semaphore value is 1,
so only one region at a time is recovered)
- dm_rh_recovery_prepare calls __rh_recovery_prepare,
__rh_recovery_prepare asks the log driver for the next region to
recover. Then, it sets the region state to DM_RH_RECOVERING. If there
are no pending I/Os on this region, the region is added to
quiesced_regions list. If there are pending I/Os, the region is not
added to any list. It is added to the quiesced_regions list later (by
dm_rh_dec function) when all I/Os finish.
- when the region is on quiesced_regions list, there are no I/Os in
flight on this region. The region is popped from the list in
dm_rh_recovery_start function. Then, a kcopyd job is started in the
recover function.
- when the kcopyd job finishes, recovery_complete is called. It calls
dm_rh_recovery_end. dm_rh_recovery_end adds the region to
recovered_regions or failed_recovered_regions list (depending on
whether the copy operation was successful or not).
The above mechanism assumes that if the region is in DM_RH_RECOVERING
state, no new I/Os are started on this region. When I/O is started,
dm_rh_inc_pending is called, which increases reg->pending count. When
I/O is finished, dm_rh_dec is called. It decreases reg->pending count.
If the count is zero and the region was in DM_RH_RECOVERING state,
dm_rh_dec adds it to the quiesced_regions list.
Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is
in DM_RH_RECOVERING state, it could be added to quiesced_regions list
multiple times or it could be added to this list when kcopyd is copying
data (it is assumed that the region is not on any list while kcopyd does
its jobs). This results in memory corruption and crash.
There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests
do not belong to any region, so they are always added to the sync list
in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH
requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH
requests. These bypasses avoid the crash possibility described above.
These bypasses were improperly implemented for REQ_DISCARD when
the mirror target gained discard support in commit 5fc2ffeabb9ee0fc0e71ff16b49f34f0ed3d05b4 (dm raid1: support discard).
In do_writes, REQ_DISCARD requests is always added to the sync queue and
immediately dispatched (even if the region is in DM_RH_RECOVERING). However,
dm_rh_inc and dm_rh_dec is called for REQ_DISCARD resusts. So it violates the
rule that no I/Os are started on DM_RH_RECOVERING regions, and causes the list
corruption described above.
This patch changes it so that REQ_DISCARD requests follow the same path
as REQ_FLUSH. This avoids the crash.
Boaz Harrosh [Thu, 7 Jun 2012 23:02:30 +0000 (02:02 +0300)]
pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
It is very common for the end of the file to be unaligned on
stripe size. But since we know it's beyond file's end then
the XOR should be preformed with all zeros.
Old code used to just read zeros out of the OSD devices, which is a great
waist. But what scares me more about this situation is that, we now have
pages attached to the file's mapping that are beyond i_size. I don't
like the kind of bugs this calls for.
Fix both birds, by returning a global zero_page, if offset is beyond
i_size.
TODO:
Change the API to ->__r4w_get_page() so a NULL can be
returned without being considered as error, since XOR API
treats NULL entries as zero_pages.
[Bug since 3.2. Should apply the same way to all Kernels since] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
ore: Unlock r4w pages in exact reverse order of locking
The read-4-write pages are locked in address ascending order.
But where unlocked in a way easiest for coding. Fix that,
locks should be released in opposite order of locking, .i.e
descending address order.
I have not hit this dead-lock. It was found by inspecting the
dbug print-outs. I suspect there is an higher lock at caller that
protects us, but fix it regardless.
Boaz Harrosh [Fri, 8 Jun 2012 01:30:40 +0000 (04:30 +0300)]
ore: Remove support of partial IO request (NFS crash)
Do to OOM situations the ore might fail to allocate all resources
needed for IO of the full request. If some progress was possible
it would proceed with a partial/short request, for the sake of
forward progress.
Since this crashes NFS-core and exofs is just fine without it just
remove this contraption, and fail.
TODO:
Support real forward progress with some reserved allocations
of resources, such as mem pools and/or bio_sets
[Bug since 3.2 Kernel] CC: Stable Tree <stable@kernel.org> CC: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Boaz Harrosh [Thu, 7 Jun 2012 22:19:07 +0000 (01:19 +0300)]
ore: Fix NFS crash by supporting any unaligned RAID IO
In RAID_5/6 We used to not permit an IO that it's end
byte is not stripe_size aligned and spans more than one stripe.
.i.e the caller must check if after submission the actual
transferred bytes is shorter, and would need to resubmit
a new IO with the remainder.
Exofs supports this, and NFS was supposed to support this
as well with it's short write mechanism. But late testing has
exposed a CRASH when this is used with none-RPC layout-drivers.
The change at NFS is deep and risky, in it's place the fix
at ORE to lift the limitation is actually clean and simple.
So here it is below.
The principal here is that in the case of unaligned IO on
both ends, beginning and end, we will send two read requests
one like old code, before the calculation of the first stripe,
and also a new site, before the calculation of the last stripe.
If any "boundary" is aligned or the complete IO is within a single
stripe. we do a single read like before.
The code is clean and simple by splitting the old _read_4_write
into 3 even parts:
1._read_4_write_first_stripe
2. _read_4_write_last_stripe
3. _read_4_write_execute
And calling 1+3 at the same place as before. 2+3 before last
stripe, and in the case of all in a single stripe then 1+2+3
is preformed additively.
Why did I not think of it before. Well I had a strike of
genius because I have stared at this code for 2 years, and did
not find this simple solution, til today. Not that I did not try.
This solution is much better for NFS than the previous supposedly
solution because the short write was dealt with out-of-band after
IO_done, which would cause for a seeky IO pattern where as in here
we execute in order. At both solutions we do 2 separate reads, only
here we do it within a single IO request. (And actually combine two
writes into a single submission)
NFS/exofs code need not change since the ORE API communicates the new
shorter length on return, what will happen is that this case would not
occur anymore.
hurray!!
[Stable this is an NFS bug since 3.2 Kernel should apply cleanly] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
UBIFS has a feature called "empty space fix-up" which is a quirk to work-around
limitations of dumb flasher programs. Namely, of those flashers that are unable
to skip NAND pages full of 0xFFs while flashing, resulting in empty space at
the end of half-filled eraseblocks to be unusable for UBIFS. This feature is
relatively new (introduced in v3.0).
The fix-up routine (fixup_free_space()) is executed only once at the very first
mount if the superblock has the 'space_fixup' flag set (can be done with -F
option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and
writes it back to the same LEB. The routine assumes the image is pristine and
does not have anything in the journal.
There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly.
All but one LEB of the log of a pristine file-system are empty. And one
contains just a commit start node. And 'fixup_free_space()' just unmapped this
LEB, which resulted in wiping the commit start node. As a result, some users
were unable to mount the file-system next time with the following symptom:
UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0
The root-cause of this bug was that 'fixup_free_space()' wrongly assumed
that the beginning of empty space in the log head (c->lhead_offs) was known
on mount. However, it is not the case - it was always 0. UBIFS does not store
in it the master node and finds out by scanning the log on every mount.
The fix is simple - just pass commit start node size instead of 0 to
'fixup_leb()'.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull last minute Ceph fixes from Sage Weil:
"The important one fixes a bug in the socket failure handling behavior
that was turned up in some recent failure injection testing. The
other two are minor bug fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: endian bug in rbd_req_cb()
rbd: Fix ceph_snap_context size calculation
libceph: fix messenger retry
Merge tag 'md-3.5-fixes' of git://neil.brown.name/md
Pull three md bugfixes from NeilBrown:
"One of the bugs was introduced in 3.5-rc1. Others have been there for
longer."
* tag 'md-3.5-fixes' of git://neil.brown.name/md:
md/raid1: close some possible races on write errors during resync
md: avoid crash when stopping md array races with closing other open fds.
md: fix bug in handling of new_data_offset