Pull GFS2 updates from Steven Whitehouse:
"The major feature this time is the "rbm" conversion in the resource
group code. The new struct gfs2_rbm specifies the location of an
allocatable block in (resource group, bitmap, offset) form. There are
a number of added helper functions, and later patches then rewrite
some of the resource group code in terms of this new structure. Not
only does this give us a nice code clean up, but it also removes some
of the previous restrictions where extents could not cross bitmap
boundaries, for example.
In addition to that, there are a few bug fixes and clean ups, but the
rbm work is by far the majority of this patch set in terms of number
of changed lines."
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (27 commits)
GFS2: Write out dirty inode metadata in delayed deletes
GFS2: fix s_writers.counter imbalance in gfs2_ail_empty_gl
GFS2: Fix infinite loop in rbm_find
GFS2: Consolidate free block searching functions
GFS2: Get rid of I_MUTEX_QUOTA usage
GFS2: Stop block extents at the end of bitmaps
GFS2: Fix unclaimed_blocks() wrapping bug and clean up
GFS2: Improve block reservation tracing
GFS2: Fall back to ignoring reservations, if there are no other blocks left
GFS2: Fix ->show_options() for statfs slow
GFS2: Use rbm for gfs2_setbit()
GFS2: Use rbm for gfs2_testbit()
GFS2: Eliminate unnecessary check for state > 3 in bitfit
GFS2: Eliminate redundant calls to may_grant
GFS2: Combine functions gfs2_glock_dq_wait and wait_on_demote
GFS2: Combine functions gfs2_glock_wait and wait_on_holder
GFS2: inline __gfs2_glock_schedule_for_reclaim
GFS2: change function gfs2_direct_IO to use a normal gfs2_glock_dq
GFS2: rbm code cleanup
GFS2: Fix case where reservation finished at end of rgrp
...
IBM reported a deadlock in select_parent(). This was found to be caused
by taking rename_lock when already locked when restarting the tree
traversal.
There are two cases when the traversal needs to be restarted:
1) concurrent d_move(); this can only happen when not already locked,
since taking rename_lock protects against concurrent d_move().
2) racing with final d_put() on child just at the moment of ascending
to parent; rename_lock doesn't protect against this rare race, so it
can happen when already locked.
Because of case 2, we need to be able to handle restarting the traversal
when rename_lock is already held. This patch fixes all three callers of
try_to_ascend().
IBM reported that the deadlock is gone with this patch.
[ I rewrote the patch to be smaller and just do the "goto again" if the
lock was already held, but credit goes to Miklos for the real work.
- Linus ]
Merge tag 'iommu-fixes-v3.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"Two small patches:
* One patch to fix the function declarations for
!CONFIG_IOMMU_API. This is causing build errors
in linux-next and should be fixed for v3.6.
* Another patch to fix an IOMMU group related NULL pointer
dereference."
* tag 'iommu-fixes-v3.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix wrong assumption in iommu-group specific code
iommu: static inline iommu group stub functions
Pull NVMe driver fixes from Matthew Wilcox:
"Now that actual hardware has been released (don't have any yet
myself), people are starting to want some of these fixes merged."
Willy doesn't have hardware? Guys...
* git://git.infradead.org/users/willy/linux-nvme:
NVMe: Cancel outstanding IOs on queue deletion
NVMe: Free admin queue memory on initialisation failure
NVMe: Use ida for nvme device instance
NVMe: Fix whitespace damage in nvme_init
NVMe: handle allocation failure in nvme_map_user_pages()
NVMe: Fix uninitialized iod compiler warning
NVMe: Do not set IO queue depth beyond device max
NVMe: Set block queue max sectors
NVMe: use namespace id for nvme_get_features
NVMe: replace nvme_ns with nvme_dev for user admin
NVMe: Fix nvme module init when nvme_major is set
NVMe: Set request queue logical block size
Sasha Levin has been running trinity in a KVM tools guest, and was able
to trigger the BUG_ON() at arch/x86/mm/pat.c:279 (verifying the range of
the memory type). The call trace showed that it was mtdchar_mmap() that
created an invalid remap_pfn_range().
The problem is that mtdchar_mmap() does various really odd and subtle
things with the vma page offset etc, and uses the wrong types (and the
wrong overflow) detection for it.
For example, the page offset may well be 32-bit on a 32-bit
architecture, but after shifting it up by PAGE_SHIFT, we need to use a
potentially 64-bit resource_size_t to correctly hold the full value.
Also, we need to check that the vma length plus offset doesn't overflow
before we check that it is smaller than the length of the mtdmap region.
This fixes things up and tries to make the code a bit easier to read.
1) Netfilter xt_limit module can use uninitialized rules, from Jan
Engelhardt.
2) Wei Yongjun has found several more spots where error pointers were
treated as NULL/non-NULL and vice versa.
3) bnx2x was converted to pci_io{,un}map() but one remaining plain
iounmap() got missed. From Neil Horman.
4) Due to a fence-post type error in initialization of inetpeer entries
(which is where we store the ICMP rate limiting information), we can
erroneously drop ICMPs if the inetpeer was created right around when
jiffies wraps.
Fix from Nicolas Dichtel.
5) smsc75xx resume fix from Steve Glendinnig.
6) LAN87xx smsc chips need an explicit hardware init, from Marek Vasut.
7) qlcnic uses msleep() with locks held, fix from Narendra K.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
netdev: octeon: fix return value check in octeon_mgmt_init_phy()
inetpeer: fix token initialization
qlcnic: Fix scheduling while atomic bug
bnx2: Clean up remaining iounmap
net: phy: smsc: Implement PHY config_init for LAN87xx
smsc75xx: fix resume after device reset
netdev: pasemi: fix return value check in pasemi_mac_phy_init()
team: fix return value check
l2tp: fix return value check
netfilter: xt_limit: have r->cost != 0 case work
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"A couple of fixes; one for automount/lazy umount race, another a
classic "we don't protect the refcount transition to zero with the
lock that protects looking for object in hash" kind of crap in lockd."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
close the race in nlmsvc_free_block()
do_add_mount()/umount -l races
Merge branch 'for-linus-3.6-rc-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML fixes from Richard Weinberger.
* 'for-linus-3.6-rc-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Preinclude include/linux/kern_levels.h
um: Fix IPC on um
um: kill thread->forking
um: let signal_delivered() do SIGTRAP on singlestepping into handler
um: don't leak floating point state and segment registers on execve()
um: take cleaning singlestep to start_thread()
Merge tag 'dm-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Pull dm fixes from Alasdair G Kergon:
"A few fixes for problems discovered during the 3.6 cycle.
Of particular note, are fixes to the thin target's discard support,
which I hope is finally working correctly; and fixes for multipath
ioctls and device limits when there are no paths."
* tag 'dm-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
dm verity: fix overflow check
dm thin: fix discard support for data devices
dm thin: tidy discard support
dm: retain table limits when swapping to new table with no devices
dm table: clear add_random unless all devices have it set
dm: handle requests beyond end of device instead of using BUG_ON
dm mpath: only retry ioctl when no paths if queue_if_no_path set
dm thin: do not set discard_zeroes_data
Andrea Arcangeli [Fri, 28 Sep 2012 12:35:31 +0000 (14:35 +0200)]
thp: avoid VM_BUG_ON page_count(page) false positives in __collapse_huge_page_copy
Speculative cache pagecache lookups can elevate the refcount from
under us, so avoid the false positive. If the refcount is < 2 we'll be
notified by a VM_BUG_ON in put_page_testzero as there are two
put_page(src_page) in a row before returning from this function.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Petr Holasek <pholasek@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
iommu/amd: Fix wrong assumption in iommu-group specific code
The new IOMMU groups code in the AMD IOMMU driver makes the
assumption that there is a pci_dev struct available for all
device-ids listed in the IVRS ACPI table. Unfortunatly this
assumption is not true and so this code causes a NULL
pointer dereference at boot on some systems.
Fix it by making sure the given pointer is never NULL when
passed to the group specific code. The real fix is larger
and will be queued for v3.7.
netdev: octeon: fix return value check in octeon_mgmt_init_phy()
In case of error, the function of_phy_connect() returns NULL
pointer not ERR_PTR(). The IS_ERR() test in the return value
check should be replaced with NULL test.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"The three nouveau fixes quiten unneeded dmesg spam that people are
seeing and pondering,
The udl fix stops it from trying to driver monitors that are too big,
where we get a black screen.
And a vmware memory alloc problem."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/nvc0/fifo: ignore bits in PFIFO_INTR that aren't set in PFIFO_INTR_EN
drm/udl: limit modes to the sku pixel limits.
vmwgfx: corruption in vmw_event_fence_action_create()
drm/nvc0/ltcg: mask off intr 0x10
drm/nouveau: silence a debug message triggered by newer userspace
Merge tag 'usb-3.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg Kroah-Hartman:
"Here are two USB bugfixes for your 3.6-rc7 tree.
The OHCI fix has been reported a number of times and is a regression
from 3.5, and the patch that causes the regression was on the way to
the -stable trees before I was reminded (again) that this fix needed
to get to your tree soon.
The host controller bugfix was reported in older kernels as being
pretty easy to trigger, and has been tested by Red Hat and their
customers.
Both have been in the usb-next branch in the -next tree for a while
now, I just cherry-picked them out to get to you in time for the 3.6
release.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'usb-3.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: Fix race condition when removing host controllers
USB: ohci-at91: fix null pointer in ohci_hcd_at91_overcurrent_irq
Nicolas Dichtel [Thu, 27 Sep 2012 04:11:00 +0000 (04:11 +0000)]
inetpeer: fix token initialization
When jiffies wraps around (for example, 5 minutes after the boot, see
INITIAL_JIFFIES) and peer has just been created, now - peer->rate_last can be
< XRLIM_BURST_FACTOR * timeout, so token is not set to the maximum value, thus
some icmp packets can be unexpectedly dropped.
Fix this case by initializing last_rate to 60 seconds in the past.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Narendra K [Tue, 25 Sep 2012 07:53:19 +0000 (07:53 +0000)]
qlcnic: Fix scheduling while atomic bug
In the device close path, 'qlcnic_fw_destroy_ctx' and
'qlcnic_poll_rsp' call msleep. But 'qlcnic_fw_destroy_ctx' and
'qlcnic_poll_rsp' are called with 'adapter->tx_clean_lock' spin lock
held resulting in scheduling while atomic bug causing the following
trace.
I observed that the commit 012dc19a45b2b9cc2ebd14aaa401cf782c2abba4
from John Fastabend addresses a similar issue in ixgbevf driver.
Adopting the same approach used in the commit, this patch uses mdelay
to address the issue.
Neil Horman [Wed, 26 Sep 2012 07:22:02 +0000 (07:22 +0000)]
bnx2: Clean up remaining iounmap
commit c0357e975afdbbedab5c662d19bef865f02adc17 modified bnx2 to switch from
using ioremap/iounmap to pci_iomap/pci_iounmap. They missed a spot in the error
path of bnx2_init_one though. This patch just cleans that up.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Michael Chan <mcan@broadcom.com> CC: "David S. Miller" <davem@davemloft.net> Acked-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull one more arm-soc bugfix from Olof Johansson:
"Here's a bugfix for orion5x. Without this, PCI doesn't initialize
properly because of too small coherent pool to cover the allocations
needed.
A similar fix has already been done on kirkwood."
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: Orion5x: Fix too small coherent pool.
Merge tag 'gpio-fixes-v3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fix from Linus Walleij:
"A late GPIO fix: Roland Stigge found a problem in the LPC32xx driver
where a callback ignores one of its arguments. It needs to go into
stable too so sending this upstream immediately."
* tag 'gpio-fixes-v3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio-lpc32xx: Fix value handling of gpio_direction_output()
Merge tag 'md-3.6-fixes' of git://neil.brown.name/md
Pull two md bugfixes from NeilBrown:
"One (missing spinlock init) was only introduced recently. The other
has been present as long as raid10 has been supported, so is tagged
for -stable."
* tag 'md-3.6-fixes' of git://neil.brown.name/md:
md/raid10: fix "enough" function for detecting if array is failed.
md/raid5: add missing spin_lock_init.
Pull EDAC fixes from Mauro Carvalho Chehab:
"Three edac fixes at the memory enumeration logic:
- i3200_edac: Fixes a regression at the memory rank size, when the
memorias are dual-rank;
- i5000_edac: Fix a longstanding bug when calculating the memory
size: before Kernel 3.6, the memory size were right only
with one specific configuration;
- sb_edac: Fixes a bug since the initial release of the driver:
with 16GB DIMMs, there's an overflow at the memory size,
causing the number of pages per dimm (an unsigned value)
to have the highest bit equal to 1, effectively mangling
the memory size.
The third bug can potentially affect the error decoding logic as well."
* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
sb_edac: Avoid overflow errors at memory size calculation
i5000: Fix the memory size calculation with 2R memories
i3200_edac: Fix memory rank size
Marek Vasut [Tue, 25 Sep 2012 10:17:42 +0000 (10:17 +0000)]
net: phy: smsc: Implement PHY config_init for LAN87xx
The LAN8710/LAN8720 chips do have broken the "FlexPWR" smart power-saving
capability. Enabling it leads to the PHY not being able to detect Link when
cold-started without cable connected. Thus, make sure this is disabled.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Christian Hohnstaedt <chohnstaedt@innominate.com> Cc: David S. Miller <davem@davemloft.net> Cc: Fabio Estevam <fabio.estevam@freescale.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Otavio Salvador <otavio@ossystems.com.br> Acked-by: Otavio Salvador <otavio@ossystems.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
The userspace part of UML uses the asm-offsets.h generator mechanism to
create definitions for UM_KERN_<LEVEL> that match the in-kernel
KERN_<LEVEL> constant definitions.
As of commit 04d2c8c83d0e3ac5f78aeede51babb3236200112 ("printk: convert
the format for KERN_<LEVEL> to a 2 byte pattern"), KERN_<LEVEL> is no
longer expanded to the literal '"<LEVEL>"', but to '"\001" "LEVEL"', i.e.
it contains two parts.
However, the combo of DEFINE_STR() in
arch/x86/um/shared/sysdep/kernel-offsets.h and sed-y in Kbuild doesn't
support string literals consisting of multiple parts. Hence for all
UM_KERN_<LEVEL> definitions, only the SOH character is retained in the actual
definition, while the remainder ends up in the comment. E.g. in
include/generated/asm-offsets.h we get
#define UM_KERN_INFO "\001" /* "6" KERN_INFO */
instead of
#define UM_KERN_INFO "\001" "6" /* KERN_INFO */
This causes spurious '^A' output in some kernel messages:
Calibrating delay loop... 4640.76 BogoMIPS (lpj=23203840)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 256
^AChecking that host ptys support output SIGIO...Yes
^AChecking that host ptys support SIGIO on close...No, enabling workaround
^AUsing 2.6 host AIO
NET: Registered protocol family 16
bio: create slab <bio-0> at 0
Switching to clocksource itimer
To fix this:
- Move the mapping from UM_KERN_<LEVEL> to KERN_<LEVEL> from
arch/um/include/shared/common-offsets.h to
arch/um/include/shared/user.h, which is preincluded for all userspace
parts,
- Preinclude include/linux/kern_levels.h for all userspace parts, to
obtain the in-kernel KERN_<LEVEL> constant definitions. This doesn't
violate the kernel/userspace separation, as include/linux/kern_levels.h
is self-contained and doesn't expose any other kernel internals.
- Remove the now unused STR() and DEFINE_STR() macros.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Richard Weinberger <richard@nod.at>
commit c1d7e01d (ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION)
forgot UML and broke IPC on it.
Also UML has to select ARCH_WANT_IPC_PARSE_VERSION usin Kconfig.
Reported-and-tested-by: <Toralf Förster toralf.foerster@gmx.de> Signed-off-by: Richard Weinberger <richard@nod.at>
netdev: pasemi: fix return value check in pasemi_mac_phy_init()
In case of error, the function of_phy_connect() returns NULL
pointer not ERR_PTR(). The IS_ERR() test in the return value
check should be replaced with NULL test.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
In case of error, the function genlmsg_put() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check should
be replaced with NULL test.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
In case of error, the function genlmsg_put() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check should
be replaced with NULL test.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
Alan Stern [Wed, 26 Sep 2012 17:09:53 +0000 (13:09 -0400)]
USB: Fix race condition when removing host controllers
This patch (as1607) fixes a race that can occur if a USB host
controller is removed while a process is reading the
/sys/kernel/debug/usb/devices file.
The usb_device_read() routine uses the bus->root_hub pointer to
determine whether or not the root hub is registered. The is not a
valid test, because the pointer is set before the root hub gets
registered and remains set even after the root hub is unregistered and
deallocated. As a result, usb_device_read() or usb_device_dump() can
access freed memory, causing an oops.
The patch changes the test to use the hcd->rh_registered flag, which
does get set and cleared at the appropriate times. It also makes sure
to hold the usb_bus_list_lock mutex while setting the flag, so that
usb_device_read() will become aware of new root hubs as soon as they
are registered.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Don Zickus <dzickus@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 6fffb77c (USB: ohci-at91: fix PIO handling in relation with number of
ports) started setting unused pins to EINVAL. But this exposed a bug in the
ohci_hcd_at91_overcurrent_irq function where the gpio was used without being
checked to see if it is valid.
This patches fixed the issue by adding the gpio valid check.
Signed-off-by: Joachim Eastwood <joachim.eastwood@jotron.com> Cc: stable <stable@vger.kernel.org> # [3.4+] whereever 6fffb77c went Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al Viro [Thu, 20 Sep 2012 13:28:25 +0000 (09:28 -0400)]
um: kill thread->forking
we only use that to tell copy_thread() done by syscall from that
done by kernel_thread(). However, it's easier to do simply by
checking PF_KTHREAD in thread flags.
Merge sys_clone() guts for 32bit and 64bit, while we are at it...
Dave Airlie [Thu, 27 Sep 2012 07:58:53 +0000 (17:58 +1000)]
Merge branch 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes
Another spurious dmesg quitening.
* 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nvc0/fifo: ignore bits in PFIFO_INTR that aren't set in PFIFO_INTR_EN
md/raid10: fix "enough" function for detecting if array is failed.
The 'enough' function is written to work with 'near' arrays only
in that is implicitly assumes that the offset from one 'group' of
devices to the next is the same as the number of copies.
In reality it is the number of 'near' copies.
So change it to make this number explicit.
This bug makes it possible to run arrays without enough drives
present, which is dangerous.
It is appropriate for an -stable kernel, but will almost certainly
need to be modified for some of them.
Cc: stable@vger.kernel.org Reported-by: Jakub Husák <jakub@gooseman.cz> Signed-off-by: NeilBrown <neilb@suse.de>
Andrew Lunn [Mon, 24 Sep 2012 05:54:33 +0000 (07:54 +0200)]
ARM: Orion5x: Fix too small coherent pool.
Some Orion5x devices allocate their coherent buffers from atomic
context. Increase size of atomic coherent pool to make sure such the
allocations won't fail during boot.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Olof Johansson <olof@lixom.net>
Ben Skeggs [Wed, 26 Sep 2012 23:13:43 +0000 (09:13 +1000)]
drm/nvc0/fifo: ignore bits in PFIFO_INTR that aren't set in PFIFO_INTR_EN
PFIFO_INTR = 0x40000000 appears to be a normal case on nvc0/nve0 PFIFO,
the binary driver appears to completely ignore it in its PFIFO interrupt
handler and even masks off the bit (as we do) in PFIFO_INTR_EN at init
time.
The bits still light up in the hardware sometimes though, so lets just
ignore any bits we haven't explicitely requested.
Mike Snitzer [Wed, 26 Sep 2012 22:45:47 +0000 (23:45 +0100)]
dm thin: fix discard support for data devices
The discard limits that get established for a thin-pool or thin device
may be incompatible with the pool's data device. Avoid this by checking
the discard limits of the pool's data device. If an incompatibility is
found then the pool's 'discard passdown' feature is disabled.
Change thin_io_hints to ensure that a thin device always uses the same
queue limits as its pool device.
Introduce requested_pf to track whether or not the table line originally
contained the no_discard_passdown flag and use this directly for table
output. We prepare the correct setting for discard_passdown directly in
bind_control_target (called from pool_io_hints) and store it in
adjusted_pf rather than waiting until we have access to pool->pf in
pool_preresume.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Wed, 26 Sep 2012 22:45:46 +0000 (23:45 +0100)]
dm thin: tidy discard support
A little thin discard code refactoring to make the next patch (dm thin:
fix discard support for data devices) more readable.
Pull out a couple of functions (and uses bools instead of unsigned for
features).
No functional changes.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Wed, 26 Sep 2012 22:45:45 +0000 (23:45 +0100)]
dm: retain table limits when swapping to new table with no devices
Add a safety net that will re-use the DM device's existing limits in the
event that DM device has a temporary table that doesn't have any
component devices. This is to reduce the chance that requests not
respecting the hardware limits will reach the device.
DM recalculates queue limits based only on devices which currently exist
in the table. This creates a problem in the event all devices are
temporarily removed such as all paths being lost in multipath. DM will
reset the limits to the maximum permissible, which can then assemble
requests which exceed the limits of the paths when the paths are
restored. The request will fail the blk_rq_check_limits() test when
sent to a path with lower limits, and will be retried without end by
multipath. This became a much bigger issue after v3.6 commit fe86cdcef
("block: do not artificially constrain max_sectors for stacking
drivers").
Reported-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Milan Broz [Wed, 26 Sep 2012 22:45:43 +0000 (23:45 +0100)]
dm table: clear add_random unless all devices have it set
Always clear QUEUE_FLAG_ADD_RANDOM if any underlying device does not
have it set. Otherwise devices with predictable characteristics may
contribute entropy.
QUEUE_FLAG_ADD_RANDOM specifies whether or not queue IO timings
contribute to the random pool.
For bio-based targets this flag is always 0 because such devices have no
real queue.
For request-based devices this flag was always set to 1 by default.
Now set it according to the flags on underlying devices. If there is at
least one device which should not contribute, set the flag to zero: If a
device, such as fast SSD storage, is not suitable for supplying entropy,
a request-based queue stacked over it will not be either.
Because the checking logic is exactly same as for the rotational flag,
share the iteration function with device_is_nonrot().
Signed-off-by: Milan Broz <mbroz@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Wed, 26 Sep 2012 22:45:42 +0000 (23:45 +0100)]
dm: handle requests beyond end of device instead of using BUG_ON
The access beyond the end of device BUG_ON that was introduced to
dm_request_fn via commit 29e4013de7ad950280e4b2208 ("dm: implement
REQ_FLUSH/FUA support for request-based dm") was an overly
drastic (but simple) response to this situation.
I have received a report that this BUG_ON was hit and now think
it would be better to use dm_kill_unmapped_request() to fail the clone
and original request with -EIO.
map_request() will assign the valid target returned by
dm_table_find_target to tio->ti. But when the target
isn't valid tio->ti is never assigned (because map_request isn't
called); so add a check for tio->ti != NULL to dm_done().
Reported-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: stable@vger.kernel.org # v2.6.37+ Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Wed, 26 Sep 2012 22:45:41 +0000 (23:45 +0100)]
dm mpath: only retry ioctl when no paths if queue_if_no_path set
When there are no paths and multipath receives an ioctl, it waits until
a path becomes available. This behaviour is incorrect if the
"queue_if_no_path" setting was not specified, as then the ioctl should
be rejected immediately, which this patch now does.
commit 35991652b ("dm mpath: allow ioctls to trigger pg init") should
have checked if queue_if_no_path was configured before queueing IO.
Checking for the queue_if_no_path feature, like is done in map_io(),
allows the following table load to work without blocking in the
multipath_ioctl retry loop:
Mike Snitzer [Wed, 26 Sep 2012 22:45:39 +0000 (23:45 +0100)]
dm thin: do not set discard_zeroes_data
The dm thin pool target claims to support the zeroing of discarded
data areas. This turns out to be incorrect when processing discards
that do not exactly cover a complete number of blocks, so the target
must always set discard_zeroes_data_unsupported.
The thin pool target will zero blocks when they are allocated if the
skip_block_zeroing feature is not specified. The block layer
may send a discard that only partly covers a block. If a thin pool
block is partially discarded then there is no guarantee that the
discarded data will get zeroed before it is accessed again.
Due to this, thin devices cannot claim discards will always zero data.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org # 3.4+ Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming
Pull c6x arch fixes from Mark Salter:
- Add __NR_kcmp to generic syscall list
- C6X: Use generic asm/barrier.h
* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
syscalls: add __NR_kcmp syscall to generic unistd.h
c6x: use asm-generic/barrier.h
Mark Salter [Mon, 24 Sep 2012 19:19:26 +0000 (15:19 -0400)]
syscalls: add __NR_kcmp syscall to generic unistd.h
Commit d97b46a64 ("syscalls, x86: add __NR_kcmp syscall" ) added a new
syscall to support checkpoint restore. It is currently x86-only, but
that restriction will be removed in a subsequent patch. Unfortunately,
the kernel checksyscalls script had a bug which suppressed any warning
to other architectures that the kcmp syscall was not implemented. A
patch to checksyscalls is being tested in linux-next and other
architectures are seeing warnings about kcmp being unimplemented.
This patch adds __NR_kcmp to <asm-generic/unistd.h> so that kcmp is
wired in for architectures using the generic syscall list.
Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de>
Dan Carpenter [Sun, 23 Sep 2012 16:33:55 +0000 (19:33 +0300)]
vmwgfx: corruption in vmw_event_fence_action_create()
We don't allocate enough data for this struct. As soon as we start
modifying event->event on the next lines, then we're going beyond the
end of the memory we allocated.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@gmail.com>
Dave Airlie [Wed, 26 Sep 2012 08:36:55 +0000 (18:36 +1000)]
Merge branch 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes
These just silence some printks that we are seeing that we shouldn't
* 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nvc0/ltcg: mask off intr 0x10
drm/nouveau: silence a debug message triggered by newer userspace
1) Eric Dumazet discovered and fixed what turned out to be a family of
bugs. These functions were using pskb_may_pull() which might need
to reallocate the linear SKB data buffer, but the callers were not
expecting this possibility. The callers have cached pointers to the
packet header areas, and would need to reload them if we were to
continue using pskb_may_pull().
So they could end up reading garbage.
It's easier to just change these RAW4/RAW6/MIP6 routines to use
skb_header_pointer() instead of pskb_may_pull(), which won't modify
the linear SKB data area.
2) Dave Jone's syscall spammer caught a case where a non-TCP socket can
call down into the TCP keepalive code. The case basically involves
creating a raw socket with sk_protocol == IPPROTO_TCP, then calling
setsockopt(sock_fd, SO_KEEPALIVE, ...)
Fixed by Eric Dumazet.
3) Bluetooth devices do not get configured properly while being powered
on, resulting in always using legacy pairing instead of SSP. Fix
from Andrzej Kaczmarek.
4) Bluetooth cancels delayed work erroneously, put stricter checks in
place. From Andrei Emeltchenko.
5) Fix deadlock between cfg80211_mutex and reg_regdb_search_mutex in
cfg80211, from Luis R. Rodriguez.
6) Fix interrupt double release in iwlwifi, from Emmanuel Grumbach.
7) Missing module license in bcm87xx driver, from Peter Huewe.
8) Team driver can lose port changed events when adding devices to a
team, fix from Jiri Pirko.
9) Fix endless loop when trying ot unregister PPPOE device in zombie
state, from Xiaodong Xu.
10) batman-adv layer needs to set MAC address of software device
earlier, otherwise we call tt_local_add with it uninitialized.
11) Fix handling of KSZ8021 PHYs, it's matched currently by KS8051 but
that doesn't program the device properly. From Marek Vasut.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
ipv6: mip6: fix mip6_mh_filter()
ipv6: raw: fix icmpv6_filter()
net: guard tcp_set_keepalive() to tcp sockets
phy/micrel: Add missing header to micrel_phy.h
phy/micrel: Rename KS80xx to KSZ80xx
phy/micrel: Implement support for KSZ8021
batman-adv: Fix symmetry check / route flapping in multi interface setups
batman-adv: Fix change mac address of soft iface.
pppoe: drop PPPOX_ZOMBIEs in pppoe_release
team: send port changed when added
ipv4: raw: fix icmp_filter()
net/phy/bcm87xx: Add MODULE_LICENSE("GPL") to GPL driver
iwlwifi: don't double free the interrupt in failure path
cfg80211: fix possible circular lock on reg_regdb_search()
Bluetooth: Fix not removing power_off delayed work
Bluetooth: Fix freeing uninitialized delayed works
Bluetooth: mgmt: Fix enabling LE while powered off
Bluetooth: mgmt: Fix enabling SSP while powered off
Merge misc fixes from Andrew Morton:
"One maintainer change and three bugfixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (4 commits)
c/r: prctl: fix build error for no-MMU case
lib/flex_proportions.c: fix corruption of denominator in flexible proportions
checksyscalls: fix "here document" handling
pwm-backlight: take over maintenance
Mark Salter [Tue, 25 Sep 2012 00:17:38 +0000 (17:17 -0700)]
c/r: prctl: fix build error for no-MMU case
Commit 1ad75b9e1628 ("c/r: prctl: add minimal address test to
PR_SET_MM") added some address checking to prctl_set_mm() used by
checkpoint-restore. This causes a build error for no-MMU systems:
kernel/sys.c: In function 'prctl_set_mm':
kernel/sys.c:1868:34: error: 'mmap_min_addr' undeclared (first use in this function)
The test for mmap_min_addr doesn't make a lot of sense for no-MMU code
as noted in commit 6e1415467614 ("NOMMU: Optimise away the
{dac_,}mmap_min_addr tests").
This patch defines mmap_min_addr as 0UL in the no-MMU case so that the
compiler will optimize away tests for "addr < mmap_min_addr".
Signed-off-by: Mark Salter <msalter@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: <stable@vger.kernel.org> [3.6.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Tue, 25 Sep 2012 00:17:35 +0000 (17:17 -0700)]
lib/flex_proportions.c: fix corruption of denominator in flexible proportions
When racing with CPU hotplug, percpu_counter_sum() can return negative
values for the number of observed events.
This confuses fprop_new_period(), which uses unsigned type and as a
result number of events is set to big *positive* number. From that
moment on, things go pear shaped and can result e.g. in division by
zero as denominator is later truncated to 32-bits.
This bug causes a divide-by-zero oops in bdi_dirty_limit() in Borislav's
3.6.0-rc6 based kernel.
Fix the issue by using a signed type in fprop_new_period(). That makes
us bail out from the function without doing anything (mistakenly)
thinking there are no events to age. That makes aging somewhat
inaccurate but getting accurate data would be rather hard.
Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Borislav Petkov <bp@amd64.org> Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"echo" doesn't read from stdin, therefore the checksyscalls script didn't
warn about not implemented system calls anymore since 29dc54c6
("checksyscalls: Use arch/x86/syscalls/syscall_32.tbl as source").
Use "cat" instead of "echo" which handles this correctly.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michal Marek <mmarek@suse.cz> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sb_edac: Avoid overflow errors at memory size calculation
Sandy bridge EDAC is calculating the memory size with overflow.
Basically, the size field and the integer calculation is using 32 bits.
More bits are needed, when the DIMM memories have high density.
The net result is that memories are improperly reported there, when
high-density DIMMs are used:
EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 0, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800
EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 1, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800
As the number of pages value is handled at the EDAC core as unsigned
ints, the driver shows the 16 GB memories at sysfs interface as 16760832
MB! The fix is simple: calculate the number of pages as unsigned 64-bits
integer.
After the patch, the memory size (16 GB) is properly detected:
EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 0, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800
EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 1, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800
Paul Mundt [Tue, 25 Sep 2012 02:51:05 +0000 (11:51 +0900)]
sh: pfc: Fix up GPIO mux type reconfig case.
Some drivers need to switch pin states between GPIO and pin function at
runtime, which was inadvertently broken in the pinctrl driver for GPIOs
being bound to a specific direction.
This fixes up the request path to ensure that previously configured GPIOs
don't cause us to inadvertently error out with an unsupported mux on
reconfig, which in practice is primarily aimed at trapping pull-up/down
users that have yet to be implemented under the new API.
Fixes up regressions in the TPU PWM driver, amongst others.
David S. Miller [Tue, 25 Sep 2012 02:00:00 +0000 (22:00 -0400)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:
====================
Please pull this last(?) batch of fixes intended for 3.6...
For the Bluetooth bits, Gustavo says this:
"Here goes probably my last update to 3.6. It includes the two patches
you were ok last week(from Andrzej Kaczmarek), those are critical
ones, and two other fixes one for a system crash and the other for
a missing lockdep annotation."
The referenced fixes from Andrzej prevent attempts to configure devices
that are powered-off.
Along with the Bluetooth fixes, there are a couple of 802.11 fixes.
Emmanuel Grumbach gives us an iwlwifi fix to prevent releasing an
interrupt twice. Luis R. Rodriguez provides a fix for a possible
circular lock dependency in the cfg80211 regulatory enforcement code.
All of these have been in linux-next for a few days. I hope they are
not too late to make the 3.6 release!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull tile gxio ABI fix from Chris Metcalf:
"This fixes a last-minute change in the Tilera hypervisor ABI for TRIO
(PCI root complex) support. We've locked in this ABI going forward
and will make sure no further ABI changes like this occur."
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: gxio iorpc numbering change for TRIO interface
Merge tag 'stable/for-linus-3.6-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull a Xen fix from Konrad Rzeszutek Wilk:
"It is a bug-fix when we run the initial PV guest on a AMD K8 machine
and have CONFIG_AMD_NUMA enabled and detect the NUMA topology from the
Northbridge.
We end up in the situation where the initial domain gets too much
information and gets confused and crashes - the fix is to restrict the
domain to get the information - and we do it by just disabling NUMA on
the PV guest (the hypervisor is still able to do its proper NUMA
allocations of guests).
It is OK to disable the PV guest from accessing NUMA data as right now
we do not inject any NUMA node information to the PV guests. When we
do get to that point, then this patch will have to be reverted."
* Disable PV NUMA support as we do not do anything with it (yet) and it
can cause bootup crashes on certain AMD machines.
* tag 'stable/for-linus-3.6-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/boot: Disable NUMA for PV guests.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull two ceph fixes from Sage Weil:
"The first fixes a leak in the rbd setup error path, and the second
fixes a more serious problem with mismatched kmap/kunmap that surfaced
after the recent refactoring work."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: only kunmap kmapped pages
rbd: drop dev reference on error in rbd_open()
Roland Stigge [Thu, 20 Sep 2012 08:48:03 +0000 (10:48 +0200)]
gpio-lpc32xx: Fix value handling of gpio_direction_output()
For GPIOs of gpio-lpc32xx, gpio_direction_output() ignores the value argument
(initial value of output). This patch fixes this by setting the level
accordingly.
Cc: stable@kernel.org Signed-off-by: Roland Stigge <stigge@antcom.de> Acked-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Marek Vasut [Sun, 23 Sep 2012 16:58:51 +0000 (16:58 +0000)]
phy/micrel: Add missing header to micrel_phy.h
The license header was missing in micrel_phy.h . This patch adds
one.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: David J. Choi <david.choi@micrel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Sun, 23 Sep 2012 16:58:50 +0000 (16:58 +0000)]
phy/micrel: Rename KS80xx to KSZ80xx
There is no such part as KS8001, KS8041 or KS8051. There are only
KSZ8001, KSZ8041 and KSZ8051. Rename these parts as such to match
the Micrel naming.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: David J. Choi <david.choi@micrel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Cc: Linux ARM kernel <linux-arm-kernel@lists.infradead.org> Cc: Fabio Estevam <fabio.estevam@freescale.com> Cc: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Sun, 23 Sep 2012 16:58:49 +0000 (16:58 +0000)]
phy/micrel: Implement support for KSZ8021
The KSZ8021 PHY was previously caught by KS8051, which is not correct.
This PHY needs additional setup if it is strapped for address 0. In such
case an reserved bit must be written in the 0x16, "Operation Mode Strap
Override" register. According to the KS8051 datasheet, that bit means
"PHY Address 0 in non-broadcast" and it indeed behaves as such on KSZ8021.
The issue where the ethernet controller (Freescale FEC) did not communicate
with network is fixed by writing this bit as 1.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: David J. Choi <david.choi@micrel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Metcalf [Mon, 24 Sep 2012 18:57:58 +0000 (14:57 -0400)]
tile: gxio iorpc numbering change for TRIO interface
An ABI numbering change was made in the hypervisor for Tilera's 4.1
MDE release (just shipped). It's incompatible with the previous 4.0
release ABI numbering, so we track the new numbering going forward.
We plan to avoid modifying ABI numbering for these interfaces again.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Mark Salter [Fri, 21 Sep 2012 18:35:49 +0000 (14:35 -0400)]
c6x: use asm-generic/barrier.h
A recent patch in the linux-next tree caused a build failure on
C6X because C6X didn't define a read_barrier_depends() macro. C6X
does not support SMP and the architecture doesn't provide any
special memory ordering instructions, so it makes sense to just
use the generic barrier.h rather than patching the existing c6x
specific header.
The hypervisor is in charge of allocating the proper "NUMA" memory
and dealing with the CPU scheduler to keep them bound to the proper
NUMA node. The PV guests (and PVHVM) have no inkling of where they
run and do not need to know that right now. In the future we will
need to inject NUMA configuration data (if a guest spans two or more
NUMA nodes) so that the kernel can make the right choices. But those
patches are not yet present.
In the meantime, disable the NUMA capability in the PV guest, which
also fixes a bootup issue. Andre says:
"we see Dom0 crashes due to the kernel detecting the NUMA topology not
by ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
This will detect the actual NUMA config of the physical machine, but
will crash about the mismatch with Dom0's virtual memory. Variation of
the theme: Dom0 sees what it's not supposed to see.
This happens with the said config option enabled and on a machine where
this scanning is still enabled (K8 and Fam10h, not Bulldozer class)
so we just disable NUMA scanning by setting numa_off=1.
CC: stable@vger.kernel.org Reported-and-Tested-by: Andre Przywara <andre.przywara@amd.com> Acked-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
GFS2: Write out dirty inode metadata in delayed deletes
If a dirty GFS2 inode was being deleted but was in use by another node, its
metadata was not getting written out before GFS2 checked for dirty buffers in
gfs2_ail_flush(). GFS2 was relying on inode_go_sync() to write out the
metadata when the other node tried to free the file, but it failed the error
check before it got that far. This patch writes out the metadata before calling
gfs2_ail_flush()
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Eric Sandeen [Tue, 18 Sep 2012 02:50:31 +0000 (21:50 -0500)]
GFS2: fix s_writers.counter imbalance in gfs2_ail_empty_gl
gfs2_ail_empty_gl() contains an "inline version" of gfs2_trans_begin(),
so it needs an explicit sb_start_intwrite() as well, to balance the
sb_end_intwrite() which will be called by gfs2_trans_end().
With this, xfstest 068 passes on lock_nolock local gfs2.
Without it, we reach a writer count of -1 and get stuck.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Wed, 12 Sep 2012 13:40:31 +0000 (09:40 -0400)]
GFS2: Fix infinite loop in rbm_find
This patch fixes an infinite loop in gfs2_rbm_find that was introduced
by the previous patch. The problem occurred when the length was less
than 3 but the rbm block was byte-aligned, causing it to improperly
return a extent length of zero, which caused it to spin.
Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com> Tested-by: Barry Marson <bmarson@redhat.com>
With the recently added block reservation code, an additional function
was added to search for free blocks. This had a restriction of only being
able to search for aligned extents of free blocks. As a result the
allocation patterns when reserving blocks were suboptimal when the
existing allocation of blocks for an inode was not aligned to the same
boundary.
This patch resolves that problem by adding the ability for gfs2_rbm_find
to search for extents of a particular minimum size. We can then use
gfs2_rbm_find for both looking for reservations, and also looking for
free blocks on an individual basis when we actually come to do the
allocation later on. As a result we only need a single set of code
to deal with both situations.
The function gfs2_rbm_from_block() is moved up rgrp.c so that it
occurs before all of its callers.
Many thanks are due to Bob for helping track down the final issue in
this patch. That fix to the rb_tree traversal and to not share
block reservations from a dirctory to its children is included here.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Jan Kara [Wed, 5 Sep 2012 20:55:11 +0000 (16:55 -0400)]
GFS2: Get rid of I_MUTEX_QUOTA usage
GFS2 uses i_mutex on its system quota inode to synchronize writes to
quota file. Since this is an internal inode to GFS2 (not part of directory
hiearchy or visible by user) we are safe to define locking rules for it. So
let's just get it its own locking class to make it clear.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Tue, 28 Aug 2012 12:45:56 +0000 (08:45 -0400)]
GFS2: Stop block extents at the end of bitmaps
This patch stops multiple block allocations if a nonzero
return code is received from gfs2_rbm_from_block. Without
this patch, if enough pressure is put on the file system,
you get a kernel warning quickly followed by:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa04f47e8>] gfs2_alloc_blocks+0x2c8/0x880 [gfs2]
With this patch, things run normally.
Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
GFS2: Fix unclaimed_blocks() wrapping bug and clean up
When rgd->rd_free_clone is less than rgd->rd_reserved, the
unclaimed_blocks() calculation would wrap and produce
incorrect results. This patch checks for this condition
when this function is called from gfs2_mblk_search()
In addition, the use of this particular function in other
places in the code has been dropped by means of a general
clean up of gfs2_inplace_reserve(). This function is now
much easier to follow.
Also the setting of the rgd->rd_last_alloc field is corrected.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch improves the tracing of block reservations by
removing some corner cases and also providing more useful
detail in the traces.
A new field is added to the reservation structure to contain
the inode number. This is used since in certain contexts it is
not possible to access the inode itself to obtain this information.
As a result we can then display the inode number for all tracepoints
and also in case we dump the resource group.
The "del" tracepoint operation has been removed. This could be called
with the reservation rgrp set to NULL. That resulted in not printing
the device number, and thus making the information largely useless
anyway. Also, the conditional on the rgrp being NULL can then be
removed from the tracepoint. After this change, all the block
reservation tracepoint calls will be called with the rgrp information.
The existing ins,clm and tdel calls to the block reservation tracepoint
are sufficient to track the entire life of the block reservation.
In gfs2_block_alloc() the error detection is updated to print out
the inode number of the problematic inode. This can then be compared
against the information in the glock dump,tracepoints, etc.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
GFS2: Fall back to ignoring reservations, if there are no other blocks left
When we get to the stage of allocating blocks, we know that the
resource group in question must contain enough free blocks, otherwise
gfs2_inplace_reserve() would have failed. So if we are left with only
free blocks which are reserved, then we must use those. This can happen
if another node has sneeked in and use some blocks reserved on this
node, for example. Generally this will happen very rarely and only
when the resouce group is nearly full.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>