Mike Christie [Mon, 5 Dec 2011 22:44:01 +0000 (16:44 -0600)]
[SCSI] iscsi class: export pid of process that created
There could be multiple userspace entities creating/destroying/
recoverying sessions and also the kernel's iscsi drivers could
be doing this too. If the userspace apps do try to manage the kernel
ones it can get the driver/fw out of sync and cause the user to
loose the root disk, oopses or ping ponging becasue userspace
wants to do one thing but the kernel manager thought we
are trying to do another.
This patch fixes the problem by just exporting the pid of
the entity that created the session. Userspace programs like
iscsid, iscsiadm, iscsistart, qlogic's tools, etc, can then
figure out which sessions they own and only manage them.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Commit 921cd8024b90 ("[SCSI] mpt2sas: New feature - Fast Load
Support") moved handling of the diag_buffer_enable module parameter
from mpt2sas_base.c to mpt2sas_scsih.c, but it left an old copy of the
parameter in mpt2sas_base.c. Remove the unused stub.
Signed-off-by: Roland Dreier <roland@purestorage.com> Acked-by: "Nandigama, Nagalakshmi" <Nagalakshmi.Nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
However, on a big machine, ioc->cpu_count could be outside the range
that fits in a u8; eg a system with 256 CPUs will end up
reply_queue_count set to 0.
Fix this by calculating the minimum as ints and then letting the
assignment to reply_queue_count handle integer demotion.
Signed-off-by: Roland Dreier <roland@purestorage.com> Acked-by: "Nandigama, Nagalakshmi" <Nagalakshmi.Nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Roland Dreier [Thu, 1 Dec 2011 01:14:22 +0000 (17:14 -0800)]
[SCSI] mpt2sas: Fix leak on mpt2sas_base_attach() error path
Commit 911ae9434f83 ("[SCSI] mpt2sas: Added NUNA IO support in driver
which uses multi-reply queue support of the HBA") added new
allocations to the beginning of mpt2sas_base_attach(), which means
directly returning an error on failure of mpt2sas_base_map_resources()
will leak those allocations.
Fix this by doing "goto out_free_resources" in this place too, as the
rest of the function does.
Signed-off-by: Roland Dreier <roland@purestorage.com> Acked-by: "Nandigama, Nagalakshmi" <Nagalakshmi.Nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[SCSI] mpt2sas : Fix for memory allocation error for large host credits
The amount of memory required for tracking chain buffers is rather
large, and when the host credit count is big, memory allocation
failure occurs inside __get_free_pages.
The fix is to limit the number of chains to 100,000. In addition,
the number of host credits is limited to 30,000 IOs. However this
limitation can be overridden this using the command line option
max_queue_depth. The algorithm for calculating the
reply_post_queue_depth is changed so that it is equal to
(reply_free_queue_depth + 16), previously it was (reply_free_queue_depth * 2).
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com> Cc: stable@kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[SCSI] mpt2sas: Do not retry a timed out direct IO for warpdrive
When an I/O request to a WarpDrive is timed out by SML and if the
I/O request to the WarpDrive is sent as direct I/O then the aborted
direct I/O will be retried as normal Volume I/O and which results
in failure of Target Reset and results in host reset.
The fix is to not retry a failed IO to volume when the original
IO was sent as direct IO with an ioc status
MPI2_IOCSTATUS_SCSI_TASK_TERMINATED.
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[SCSI] mpt2sas: Release spinlock for the raid device list before blocking it
Added code to release the spinlock that is used to protect the
raid device list before calling a function that can block. The
blocking was causing a reschedule, and subsequently it is tried
to acquire the same lock, resulting in a panic (NMI Watchdog
detecting a CPU lockup).
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com> Cc: stable@kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>
1)Removed Power Management Control option for PCIe link.
2)Added RAID Action for performing a compatibility check. Added
product-specific range to RAID Action values.
3)Added PhysicalPort field to SAS Device Status Change Event data.
4)Added SpinupFlags field containing a Disable Spin-up bit to the
SpinupGroupParameters fields of SAS IO Unit Page 4.
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[SCSI] mpt2sas: Do not set sas_device->starget to NULL from the slave_destroy callback when all the LUNS have been deleted
If the sas_device->starget to NULL from slave_destroy callback for LUN=1
even though LUN=0 exist, results in entire target getting deleted.
To resolve the issue, the driver should only set sas_device->starget to
NULL when all the LUNS have been deleted from the slave_destroy.
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[SCSI] mpt2sas: Better handling DEAD IOC (PCI-E LInk down) error condition
Detection of Dead IOC has been done in fault_reset_work thread.
If IOC Doorbell is 0xFFFFFFFF, it will be detected as non-operation/DEAD IOC.
When a DEAD IOC is detected, the code is modified to remove that IOC and
all its attached devices from OS.
The PCI layer API pci_remove_bus_device() is called to remove the dead IOC.
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Carpenter [Mon, 26 Sep 2011 06:23:37 +0000 (09:23 +0300)]
[SCSI] be2iscsi: cleanup a min_t() call
"sense_len" was declared as int type but actually it only stores a
u16 value that comes from hardware. The cast to u16 in min_t()
confuses static analysis because it truncates the int to u16 so I've
fixed the declaration to reflect that "sense_len" is just a u16.
Also there was a call to cpu_to_be16() which I've changed to
be16_to_cpu(). The functions are equivalent, but obviously the
hardware is big endian and we're doing the min_t() comparison on CPU
endian values.
This whole patch is just a cleanup and doesn't affect how the code
works.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Jayamohan Kallickal <Jayamohan.kallickal@emulex.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[SCSI] hpsa: Add IRQF_SHARED back in for the non-MSI(X) interrupt handler
IRQF_SHARED is required for older controllers that don't support MSI(X)
and which may end up sharing an interrupt. All the controllers hpsa
normally supports have MSI(X) capability, but older controllers may be
encountered via the hpsa_allow_any=1 module parameter.
Also remove deprecated IRQF_DISABLED.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Ferenc Wagner [Thu, 25 Aug 2011 12:44:57 +0000 (14:44 +0200)]
[SCSI] fusion: ensure NUL-termination of MptCallbacksName elements
Using strlcpy instead of memcpy makes the strlen() calls superfluous
and also ensures zero-termination of the copied string.
At the same time get rid of the magic constant 50.
Actually, I can't see why copying the callback name is necessary
instead of simply storing a pointer to it (just like to the callback
function itself), but that goes beyond the scope of this fix.
Signed-off-by: Ferenc Wagner <wferi@niif.hu> Acked-by: "Nandigama, Nagalakshmi" <Nagalakshmi.Nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Andrew Vasquez [Fri, 18 Nov 2011 17:03:20 +0000 (09:03 -0800)]
[SCSI] qla2xxx: Ensure there's enough request-queue space for passthru IOCBs.
The driver should ensure there's a sufficient number of IOCBs
to satisfy the number of scatter-gather entries specified in the
command. Add a 'count' to the control structure, srb_ctx, to use
in qla2x00_alloc_iocbs().
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Joe Carnuccio [Fri, 18 Nov 2011 17:03:13 +0000 (09:03 -0800)]
[SCSI] qla2xxx: Corrections to returned sysfs error codes.
Correct the erroneous return codes introduced by the following patch:
"Return sysfs error codes appropriate to conditions".
Signed-off-by: Joe Carnuccio <joe.carnuccio@qlogic.com> Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Andrew Vasquez [Fri, 18 Nov 2011 17:03:10 +0000 (09:03 -0800)]
[SCSI] qla2xxx: Limit excessive DPC cycles.
The 'continue' cases neglected to place the thread in an
interruptible state, causing the DPC routine to wake immediately.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Andrew Vasquez [Fri, 18 Nov 2011 17:03:09 +0000 (09:03 -0800)]
[SCSI] qla2xxx: Only read requested mailbox registers.
When reading the incoming mailbox registers, read only the specified ones.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[SCSI] qla2xxx: Proper cleanup of pass through commands when firmware returns error.
[jejb: fixed up checkpatch and casting errors] Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Chad Dupuis [Fri, 18 Nov 2011 17:03:07 +0000 (09:03 -0800)]
[SCSI] qla2xxx: Update to dynamic logging.
This patch contains minor fixes to our new logging infrastructure:
- Remove extranous messages.
- Re-add 'nexus' and 'hdl' information.
- Adjusted the message ids to fill up the holes.
- Display FCP_CMND priority on update.
- Log only mail box error conditions.
- Do not print "Firmware ready **** FAILED ****" if cable is unplugged.
- Drop noisy 'fw_state...curr time...' message.
- Correct nexus display during abort.
- Add a special case error-logging set to '1'.
- Catagorize I/O exception display handling.
- Correct the bsg msg code printing.
- Dont use dynamic logging after host is removed.
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Joe Perches [Fri, 18 Nov 2011 17:03:05 +0000 (09:03 -0800)]
[SCSI] qla2xxx: Use less stack to emit logging messages.
Return early when logging level too low.
Use normal kernel codeing style for function braces.
Add const where appropriate to logging functions.
Remove now unused #define QL_DBG_BUF_LEN.
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com> Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Jing Huang [Wed, 16 Nov 2011 20:29:26 +0000 (12:29 -0800)]
[SCSI] bfa: fix endian and bit field check bug
Fix some endian issue. __BIGENDIAN is not defined and it needs to be
replaced with __BIG_ENDIAN. Also fixed a bug in bit field access.
These two issues were reported by Dan Carpenter.
Signed-off-by: Jing Huang <huangj@brocade.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Moger, Babu [Thu, 1 Dec 2011 20:01:40 +0000 (15:01 -0500)]
[SCSI] scsi_dh_rdac: Adding the match function for rdac device handler
This patch introduces the match function for rdac device handler. Without
this, sometimes handler attach fails during the device_add. Included check for
TPGS bit before proceeding further. The match function was introduced by
commit 6c3633d08acf514e2e89aa95d2346ce9d64d719a
Signed-off-by: Babu Moger <babu.moger@netapp.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Moger, Babu [Thu, 1 Dec 2011 20:01:28 +0000 (15:01 -0500)]
[SCSI] scsi_dh_hp_sw: Adding the match function for hp_sw device handler
This patch introduces the match function for hp_sw device handler. Included
the check for TPGS bit before proceeding further per Hannes comment. The
match function was introduced by commit 6c3633d08acf514e2e89aa95d2346ce9d64d719a
Signed-off-by: Babu Moger <babu.moger@netapp.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Moger, Babu [Thu, 1 Dec 2011 20:01:16 +0000 (15:01 -0500)]
[SCSI] scsi_dh_emc: Add a match function for emc device handler
This patch introduces the match function for emc device handler. Included
check for TPGS bit before proceeding further. The match function was
introduced by commit 6c3633d08acf514e2e89aa95d2346ce9d64d719a
Signed-off-by: Babu Moger <babu.moger@netapp.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Eddie Wai [Wed, 7 Dec 2011 06:41:21 +0000 (22:41 -0800)]
[SCSI] bnx2i: Fixed kernel panic caused by unprotected task->sc->request deref
During session recovery, the conn_stop call will trigger a flush
to all outstanding SCSI cmds in the xmit queue. This will set
all outstanding task->sc to NULL prior to the session_teardown
call which frees the task memory.
In the bnx2i SCSI response processing path, only the task was being checked
for NULL under the session lock before the task->sc->request dereferencing.
If there are outstanding SCSI cmd responses pending for process, the
following kernel panic can be exposed where task->sc was found to be NULL.
Tomas Henzl [Fri, 2 Dec 2011 03:38:42 +0000 (21:38 -0600)]
[SCSI] qla4xxx: a small loop fix
When the qla4xxx_get_fwddb_entry returns QLA_ERROR
the nex_idx is not updated,
for (idx = 0; idx < max_ddbs; idx = next_idx) {
ret = qla4xxx_get_fwddb_entry(ha, idx, NULL, 0, NULL,
&next_idx, &state, &conn_err,
NULL, NULL);
if (ret == QLA_ERROR)
continue;
This means there is a risk that the 'idx < max_ddbs' condition will never
met and the loop will loop forever.
Fix this by explicitly increasing the next_idx in the error condition.
Maybe a break instead of continue is more appropriate, leaving the decision
on the qlogic maintainer.
Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Mike Christie [Fri, 2 Dec 2011 03:38:41 +0000 (21:38 -0600)]
[SCSI] qla4xxx: fix flash/ddb support
With open-iscsi support, target entries persisted in the FLASH were not
login. Added support in the qla4xxx driver to do the login on probe
time to the target entries saved in the FLASH by user.
With this changes upgrade to the new kernel with open-iscsi support in
qla4xxx will ensure users original target entries login on driver load
Signed-off-by: Manish Rangankar <manish.rangankar@qlogic.com> Signed-off-by: Ravi Anand <ravi.anand@qlogic.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Steffen Maier [Fri, 18 Nov 2011 19:00:40 +0000 (20:00 +0100)]
[SCSI] zfcp: return early from slave_destroy if slave_alloc returned early
zfcp_scsi_slave_destroy erroneously always tried to finish its task
even if the corresponding previous zfcp_scsi_slave_alloc returned
early. This can lead to kernel page faults on accessing uninitialized
fields of struct zfcp_scsi_dev in zfcp_erp_lun_shutdown_wait. Take the
port field of the struct to determine if slave_alloc returned early.
This zfcp bug is exposed by 4e6c82b (in turn fixing f7c9c6b to be
compatible with 21208ae) which can call slave_destroy for a
corresponding previous slave_alloc that did not finish.
This patch is based on James Bottomley's fix suggestion in
http://www.spinics.net/lists/linux-scsi/msg55449.html.
Thomas Gleixner [Fri, 11 Nov 2011 19:52:01 +0000 (20:52 +0100)]
[SCSI] fcoe: Fix preempt count leak in fcoe_filter_frames()
The error exit path leaks preempt count. Add the missing put_cpu().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Yi Zou <yi.zou@intel.com> Cc: stable@kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Andrew Vasquez [Fri, 18 Nov 2011 17:02:15 +0000 (09:02 -0800)]
[SCSI] qla2xxx: Return the correct value for a mailbox command if 82xx is in reset recovery.
We need to return QLA_FUNCTION_TIMEOUT immediately otherwise we mess up the
mailbox command state machine.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This function can wait for 5min under certain scenarios. One of them is when
the port is down from switch and bus reset is issued. The bus reset used to
wait for 5 minutes for the loop and upper layer callers used to hang and give
stack trace because of getting stuck for 120 sec. It is legacy code that was
used when the driver used to do queuing of the commands.
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com> Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Linus Torvalds [Fri, 9 Dec 2011 22:45:44 +0000 (14:45 -0800)]
Merge git://git.samba.org/sfrench/cifs-2.6
* git://git.samba.org/sfrench/cifs-2.6:
cifs: check for NULL last_entry before calling cifs_save_resume_key
cifs: attempt to freeze while looping on a receive attempt
cifs: Fix sparse warning when calling cifs_strtoUCS
CIFS: Add descriptions to the brlock cache functions
Linus Torvalds [Fri, 9 Dec 2011 22:45:12 +0000 (14:45 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, efi: Calling __pa() with an ioremap()ed address is invalid
x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked
x86/intel_mid: Kconfig select fix
x86/intel_mid: Fix the Kconfig for MID selection
Linus Torvalds [Fri, 9 Dec 2011 22:41:50 +0000 (14:41 -0800)]
Merge branch 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6
* 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6:
spi/gpio: fix section mismatch warning
spi/fsl-espi: disable CONFIG_SPI_FSL_ESPI=m build
spi/nuc900: Include linux/module.h
spi/ath79: fix compile error due to missing include
Linus Torvalds [Fri, 9 Dec 2011 16:18:08 +0000 (08:18 -0800)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md: raid5 crash during degradation
md/raid5: never wait for bad-block acks on failed device.
md: ensure new badblocks are handled promptly.
md: bad blocks shouldn't cause a Blocked status on a Faulty device.
md: take a reference to mddev during sysfs access.
md: refine interpretation of "hold_active == UNTIL_IOCTL".
md/lock: ensure updates to page_attrs are properly locked.
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: use new generic {enable,disable}_percpu_irq() routines
drivers/net/ethernet/tile: use skb_frag_page() API
asm-generic/unistd.h: support new process_vm_{readv,write} syscalls
arch/tile: fix double-free bug in homecache_free_pages()
arch/tile: add a few #includes and an EXPORT to catch up with kernel changes.
Linus Torvalds [Fri, 9 Dec 2011 16:07:42 +0000 (08:07 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Fix lost speaker volume controls
ALSA: hda/realtek - Create "Bass Speaker" for two speaker pins
ALSA: hda/realtek - Don't create extra controls with channel suffix
ALSA: hda - Fix remaining VREF mute-LED NID check in post-3.1 changes
ALSA: hda - Fix GPIO LED setup for IDT 92HD75 codecs
ASoC: Provide a more complete DMA driver stub
ASoC: Remove references to corgi and spitz from machine driver document
ASoC: Make SND_SOC_MX27VIS_AIC32X4 depend on I2C
ASoC: Fix dependency for SND_SOC_RAUMFELD and SND_PXA2XX_SOC_HX4700
ASoC: uda1380: Return proper error in uda1380_modinit failure path
ASoC: kirkwood: Make SND_KIRKWOOD_SOC_OPENRD and SND_KIRKWOOD_SOC_T5325 depend on I2C
ASoC: Mark WM8994 ADC muxes as virtual
ALSA: hda/realtek - Fix Oops in alc_mux_select()
ALSA: sis7019 - give slow codecs more time to reset
Linus Torvalds [Fri, 9 Dec 2011 16:07:24 +0000 (08:07 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Do no try to schedule task events if there are none
lockdep, kmemcheck: Annotate ->lock in lockdep_init_map()
perf header: Use event_name() to get an event name
perf stat: Failure with "Operation not supported"
Modify initialization of PCIe capability registers in Tsi721 mport driver:
- change Completion Timeout value to avoid unexpected data transfer
aborts during intensive traffic.
- replace hardcoded offset of PCIe capability block by making it use the
common function.
This patch is applicable to kernel versions starting from 3.2-rc1.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug fix for Tsi721 RapidIO mport driver: Tsi721 supports four RapidIO
mailboxes (MBOX0 - MBOX3) as defined by RapidIO specification. Mailbox
resources has to be properly reported to allow use of all available
mailboxes (initial version reports only MBOX0).
This patch is applicable to kernel versions staring from 3.2-rc1.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 8 Dec 2011 22:34:32 +0000 (14:34 -0800)]
procfs: do not overflow get_{idle,iowait}_time for nohz
Since commit a25cac5198d4 ("proc: Consider NO_HZ when printing idle and
iowait times") we are reporting idle/io_wait time also while a CPU is
tickless. We rely on get_{idle,iowait}_time functions to retrieve
proper data.
These functions, however, use usecs_to_cputime to translate micro
seconds time to cputime64_t. This is just an alias to usecs_to_jiffies
which reduces the data type from u64 to unsigned int and also checks
whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
and returns MAX_JIFFY_OFFSET in that case.
When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300
it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for
>3000s! until we overflow unsigned int. Just for reference
CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and
CONFIG_HZ_1000 ~2s.
This results in a bug when people saw [h]top going mad reporting 100%
CPU usage even though there was basically no CPU load. The reason was
simply that /proc/stat stopped reporting idle/io_wait changes (and
reported MAX_JIFFY_OFFSET) and so the only change happening was for user
system time.
Let's use nsecs_to_jiffies64 instead which doesn't reduce the precision
to 32b type and it is much more appropriate for cumulative time values
(unlike usecs_to_jiffies which intended for timeout calculations).
Signed-off-by: Michal Hocko <mhocko@suse.cz> Tested-by: Artem S. Tashkinov <t.artem@mailcity.com> Cc: Dave Jones <davej@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Thu, 8 Dec 2011 22:34:30 +0000 (14:34 -0800)]
mm: vmalloc: check for page allocation failure before vmlist insertion
Commit f5252e00 ("mm: avoid null pointer access in vm_struct via
/proc/vmallocinfo") adds newly allocated vm_structs to the vmlist after
it is fully initialised. Unfortunately, it did not check that
__vmalloc_area_node() successfully populated the area. In the event of
allocation failure, the vmalloc area is freed but the pointer to freed
memory is inserted into the vmlist leading to a a crash later in
get_vmalloc_info().
This patch adds a check for ____vmalloc_area_node() failure within
__vmalloc_node_range. It does not use "goto fail" as in the previous
error path as a warning was already displayed by __vmalloc_area_node()
before it called vfree in its failure path.
Credit goes to Luciano Chavez for doing all the real work of identifying
exactly where the problem was.
Michal Hocko [Thu, 8 Dec 2011 22:34:27 +0000 (14:34 -0800)]
mm: Ensure that pfn_valid() is called once per pageblock when reserving pageblocks
setup_zone_migrate_reserve() expects that zone->start_pfn starts at
pageblock_nr_pages aligned pfn otherwise we could access beyond an
existing memblock resulting in the following panic if
CONFIG_HOLES_IN_ZONE is not configured and we do not check pfn_valid:
We crashed in pageblock_is_reserved() when accessing pfn 0xc0000 because
highstart_pfn = 0x36ffe.
The issue was introduced in 3.0-rc1 by 6d3163ce ("mm: check if any page
in a pageblock is reserved before marking it MIGRATE_RESERVE").
Make sure that start_pfn is always aligned to pageblock_nr_pages to
ensure that pfn_valid s always called at the start of each pageblock.
Architectures with holes in pageblocks will be correctly handled by
pfn_valid_within in pageblock_is_reserved.
Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Dang Bo <bdang@vmware.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Arve Hjnnevg <arve@android.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [3.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Youquan Song [Thu, 8 Dec 2011 22:34:18 +0000 (14:34 -0800)]
thp: set compound tail page _count to zero
Commit 70b50f94f1644 ("mm: thp: tail page refcounting fix") keeps all
page_tail->_count zero at all times. But the current kernel does not
set page_tail->_count to zero if a 1GB page is utilized. So when an
IOMMU 1GB page is used by KVM, it wil result in a kernel oops because a
tail page's _count does not equal zero.
Youquan Song [Thu, 8 Dec 2011 22:34:16 +0000 (14:34 -0800)]
thp: add compound tail page _mapcount when mapped
With the 3.2-rc kernel, IOMMU 2M pages in KVM works. But when I tried
to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
failed to be used.
The root cause is that 1GB page allocation calls gup_huge_pud() while 2M
page calls gup_huge_pmd. If compound pages are used and the page is a
tail page, gup_huge_pmd() increases _mapcount to record tail page are
mapped while gup_huge_pud does not do that.
So when the mapped page is relesed, it will result in kernel oops
because the page is not marked mapped.
This patch add tail process for compound page in 1GB huge page which
keeps the same process as 2M page.
Peter Zijlstra [Thu, 8 Dec 2011 22:34:13 +0000 (14:34 -0800)]
printk: avoid double lock acquire
Commit 4f2a8d3cf5e ("printk: Fix console_sem vs logbuf_lock unlock race")
introduced another silly bug where we would want to acquire an already
held lock. Avoid this.
Reported-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
More players joined to memory cgroup developments and Johannes' great work
changed internal design of memory cgroup dramatically. And he will do
more works. Michal Hokko did many bug fixes and know memory cgroup very
well. Daisuke Nishimura helped us very much but he seems busy now.
Thanks to his works.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
khugepaged can sometimes cause suspend to fail, requiring that the user
retry the suspend operation.
Use wait_event_freezable_timeout() instead of
schedule_timeout_interruptible() to avoid missing freezer wakeups. A
try_to_freeze() would have been needed in the khugepaged_alloc_hugepage
tight loop too in case of the allocation failing repeatedly, and
wait_event_freezable_timeout will provide it too.
khugepaged would still freeze just fine by trying again the next minute
but it's better if it freezes immediately.
Reported-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Jiri Slaby <jslaby@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Cc: "Rafael J. Wysocki" <rjw@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A shrinker function can return -1, means that it cannot do anything
without a risk of deadlock. For example prune_super() does this if it
cannot grab a superblock refrence, even if nr_to_scan=0. Currently we
interpret this -1 as a ULONG_MAX size shrinker and evaluate `total_scan'
according to this. So the next time around this shrinker can cause
really big pressure. Let's skip such shrinkers instead.
Also make total_scan signed, otherwise the check (total_scan < 0) below
never works.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matt Fleming [Fri, 18 Nov 2011 13:09:11 +0000 (13:09 +0000)]
x86, efi: Calling __pa() with an ioremap()ed address is invalid
If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set
in ->attribute we currently call set_memory_uc(), which in turn
calls __pa() on a potentially ioremap'd address.
On CONFIG_X86_32 this is invalid, resulting in the following
oops on some machines:
BUG: unable to handle kernel paging request at f7f22280
IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210
[...]
A better approach to this problem is to map the memory region
with the correct attributes from the start, instead of modifying
it after the fact. The uncached case can be handled by
ioremap_nocache() and the cached by ioremap_cache().
Despite first impressions, it's not possible to use
ioremap_cache() to map all cached memory regions on
CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really
don't like being mapped into the vmalloc space, as detailed in
the following bug report,
Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA
regions are covered by the direct kernel mapping table on
CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI
regions via the direct kernel mapping with the initial call to
init_memory_mapping() in setup_arch(), whereas previously these
regions wouldn't be mapped if they were after the last E820_RAM
region until efi_ioremap() was called. Doing it this way allows
us to delete efi_ioremap() completely.
Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Huang Ying <huang.ying.caritas@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jeff Layton [Fri, 2 Dec 2011 01:23:34 +0000 (20:23 -0500)]
cifs: check for NULL last_entry before calling cifs_save_resume_key
Prior to commit eaf35b1, cifs_save_resume_key had some NULL pointer
checks at the top. It turns out that at least one of those NULL
pointer checks is needed after all.
When the LastNameOffset in a FIND reply appears to be beyond the end of
the buffer, CIFSFindFirst and CIFSFindNext will set srch_inf.last_entry
to NULL. Since eaf35b1, the code will now oops in this situation.
Fix this by having the callers check for a NULL last entry pointer
before calling cifs_save_resume_key. No change is needed for the
call site in cifs_readdir as it's not reachable with a NULL
current_entry pointer.
Cc: stable@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Reported-by: Adam G. Metzler <adamgmetzler@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>