We need to report the entire jack state to the core jack code, not just
the bits that were being updated by the caller, otherwise the status
reported by other detection methods will be omitted from the state seen
by userspace.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Liam Girdwood <lrg@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The checksum field in the EEPROM on HPPA is really not a
checksum but a signature (0x16d6). So allow 0x16d6 as the
matching checksum on HPPA systems.
This issue is present on longterm/stable kernels, I have
verified that this patch is applicable back to at least
2.6.32.y kernels.
v2- changed ifdef to use CONFIG_PARISC instead of __hppa__
CC: Guy Martin <gmsoft@tuxicoman.be> CC: Rolf Eike Beer <eike-kernel@sf-tec.de> CC: Matt Turner <mattst88@gmail.com> Reported-by: Mikulas Patocka <mikulas@artax.kerlin.mff.cuni.cz> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ADC calibrations cannot run on 5 GHz with fast clock enabled. They
need to be disabled, otherwise they'll hang and IQ mismatch calibration
will not be run either.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Reported-by: Adrian Chadd <adrian@freebsd.org> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When trying to connect to 5GHz we can provide negative index to
mac80211 what trigger BUG_ON. Reason of iwl-3945-rs malfunction
on 5GHz is unknown and needs further investigation. For now, to
do not trigger a bug, correct value and just print WARNING.
Transitioning to a LOOP_UPDATE loop-state could cause the driver
to miss normal link/target processing. LOOP_UPDATE is a crufty
artifact leftover from at time the driver performed it's own
internal command-queuing. Safely remove this state.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If a physical device exposed to the OS by hpsa
is replaced (e.g. one hot plug tape drive is replaced
by another, or a tape drive is placed into "OBDR" mode
in which it acts like a CD-ROM device) and a rescan is
initiated, the replaced device will be added to the
SCSI midlayer with target and lun numbers set to -1.
After that, a panic is likely to ensue. When a physical
device is replaced, the lun and target number should be
preserved.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The test to detect OBDR ("One Button Disaster Recovery")
cd-rom devices was comparing against uninitialized data.
Fixed by moving the test for the device to where the
inquiry data is collected, and uninitialized variable
altogether as it wasn't really being used.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
'struct of_device' no longer exists, and its functionality has been merged
into platform_device. Update the MPC5200 audio DMA driver (mpc5200_dma)
accordingly. This fixes a build break.
Signed-off-by: Timur Tabi <timur@freescale.com> Acked-by: Liam Girdwood <lrg@ti.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The unsolicited frame control infrastructure requires a table of dma
addresses for the hardware to lookup the frame buffer location by an
index. The hardware expects the elements of this table to be 64-bit
quantities, so we cannot reference these elements as dma_addr_t. All
unsolicited frame protocols are affected, particularly SATA-PIO and SMP
which prevented direct-attached SATA drives and expander-attached drives
to not be discovered.
Reported-by: Jacek Danecki <jacek.danecki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A bug (likely copy/paste) that has been carried from the original
implementation. The unsolicited frame handling structure returns the
d2h fis in the isci_request.stp.rsp buffer.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This make sure we don't end up reusing the unlinked inode object.
The ideal way is to use inode i_generation. But i_generation is
not available in userspace always.
Some of the flags are OS/arch dependent we add a 9p
protocol value which maps to asm-generic/fcntl.h values in Linux
Based on the original patch from Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
[extra comments from author as to why this needs to go to stable:
Earlier for different operation such as open we used the values of open
flag as defined by the OS. But some of these flags such as O_DIRECT are
arch dependent. So if we have the 9p client and server running on
different architectures, we end up with client sending client
architecture value of these open flag and server will try to map these
values to what its architecture states. For ex: O_DIRECT on a x86 client
maps to
Hence we need to map these open flags to OS/arch independent flag
values. Getting these changes to an early version of kernel ensures us
that we work with different combination of client and server. We should
ideally backport this patch to all possible kernel version.]
d_instantiate marks the dentry positive. So a parallel lookup and mkdir of
the directory can find dentry that doesn't have fid attached. This can result
in both the code path doing v9fs_fid_add which results in v9fs_dentry leak.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We can only sort the _TSS return package if there is no _PSS
in the same scope. This is because if _PSS is present, the ACPI
specification dictates that the _TSS Power Dissipation field is
to be ignored, and therefore some BIOSs leave garbage values in
the _TSS Power field(s). In this case, it is best to just return
the _TSS package as-is.
Reported-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The value is only set to true but never set back to false,
which causes to many completion-wait commands to be sent to
hardware. Fix it with this patch.
The domain_flush_devices() function takes the domain->lock.
But this function is only called from update_domain() which
itself is already called unter the domain->lock. This causes
a deadlock situation when the dma-address-space of a domain
grows larger than 1GB.
WARNING: drivers/net/irda/smsc-ircc2.o(.devinit.text+0x1a7): Section mismatch in reference from the function smsc_ircc_pnp_probe() to the function .init.text:smsc_ircc_open()
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This make sure we don't end up reusing the unlinked inode object.
The ideal way is to use inode i_generation. But i_generation is
not available in userspace always.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Without this fix, if any invalid mount options/args are passed while mouting
the 9p fs, no error (-EINVAL) is returned and default arg value is assigned.
This fix returns -EINVAL when an invalid arguement is found while parsing
mount options.
Signed-off-by: Prem Karat <prem.karat@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This make sure we don't use wrong inode from the inode hash. The inode number
of the file deleted is reused by the next file system object created
and if we only use inode number for inode hash lookup we could end up
with wrong struct inode.
Also compare inode generation number. Not all Linux file system provide
st_gen in userspace. So it could be 0;
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
dvb_usb_device_init calls the frontend_attach method of this driver which
uses vp7045_usb_ob. In order to have a buffer ready in vp7045_usb_op, it has to
be allocated before that happens.
Luckily we can use the whole private data as the buffer as it gets separately
allocated on the heap via kzalloc in dvb_usb_device_init and is thus apt for
use via usb_control_msg.
This fixes a
BUG: unable to handle kernel paging request at 0000000000001e78
reported by Tino Keitel and diagnosed by Dan Carpenter.
The nuvoton-cir driver was storing up consecutive pulse-pulse and
space-space samples internally, for no good reason, since
ir_raw_event_store_with_filter() already merges back to back like
samples types for us. This should also fix a regression introduced late
in 3.0 that related to a timeout change, which actually becomes correct
when coupled with this change. Tested with RC6 and RC5 on my own
nuvoton-cir hardware atop vanilla 3.0.0, after verifying quirky
behavior in 3.0 due to the timeout change.
Reported-by: Stephan Raue <sraue@openelec.tv> CC: Stephan Raue <sraue@openelec.tv> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
0.90 metadata uses an unsigned 32bit number to count the number of
kilobytes used from each device.
This should allow up to 4TB per device.
However we multiply this by 2 (to get sectors) before casting to a
larger type, so sizes above 2TB get truncated.
Also we allow rdev->sectors to be larger than 4TB, so it is possible
for the array to be resized larger than the metadata can handle.
So make sure rdev->sectors never exceeds 4TB when 0.90 metadata is in
used.
Also the sanity check at the end of super_90_load should include level
1 as it used ->size too. (RAID0 and Linear don't use ->size at all).
On the last close of an 'md' device which as been stopped, the device
is destroyed and in particular the request_queue is freed. The free
is done in a separate thread so it might happen a short time later.
__blkdev_put calls bdev_inode_switch_bdi *after* ->release has been
called.
Since commit f758eeabeb96f878c860e8f110f94ec8820822a9
bdev_inode_switch_bdi will dereference the 'old' bdi, which lives
inside a request_queue, to get a spin lock. This causes the last
close on an md device to sometime take a spin_lock which lives in
freed memory - which results in an oops.
So move the called to bdev_inode_switch_bdi before the call to
->release.
Not cleaning after alloc failure would result in crash on destroy,
because nouveau_sgdma_clear assumes "ttm_alloced" to be not null when
"pages" is not null.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The TNET variant of DaVinci compiles some code that it shares
with other DaVinci variants, however it has a V6 CPU rather than
an ARM926T, thus the hardcoded call to arm926_flush_kern_cache_all()
in sleep.S will obviously fail, and we need to build with the
v6_flush_kern_cache_all() call instead. This was triggered by
manually altering the DaVinci config to build the TNET version.
Cc: Dave Martin <dave.martin@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
DA850/OMAP-L138 EMAC driver uses random mac address instead of
a fixed one because the mac address is not stuffed into EMAC
platform data.
This patch provides a function which reads the mac address
stored in SPI flash (registered as MTD device) and populates the
EMAC platform data. The function which reads the mac address is
registered as a callback which gets called upon addition of MTD
device.
NOTE: In case the MAC address stored in SPI flash is erased, follow
the instructions at [1] to restore it.
Modifications in v2:
Guarded registering the mtd_notifier only when MTD is enabled.
Earlier this was handled using mtd_has_partitions() call, but
this has been removed in Linux v3.0.
Modifications in v3:
a. Guarded da850_evm_m25p80_notify_add() function and
da850evm_spi_notifier structure with CONFIG_MTD macros.
b. Renamed da850_evm_register_mtd_user() function to
da850_evm_setup_mac_addr() and removed the struct mtd_notifier
argument to this function.
c. Passed the da850evm_spi_notifier structure to register_mtd_user()
function.
Modifications in v4:
Moved the da850_evm_setup_mac_addr() function within the first
CONFIG_MTD ifdef construct.
I was intrigued by the fact that the clock stood still on
the Integrator, but it wasn't strange at all, because the
timer was set up all wrong and probably has been for a
while. With this patch the clock starts ticking again:
make the timer periodic (reload), |= on the divisor bit
and load the timer before starting it.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We have hit a couple of customer bugs where they would like to
use those parameters to run an UP kernel - but both of those
options turn of important sources of interrupt information so
we end up not being able to boot. The correct way is to
pass in 'dom0_max_vcpus=1' on the Xen hypervisor line and
the kernel will patch itself to be a UP kernel.
If vmalloc page_fault happens inside of interrupt handler with interrupts
disabled then on exit path from exception handler when there is no pending
interrupts, the following code (arch/x86/xen/xen-asm_32.S:112):
Solution is in setting XEN_vcpu_info_mask only when it should be set
according to
cmpw $0x0001, XEN_vcpu_info_pending(%eax)
but not clearing it if there isn't any pending events.
Reproducer for bug is attached to RHBZ 707552
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes the problem in sdhci-s3c host driver for Samsung Soc's.
During the card identification stage the mmc core driver enumerates for
the best bus width in combination with the highest available data rate.
It starts enumerating from the highest bus width (8) to lowest width (1).
In case of few MMC cards the 4-bit bus enumeration fails and tries
the 1-bit bus enumeration. When switched to 1-bit bus mode the host driver
has to clear the previous bus width setting and apply the new setting.
The current patch will clear the previous bus mode and apply the new
mode setting.
Signed-off-by: Girish K S <girish.shivananjappa@linaro.org> Acked-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The default multithread workqueue can cause the same work to be executed
concurrently on a different CPUs. This isn't really suitable for clock
gating as it might already gated the clock and gating it twice results both
host->clk_old and host->ios.clock to be set to 0.
To prevent this from happening we use system_nrt_wq instead.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Chris Ball <cjb@laptop.org> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We have seen at least two different races when clock gating kicks in in a
middle of ios structure update.
First one happens when ios->clock is changed outside of aggressive clock
gating framework, for example via mmc_set_clock(). The race might happen
when we run following code:
mmc_set_ios():
...
if (ios->clock > 0)
mmc_set_ungated(host);
Now if gating kicks in right after the condition check we end up setting
host->clk_gated to false even though we have just gated the clock. Next
time a request is started we try to ungate and restore the clock in
mmc_host_clk_hold(). However since we have host->clk_gated set to false the
original clock is not restored.
This eventually will cause the host controller to hang since its clock is
disabled while we are trying to issue a request. For example on Intel
Medfield platform we see:
/*
* Reset ocr mask to be the highest possible voltage supported for
* this mmc host. This value will be used at next power up.
*/
host->ocr = 1 << (fls(host->ocr_avail) - 1);
If the clock gating worker kicks in while we are only partially updated the
ios structure the host controller gets incomplete ios and might not work as
supposed. Again on Intel Medfield platform we get:
> If you think the names of the functions are confusing then
> you may rename them, say like this:
>
> mmc_host_clk_ungate() -> mmc_host_clk_hold()
> mmc_host_clk_gate() -> mmc_host_clk_release()
>
> Which would make the usecases more clear
(This is CC'd to stable@ because the next two patches, which fix
observable races, depend on it.)
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is no real reason to run blk_schedule_flush_plug() with
interrupts and preemption disabled.
Move it into schedule() and call it when the task is going voluntarily
to sleep. There might be false positives when the task is woken
between that call and actually scheduling, but that's not really
different from being woken immediately after switching away.
This fixes a deadlock in the scheduler where the
blk_schedule_flush_plug() callchain enables interrupts and thereby
allows a wakeup to happen of the task that's going to sleep.
Block-IO and workqueues call into notifier functions from the
scheduler core code with interrupts and preemption disabled. These
calls should be made before entering the scheduler core.
To simplify this, separate the scheduler core code into
__schedule(). __schedule() is directly called from the places which
set PREEMPT_ACTIVE and from schedule(). This allows us to add the work
checks into schedule(), so they are only called when a task voluntary
goes to sleep.
Thomas earlier submitted a fix to limit the RTC PIE freq, but
picked 5000Hz out of the air. Willy noticed that we should
instead use the 8192Hz max from the rtc man documentation.
Cc: Willy Tarreau <w@1wt.eu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Its possible to jam up the alarm timers by setting very small interval
timers, which will cause the alarmtimer subsystem to spend all of its time
firing and restarting timers. This can effectivly lock up a box.
A deeper fix is needed, closely mimicking the hrtimer code, but for now
just cap the interval to 100us to avoid userland hanging the system.
CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With zone_reclaim_mode enabled, it's possible for zones to be considered
full in the zonelist_cache so they are skipped in the future. If the
process enters direct reclaim, the ZLC may still consider zones to be full
even after reclaiming pages. Reconsider all zones for allocation if
direct reclaim returns successfully.
Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There have been a small number of complaints about significant stalls
while copying large amounts of data on NUMA machines reported on a
distribution bugzilla. In these cases, zone_reclaim was enabled by
default due to large NUMA distances. In general, the complaints have not
been about the workload itself unless it was a file server (in which case
the recommendation was disable zone_reclaim).
The stalls are mostly due to significant amounts of time spent scanning
the preferred zone for pages to free. After a failure, it might fallback
to another node (as zonelists are often node-ordered rather than
zone-ordered) but stall quickly again when the next allocation attempt
occurs. In bad cases, each page allocated results in a full scan of the
preferred zone.
Patch 1 checks the preferred zone for recent allocation failure
which is particularly important if zone_reclaim has failed
recently. This avoids rescanning the zone in the near future and
instead falling back to another node. This may hurt node locality
in some cases but a failure to zone_reclaim is more expensive than
a remote access.
Patch 2 clears the zlc information after direct reclaim.
Otherwise, zone_reclaim can mark zones full, direct reclaim can
reclaim enough pages but the zone is still not considered for
allocation.
This was tested on a 24-thread 2-node x86_64 machine. The tests were
focused on large amounts of IO. All tests were bound to the CPUs on
node-0 to avoid disturbances due to processes being scheduled on different
nodes. The kernels tested are
MMTests Statistics: vmstat
Page Ins 46268 63840 66008
Page Outs 908215969067112888043732
Swap Ins 0 0 0
Swap Outs 0 0 0
Direct pages scanned 1309169789668638971790
Kswapd pages scanned 0 18300111831116
Kswapd pages reclaimed 0 18290681829930
Direct pages reclaimed 1303777789568288648314
Kswapd efficiency 100% 99% 99%
Kswapd velocity 0.000 810.643 826.346
Direct efficiency 99% 99% 96%
Direct velocity 5340.128 3972.068 4048.788
Percentage direct scans 100% 83% 83%
Page writes by reclaim 0 3 0
Slabs scanned 796672 720640 720256
Direct inode steals 742266771600127088638
Kswapd inode steals 0 17368402021238
Test completes far faster with a large increase in the number of files
created per second. Standard deviation is high as a small number of
iterations were much higher than the mean. The number of pages scanned by
zone_reclaim is reduced and kswapd is used for more work.
LARGE DD
3.0-rc6 3.0-rc6 3.0-rc6
vanilla zlcfirst zlcreconsider
download tar 59 ( 0.00%) 59 ( 0.00%) 55 ( 7.27%)
dd source files 527 ( 0.00%) 296 (78.04%) 320 (64.69%)
delete source 36 ( 0.00%) 19 (89.47%) 20 (80.00%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 125.03 118.98 122.01
Total Elapsed Time (seconds) 624.56 375.02 398.06
This test downloads a large tarfile and copies it with dd a number of
times - similar to the most recent bug report I've dealt with. Time to
completion is reduced. The number of pages scanned directly is still
disturbingly high with a low efficiency but this is likely due to the
number of dirty pages encountered. The figures could probably be improved
with more work around how kswapd is used and how dirty pages are handled
but that is separate work and this result is significant on its own.
Streaming Mapped Writer
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 124.47 111.67 112.64
Total Elapsed Time (seconds) 2138.14 1816.30 1867.56
MMTests Statistics: vmstat
Page Ins 90760 89124 89516
Page Outs 121028340120199524120736696
Swap Ins 0 86 55
Swap Outs 0 0 0
Direct pages scanned 1149893639646143996330619
Kswapd pages scanned 564309485696576357075875
Kswapd pages reclaimed 277432192775204427766606
Direct pages reclaimed 49777 46884 36655
Kswapd efficiency 49% 48% 48%
Kswapd velocity 26392.541 31363.631 30561.736
Direct efficiency 0% 0% 0%
Direct velocity 53780.091 53108.759 51581.004
Percentage direct scans 67% 62% 62%
Page writes by reclaim 385 122 1513
Slabs scanned 43008 39040 42112
Direct inode steals 0 10 8
Kswapd inode steals 733 534 477
This test just creates a large file mapping and writes to it linearly.
Time to completion is again reduced.
The gains are mostly down to two things. In many cases, there is less
scanning as zone_reclaim simply gives up faster due to recent failures.
The second reason is that memory is used more efficiently. Instead of
scanning the preferred zone every time, the allocator falls back to
another zone and uses it instead improving overall memory utilisation.
This patch: initialise ZLC for first zone eligible for zone_reclaim.
The zonelist cache (ZLC) is used among other things to record if
zone_reclaim() failed for a particular zone recently. The intention is to
avoid a high cost scanning extremely long zonelists or scanning within the
zone uselessly.
Currently the zonelist cache is setup only after the first zone has been
considered and zone_reclaim() has been called. The objective was to avoid
a costly setup but zone_reclaim is itself quite expensive. If it is
failing regularly such as the first eligible zone having mostly mapped
pages, the cost in scanning and allocation stalls is far higher than the
ZLC initialisation step.
This patch initialises ZLC before the first eligible zone calls
zone_reclaim(). Once initialised, it is checked whether the zone failed
zone_reclaim recently. If it has, the zone is skipped. As the first zone
is now being checked, additional care has to be taken about zones marked
full. A zone can be marked "full" because it should not have enough
unmapped pages for zone_reclaim but this is excessive as direct reclaim or
kswapd may succeed where zone_reclaim fails. Only mark zones "full" after
zone_reclaim fails if it failed to reclaim enough pages after scanning.
Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If the bios or OS sets the pci max read request size to 0 or an
invalid value (6,7), it can result in a hang or slowdown. Check
and set it to something sane if it's invalid.
On some Power rv100 cards, we have no ATY OF table, but we have
no combios table either, and hence we refuse all modes on VGA-0
since we end up with a 0 max pixel clock.
Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Alex Deucher <alexdeucher@gmail.com> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I don't know what I was thinking putting 'rcu' after a dynamically
sized array! The array could still be in use when we call rcu_free()
(That is the point) so we mustn't corrupt it.
This patch fixes L2 Cache size calculations for L2C-210, L2C-310 and
PL310, by changing the L2X0_AUX_CTRL_WAY_SIZE_MASK from 2 bits to 3
bits.
The Auxiliary Control Register for L2C-210, L2C-310 and PL310 has 3bits
[19:17] for Way size, however the existing code only uses 2 bits to
get this value. This results in incorrect cachesize calculations.
It also results in performing operations on the whole cache when we
erroneously decide that the range is big enough (due to l2x0_size being
too small) and also prints incorrect cachesize.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For some reason SPI block is in broken state after module
unloading. This lead to broken rendering after reloading
module. Fix this by reseting SPI block in CP resume function
Signed-off-by: Jerome Glisse <jglisse@redhat.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Also add a default case in tps65910_list_voltage_dcdc to silence
'volt' may be used uninitialized in this function warning.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk> Cc: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On Sun4d systems running in SMP mode, IRQ 14 is used for timer interrupts
and has a specialized interrupt handler. IPI is currently set to use IRQ 14
as well, which causes it to trigger the timer interrupt handler, and not the
IPI interrupt handler.
The IPI interrupt is therefore changed to IRQ 13, which is the highest
normally handled interrupt. This IRQ is also used for SBUS interrupts,
however there is nothing in the IPI/SBUS interrupt handlers that indicate
that they will not handle sharing the interrupt.
(IRQ 13 is indicated as audio interrupt, which is unlikely to be found in a
sun4d system)
Signed-off-by: Kjetil Oftedal <oftedal@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
CC arch/sparc/kernel/pcic.o
arch/sparc/kernel/pcic.c: In function 'pcic_probe':
arch/sparc/kernel/pcic.c:359:33: error: array subscript is above array bounds [-Werror=array-bounds]
arch/sparc/kernel/pcic.c:359:8: error: array subscript is above array bounds [-Werror=array-bounds]
arch/sparc/kernel/pcic.c:360:33: error: array subscript is above array bounds [-Werror=array-bounds]
arch/sparc/kernel/pcic.c:360:8: error: array subscript is above array bounds [-Werror=array-bounds]
arch/sparc/kernel/pcic.c:361:33: error: array subscript is above array bounds [-Werror=array-bounds]
arch/sparc/kernel/pcic.c:361:8: error: array subscript is above array bounds [-Werror=array-bounds]
cc1: all warnings being treated as errors
I'm not particularly familiar with sparc but t_nmi (defined in head_32.S via
the TRAP_ENTRY macro) and pcic_nmi_trap_patch (defined in entry.S) both appear
to be 4 instructions long and I presume from the usage that instructions are
int sized.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If we can't push the pending register windows onto the user's stack,
we disallow signal delivery even if the signal would be delivered on a
valid seperate signal stack.
Add a register window save area in the signal frame, and store any
unsavable windows there.
On sigreturn, if any windows are still queued up in the signal frame,
try to push them back onto the stack and if that fails we kill the
process immediately.
This allows the debug/tst-longjmp_chk2 glibc test case to pass.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The sparc32 version of arch_write_unlock() is just a plain assignment.
Unfortunately this allows the compiler to schedule side-effects in a
protected region to occur after the HW-level unlock, which is broken.
E.g., the following trivial test case gets miscompiled:
Fixed by adding a compiler memory barrier to arch_write_unlock(). The
sparc64 version combines the barrier and assignment into a single asm(),
and implements the operation as a static inline, so that's what I did too.
Compile-tested with sparc32_defconfig + CONFIG_SMP=y.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The sparc64 spinlock_64.h contains a number of operations defined
first as static inline functions, and then as macros with the same
names and parameters as the functions. Maybe this was needed at
some point in the past, but now nothing seems to depend on these
macros (checked with a recursive grep looking for ifdefs on these
names). Other archs don't define these identity-macros.
So this patch deletes these unnecessary macros.
Compile-tested with sparc64_defconfig.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Oops might happen because we check rt2x00queue_empty(queue) twice,
but this condition can change and we can process entry in
rt2800_txdone_entry(), which was already processed by
rt2800usb_txdone_entry_check() -> rt2x00lib_txdone_noinfo() and
has nullify entry->skb .
Reported-by: Justin Piszcz <jpiszcz@lucidpixels.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Oops might happen because we perform parallel putting new entries in a
queue (rt2x00queue_write_tx_frame()) and removing entries after
finishing transmitting (rt2800usb_work_txdone()). There are cases when
_txdone may process an entry that was not fully send and nullify
entry->skb .
To fix check in _txdone if entry has flags that indicate pending
transmission and wait until flags get cleared.
Reported-by: Justin Piszcz <jpiszcz@lucidpixels.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It happens that if a packet arrives in a VC between the call to open it on
the hardware and the call to change the backend to br2684, br2684_regvcc
processes the packet and oopses dereferencing skb->dev because it is
NULL before the call to br2684_push().
atm: Use SKB queue and list helpers instead of doing it by-hand.
Signed-off-by: Daniel Schwierzeck <daniel.schwierzeck@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On AVERATEC 3200, pata_via causes memory corruption with ATAPI DMA,
which often leads to random kernel oops. The cause of the problem is
not well understood yet and only small subset of machines using the
controller seem affected. Blacklist ATAPI DMA on the machine.
Signed-off-by: Tejun Heo <tj@kernel.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=11426 Reported-and-tested-by: Jim Bray <jimsantelmo@gmail.com> Cc: Alan Cox <alan@linux.intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Two additional savage4 variants were added, but the S3_SAVAGE4_SERIES
macro was incompletely modified, resulting in a false positive detection
of a savage4 card regardless of which savage card is actually present.
For non-savage4 series cards, such as a Savage/IX-MV card, this results
in garbled video and/or a hard-hang at boot time. Fix this by changing
an '||' to an '&&' in the S3_SAVAGE4_SERIES macro.
Signed-off-by: John P. Stanley <jpsinthemix@verizon.net> Reviewed-by: Tormod Volden <debian.tormod@gmail.com>
[ The macros have incomplete parenthesis too, but whatever .. -Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When hibernating ->resume may not be called by usb core, but disconnect
and probe instead, so we do not increase the counter after decreasing
it in ->supend. As a result we free memory early, and get crash when
unplugging usb dongle.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Due to some recent optimization done in the way the mac address
bytes are written into the OTP memory, some AR9485 chipsets were
forced to use the first byte from the eeprom template and the
remaining bytes are read from OTP.
AR9485 happens to use generic eeprom template which has 0x1 as
the first byte causes issues in bringing up the card.
So fixed the eeprom template accordingly to address the issue.
Cc: Paul Stewart <pstew@google.com> Signed-off-by: Senthil Balasubramanian <senthilb@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If is_main_vif(ar, vif) reports that we have to fall back
to software encryption, we goto err_softw; before locking ar->mutex.
As a result, we have unprotected call to carl9170_set_operating_mode
and unmatched mutex_unlock.
The patch fix the issue by adding mutex_lock before goto.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-By: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If h_add_logical_lan_buffer returns an error we need to free
the skb.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
this callback is called during suspend/resume and also via iw command.
it configures parameters like sifs, slottime, acktimeout in
ath9k_hw_init_global_settings where few REG_READ, REG_RMW are also done
and hence the need for PS wrappers
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dereferencing a user pointer directly from kernel-space without going
through the copy_from_user family of functions is a bad idea. Two of
such usages can be found in the sendmsg code path called from sendmmsg,
added by
Usages are performed through memcmp() and memcpy() directly. Fix those
by using the already copied msg_sys structure instead of the __user *msg
structure. Note that msg_sys can be set to NULL by verify_compat_iovec()
or verify_iovec(), which requires additional NULL pointer checks.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@ev0ke.net> CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> CC: Anton Blanchard <anton@samba.org> CC: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For a long time, the xHCI driver has had this note:
/* FIXME: Ignoring zero-length packets, can those happen? */
It turns out that, yes, there are drivers that need to queue zero-length
transfers for isochronous OUT transfers. Without this patch, users will
see kernel hang messages when a driver attempts to enqueue an isochronous
URB with a zero length transfer (because count_isoc_trbs_needed will return
zero for that TD, xhci_td->last_trb will never be set, and updating the
dequeue pointer will cause an infinite loop).
Matěj ran into this issue when using an NI Audio4DJ USB soundcard
with the snd-usb-caiaq driver. See
https://bugzilla.kernel.org/show_bug.cgi?id=40702
Fix count_isoc_trbs_needed() to return 1 for zero-length transfers (thanks
Alan on the math help). Update the various TRB field calculations to deal
with zero-length transfers. We're still transferring one packet with a
zero-length data payload, so the total_packet_count should be 1. The
Transfer Burst Count (TBC) and Transfer Last Burst Packet Count (TLBPC)
fields should be set to zero.
This patch should be backported to kernels as old as 2.6.36.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Tested-by: Matěj Laitl <matej@laitl.cz> Cc: Daniel Mack <zonque@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When a driver tries to cancel an URB, and the host controller is dying,
xhci_urb_dequeue will giveback the URB without removing the xhci_tds
that comprise that URB from the td_list or the cancelled_td_list. This
can cause a race condition between the driver calling URB dequeue and
the stop endpoint command watchdog timer.
If the timer fires on a dying host, and a driver attempts to resubmit
while the watchdog timer has dropped the xhci->lock to giveback a
cancelled URB, URBs may be given back by the xhci_urb_dequeue() function.
At that point, the URB's priv pointer will be freed and set to NULL, but
the TDs will remain on the td_list. This will cause an oops in
xhci_giveback_urb_in_irq() when the watchdog timer attempts to loop
through the endpoints' td_lists, giving back killed URBs.
Make sure that xhci_urb_dequeue() removes TDs from the TD lists and
canceled TD lists before it gives back the URB.
This patch should be backported to kernels as old as 2.6.36.
When an isochronous transfer is enqueued, xhci_queue_isoc_tx_prepare()
will ensure that there is enough room on the transfer rings for all of the
isochronous TDs for that URB. However, when xhci_queue_isoc_tx() is
enqueueing individual isoc TDs, the prepare_transfer() function can fail
if the endpoint state has changed to disabled, error, or some other
unknown state.
With the current code, if Nth TD (not the first TD) fails, the ring is
left in a sorry state. The partially enqueued TDs are left on the ring,
and the first TRB of the TD is not given back to the hardware. The
enqueue pointer is left on the TRB after the last successfully enqueued
TD. This means the ring is basically useless. Any new transfers will be
enqueued after the failed TDs, which the hardware will never read because
the cycle bit indicates it does not own them. The ring will fill up with
untransferred TDs, and the endpoint will be basically unusable.
The untransferred TDs will also remain on the TD list. Since the td_list
is a FIFO, this basically means the ring handler will be waiting on TDs
that will never be completed (or worse, dereference memory that doesn't
exist any more).
Change the code to clean up the isochronous ring after a failed transfer.
If the first TD failed, simply return and allow the xhci_urb_enqueue
function to free the urb_priv. If the Nth TD failed, first remove the TDs
from the td_list. Then convert the TRBs that were enqueued into No-op
TRBs. Make sure to flip the cycle bit on all enqueued TRBs (including any
link TRBs in the middle or between TDs), but leave the cycle bit of the
first TRB (which will show software-owned) intact. Then move the ring
enqueue pointer back to the first TRB and make sure to change the
xhci_ring's cycle state to what is appropriate for that ring segment.
This ensures that the No-op TRBs will be overwritten by subsequent TDs,
and the hardware will not start executing random TRBs because the cycle
bit was left as hardware-owned.
This bug is unlikely to be hit, but it was something I noticed while
tracking down the watchdog timer issue. I verified that the fix works by
injecting some errors on the 250th isochronous URB queued, although I
could not verify that the ring is in the correct state because uvcvideo
refused to talk to the device after the first usb_submit_urb() failed.
Ring debugging shows that the ring looks correct, however.
This patch should be backported to kernels as old as 2.6.36.
When the isochronous transfer support was introduced, and the xHCI driver
switched to using urb->hcpriv to store an "urb_priv" pointer, a couple of
memory leaks were introduced into the URB enqueue function in its error
handling paths.
xhci_urb_enqueue allocates urb_priv, but it doesn't free it if changing
the control endpoint's max packet size fails or the bulk endpoint is in
the middle of allocating or deallocating streams.
xhci_urb_enqueue also doesn't free urb_priv if any of the four endpoint
types' enqueue functions fail. Instead, it expects those functions to
free urb_priv if an error occurs. However, the bulk, control, and
interrupt enqueue functions do not free urb_priv if the endpoint ring is
NULL. It will, however, get freed if prepare_transfer() fails in those
enqueue functions.
Several of the error paths in the isochronous endpoint enqueue function
also fail to free it. xhci_queue_isoc_tx_prepare() doesn't free urb_priv
if prepare_ring() indicates there is not enough room for all the
isochronous TDs in this URB. If individual isochronous TDs fail to be
queued (perhaps due to an endpoint state change), urb_priv is also leaked.
This argues that the freeing of urb_priv should be done in the function
that allocated it, xhci_urb_enqueue.
This patch looks rather ugly, but refactoring the code will have to wait
because this patch needs to be backported to stable kernels.
This patch should be backported to kernels as old as 2.6.36.
When a USB2 port initiate a remote wakeup, software shall ensure that
resume is signaled for at least 20ms, and then write '0' to the PLS field.
According to this, xhci driver do the following things:
1. When receive a remote wakeup event in irq_handler, set the resume_done
value as jiffies + 20ms, and modify rh_timer to poll root hub status at
that time;
2. When receive a GetPortStatus request, if the jiffies is after the
resume_done value, clear the resume signal and resume_done.
However, if usb_port_resume() is called before the rh_timer triggered, it
will indicate the port as Suspend Cleared and skip the clear resume signal
part. The device will fail the usb_get_status request in finish_port_resume(),
and usbcore will try a reset-resume instead. Device will work OK after
reset-resume, but resume_done value is not cleared in this case, and
xhci_bus_suspend() will fail because when it finds a non-zero resume_done
value, it will regard the port as resuming and return -EBUSY.
This causes issue on some platforms that the system fail to suspend
after remote wakeup from suspend by USB2 devices connected to xHCI port.
To fix this issue, report the port status as suspend if the resume is
signaling less that 20ms, and usb_port_resume() will wait 25ms and check
port status again, so xHCI driver can clear the resume signaling and
resume_done value.
This should be backported to kernels as old as 2.6.37.
From EHCI Spec p.28 HC should clear PORT_SUSPEND when SW clears
PORT_RESUME. In Intel Oaktrail platform, MPH (Multi-Port Host
Controller) core clears PORT_SUSPEND directly when SW sets PORT_RESUME
bit. If we rely on PORT_SUSPEND bit to stop USB resume, we will miss
the action of clearing PORT_RESUME. This will cause unexpected long
resume signal on USB bus.
Signed-off-by: Wang Zhi <zhi.wang@windriver.com> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently the Option driver avoids binding interface 1 on Huawei K3765
and K4505 broadband modems as it should be handled by the cdc_ether
driver instead. This patch ensures we don't bind the interface 2
on those devices as that is CDC_DATA.
Signed-off-by: Andrew Bird <ajb@spheresystems.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds the product ID of Huawei's Vodafone K4605 mobile broadband
modem to option.c. This is necessary so that the driver gets loaded on
demand without the intervention of usb_modeswitch. This has the benefit of
it becoming available faster and also ensures that the option driver is not
bound to a network interface that should be claimed by suitable network
driver.
Signed-off-by: Andrew Bird <ajb@spheresystems.co.uk> Signed-off-by: Alex Chiang <achiang@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds the product ID of Huawei's Vodafone K3806 mobile broadband
modem to option.c. This is necessary so that the driver gets loaded on
demand without the intervention of usb_modeswitch. This has the benefit of
it becoming available faster and also ensures that the option driver is not
bound to a network interface that should be claimed by cdc_ether.
Signed-off-by: Andrew Bird <ajb@spheresystems.co.uk> Signed-off-by: Alex Chiang <achiang@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
tty_operations->remove is normally called like:
queue_release_one_tty
->tty_shutdown
->tty_driver_remove_tty
->tty_operations->remove
However tty_shutdown() is called from queue_release_one_tty() only if
tty_operations->shutdown is NULL. But for pty, it is not.
pty_unix98_shutdown() is used there as ->shutdown.
So tty_operations->remove of pty (i.e. pty_unix98_remove()) is never
called. This results in invalid pty_count. I.e. what can be seen in
/proc/sys/kernel/pty/nr.
I see this was already reported at:
https://lkml.org/lkml/2009/11/5/370
But it was not fixed since then.
This patch is kind of a hackish way. The problem lies in ->install. We
allocate there another tty (so-called tty->link). So ->install is
called once, but ->remove twice, for both tty and tty->link. The fix
here is to count both tty and tty->link and divide the count by 2 for
user.
And to have ->remove called, let's make tty_driver_remove_tty() global
and call that from pty_unix98_shutdown() (tty_operations->shutdown).
While at it, let's document that when ->shutdown is defined,
tty_shutdown() is not called.
Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Alan Cox <alan@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is to fix an issue where output will suddenly become very slow.
The problem occurs on 8250 UARTS with the hardware bug UART_BUG_THRE.
BACKGROUND
For normal UARTs (without UART_BUG_THRE): When the serial core layer
gets new transmit data and the transmitter is idle, it buffers the
data and calls the 8250s' serial8250_start_tx() routine which will
simply enable the TX interrupt in the IER register and return. This
should immediately fire a THRE interrupt and begin transmitting the
data.
For buggy UARTs (with UART_BUG_THRE): merely enabling the TX interrupt
in IER does not necessarily generate a new THRE interrupt.
Therefore, a background timer periodically checks to see if there is
pending data, and starts transmission if that is the case.
The bug happens on SMP systems when the system has nothing to transmit,
the transmit interrupt is disabled and the following sequence occurs:
- CPU0: The background timer routine serial8250_backup_timeout()
starts and saves the state of the interrupt enable register (IER)
and then disables all interrupts in IER. NOTE: The transmit interrupt
(TI) bit is saved as disabled.
- CPU1: The serial core gets data to transmit, grabs the port lock and
calls serial8250_start_tx() which enables the TI in IER.
- CPU0: serial8250_backup_timeout() waits for the port lock.
- CPU1: finishes (with TI enabled) and releases the port lock.
- CPU0: serial8250_backup_timeout() calls the interrupt routine which
will transmit the next fifo's worth of data and then restores the
IER from the previously saved value (TI disabled).
At this point, as long as the serial core has more transmit data
buffered, it will not call serial8250_start_tx() again and the
background timer routine will slowly transmit the data.
The fix is to have serial8250_start_tx() get the port lock before
it saves the IER state and release it after restoring IER. This will
prevent serial8250_start_tx() from running in parallel.
Signed-off-by: Al Cooper <alcooperx@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>