If .runtime_suspend() returns -EAGAIN or -EBUSY, the device should
still be in ACTIVE state, so it is not necessary to send an idle
notification to its parent. If .runtime_suspend() returns other
fatal failure, it doesn't make sense to send idle notification to
its parent.
Skip parent idle notification when failure is returned from
.runtime_suspend() and update comments in rpm_suspend() to reflect
that change.
[rjw: Modified the subject and changelog slightly.]
Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Jeff Layton [Tue, 11 Oct 2011 19:20:55 +0000 (21:20 +0200)]
PM / Freezer: Make fake_signal_wake_up() wake TASK_KILLABLE tasks too
TASK_KILLABLE is often used to put tasks to sleep for quite some time.
One of the most common uses is to put tasks to sleep while waiting for
replies from a server on a networked filesystem (such as CIFS or NFS).
Unfortunately, fake_signal_wake_up does not currently wake up tasks
that are sleeping in TASK_KILLABLE state. This means that even if the
code were in place to allow them to freeze while in this sleep, it
wouldn't work anyway.
This patch changes this function to wake tasks in this state as well.
This should be harmless -- if the code doing the sleeping doesn't have
handling to deal with freezer events, it should just go back to sleep.
If it does, then this will allow that code to do the right thing.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Barry Song [Mon, 10 Oct 2011 21:38:41 +0000 (23:38 +0200)]
PM / Hibernate: Add resumedelay kernel param in addition to resumewait
Patch "PM / Hibernate: Add resumewait param to support MMC-like
devices as resume file" added the resumewait kernel command line
option. The present patch adds resumedelay so that
resumewait/delay were analogous to rootwait/delay.
[rjw: Modified the subject and changelog slightly.]
Signed-off-by: Barry Song <baohua.song@csr.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Ming Lei [Sun, 9 Oct 2011 03:40:25 +0000 (11:40 +0800)]
PM / Runtime: Update document about callbacks
Support for device power domains has been introduced in
commit 9659cc0678b954f187290c6e8b247a673c5d37e1 (PM: Make
system-wide PM and runtime PM treat subsystems consistently),
also power domain callbacks will take precedence over subsystem ones
from commit 4d27e9dcff00a6425d779b065ec8892e4f391661(PM: Make
power domain callbacks take precedence over subsystem ones).
So update part of "Device Runtime PM Callbacks" in
Documentation/power/runtime_pm.txt.
Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Barry Song [Thu, 6 Oct 2011 18:34:46 +0000 (20:34 +0200)]
PM / Hibernate: Add resumewait param to support MMC-like devices as resume file
Some devices like MMC are async detected very slow. For example,
drivers/mmc/host/sdhci.c launches a 200ms delayed work to detect
MMC partitions then add disk.
We have wait_for_device_probe() and scsi_complete_async_scans()
before calling swsusp_check(), but it is not enough to wait for MMC.
This patch adds resumewait kernel param just like rootwait so
that we have enough time to wait until MMC is ready. The difference is
that we wait for resume partition whereas rootwait waits for rootfs
partition (which may be on a different device).
This patch will make hibernation support many embedded products
without SCSI devices, but with devices like MMC.
[rjw: Modified the changelog slightly.]
Signed-off-by: Barry Song <Baohua.Song@csr.com> Reviewed-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* pm-assorted:
PM / Hibernate: Fix typo in a kerneldoc comment
PM / Hibernate: Freeze kernel threads after preallocating memory
PM: Update the policy on default wakeup settings
PM / VT: Cleanup #if defined uglyness and fix compile error
PM / Suspend: Off by one in pm_suspend()
PM / Hibernate: Include storage keys in hibernation image on s390
PM: Fix build issue in main.c for CONFIG_PM_SLEEP unset
PM / Suspend: Add statistics debugfs file for suspend to RAM
* pm-domains:
PM / Domains: Split device PM domain data into base and need_restore
ARM: mach-shmobile: sh7372 sleep warning fixes
ARM: mach-shmobile: sh7372 A3SM support
ARM: mach-shmobile: sh7372 generic suspend/resume support
PM / Domains: Preliminary support for devices with power.irq_safe set
PM: Move clock-related definitions and headers to separate file
PM / Domains: Use power.sybsys_data to reduce overhead
PM: Reference counting of power.subsys_data
PM: Introduce struct pm_subsys_data
ARM / shmobile: Make A3RV be a subdomain of A4LC on SH7372
PM / Domains: Rename argument of pm_genpd_add_subdomain()
PM / Domains: Rename GPD_STATE_WAIT_PARENT to GPD_STATE_WAIT_MASTER
PM / Domains: Allow generic PM domains to have multiple masters
PM / Domains: Add "wait for parent" status for generic PM domains
PM / Domains: Make pm_genpd_poweron() always survive parent removal
PM / Domains: Do not take parent locks to modify subdomain counters
PM / Domains: Implement subdomain counters as atomic fields
* pm-runtime:
PM / Tracing: build rpm-traces.c only if CONFIG_PM_RUNTIME is set
PM / Runtime: Replace dev_dbg() with trace_rpm_*()
PM / Runtime: Introduce trace points for tracing rpm_* functions
PM / Runtime: Don't run callbacks under lock for power.irq_safe set
USB: Add wakeup info to debugging messages
PM / Runtime: pm_runtime_idle() can be called in atomic context
PM / Runtime: Add macro to test for runtime PM events
PM / Runtime: Add might_sleep() to runtime PM functions
Linus Torvalds [Thu, 6 Oct 2011 23:15:10 +0000 (16:15 -0700)]
Merge git://github.com/davem330/net
* git://github.com/davem330/net:
net: fix typos in Documentation/networking/scaling.txt
bridge: leave carrier on for empty bridge
netfilter: Use proper rwlock init function
tcp: properly update lost_cnt_hint during shifting
tcp: properly handle md5sig_pool references
macvlan/macvtap: Fix unicast between macvtap interfaces in bridge mode
Paul Menzel [Wed, 31 Aug 2011 15:07:10 +0000 (17:07 +0200)]
x86/PCI: use host bridge _CRS info on ASUS M2V-MX SE
In summary, this DMI quirk uses the _CRS info by default for the ASUS
M2V-MX SE by turning on `pci=use_crs` and is similar to the quirk
added by commit 2491762cfb47 ("x86/PCI: use host bridge _CRS info on
ASRock ALiveSATA2-GLAN") whose commit message should be read for further
information.
Since commit 3e3da00c01d0 ("x86/pci: AMD one chain system to use pci
read out res") Linux gives the following oops:
Trusting the ACPI _CRS information (`pci=use_crs`) fixes this problem.
$ dmesg | grep -i crs # with the quirk
PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
The match has to be against the DMI board entries though since the vendor entries are not populated.
DMI: System manufacturer System Product Name/M2V-MX SE, BIOS 0304 10/30/2007
This quirk should be removed when `pci=use_crs` is enabled for machines
from 2006 or earlier or some other solution is implemented.
Using coreboot [1] with this board the problem does not exist but this
quirk also does not affect it either. To be safe though the check is
tightened to only take effect when the BIOS from American Megatrends is
used.
15:13 < ruik> but coreboot does not need that
15:13 < ruik> because i have there only one root bus
15:13 < ruik> the audio is behind a bridge
$ sudo dmidecode
BIOS Information
Vendor: American Megatrends Inc.
Version: 0304
Release Date: 10/30/2007
This resolves a regression seen by some users of bridging.
Some users use the bridge like a dummy device.
They expect to be able to put an IPv6 address on the device
with no ports attached. Although there are better ways of doing
this, there is no reason to not allow it.
Note: the bridge still will reflect the state of ports in the
bridge if there are any added.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Gleixner [Wed, 5 Oct 2011 03:24:43 +0000 (03:24 +0000)]
netfilter: Use proper rwlock init function
Replace the open coded initialization with the init function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* git://bedivere.hansenpartnership.com/git/scsi-rc-fixes-2.6:
[SCSI] libsas: fix panic when single phy is disabled on a wide port
[SCSI] qla2xxx: Fix crash in qla2x00_abort_all_cmds() on unload
Yan, Zheng [Sun, 2 Oct 2011 04:21:50 +0000 (04:21 +0000)]
tcp: properly update lost_cnt_hint during shifting
lost_skb_hint is used by tcp_mark_head_lost() to mark the first unhandled skb.
lost_cnt_hint is the number of packets or sacked packets before the lost_skb_hint;
When shifting a skb that is before the lost_skb_hint, if tcp_is_fack() is ture,
the skb has already been counted in the lost_cnt_hint; if tcp_is_fack() is false,
tcp_sacktag_one() will increase the lost_cnt_hint. So tcp_shifted_skb() does not
need to adjust the lost_cnt_hint by itself. When shifting a skb that is equal to
lost_skb_hint, the shifted packets will not be counted by tcp_mark_head_lost().
So tcp_shifted_skb() should adjust the lost_cnt_hint even tcp_is_fack(tp) is true.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_v4_clear_md5_list() assumes that multiple tcp md5sig peers
only hold one reference to md5sig_pool. but tcp_v4_md5_do_add()
increases use count of md5sig_pool for each peer. This patch
makes tcp_v4_md5_do_add() only increases use count for the first
tcp md5sig peer.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ward [Sun, 18 Sep 2011 12:53:20 +0000 (12:53 +0000)]
macvlan/macvtap: Fix unicast between macvtap interfaces in bridge mode
Packets should always be forwarded to the lowerdev using dev_forward_skb.
vlan->forward is for packets being forwarded directly to another macvlan/
macvtap device (used for multicast in bridge mode).
Reported-and-tested-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Pihet [Tue, 4 Oct 2011 19:54:45 +0000 (21:54 +0200)]
PM / QoS: Update Documentation for the pm_qos and dev_pm_qos frameworks
Update the documentation for the recently updated pm_qos API, kernel
and user space. Add documentation for the per-device PM QoS
(dev_pm_qos) framework API.
Signed-off-by: Jean Pihet <j-pihet@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
PM / QoS: Add function dev_pm_qos_read_value() (v3)
To read the current PM QoS value for a given device we need to
make sure that the device's power.constraints object won't be
removed while we're doing that. For this reason, put the
operation under dev->power.lock and acquire the lock
around the initialization and removal of power.constraints.
Moreover, since we're using the value of power.constraints to
determine whether or not the object is present, the
power.constraints_state field isn't necessary any more and may be
removed. However, dev_pm_qos_add_request() needs to check if the
device is being removed from the system before allocating a new
PM QoS constraints object for it, so make it use the
power.power_state field of struct device for this purpose.
Linus Torvalds [Tue, 4 Oct 2011 17:37:06 +0000 (10:37 -0700)]
Merge git://github.com/davem330/net
* git://github.com/davem330/net:
pch_gbe: Fixed the issue on which a network freezes
pch_gbe: Fixed the issue on which PC was frozen when link was downed.
make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
net: xen-netback: correctly restart Tx after a VM restore/migrate
bonding: properly stop queuing work when requested
can bcm: fix incomplete tx_setup fix
RDSRDMA: Fix cleanup of rds_iw_mr_pool
net: Documentation: Fix type of variables
ibmveth: Fix oops on request_irq failure
ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket
cxgb4: Fix EEH on IBM P7IOC
can bcm: fix tx_setup off-by-one errors
MAINTAINERS: tehuti: Alexander Indenbaum's address bounces
dp83640: reduce driver noise
ptp: fix L2 event message recognition
Linus Torvalds [Tue, 4 Oct 2011 16:59:22 +0000 (09:59 -0700)]
Merge branch 'fix/asoc' of git://github.com/tiwai/sound
* 'fix/asoc' of git://github.com/tiwai/sound:
ASoC: omap_mcpdm_remove cannot be __devexit
ASoC: Fix setting update bits for WM8753_LADC and WM8753_RADC
ASoC: use a valid device for dev_err() in Zylonite
Jon Mason [Mon, 3 Oct 2011 14:50:20 +0000 (09:50 -0500)]
PCI: Disable MPS configuration by default
Add the ability to disable PCI-E MPS turning and using the BIOS
configured MPS defaults. Due to the number of issues recently
discovered on some x86 chipsets, make this the default behavior.
Also, add the option for peer to peer DMA MPS configuration. Peer to
peer DMA is outside the scope of this patch, but MPS configuration could
prevent it from working by having the MPS on one root port different
than the MPS on another. To work around this, simply make the system
wide MPS the smallest possible value (128B).
Signed-off-by: Jon Mason <mason@myri.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Deucher [Tue, 4 Oct 2011 14:46:34 +0000 (10:46 -0400)]
drm/radeon/kms: fix channel_remap setup (v2)
Most asics just use the hw default value which requires
no explicit programming. For those that need a different
value, the vbios will program it properly. As such,
there's no need to program these registers explicitly
in the driver. Changing MC_SHARED_CHREMAP requires a reload
of all data in vram otherwise its contents will be scambled.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Cc: stable@kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
We found that adding load, Rx data sometimes drops.(with DMA transfer mode)
The cause is that before starting Rx-DMA processing, Tx-DMA processing starts.
This causes FIFO overrun occurs.
This patch fixes the issue by modifying FIFO tx-threshold and DMA descriptor
size like below.
Current this patch
Rx-descriptor 4Byte+12Byte*341 --> 12Byte*340-4Byte-12Byte
Rx-threshold (Not modified)
Tx-descriptor 4Byte+12Byte*341 --> 16Byte-12Byte*340
Rx-threshold 12Byte --> 2Byte
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
spi-topcliff-pch: add tx-memory clear after complete transmitting
Currently, in case of reading date from SPI flash,
command is sent twice.
The cause is that tx-memory clear processing is missing .
This patch adds the tx-momory clear processing.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Takashi Iwai [Tue, 4 Oct 2011 01:09:14 +0000 (18:09 -0700)]
lis3: fix regression of HP DriveGuard with 8bit chip
Commit 2a7fade7e03 ("hwmon: lis3: Power on corrections") caused a
regression on HP laptops with 8bit chip. Writing CTRL2_BOOT_8B bit seems
clearing the BIOS setup, and no proper interrupt for DriveGuard will be
triggered any more.
Since the init code there is basically only for embedded devices, put a
pdata check so that the problematic initialization will be skipped for
hp_accel stuff.
Borislav Petkov [Mon, 3 Oct 2011 18:28:18 +0000 (14:28 -0400)]
ide-disk: Fix request requeuing
Simon Kirby reported that on his RAID setup with idedisk underneath
the box OOMs after a couple of days of runtime. Running with
CONFIG_DEBUG_KMEMLEAK pointed to idedisk_prep_fn() which unconditionally
allocates an ide_cmd struct. However, ide_requeue_and_plug() can be
called more than once per request, either from the request issue or the
IRQ handler path and do blk_peek_request() ends up in idedisk_prep_fn()
repeatedly, allocating a struct ide_cmd everytime and "forgetting" the
previous pointer.
Make sure the code reuses the old allocated chunk.
pch_gbe: Fixed the issue on which a network freezes
The pch_gbe driver has an issue which a network stops,
when receiving traffic is high.
In the case, The link down and up are necessary to return a network.
This patch fixed this issue.
Signed-off-by: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Fri, 30 Sep 2011 10:38:28 +0000 (10:38 +0000)]
make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
This is a minor change.
Up until kernel 2.6.32, getsockopt(fd, SOL_PACKET, PACKET_STATISTICS,
...) would return total and dropped packets since its last invocation. The
introduction of socket queue overflow reporting [1] changed drop
rate calculation in the normal packet socket path, but not when using a
packet ring. As a result, the getsockopt now returns different statistics
depending on the reception method used. With a ring, it still returns the
count since the last call, as counts are incremented in tpacket_rcv and
reset in getsockopt. Without a ring, it returns 0 if no drops occurred
since the last getsockopt and the total drops over the lifespan of
the socket otherwise. The culprit is this line in packet_rcv, executed
on a drop:
As it shows, the new drop number it taken from the socket drop counter,
which is not reset at getsockopt. I put together a small example
that demonstrates the issue [2]. It runs for 10 seconds and overflows
the queue/ring on every odd second. The reported drop rates are:
ring: 16, 0, 16, 0, 16, ...
non-ring: 0, 15, 0, 30, 0, 46, 0, 60, 0 , 74.
Note how the even ring counts monotonically increase. Because the
getsockopt adds tp_drops to tp_packets, total counts are similarly
reported cumulatively. Long story short, reinstating the original code, as
the below patch does, fixes the issue at the cost of additional per-packet
cycles. Another solution that does not introduce per-packet overhead
is be to keep the current data path, record the value of sk_drops at
getsockopt() at call N in a new field in struct packetsock and subtract
that when reporting at call N+1. I'll be happy to code that, instead,
it's just more messy.
David Vrabel [Fri, 30 Sep 2011 06:37:51 +0000 (06:37 +0000)]
net: xen-netback: correctly restart Tx after a VM restore/migrate
If a VM is saved and restored (or migrated) the netback driver will no
longer process any Tx packets from the frontend. xenvif_up() does not
schedule the processing of any pending Tx requests from the front end
because the carrier is off. Without this initial kick the frontend
just adds Tx requests to the ring without raising an event (until the
ring is full).
This was caused by 47103041e91794acdbc6165da0ae288d844c820b (net:
xen-netback: convert to hw_features) which reordered the calls to
xenvif_up() and netif_carrier_on() in xenvif_connect().
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Fri, 23 Sep 2011 10:53:34 +0000 (10:53 +0000)]
bonding: properly stop queuing work when requested
During a test where a pair of bonding interfaces using ARP monitoring
were both brought up and torn down (with an rmmod) repeatedly, a panic
in the timer code was noticed. I tracked this down and determined that
any of the bonding functions that ran as workqueue handlers and requeued
more work might not properly exit when the module was removed.
There was a flag protected by the bond lock called kill_timers that is
set when the interface goes down or the module is removed, but many of
the functions that monitor link status now unlock the bond lock to take
rtnl first. There is a chance that another CPU running the rmmod could
get the lock and set kill_timers after the first check has passed.
This patch does not allow any function to queue work that will make
itself run unless kill_timers is not set. I also noticed while doing
this work that bond_resend_igmp_join_requests did not have a check for
kill_timers, so I added the needed call there as well.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Reported-by: Liang Zheng <lzheng@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michel Dänzer [Fri, 30 Sep 2011 15:16:53 +0000 (17:16 +0200)]
drm/radeon: Set cursor x/y to 0 when x/yorigin > 0.
Apart from the obvious cleanup, this should make the line
cursor_end = x - xorigin + w;
correct now.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Michel Dänzer [Fri, 30 Sep 2011 15:16:52 +0000 (17:16 +0200)]
drm/radeon: Update AVIVO cursor coordinate origin before x/yorigin calculation.
Fixes cursor disappearing prematurely when moving off a top/left edge which
is not located at the desktop top/left edge.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Cc: stable@kernel.org Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicholas Miell <nmiell@gmail.com> Reviewed-by: Michel Dänzer <michel@daenzer.net> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Mon, 3 Oct 2011 13:13:46 +0000 (09:13 -0400)]
drm/radeon/kms: add retry limits for native DP aux defer
The previous code could potentially loop forever. Limit
the number of DP aux defer retries to 4 for native aux
transactions, same as i2c over aux transactions.
Alex Deucher [Mon, 3 Oct 2011 13:13:45 +0000 (09:13 -0400)]
drm/radeon/kms: fix regression in DP aux defer handling
An incorrect ordering in the error checking code lead
to DP aux defer being skipped in the aux native write
path. Move the bytes transferred check (ret == 0)
below the defer check.
Tracked down by: Brad Campbell <brad@fnarfbargle.com>
The following patch is admittedly a band-aid, and does not solve the
root cause, but it still is a good candidate for hardening as a pointer
check before reference.
Signed-off-by: Mark Salyzyn <mark_salyzyn@us.xyratex.com> Tested-by: Jack Wang <jack_wang@usish.com> Cc: stable@kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Roland Dreier [Thu, 22 Sep 2011 07:06:05 +0000 (00:06 -0700)]
[SCSI] qla2xxx: Fix crash in qla2x00_abort_all_cmds() on unload
I hit a crash in qla2x00_abort_all_cmds() if the qla2xxx module is
unloaded right after it is loaded. I debugged this down to the abort
handling improperly treating a command of type SRB_ADISC_CMD as if it
had a bsg_job to complete when that command actually uses the iocb_cmd
part of the union. (I guess to hit this one has to unload the module
while the async FC initialization is still in progress)
It seems we should only look for a bsg_job if type is SRB_ELS_CMD_RPT,
SRB_ELS_CMD_HST or SRB_CT_CMD, so switch the test to make that explicit.
Signed-off-by: Roland Dreier <roland@purestorage.com> Acked-by: Chad Dupuis <chad.dupuis@qlogic.com> Cc: stable@kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>
MyungJoo Ham [Sat, 1 Oct 2011 22:19:34 +0000 (00:19 +0200)]
PM / devfreq: Add basic governors
Four cpufreq-like governors are provided as examples.
powersave: use the lowest frequency possible. The user (device) should
set the polling_ms as 0 because polling is useless for this governor.
performance: use the highest freqeuncy possible. The user (device)
should set the polling_ms as 0 because polling is useless for this
governor.
userspace: use the user specified frequency stored at
devfreq.user_set_freq. With sysfs support in the following patch, a user
may set the value with the sysfs interface.
simple_ondemand: simplified version of cpufreq's ondemand governor.
When a user updates OPP entries (enable/disable/add), OPP framework
automatically notifies devfreq to update operating frequency
accordingly. Thus, devfreq users (device drivers) do not need to update
devfreq manually with OPP entry updates or set polling_ms for powersave
, performance, userspace, or any other "static" governors.
Note that these are given only as basic examples for governors and any
devices with devfreq may implement their own governors with the drivers
and use them.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mike Turquette <mturquette@ti.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
MyungJoo Ham [Sat, 1 Oct 2011 22:19:28 +0000 (00:19 +0200)]
PM / devfreq: Add common sysfs interfaces
Device specific sysfs interface /sys/devices/.../power/devfreq_*
- governor R: name of governor
- cur_freq R: current frequency
- polling_interval R: polling interval in ms given with devfreq profile
W: update polling interval.
- central_polling R: 1 if polling is managed by devfreq framework
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mike Turquette <mturquette@ti.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
--
Documentation/ABI/testing/sysfs-class-devfreq | 44 ++++++++++++++++
drivers/devfreq/devfreq.c | 69 ++++++++++++++++++++++++++
2 files changed, 113 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-class-devfreq
MyungJoo Ham [Sat, 1 Oct 2011 22:19:15 +0000 (00:19 +0200)]
PM: Introduce devfreq: generic DVFS framework with device-specific OPPs
With OPPs, a device may have multiple operable frequency and voltage
sets. However, there can be multiple possible operable sets and a system
will need to choose one from them. In order to reduce the power
consumption (by reducing frequency and voltage) without affecting the
performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
scheme may be used.
This patch introduces the DVFS capability to non-CPU devices with OPPs.
DVFS is a techique whereby the frequency and supplied voltage of a
device is adjusted on-the-fly. DVFS usually sets the frequency as low
as possible with given conditions (such as QoS assurance) and adjusts
voltage according to the chosen frequency in order to reduce power
consumption and heat dissipation.
The generic DVFS for devices, devfreq, may appear quite similar with
/drivers/cpufreq. However, cpufreq does not allow to have multiple
devices registered and is not suitable to have multiple heterogenous
devices with different (but simple) governors.
Normally, DVFS mechanism controls frequency based on the demand for
the device, and then, chooses voltage based on the chosen frequency.
devfreq also controls the frequency based on the governor's frequency
recommendation and let OPP pick up the pair of frequency and voltage
based on the recommended frequency. Then, the chosen OPP is passed to
device driver's "target" callback.
When PM QoS is going to be used with the devfreq device, the device
driver should enable OPPs that are appropriate with the current PM QoS
requests. In order to do so, the device driver may call opp_enable and
opp_disable at the notifier callback of PM QoS so that PM QoS's
update_target() call enables the appropriate OPPs. Note that at least
one of OPPs should be enabled at any time; be careful when there is a
transition.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mike Turquette <mturquette@ti.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Linus Torvalds [Sat, 1 Oct 2011 15:37:25 +0000 (08:37 -0700)]
Merge branches 'irq-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip
* 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
irq: Fix check for already initialized irq_domain in irq_domain_add
irq: Add declaration of irq_domain_simple_ops to irqdomain.h
* 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
x86/rtc: Don't recursively acquire rtc_lock
* 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
posix-cpu-timers: Cure SMP wobbles
sched: Fix up wchan borkage
sched/rt: Migrate equal priority tasks to available CPUs
MyungJoo Ham [Fri, 30 Sep 2011 20:35:12 +0000 (22:35 +0200)]
PM / OPP: Add OPP availability change notifier.
The patch enables to register notifier_block for an OPP-device in order
to get notified for any changes in the availability of OPPs of the
device. For example, if a new OPP is inserted or enable/disable status
of an OPP is changed, the notifier is executed.
This enables the usage of opp_add, opp_enable, and opp_disable to
directly take effect with any connected entities such as cpufreq or
devfreq.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mike Turquette <mturquette@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Josef Bacik [Fri, 30 Sep 2011 19:23:54 +0000 (15:23 -0400)]
Btrfs: force a page fault if we have a shorty copy on a page boundary
A user reported a problem where ceph was getting into 100% cpu usage while doing
some writing. It turns out it's because we were doing a short write on a not
uptodate page, which means we'd fall back at one page at a time and fault the
page in. The problem is our position is on the page boundary, so our fault in
logic wasn't actually reading the page, so we'd just spin forever or until the
page got read in by somebody else. This will force a readpage if we end up
doing a short copy. Alexandre could reproduce this easily with ceph and reports
it fixes his problem. I also wrote a reproducer that no longer hangs my box
with this patch. Thanks,
Reported-and-tested-by: Alexandre Oliva <aoliva@redhat.com> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Peter Zijlstra [Thu, 1 Sep 2011 10:42:04 +0000 (12:42 +0200)]
posix-cpu-timers: Cure SMP wobbles
David reported:
Attached below is a watered-down version of rt/tst-cpuclock2.c from
GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or
similar.
Run it several times, and you will see cases where the main thread
will measure a process clock difference before and after the nanosleep
which is smaller than the cpu-burner thread's individual thread clock
difference. This doesn't make any sense since the cpu-burner thread
is part of the top-level process's thread group.
I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
64-bit binaries).
The diff of 'process' should always be >= the diff of 'thread'.
I make sure to wrap the 'thread' clock measurements the most tightly
around the nanosleep() call, and that the 'process' clock measurements
are the outer-most ones.
This is due to us using p->se.sum_exec_runtime in
thread_group_cputime() where we iterate the thread group and sum all
data. This does not take time since the last schedule operation (tick
or otherwise) into account. We can cure this by using
task_sched_runtime() at the cost of having to take locks.
This also means we can (and must) do away with
thread_group_sched_runtime() since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().
Aside of that it makes the function safe on 32 bit systems. The old
code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
64bit value and could be changed on another cpu at the same time.
Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins Tested-by: David Miller <davem@davemloft.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ALSA: hda - Fix a regression of the position-buffer check
The commit a810364a0424c297242c6c66071a42f7675a5568
ALSA: hda - Handle -1 as invalid position, too
caused a regression on some machines that require the position-buffer
instead of LPIB, e.g. resulting in noises with mic recording with
PulseAudio.
This patch fixes the detection by delaying the test at the timing as
same as 3.0, i.e. doing the position check only when requested in
azx_position_ok().
Ram Pai [Thu, 22 Sep 2011 07:48:58 +0000 (15:48 +0800)]
Resource: fix wrong resource window calculation
__find_resource() incorrectly returns a resource window which overlaps
an existing allocated window. This happens when the parent's
resource-window spans 0x00000000 to 0xffffffff and is entirely allocated
to all its children resource-windows.
__find_resource() looks for gaps in resource allocation among the
children resource windows. When it encounters the last child window it
blindly tries the range next to one allocated to the last child. Since
the last child's window ends at 0xffffffff the calculation overflows,
leading the algorithm to believe that any window in the range 0x0000000
to 0xfffffff is available for allocation. This leads to a conflicting
window allocation.
Michal Ludvig reported this issue seen on his platform. The following
patch fixes the problem and has been verified by Michal. I believe this
bug has been there for ages. It got exposed by git commit 2bbc6942273b
("PCI : ability to relocate assigned pci-resources")
Signed-off-by: Ram Pai <linuxram@us.ibm.com> Tested-by: Michal Ludvig <mludvig@logix.net.nz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'v4l_for_linus' of git://linuxtv.org/mchehab/for_linus
* 'v4l_for_linus' of git://linuxtv.org/mchehab/for_linus:
[media] omap3isp: Fix build error in ispccdc.c
[media] uvcvideo: Fix crash when linking entities
[media] v4l: Make sure we hold a reference to the v4l2_device before using it
[media] v4l: Fix use-after-free case in v4l2_device_release
[media] uvcvideo: Set alternate setting 0 on resume if the bus has been reset
[media] OMAP_VOUT: Fix build break caused by update_mode removal in DSS2
Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] cio: fix cio_tpi ignoring adapter interrupts
[S390] gmap: always up mmap_sem properly
[S390] Do not clobber personality flags on exec
* git://github.com/davem330/sparc:
sparc64: Force the execute bit in OpenFirmware's translation entries.
sparc: Make '-p' boot option meaningful again.
sparc, exec: remove redundant addr_limit assignment
sparc64: Future proof Niagara cpu detection.
Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~keithp/linux
* 'drm-intel-fixes' of git://people.freedesktop.org/~keithp/linux:
drm/i915: FBC off for ironlake and older, otherwise on by default
drm/i915: Enable SDVO hotplug interrupts for HDMI and DVI
drm/i915: Enable dither whenever display bpc < frame buffer bpc
powerpc: Fix device-tree matching for Apple U4 bridge
Apple Quad G5 has some oddity in it's device-tree which causes the new
generic matching code to fail to relate nodes for PCI-E devices below U4
with their respective struct pci_dev. This breaks graphics on those
machines among others.
This fixes it using a quirk which copies the node pointer from the host
bridge for the root complex, which makes the generic code work for the
children afterward.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
bootup: move 'usermodehelper_enable()' a little earlier
Commit d5767c53535a ("bootup: move 'usermodehelper_enable()' to the end
of do_basic_setup()") moved 'usermodehelper_enable()' to end of
do_basic_setup() to after the initcalls. But then I get failed to let
uvesafb work on my computer, and lose the splash boot.
So maybe we could start usermodehelper_enable a little early to make
some task work that need eary init with the help of user mode.
[ I would *really* prefer that initcalls not call into user space - even
the real 'init' hasn't been execve'd yet, after all! But for uvesafb
it really does look like we don't have much choice.
I considered doing this when we mount the root filesystem, but
depending on config options that is in multiple places. We could do
the usermode helper enable as a rootfs_initcall()..
So I'm just using wang yanqing's trivial patch. It's not wonderful,
but it's simple and should work. We should revisit this some day,
though. - Linus ]
Oliver Hartkopp [Thu, 29 Sep 2011 19:33:47 +0000 (15:33 -0400)]
can bcm: fix incomplete tx_setup fix
The commit aabdcb0b553b9c9547b1a506b34d55a764745870 ("can bcm: fix tx_setup
off-by-one errors") fixed only a part of the original problem reported by
Andre Naujoks. It turned out that the original code needed to be re-ordered
to reduce complexity and to finally fix the reported frame counting issues.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Olsa [Thu, 29 Sep 2011 15:05:08 +0000 (17:05 +0200)]
perf tools: Fix raw sample reading
Wrong pointer is being passed for raw data sanity checking, when parsing
sample event.
This ends up with invalid event and perf record being stuck in
__perf_session__process_events function during processing build IDs
(process_buildids function).
Following command hangs up in my setup:
./perf record -e raw_syscalls:sys_enter ls
The fix is to use proper pointer to the raw data instead of the 'u'
union.
Reviewed-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1317308709-9474-2-git-send-email-jolsa@redhat.com Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David S. Miller [Thu, 29 Sep 2011 19:18:59 +0000 (12:18 -0700)]
sparc64: Force the execute bit in OpenFirmware's translation entries.
In the OF 'translations' property, the template TTEs in the mappings
never specify the executable bit. This is the case even though some
of these mappings are for OF's code segment.
Therefore, we need to force the execute bit on in every mapping.
This problem can only really trigger on Niagara/sun4v machines and the
history behind this is a little complicated.
Previous to sun4v, the sun4u TTE entries lacked a hardware execute
permission bit. So OF didn't have to ever worry about setting
anything to handle executable pages. Any valid TTE loaded into the
I-TLB would be respected by the chip.
But sun4v Niagara chips have a real hardware enforced executable bit
in their TTEs. So it has to be set or else the I-TLB throws an
instruction access exception with type code 6 (protection violation).
We've been extremely fortunate to not get bitten by this in the past.
The best I can tell is that the OF's mappings for it's executable code
were mapped using permanent locked mappings on sun4v in the past.
Therefore, the fact that we didn't have the exec bit set in the OF
translations we would use did not matter in practice.
Thanks to Greg Onufer for helping me track this down.
Signed-off-by: David S. Miller <davem@davemloft.net>
In the rds_iw_mr_pool struct the free_pinned field keeps track of
memory pinned by free MRs. While this field is incremented properly
upon allocation, it is never decremented upon unmapping. This would
cause the rds_rdma module to crash the kernel upon unloading, by
triggering the BUG_ON in the rds_iw_destroy_mr_pool function.
This change keeps track of the MRs that become unpinned, so that
free_pinned can be decremented appropriately.
Signed-off-by: Jonathan Lallinger <jonathan@ogc.us> Signed-off-by: Steve Wise <swise@ogc.us> Signed-off-by: David S. Miller <davem@davemloft.net>
Brian King [Wed, 28 Sep 2011 05:33:43 +0000 (05:33 +0000)]
ibmveth: Fix oops on request_irq failure
If request_irq fails, the ibmveth driver will overwrite
the rc and end up returning a successful rc on its open
function, resulting in an oops later when a packet gets
sent and buffers are not allocated due to the failed open.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Hartkopp [Fri, 23 Sep 2011 08:23:47 +0000 (08:23 +0000)]
can bcm: fix tx_setup off-by-one errors
This patch fixes two off-by-one errors that canceled each other out.
Checking for the same condition two times in bcm_tx_timeout_tsklet() reduced
the count of frames to be sent by one. This did not show up the first time
tx_setup is invoked as an additional frame is sent due to TX_ANNONCE.
Invoking a second tx_setup on the same item led to a reduced (by 1) number of
sent frames.
Reported-by: Andre Naujoks <nautsch@gmail.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Wed, 21 Sep 2011 22:08:26 +0000 (22:08 +0000)]
MAINTAINERS: tehuti: Alexander Indenbaum's address bounces
I got:
Generating server: Tehuti.onmicrosoft.com
baum@tehutinetworks.net
#< #5.1.1 smtp;550 5.1.1 RESOLVER.ADR.RecipNotFound; not found> #SMTP#
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Alexander Indenbaum <baum@tehutinetworks.net> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Tue, 20 Sep 2011 01:25:42 +0000 (01:25 +0000)]
dp83640: reduce driver noise
The driver has two warning messages that might be triggered
by normal use cases. When they appear, the messages give the
impression of a never ending series of errors.
This commit changes them to debug messages instead.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Tue, 20 Sep 2011 01:25:41 +0000 (01:25 +0000)]
ptp: fix L2 event message recognition
The IEEE 1588 standard defines two kinds of messages, event and general
messages. Event messages require time stamping, and general do not. When
using UDP transport, two separate ports are used for the two message
types.
The BPF designed to recognize event messages incorrectly classifies L2
general messages as event messages. This commit fixes the issue by
extending the filter to check the message type field for L2 PTP packets.
Event messages are be distinguished from general messages by testing
the "general" bit.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Cc: <stable@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
bootup: move 'usermodehelper_enable()' to the end of do_basic_setup()
Doing it just before starting to call into cpu_idle() made a sick kind
of sense only because the original bug we fixed (see commit 288d5abec831: "Boot up with usermodehelper disabled") was about problems
with some scheduler data structures not being initialized, and they had
better be initialized at that point.
But it really didn't make any other conceptual sense, and doing it after
the initial "schedule()" call for the idle thread actually opened up a
race: what if the main initialization thread did everything without
needing to sleep, and got all the way into user land too? Without
actually having scheduled back to the idle thread?
Now, in normal circumstances that doesn't ever happen, but it looks like
Richard Cochran triggered exactly that on his ARM IXP4xx machines:
"I have some ARM IXP4xx based machines that use the two on chip MAC
ports (aka NPEs). The NPE needs a firmware in order to function.
Ever since the following commit [that 288d5abec831 one], it is no
longer possible to bring up the interfaces during the init scripts."
with a call trace showing an ioctl coming from user space. Richard says:
"The init is busybox, and the startup script does mount, syslogd, and
then ifup, so that all can go by quickly."
The fix is to move the usermodehelper_enable() into the main 'init'
thread, and just put it after we've done all our initcalls. By then,
everything really should be up, but we've obviously not actually started
the user-mode portion of init yet.
Reported-and-tested-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sage Weil [Wed, 28 Sep 2011 17:11:04 +0000 (10:11 -0700)]
libceph: fix pg_temp mapping update
The incremental map updates have a record for each pg_temp mapping that is
to be add/updated (len > 0) or removed (len == 0). The old code was
written as if the updates were a complete enumeration; that was just wrong.
Update the code to remove 0-length entries and drop the rbtree traversal.
This avoids misdirected (and hung) requests that manifest as server
errors like
[WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11