Linus Torvalds [Sun, 20 Oct 2013 22:20:45 +0000 (23:20 +0100)]
Merge branch 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parsic fixes from Helge Deller:
"There are just two small fixes in here:
- Revert a commit which exported the flush_cache_page function. This
was noticed by Christoph Hellwig.
- Enable the DEVTMPFS, DEVTMPFS_MOUNT and BLK_DEV_INITRD config
options in the parisc defconfigs so that latest udev/initrd finds
the root disk at boot"
* 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: enable DEVTMPFS, DEVTMPFS_MOUNT and BLK_DEV_INITRD in defconfigs
Revert "parisc: Export flush_cache_page() (needed by lustre)"
Jiri Pirko [Sat, 19 Oct 2013 10:29:17 +0000 (12:29 +0200)]
ip_output: do skb ufo init for peeked non ufo skb as well
Now, if user application does:
sendto len<mtu flag MSG_MORE
sendto len>mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sat, 19 Oct 2013 10:29:16 +0000 (12:29 +0200)]
ip6_output: do skb ufo init for peeked non ufo skb as well
Now, if user application does:
sendto len<mtu flag MSG_MORE
sendto len>mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sat, 19 Oct 2013 10:29:15 +0000 (12:29 +0200)]
udp6: respect IPV6_DONTFRAG sockopt in case there are pending frames
if up->pending != 0 dontfrag is left with default value -1. That
causes that application that do:
sendto len>mtu flag MSG_MORE
sendto len>mtu flag 0
will receive EMSGSIZE errno as the result of the second sendto.
This patch fixes it by respecting IPV6_DONTFRAG socket option.
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Seif Mazareeb [Fri, 18 Oct 2013 03:33:21 +0000 (20:33 -0700)]
net: fix cipso packet validation when !NETLABEL
When CONFIG_NETLABEL is disabled, the cipso_v4_validate() function could loop
forever in the main loop if opt[opt_iter +1] == 0, this will causing a kernel
crash in an SMP system, since the CPU executing this function will
stall /not respond to IPIs.
This problem can be reproduced by running the IP Stack Integrity Checker
(http://isic.sourceforge.net) using the following command on a Linux machine
connected to DUT:
Daniel Borkmann [Thu, 17 Oct 2013 20:51:31 +0000 (22:51 +0200)]
net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race
In the case of credentials passing in unix stream sockets (dgram
sockets seem not affected), we get a rather sparse race after
commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default").
We have a stream server on receiver side that requests credential
passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED
on each spawned/accepted socket on server side to 1 first (as it's
not inherited), it can happen that in the time between accept() and
setsockopt() we get interrupted, the sender is being scheduled and
continues with passing data to our receiver. At that time SO_PASSCRED
is neither set on sender nor receiver side, hence in cmsg's
SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534
(== overflow{u,g}id) instead of what we actually would like to see.
On the sender side, here nc -U, the tests in maybe_add_creds()
invoked through unix_stream_sendmsg() would fail, as at that exact
time, as mentioned, the sender has neither SO_PASSCRED on his side
nor sees it on the server side, and we have a valid 'other' socket
in place. Thus, sender believes it would just look like a normal
connection, not needing/requesting SO_PASSCRED at that time.
As reverting 16e5726 would not be an option due to the significant
performance regression reported when having creds always passed,
one way/trade-off to prevent that would be to set SO_PASSCRED on
the listener socket and allow inheriting these flags to the spawned
socket on server side in accept(). It seems also logical to do so
if we'd tell the listener socket to pass those flags onwards, and
would fix the race.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Helge Deller [Mon, 14 Oct 2013 20:55:36 +0000 (22:55 +0200)]
parisc: enable DEVTMPFS, DEVTMPFS_MOUNT and BLK_DEV_INITRD in defconfigs
Latest udev requires that DEVTMPFS and DEVTMPFS_MOUNT are enabled, else
initrd will fail to find root filesystem. Enable missing BLK_DEV_INITRD
for B180 and C3000 machines.
Christoph Hellwig <hch@infradead.org> commented:
This one shouldn't go in - Geert sent it a bit prematurely, as Lustre
shouldn't use it just to reimplement core VM functionality (which it
shouldn't use either, but that's a separate story).
Linus Torvalds [Fri, 18 Oct 2013 23:46:21 +0000 (16:46 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fix from Chris Mason:
"Sage hit a deadlock with ceph on btrfs, and Josef tracked it down to a
regression in our initial rc1 pull. When doing nocow writes we were
sometimes starting a transaction with locks held"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: release path before starting transaction in can_nocow_extent
Linus Torvalds [Fri, 18 Oct 2013 21:26:51 +0000 (14:26 -0700)]
Merge tag 'pm+acpi-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
- intel_pstate fix for misbehavior after system resume if sysfs
attributes are set in a specific way before the corresponding suspend
from Dirk Brandewie.
- A recent intel_pstate fix has no effect if unsigned long is 32-bit,
so fix it up to cover that case as well.
- The s3c64xx cpufreq driver was not updated when the index field of
struct cpufreq_frequency_table was replaced with driver_data, so
update it now. From Charles Keepax.
- The Kconfig help text for ACPI_BUTTON still refers to
/proc/acpi/event that has been dropped recently, so modify it to
remove that reference. From Krzysztof Mazur.
- A Lan Tianyu's change adds a missing mutex unlock to an error code
path in acpi_resume_power_resources().
- Some code related to ACPI power resources, whose very purpose is
questionable to put it lightly, turns out to cause problems to happen
during testing on real systems, so remove it completely (we may
revisit that in the future if there's a compelling enough reason).
From Rafael J Wysocki and Aaron Lu.
* tag 'pm+acpi-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / PM: Drop two functions that are not used any more
ATA / ACPI: remove power dependent device handling
cpufreq: s3c64xx: Rename index to driver_data
ACPI / power: Drop automaitc resume of power resource dependent devices
intel_pstate: Fix type mismatch warning
cpufreq / intel_pstate: Fix max_perf_pct on resume
ACPI: remove /proc/acpi/event from ACPI_BUTTON help
ACPI / power: Release resource_lock after acpi_power_get_state() return error
Himanshu Madhani [Thu, 17 Oct 2013 22:26:38 +0000 (18:26 -0400)]
qlcnic: Validate Tx queue only for 82xx adapters.
o validate Tx queue only in case of adapters which supports
multi Tx queue.
This patch is to fix regression introduced in commit aa4a1f7df7cbb98797c9f4edfde3c726e2b3841f
"qlcnic: Enable Tx queue changes using ethtool for 82xx Series adapter"
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Thu, 17 Oct 2013 06:17:14 +0000 (11:47 +0530)]
be2net: pass if_id for v1 and V2 versions of TX_CREATE cmd
It is a required field for all TX_CREATE cmd versions > 0.
This fixes a driver initialization failure, caused by recent SH-R Firmwares
(versions > 10.0.639.0) failing the TX_CREATE cmd when if_id field is
not passed.
Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Salva Peiró [Wed, 16 Oct 2013 10:46:50 +0000 (12:46 +0200)]
wanxl: fix info leak in ioctl
The wanxl_ioctl() code fails to initialize the two padding bytes of
struct sync_serial_settings after the ->loopback member. Add an explicit
memset(0) before filling the structure to avoid the info leak.
Signed-off-by: Salva Peiró <speiro@ai2.upv.es> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Oct 2013 20:03:03 +0000 (16:03 -0400)]
Merge branch 'bridge_pvid'
Toshiaki Makita says:
====================
bridge: Fix problems around the PVID
There seem to be some undesirable behaviors related with PVID.
1. It has no effect assigning PVID to a port. PVID cannot be applied
to any frame regardless of whether we set it or not.
2. FDB entries learned via frames applied PVID are registered with
VID 0 rather than VID value of PVID.
3. We can set 0 or 4095 as a PVID that are not allowed in IEEE 802.1Q.
This leads interoperational problems such as sending frames with VID
4095, which is not allowed in IEEE 802.1Q, and treating frames with VID
0 as they belong to VLAN 0, which is expected to be handled as they have
no VID according to IEEE 802.1Q.
Note: 2nd and 3rd problems are potential and not exposed unless 1st problem
is fixed, because we cannot activate PVID due to it.
This is my analysis for each behavior.
1. We are using VLAN_TAG_PRESENT bit when getting PVID, and not when
adding/deleting PVID.
It can be fixed in either way using or not using VLAN_TAG_PRESENT,
but I think the latter is slightly more efficient.
2. We are setting skb->vlan_tci with the value of PVID but the variable
vid, which is used in FDB later, is set to 0 at br_allowed_ingress()
when untagged frames arrive at a port with PVID valid. I'm afraid that
vid should be updated to the value of PVID if PVID is valid.
3. According to IEEE 802.1Q-2011 (6.9.1 and Table 9-2), we cannot use
VID 0 or 4095 as a PVID.
It looks like that there are more stuff to consider.
- VID 0:
VID 0 shall not be configured in any FDB entry and used in a tag header
to indicate it is a 802.1p priority-tagged frame.
Priority-tagged frames should be applied PVID (from IEEE 802.1Q 6.9.1).
In my opinion, since we can filter incomming priority-tagged frames by
deleting PVID, we don't need to filter them by vlan_bitmap.
In other words, priority-tagged frames don't have VID 0 but have no VID,
which is the same as untagged frames, and should be filtered by unsetting
PVID.
So, not only we cannot set PVID as 0, but also we don't need to add 0 to
vlan_bitmap, which enables us to simply forbid to add vlan 0.
- VID 4095:
VID 4095 shall not be transmitted in a tag header. This VID value may be
used to indicate a wildcard match for the VID in management operations or
FDB entries (from IEEE 802.1Q Table 9-2).
In current implementation, we can create a static FDB entry with all
existing VIDs by not specifying any VID when creating it.
I don't think this way to add wildcard-like entries needs to change,
and VID 4095 looks no use and can be unacceptable to add.
Consequently, I believe what we should do for 3rd problem is below:
- Not allowing VID 0 and 4095 to be added.
- Applying PVID to priority-tagged (VID 0) frames.
Note: It has been descovered that another problem related to priority-tags
remains. If we use vlan 0 interface such as eth0.0, we cannot communicate
with another end station via a linux bridge.
This problem exists regardless of whether this patch set is applied or not
because we might receive untagged frames from another end station even if we
are sending priority-tagged frames.
This issue will be addressed by another patch set introducing an additional
egress policy, on which Vlad Yasevich is working.
See http://marc.info/?t=137880893800001&r=1&w=2 for detailed discussion.
Patch set follows this mail.
The order of patches is not the same as described above, because the way
to fix 1st problem is based on the assumption that we don't use VID 0 as
a PVID, which is realized by fixing 3rd problem.
(1/4)(2/4): Fix 3rd problem.
(3/4): Fix 1st problem.
(4/4): Fix 2nd probelm.
v2:
- Add descriptions about the problem related to priority-tags in cover letter.
- Revise patch comments to reference the newest spec.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Wed, 16 Oct 2013 08:07:16 +0000 (17:07 +0900)]
bridge: Fix updating FDB entries when the PVID is applied
We currently set the value that variable vid is pointing, which will be
used in FDB later, to 0 at br_allowed_ingress() when we receive untagged
or priority-tagged frames, even though the PVID is valid.
This leads to FDB updates in such a wrong way that they are learned with
VID 0.
Update the value to that of PVID if the PVID is applied.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Wed, 16 Oct 2013 08:07:15 +0000 (17:07 +0900)]
bridge: Fix the way the PVID is referenced
We are using the VLAN_TAG_PRESENT bit to detect whether the PVID is
set or not at br_get_pvid(), while we don't care about the bit in
adding/deleting the PVID, which makes it impossible to forward any
incomming untagged frame with vlan_filtering enabled.
Since vid 0 cannot be used for the PVID, we can use vid 0 to indicate
that the PVID is not set, which is slightly more efficient than using
the VLAN_TAG_PRESENT.
Fix the problem by getting rid of using the VLAN_TAG_PRESENT.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Wed, 16 Oct 2013 08:07:14 +0000 (17:07 +0900)]
bridge: Apply the PVID to priority-tagged frames
IEEE 802.1Q says that when we receive priority-tagged (VID 0) frames
use the PVID for the port as its VID.
(See IEEE 802.1Q-2011 6.9.1 and Table 9-2)
Apply the PVID to not only untagged frames but also priority-tagged frames.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Wed, 16 Oct 2013 08:07:13 +0000 (17:07 +0900)]
bridge: Don't use VID 0 and 4095 in vlan filtering
IEEE 802.1Q says that:
- VID 0 shall not be configured as a PVID, or configured in any Filtering
Database entry.
- VID 4095 shall not be configured as a PVID, or transmitted in a tag
header. This VID value may be used to indicate a wildcard match for the VID
in management operations or Filtering Database entries.
(See IEEE 802.1Q-2011 6.9.1 and Table 9-2)
Don't accept adding these VIDs in the vlan_filtering implementation.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Polling TX statuses too frequently has two negative effects. First is
randomly peek CPU usage, causing overall system functioning delays.
Second bad effect is that device is not able to fill TX statuses in
H/W register on some workloads and we get lot of timeouts like below:
ieee80211 phy4: rt2800usb_entry_txstatus_timeout: Warning - TX status timeout for entry 7 in queue 2
ieee80211 phy4: rt2800usb_entry_txstatus_timeout: Warning - TX status timeout for entry 7 in queue 2
ieee80211 phy4: rt2800usb_txdone: Warning - Got TX status for an empty queue 2, dropping
This not only cause flood of messages in dmesg, but also bad throughput,
since rate scaling algorithm can not work optimally.
In the future, we should probably make polling interval be adjusted
automatically, but for now just increase values, this make mentioned
problems gone.
Cc: stable@vger.kernel.org Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl> Acked-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
ath5k_tx_frame_completed saves the intended per-rate retry counts before
they are cleared by ieee80211_tx_info_clear_status, however at this
point the information in info->status.rates is incomplete.
This causes significant throughput degradation and excessive packet loss
on links where high bit rates don't work properly.
Move the copy from bf->rates a few lines up to ensure that the saved
retry counts are updated, and that they are really cleared in
info->status.rates after the call to ieee80211_tx_info_clear_status.
Cc: stable@vger.kernel.org # 3.10+ Cc: Thomas Huehn <thomas@net.t-labs.tu-berlin.de> Cc: Benjamin Vahl <bvahl@net.t-labs.tu-berlin.de> Reported-by: Ben West <ben@gowasabi.net> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since we set IEEE80211_HW_QUEUE_CONTROL, we can let
mac80211 do the queue assignement and don't need to
override its decisions.
While reassiging the same values is harmless of course,
it triggered a WARNING when iwlwifi and mac80211 came
to different conclusions. This happened when mac80211 set
IEEE80211_TX_CTL_SEND_AFTER_DTIM, but didn't route the
packet to the cab_queue because no stations were asleep.
iwlwifi should not override mac80211's decicions for
offchannel packets and packets to be sent after DTIM,
but it should override mac80211's decision for AMPDUs
since we have a special queue for them. So for AMPDU,
we still override info->hw_queue by the AMPDU queue.
* acpi-fixes:
ACPI / PM: Drop two functions that are not used any more
ATA / ACPI: remove power dependent device handling
ACPI / power: Drop automaitc resume of power resource dependent devices
ACPI: remove /proc/acpi/event from ACPI_BUTTON help
ACPI / power: Release resource_lock after acpi_power_get_state() return error
In ipcomp_compress(), sortirq is enabled too early, allowing the
per-cpu scratch buffer to be rewritten by ipcomp_decompress()
(called on the same CPU in softirq context) between populating
the buffer and copying the compressed data to the skb.
v2: as pointed out by Steffen Klassert, if we also move the
local_bh_disable() before reading the per-cpu pointers, we can
get rid of get_cpu()/put_cpu().
v3: removed ipcomp_decompress part (as explained by Herbert Xu,
it cannot be called from process context), get rid of cpu
variable (thanks to Eric Dumazet)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Linus Torvalds [Fri, 18 Oct 2013 01:49:21 +0000 (18:49 -0700)]
Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
"Five small cifs fixes (includes fixes for: unmount hang, 2 security
related, symlink, large file writes)"
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
cifs: ntstatus_to_dos_map[] is not terminated
cifs: Allow LANMAN auth method for servers supporting unencapsulated authentication methods
cifs: Fix inability to write files >2GB to SMB2/3 shares
cifs: Avoid umount hangs with smb2 when server is unresponsive
do not treat non-symlink reparse points as valid symlinks
Nicolas Ferre [Thu, 17 Oct 2013 15:37:11 +0000 (17:37 +0200)]
tty/serial: at91: fix uart/usart selection for older products
Since commit 055560b04a8cd063aea916fd083b7aec02c2adb8 (serial: at91:
distinguish usart and uart) the older products which do not have a
name field in their register map are unable to use their serial output.
As the main console output is usually the serial interface (aka DBGU) it
is pretty unfortunate.
So, instead of failing during probe() we just silently configure the serial
peripheral as an uart. It allows us to use these serial outputs.
The proper solution is proposed in another patch.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is possible to set bridge_forward_delay to be higher then
permitted maximum when STP is off. When turning STP on, the
higher then allowed delay has to be clamed down to max value.
CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Oct 2013 19:24:54 +0000 (12:24 -0700)]
tcp: remove the sk_can_gso() check from tcp_set_skb_tso_segs()
sk_can_gso() should only be used as a hint in tcp_sendmsg() to build GSO
packets in the first place. (As a performance hint)
Once we have GSO packets in write queue, we can not decide they are no
longer GSO only because flow now uses a route which doesn't handle
TSO/GSO.
Core networking stack handles the case very well for us, all we need
is keeping track of packet counts in MSS terms, regardless of
segmentation done later (in GSO or hardware)
Right now, if tcp_fragment() splits a GSO packet in two parts,
@left and @right, and route changed through a non GSO device,
both @left and @right have pcount set to 1, which is wrong,
and leads to incorrect packet_count tracking.
This problem was added in commit d5ac99a648 ("[TCP]: skb pcount with MTU
discovery")
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Oct 2013 18:54:30 +0000 (11:54 -0700)]
tcp: must unclone packets before mangling them
TCP stack should make sure it owns skbs before mangling them.
We had various crashes using bnx2x, and it turned out gso_size
was cleared right before bnx2x driver was populating TC descriptor
of the _previous_ packet send. TCP stack can sometime retransmit
packets that are still in Qdisc.
Of course we could make bnx2x driver more robust (using
ACCESS_ONCE(shinfo->gso_size) for example), but the bug is TCP stack.
We have identified two points where skb_unclone() was needed.
This patch adds a WARN_ON_ONCE() to warn us if we missed another
fix of this kind.
Kudos to Neal for finding the root cause of this bug. Its visible
using small MSS.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Enrico Mioso [Tue, 15 Oct 2013 13:06:48 +0000 (15:06 +0200)]
net: qmi_wwan: Olivetti Olicard 200 support
This is a QMI device, manufactured by TCT Mobile Phones.
A companion patch blacklisting this device's QMI interface in the option.c
driver has been sent.
Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com> Signed-off-by: Antonella Pellizzari <anto.pellizzari83@gmail.com> Tested-by: Dan Williams <dcbw@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Tue, 15 Oct 2013 03:18:59 +0000 (11:18 +0800)]
virtio-net: refill only when device is up during setting queues
We used to schedule the refill work unconditionally after changing the
number of queues. This may lead an issue if the device is not
up. Since we only try to cancel the work in ndo_stop(), this may cause
the refill work still work after removing the device. Fix this by only
schedule the work when device is up.
Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Mon, 14 Oct 2013 12:28:38 +0000 (15:28 +0300)]
yam: integer underflow in yam_ioctl()
We cap bitrate at YAM_MAXBITRATE in yam_ioctl(), but it could also be
negative. I don't know the impact of using a negative bitrate but let's
prevent it.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Pargmann [Sun, 13 Oct 2013 19:17:01 +0000 (21:17 +0200)]
net/ethernet: cpsw: Bugfix interrupts before enabling napi
If interrupts happen before napi_enable was called, the driver will not
work as expected. Network transmissions are impossible in this state.
This bug can be reproduced easily by restarting the network interface in
a loop. After some time any network transmissions on the network
interface will fail.
This patch fixes the bug by enabling napi before enabling the network
interface interrupts.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Acked-by: Peter Korsgaard <jacmet@sunsite.dk> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 12 Oct 2013 21:08:34 +0000 (14:08 -0700)]
bnx2x: record rx queue for LRO packets
RPS support is kind of broken on bnx2x, because only non LRO packets
get proper rx queue information. This triggers reorders, as it seems
bnx2x like to generate a non LRO packet for segment including TCP PUSH
flag : (this might be pure coincidence, but all the reorders I've
seen involve segments with a PUSH)
11:13:34.335847 IP A > B: . 415808:447136(31328) ack 1 win 457 <nop,nop,timestamp 37893363985797>
11:13:34.335992 IP A > B: . 447136:448560(1424) ack 1 win 457 <nop,nop,timestamp 37893363985797>
11:13:34.336391 IP A > B: . 448560:479888(31328) ack 1 win 457 <nop,nop,timestamp 37893373985797>
11:13:34.336425 IP A > B: P 511216:512640(1424) ack 1 win 457 <nop,nop,timestamp 37893373985798>
11:13:34.336423 IP A > B: . 479888:511216(31328) ack 1 win 457 <nop,nop,timestamp 37893373985798>
11:13:34.336924 IP A > B: . 512640:543968(31328) ack 1 win 457 <nop,nop,timestamp 37893373985798>
11:13:34.336963 IP A > B: . 543968:575296(31328) ack 1 win 457 <nop,nop,timestamp 37893373985798>
We must call skb_record_rx_queue() to properly give to RPS (and more
generally for TX queue selection on forward path) the receive queue
information.
Similar fix is needed for skb_mark_napi_id(), but will be handled
in a separate patch to ease stable backports.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Eilon Greenstein <eilong@broadcom.com> Acked-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Sat, 12 Oct 2013 17:16:27 +0000 (10:16 -0700)]
tcp: fix incorrect ca_state in tail loss probe
On receiving an ACK that covers the loss probe sequence, TLP
immediately sets the congestion state to Open, even though some packets
are not recovered and retransmisssion are on the way. The later ACks
may trigger a WARN_ON check in step D of tcp_fastretrans_alert(), e.g.,
https://bugzilla.redhat.com/show_bug.cgi?id=989251
The fix is to follow the similar procedure in recovery by calling
tcp_try_keep_open(). The sender switches to Open state if no packets
are retransmissted. Otherwise it goes to Disorder and let subsequent
ACKs move the state to Recovery or Open.
Reported-By: Michael Sterrett <michael@sterretts.net> Tested-By: Dormando <dormando@rydia.net> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Sat, 12 Oct 2013 06:24:08 +0000 (14:24 +0800)]
usbnet: fix error return code in usbnet_probe()
Fix to return -ENOMEM in the padding pkt alloc fail error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Oct 2013 19:25:13 +0000 (15:25 -0400)]
Merge branch 'sctp_csum'
Vlad Yasevich says:
====================
sctp: Use software checksum under certain circumstances.
There are some cards that support SCTP checksum offloading. When using
these cards with IPSec or forcing IP fragmentation of SCTP traffic,
the checksum is computed incorrectly due to the fact that xfrm and IP/IPv6
fragmentation code do not know that this is SCTP traffic and do not
know that checksum has to be computed differently.
To fix this, we let SCTP detect these conditions and perform software
checksum calculation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 16 Oct 2013 02:01:31 +0000 (22:01 -0400)]
sctp: Perform software checksum if packet has to be fragmented.
IP/IPv6 fragmentation knows how to compute only TCP/UDP checksum.
This causes problems if SCTP packets has to be fragmented and
ipsummed has been set to PARTIAL due to checksum offload support.
This condition can happen when retransmitting after MTU discover,
or when INIT or other control chunks are larger then MTU.
Check for the rare fragmentation condition in SCTP and use software
checksum calculation in this case.
CC: Fan Du <fan.du@windriver.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fan Du [Wed, 16 Oct 2013 02:01:30 +0000 (22:01 -0400)]
sctp: Use software crc32 checksum when xfrm transform will happen.
igb/ixgbe have hardware sctp checksum support, when this feature is enabled
and also IPsec is armed to protect sctp traffic, ugly things happened as
xfrm_output checks CHECKSUM_PARTIAL to do checksum operation(sum every thing
up and pack the 16bits result in the checksum field). The result is fail
establishment of sctp communication.
Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 17 Oct 2013 17:39:01 +0000 (10:39 -0700)]
Merge tag 'driver-core-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is one fix for the hotplug memory path that resolves a regression
when removing memory that showed up in 3.12-rc1"
* tag 'driver-core-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Release device_hotplug_lock when store_mem_state returns EINVAL
Linus Torvalds [Thu, 17 Oct 2013 17:38:18 +0000 (10:38 -0700)]
Merge tag 'usb-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB fixes and new device ids for 3.12-rc6
The largest change here is a bunch of new device ids for the option
USB serial driver for new Huawei devices. Other than that, just some
small bug fixes for issues that people have reported (run-time and
build-time), nothing major"
* tag 'usb-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: usb_phy_gen: refine conditional declaration of usb_nop_xceiv_register
usb: misc: usb3503: Fix compile error due to incorrect regmap depedency
usb/chipidea: fix oops on memory allocation failure
usb-storage: add quirk for mandatory READ_CAPACITY_16
usb: serial: option: blacklist Olivetti Olicard200
USB: quirks: add touchscreen that is dazzeled by remote wakeup
Revert "usb: musb: gadget: fix otg active status flag"
USB: quirks.c: add one device that cannot deal with suspension
USB: serial: option: add support for Inovia SEW858 device
USB: serial: ti_usb_3410_5052: add Abbott strip port ID to combined table as well.
USB: support new huawei devices in option.c
usb: musb: start musb on the udc side, too
xhci: Fix spurious wakeups after S5 on Haswell
xhci: fix write to USB3_PSSEN and XUSB2PRM pci config registers
xhci: quirk for extra long delay for S4
xhci: Don't enable/disable RWE on bus suspend/resume.
Linus Torvalds [Thu, 17 Oct 2013 17:37:42 +0000 (10:37 -0700)]
Merge tag 'tty-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull serial driver fixes from Greg KH:
"Here are two serial driver fixes for your tree. One is a revert of a
patch that causes a build error, the other is a fix to provide the
correct brace placement which resolves a bug where the driver was not
working properly"
* tag 'tty-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: vt8500: add missing braces
Revert "serial: i.MX: evaluate linux,stdout-path property"
Linus Torvalds [Thu, 17 Oct 2013 17:36:57 +0000 (10:36 -0700)]
Merge tag 'char-misc-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small iio and w1 driver fixes for 3.12-rc6.
There is also a hyper-v fix in here, which turned out to be incorrect,
so it was reverted. That will probably have to wait unto 3.13-rc1 to
get accepted as it's still being discussed"
* tag 'char-misc-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Revert "Drivers: hv: vmbus: Fix a bug in channel rescind code"
Drivers: hv: vmbus: Fix a bug in channel rescind code
iio:buffer: Free active scan mask in iio_disable_all_buffers()
iio: frequency: adf4350: add missing clk_disable_unprepare() on error in adf4350_probe()
w1 - call request_module with w1 master mutex unlocked
w1 - fix fops in w1_bus_notify
David S. Miller [Thu, 17 Oct 2013 17:36:15 +0000 (13:36 -0400)]
Merge branch 'dm9000'
Nikita Kiryanov says:
====================
dm9000 improvements
This is a collection of improvements and bug fixes for dm9000, mostly
related to its startup and resume-from-suspend sequences.
Patch "Implement full reset of DM9000 network device" was submitted to the
linux-kernel mailing list but never applied.
An archive of the submission and the following conversation can be found here:
http://lkml.indiana.edu/hypermail/linux/kernel/1205.2/02817.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikita Kiryanov [Wed, 16 Oct 2013 08:41:34 +0000 (11:41 +0300)]
dm9000: report the correct LPA
Report the LPA by checking mii_if_info, instead of just saying "no LPA" every
time.
Cc: David S. Miller <davem@davemloft.net> Cc: Jingoo Han <jg1.han@samsung.com> Cc: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Nikita Kiryanov <nikita@compulab.co.il> Signed-off-by: Igor Grinberg <grinberg@compulab.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Abbott [Wed, 16 Oct 2013 08:41:33 +0000 (11:41 +0300)]
dm9000: Implement full reset of DM9000 network device
A Davicom application note for the DM9000 network device recommends
performing software reset twice to correctly initialise the device.
Without this reset some devices fail to initialise correctly on
system startup.
Cc: David S. Miller <davem@davemloft.net> Cc: Jingoo Han <jg1.han@samsung.com> Cc: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Michael Abbott <michael.abbott@diamond.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Nikita Kiryanov [Wed, 16 Oct 2013 08:41:32 +0000 (11:41 +0300)]
dm9000: take phy out of reset during init
Take the phy out of reset explicitly during system resume to avoid
losing network connectivity.
Cc: David S. Miller <davem@davemloft.net> Cc: Jingoo Han <jg1.han@samsung.com> Cc: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Nikita Kiryanov <nikita@compulab.co.il> Signed-off-by: Igor Grinberg <grinberg@compulab.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
Nikita Kiryanov [Wed, 16 Oct 2013 08:41:31 +0000 (11:41 +0300)]
dm9000: during init reset phy only for dm9000b
Some of the changes introduced in commit 6741f40 (DM9000B: driver
initialization upgrade) break functionality on DM9000A
(error message during NFS boot: "dm9000 dm9000.0: eth0: link down")
Since the changes were meant to serve only DM9000B, make them
dependent on the chip type.
Cc: David S. Miller <davem@davemloft.net> Cc: Jingoo Han <jg1.han@samsung.com> Cc: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Nikita Kiryanov <nikita@compulab.co.il> Signed-off-by: Igor Grinberg <grinberg@compulab.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 17 Oct 2013 17:17:25 +0000 (10:17 -0700)]
Merge tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"All reasonably small fixes as rc6: a HD-audio mic fix, a us122l mmap
regression fix, and kernel memory leak fix in hdsp driver. Hopefully
this will be the last pull request for 3.12..."
* tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hdsp - info leak in snd_hdsp_hwdep_ioctl()
ALSA: us122l: Fix pcm_usb_stream mmapping regression
ALSA: hda - Fix inverted internal mic not indicated on some machines
Linus Torvalds [Thu, 17 Oct 2013 17:16:45 +0000 (10:16 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull apparmor fixes from James Morris:
"A couple more regressions fixed"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
apparmor: fix bad lock balance when introspecting policy
apparmor: fix memleak of the profile hash
Merge tag 'iio-fixes-for-3.12c' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-linus
Jonathan writes:
Third set of IIO fixes for the 3.12 cycle.
Two little ones this time:
1) A missing clk_unprepare in adf4350.
2) A missing free of the active_scan_mask when iio_disable_all_buffers is
called during an unexpected device removal. This leak was introduced by
the fix a87c82e454f184a9473f8cdfd4d304205f585f65 iio: Stop sampling when the device is removed
and hence is a regression fix.
Guenter Roeck [Thu, 17 Oct 2013 02:18:41 +0000 (19:18 -0700)]
usb: usb_phy_gen: refine conditional declaration of usb_nop_xceiv_register
Commit 3fa4d734 (usb: phy: rename nop_usb_xceiv => usb_phy_gen_xceiv)
changed the conditional around the declaration of usb_nop_xceiv_register
from
#if defined(CONFIG_NOP_USB_XCEIV) ||
(defined(CONFIG_NOP_USB_XCEIV_MODULE) && defined(MODULE))
to
#if IS_ENABLED(CONFIG_NOP_USB_XCEIV)
While that looks the same, it is semantically different. The first expression
is true if CONFIG_NOP_USB_XCEIV is built as module and if the including
code is built as module. The second expression is true if code depending on
CONFIG_NOP_USB_XCEIV if built as module or into the kernel.
As a result, the arm:allmodconfig build fails with
arch/arm/mach-omap2/built-in.o: In function `omap3_evm_init':
arch/arm/mach-omap2/board-omap3evm.c:703: undefined reference to
`usb_nop_xceiv_register'
Fix the problem by reverting to the old conditional.
Revert "Drivers: hv: vmbus: Fix a bug in channel rescind code"
This reverts commit 90d33f3ec519db19d785216299a4ee85ef58ec97 as it's not
the correct fix for this issue, and it causes a build warning to be
added to the kernel tree.
Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ACPI / PM: Drop two functions that are not used any more
Two functions defined in device_pm.c, acpi_dev_pm_add_dependent()
and acpi_dev_pm_remove_dependent(), have no callers and may be
dropped, so drop them.
Moreover, they are the only functions adding entries to and removing
entries from the power_dependent list in struct acpi_device, so drop
that list too.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Aaron Lu [Thu, 17 Oct 2013 13:38:53 +0000 (15:38 +0200)]
ATA / ACPI: remove power dependent device handling
Previously, we wanted SCSI devices corrsponding to ATA devices to
be runtime resumed when the power resource for those ATA device was
turned on by some other device, so we added the SCSI device to the
dependent device list of the ATA device's ACPI node. However, this
code has no effect after commit 41863fc (ACPI / power: Drop automaitc
resume of power resource dependent devices) and the mechanism it was
supposed to implement is regarded as a bad idea now, so drop it.
[rjw: Changelog] Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Johannes Berg [Tue, 15 Oct 2013 10:25:07 +0000 (12:25 +0200)]
mac80211: disable WMM with invalid parameters
Some APs (notably a Sitecom WL-153 v1 with firmware 1.45) are sending
invalid WMM parameters setting AIFSN, ECWmin and ECWmax to zero. The
spec mandates that the value of AIFSN is at least 2, and some cards
(e.g. Intel with the iwldvm driver) can't transmit when the invalid
QoS parameters are actually uploaded to the firmware.
Since there's little chance of being able to guess the values that
the AP actually meant, disable WMM if such an invalid case is found.
Since ECWmin/ECWmax are allowed to be zero, only verify AIFSN >= 2
and ECWmin <= ECWmax.
Reviewed-by: Eliad Peller <eliad@wizery.com> Reported-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Linus Torvalds [Thu, 17 Oct 2013 04:36:03 +0000 (21:36 -0700)]
Merge branch 'akpm' (fixes from Andrew Morton)
Merge misc fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
mm: revert mremap pud_free anti-fix
mm: fix BUG in __split_huge_page_pmd
swap: fix set_blocksize race during swapon/swapoff
procfs: call default get_unmapped_area on MMU-present architectures
procfs: fix unintended truncation of returned mapped address
writeback: fix negative bdi max pause
percpu_refcount: export symbols
fs: buffer: move allocation failure loop into the allocator
mm: memcg: handle non-error OOM situations more gracefully
tools/testing/selftests: fix uninitialized variable
block/partitions/efi.c: treat size mismatch as a warning, not an error
mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages
mm/zswap: bugfix: memory leak when re-swapon
mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages
mm: migration: do not lose soft dirty bit if page is in migration state
gcov: MAINTAINERS: Add an entry for gcov
mm/hugetlb.c: correct missing private flag clearing
mm/vmscan.c: don't forget to free shrinker->nr_deferred
ipc/sem.c: synchronize semop and semctl with IPC_RMID
ipc: update locking scheme comments
...
Hugh Dickins [Wed, 16 Oct 2013 20:47:09 +0000 (13:47 -0700)]
mm: revert mremap pud_free anti-fix
Revert commit 1ecfd533f4c5 ("mm/mremap.c: call pud_free() after fail
calling pmd_alloc()").
The original code was correct: pud_alloc(), pmd_alloc(), pte_alloc_map()
ensure that the pud, pmd, pt is already allocated, and seldom do they
need to allocate; on failure, upper levels are freed if appropriate by
the subsequent do_munmap(). Whereas commit 1ecfd533f4c5 did an
unconditional pud_free() of a most-likely still-in-use pud: saved only
by the near-impossiblity of pmd_alloc() failing.
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Chen Gang <gang.chen@asianux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 16 Oct 2013 20:47:08 +0000 (13:47 -0700)]
mm: fix BUG in __split_huge_page_pmd
Occasionally we hit the BUG_ON(pmd_trans_huge(*pmd)) at the end of
__split_huge_page_pmd(): seen when doing madvise(,,MADV_DONTNEED).
It's invalid: we don't always have down_write of mmap_sem there: a racing
do_huge_pmd_wp_page() might have copied-on-write to another huge page
before our split_huge_page() got the anon_vma lock.
Forget the BUG_ON, just go back and try again if this happens.
Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
swap: fix set_blocksize race during swapon/swapoff
Fix race between swapoff and swapon. Swapoff used old_block_size from
swap_info outside of swapon_mutex so it could be overwritten by
concurrent swapon.
The race has visible effect only if more than one swap block device
exists with different block sizes (e.g. /dev/sda1 with block size 4096
and /dev/sdb1 with 512). In such case it leads to setting the blocksize
of swapped off device with wrong blocksize.
The bug can be triggered with multiple concurrent swapoff and swapon:
0. Swap for some device is on.
1. swapoff:
First the swapoff is called on this device and "struct swap_info_struct
*p" is assigned. This is done under swap_lock however this lock is
released for the call try_to_unuse().
2. swapon:
After the assignment above (and before acquiring swapon_mutex &
swap_lock by swapoff) the swapon is called on the same device.
The p->old_block_size is assigned to the value of block_size the device.
This block size should be the same as previous but sometimes it is not.
The swapon ends successfully.
3. swapoff:
Swapoff resumes, grabs the locks and mutex and continues to disable this
swap device. Now it sets the block size to value taken from swap_info
which was overwritten by swapon in 2.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reported-by: Weijie Yang <weijie.yang.kh@gmail.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Shaohua Li <shli@fusionio.com> Cc: Minchan Kim <minchan@kernel.org> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HATAYAMA Daisuke [Wed, 16 Oct 2013 20:47:05 +0000 (13:47 -0700)]
procfs: call default get_unmapped_area on MMU-present architectures
Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added
proc_reg_get_unmapped_area in proc_reg_file_ops and
proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
get_unmapped_area method is not defined for the target procfs file,
which causes regression of mmap on /proc/vmcore.
To address this issue, like get_unmapped_area(), call default
current->mm->get_unmapped_area on MMU-present architectures if
pde->proc_fops->get_unmapped_area, i.e. the one in actual file
operation in the procfs file, is not defined.
Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David S. Miller <davem@davemloft.net> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HATAYAMA Daisuke [Wed, 16 Oct 2013 20:47:04 +0000 (13:47 -0700)]
procfs: fix unintended truncation of returned mapped address
Currently, proc_reg_get_unmapped_area truncates upper 32-bit of the
mapped virtual address returned from get_unmapped_area method in
pde->proc_fops due to the variable rv of signed integer on x86_64. This
is too small to have vitual address of unsigned long on x86_64 since on
x86_64, signed integer is of 4 bytes while unsigned long is of 8 bytes.
To fix this issue, use unsigned long instead.
Fixes a regression added in commit c4fe24485729 ("sparc: fix PCI device
proc file mmap(2)").
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David S. Miller <davem@davemloft.net> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since pause is bounded by [min_pause, max_pause] where min_pause is also
bounded by max_pause. It's suspected and demonstrated that the
max_pause calculation goes wrong:
The problem lies in the two "long = unsigned long" assignments in
bdi_max_pause() which might go negative if the highest bit is 1, and the
min_t(long, ...) check failed to protect it falling under 0. Fix all of
them by using "unsigned long" throughout the function.
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Richard Weinberger <richard@nod.at> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Wed, 16 Oct 2013 20:47:00 +0000 (13:47 -0700)]
fs: buffer: move allocation failure loop into the allocator
Buffer allocation has a very crude indefinite loop around waking the
flusher threads and performing global NOFS direct reclaim because it can
not handle allocation failures.
The most immediate problem with this is that the allocation may fail due
to a memory cgroup limit, where flushers + direct reclaim might not make
any progress towards resolving the situation at all. Because unlike the
global case, a memory cgroup may not have any cache at all, only
anonymous pages but no swap. This situation will lead to a reclaim
livelock with insane IO from waking the flushers and thrashing unrelated
filesystem cache in a tight loop.
Use __GFP_NOFAIL allocations for buffers for now. This makes sure that
any looping happens in the page allocator, which knows how to
orchestrate kswapd, direct reclaim, and the flushers sensibly. It also
allows memory cgroups to detect allocations that can't handle failure
and will allow them to ultimately bypass the limit if reclaim can not
make progress.
Reported-by: azurIt <azurit@pobox.sk> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Wed, 16 Oct 2013 20:46:59 +0000 (13:46 -0700)]
mm: memcg: handle non-error OOM situations more gracefully
Commit 3812c8c8f395 ("mm: memcg: do not trap chargers with full
callstack on OOM") assumed that only a few places that can trigger a
memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
readahead. But there are many more and it's impractical to annotate
them all.
First of all, we don't want to invoke the OOM killer when the failed
allocation is gracefully handled, so defer the actual kill to the end of
the fault handling as well. This simplifies the code quite a bit for
added bonus.
Second, since a failed allocation might not be the abrupt end of the
fault, the memcg OOM handler needs to be re-entrant until the fault
finishes for subsequent allocation attempts. If an allocation is
attempted after the task already OOMed, allow it to bypass the limit so
that it can quickly finish the fault and invoke the OOM killer.
Reported-by: azurIt <azurit@pobox.sk> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Doug Anderson [Wed, 16 Oct 2013 20:46:57 +0000 (13:46 -0700)]
block/partitions/efi.c: treat size mismatch as a warning, not an error
In commit 27a7c642174e ("partitions/efi: account for pmbr size in lba")
we started treating bad sizes in lba field of the partition that has the
0xEE (GPT protective) as errors.
However, we may run into these "bad sizes" in the real world if someone
uses dd to copy an image from a smaller disk to a bigger disk. Since
this case used to work (even without using force_gpt), keep it working
and treat the size mismatch as a warning instead of an error.
Reported-by: Josh Triplett <josh@joshtriplett.org> Reported-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Doug Anderson <dianders@chromium.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Tested-by: Artem Bityutskiy <dedekind1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 16 Oct 2013 20:46:56 +0000 (13:46 -0700)]
mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages
Commit 11feeb498086 ("kvm: optimize away THP checks in
kvm_is_mmio_pfn()") introduced a memory leak when KVM is run on gigantic
compound pages.
That commit depends on the assumption that PG_reserved is identical for
all head and tail pages of a compound page. So that if get_user_pages
returns a tail page, we don't need to check the head page in order to
know if we deal with a reserved page that requires different
refcounting.
The assumption that PG_reserved is the same for head and tail pages is
certainly correct for THP and regular hugepages, but gigantic hugepages
allocated through bootmem don't clear the PG_reserved on the tail pages
(the clearing of PG_reserved is done later only if the gigantic hugepage
is freed).
This patch corrects the gigantic compound page initialization so that we
can retain the optimization in 11feeb498086. The cacheline was already
modified in order to set PG_tail so this won't affect the boot time of
large memory systems.
[akpm@linux-foundation.org: tweak comment layout and grammar] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: andy123 <ajs124.ajs124@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Acked-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Weijie Yang [Wed, 16 Oct 2013 20:46:54 +0000 (13:46 -0700)]
mm/zswap: bugfix: memory leak when re-swapon
zswap_tree is not freed when swapoff, and it got re-kmalloced in swapon,
so a memory leak occurs.
Free the memory of zswap_tree in zswap_frontswap_invalidate_area().
Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org>
From: Weijie Yang <weijie.yang@samsung.com>
Subject: mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently
Consider the following scenario:
thread 0: reclaim entry x (get refcount, but not call zswap_get_swap_cache_page)
thread 1: call zswap_frontswap_invalidate_page to invalidate entry x.
finished, entry x and its zbud is not freed as its refcount != 0
now, the swap_map[x] = 0
thread 0: now call zswap_get_swap_cache_page
swapcache_prepare return -ENOENT because entry x is not used any more
zswap_get_swap_cache_page return ZSWAP_SWAPCACHE_NOMEM
zswap_writeback_entry do nothing except put refcount
Now, the memory of zswap_entry x and its zpage leak.
Modify:
- check the refcount in fail path, free memory if it is not referenced.
- use ZSWAP_SWAPCACHE_FAIL instead of ZSWAP_SWAPCACHE_NOMEM as the fail path
can be not only caused by nomem but also by invalidate.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Wed, 16 Oct 2013 20:46:53 +0000 (13:46 -0700)]
mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages
If a page we are inspecting is in swap we may occasionally report it as
having soft dirty bit (even if it is clean). The pte_soft_dirty helper
should be called on present pte only.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Wed, 16 Oct 2013 20:46:51 +0000 (13:46 -0700)]
mm: migration: do not lose soft dirty bit if page is in migration state
If page migration is turned on in config and the page is migrating, we
may lose the soft dirty bit. If fork and mprotect are called on
migrating pages (once migration is complete) pages do not obtain the
soft dirty bit in the correspond pte entries. Fix it adding an
appropriate test on swap entries.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>