git.karo-electronics.de Git - mv-sheeva.git/log

dsa: fix 88e6xxx statistics counter snapshotting

The bit that tells us whether a statistics counter snapshot operation
has completed is located in the GLOBAL register block, not in the
GLOBAL2 register block, so fix up mv88e6xxx_stats_wait() to poll the
right register address.

Signed-off-by: Stephane Contri <Stephane.Contri@grassvalley.com>
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

forcedeth: Fix NAPI race.

Eric Dumazet a écrit :
> Ingo Molnar a écrit :
>>> The following changes since commit 52989765629e7d182b4f146050ebba0abf2cb0b7:
>>>   Linus Torvalds (1):
>>>         Merge git://git.kernel.org/.../davem/net-2.6
>>>
>>> are available in the git repository at:
>>>
>>>   master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master
>> Hm, something in this lot quickly wrecked networking here - see the
>> tx timeout dump below. It starts with:
>>
>> [  351.004596] WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0x10b/0x19c()
>> [  351.011815] Hardware name: System Product Name
>> [  351.016220] NETDEV WATCHDOG: eth0 (forcedeth): transmit queue 0 timed out
>>
>> Config attached. Unfortunately i've got no time to do bisection
>> today.
>
>
>
> forcedeth might have a problem, in its netif_wake_queue() logic, but
> I could not see why a recent patch could make this problem visible now.
>
> CPU0/1: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
> is not a new cpu either :)
>
> forcedeth uses an internal tx_stop without appropriate barrier.
>
> Could you try following patch ?
>
> (random guess as I dont have much time right now)

We might have a race in napi_schedule(), leaving interrupts disabled forever.
I cannot test this patch, I dont have the hardware...

Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/smsc911x.c: Fix resource size off by 1 error

The call resource_size(res) returns res->end - res->start + 1 and thus the
second change is semantics-preserving.  res_size is then used as the second
argument of a call to request_mem_region, and the memory allocated by this
call appears to be the same as what is released in the two calls to
release_mem_region.  So the size argument for those calls should be
resource_size(size) as well.  Alternatively, in the second call to
release_mem_region, the second argument could be res_size, as that variable
has already been initialized at the point of this call.

The problem was found using the following semantic patch:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
struct resource *res;
@@

- (res->end - res->start) + 1
+ resource_size(res)

@@
struct resource *res;
@@

- res->end - res->start
+ BAD(resource_size(res))
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

pcnet_cs: add new id

add new id (RIOS System PC CARD3 ETHERNET).

Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Fix the maximal values of coalescing timeouts.

This patch properly defines the maximum values for rx/tx coalescing timeouts.

Signed-off-by: Vlad Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Disable HC coalescing when setting timeout to zero.

Problem reported by Flavio Leitner <fleitner@redhat.com>:
When setting rx/tx coalescing timeout to the values less than 12 traffic was
stopped.

The FW supports coalescing in 12us granularity, and so value of less then 12
should be interpreted as disabling coalescing

Signed-off-by: Vlad Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: Fix device unregister race

It is currently possible for an asynchronous device unregister
to cause the same tun device to be unregistered twice.  This
is because the unregister in tun_chr_close only checks whether
__tun_get(tfile) != NULL.  This however has nothing to do with
whether the device has already been unregistered.  All it tells
you is whether __tun_detach has been called.

This patch fixes this by using the most obvious thing to test
whether the device has been unregistered.

It also moves __tun_detach outside of rtnl_unlock since nothing
that it does requires that lock.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: fix spurious interrupt handling in intx mode

Occasionally we may see an interrupt without an event in the eq.
In intx, we currently see the event queue and return IRQ_NONE causing
a the irq to be disabled ("no one cared".) Instead, read the CEV_ISR
reg to check the existence of the interrupt.

Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: disable K1 at 1000Mbps for 82577/82578

This workaround is required for an issue in hardware where noise on the
interconnect between the MAC and PHY could be generated by a lower power
mode (K1) at 1000Mbps resulting in bad packets. Disable K1 while at 1000
Mbps but keep it enabled for 10/100Mbps and when the cable is disconnected.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: delay second read of PHY_STATUS register on failure of first read

Some PHYs may require two reads of the PHY_STATUS register to determine the
link status. If the PHY is being accessed by another thread it is possible
the first read could timeout and fail. In this case, put a delay in so
the second read will pick up the correct link status.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: prevent NVM corruption on sectors larger than 4K

Limit NVM writes to 4K sections to prevent NVM corruption on larger
sector allocations (up to 64K).

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: do not write SmartSpeed register bits on parts without support

The driver was accessing register bits for features on parts that do
not support that feature. This could cause problems in the hardware.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: delay after LCD reset and proper checks for PHY configuration done

A previous workaround for 82578 to avoid link stall causes some PHY
registers to get cleared inadvertently. Add a delay after all LCD resets
to make sure PHY registers are in a stable state before continuing. Also,
after resets check the EEC register for the state of PHY configuration
performed by the MAC for ICH9 and earlier parts (as done before), but check
the LAN_INIT_DONE bit in the STATUS register for ICH10 and newer parts (EEC
doesn't exist in these newer parts).

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: PHY loopback broken on 82578

PHY loopback on 82578 fails to work as a result of flushing the packets
in the FIFO buffer in the link stall workaround. Don't perform the
workaround if in PHY loopback mode.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Not allow 8259x unsupported wol options change from ethtool

Wake-on-lan is currently only supported by 82599 KX4 devices, in all
other cases return a proper value from ixgbe_wol_exclusion function call.
Otherwise from ethtool we will be able to change wol options of
unsupported 8259x devices.

Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: fix inconsistent SFP/SFP+ failure results.

Currently if we loaded the driver, insert an unsupported module, and then
attempt to "ifconfig up" the device it will be brought down but the netdev
would not be unregistered. This behavior is different than all other
code paths. This patch corrects that by down'ing the device and then
scheduling the sfp_config_module_task tasklet. The tasklet will detect
this condition (like it does with other code paths) and do the
unregister_netdev().

I also removed the log message as this condition (an unsupported SFP+
module) will be logged in sfp_config_module_task.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: fix regression on some 82598 adapters

The change to check the SFP+ module again on open() was
causing the XFP (non-SFP+) adapters to be rejected. We
only want to try and re-identify the SFP+ module if the
original probe found that this device was an SFP+ device.
So for this code path (driver loaded with SFP module, module
inserted, ifconfig up of the device) the type will be
ixgbe_phy_unknown for an unidentified SFP+ module. So we
only check if that is the case.

This problem also shows up on Copper devices.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: fix issues with failing to detect insert of unsupported module

Several small fixes around negative test case of the insertion of a
IXGBE_ERR_NOT_SUPPORTED module.

- mdio45_probe call was always failing due to mdio.prtad not being
set. The function set to mdio.mdio_read was still working as we just
happen to always be at prtad == 0. This will allow us to set the phy_id
and phy.type correctly now.

- There was timing issue with i2c calls when initiated from a tasklet.
A small delay was added to allow the electrical oscillation to calm down.

- Logic change in ixgbe_sfp_task that allows NOT_SUPPORTED condition
to be recognized.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: Fix sizeof usage.

Some usage was only sizing a pointer rather than the data type.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: Add/use function for link up/down.

We need to set/clear the mac address register when the link goes up/down
respectively. Without this both ports of a 2-port device can end up
with the same mac address in a bonding scenario.
The new ql_link_on() and ql_link_off() will also be used in handling
certain firmware events.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: Fix MAC address bonding issue.

This addes functionality to set/clear the MAC address in the hardware
when the link goes up/down.
The MAC address register is persistent across function resets. In
bonding the same address can bounce from one port to the other. This
can cause packets to be delivered to the wrong port.
This patch clears the MAC address in the hardware when the link is down
and sets it when the link comes up.
It was found that pulling/pushing the cable from one port to another
causes the same MAC address to be in both ports.
The next patch in this series will use this functionality as well.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: Fix tx byte counter.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: Fix redundant call to free resources.

The caller will free acquired resouces if a failure occurs.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: Fix carrier on condition.

We were turning on the carrier without verifying the link was up.
This adds link up to the link initialize check before turning carrier
on.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: Clear frame to queue routing before reset.

Not clearing the routing bits can cause frames to erroneously get routed to
management processor.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: Expand coverage of hw lock for config register.

The hardware semaphore covers the configuration register as well as the
ICB registers.  The ICB high and low regs contain the address of the
initialization control block and the config register is used to signal
the hardware that a block is ready to be downloaded.  Currently we were
only protecting the ICB regs.  This changes expands to cover the config
register as well.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IPv6: preferred lifetime of address not getting updated

There's a bug in addrconf_prefix_rcv() where it won't update the
preferred lifetime of an IPv6 address if the current valid lifetime
of the address is less than 2 hours (the minimum value in the RA).

For example, If I send a router advertisement with a prefix that
has valid lifetime = preferred lifetime = 2 hours we'll build
this address:

3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
       valid_lft 7175sec preferred_lft 7175sec

If I then send the same prefix with valid lifetime = preferred
lifetime = 0 it will be ignored since the minimum valid lifetime
is 2 hours:

3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
       valid_lft 7161sec preferred_lft 7161sec

But according to RFC 4862 we should always reset the preferred lifetime
even if the valid lifetime is invalid, which would cause the address
to immediately get deprecated.  So with this patch we'd see this:

5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2001:1890:1109:a20:21f:29ff:fe5a:ef04/64 scope global deprecated dynamic
       valid_lft 7163sec preferred_lft 0sec

The comment winds-up being 5x the size of the code to fix the problem.

Update the preferred lifetime of IPv6 addresses derived from a prefix
info option in a router advertisement even if the valid lifetime in
the option is invalid, as specified in RFC 4862 Section 5.5.3e.  Fixes
an issue where an address will not immediately become deprecated.
Reported by Jens Rosenboom.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm6: fix the proto and ports decode of sctp protocol

The SCTP pushed the skb above the sctp chunk header, so the
check of pskb_may_pull(skb, nh + offset + 1 - skb->data) in
_decode_session6() will never return 0 and the ports decode
of sctp will always fail. (nh + offset + 1 - skb->data < 0)

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm4: fix the ports decode of sctp protocol

The SCTP pushed the skb data above the sctp chunk header, so the check
of pskb_may_pull(skb, xprth + 4 - skb->data) in _decode_session4() will
never return 0 because xprth + 4 - skb->data < 0, the ports decode of
sctp will always fail.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/9p: Fix crash due to bad mount parameters.

It is not safe to use match_int without checking the token type returned
by match_token (especially when the token type returned is Opt_err and
args is empty). Fix it.

Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

if_ether: add define for 1588 aka Timesync

This patch adds ETH_P_1588 protocol ID define.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phylib: fixes for PHY_RESUMING state changes

The PHY_HALTED state disables phydev->link, but the link will not be
updated upon entering PHY_RESUMING.  Add a call to phy_read_status() to
update the link before entering PHY_RUNNING.  If the link is not up at
this point, enter the PHY_NOLINK state instead.

Also, when transitioning from PHY_RESUMING to PHY_RUNNING, calls to
netif_carrier_on() and phydev->adjust_link() are missing.  Add the calls
similar to the other transitions to PHY_RUNNING.

Signed-off-by: Wade Farnsworth <wfarnsworth@mvista.com>
Acked-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netxen: avoid frequent firmware reset

Restrict firmware reset to following cases -

o chip rev is NX2031 (firmare doesn't support heartbit).
o firmware is dead.
o previous attempt to init firmware had failed.
o we have got newer file firmware.

This speeds up module load tremendously (by upto 8 sec),
also avoids downtime for NCSI (management) pass-thru
traffic.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netxen: fix the version code macro

Correct firmware encoding is 8 bit major, 8 bit minor and
16 bit subversion. Flash has sizes rightly set, but original
driver submission messed it leaving 16 bit major and 8 bit
subversion.

Also fix a infinite loop when cut-thru file firmware is
invalid.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gigaset: drop pointless check

Drop a sanity check which doesn't serve any useful purpose anymore.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

gigaset: accept connection establishment messages in any order

ISDN connection setup failed if the "connection active" and
"B channel up" messages from the device arrived in a different
order than expected. Modify the state machine to accept them in
any order.

Impact: bugfix
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "ipv4: arp announce, arp_proxy and windows ip conflict verification"

This reverts commit 73ce7b01b4496a5fbf9caf63033c874be692333f.

After discovering that we don't listen to gratuitious arps in 2.6.30
I tracked the failure down to this commit.

The patch makes absolutely no sense. RFC2131 RFC3927 and RFC5227.
are all in agreement that an arp request with sip == 0 should be used
for the probe (to prevent learning) and an arp request with sip == tip
should be used for the gratitous announcement that people can learn
from.

It appears the author of the broken patch got those two cases confused
and modified the code to drop all gratuitous arp traffic. Ouch!

Cc: stable@kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: return PCI_ERS_RESULT_DISCONNECT on permanent error

PCI drivers that implement the io_error_detected callback should return
PCI_ERS_RESULT_DISCONNECT if the state passed in is
pci_channel_io_perm_failure. This patch fixes the issue for igb.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: io_error_detected callback should return PCI_ERS_RESULT_DISCONNECT

on permanent failure

PCI drivers that implement the io_error_detected callback
should return PCI_ERS_RESULT_DISCONNECT if the state
passed in is pci_channel_io_perm_failure. This state is not
checked in many of the network drivers.

This patch fixes the omission in the e1000e driver.

Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000: return PCI_ERS_RESULT_DISCONNECT on permanent error

PCI drivers that implement the io_error_detected callback
should return PCI_ERS_RESULT_DISCONNECT if the state
passed in is pci_channel_io_perm_failure. This state is
not checked in many of the network drivers.

The patch fixes the omission in the e1000 driver.

Based on Mike Mason's similar patch for e1000e.

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
CC: Mike Mason <mmlnx@us.ibm.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000: fix unmap bug

as reported by kerneloops.org

[  121.781161] ------------[ cut here ]------------
[  121.781171] WARNING: at lib/dma-debug.c:793 check_unmap+0x14e/0x577()
[  121.781173] Hardware name: S5520HC
[  121.781177] e1000 0000:0a:00.0: DMA-API: device driver tries to free DMA
memory it has not allocated [device address=0x00000001d688b0fa] [size=1522
bytes]
[  121.781180] Modules linked in: e1000 mdio  dca [last unloaded: ixgbe]
[  121.781187] Pid: 4793, comm: bash Tainted: P 2.6.30-master-06161113 #3
[  121.781190] Call Trace:
[  121.781195]  [<ffffffff8123056f>] ? check_unmap+0x14e/0x577
[  121.781201]  [<ffffffff81057a19>] warn_slowpath_common+0x77/0x8f
[  121.781205]  [<ffffffff81057ae1>] warn_slowpath_fmt+0x9f/0xa1
[  121.781212]  [<ffffffff81477ce2>] ? _spin_lock_irqsave+0x3f/0x49
[  121.781216]  [<ffffffff8122fa97>] ? get_hash_bucket+0x28/0x33
[  121.781220]  [<ffffffff8123056f>] check_unmap+0x14e/0x577
[  121.781225]  [<ffffffff810e4f48>] ? check_bytes_and_report+0x38/0xcb
[  121.781230]  [<ffffffff81230bbf>] debug_dma_unmap_page+0x80/0x92
[  121.781234]  [<ffffffff8122e549>] ? unmap_single+0x1a/0x4e
[  121.781239]  [<ffffffff813901e1>] ? __kfree_skb+0x74/0x78
[  121.781250]  [<ffffffffa00662ef>] pci_unmap_single+0x64/0x6d [e1000]
[  121.781259]  [<ffffffffa0066344>] e1000_clean_rx_ring+0x4c/0xbf [e1000]
[  121.781268]  [<ffffffffa00663df>] e1000_clean_all_rx_rings+0x28/0x36 [e1000]
[  121.781277]  [<ffffffffa0067464>] e1000_down+0x138/0x141 [e1000]
[  121.781286]  [<ffffffffa00681c2>] __e1000_shutdown+0x6b/0x198 [e1000]
[  121.781296]  [<ffffffffa0068405>] e1000_suspend+0x17/0x50 [e1000]
[  121.781301]  [<ffffffff81237665>] pci_legacy_suspend+0x3b/0xbe
[  121.781305]  [<ffffffff81237bc6>] pci_pm_suspend+0x3e/0xf1
[  121.781310]  [<ffffffff812eaf1c>] pm_op+0x57/0xde
[  121.781314]  [<ffffffff812eb444>] dpm_suspend_start+0x31e/0x470
[  121.781319]  [<ffffffff810877da>] suspend_devices_and_enter+0x3e/0x1a2
[  121.781323]  [<ffffffff81087a0f>] enter_state+0xd1/0x127
[  121.781327]  [<ffffffff8108717a>] state_store+0xa7/0xc9
[  121.781332]  [<ffffffff81221843>] kobj_attr_store+0x17/0x19
[  121.781336]  [<ffffffff8113c01e>] sysfs_write_file+0xe5/0x121
[  121.781341]  [<ffffffff810ed165>] vfs_write+0xab/0x105
[  121.781344]  [<ffffffff810ed279>] sys_write+0x47/0x6d
[  121.781349]  [<ffffffff81027aab>] system_call_fastpath+0x16/0x1b
[  121.781352] ---[ end trace 97bacaaac2ed7786 ]---

Fix is to correctly zero out internal ->dma value when unmapping
and make sure never to unmap unless there specifically was a mapping done.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: fix unmap length bug

driver was mixing NET_IP_ALIGN count bytes in map/unmap calls
unevenly. Only map the bytes that the hardware might dma into

also fix unmap related bug where ->dma was not being cleared
after unmap

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: fix unmap length bug

This patch addresses three WARN_ON statements from DMA-API debug code

ixgbe is mapping more than it unmaps, reduce the length of the map call and
remove the "used once" local variable.

found by Joerg Roedel <joerg.roedel@amd.com> in 2.6.30, so is a candidate
for -stable.

in addition, fix missing ->dma = 0 after unmap to prevent double free with
pci_unmap_single

and lastly, don't unmap (half) pages that aren't mapped.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Fix link capabilities during adapter resets

Adapter link advertisement capabilities were not persistent during
adapter resets. While configuring multispeed fiber link check for
phy autoneg_advertised settings before overwriting with default
link capabilities

Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Fix device capabilities of 82599 single speed fiber NICs.

82599 single speed fiber modules only support 10G/Full. Return
proper device capabilities while querrying the adapter and error
while changing device advertisement/speed/duplex capabilities.

Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Fix SFP log messages

We had a wide range of log messages for the same sort of SFP
failure. This patch makes them all more similar and less
confusing along with converting them to dev_err.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

usbnet: Remove private stats structure

Now that nothing uses the private stats structure we can remove it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

usbnet: Use netdev stats structure

Now that netdev has its own stats structure we should use that
instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

smsc95xx: Use netdev stats structure

Now that netdev has its own stats structure we should use that
instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

rndis_host: Use netdev stats structure

Now that netdev has its own stats structure we should use that
instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net1080: Use netdev stats structure

Now that netdev has its own stats structure we should use that
instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

dm9601: Use netdev stats structure

Now that netdev has its own stats structure we should use that
instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

cdc_eem: Use netdev stats structure

Now that netdev has its own stats structure we should use that
instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Fix fib_trie rebalancing, part 3

Alas current delaying of freeing old tnodes by RCU in trie_rebalance
is still not enough because we can free a top tnode before updating a
t->trie pointer.

Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Fix the behavior of ethtool when ONBOOT=no

This is the same fix as commit
7959ea254ed18faee41160b1c50b3c9664735967 ("bnx2: Fix the behavior of
ethtool when ONBOOT=no"), but for bnx2x:

--------------------
    When configure in ifcfg-eth* is ONBOOT=no,
    the behavior of ethtool command is wrong.

        # grep ONBOOT /etc/sysconfig/network-scripts/ifcfg-eth2
        ONBOOT=no
        # ethtool eth2 | tail -n1
                Link detected: yes

    I think "Link detected" should be "no".
--------------------

Signed-off-by: Naohiro Ooiwa <nooiwa@miraclelinux.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: xmit sctp packet always return no route error

Commit 'net: skb->dst accessors'(adf30907d63893e4208dfe3f5c88ae12bc2f25d5)
broken the sctp protocol stack, the sctp packet can never be sent out after
Eric Dumazet's patch, which have typo in the sctp code.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Vlad Yasevich <vladisalv.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/irda: convert bfin_sir to net_device_ops

Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: use xfrm_addr_cmp() instead of compare addresses directly

Clean up to use xfrm_addr_cmp() instead of compare addresses directly.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Do not tack on TSO data to non-TSO packet

If a socket starts out on a non-TSO route, and then switches to
a TSO route, then we will tack on data to the tail of the tx queue
even if it started out life as non-TSO. This is suboptimal because
all of it will then be copied and checksummed unnecessarily.

This patch fixes this by ensuring that skb->ip_summed is set to
CHECKSUM_PARTIAL before appending extra data beyond the MSS.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Stop non-TSO packets morphing into TSO

If a socket starts out on a non-TSO route, and then switches to
a TSO route, then the tail on the tx queue can morph into a TSO
packet, causing mischief because the rest of the stack does not
expect a partially linear TSO packet.

This patch fixes this by ensuring that skb->ip_summed is set to
CHECKSUM_PARTIAL before declaring a packet as TSO.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lowpan/lowpan

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6

nl802154: add module license and description

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>

nl802154: fix Oops in ieee802154_nl_get_dev

ieee802154_nl_get_dev() lacks check for the existance of the device
that was returned by dev_get_XXX, thus resulting in Oops for non-existing
devices. Fix it.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>

MAINTAINERS: ieee802154 lists are moderated for non-subscribers.

Note that our mailing list is moderated for non-subscribers.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>

netfilter: xtables: conntrack match revision 2

As reported by Philip, the UNTRACKED state bit does not fit within
the 8-bit state_mask member. Enlarge state_mask and give status_mask
a few more bits too.

Reported-by: Philip Craig <philipc@snapgear.com>
References: http://markmail.org/thread/b7eg6aovfh4agyz7
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: headers_check fix: linux/netfilter/xt_osf.h

fix the following 'make headers_check' warnings:

usr/include/linux/netfilter/xt_osf.h:40: found __[us]{8,16,32,64} type without #include <linux/types.h>

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: tcp conntrack: fix unacknowledged data detection with NAT

When NAT helpers change the TCP packet size, the highest seen sequence
number needs to be corrected. This is currently only done upwards, when
the packet size is reduced the sequence number is unchanged. This causes
TCP conntrack to falsely detect unacknowledged data and decrease the
timeout.

Fix by updating the highest seen sequence number in both directions after
packet mangling.

Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>

be2net: Fix to avoid a crash seen on PPC with LRO and Jumbo frames.

While testing the driver on PPC, we ran into a crash with LRO, Jumbo frames.
With CONFIG_PPC_64K_PAGES configured (a default in PPC), MAX_SKB_FRAGS drops to 3 and we were crossing the array limits on skb_shinfo(skb)->frags[].
Now we coalesce the frags from the same physical page into one slot in
skb_shinfo(skb)->frags[] and go to the next index when the frag is from

different physical page.

This patch is against the net-2.6 tree.

Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gro: Flush GRO packets in napi_disable_pending path

When NAPI is disabled while we're in net_rx_action, we end up
calling __napi_complete without flushing GRO packets. This is
a bug as it would cause the GRO packets to linger, of course it
also literally BUGs to catch error like this :)

This patch changes it to napi_complete, with the obligatory IRQ
reenabling. This should be safe because we've only just disabled
IRQs and it does not materially affect the test conditions in
between.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: Call skb_orphan before tproxy activates

As transparent proxying looks up the socket early and assigns
it to the skb for later processing, we must drop any existing
socket ownership prior to that in order to distinguish between
the case where tproxy is active and where it is not.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac80211: Use rcu_barrier() on unload.

The mac80211 module uses rcu_call() thus it should use rcu_barrier()
on module unload.

The rcu_barrier() is placed in mech.c ieee80211_stop_mesh() which is
invoked from ieee80211_stop() in case vif.type == NL80211_IFTYPE_MESH_POINT.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

sunrpc: Use rcu_barrier() on unload.

The sunrpc module uses rcu_call() thus it should use rcu_barrier() on
module unload.

Have not verified that the possibility for new call_rcu() callbacks
has been disabled. As a hint for checking, the functions calling
call_rcu() (unx_destroy_cred and generic_destroy_cred) are
registered as crdestroy function pointer in struct rpc_credops.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Use rcu_barrier() instead of syncronize_net() on unload.

When unloading modules that uses call_rcu() callbacks, then we must
use rcu_barrier(). This module uses syncronize_net() which is not
enough to be sure that all callback has been completed.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Use rcu_barrier() on module unload.

The ipv6 module uses rcu_call() thus it should use rcu_barrier() on
module unload.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

decnet: Use rcu_barrier() on module unload.

The decnet module unloading as been disabled with a '#if 0' statement,
because it have had issues.

We add a rcu_barrier() anyhow for correctness.

The maintainer (Chrissie Caulfield) will look into the unload issue
when time permits.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Chrissie Caulfield <christine.caulfield@googlemail.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

sky2: Fix checksum endianness

sky2 driver on PowerPC targets floods kernel log with following errors:

  eth1: hw csum failure.
  Call Trace:
  [ef84b8a0] [c00075e4] show_stack+0x50/0x160 (unreliable)
  [ef84b8d0] [c02fa178] netdev_rx_csum_fault+0x3c/0x5c
  [ef84b8f0] [c02f6920] __skb_checksum_complete_head+0x7c/0x84
  [ef84b900] [c02f693c] __skb_checksum_complete+0x14/0x24
  [ef84b910] [c0337e08] tcp_v4_rcv+0x4c8/0x6f8
  [ef84b940] [c031a9c8] ip_local_deliver+0x98/0x210
  [ef84b960] [c031a788] ip_rcv+0x38c/0x534
  [ef84b990] [c0300338] netif_receive_skb+0x260/0x36c
  [ef84b9c0] [c025de00] sky2_poll+0x5dc/0xcf8
  [ef84ba20] [c02fb7fc] net_rx_action+0xc0/0x144

The NIC is Yukon-2 EC chip revision 1.

Converting checksum field from le16 to CPU byte order fixes the issue.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mdio add missing GPL flag

Add missing GPL flag and description.

mdio: module license 'unspecified' taints kernel.
Disabling lock debugging due to kernel taint

Signed-off-by: Nicolas Reinecke <nr <at> das-labor.org>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: remove redundant test on unsigned

Unsigned boguscnt cannot be less than 0.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fsl_pq_mdio: Fix fsl_pq_mdio to work with modules

This patch fixes the case when ucc_geth or gianfar are compiled
as modules. Without this patch the call to phy_connect() fails.

Signed-off-by: Ionut Nicu <ionut.nicu@freescale.com>
Acked-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: avoid wraparound for expired preferred lifetime

Avoid showing wrong high values when the preferred lifetime of an address
is expired.

Signed-off-by: Jens Rosenboom <me@jayr.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: missing check ACK flag of received segment in FIN-WAIT-2 state

RFC0793 defined that in FIN-WAIT-2 state if the ACK bit is off drop
the segment and return[Page 72]. But this check is missing in function
tcp_timewait_state_process(). This cause the segment with FIN flag but
no ACK has two diffent action:

Case 1:
    Node A                      Node B
              <-------------    FIN,ACK
                                (enter FIN-WAIT-1)
    ACK       ------------->
                                (enter FIN-WAIT-2)
    FIN       ------------->    discard
                                (move sk to tw list)

Case 2:
    Node A                      Node B
              <-------------    FIN,ACK
                                (enter FIN-WAIT-1)
    ACK       ------------->
                                (enter FIN-WAIT-2)
                                (move sk to tw list)
    FIN       ------------->

              <-------------    ACK

This patch fixed the problem.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nf_conntrack: Use rcu_barrier()

RCU barriers, rcu_barrier(), is inserted two places.

In nf_conntrack_expect.c nf_conntrack_expect_fini() before the
kmem_cache_destroy().  Firstly to make sure the callback to the
nf_ct_expect_free_rcu() code is still around.  Secondly because I'm
unsure about the consequence of having in flight
nf_ct_expect_free_rcu/kmem_cache_free() calls while doing a
kmem_cache_destroy() slab destroy.

And in nf_conntrack_extend.c nf_ct_extend_unregister(), inorder to
wait for completion of callbacks to __nf_ct_ext_free_rcu(), which is
invoked by __nf_ct_ext_add().  It might be more efficient to call
rcu_barrier() in nf_conntrack_core.c nf_conntrack_cleanup_net(), but
thats make it more difficult to read the code (as the callback code
in located in nf_conntrack_extend.c).

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Patrick McHardy <kaber@trash.net>

atl1*: add device_set_wakeup_enable to atl1*_set_wol

Tell PCI core that atl1* device can wakeup the system when WOL is
enabled by calling device_set_wakeup_enable.

Joerg noted that his atl1e device WOL fine after enabling it with
ethtool and changing /sys/class/net/eth0/device/power/wakeup to enabled
Tested on atl1e: https://bugzilla.novell.com/show_bug.cgi?id=493214

Tested by: Joerg Reuter <jreuter@novell.com>
Signed-off-by: Brandon Philips <bphilips@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: generate Netlink RTM_DELADDR when destroying a device

Netlink address deletion events were not sent when a network device
vanished neither when Phonet was unloaded.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: publicize the Netlink notification function

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "veth: prevent oops caused by netdev destructor"

This reverts commit ae0e8e82205c903978a79ebf5e31c670b61fa5b4.

This change had two problems:

1) Since it frees the stats in the drivers' close method, we
can OOPS in the transmit routine.

2) stats are no longer remembered across ifdown/ifup which
disagrees with how every other device operates.

Thanks to analysis and test patch from Serge E. Hallyn
and initial OOPS report by Sachin Sant.

Signed-off-by: David S. Miller <davem@davemloft.net>

cpmac: fix compilation failure introduced with netdev_ops conversion

This patch fixes and obvious typo in the netdev_ops initialization:
ndo_so_ioctl should be ndo_do_ioctl.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipsec: Fix name of CAST algorithm

Our CAST algorithm is called cast5, not cast128. Clearly nobody
has ever used it :)

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Linux 2.6.31-rc1

Revert "PCI: use ACPI _CRS data by default"

This reverts commit 9e9f46c44e487af0a82eb61b624553e2f7118f5b.

Quoting from the commit message:

"At this point, it seems to solve more problems than it causes, so let's
try using it by default. It's an easy revert if it ends up causing
trouble."

And guess what? The _CRS code causes trouble.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.infradead.org/battery-2.6

* git://git.infradead.org/battery-2.6:
  da9030_battery: Fix race between event handler and monitor
  Add MAX17040 Fuel Gauge driver
  w1: ds2760_battery: add support for sleep mode feature
  w1: ds2760: add support for EEPROM read and write
  ds2760_battery: cleanups in ds2760_battery_probe()

Merge branches 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/{vfs-2.6,audit-current}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  another race fix in jfs_check_acl()
  Get "no acls for this inode" right, fix shmem breakage
  inline functions left without protection of ifdef (acl)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  audit: inode watches depend on CONFIG_AUDIT not CONFIG_AUDIT_SYSCALL

another race fix in jfs_check_acl()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Get "no acls for this inode" right, fix shmem breakage

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

audit: inode watches depend on CONFIG_AUDIT not CONFIG_AUDIT_SYSCALL

Even though one cannot make use of the audit watch code without
CONFIG_AUDIT_SYSCALL the spaghetti nature of the audit code means that
the audit rule filtering requires that it at least be compiled.

Thus build the audit_watch code when we build auditfilter like it was
before cfcad62c74abfef83762dc05a556d21bdf3980a2

Clearly this is a point of potential future cleanup..

Reported-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

inline functions left without protection of ifdef (acl)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge branch 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: Fix the write access fault problem for real

futex: Fix the write access fault problem for real

commit 64d1304a64 (futex: setup writeable mapping for futex ops which
modify user space data) did address only half of the problem of write
access faults.

The patch was made on two wrong assumptions:

1) access_ok(VERIFY_WRITE,...) would actually check write access.

   On x86 it does _NOT_. It's a pure address range check.

2) a RW mapped region can not go away under us.

   That's wrong as well. Nobody can prevent another thread to call
   mprotect(PROT_READ) on that region where the futex resides. If that
   call hits between the get_user_pages_fast() verification and the
   actual write access in the atomic region we are toast again.

The solution is to not rely on access_ok and get_user() for any write
access related fault on private and shared futexes. Instead we need to
fault it in with verification of write access.

There is no generic non destructive write mechanism which would fault
the user page in trough a #PF, but as we already know that we will
fault we can as well call get_user_pages() directly and avoid the #PF
overhead.

If get_user_pages() returns -EFAULT we know that we can not fix it
anymore and need to bail out to user space.

Remove a bunch of confusing comments on this issue as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org

SLUB: Don't pass __GFP_FAIL for the initial allocation

SLUB uses higher order allocations by default but falls back to small
orders under memory pressure. Make sure the GFP mask used in the initial
allocation doesn't include __GFP_NOFAIL.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>