git.karo-electronics.de Git - mv-sheeva.git/log

udp: introduce sk_for_each_rcu_safenext()

Corey Minyard found a race added in commit 271b72c7fa82c2c7a795bc16896149933110672d
(udp: RCU handling for Unicast packets.)

"If the socket is moved from one list to another list in-between the
time the hash is calculated and the next field is accessed, and the
socket has moved to the end of the new list, the traversal will not
complete properly on the list it should have, since the socket will
be on the end of the new list and there's not a way to tell it's on a
new list and restart the list traversal. I think that this can be
solved by pre-fetching the "next" field (with proper barriers) before
checking the hash."

This patch corrects this problem, introducing a new
sk_for_each_rcu_safenext() macro.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: udp_get_next() should use spin_unlock_bh()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: calculate udp_mem based on low memory instead of all memory

This patch mimics commit 57413ebc4e0f1e471a3b4db4aff9a85c083d090e
(tcp: calculate tcp_mem based on low memory instead of all memory)

The udp_mem array which contains limits on the total amount of memory
used by UDP sockets is calculated based on nr_all_pages. On a 32 bits
x86 system, we should base this on the number of lowmem pages.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: RCU handling for Unicast packets.

Goals are :

1) Optimizing handling of incoming Unicast UDP frames, so that no memory
writes should happen in the fast path.

Note: Multicasts and broadcasts still will need to take a lock,
because doing a full lockless lookup in this case is difficult.

2) No expensive operations in the socket bind/unhash phases :
  - No expensive synchronize_rcu() calls.

  - No added rcu_head in socket structure, increasing memory needs,
  but more important, forcing us to use call_rcu() calls,
  that have the bad property of making sockets structure cold.
  (rcu grace period between socket freeing and its potential reuse
   make this socket being cold in CPU cache).
  David did a previous patch using call_rcu() and noticed a 20%
  impact on TCP connection rates.
  Quoting Cristopher Lameter :
   "Right. That results in cacheline cooldown. You'd want to recycle
    the object as they are cache hot on a per cpu basis. That is screwed
    up by the delayed regular rcu processing. We have seen multiple
    regressions due to cacheline cooldown.
    The only choice in cacheline hot sensitive areas is to deal with the
    complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."

  - Because udp sockets are allocated from dedicated kmem_cache,
  use of SLAB_DESTROY_BY_RCU can help here.

Theory of operation :
---------------------

As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
special attention must be taken by readers and writers.

Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
reused, inserted in a different chain or in worst case in the same chain
while readers could do lookups in the same time.

In order to avoid loops, a reader must check each socket found in a chain
really belongs to the chain the reader was traversing. If it finds a
mismatch, lookup must start again at the begining. This *restart* loop
is the reason we had to use rdlock for the multicast case, because
we dont want to send same message several times to the same socket.

We use RCU only for fast path.
Thus, /proc/net/udp still takes spinlocks.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: introduce struct udp_table and multiple spinlocks

UDP sockets are hashed in a 128 slots hash table.

This hash table is protected by *one* rwlock.

This rwlock is readlocked each time an incoming UDP message is handled.

This rwlock is writelocked each time a socket must be inserted in
hash table (bind time), or deleted from this table (close time)

This is not scalable on SMP machines :

1) Even in read mode, lock() and unlock() are atomic operations and
must dirty a contended cache line, shared by all cpus.

2) A writer might be starved if many readers are 'in flight'. This can
happen on a machine with some NIC receiving many UDP messages. User
process can be delayed a long time at socket creation/dismantle time.

This patch prepares RCU migration, by introducing 'struct udp_table
and struct udp_hslot', and using one spinlock per chain, to reduce
contention on central rwlock.

Introducing one spinlock per chain reduces latencies, for port
randomization on heavily loaded UDP servers. This also speedup
bindings to specific ports.

udp_lib_unhash() was uninlined, becoming to big.

Some cleanups were done to ease review of following patch
(RCUification of UDP Unicast lookups)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: remove NIP6(), NIP6_FMT, NIP6_SEQFMT and final users

Open code NIP6_FMT in the one call inside sscanf and one user
of NIP6() that could use %p6 in the netfilter code.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

uwb: use the %pM formatting specifier in eda.c

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

infiniband: remove IPOIB_GID_RAW_ARG, IPOIB_GID_ARG, IPOIB_GID_FMT

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

infiniband: ipoib replace IPOIB_GID_FMT with %p6

Replace all uses of IPOIB_GID_FMT, IPOIB_GID_RAW_ARG() and IPOIB_GID_ARG()

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

infiniband: use %p6 for printing message ids

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: propogate ethtool speed values

This enables more ethtool information. The speed and settings of the
underlying device are propagated up. This makes services like SNMP that
use ethtool to get speed setting, work when managing a vlan, without adding
silly heurtistics into SNMP daemon.

For the driver info, just use existing driver strings.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

veth: remove unused list

The veth network device is stored in a list in the netdev private.
AFAICS, this list is never used so I removed this list from the code.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

veth: Remove useless veth field

The veth private structure contains a netdev pointer refering to its peer.
This field is never used and it is pointless because if we can access,
the veth_priv, that means we already have the netdev which is stored
in veth_priv->dev.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net, misc: replace uses of NIP6_FMT with %p6

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: replace uses of NIP6_FMT with %p6

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: replace uses of NIP6_FMT with %p6

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

misc: replace NIP6_FMT with %p6 format specifier

The iscsi_ibft.c changes are almost certainly a bugfix as the
pointer 'ip' is a u8 *, so they never print the last 8 bytes
of the IPv6 address, and the eight bytes they do print have
a zero byte with them in each 16-bit word.

Other than that, this should cause no difference in functionality.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: replace all current users of NIP6_SEQFMT with %#p6

The define in kernel.h can be done away with at a later time.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

printk: add %p6 format specifier for IPv6 addresses

Takes a pointer to a IPv6 address and formats it in the usual
colon-separated hex format:
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx

Each 16 bit word is printed in network-endian byteorder.

%#p6 is also supported and will omit the colons.

%p6 is a replacement for NIP6_FMT and NIP6()
%#p6 is a replacement for NIP6_SEQFMT and NIP6()

Note that NIP6() took a struct in6_addr whereas this takes a pointer
to a struct in6_addr.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: Notify changes in UDP encapsulation via netlink

Add new_mapping() implementation to the netlink xfrm_mgr to notify
address/port changes detected in UDP encapsulated ESP packets.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: don't use INIT_RCU_HEAD

call_rcu() will unconditionally rewrite RCU head anyway.
Applies to
struct neigh_parms
struct neigh_table
struct net
struct cipso_v4_doi
struct in_ifaddr
struct in_device
rt->u.dst

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: reduce structures when XFRM=n

ifdef out
* struct sk_buff::sp (pointer)
* struct dst_entry::xfrm (pointer)
* struct sock::sk_policy (2 pointers)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: constify struct nlattr * arg to parsing functions

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: Coexist with the sysfs limitations v2

To make testing of the network namespace simpler allow
the network namespace code and the sysfs code to be
compiled and run at the same time.  To do this only
virtual devices are allowed in the additional network
namespaces and those virtual devices are not placed
in the kobject tree.

Since virtual devices don't actually do anything interesting
hardware wise that needs device management there should
be no loss in keeping them out of the kobject tree and
by implication sysfs.  The gain in ease of testing
and code coverage should be significant.

Changelog:

v2: As pointed out by Benjamin Thery it only makes sense to call
    device_rename in the initial network namespace for now.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: convert more to %pM

A number of places still use %02x:...:%02x because it's
in debug statements or for no real reason. Make a few
of them use %pM.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: convert print_mac to %pM

This converts pretty much everything to print_mac. There were
a few things that had conflicts which I have just dropped for
now, no harm done.

I've built an allyesconfig with this and looked at the files
that weren't built very carefully, but it's a huge patch.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac80211: convert to %pM away from print_mac

Also remove a few stray DECLARE_MAC_BUF that were no longer
used at all.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

printk: add %pM format specifier for MAC addresses

Add format specifiers for printing out six colon-separated bytes:

MAC addresses (%pM):
xx:xx:xx:xx:xx:xx

%#pM is also supported and omits the colon separators.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: implement emergency route cache rebulds when gc_elasticity is exceeded

This is a patch to provide on demand route cache rebuilding.  Currently, our
route cache is rebulid periodically regardless of need.  This introduced
unneeded periodic latency.  This patch offers a better approach.  Using code
provided by Eric Dumazet, we compute the standard deviation of the average hash
bucket chain length while running rt_check_expire.  Should any given chain
length grow to larger that average plus 4 standard deviations, we trigger an
emergency hash table rebuild for that net namespace.  This allows for the common
case in which chains are well behaved and do not grow unevenly to not incur any
latency at all, while those systems (which may be being maliciously attacked),
only rebuild when the attack is detected.  This patch take 2 other factors into
account:
1) chains with multiple entries that differ by attributes that do not affect the
hash value are only counted once, so as not to unduly bias system to rebuilding
if features like QOS are heavily used
2) if rebuilding crosses a certain threshold (which is adjustable via the added
sysctl in this patch), route caching is disabled entirely for that net
namespace, since constant rebuilding is less efficient that no caching at all

Tested successfully by me.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  firewire: fw-sbp2: fix races
  firewire: fw-sbp2: delay first login to avoid retries
  firewire: fw-ohci: initialization failure path fixes
  firewire: fw-ohci: don't leak dma memory on module removal
  firewire: fix struct fw_node memory leak
  firewire: Survive more than 256 bus resets

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: ASoC: Blackfin: update SPORT0 port selector (v2)
  ALSA: hda - Restore default pin configs for realtek codecs
  sound: use a common working email address
  pci: use pci_ioremap_bar() in sound/

Merge branches 'topic/fix/asoc', 'topic/fix/hda', 'topic/fix/misc' and 'topic/pci-ioremap-bar' into for-linus

ALSA: ASoC: Blackfin: update SPORT0 port selector (v2)

- Setting the TFS pin selector for SPORT 0 based on whether the selected
  port id F or G. If the port is F then no conflict should exist for the
  TFS. When Port G is selected and EMAC then there is a conflict between
  the PHY interrupt line and TFS.  Current settings prevent the conflict
  by ignoring the TFS pin when Port G is selected. This allows both
  ssm2602 using Port G and EMAC concurrently.

- some code cleanup

Signed-off-by: Cliff Cai <cliff.cai@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda - Restore default pin configs for realtek codecs

Some machines have broken BIOS resume that doesn't restore the default
pin configuration properly, which results in a wrong detection of HP
pin. This causes a silent speaker output due to missing HP detection.
Related bug: Novell bug#406101
https://bugzilla.novell.com/show_bug.cgi?id=406101

This patch fixes the issue by saving/restoring the default pin configs
by the driver itself.

Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  syncookies: fix inclusion of tcp options in syn-ack
  libertas: free sk_buff with kfree_skb
  btsdio: free sk_buff with kfree_skb
  Phonet: do not reply to indication reset packets
  Phonet: include generic link-layer header size in MAX_PHONET_HEADER

Switch to a valid email address...

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Tidy up addresses in random drivers

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

leds: da903x: fix the building failure of incomplete type of 'work'

The leds-da903x LED driver was missing the proper #include of
linux/workqueue.h, but happened to compile on ARM due to implied
includes through other header files.

We do need the explict include on other architectures (reported at least
for x86-64).

Reported-tested-and-acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Eric Miao <eric.miao@marvell.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sound: use a common working email address

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

syncookies: fix inclusion of tcp options in syn-ack

David Miller noticed that commit
33ad798c924b4a1afad3593f2796d465040aadd5 '(tcp: options clean up')
did not move the req->cookie_ts check.
This essentially disabled commit 4dfc2817025965a2fc78a18c50f540736a6b5c24
'[Syncookies]: Add support for TCP options via timestamps.'.

This restores the original logic.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

libertas: free sk_buff with kfree_skb

free sk_buff with kfree_skb, instead of kree

Signed-off-by: Sergio Luis <sergio@larces.uece.br>
Signed-off-by: David S. Miller <davem@davemloft.net>

btsdio: free sk_buff with kfree_skb

free sk_buff with kfree_skb, instead of kree

Signed-off-by: Sergio Luis <sergio@larces.uece.br>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: do not reply to indication reset packets

This fixes a potential error packet loop.

Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: include generic link-layer header size in MAX_PHONET_HEADER

This fixes an OOPS in hard_header if a Phonet address is assigned to a
non-Phonet network interface.

Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://neil.brown.name/md

* 'for-linus' of git://neil.brown.name/md:
  md: allow extended partitions on md devices.
  md: use sysfs_notify_dirent to notify changes to md/dev-xxx/state
  md: use sysfs_notify_dirent to notify changes to md/array_state

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: psmouse - add support for Elantech touchpads
Input: i8042 - add Blue FB5601 to noloop exception table

Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd

* 'for-linus' of git://git.o-hand.com/linux-mfd:
mfd: Make WM8400 depend on I2C until SPI is submitted
mfd: add missing Kconfig entry for da903x

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/dvrabel/uwb

* 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/dvrabel/uwb:
uwb: build UWB before USB/WUSB

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: Add support for Sony Vaio VGX-TP1E
  HID: fix lock imbalance in hiddev
  HID: fix lock imbalance in hidraw
  HID: fix hidbus/appletouch device binding regression
  HID: add hid_type to general hid struct
  HID: quirk for OLED devices present in ASUS G50/G70/G71
  HID: Remove "default m" for Thrustmaster and Zeroplus
  HID: fix hidraw_exit section mismatch
  HID: add support for another Gyration remote control
  Revert "HID: Invert HWHEEL mappings for some Logitech mice"

docbooks: fix fatal filename errors

Fix docbook fatal errors (file location changed):

docproc: lin2628-rc1/include/asm-x86/io_32.h: No such file or directory
make[1]: *** [Documentation/DocBook/deviceiobook.xml] Error 1

docproc: lin2628-rc1/include/asm-x86/atomic_32.h: No such file or directory
make[1]: *** [Documentation/DocBook/kernel-api.xml] Error 1

docproc: lin2628-rc1/include/asm-x86/mca_dma.h: No such file or directory
make[1]: *** [Documentation/DocBook/mcabook.xml] Error 1

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel-doc: allow more whitespace in macros

Allow macros that are annotated with kernel-doc to contain whitespace
between the '#' and "define". It's valid and being used, so allow it.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  leds-hp-disk: fix build warning
  ACPI: Oops in ACPI with git latest
  ACPI suspend: build fix for ACPI_SLEEP=n && XEN_SAVE_RESTORE=y.
  toshiba_acpi: always call input_sync() after input_report_switch()
  ACPI: Always report a sync event after a lid state change
  ACPI: cpufreq, processor: fix compile error in drivers/acpi/processor_perflib.c
  i7300_idle: Fix compile warning CONFIG_I7300_IDLE_IOAT_CHANNEL not defined
  i7300_idle: Cleanup based review comments
  i7300_idle: Disable ioat channel only on platforms where ile driver can load

Linux 2.6.28-rc2

.. fix all the worst problems in -rc1

m68k: Disable Amiga serial console support if modular

If CONFIG_AMIGA_BUILTIN_SERIAL=m, I get the following warnings:

| drivers/char/amiserial.c: At top level:
| drivers/char/amiserial.c:2138: warning: data definition has no type or storage class
| drivers/char/amiserial.c:2138: warning: type defaults to 'int' in declaration of 'console_initcall'
| drivers/char/amiserial.c:2138: warning: parameter names (without types) in function declaration
| drivers/char/amiserial.c:2134: warning: 'amiserial_console_init' defined but not used

because console_initcall() is not defined (nor really sensible) in the
modular case.

So disable serial console support if the driver is modular.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

epoll: avoid double-inserts in case of EFAULT

In commit f337b9c58332bdecde965b436e47ea4c94d30da0 ("epoll: drop
unnecessary test") Thomas found that there is an unnecessary (always
true) test in ep_send_events().  The callback never inserts into
->rdllink while the send loop is performed, and also does the
~EP_PRIVATE_BITS test.  Given we're holding the mutex during this time,
the conditions tested inside the loop are always true.

HOWEVER.

The test "!ep_is_linked(&epi->rdllink)" wasn't there because we insert
into ->rdllink, but because the send-events loop might terminate before
the whole list is scanned (-EFAULT).

In such cases, when the loop terminates early, and when a (leftover)
file received an event while we're performing the lockless loop, we need
such test to avoid to double insert the epoll items.  The list_splice()
done a few steps below, will correctly re-insert the ones that were left
on "txlist".

This should fix the kenrel.org bugzilla entry 11831.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

select: deal with math overflow from borderline valid userland data

Some userland apps seem to pass in a "0" for the seconds, and several
seconds worth of usecs to select(). The old kernels accepted this just
fine, so the new kernels must too.

However, due to the upscaling of the microseconds to nanoseconds we had
some cases where we got math overflow, and depending on the GCC version
(due to inlining decisions) that actually resulted in an -EINVAL return.

This patch fixes this by adding the excess microseconds to the seconds
field.

Also with thanks to Marcin Slusarz for spotting some implementation bugs
in the diagnostics patches.

Reported-by: Carlos R. Mafra <crmafra2@gmail.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

wireless: fix regression caused by regulatory config option

The default for the regulatory compatibility option is wrong;
if you picked the default you ended up with a non-functional wifi
system (at least I did on Fedora 9 with iwl4965).
I don't think even the October 2008 releases of the various distros
has the new userland so clearly the default is wrong, and also
we can't just go about deleting this in 2.6.29...

Change the default to "y" and also adjust the config text a little to
reflect this.

This patch fixes regression #11859

With thanks to Johannes Berg for the diagnostics

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroup: remove unused variable

/scratch/sfr/next/kernel/cgroup.c: In function 'cgroup_tasks_start':
/scratch/sfr/next/kernel/cgroup.c:2107: warning: unused variable 'i'

Introduced in commit cc31edceee04a7b87f2be48f9489ebb72d264844 "cgroups:
convert tasks file to use a seq_file with shared pid array".

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6

* 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
  hwmon: (abituguru3) enable DMI probing feature on AW9D-MAX
  hwmon: (abituguru3) Cosmetic whitespace fixes
  hwmon: (adt7473) Fix voltage conversion routines
  hwmon: (lm90) Add support for the LM99 16 degree offset
  hwmon: (lm90) Fix handling of hysteresis value
  hwmon-vid: Add support for AMD family 10h CPUs
  hwmon: (w83781d) Fix linking when built-in

r8169: revert "read MAC address from EEPROM on init"

This reverts commit 7bf6bf4803df1adc927f585168d2135fb019c698.

The code has both a short existence and an increasing track of failures
despite some work to amend it for -rc1. It is not just a matter of
reading the eeprom: sometimes the eeprom is read correctly, then the mac
address is not written correctly back into the mac registers.

Some chipsets seem to work reliably but it is not clear at this point if
the code can simply be made to work on a per-chipset basis and post -rc1
is not the place where I want to experiment these things.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

arm ide breakage

a) semicolon before the function body is a bad idea
b) it's const struct foo, not struct const foo
c) incidentally, it's ecard_remove_driver(), not ecard_unregister_driver()
d) compiling is occasionally useful.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fix allmodconfig breakage

If you use KCONFIG_ALLCONFIG (even with empty file) you get broken
allmodconfig/allyesconfig; CONFIG_MODULES gets turned off, with obvious
massive fallout.

Breakage had been introduced when conf_set_all_new_symbols() got used
for allmodconfig et.al.

What happens is that sym_calc_value(modules_sym) done in
conf_read_simple() sets SYMBOL_VALID on both modules_sym and MODULES.
When we get to conf_set_all_new_symbols(), we set sym->def[S_DEF_USER]
on everything, but it has no effect on sym->curr for the symbols that
already have SYMBOL_VALID - these are stuck.

Solution: use sym_clear_all_valid() in there. Note that it makes
reevaluation of modules_sym redundant - sym_clear_all_valid() will do
that itself.

[ Fixes http://bugzilla.kernel.org/show_bug.cgi?id=11512, says Alexey ]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hwmon: (abituguru3) enable DMI probing feature on AW9D-MAX

Switch the AW9D-MAX over from port probing to the preferred DMI
probe method.

Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk>
Tested-by: Justin Piszcz <jpiszcz@lucidpixels.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

hwmon: (abituguru3) Cosmetic whitespace fixes

As the probable result of zealous copy/pasting, many supported boards
contain sensor names with trailing whitespace. Though this is not a
huge problem, it is inconsistent with other sensor names, and with
other similar hwmon drivers.

Additionally, the DMI nag message added in 2.6.27 was missing a
space between two sentence fragments -- might as well clean that up
too.

Doesn't alter any kernel text, just data.

Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk>
Reported-by: Justin Piszcz <jpiszcz@lucidpixels.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

hwmon: (adt7473) Fix voltage conversion routines

Fix voltage conversion routines. Based on an earlier patch from
Paulius Zaleckas.

According to the datasheet voltage is scaled with resistors and
value 192 is nominal voltage. 0 is 0V.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Paulius Zaleckas <paulius.zaleckas@teltonika.lt>
Cc: Darrick J. Wong <djwong@us.ibm.com>

hwmon: (lm90) Add support for the LM99 16 degree offset

The LM99 differs from the LM86, LM89 and LM90 in that it reports
remote temperatures (temp2) 16 degrees lower than they really are. So
far we have been cheating and handled this in userspace but it really
should be handled by the driver directly.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>

hwmon: (lm90) Fix handling of hysteresis value

There are several problems in the way the hysteresis value is handled
by the lm90 driver:

* In show_temphyst(), specific handling of the MAX6646 is missing, so
  the hysteresis is reported incorrectly if the critical temperature
  is over 127 degrees C.
* In set_temphyst(), the new hysteresis register value is written to
  the chip but data->temp_hyst isn't updated accordingly, so there is
  a short period of time (up to 2 seconds) where the old hystereris
  value will be returned while the new one is already active.
* In set_temphyst(), the critical temperature which is used as a base
  to compute the value of the hysteresis register lacks
  device-specific handling. As a result, the value of the hysteresis
  register might be incorrect for the ADT7461 and MAX6646 chips.

Fix these 3 bugs.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Nate Case <ncase@xes-inc.com>

hwmon-vid: Add support for AMD family 10h CPUs

The AMD family 10h CPUs use the same VID decoding table as the family
0Fh CPUs.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Rudolf Marek <r.marek@assembler.cz>

hwmon: (w83781d) Fix linking when built-in

When w83781d is built-in, the final links fails with the following vague error
message:

`.exit.text' referenced in section `.init.text' of drivers/built-in.o: defined
in discarded section `.exit.text' of drivers/built-in.o

w83781d_isa_unregister() cannot be marked __exit, as it's also called from
sensors_w83781d_init(), which is marked __init.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

firewire: fw-sbp2: fix races

1: There is a small race between queue_delayed_work() and its
   corresponding kref_get().  Do the kref_get first, and _put it again
   if the queue_delayed_work() failed, so there is no chance of the
   kref going to zero while the work is scheduled.
2: An SBP2_LOGOUT_REQUEST could be sent out with a login_id full of
   garbage.  Initialize it to an invalid value so we can tell if we
   ever got a valid login_id.
3: The node ID and generation may have changed but the new values may
   not yet have been recorded in lu and tgt when the final logout is
   attempted.  Use the latest values from the device in
   sbp2_release_target().

Signed-off-by: Jay Fenlason <fenlason@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: fw-sbp2: delay first login to avoid retries

This optimizes firewire-sbp2's device probe for the case that the local
node and the SBP-2 node were discovered at the same time.  In this case,
fw-core's bus management work and fw-sbp2's login and SCSI probe work
are scheduled in parallel (in the globally shared workqueue and in
fw-sbp2's workqueue, respectively).  The bus reset from fw-core may then
disturb and extremely delay the login and SCSI probe because the latter
fails with several command timeouts and retries and has to be retried
from scratch.

We avoid this particular situation of sbp2_login() and fw_card_bm_work()
running in parallel by delaying the first sbp2_login() a little bit.

This is meant to be a short-term fix for
https://bugzilla.redhat.com/show_bug.cgi?id=466679.  In the long run,
the SCSI probe, i.e. fw-sbp2's call of __scsi_add_device(), should be
parallelized with sbp2_reconnect().

Problem reported and fix tested and confirmed by Alex Kanavin.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: fw-ohci: initialization failure path fixes

Fix leaks when pci_probe fails. Simplify error log strings.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: fw-ohci: don't leak dma memory on module removal

The transmit and receive context dma memory was not being freed on
module removal. Neither was the config rom memory. Fix that.

The ab->next assignment is pure paranoia.

Signed-off-by: Jay Fenlason <fenlason@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: fix struct fw_node memory leak

With the bus_resets patch applied, it is easy to see this memory leak
by repeatedly resetting the firewire bus while running slabtop in
another window. Just watch kmalloc-32 grow and grow...

Signed-off-by: Jay Fenlason <fenlason@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: Survive more than 256 bus resets

The "color" is used during the topology building after a bus reset,
hovever in "struct fw_node"s it is stored in a u8, but in struct fw_card
it is stored in an int. When the value wraps in one struct, but not
the other, disaster strikes.

Signed-off-by: Jay Fenlason <fenlason@redhat.com>
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10922.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Fix duplicate entries returned from getdents() system call
ext3: Fix duplicate entries returned from getdents() system call

Revert "Call init_workqueues before pre smp initcalls."

This reverts commit a802dd0eb5fc97a50cf1abb1f788a8f6cc5db635 by moving
the call to init_workqueues() back where it belongs - after SMP has been
initialized.

It also moves stop_machine_init() - which needs workqueues - to a later
phase using a core_initcall() instead of early_initcall(). That should
satisfy all ordering requirements, and was apparently the reason why
init_workqueues() was moved to be too early.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ext4: Fix duplicate entries returned from getdents() system call

Fix a regression caused by commit d0156417, "ext4: fix ext4_dx_readdir
hash collision handling", where deleting files in a large directory
(requiring more than one getdents system call), results in some
filenames being returned twice. This was caused by a failure to
update info->curr_hash and info->curr_minor_hash, so that if the
directory had gotten modified since the last getdents() system call
(as would be the case if the user is running "rm -r" or "git clean"),
a directory entry would get returned twice to the userspace.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch fixes the bug reported by Markus Trippelsdorf at:
http://bugzilla.kernel.org/show_bug.cgi?id=11844

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>

ext3: Fix duplicate entries returned from getdents() system call

Fix a regression caused by commit 6a897cf4, "ext3: fix ext3_dx_readdir
hash collision handling", where deleting files in a large directory
(requiring more than one getdents system call), results in some
filenames being returned twice. This was caused by a failure to
update info->curr_hash and info->curr_minor_hash, so that if the
directory had gotten modified since the last getdents() system call
(as would be the case if the user is running "rm -r" or "git clean"),
a directory entry would get returned twice to the userspace.

This patch fixes the bug reported by Markus Trippelsdorf at:
http://bugzilla.kernel.org/show_bug.cgi?id=11844

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>

Merge branch 'i7300_idle' into release

leds-hp-disk: fix build warning

drivers/leds/leds-hp-disk.c:59: warning: passing argument 4 of ‘acpi_evaluate_integer’ from incompatible pointer type

Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: Oops in ACPI with git latest

ACPI Warning (nseval-0168): Insufficient arguments - method [_OSC] needs 5, found 4 [20080926]
ACPI Warning (nspredef-0252): \_SB_.PCI0._OSC: Parameter count mismatch - ASL declared 5, expected 4 [20080926]
ACPI Error (nspredef-0163): \_SB_.PCI0._OSC: Missing expected return value [20080926]
BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<c0237671>] acpi_run_osc+0xa1/0x170

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI suspend: build fix for ACPI_SLEEP=n && XEN_SAVE_RESTORE=y.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>

toshiba_acpi: always call input_sync() after input_report_switch()

Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: Always report a sync event after a lid state change

Currently not always an EV_SYN event is reported to userland
after the EV_SW SW_LID event has been sent. This is easy to verify
by using “input-events” from input-utils and just closing and opening
the lid.

Signed-off-by: Guillem Jover <guillem.jover@nokia.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI: cpufreq, processor: fix compile error in drivers/acpi/processor_perflib.c

When trying to build 2.6.28-rc1 on ia64, make aborts with:

  CC      drivers/acpi/processor_perflib.o
  drivers/acpi/processor_perflib.c:41:28: error: asm/cpufeature.h: No such file or directory
  drivers/acpi/processor_perflib.c: In function ‘acpi_processor_get_performance_info’:
  drivers/acpi/processor_perflib.c:364: error: implicit declaration of function ‘boot_cpu_has’
  drivers/acpi/processor_perflib.c:364: error: ‘X86_FEATURE_EST’ undeclared (first use in this function)
  drivers/acpi/processor_perflib.c:364: error: (Each undeclared identifier is reported only once
  drivers/acpi/processor_perflib.c:364: error: for each function it appears in.)
  make[2]: *** [drivers/acpi/processor_perflib.o] Error 1
  make[1]: *** [drivers/acpi] Error 2
  make: *** [drivers] Error 2

this patch fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Acked-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>

i7300_idle: Fix compile warning CONFIG_I7300_IDLE_IOAT_CHANNEL not defined

When I7300_idle driver is not configured, there is a compile time
warning about IDLE_IOAT_CHANNEL not defined. Fix it.

Reported-by: Suresh Siddha <suresh.b.siddha@intel.com>
Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

i7300_idle: Cleanup based review comments

Cleanup of i7300 idle driver based on review comments from Randy Dunlap,
Andi Kleen and Len Brown.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

i7300_idle: Disable ioat channel only on platforms where ile driver can load

Based on input from Andi Kleen:
share the platform detection code with ioat_dma and disable the channel in
dma engine only for specific platforms.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

mfd: Make WM8400 depend on I2C until SPI is submitted

Otherwise we could build in WM8400 but not I2C.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>

mfd: add missing Kconfig entry for da903x

This one was accidentally left out during the rc1 mfd merge.

Signed-off-by: Samuel Ortiz <sameo@openedhand.com>

uwb: build UWB before USB/WUSB

The WHCI-HCD driver in drivers/usb/host/ depends on the umc driver in
drivers/uwb/.

Signed-off-by: David Vrabel <david.vrabel@csr.com>

libata: fix bug with non-ncq devices

The recent commit 2fca5ccf97d2c28bcfce44f5b07d85e74e3cd18e ("libata:
switch to using block layer tagging support") to enable support for
block layer tagging in libata was broken for non-NCQ devices

The block layer initializes the tag field to -1 to detect invalid uses
of a tag, and if the libata devices does NOT support NCQ, we just used
that field to index the internal command list. So we need to check for
-1 first and only use the tag field if it's valid.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Paul Mundt <lethal@linux-sh.org>
Tested-by: Dave Young <hidave.darkstar@gmail.com>
Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Linux 2.6.28-rc1

Fix PCI hotplug printk format

Fix printk format warning:

drivers/pci/hotplug/acpiphp_ibm.c:207: warning: format '%08lx' expects type 'long unsigned int', but argument 3 has type 'long long unsigned int'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: remove unused resource assignment in pci_read_bridge_bases()
  PCI hotplug: shpchp: message refinement
  PCI hotplug: shpchp: replace printk with dev_printk
  PCI: add routines for debugging and handling lost interrupts
  PCI hotplug: pciehp: message refinement
  PCI: fix ARI code to be compatible with mixed ARI/non-ARI systems
  PCI hotplug: cpqphp: fix kernel NULL pointer dereference

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits)
  tcp: Restore ordering of TCP options for the sake of inter-operability
  net: Fix disjunct computation of netdev features
  sctp: Fix to handle SHUTDOWN in SHUTDOWN_RECEIVED state
  sctp: Fix to handle SHUTDOWN in SHUTDOWN-PENDING state
  sctp: Add check for the TSN field of the SHUTDOWN chunk
  sctp: Drop ICMP packet too big message with MTU larger than current PMTU
  p54: enable 2.4/5GHz spectrum by eeprom bits.
  orinoco: reduce stack usage in firmware download path
  ath5k: fix suspend-related oops on rmmod
  [netdrvr] fec_mpc52xx: Implement polling, to make netconsole work.
  qlge: Fix MSI/legacy single interrupt bug.
  smc911x: Make the driver safer on SMP
  smc911x: Add IRQ polarity configuration
  smc911x: Allow Kconfig dependency on ARM
  sis190: add identifier for Atheros AR8021 PHY
  8139x: reduce message severity on driver overlap
  igb: add IGB_DCA instead of selecting INTEL_IOATDMA
  igb: fix tx data corruption with transition to L0s on 82575
  ehea: Fix memory hotplug support
  netdev: DM9000: remove BLACKFIN hacking in DM9000 netdev driver
  ...

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  math-emu: Fix thinko in _FP_DIV
  math-emu: Fix signalling of underflow and inexact while packing result.
  sparc: Add checkstack support
  sparc: correct section of current_pc()
  sparc: correct section of apc_no_idle
  sparc64: Fix race in arch/sparc64/kernel/trampoline.S

PCI: remove unused resource assignment in pci_read_bridge_bases()

This cleanup removes the resource assignment in pci_read_bridge_bases()
since it has taken care by pci_alloc_child_bus() when allocating the bus:

        /* Set up default resource pointers and names.. */
        for (i = 0; i < PCI_BRIDGE_RES_NUM; i++) {
                child->resource[i] = &bridge->resource[PCI_BRIDGE_RESOURCES+i];
                child->resource[i]->name = child->name;
        }

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

PCI hotplug: shpchp: message refinement

This patch refines messages in shpchp module. The main changes are as
follows:

- remove the trailing "."
- remove __func__ as much as possible
- capitalize the first letter of messages
- show PCI device address including its domain

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>