Paul Mundt [Thu, 14 Feb 2008 22:48:45 +0000 (14:48 -0800)]
net: xfrm statistics depend on INET
net/built-in.o: In function `xfrm_policy_init':
/home/pmundt/devel/git/sh-2.6.25/net/xfrm/xfrm_policy.c:2338: undefined reference to `snmp_mib_init'
snmp_mib_init() is only built in if CONFIG_INET is set.
Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Turton [Wed, 13 Feb 2008 07:13:48 +0000 (23:13 -0800)]
[NET]: Improve cache line coherency of ingress qdisc
Move the ingress qdisc members of struct net_device from the transmit
cache line to the receive cache line to avoid cache line ping-pong.
These members are only used on the receive path.
Signed-off-by: Neil Turton <nturton@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race in Linux kernel file net/core/dev.c, function dev_close.
The function calls function dev_deactivate, which calls function
dev_watchdog_down that deletes the watchdog timer. However, after that, a
driver can call netif_carrier_ok, which calls function
__netdev_watchdog_up that can add the watchdog timer again. Function
unregister_netdevice calls function dev_shutdown that traps the bug
!timer_pending(&dev->watchdog_timer). Moving dev_deactivate after
netif_running() has been cleared prevents function netif_carrier_on
from calling __netdev_watchdog_up and adding the watchdog timer again.
Signed-off-by: Matti Linnanvuori <mattilinnanvuori@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 13 Feb 2008 06:50:35 +0000 (22:50 -0800)]
[IPSEC]: Fix bogus usage of u64 on input sequence number
Al Viro spotted a bogus use of u64 on the input sequence number which
is big-endian. This patch fixes it by giving the input sequence number
its own member in the xfrm_skb_cb structure.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
[RTNETLINK]: Send a single notification on device state changes.
In do_setlink() a single notification is sent at the end of the
function if any modification occured. If the address has been changed,
another notification is sent.
Both of them is required because originally only the NETDEV_CHANGEADDR
notification was sent and although device state change implies address
change, some programs may expect the original notification. It remains
for compatibity.
If set_operstate() is called from do_setlink(), it doesn't send a
notification, only if it is called from rtnl_create_link() as earlier.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Wed, 13 Feb 2008 06:37:19 +0000 (22:37 -0800)]
[NETLABEL]: Don't produce unused variables when IPv6 is off.
Some code declares variables on the stack, but uses them
under #ifdef CONFIG_IPV6, so thay become unused when ipv6
is off. Fortunately, they are used in a switch's case
branches, so the fix is rather simple.
Is it OK from coding style POV to add braces inside "cases",
or should I better avoid such style and rework the patch?
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Wed, 13 Feb 2008 06:16:33 +0000 (22:16 -0800)]
[GENETLINK]: Relax dances with genl_lock.
The genl_unregister_family() calls the genl_unregister_mc_groups(),
which takes and releases the genl_lock and then locks and releases
this lock itself.
Relax this behavior, all the more so the genl_unregister_mc_groups()
is called from genl_unregister_family() only.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 13 Feb 2008 02:07:27 +0000 (18:07 -0800)]
[IPV6]: Fix IPsec datagram fragmentation
This is a long-standing bug in the IPsec IPv6 code that breaks
when we emit a IPsec tunnel-mode datagram packet. The problem
is that the code the emits the packet assumes the IPv6 stack
will fragment it later, but the IPv6 stack assumes that whoever
is emitting the packet is going to pre-fragment the packet.
In the long term we need to fix both sides, e.g., to get the
datagram code to pre-fragment as well as to get the IPv6 stack
to fragment locally generated tunnel-mode packet.
For now this patch does the second part which should make it
work for the IPsec host case.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 12 Feb 2008 05:45:44 +0000 (21:45 -0800)]
[NDISC]: Fix race in generic address resolution
Frank Blaschka provided the bug report and the initial suggested fix
for this bug. He also validated this version of this fix.
The problem is that the access to neigh->arp_queue is inconsistent, we
grab references when dropping the lock lock to call
neigh->ops->solicit() but this does not prevent other threads of
control from trying to send out that packet at the same time causing
corruptions because both code paths believe they have exclusive access
to the skb.
The best option seems to be to hold the write lock on neigh->lock
during the ->solicit() call. I looked at all of the ndisc_ops
implementations and this seems workable. The only case that needs
special care is the IPV4 ARP implementation of arp_solicit(). It
wants to take neigh->lock as a reader to protect the header entry in
neigh->ha during the emission of the soliciation. We can simply
remove the read lock calls to take care of that since holding the lock
as a writer at the caller providers a superset of the protection
afforded by the existing read locking.
The rest of the ->solicit() implementations don't care whether the
neigh is locked or not.
Signed-off-by: David S. Miller <davem@davemloft.net>
David Newall [Tue, 12 Feb 2008 05:41:30 +0000 (21:41 -0800)]
hci_ldisc: fix null pointer deref
Arjan:
With the help of kerneloops.org I've spotted a nice little interaction
between the TTY layer and the bluetooth code, however the tty layer is not
something I'm all too familiar with so I rather ask than brute-force fix the
code incorrectly.
The raw details are at:
http://www.kerneloops.org/search.php?search=uart_flush_buffer
What happens is that, on closing the bluetooth tty, the tty layer goes
into the release_dev() function, which first does a bunch of stuff, then
sets the file->private_data to NULL, does some more stuff and then calls the
ldisc close function. Which in this case, is hci_uart_tty_close().
Now, hci_uart_tty_close() calls hci_uart_close() which clears some
internal bit, and then calls hci_uart_flush()... which calls back to the
tty layers' uart_flush_buffer() function. (in drivers/bluetooth/hci_tty.c
around line 194) Which then WARN_ON()'s because that's not allowed/supposed
to be called this late in the shutdown of the port....
Should the bluetooth driver even call this flush function at all??
David:
This seems to be what happens: Hci_uart_close() flushes using
hci_uart_flush(). Subsequently, in hci_dev_do_close(), (one step in
hci_unregister_dev()), hci_uart_flush() is called again. The comment in
uart_flush_buffer(), relating to the WARN_ON(), indicates you can't flush
after the port is closed; which sounds reasonable. I think hci_uart_close()
should set hdev->flush to NULL before returning. Hci_dev_do_close() does
check for this. The code path is rather involved and I'm not entirely clear
of all steps, but I think that's what should be done.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarek Poplawski [Tue, 12 Feb 2008 05:36:39 +0000 (21:36 -0800)]
[AX25] ax25_timer: use mod_timer instead of add_timer
According to one of Jann's OOPS reports it looks like
BUG_ON(timer_pending(timer)) triggers during add_timer()
in ax25_start_t1timer(). This patch changes current use
of: init_timer(), add_timer() and del_timer() to
setup_timer() with mod_timer(), which should be safer
anyway.
Reported-by: Jann Traschewski <jann@gmx.de> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This lockdep report shows that ax25_route_lock is taken for reading in
softirq context, and for writing in process context with BHs enabled.
So, to make this safe, all write_locks in ax25_route.c are changed to
_bh versions.
Reported-by: Jann Traschewski <jann@gmx.de>, Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarek Poplawski [Tue, 12 Feb 2008 05:24:56 +0000 (21:24 -0800)]
[AX25] af_ax25: remove sock lock in ax25_info_show()
This lockdep warning:
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.24 #3
> -------------------------------------------------------
> swapper/0 is trying to acquire lock:
> (ax25_list_lock){-+..}, at: [<f91dd3b1>] ax25_destroy_socket+0x171/0x1f0 [ax25]
>
> but task is already holding lock:
> (slock-AF_AX25){-+..}, at: [<f91dbabc>] ax25_std_heartbeat_expiry+0x1c/0xe0 [ax25]
>
> which lock already depends on the new lock.
...
shows that ax25_list_lock and slock-AF_AX25 are taken in different
order: ax25_info_show() takes slock (bh_lock_sock(ax25->sk)) while
ax25_list_lock is held, so reversely to other functions. To fix this
the sock lock should be moved to ax25_info_start(), and there would
be still problem with breaking ax25_list_lock (it seems this "proper"
order isn't optimal yet). But, since it's only for reading proc info
it seems this is not necessary (e.g. ax25_send_to_raw() does similar
reading without this lock too).
So, this patch removes sock lock to avoid deadlock possibility; there
is also used sock_i_ino() function, which reads sk_socket under proper
read lock. Additionally printf format of this i_ino is changed to %lu.
Reported-by: Bernard Pidoux F6BVP <f6bvp@free.fr> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Use key/offset caching to change /proc/net/route (use by iputils route)
from O(n^2) to O(n). This improves performance from 30sec with 160,000
routes to 1sec.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 12 Feb 2008 01:50:30 +0000 (17:50 -0800)]
[IPV4]: Remove IP_TOS setting privilege checks.
Various RFCs have all sorts of things to say about the CS field of the
DSCP value. In particular they try to make the distinction between
values that should be used by "user applications" and things like
routing daemons.
This seems to have influenced the CAP_NET_ADMIN check which exists for
IP_TOS socket option settings, but in fact it has an off-by-one error
so it wasn't allowing CS5 which is meant for "user applications" as
well.
Further adding to the inconsistency and brokenness here, IPV6 does not
validate the DSCP values specified for the IPV6_TCLASS socket option.
The real actual uses of these TOS values are system specific in the
final analysis, and these RFC recommendations are just that, "a
recommendation". In fact the standards very purposefully use
"SHOULD" and "SHOULD NOT" when describing how these values can be
used.
In the final analysis the only clean way to provide consistency here
is to remove the CAP_NET_ADMIN check. The alternatives just don't
work out:
1) If we add the CAP_NET_ADMIN check to ipv6, this can break existing
setups.
2) If we just fix the off-by-one error in the class comparison in
IPV4, certain DSCP values can be used in IPV6 but not IPV4 by
default. So people will just ask for a sysctl asking to
override that.
I checked several other freely available kernel trees and they
do not make any privilege checks in this area like we do. For
the BSD stacks, this goes back all the way to Stevens Volume 2
and beyond.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 12 Feb 2008 04:52:01 +0000 (20:52 -0800)]
WMI: initialize wmi_blocks.list even if ACPI is disabled
Even if we don't want to register the WMI driver, we should initialize
the wmi_blocks list to be empty, since we don't want the wmi helper
functions to oops just because that basic list has not even been set up.
With this, "find_guid()" will happily return "not found" rather than
oopsing all over the place, and the callers will then just automatically
return false or AE_NOT_FOUND as appropriate.
Roland McGrath [Mon, 11 Feb 2008 22:38:51 +0000 (14:38 -0800)]
x86: vdso_install fix
The makefile magic for installing the 32-bit vdso images on disk had a
little error. A single-line change would fix that bug, but this does a
little more to reduce the error-prone duplication of this bit of
makefile variable magic.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 12 Feb 2008 04:30:22 +0000 (13:30 +0900)]
mempolicy: silently restrict nodemask to allowed nodes
Kosaki Motohito noted that "numactl --interleave=all ..." failed in the
presence of memoryless nodes. This patch attempts to fix that problem.
Some background:
numactl --interleave=all calls set_mempolicy(2) with a fully populated
[out to MAXNUMNODES] nodemask. set_mempolicy() [in do_set_mempolicy()]
calls contextualize_policy() which requires that the nodemask be a
subset of the current task's mems_allowed; else EINVAL will be returned.
A task's mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]
i.e., nodes with memory. So, a fully populated nodemask will be
declared invalid if it includes memoryless nodes.
NOTE: the same thing will occur when running in a cpuset
with restricted mem_allowed--for the same reason:
node mask contains dis-allowed nodes.
mbind(2), on the other hand, just masks off any nodes in the nodemask
that are not included in the caller's mems_allowed.
In each case [mbind() and set_mempolicy()], mpol_check_policy() will
complain [again, resulting in EINVAL] if the nodemask contains any
memoryless nodes. This is somewhat redundant as mpol_new() will remove
memoryless nodes for interleave policy, as will bind_zonelist()--called
by mpol_new() for BIND policy.
Proposed fix:
1) modify contextualize_policy logic to:
a) remember whether the incoming node mask is empty.
b) if not, restrict the nodemask to allowed nodes, as is
currently done in-line for mbind(). This guarantees
that the resulting mask includes only nodes with memory.
NOTE: this is a [benign, IMO] change in behavior for
set_mempolicy(). Dis-allowed nodes will be
silently ignored, rather than returning an error.
c) fold this code into mpol_check_policy(), replace 2 calls to
contextualize_policy() to call mpol_check_policy() directly
and remove contextualize_policy().
2) In existing mpol_check_policy() logic, after "contextualization":
a) MPOL_DEFAULT: require that in coming mask "was_empty"
b) MPOL_{BIND|INTERLEAVE}: require that contextualized nodemask
contains at least one node.
c) add a case for MPOL_PREFERRED: if in coming was not empty
and resulting mask IS empty, user specified invalid nodes.
Return EINVAL.
c) remove the now redundant check for memoryless nodes
3) remove the now redundant masking of policy nodes for interleave
policy from mpol_new().
4) Now that mpol_check_policy() contextualizes the nodemask, remove
the in-line nodes_and() from sys_mbind(). I believe that this
restores mbind() to the behavior before the memoryless-nodes
patch series. E.g., we'll no longer treat an invalid nodemask
with MPOL_PREFERRED as local allocation.
[ Patch history:
v1 -> v2:
- Communicate whether or not incoming node mask was empty to
mpol_check_policy() for better error checking.
- As suggested by David Rientjes, remove the now unused
cpuset_nodes_subset_current_mems_allowed() from cpuset.h
v2 -> v3:
- As suggested by Kosaki Motohito, fold the "contextualization"
of policy nodemask into mpol_check_policy(). Looks a little
cleaner. ]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jonathan Corbet [Mon, 11 Feb 2008 23:17:33 +0000 (16:17 -0700)]
Be more robust about bad arguments in get_user_pages()
So I spent a while pounding my head against my monitor trying to figure
out the vmsplice() vulnerability - how could a failure to check for
*read* access turn into a root exploit? It turns out that it's a buffer
overflow problem which is made easy by the way get_user_pages() is
coded.
In particular, "len" is a signed int, and it is only checked at the
*end* of a do {} while() loop. So, if it is passed in as zero, the loop
will execute once and decrement len to -1. At that point, the loop will
proceed until the next invalid address is found; in the process, it will
likely overflow the pages array passed in to get_user_pages().
I think that, if get_user_pages() has been asked to grab zero pages,
that's what it should do. Thus this patch; it is, among other things,
enough to block the (already fixed) root exploit and any others which
might be lurking in similar code. I also think that the number of pages
should be unsigned, but changing the prototype of this function probably
requires some more careful review.
Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 12 Feb 2008 04:42:11 +0000 (20:42 -0800)]
Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
sata_mv: platform driver allocs dma without create
pata_ninja32: setup changes
pata_legacy: typo fix
pata_amd: Note in the module description it handles Nvidia
sata_mv: fix loop with last port
libata: ignore deverr on SETXFER if mode is configured
pata_via: fix SATA cable detection on cx700
Olof Johansson [Mon, 11 Feb 2008 02:22:57 +0000 (20:22 -0600)]
mlx4_core: Fix build break (missing include)
Commit 313abe55 ("mlx4_core: For 64-bit systems, vmap() kernel queue
buffers") caused this to pop up on powerpc allyesconfig, looks like a
missing include file:
drivers/net/mlx4/alloc.c: In function 'mlx4_buf_alloc':
drivers/net/mlx4/alloc.c:162: error: implicit declaration of function 'vmap'
drivers/net/mlx4/alloc.c:162: error: 'VM_MAP' undeclared (first use in this function)
drivers/net/mlx4/alloc.c:162: error: (Each undeclared identifier is reported only once
drivers/net/mlx4/alloc.c:162: error: for each function it appears in.)
drivers/net/mlx4/alloc.c:162: warning: assignment makes pointer from integer without a cast
drivers/net/mlx4/alloc.c: In function 'mlx4_buf_free':
drivers/net/mlx4/alloc.c:187: error: implicit declaration of function 'vunmap'
Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Tony Luck [Mon, 11 Feb 2008 21:23:46 +0000 (13:23 -0800)]
[IA64] Fix build for sim_defconfig
Commit bdc807871d58285737d50dc6163d0feb72cb0dc2 broke the build
for this config because the sim_defconfig selects CONFIG_HZ=250
but include/asm-ia64/param.h has an ifdef for the simulator to
force HZ to 32. So we ended up with a kernel/timeconst.h set
for HZ=250 ... which then failed the check for the right HZ
value and died with:
Drop the #ifdef magic from param.h and make force CONFIG_HZ=32
directly for the simulator.
Alan Cox [Fri, 8 Feb 2008 15:25:10 +0000 (15:25 +0000)]
pata_ninja32: setup changes
Forcibly set more of the configuration at init time. This seems to fix at
least one problem reported. We don't know what most of these bits do, but
we do know what windows stuffs there.
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Thu, 7 Feb 2008 01:34:08 +0000 (10:34 +0900)]
libata: ignore deverr on SETXFER if mode is configured
Some controllers (VIA CX700) raise device error on SETXFER even after
mode configuration succeeded. Update ata_dev_set_mode() such that
device error is ignored if transfer mode is configured correctly. To
implement this, device is revalidated even after device error on
SETXFER.
This fixes kernel bugzilla bug 8563.
Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Thomas Gleixner [Sun, 10 Feb 2008 22:57:36 +0000 (23:57 +0100)]
x86: remove over noisy debug printk
pageattr-test.c contains a noisy debug printk that people reported.
The condition under which it prints (randomly tapping into a mem_map[]
hole and not being able to c_p_a() there) is valid behavior and not
interesting to report.
Andi Kleen [Mon, 11 Feb 2008 00:35:20 +0000 (01:35 +0100)]
Prevent IDE boot ops on NUMA system
Without this patch a Opteron test system here oopses at boot with
current git.
Calling to_pci_dev() on a NULL pointer gives a negative value so the
following NULL pointer check never triggers and then an illegal address
is referenced. Check the unadjusted original device pointer for NULL
instead.
Linus Torvalds [Mon, 11 Feb 2008 17:19:47 +0000 (09:19 -0800)]
Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux
* 'for-linus' of git://linux-nfs.org/~bfields/linux:
SUNPRC: Fix printk format warning
nfsd: clean up svc_reserve_auth()
NLM: don't requeue block if it was invalidated while GRANT_MSG was in flight
NLM: don't reattempt GRANT_MSG when there is already an RPC in flight
NLM: have server-side RPC clients default to soft RPC tasks
NLM: set RPC_CLNT_CREATE_NOPING for NLM RPC clients
Sergio Luis [Sun, 10 Feb 2008 20:56:25 +0000 (17:56 -0300)]
drivers/net/sis190: fix section mismatch warning in sis190_get_mac_addr
Fix following warnings:
WARNING: drivers/net/sis190.o(.text+0x103): Section mismatch in reference from the function sis190_get_mac_addr() to the function .devinit.text:sis190_get_mac_addr_from_apc()
WARNING: drivers/net/sis190.o(.text+0x10e): Section mismatch in reference from the function sis190_get_mac_addr() to the function .devinit.text:sis190_get_mac_addr_from_eeprom()
Annotate sis190_get_mac_addr() with __devinit.
Signed-off-by: Sergio Luis <sergio@uece.br>
sis190.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-) Signed-off-by: Jeff Garzik <jeff@garzik.org>
Matthew Wilcox [Mon, 11 Feb 2008 04:18:15 +0000 (23:18 -0500)]
Use proper abstractions in quirk_intel_irqbalance
Since we may not have a pci_dev for the device we need to access, we can't
use pci_read_config_word. But raw_pci_read is an internal implementation
detail; it's better to use the architected pci_bus_read_config_word
interface. Using PCI_DEVFN instead of a mysterious constant helps
reassure everyone that we really do intend to access device 8.
[ Thanks to Grant Grundler for pointing out to me that this is exactly
what the write immediately above this is doing -- enabling device 8 to
respond to config space cycles.
- Matthew
Grant also says:
"Can you also add a comment which points at the Intel
documentation?
The 'Intel E7320 Memory Controller Hub (MCH) Datasheet' at
Ursula Braun [Fri, 8 Feb 2008 12:09:03 +0000 (13:09 +0100)]
netiucv: change name of nop function
Dummy NOP actions for fsm-statemachines have to be defined
separately for every using module of fsm-statemachines.
Thus the generic name fsm_action_nop is replaced by
module specific name netiucv_action_nop.
Signed-off-by: Ursula Braun <braunu@de.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ayaz Abdulla [Tue, 5 Feb 2008 17:30:01 +0000 (12:30 -0500)]
forcedeth: tx pause watermarks
New chipsets introduced variant Rx FIFO sizes that need to be taken into
account when setting up the tx pause watermarks. This patch introduces
the new device feature flags based on a version and implements the new
watermarks.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ayaz Abdulla [Tue, 5 Feb 2008 17:29:49 +0000 (12:29 -0500)]
forcedeth: tx collision fix
This patch supports a new fix in hardware regarding tx collisions. In
the cases where we are in autoneg mode and the link partner is in forced
mode, we need to setup the tx deferral register differently in order to
reduce collisions on the wire.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Steve Wise [Wed, 6 Feb 2008 18:05:19 +0000 (12:05 -0600)]
cxgb3: Handle ARP completions that mark neighbors stale.
When ARP completes due to a request rather than a reply the neighbor is
marked NUD_STALE instead of reachable (see arp_process()). The handler
for the resulting netevent needs to check also for NUD_STALE.
Failure to use the arp entry can cause RDMA connection failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Acked-by: Divy Le Ray <divy@chelsio.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ben Dooks [Tue, 5 Feb 2008 00:02:20 +0000 (00:02 +0000)]
DM9000: Add platform flag for no attached EEPROM
Allow the platform data to specify to the DM9000 driver
that there is no posibility of an attached EEPROM on the
device, so default all reads to 0xff and ignore any
write operations.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ben Dooks [Tue, 5 Feb 2008 00:02:17 +0000 (00:02 +0000)]
DM9000: Fix delays used by EEPROM read and write
The code was using a delay of 8ms, when it should have been
using the EEPROM status flag from the device to indicate the
EEPROM transaction had finished.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ben Dooks [Tue, 5 Feb 2008 00:02:16 +0000 (00:02 +0000)]
DM9000: Use netif_msg to enable debugging options
Use the netif_msg_*() macros to enable the debugging based
on the board's msg_enable field. The output still goes via
the dev_dbg() macros, so will be tagged and output as
appropriate.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ben Dooks [Tue, 5 Feb 2008 00:02:14 +0000 (00:02 +0000)]
DM9000: Ensure spinlock held whilst accessing EEPROM registers
Ensure we hold the spinlock whilst the registers and being
modified even though we hold the overall lock. This should
protect against an interrupt happening whilst we are using
the device.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ben Dooks [Tue, 5 Feb 2008 00:02:10 +0000 (00:02 +0000)]
DM9000: Add mutex to protect access
Add a mutex to serialise access to the chip functions from
entries such as the ethtool and the MII code. This should
reduce the amount of time the spinlock is held to protect
the address register.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ben Dooks [Tue, 5 Feb 2008 00:02:09 +0000 (00:02 +0000)]
DM9000: Remove barely used SROM array read.
The srom array in the board data is only being used in the device probe
routines. The probe also only uses the first 6 bytes of an array
we spend 512ms reading 128 bytes from. Change to reading the
MAC area directly to the MAC address structure.
As a side product, we rename the read_srom_word to dm9000_read_eeprom
to bring it into line with the rest of the driver. No change is made
to the delay in this function, which will be dealt with in a later
patch.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ben Dooks [Tue, 5 Feb 2008 00:02:07 +0000 (00:02 +0000)]
DM9000: Do not sleep with spinlock and IRQs held
The phy read and write routines call udelay() with the board
lock held, and with the posibility of IRQs being disabled. Since
these delays can be up to 500usec, and are only required as we
have to save the chip's address register.
To improve the behaviour, hold the lock whilst we are writing
and then restore the state before the delay and then repeat
the process once the delay has happened.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ben Dooks [Tue, 5 Feb 2008 00:02:05 +0000 (00:02 +0000)]
DM9000: Remove old timer based poll routines
Remove the timer based MII phy polling, as this is
currently broken with the new EEPROM code that now
uses mutexes to protect the phy access.
This will need to be replaced in the future by some
form of mutex safe mechanism for reading the MII
phy status.
The replacement has not been done here as changing
this patch, which is early in the sequence has quite
a knock-on effect. Once this series is merged, then
a new presentation of an patch to poll the MII link
status can be added.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ben Dooks [Tue, 5 Feb 2008 00:02:04 +0000 (00:02 +0000)]
DM9000: Pass IRQ flags via platform resources
Use the flags in the IRQ resource to specify the type of
IRQ being requested, so that systems which do not have
level-based interrupts, or change the interrupt in some
other way can specify this without making an #ifdef mess
in the driver.
This is specifically designed to undo the change in commit 4e4fc05a2b6e7bd2e0facd96e0c18dceb34d9349 which hardwires the
type for everyone but blackfin to IRQT_RISING, which breaks
all a number of Simtec boards which use (and setup in the
bootloader) active low IRQs.
Note, although there where originally objections due to
the use of IORESOURCE_IRQ and IRQT_ flags not sharing the
same definition, at least <include/linux/interrupt.h> notes
these are the same.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> CC: Daniel Mack <daniel@caiaq.de> CC: Bryan Wu <bryan.wu@analog.com> CC: Alex Landau <landau.alex@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Krishna Kumar [Wed, 30 Jan 2008 07:00:16 +0000 (12:30 +0530)]
Optimize cxgb3 xmit path (a bit)
1. Add common code for stopping queue.
2. No need to call netif_stop_queue followed by netif_wake_queue (and
infact a netif_start_queue could have been used instead), instead
call stop_queue if required, and remove code under USE_GTS macro.
3. There is no need to check for netif_queue_stopped, as the network
core guarantees that for us (I am sure every driver could remove
that check, eg e1000 - I have tested that path a few billion times
with about a few hundred thousand qstops but the condition never
hit even once).
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
e1000: Fix for 32 bits platforms with 64 bits resources
The e1000 driver stores the content of the PCI resources into
unsigned long's before ioremapping. This breaks on 32 bits
platforms that support 64 bits MMIO resources such as ppc 44x.
This fixes it by removing those temporary variables and passing
directly the result of pci_resource_start/len to ioremap.
The side effect is that I removed the assignments to the netdev
fields mem_start, mem_end and base_addr, which are totally useless
for PCI devices.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
--
Masakazu Mokuno [Thu, 7 Feb 2008 10:58:57 +0000 (19:58 +0900)]
PS3: gelic: Add wireless support for PS3
Signed-off-by: Masakazu Mokuno <mokuno@sm.sony.co.jp> Acked-by: Dan Williams <dcbw@redhat.com> Acked-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Masakazu Mokuno [Thu, 7 Feb 2008 10:58:42 +0000 (19:58 +0900)]
PS3: gelic: Add support for dual network interface
Add support for dual network (net_device) interface so that ethernet
and wireless can own separate ethX interfaces.
V2
- Fix the bug that bringing down and up the interface keeps rx
disabled.
- Make 'gelic_net_poll_controller()' extern , as David Woodhouse
pointed out at the previous submission.
- Fix weird usage of member names for the rx descriptor chain
V1
- Export functions which are convenient for both interfaces
- Move irq allocation/release code to driver probe/remove handlers
because interfaces share interrupts.
- Allocate skbs by using dev_alloc_skb() instead of netdev_alloc_skb()
as the interfaces share the hardware rx queue.
- Add gelic_port struct in order to abstract dual interface handling
- Change handlers for hardware queues so that they can handle dual
{source,destination} interfaces.
- Use new NAPI functions
This is a prerequisite for the new PS3 wireless support.
Masakazu Mokuno [Thu, 7 Feb 2008 10:58:08 +0000 (19:58 +0900)]
PS3: gelic: code cleanup
Code cleanup:
- Use appropriate prefixes for names instead of fixed 'gelic_net'
so that objects of the functions, variables and constants can be estimated.
- Remove definitions for IPSec offload to the gelic hardware. This
functionality is never supported on PS3.
- Group constants with enum.
- Use bitwise constants for interrupt status, instead of bit numbers to
eliminate shift operations.
- Style fixes. Signed-off-by: Masakazu Mokuno <mokuno@sm.sony.co.jp> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Masakazu Mokuno [Thu, 7 Feb 2008 10:57:54 +0000 (19:57 +0900)]
PS3: gelic: Add endianness macros
Mark the members of the structure for DMA descriptors with proper endian
annotations and use the appropriate accessor macros.
As the gelic driver works only on PS3, all these macros will be
expanded to null.
Signed-off-by: Masakazu Mokuno <mokuno@sm.sony.co.jp> Signed-off-by: Jeff Garzik <jeff@garzik.org>