David S. Miller [Thu, 10 Mar 2011 04:57:50 +0000 (20:57 -0800)]
ipv4: Optimize flow initialization in fib_validate_source().
Like in commit 44713b67db10c774f14280c129b0d5fd13c70cf2
("ipv4: Optimize flow initialization in output route lookup."
we can optimize the on-stack flow setup to only initialize
the members which are actually used.
Otherwise we bzero the entire structure, then initialize
explicitly the first half of it.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 10 Mar 2011 04:42:07 +0000 (20:42 -0800)]
ipv4: Optimize flow initialization in input route lookup.
Like in commit 44713b67db10c774f14280c129b0d5fd13c70cf2
("ipv4: Optimize flow initialization in output route lookup."
we can optimize the on-stack flow setup to only initialize
the members which are actually used.
Otherwise we bzero the entire structure, then initialize
explicitly the first half of it.
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: ioctl type SIOCOUTQNSD returns amount of data not sent
In contrast to SIOCOUTQ which returns the amount of data sent
but not yet acknowledged plus data not yet sent this patch only
returns the data not sent.
For various methods of live streaming bitrate control it may
be helpful to know how much data are in the tcp outqueue are
not sent yet.
Signed-off-by: Mario Schuknecht <m.schuknecht@dresearch.de> Signed-off-by: Steffen Sledz <sledz@dresearch.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Amerigo Wang [Sun, 6 Mar 2011 21:58:46 +0000 (21:58 +0000)]
bonding: move procfs code into bond_procfs.c
V2: Move #ifdef CONFIG_PROC_FS into bonding.h, as suggested by David.
bond_main.c is bloating, separate the procfs code out,
move them to bond_procfs.c
Signed-off-by: WANG Cong <amwang@redhat.com> Reviewed-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Phonet: kill the ST-Ericsson pipe controller Kconfig
This is now a run-time choice so that a single kernel can support both
old and new generation ISI modems. Support for manually enabling the
pipe flow is removed as it did not work properly, does not fit well
with the socket API, and I am not aware of any use at the moment.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Phonet: support active connection without pipe controller on modem
This provides support for newer ISI modems with no need for the
earlier experimental compile-time alternative choice. With this,
we can now use the same kernel and userspace with both types of
modems.
This also avoids confusing two different and incompatible state
machines, actively connected vs accepted sockets, and adds
connection response error handling (processing "SYN/RST" of sorts).
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Phonet: provide pipe socket option to retrieve the pipe identifier
User-space sometimes needs this information. In particular, the GPRS
context or the AT commands pipe setups may use the pipe handle as a
reference.
This removes the settable pipe handle with CONFIG_PHONET_PIPECTRLR.
It did not handle error cases correctly. Furthermore, the kernel
*could* implement a smart scheme for allocating handles (if ever
needed), but userspace really cannot.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Phonet: allocate sock from accept syscall rather than soft IRQ
This moves most of the accept logic to process context like other
socket stacks do. Then we can use a few more common socket helpers
and simplify a bit.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases, the Phonet pipe backlog callbacks returned negative
errno instead of NET_RX_* values.
In other cases, NET_RX_DROP was returned for invalid packets, even
though it seems only intended for buffering problems (not for
deliberately discarded packets).
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Phonet assumes that packets are never dropped. We try our best to
avoid this situation. But lets return ENOBUFS if queueing to the
network device fails so that the caller knows things went wrong.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Phonet: fix NULL dereference on TX path with implicit source
The previous Phonet patch series introduced per-socket implicit
destination (i.e. connect()). In that case, the destination
socket address is NULL in the transmit function.
However commit a8059512b120362b15424f152b2548fe8b11bd0c
("Phonet: implement per-socket destination/peer address")
is incomplete and would trigger a NULL dereference.
(Fortunately, the code is not in released kernel, and in fact
currently not reachable.)
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Korsgaard [Mon, 7 Mar 2011 05:49:47 +0000 (05:49 +0000)]
dsa/mv88e6060: support nonzero mii base address
The mv88e6060 uses either the lower 16 or upper 16 mii addresses,
depending on the value of the EE_CLK/ADDR4 pin. Support both
configurations by using the sw_addr setting as base address.
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Acked-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 8 Mar 2011 04:54:48 +0000 (20:54 -0800)]
ipv4: Cache source address in nexthop entries.
When doing output route lookups, we have to select the source address
if the user has not specified an explicit one.
First, if the route has an explicit preferred source address
specified, then we use that.
Otherwise we search the route's outgoing interface for a suitable
address.
This search can be precomputed and cached at route insertion time.
The only missing part is that we have to refresh this precomputed
value any time addresses are added or removed from the interface, and
this is accomplished by fib_update_nh_saddrs().
Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Fri, 4 Mar 2011 09:06:10 +0000 (09:06 +0000)]
ixgbe: fix setting and reporting of advertised speeds
Add the ability to set 100/F on x540.
Fix reporting of advertised modes by adding check for phy.autoneg_advertised
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Thu, 3 Mar 2011 09:25:07 +0000 (09:25 +0000)]
ixgbe: fix spelling errors
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Thu, 3 Mar 2011 09:25:02 +0000 (09:25 +0000)]
ixgbe: improve logic in ixgbe_init_mbx_params_pf
Use if/then instead of an all-inclusive case statement.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Thu, 3 Mar 2011 09:24:56 +0000 (09:24 +0000)]
ixgbe: add function description
Add description for ixgbe_init_eeprom_params_X540 and whitespace fix.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Sat, 26 Feb 2011 06:40:16 +0000 (06:40 +0000)]
ixgbe: Enable flow control pause parameter auto-negotiation support
This patch enables flow control pause parameters auto-negotiation support
to 82599 based 10G Base-T, backplane devices and multi-speed fiber optics
modules at 1G speed
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Sat, 26 Feb 2011 06:40:11 +0000 (06:40 +0000)]
ixgbe: Add x540 statistic counter definitions
Add defines to accumulate and display x540 PHY statistic counters on
transmit/receive.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Fri, 25 Feb 2011 07:49:39 +0000 (07:49 +0000)]
ixgbe: cleanup PHY init
This change cleans up several situations in which we were either stepping
over possible errors, or calling initialization routines multiple times.
Also includes whitespace fixes where applicable.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Yi Zou [Tue, 1 Feb 2011 07:22:16 +0000 (07:22 +0000)]
ixgbe: add support to FCoE DDP in target mode
Add support to the ndo_fcoe_ddp_target() to allow the Intel 82599 device to
also provide DDP offload capability when the upper FCoE protocol stack is
operating as a target.
Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Yi Zou [Tue, 1 Feb 2011 07:22:06 +0000 (07:22 +0000)]
net: add ndo_fcoe_ddp_target() to support FCoE DDP in target mode
The Fiber Channel over Ethernet (FCoE) Direct Data Placement (DDP) can also be
used for FCoE target, where the DDP used for read I/O on an initiator can be
used on an FCoE target to speed up the write I/O to the target from the initiator.
The added ndo_fcoe_ddp_target() works in the similar way as the existing
ndo_fcoe_ddp_setup() to allow the underlying hardware set up the DDP context
accordingly when it gets called from the FCoE target implementation on top
the existing Open-FCoE fcoe/libfc protocol stack so without losing the ability
to provide DDP for read I/O as an initiator, it can also provide DDP offload
to the write I/O coming from the initiator as a target.
Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Mon, 14 Feb 2011 08:19:24 +0000 (08:19 +0000)]
e1000e: fix build issue due to undefined reference to crc32_le
kernel build fails with:
drivers/built-in.o: In function `e1000_lv_jumbo_workaround_ich8lan':
(.text+0x3e7a8): undefined reference to `crc32_le'
when CONFIG_CRC32 is not set or does not match the CONFIG_E1000E
selection.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Changli Gao [Wed, 2 Mar 2011 21:07:14 +0000 (21:07 +0000)]
bonding: COW before overwriting the destination MAC address
When there is a ptype handler holding a clone of this skb, whose
destination MAC addresse is overwritten, the owner of this handler may
get a corrupted packet.
Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 28 Feb 2011 20:26:31 +0000 (20:26 +0000)]
net: allow handlers to be processed for orig_dev
This was there before, I forgot about this. Allows deliveries to
ptype_base handlers registered for orig_dev. I presume this is still
desired.
Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Mar 2011 22:27:38 +0000 (14:27 -0800)]
ipv4: Validate route entry type at insert instead of every lookup.
fib_semantic_match() requires that if the type doesn't signal an
automatic error, it must be of type RTN_UNICAST, RTN_LOCAL,
RTN_BROADCAST, RTN_ANYCAST, or RTN_MULTICAST.
Checking this every route lookup is pointless work.
Instead validate it during route insertion, via fib_create_info().
Also, there was nothing making sure the type value was less than
RTN_MAX, so add that missing check while we're here.
Signed-off-by: David S. Miller <davem@davemloft.net>
Sven Eckelmann [Fri, 4 Mar 2011 21:36:41 +0000 (21:36 +0000)]
batman-adv: Disallow regular interface as mesh device
When trying to associate a net_device with another net_device which
already exists, batman-adv assumes that this interface is a fully
initialized batman mesh interface without checking it. The behaviour
when accessing data behind netdev_priv of a random net_device is
undefined and potentially dangerous.
Reported-by: Linus Lüssing <linus.luessing@ascom.ch> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Marek Lindner [Fri, 18 Feb 2011 12:28:10 +0000 (12:28 +0000)]
batman-adv: Correct rcu refcounting for orig_node
It might be possible that 2 threads access the same data in the same
rcu grace period. The first thread calls call_rcu() to decrement the
refcount and free the data while the second thread increases the
refcount to use the data. To avoid this race condition all refcount
operations have to be atomic.
Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Linus Lüssing [Fri, 18 Feb 2011 12:20:13 +0000 (12:20 +0000)]
batman-adv: Fix possible buffer overflow in softif neigh list output
When printing the soft interface table the number of entries in the
softif neigh list are first being counted and a fitting buffer
allocated. After that the softif neigh list gets locked again and
the buffer printed - which has the following two issues:
For one thing, the softif neigh list might have grown when reacquiring
the rcu lock, which results in writing outside of the allocated buffer.
Furthermore 31 Bytes are not enough for printing an entry with a vid
of more than 2 digits.
The manual buffering is unnecessary, we can safely print to the seq
directly during the rcu_read_lock().
Signed-off-by: Linus Lüssing <linus.luessing@ascom.ch> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Linus Lüssing [Sun, 13 Feb 2011 21:13:04 +0000 (21:13 +0000)]
batman-adv: Increase orig_node refcount before releasing rcu read lock
When unicast_send_skb() is increasing the orig_node's refcount another
thread might have been freeing this orig_node already. We need to
increase the refcount in the rcu read lock protected area to avoid that.
Signed-off-by: Linus Lüssing <linus.luessing@ascom.ch> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Marek Lindner [Tue, 25 Jan 2011 21:52:11 +0000 (21:52 +0000)]
batman-adv: make broadcast seqno operations atomic
Batman-adv could receive several payload broadcasts at the same time
that would trigger access to the broadcast seqno sliding window to
determine whether this is a new broadcast or not. If these incoming
broadcasts are accessing the sliding window simultaneously it could
be left in an inconsistent state. Therefore it is necessary to make
sure this access is atomic.
Reported-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Marek Lindner [Thu, 10 Feb 2011 14:33:51 +0000 (14:33 +0000)]
batman-adv: Correct rcu refcounting for batman_if
It might be possible that 2 threads access the same data in the same
rcu grace period. The first thread calls call_rcu() to decrement the
refcount and free the data while the second thread increases the
refcount to use the data. To avoid this race condition all refcount
operations have to be atomic.
Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Marek Lindner [Thu, 10 Feb 2011 14:33:50 +0000 (14:33 +0000)]
batman-adv: Correct rcu refcounting for softif_neigh
It might be possible that 2 threads access the same data in the same
rcu grace period. The first thread calls call_rcu() to decrement the
refcount and free the data while the second thread increases the
refcount to use the data. To avoid this race condition all refcount
operations have to be atomic.
Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Marek Lindner [Thu, 10 Feb 2011 14:33:49 +0000 (14:33 +0000)]
batman-adv: Correct rcu refcounting for gw_node
It might be possible that 2 threads access the same data in the same
rcu grace period. The first thread calls call_rcu() to decrement the
refcount and free the data while the second thread increases the
refcount to use the data. To avoid this race condition all refcount
operations have to be atomic.
Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Marek Lindner [Thu, 10 Feb 2011 14:33:53 +0000 (14:33 +0000)]
batman-adv: Correct rcu refcounting for neigh_node
It might be possible that 2 threads access the same data in the same
rcu grace period. The first thread calls call_rcu() to decrement the
refcount and free the data while the second thread increases the
refcount to use the data. To avoid this race condition all refcount
operations have to be atomic.
Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Simon Wunderlich [Wed, 19 Jan 2011 20:01:43 +0000 (20:01 +0000)]
batman-adv: protect bonding with rcu locks
bonding / alternating candidates need to be secured by rcu locks
as well. This patch therefore converts the bonding list
from a plain pointer list to a rcu securable lists and references
the bonding candidates.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Marek Lindner [Wed, 19 Jan 2011 19:16:10 +0000 (19:16 +0000)]
batman-adv: free neighbors when an interface is deactivated
hardif_disable_interface() calls purge_orig_ref() to immediately free
all neighbors associated with the interface that is going down.
purge_orig_neighbors() checked if the interface status is IF_INACTIVE
which is set to IF_NOT_IN_USE shortly before calling purge_orig_ref().
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
David S. Miller [Sat, 5 Mar 2011 05:35:25 +0000 (21:35 -0800)]
ipv4: Set rt->rt_iif more sanely on output routes.
rt->rt_iif is only ever inspected on input routes, for example DCCP
uses this to populate a route lookup flow key when generating replies
to another packet.
Therefore, setting it to anything other than zero on output routes
makes no sense.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 5 Mar 2011 05:24:47 +0000 (21:24 -0800)]
ipv4: Optimize flow initialization in output route lookup.
We burn a lot of useless cycles, cpu store buffer traffic, and
memory operations memset()'ing the on-stack flow used to perform
output route lookups in __ip_route_output_key().
Only the first half of the flow object members even matter for
output route lookups in this context, specifically:
FIB rules matching cares about:
dst, src, tos, iif, oif, mark
FIB trie lookup cares about:
dst
FIB semantic match cares about:
tos, scope, oif
Therefore only initialize these specific members and elide the
memset entirely.
On Niagara2 this kills about ~300 cycles from the output route
lookup path.
Likely, we can take things further, since all callers of output
route lookups essentially throw away the on-stack flow they use.
So they don't care if we use it as a scratch-pad to compute the
final flow key.
Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Eric Dumazet [Fri, 4 Mar 2011 22:33:59 +0000 (14:33 -0800)]
inetpeer: seqlock optimization
David noticed :
------------------
Eric, I was profiling the non-routing-cache case and something that
stuck out is the case of calling inet_getpeer() with create==0.
If an entry is not found, we have to redo the lookup under a spinlock
to make certain that a concurrent writer rebalancing the tree does
not "hide" an existing entry from us.
This makes the case of a create==0 lookup for a not-present entry
really expensive. It is on the order of 600 cpu cycles on my
Niagara2.
I added a hack to not do the relookup under the lock when create==0
and it now costs less than 300 cycles.
This is now a pretty common operation with the way we handle COW'd
metrics, so I think it's definitely worth optimizing.
-----------------
One solution is to use a seqlock instead of a spinlock to protect struct
inet_peer_base.
After a failed avl tree lookup, we can easily detect if a writer did
some changes during our lookup. Taking the lock and redo the lookup is
only necessary in this case.
Note: Add one private rcu_deref_locked() macro to place in one spot the
access to spinlock included in seqlock.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Tue, 22 Feb 2011 17:26:10 +0000 (17:26 +0000)]
sfc: Use write-combining to reduce TX latency
Based on work by Neil Turton <nturton@solarflare.com> and
Kieran Mansley <kmansley@solarflare.com>.
The BIU has now been verified to handle 3- and 4-dword writes within a
single 128-bit register correctly. This means we can enable write-
combining and only insert write barriers between writes to distinct
registers.
This has been observed to save about 0.5 us when pushing a TX
descriptor to an empty TX queue.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Linus Torvalds [Thu, 3 Mar 2011 23:48:01 +0000 (15:48 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
DNS: Fix a NULL pointer deref when trying to read an error key [CVE-2011-1076]
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
MAINTAINERS: Add Andy Gospodarek as co-maintainer.
r8169: disable ASPM
RxRPC: Fix v1 keys
AF_RXRPC: Handle receiving ACKALL packets
cnic: Fix lost interrupt on bnx2x
cnic: Prevent status block race conditions with hardware
net: dcbnl: check correct ops in dcbnl_ieee_set()
e1000e: disable broken PHY wakeup for ICH10 LOMs, use MAC wakeup instead
igb: fix sparse warning
e1000: fix sparse warning
netfilter: nf_log: avoid oops in (un)bind with invalid nfproto values
dccp: fix oops on Reset after close
ipvs: fix dst_lock locking on dest update
davinci_emac: Add Carrier Link OK check in Davinci RX Handler
bnx2x: update driver version to 1.62.00-6
bnx2x: properly calculate lro_mss
bnx2x: perform statistics "action" before state transition.
bnx2x: properly configure coefficients for MinBW algorithm (NPAR mode).
bnx2x: Fix ethtool -t link test for MF (non-pmf) devices.
bnx2x: Fix nvram test for single port devices.
...
Linus Torvalds [Thu, 3 Mar 2011 23:42:35 +0000 (15:42 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: kill loop_mutex
blktrace: Remove blk_fill_rwbs_rq.
block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue()
block: add @force_kblockd to __blk_run_queue()
block: fix kernel-doc format for blkdev_issue_zeroout
blk-throttle: Do not use kblockd workqueue for throtl work
David Howells [Thu, 3 Mar 2011 11:28:58 +0000 (11:28 +0000)]
DNS: Fix a NULL pointer deref when trying to read an error key [CVE-2011-1076]
When a DNS resolver key is instantiated with an error indication, attempts to
read that key will result in an oops because user_read() is expecting there to
be a payload - and there isn't one [CVE-2011-1076].
Give the DNS resolver key its own read handler that returns the error cached in
key->type_data.x[0] as an error rather than crashing.
Also make the kenter() at the beginning of dns_resolver_instantiate() limit the
amount of data it prints, since the data is not necessarily NUL-terminated.
The buggy code was added in:
commit 4a2d789267e00b5a1175ecd2ddefcc78b83fbf09
Author: Wang Lei <wang840925@gmail.com>
Date: Wed Aug 11 09:37:58 2010 +0100
Subject: DNS: If the DNS server returns an error, allow that to be cached [ver #2]
This can trivially be reproduced by any user with the following program
compiled with -lkeyutils:
What should happen is that keyctl_read() reports error 6 (ENXIO) to the user:
dns-break: read_key: No such device or address
but instead the kernel oopses.
This cannot be reproduced with the 'keyutils add' or 'keyutils padd' commands
as both of those cut the data down below the NUL termination that must be
included in the data. Without this dns_resolver_instantiate() will return
-EINVAL and the key will not be instantiated such that it can be read.
The oops looks like:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff811b99f7>] user_read+0x4f/0x8f
PGD 3bdf8067 PUD 385b9067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/irq
CPU 0
Modules linked in:
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com>
cc: Wang Lei <wang840925@gmail.com> Signed-off-by: James Morris <jmorris@namei.org>