Gerrit Renker [Mon, 13 Nov 2006 15:34:38 +0000 (13:34 -0200)]
[DCCPv6]: Choose a genuine initial sequence number
This
* resolves a FIXME - DCCPv6 connections started all with
an initial sequence number of 1;
* provides a redirection `secure_dccpv6_sequence_number'
in case the init_sequence_v6 code should be updated later;
* concentrates the update of S.GAR into dccp_connect_init();
* removes a duplicate dccp_update_gss() in ipv4.c;
* uses inet->dport instead of usin->sin_port, due to the
following assignment in dccp_v4_connect():
inet->dport = usin->sin_port;
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 13 Nov 2006 15:31:50 +0000 (13:31 -0200)]
[DCCP]: Remove redundant statements in init_sequence (ISS)
This patch removes the following redundancies:
1) The test skb->protocol == htons(ETH_P_IPV6) in dccp_v6_init_sequence
is always true since
* dccp_v6_conn_request() is the only calling function
* dccp_v6_conn_request() redirects all skb's with ETH_P_IP to
dccp_v4_conn_request()
2) The first argument, `struct sock *sk', of dccp_v{4,6}_init_sequence()
is never used.
(This is similar for tcp_v{4,6}_init_sequence, an analogous patch has been
submitted to netdev and merged.)
By the way - are the `sport' / `dport' arguments in the right order?
I have made them consistent among calls but they seem to be in the
reverse order.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 13 Nov 2006 15:25:41 +0000 (13:25 -0200)]
[DCCP]: Introduce a consistent naming scheme for sysctls
In order to make their function clearer and obtain a consistent naming
scheme to identify sysctls, all existing DCCP sysctls have been prefixed
with `sysctl_dccp', following the same convention as used by TCP.
Feature-specific sysctls retain the `feat' in the middle, although the
`default' has been dropped, since it is obvious from use.
Also removed a duplicate `dccp_feat_default_sequence_window' in ipv4.c.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 13 Nov 2006 15:23:52 +0000 (13:23 -0200)]
[DCCP]: Add sysctls to control retransmission behaviour
This adds 3 sysctls which govern the retransmission behaviour of DCCP control
packets (3way handshake, feature negotiation).
It removes 4 FIXMEs from the code.
The close resemblance of sysctl variables to their TCP analogues is emphasised
not only by their name, but also by giving them the same initial values.
This is useful since there is not much practical experience with DCCP yet.
Furthermore, with regard to the previous patch, it is now possible to limit
the number of keepalive-Responses by setting net.dccp.default.request_retries
(also a bit like in TCP).
Lastly, added documentation of all existing DCCP sysctls.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Mon, 13 Nov 2006 15:07:51 +0000 (13:07 -0200)]
[DCCP]: Update comments on precisely which packets can be retransmitted
This updates program documentation: spell out precise conditions about
which packets are eligible for retransmission (which is actually quite
hard to extract from RFC 4340).
It is based on the following table derived from RFC 4340:
+-----------+---------------------------------+---------------------+
| Type | Retransmit? | Remark |
+-----------+---------------------------------+---------------------+
| Request | in client-REQUEST state | sec. 8.1.1 |
| Response | NEVER | SHOULD NOT, 8.1.3 |
| Data | NEVER | unreliable protocol |
| Ack | possible in client-PARTOPEN | sec. 8.1.5 |
| DataAck | NEVER | unreliable protocol |
| CloseReq | only in server-CLOSEREQ state | MUST, sec. 8.3 |
| Close | in node-CLOSING state | MUST, sec. 8.3 |
+-----------+-------------------------------------------------------+
| Reset | only in response to other packets |
| Sync | only in response to sequence-invalid packets (7.5.4) |
| SyncAck | only in response to Sync packets |
+-----------+-------------------------------------------------------+
Hence the only packets eligible for retransmission are:
* Requests in client-REQUEST state (sec. 8.1.1)
* Acks in client-PARTOPEN state (sec. 8.1.5)
* CloseReq in server-CLOSEREQ state (sec. 8.3)
* Close in node-CLOSING state (sec. 8.3)
I had meant to put in a check for these types too, but have left that
for later.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Thomas Graf [Fri, 10 Nov 2006 22:10:15 +0000 (14:10 -0800)]
[NETLINK]: Do precise netlink message allocations where possible
Account for the netlink message header size directly in nlmsg_new()
instead of relying on the caller calculate it correctly.
Replaces error handling of message construction functions when
constructing notifications with bug traps since a failure implies
a bug in calculating the size of the skb.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Fri, 10 Nov 2006 22:06:49 +0000 (14:06 -0800)]
[TCP]: Remove dead code in init_sequence
This removes two redundancies:
1) The test (skb->protocol == htons(ETH_P_IPV6) in tcp_v6_init_sequence()
is always true, due to
* tcp_v6_conn_request() is the only function calling this one
* tcp_v6_conn_request() redirects all skb's with ETH_P_IP protocol to
tcp_v4_conn_request() [ cf. top of tcp_v6_conn_request()]
2) The first argument, `struct sock *sk' of tcp_v{4,6}_init_sequence() is
never used.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Fri, 10 Nov 2006 19:43:06 +0000 (17:43 -0200)]
[DCCP]: Support for partial checksums (RFC 4340, sec. 9.2)
This patch does the following:
a) introduces variable-length checksums as specified in [RFC 4340, sec. 9.2]
b) provides necessary socket options and documentation as to how to use them
c) basic support and infrastructure for the Minimum Checksum Coverage feature
[RFC 4340, sec. 9.2.1]: acceptability tests, user notification and user
interface
In addition, it
(1) fixes two bugs in the DCCPv4 checksum computation:
* pseudo-header used checksum_len instead of skb->len
* incorrect checksum coverage calculation based on dccph_x
(2) removes dccp_v4_verify_checksum() since it reduplicates code of the
checksum computation; code calling this function is updated accordingly.
(3) now uses skb_checksum(), which is safer than checksum_partial() if the
sk_buff has is a non-linear buffer (has pages attached to it).
(4) fixes an outstanding TODO item:
* If P.CsCov is too large for the packet size, drop packet and return.
The code has been tested with applications, the latest version of tcpdump now
comes with support for partial DCCP checksums.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Fri, 10 Nov 2006 18:29:14 +0000 (16:29 -0200)]
[DCCP]: Update code comments for Step 2/3
Sorts out the comments for processing steps 2,3 in section 8.5 of RFC 4340.
All comments have been updated against this document, and the reference to step
2 has been made consistent throughout the files.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Ian McDonald [Fri, 10 Nov 2006 15:09:10 +0000 (13:09 -0200)]
[DCCP]: Fix logfile overflow
This patch fixes data being spewed into the logs continually. As the
code stood if there was a large queue and long delays timeo would go
down to zero and never get reset.
This fixes it by resetting timeo. Put constant into header as well.
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Fri, 10 Nov 2006 15:01:31 +0000 (13:01 -0200)]
[DCCPv6]: remove forward declarations in ipv6.c
This does the same for ipv6.c as the preceding one does for ipv4.c: Only the
inet_connection_sock_af_ops forward declarations remain, since at least
dccp_ipv6_mapped has a circular dependency to dccp_v6_request_recv_sock.
No code change, merely re-ordering.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Fri, 10 Nov 2006 14:32:01 +0000 (12:32 -0200)]
[DCCP]: calling dccp_v{4,6}_reqsk_send_ack is a BUG
This patch removes two functions, the send_ack functions of request_sock,
which are not called/used by the DCCP code. It is correct that these
functions are not called, below is a justification why calling these
functions (on a passive socket in the LISTEN/RESPOND state) would mean
a DCCP protocol violation.
Gerrit Renker noticed dccp_tw_deschedule and submitted a patch with a FIXME,
but as he suggests in the same patch the best thing is to just ditch this
declaration, while doing that also noticed that tcp_tw_count is as well not
defined anywhere, so ditch it too.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Fri, 10 Nov 2006 13:46:34 +0000 (11:46 -0200)]
[DCCP]: Simplify jump labels in dccp_v{4,6}_rcv
This is a code simplification and was singled out from the
DCCPv6 Oops patch on
http://www.mail-archive.com/dccp@vger.kernel.org/msg00600.html
It mainly makes the code consistent between ipv{4,6}.c for the functions
dccp_v4_rcv
dccp_v6_rcv
and removes the do_time_wait label to simplify code somewhat.
Commiter note: fixed up a compile problem, trivial.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Fri, 10 Nov 2006 13:13:33 +0000 (11:13 -0200)]
[DCCPv6]: Add a FIXME for missing IPV6_PKTOPTIONS
This refers to the possible memory leak pointed out in
http://www.mail-archive.com/dccp@vger.kernel.org/msg00574.html,
fixed by David Miller in
http://www.mail-archive.com/netdev@vger.kernel.org/msg24881.html
and adds a FIXME to point out where code is missing.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Allow normal users to only choose among a restricted set of congestion
control choices. The default is reno and what ever has been configured
as default. But the policy can be changed by administrator at any time.
For example, to allow any choice:
cp /proc/sys/net/ipv4/tcp_available_congestion_control \
/proc/sys/net/ipv4/tcp_allowed_congestion_control
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 10 Nov 2006 00:29:57 +0000 (16:29 -0800)]
[SCTP]: Fix warning
An alternate solution would be to make the digest a pointer, allocate
it in sctp_endpoint_init() and free it in sctp_endpoint_destroy().
I guess I should have originally done it this way...
CC [M] net/sctp/sm_make_chunk.o
net/sctp/sm_make_chunk.c: In function 'sctp_unpack_cookie':
net/sctp/sm_make_chunk.c:1358: warning: initialization discards qualifiers from pointer target type
The reason is that sctp_unpack_cookie() takes a const struct
sctp_endpoint and modifies the digest in it (digest being embedded in
the struct, not a pointer). Make digest a pointer to fix this
warning.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 16 Nov 2006 10:30:37 +0000 (02:30 -0800)]
[NET]: Size listen hash tables using backlog hint
We currently allocate a fixed size (TCP_SYNQ_HSIZE=512) slots hash table for
each LISTEN socket, regardless of various parameters (listen backlog for
example)
On x86_64, this means order-1 allocations (might fail), even for 'small'
sockets, expecting few connections. On the contrary, a huge server wanting a
backlog of 50000 is slowed down a bit because of this fixed limit.
This patch makes the sizing of listen hash table a dynamic parameter,
depending of :
- net.core.somaxconn tunable (default is 128)
- net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128)
- backlog value given by user application (2nd parameter of listen())
For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of
kmalloc().
We still limit memory allocation with the two existing tunables (somaxconn &
tcp_max_syn_backlog). So for standard setups, this patch actually reduce RAM
usage.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Kimdon [Fri, 10 Nov 2006 00:16:21 +0000 (16:16 -0800)]
[PKT_SCHED]: Make sch_fifo.o available when CONFIG_NET_SCHED is not set.
Based on patch by Patrick McHardy.
Add a new option, NET_SCH_FIFO, which provides a simple fifo qdisc
without requiring CONFIG_NET_SCHED.
The d80211 stack needs a generic fifo qdisc for WME. At present it
uses net/d80211/fifo_qdisc.c which is functionally equivalent to
sch_fifo.c. This patch will allow the d80211 stack to remove
net/d80211/fifo_qdisc.c and use sch_fifo.c instead.
Signed-off-by: David Kimdon <david.kimdon@devicescape.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 9 Nov 2006 23:20:38 +0000 (15:20 -0800)]
[NET]: Rethink mark field in struct flowi
Now that all protocols have been made aware of the mark
field it can be moved out of the union thus simplyfing
its usage.
The config options in the IPv4/IPv6/DECnet subsystems
to enable respectively disable mark based routing only
obfuscate the code with ifdefs, the cost for the
additional comparison in the flow key is insignificant,
and most distributions have all these options enabled
by default anyway. Therefore it makes sense to remove
the config options and enable mark based routing by
default.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 9 Nov 2006 23:19:14 +0000 (15:19 -0800)]
[NET]: Turn nfmark into generic mark
nfmark is being used in various subsystems and has become
the defacto mark field for all kinds of packets. Therefore
it makes sense to rename it to `mark' and remove the
dependency on CONFIG_NETFILTER.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Ralf Baechle [Thu, 9 Nov 2006 07:02:19 +0000 (23:02 -0800)]
[DECNET]: Don't clear memory twice.
When dn_neigh.c was converted from kmalloc to kzalloc in commit 0da974f4f303a6842516b764507e3c0a03f41e5a it was missed that
dn_neigh_seq_open was actually clearing the allocation twice was
missed.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the selection of an SA for an outgoing packet to be at the same
context as the originating socket/flow. This eliminates the SELinux
policy's ability to use/sendto SAs with contexts other than the socket's.
With this patch applied, the SELinux policy will require one or more of the
following for a socket to be able to communicate with/without SAs:
1. To enable a socket to communicate without using labeled-IPSec SAs:
Fix SO_PEERSEC for tcp sockets to return the security context of
the peer (as represented by the SA from the peer) as opposed to the
SA used by the local/source socket.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: James Morris <jmorris@namei.org>
Since the upstreaming of the mlsxfrm modification a few months back,
testing has resulted in the identification of the following issues/bugs that
are resolved in this patch set.
1. Fix the security context used in the IKE negotiation to be the context
of the socket as opposed to the context of the SPD rule.
2. Fix SO_PEERSEC for tcp sockets to return the security context of
the peer as opposed to the source.
3. Fix the selection of an SA for an outgoing packet to be at the same
context as the originating socket/flow.
The following would be the result of applying this patchset:
- SO_PEERSEC will now correctly return the peer's context.
- IKE deamons will receive the context of the source socket/flow
as opposed to the SPD rule's context so that the negotiated SA
will be at the same context as the source socket/flow.
- The SELinux policy will require one or more of the
following for a socket to be able to communicate with/without SAs:
1. To enable a socket to communicate without using labeled-IPSec SAs:
This Patch: Pass correct security context to IKE for use in negotiation
Fix the security context passed to IKE for use in negotiation to be the
context of the socket as opposed to the context of the SPD rule so that
the SA carries the label of the originating socket/flow.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: James Morris <jmorris@namei.org>
Linus Torvalds [Sat, 2 Dec 2006 16:29:04 +0000 (08:29 -0800)]
Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/drzeus/mmc:
mmc: correct request error handling
mmc: Flush block queue when removing card
mmc: sdhci high speed support
mmc: Support for high speed SD cards
mmc: Fix mmc_delay() function
mmc: Add support for mmc v4 wide-bus modes
[PATCH] mmc: Add support for mmc v4 high speed mode
trivial change for mmc/Kconfig: MMC_PXA does not mean only PXA255
Make general code cleanups
Add MMC_CAP_{MULTIWRITE,BYTEBLOCK} flags
Platform device error handling cleanup
Move register definitions away from the header file
Change OMAP_MMC_{READ,WRITE} macros to use the host pointer
Replace base with virt_base and phys_base
mmc: constify mmc_host_ops vectors
mmc: remove kernel_thread()
Len Brown [Sat, 2 Dec 2006 07:27:46 +0000 (02:27 -0500)]
Revert "ACPI: SCI interrupt source override"
This reverts commit 281ea49b0c294649a6de47a6f8fbe5611137726b,
which broke ACPI Interrupt source overrides that move
the SCI from one IRQ in PIC mode to another in IOAPIC mode.
If the SCI shared an interrupt line with another device,
this would result in a "irq 18: nobody cared" type failure.
Andy Fleming [Fri, 1 Dec 2006 18:01:06 +0000 (12:01 -0600)]
[PATCH] PHY: Add support for configuring the PHY connection interface
Most PHYs connect to an ethernet controller over a GMII or MII
interface. However, a growing number are connected over
different interfaces, such as RGMII or SGMII.
The ethernet driver will tell the PHY what type of connection it
is by setting it manually, or passing it in through phy_connect
(or phy_attach).
Changes include:
* Updates to documentation
* Updates to PHY Lib consumers
* Changes to PHY Lib to add interface support
* Some minor changes to whitespace in phy.h
* gianfar driver now detects interface and passes appropriate
value to PHY Lib Signed-off-by: Andrew Fleming <afleming@freescale.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
If transmit lock is contended on, then push return code back
and retry at higher level.
Bugfix: If buffer is reallocated because of lack of headroom
and the send is blocked, then drop packet. This is necessary
because caller would end up requeuing a freed skb.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
I noticed this driver (and several others) reinvent their own copy of the
existing CRC library. Don't have the hardware, but tested by extracting
code and comparing result.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
It is possible for the sky2 driver NAPI poll routine to be called with
IRQ's disabled if netpoll is trying to make space in the tx queue. This
is an obscure path, but if it happens, the kfree_skb needs to happen
via softirq. Calling kfree_skb with IRQ's disabled is a not allowed.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Update workarounds for 88E803X based on the latest SysKonnect vendor
driver version (8.41). Tested on EC_U rev A1, only.
These up the receive performance.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>