Vlad Yasevich [Wed, 8 Oct 2008 21:19:01 +0000 (14:19 -0700)]
sctp: shrink sctp_tsnmap some more by removing gabs array
The gabs array in the sctp_tsnmap structure is only used
in one place, sctp_make_sack(). As such, carrying the
array around in the sctp_tsnmap and thus directly in
the sctp_association is rather pointless since most
of the time it's just taking up space. Now, let
sctp_make_sack create and populate it and then throw
it away when it's done.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 8 Oct 2008 21:18:39 +0000 (14:18 -0700)]
sctp: Rework the tsn map to use generic bitmap.
The tsn map currently use is 4K large and is stuck inside
the sctp_association structure making memory references REALLY
expensive. What we really need is at most 4K worth of bits
so the biggest map we would have is 512 bytes. Also, the
map is only really usefull when we have gaps to store and
report. As such, starting with minimal map of say 32 TSNs (bits)
should be enough for normal low-loss operations. We can grow
the map by some multiple of 32 along with some extra room any
time we receive the TSN which would put us outside of the map
boundry. As we close gaps, we can shift the map to rebase
it on the latest TSN we've seen. This saves 4088 bytes per
association just in the map alone along savings from the now
unnecessary structure members.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 8 Oct 2008 21:18:04 +0000 (14:18 -0700)]
inet: cleanup of local_port_range
I noticed sysctl_local_port_range[] and its associated seqlock
sysctl_local_port_range_lock were on separate cache lines.
Moreover, sysctl_local_port_range[] was close to unrelated
variables, highly modified, leading to cache misses.
Moving these two variables in a structure can help data
locality and moving this structure to read_mostly section
helps sharing of this data among cpus.
Cleanup of extern declarations (moved in include file where
they belong), and use of inet_get_local_port_range()
accessor instead of direct access to ports values.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 8 Oct 2008 18:44:17 +0000 (11:44 -0700)]
udp: Improve port randomization
Current UDP port allocation is suboptimal.
We select the shortest chain to chose a port (out of 512)
that will hash in this shortest chain.
First, it can lead to give not so ramdom ports and ease
give attackers more opportunities to break the system.
Second, it can consume a lot of CPU to scan all table
in order to find the shortest chain.
Third, in some pathological cases we can fail to find
a free port even if they are plenty of them.
This patch zap the search for a short chain and only
use one random seed. Problem of getting long chains
should be addressed in another way, since we can
obtain long chains with non random ports.
Based on a report and patch from Vitaly Mayatskikh
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarek Poplawski [Wed, 8 Oct 2008 18:36:22 +0000 (11:36 -0700)]
pkt_sched: Update qdisc requeue stats in dev_requeue_skb()
After the last change of requeuing there is no info about such
incidents in tc stats. This patch updates the counter, but we should
consider this should differ from previous stats because of additional
checks preventing to repeat this. On the other hand, previous stats
didn't include requeuing of gso_segmented skbs.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Wed, 8 Oct 2008 18:34:06 +0000 (11:34 -0700)]
tcp: fix length used for checksum in a reset
While looking for some common code I came across difference
in checksum calculation between tcp_v6_send_(reset|ack) I
couldn't explain. I checked both v4 and v6 and found out that
both seem to have the same "feature". I couldn't find anything
in rfc nor anywhere else which would state that md5 option
should be ignored like it was in case of reset so I came to
a conclusion that this is probably a genuine bug. I suspect
that addition of md5 just was fooled by the excessive
copy-paste code in those functions and the reset part was
never tested well enough to find out the problem.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Engelhardt [Wed, 8 Oct 2008 09:35:20 +0000 (11:35 +0200)]
netfilter: xtables: provide invoked family value to extensions
By passing in the family through which extensions were invoked, a bit
of data space can be reclaimed. The "family" member will be added to
the parameter structures and the check functions be adjusted.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
Jan Engelhardt [Wed, 8 Oct 2008 09:35:18 +0000 (11:35 +0200)]
netfilter: xtables: move extension arguments into compound structure (1/6)
The function signatures for Xtables extensions have grown over time.
It involves a lot of typing/replication, and also a bit of stack space
even if they are not used. Realize an NFWS2008 idea and pack them into
structs. The skb remains outside of the struct so gcc can continue to
apply its optimizations.
This patch does this for match extensions' match functions.
A few ambiguities have also been addressed. The "offset" parameter for
example has been renamed to "fragoff" (there are so many different
offsets already) and "protoff" to "thoff" (there is more than just one
protocol here, so clarify).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
Jan Engelhardt [Wed, 8 Oct 2008 09:35:15 +0000 (11:35 +0200)]
netfilter: ebtables: use generic table checking
Ebtables ORs (1 << NF_BR_NUMHOOKS) into the hook mask to indicate that
the extension was called from a base chain. So this also needs to be
present in the extensions' ->hooks.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
Jan Engelhardt [Wed, 8 Oct 2008 09:35:14 +0000 (11:35 +0200)]
netfilter: ebt_among: obtain match size through different means
The function signatures will be changed to match those of Xtables, and
the datalen argument will be gone. ebt_among unfortunately relies on
it, so we need to obtain it somehow.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
The TPROXY target implements redirection of non-local TCP/UDP traffic to local
sockets. Additionally, it's possible to manipulate the packet mark if and only
if a socket has been found. (We need this because we cannot use multiple
targets in the same iptables rule.)
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
netfilter: split netfilter IPv4 defragmentation into a separate module
Netfilter connection tracking requires all IPv4 packets to be defragmented.
Both the socket match and the TPROXY target depend on this functionality, so
this patch separates the Netfilter IPv4 defrag hooks into a separate module.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:11 +0000 (11:35 +0200)]
netfilter: enable netfilter in netns
From kernel perspective, allow entrance in nf_hook_slow().
Stuff which uses nf_register_hook/nf_register_hooks, but otherwise not netns-ready:
DECnet netfilter
ipt_CLUSTERIP
nf_nat_standalone.c together with XFRM (?)
IPVS
several individual match modules (like hashlimit)
ctnetlink
NOTRACK
all sorts of queueing and reporting to userspace
L3 and L4 protocol sysctls, bridge sysctls
probably something else
Anyway critical mass has been achieved, there is no reason to hide netfilter any longer.
From userspace perspective, allow to manipulate all sorts of
iptables/ip6tables/arptables rules.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
Heh, last minute proof-reading of this patch made me think,
that this is actually unneeded, simply because "ct" pointers will be
different for different conntracks in different netns, just like they
are different in one netns.
Not so sure anymore.
[Patrick: pointers will be different, flushing can only be done while
inactive though and thus it needs to be per netns]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:06 +0000 (11:35 +0200)]
netns: export netns list
Conntrack code will use it for
a) removing expectations and helpers when corresponding module is removed, and
b) removing conntracks when L3 protocol conntrack module is removed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:03 +0000 (11:35 +0200)]
netfilter: netns: fix {ip,6}_route_me_harder() in netns
Take netns from skb->dst->dev. It should be safe because, they are called
from LOCAL_OUT hook where dst is valid (though, I'm not exactly sure about
IPVS and queueing packets to userspace).
[Patrick: its safe everywhere since they already expect skb->dst to be set]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
Other solution is to add ->ct_net pointer to tuplehashes and still has one
hash, I tried that it's ugly and requires more code deep down in protocol
modules et al.
* propagate netns pointer to where needed, e. g. to conntrack iterators.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:02 +0000 (11:35 +0200)]
netfilter: netns nf_conntrack: add ->ct_net -- pointer from conntrack to netns
Conntrack (struct nf_conn) gets pointer to netns: ->ct_net -- netns in which
it was created. It comes from netdevice.
->ct_net is write-once field.
Every conntrack in system has ->ct_net initialized, no exceptions.
->ct_net doesn't pin netns: conntracks are recycled after timeouts and
pinning background traffic will prevent netns from even starting shutdown
sequence.
Right now every conntrack is created in init_net.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Wed, 8 Oct 2008 09:35:01 +0000 (11:35 +0200)]
netfilter: netns: remove nf_*_net() wrappers
Now that dev_net() exists, the usefullness of them is even less. Also they're
a big problem in resolving circular header dependencies necessary for
NOTRACK-in-netns patch. See below.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
Jan Engelhardt [Wed, 8 Oct 2008 09:35:01 +0000 (11:35 +0200)]
netfilter: implement NFPROTO_UNSPEC as a wildcard for extensions
When a match or target is looked up using xt_find_{match,target},
Xtables will also search the NFPROTO_UNSPEC module list. This allows
for protocol-independent extensions (like xt_time) to be reused from
other components (e.g. arptables, ebtables).
Extensions that take different codepaths depending on match->family
or target->family of course cannot use NFPROTO_UNSPEC within the
registration structure (e.g. xt_pkttype).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
Jan Engelhardt [Wed, 8 Oct 2008 09:35:00 +0000 (11:35 +0200)]
netfilter: Introduce NFPROTO_* constants
The netfilter subsystem only supports a handful of protocols (much
less than PF_*) and even non-PF protocols like ARP and
pseudo-protocols like PF_BRIDGE. By creating NFPROTO_*, we can earn a
few memory savings on arrays that previously were always PF_MAX-sized
and keep the pseudo-protocols to ourselves.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
Jan Engelhardt [Wed, 8 Oct 2008 09:35:00 +0000 (11:35 +0200)]
netfilter: xt_recent: IPv6 support
This updates xt_recent to support the IPv6 address family.
The new /proc/net/xt_recent directory must be used for this.
The old proc interface can also be configured out.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
Jan Engelhardt [Wed, 8 Oct 2008 09:35:00 +0000 (11:35 +0200)]
netfilter: rename ipt_recent to xt_recent
Like with other modules (such as ipt_state), ipt_recent.h is changed
to forward definitions to (IOW include) xt_recent.h, and xt_recent.c
is changed to use the new constant names.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>