The dmaw_block is an enum and max_pay_load is u32. Therefore
sparse gives warning about comparison of unsigned and signed value.
Resolve by using min_t to force cast.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This implementation is based on work done by Juliusz Chroboczek
General SFB algorithm can be found in figure 14, page 15:
B[l][n] : L x N array of bins (L levels, N bins per level)
enqueue()
Calculate hash function values h{0}, h{1}, .. h{L-1}
Update bins at each level
for i = 0 to L - 1
if (B[i][h{i}].qlen > bin_size)
B[i][h{i}].p_mark += p_increment;
else if (B[i][h{i}].qlen == 0)
B[i][h{i}].p_mark -= p_decrement;
p_min = min(B[0][h{0}].p_mark ... B[L-1][h{L-1}].p_mark);
if (p_min == 1.0)
ratelimit();
else
mark/drop with probabilty p_min;
I did the adaptation of Juliusz code to meet current kernel standards,
and various changes to address previous comments :
Default flow classifier is the rxhash introduced by RPS in 2.6.35, but
we can use an external flow classifier if wanted.
tc qdisc add dev $DEV parent 1:11 handle 11: \
est 0.5sec 2sec sfb limit 128
tc filter add dev $DEV protocol ip parent 11: handle 3 \
flow hash keys dst divisor 1024
Notes:
1) SFB default child qdisc is pfifo_fast. It can be changed by another
qdisc but a child qdisc MUST not drop a packet previously queued. This
is because SFB needs to handle a dequeued packet in order to maintain
its virtual queue states. pfifo_head_drop or CHOKe should not be used.
2) ECN is enabled by default, unlike RED/CHOKe/GRED
With help from Patrick McHardy & Andi Kleen
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Patrick McHardy <kaber@trash.net> CC: Andi Kleen <andi@firstfloor.org> CC: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ajit Khaparde [Sun, 20 Feb 2011 11:42:07 +0000 (11:42 +0000)]
be2net: add code to display temperature of ASIC
Add support to display temperature of ASIC via ethtool -S
From: Somnath K <somnath.kotur@emulex.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ajit Khaparde [Sun, 20 Feb 2011 11:41:53 +0000 (11:41 +0000)]
be2net: fix to ignore transparent vlan ids wrongly indicated by NIC
With transparent VLAN tagging, the ASIC wrongly indicates packets with VLAN ID.
Strip them off in the driver. The VLAN Tag to be stripped will be given to the host
as an async message.
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
About other function, the device does not support.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shan Wei [Sat, 19 Feb 2011 21:57:26 +0000 (21:57 +0000)]
sctp: fix compile warnings in sctp_tsnmap_num_gabs
net/sctp/tsnmap.c: In function ‘sctp_tsnmap_num_gabs’:
net/sctp/tsnmap.c:347: warning: ‘start’ may be used uninitialized in this function
net/sctp/tsnmap.c:347: warning: ‘end’ may be used uninitialized in this function
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Base on Ilpo's patch about documenting tcp_max_ssthresh.
(see http://marc.info/?l=linux-netdev&m=117950581307310&w=2)
According to errata of RFC3742, fix the number of segments increased
during RTT time.
Just to state the occasion to use this parameter, But
about how to set parameter value, maybe some others can do it.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Bohac [Thu, 17 Feb 2011 13:12:08 +0000 (13:12 +0000)]
sctp: fix reporting of unknown parameters
commit 5fa782c2f5ef6c2e4f04d3e228412c9b4a4c8809 re-worked the
handling of unknown parameters. sctp_init_cause_fixed() can now
return -ENOSPC if there is not enough tailroom in the error
chunk skb. When this happens, the error header is not appended to
the error chunk. In that case, the payload of the unknown parameter
should not be appended either.
Signed-off-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Fri, 18 Feb 2011 13:30:17 +0000 (13:30 +0000)]
net: dcb: match dcb_app protocol field with 802.1Qaz spec
The dcb_app protocol field is a __u32 however the 802.1Qaz
specification defines it as a 16 bit field. This patch brings
the structure inline with the spec making it a __u16.
CC: Shmulik Ravid <shmulikr@broadcom.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 18 Feb 2011 22:35:56 +0000 (22:35 +0000)]
tcp: fix inet_twsk_deschedule()
Eric W. Biederman reported a lockdep splat in inet_twsk_deschedule()
This is caused by inet_twsk_purge(), run from process context,
and commit 575f4cd5a5b6394577 (net: Use rcu lookups in inet_twsk_purge.)
removed the BH disabling that was necessary.
Add the BH disabling but fine grained, right before calling
inet_twsk_deschedule(), instead of whole function.
With help from Linus Torvalds and Eric W. Biederman
Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Daniel Lezcano <daniel.lezcano@free.fr> CC: Pavel Emelyanov <xemul@openvz.org> CC: Arnaldo Carvalho de Melo <acme@redhat.com> CC: stable <stable@kernel.org> (# 2.6.33+) Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
net: deinit automatic LIST_HEAD
net: dont leave active on stack LIST_HEAD
net: provide default_advmss() methods to blackhole dst_ops
tg3: Restrict phy ioctl access
drivers/net: Call netif_carrier_off at the end of the probe
ixgbe: work around for DDP last buffer size
ixgbe: fix panic due to uninitialised pointer
e1000e: flush all writebacks before unload
e1000e: check down flag in tasks
isdn: hisax: Use l2headersize() instead of dup (and buggy) func.
arp_notify: unconditionally send gratuitous ARP for NETDEV_NOTIFY_PEERS.
cxgb4vf: Use defined Mailbox Timeout
cxgb4vf: Quiesce Virtual Interfaces on shutdown ...
cxgb4vf: Behave properly when CONFIG_DEBUG_FS isn't defined ...
cxgb4vf: Check driver parameters in the right place ...
pch_gbe: Fix the MAC Address load issue.
iwlwifi: Delete iwl3945_good_plcp_health.
net/can/softing: make CAN_SOFTING_CS depend on CAN_SOFTING
netfilter: nf_iterate: fix incorrect RCU usage
pch_gbe: Fix the issue that the receiving data is not normal.
...
Linus Torvalds [Fri, 18 Feb 2011 20:36:06 +0000 (12:36 -0800)]
Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long
workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'
workqueue: wake up a worker when a rescuer is leaving a gcwq
Eric Dumazet [Thu, 17 Feb 2011 22:59:19 +0000 (22:59 +0000)]
net: deinit automatic LIST_HEAD
commit 9b5e383c11b08784 (net: Introduce
unregister_netdevice_many()) left an active LIST_HEAD() in
rollback_registered(), with possible memory corruption.
Even if device is freed without touching its unreg_list (and therefore
touching the previous memory location holding LISTE_HEAD(single), better
close the bug for good, since its really subtle.
(Same fix for default_device_exit_batch() for completeness)
Reported-by: Michal Hocko <mhocko@suse.cz> Tested-by: Michal Hocko <mhocko@suse.cz> Reported-by: Eric W. Biderman <ebiderman@xmission.com> Tested-by: Eric W. Biderman <ebiderman@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Ingo Molnar <mingo@elte.hu> CC: Octavian Purdila <opurdila@ixiacom.com> CC: stable <stable@kernel.org> [.33+] Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 17 Feb 2011 22:54:38 +0000 (22:54 +0000)]
net: dont leave active on stack LIST_HEAD
Eric W. Biderman and Michal Hocko reported various memory corruptions
that we suspected to be related to a LIST head located on stack, that
was manipulated after thread left function frame (and eventually exited,
so its stack was freed and reused).
Eric Dumazet suggested the problem was probably coming from commit 443457242beb (net: factorize
sync-rcu call in unregister_netdevice_many)
This patch fixes __dev_close() and dev_close() to properly deinit their
respective LIST_HEAD(single) before exiting.
Reported-by: Michal Hocko <mhocko@suse.cz> Tested-by: Michal Hocko <mhocko@suse.cz> Reported-by: Eric W. Biderman <ebiderman@xmission.com> Tested-by: Eric W. Biderman <ebiderman@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Ingo Molnar <mingo@elte.hu> CC: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 18 Feb 2011 19:39:01 +0000 (11:39 -0800)]
net: provide default_advmss() methods to blackhole dst_ops
Commit 0dbaee3b37e118a (net: Abstract default ADVMSS behind an
accessor.) introduced a possible crash in tcp_connect_init(), when
dst->default_advmss() is called from dst_metric_advmss()
Reported-by: George Spelvin <linux@horizon.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 18 Feb 2011 19:32:28 +0000 (11:32 -0800)]
Expand CONFIG_DEBUG_LIST to several other list operations
When list debugging is enabled, we aim to readably show list corruption
errors, and the basic list_add/list_del operations end up having extra
debugging code in them to do some basic validation of the list entries.
However, "list_del_init()" and "list_move[_tail]()" ended up avoiding
the debug code due to how they were written. This fixes that.
So the _next_ time we have list_move() problems with stale list entries,
we'll hopefully have an easier time finding them..
Linus Torvalds [Fri, 18 Feb 2011 01:52:17 +0000 (17:52 -0800)]
Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: add missing frac fb div flag for dce4+
drm/radeon/kms: do not reject X16 and Y16X16 floating-point formats on r300
drm/nouveau: fix suspend/resume on GPUs that don't have PM support
drm/nouveau: flips/flipd need to always set 'evict' for move_accel_cleanup()
drm/nv40: fix tiling-related setup for a number of chipsets
drm/nouveau: fix non-EDIDful native mode selection
drm/nouveau: Fix detection of DDC-based LVDS on DCB15 boards.
drm/nv04-nv40: Fix NULL dereference when we fail to find an LVDS native mode.
drm/nv10: Fix crash when allocating a BO larger than half the available VRAM.
Vasanthy Kolluri [Thu, 17 Feb 2011 13:57:19 +0000 (13:57 +0000)]
enic: Always use single transmit and single receive hardware queues per device
We believe that our earlier patch for supporting multiple hardware
receive queues per enic device requires more internal testing. At this
point, we think that it's best to disable the use of multiple receive
queues. The current patch provides an effective means for the same.
Also, we continue to disallow multiple hardware transmit queues per
device. But change the way we enforce this in order to maintain
consistency with the way receive queues are handled.
Signed-off-by: Christian Benvenuti <benve@cisco.com> Signed-off-by: Danny Guo <dannguo@cisco.com> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David Wang <dwang2@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Feb 2011 22:08:44 +0000 (14:08 -0800)]
net: Add initial_ref arg to dst_alloc().
This allows avoiding multiple writes to the initial __refcnt.
The most simplest cases of wanting an initial reference of "1"
in ipv4 and ipv6 have been converted, the rest have been left
along and kept at the existing "0".
Signed-off-by: David S. Miller <davem@davemloft.net>
John Stultz [Sat, 12 Feb 2011 01:45:40 +0000 (17:45 -0800)]
RTC: Revert UIE emulation removal
Uwe pointed out that my alarm based UIE emulation is not sufficient
to replace the older timer/polling based UIE emulation on devices
where there is no alarm irq. This causes rtc devices without alarms
to return -EINVAL to UIE ioctls. The fix is to re-instate the old
timer/polling method for devices without alarm irqs.
Michał Mirosław [Tue, 15 Feb 2011 16:59:17 +0000 (16:59 +0000)]
net: Introduce new feature setting ops
This introduces a new framework to handle device features setting.
It consists of:
- new fields in struct net_device:
+ hw_features - features that hw/driver supports toggling
+ wanted_features - features that user wants enabled, when possible
- new netdev_ops:
+ feat = ndo_fix_features(dev, feat) - API checking constraints for
enabling features or their combinations
+ ndo_set_features(dev) - API updating hardware state to match
changed dev->features
- new ethtool commands:
+ ETHTOOL_GFEATURES/ETHTOOL_SFEATURES: get/set dev->wanted_features
and trigger device reconfiguration if resulting dev->features
changed
+ ETHTOOL_GSTRINGS(ETH_SS_FEATURES): get feature bits names (meaning)
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 17 Feb 2011 08:53:17 +0000 (08:53 +0000)]
enic: Clean up: Remove a not needed #ifdef
Signed-off-by: Christian Benvenuti <benve@cisco.com> Signed-off-by: Danny Guo <dannguo@cisco.com> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David Wang <dwang2@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 17 Feb 2011 08:53:12 +0000 (08:53 +0000)]
enic: Bug fix: Reset driver count of registered unicast addresses to zero during device reset
During a device reset, clear the counter for the no. of unicast addresses registered.
Also, rename the routines that update unicast and multicast address lists.
Signed-off-by: Christian Benvenuti <benve@cisco.com> Signed-off-by: Danny Guo <dannguo@cisco.com> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David Wang <dwang2@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Tue, 15 Feb 2011 12:51:10 +0000 (12:51 +0000)]
tg3: Restrict phy ioctl access
If management firmware is present and the device is down, the firmware
will assume control of the phy. If a phy access were allowed from the
host, it will collide with firmware phy accesses, resulting in
unpredictable behavior. This patch fixes the problem by disallowing phy
accesses during the problematic condition.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Tue, 15 Feb 2011 02:08:39 +0000 (02:08 +0000)]
drivers/net: Call netif_carrier_off at the end of the probe
Without calling of netif_carrier_off at the end of the probe the operstate
is unknown when the device is initially opened. By default the carrier is
on so when the device is opened and netif_carrier_on is called the link
watch event is not fired and operstate remains zero (unknown).
This patch fixes this behavior in forcedeth and r8169.
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Marciniszyn [Wed, 16 Feb 2011 15:48:25 +0000 (15:48 +0000)]
IB/qib: Prevent double completions after a timeout or RNR error
There is a double completion associated with error handling for RC QPs.
The sequence is:
- The do_rc_ack() routine fields an RNR nack and there are 0
rnr_retries configured on the QP.
- qib_error_qp() stops the pending timer
- qib_rc_send_complete() is called from sdma_complete()
- qib_rc_send_complete() starts the timer because the msb of the psn
just completed says an ack is needed.
- a bunch of flushes occur as ipoib posts WQEs to an error'ed QP
- rc_timeout() calls qib_restart_rc()
- qib_restart_rc() calls qib_send_complete() with a
IB_WC_RETRY_EXC_ERR on a wqe that has already been completed in the
past
The fix avoids starting the timer since another packet will never
arrive.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Ben Hutchings [Wed, 5 Jan 2011 00:50:41 +0000 (00:50 +0000)]
sfc: Implement hardware acceleration of RFS
Use the existing filter management functions to insert TCP/IPv4 and
UDP/IPv4 4-tuple filters for Receive Flow Steering.
For each channel, track how many RFS filters are being added during
processing of received packets and scan the corresponding number of
table entries for filters that may be reclaimed. Do this in batches
to reduce lock overhead.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Tom Herbert [Wed, 16 Feb 2011 10:27:02 +0000 (10:27 +0000)]
bnx2x: Support for managing RX indirection table
Support fetching and retrieving RX indirection table via ethtool.
Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Thu, 17 Feb 2011 10:32:38 +0000 (11:32 +0100)]
netfilter: tproxy: do not assign timewait sockets to skb->sk
Assigning a socket in timewait state to skb->sk can trigger
kernel oops, e.g. in nfnetlink_log, which does:
if (skb->sk) {
read_lock_bh(&skb->sk->sk_callback_lock);
if (skb->sk->sk_socket && skb->sk->sk_socket->file) ...
in the timewait case, accessing sk->sk_callback_lock and sk->sk_socket
is invalid.
Either all of these spots will need to add a test for sk->sk_state != TCP_TIME_WAIT,
or xt_TPROXY must not assign a timewait socket to skb->sk.
This does the latter.
If a TW socket is found, assign the tproxy nfmark, but skip the skb->sk assignment,
thus mimicking behaviour of a '-m socket .. -j MARK/ACCEPT' re-routing rule.
The 'SYN to TW socket' case is left unchanged -- we try to redirect to the
listener socket.