Jack Steiner [Tue, 8 Nov 2011 00:19:55 +0000 (11:19 +1100)]
x86: reduce clock calibration time during slave cpu startup
Reduce the startup time for slave cpus.
Adds hooks for an arch-specific function for clock calibration. These
hooks are used on x86. If a newly started cpu has the same phys_proc_id
as a core already active, uses the TSC for the delay loop and has a
CONSTANT_TSC, use the already-calculated value of loops_per_jiffy.
This patch reduces the time required to start slave cpus on a 4096 cpu
system from:
    465 sec  OLD
     62 sec  NEW
This reduces boot time on a 4096p system by almost 7 minutes. Nice...
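The reuse condition described above can be sketched in plain C. This is only a userspace illustration, not the kernel code: struct cpu_info, known_loops_per_jiffy() and the two boolean parameters are hypothetical names, and only phys_proc_id and loops_per_jiffy come from the description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu record; field names only echo the commit text. */
struct cpu_info {
    bool online;                    /* cpu has already been started */
    int phys_proc_id;               /* physical package (socket) id */
    unsigned long loops_per_jiffy;  /* 0 until calibrated */
};

/*
 * Return a loops_per_jiffy value that the cpu at index `cpu` may reuse,
 * or 0 when the caller still has to run the full calibration loop.
 */
unsigned long known_loops_per_jiffy(const struct cpu_info *cpus, size_t ncpus,
                                    size_t cpu, bool delay_uses_tsc,
                                    bool constant_tsc)
{
    /* Reuse is only safe when the delay loop is TSC based and the TSC
     * ticks at a constant rate. */
    if (!delay_uses_tsc || !constant_tsc)
        return 0;

    for (size_t i = 0; i < ncpus; i++) {
        if (i == cpu || !cpus[i].online)
            continue;
        /* Same package as an already active core: take its value. */
        if (cpus[i].phys_proc_id == cpus[cpu].phys_proc_id)
            return cpus[i].loops_per_jiffy;
    }
    return 0;
}
```

A return value of 0 means "not known", so the first cpu started in each package still pays for a full calibration.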
Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shaohua Li [Tue, 8 Nov 2011 00:19:54 +0000 (11:19 +1100)]
x86: tlb flush avoid superflous leave_mm()
If only the TLB entry for a single page VA needs to be flushed and the
current task is in lazy TLB state, doing leave_mm() is superfluous because
it flushes the whole TLB. Avoiding it can reduce TLB misses.
Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
arch/x86/kernel/e820.c: quiet sparse noise about plain integer as NULL pointer
The last parameter to sort() is a pointer to the function used to swap
items. This parameter should be NULL, not 0, when not used. This quiets
the following sparse warning:
warning: Using plain integer as NULL pointer
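As an illustration of the calling convention, here is a small userspace sketch with the same parameter shape: a comparison callback followed by an optional swap callback. sort_sketch(), default_swap() and cmp_int() are hypothetical stand-ins, not the kernel's sort(); the point is only that an unused function-pointer argument is spelled NULL.

```c
#include <stddef.h>

typedef int (*cmp_fn)(const void *, const void *);
typedef void (*swap_fn)(void *, void *, size_t);

/* Byte-wise swap used when the caller passes no swap callback. */
static void default_swap(void *a, void *b, size_t size)
{
    unsigned char *x = a, *y = b;

    while (size--) {
        unsigned char t = *x;
        *x++ = *y;
        *y++ = t;
    }
}

/* Simple insertion sort; `swap` may be NULL to request the default. */
void sort_sketch(void *base, size_t num, size_t size, cmp_fn cmp, swap_fn swap)
{
    unsigned char *p = base;

    if (swap == NULL)
        swap = default_swap;

    for (size_t i = 1; i < num; i++)
        for (size_t j = i; j > 0 && cmp(p + (j - 1) * size, p + j * size) > 0; j--)
            swap(p + (j - 1) * size, p + j * size, size);
}

/* Ascending order comparison for int elements. */
int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;

    return (x > y) - (x < y);
}
```

A call then reads sort_sketch(v, n, sizeof(v[0]), cmp_int, NULL); writing 0 instead of NULL compiles as well, but it is what sparse flags.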
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Tue, 8 Nov 2011 00:19:53 +0000 (11:19 +1100)]
drivers/power/intel_mid_battery.c: fix build
Seems that nobody's even trying any more.
Cc: Nithish Mahalingam <nithish.mahalingam@intel.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Anton Vorontsov <cbouatmailru@gmail.com> Cc: Major Lee <major_lee@wistron.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ludwig Nussel [Tue, 8 Nov 2011 00:19:51 +0000 (11:19 +1100)]
x86: fix mmap random address range
On x86_32 casting the unsigned int result of get_random_int() to long may
result in a negative value. On x86_32 the range of mmap_rnd() therefore
was -255 to 255. The 32bit mode on x86_64 used 0 to 255 as intended.
The bug was introduced by 675a081 ("x86: unify mmap_{32|64}.c") in January
2008.
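The sign problem can be reproduced in userspace. In this sketch int32_t stands in for the 32-bit long of x86_32, and both function names are hypothetical; the conversion of an out-of-range unsigned value to int32_t is implementation-defined in C and is assumed to wrap, as it does with gcc and clang.

```c
#include <stdint.h>

/* Models the old behaviour: cast to a 32-bit signed type, then take the
 * remainder. The cast can go negative, and so can the remainder. */
int32_t mmap_rnd_buggy(uint32_t random)
{
    return (int32_t)random % (1 << 8);
}

/* Models the intended behaviour: the remainder is taken on the unsigned
 * value, so the result always lies in 0..255. */
uint32_t mmap_rnd_fixed(uint32_t random)
{
    return random % (1u << 8);
}
```

For the input 0xFFFFFF01 the first function yields -255 and the second yields 1.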
Signed-off-by: Ludwig Nussel <ludwig.nussel@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shérab [Tue, 8 Nov 2011 00:19:51 +0000 (11:19 +1100)]
arch/x86/platform/iris/iris.c: register a platform device and a platform driver
This makes the iris driver use the platform API, so it is properly exposed
in /sys.
[akpm@linux-foundation.org: remove commented-out code, add missing space to printk, clean up code layout] Signed-off-by: Shérab <Sebastien.Hinderer@ens-lyon.org> Cc: Len Brown <lenb@kernel.org> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alex Bligh [Tue, 8 Nov 2011 00:19:50 +0000 (11:19 +1100)]
net/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy
Problem:
A repeatable Oops can be caused if a container with networking
unshared is destroyed when it has nf_conntrack entries yet to expire.
A copy of the oops follows below. A perl program generating the oops
repeatably is attached inline below.
Analysis:
The oops is called from cleanup_net when the namespace is
destroyed. conntrack iterates through outstanding events and calls
death_by_timeout on each of them, which in turn produces a call to
ctnetlink_conntrack_event. This calls nf_netlink_has_listeners, which
oopses because net->nfnl is NULL.
The perl program generates the container through fork() then
clone(NS_NEWNET). It does not explicitly set up netlink, but I presume
it was set up, else net->nfnl
would have been NULL earlier (i.e. when an earlier connection
timed out). This would thus suggest that net->nfnl is made NULL
during the destruction of the container, which I think is done by
nfnetlink_net_exit_batch.
I can see that the various subsystems are deinitialised in the opposite
order to which the relevant register_pernet_subsys calls are called,
and both nf_conntrack and nfnetlink_net_ops register their relevant
subsystems. If nfnetlink_net_ops registered later than nfconntrack,
then its exit routine would have been called first, which would cause
the oops described. I am not sure there is anything to prevent this
happening in a container environment.
Whilst there's perhaps a more complex problem revolving around ordering
of subsystem deinit, it seems to me that missing a netlink event on a
container that is dying is not a disaster. An early check for net->nfnl
being non-NULL in ctnetlink_conntrack_event appears to fix this. There
may remain a potential race condition if it becomes NULL immediately
after being checked (I am not sure any lock is held at this point or
how synchronisation for subsystem deinitialization works).
Patch:
The patch attached should apply on everything from 2.6.26 (if not before)
onwards; it appears to be a problem on all kernels. This was taken against
Ubuntu-3.0.0-11.17 which is very close to 3.0.4. I have torture-tested it
with the above perl script for 15 minutes or so; the perl script hung the
machine within 20 seconds without this patch.
Applicability:
If this is the right solution, it should be applied to all stable kernels
as well as head. Apart from the minor overhead of checking one variable
against NULL, it can never 'do the wrong thing', because if net->nfnl
is NULL, an oops will inevitably result. Therefore, checking is a reasonable
thing to do unless it can be proven that net->nfnl will never be NULL.
Check net->nfnl for NULL in ctnetlink_conntrack_event to avoid Oops on
container destroy
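The shape of the early check can be sketched outside the kernel. The types and the function below are hypothetical stand-ins (struct net_sketch, struct sock_sketch, conntrack_event_sketch()); only the nfnl member name comes from the description, and this is not the actual patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-namespace nfnetlink socket. */
struct sock_sketch {
    int listeners;  /* number of userspace listeners */
};

/* Hypothetical stand-in for the network namespace. */
struct net_sketch {
    struct sock_sketch *nfnl;  /* NULL once nfnetlink has been torn down */
};

/* Returns 1 if an event would be delivered, 0 if it is skipped. */
int conntrack_event_sketch(const struct net_sketch *net)
{
    /* Early check: the namespace is dying, so drop the event rather
     * than dereference a NULL socket. */
    if (net->nfnl == NULL)
        return 0;

    /* Nobody is listening: nothing to send. */
    if (net->nfnl->listeners == 0)
        return 0;

    return 1;
}
```

As the analysis notes, a check of this kind does not close a race in which the pointer becomes NULL right after it has been tested.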
Signed-off-by: Alex Bligh <alex@alex.org.uk> Cc: Patrick McHardy <kaber@trash.net> Cc: David Miller <davem@davemloft.net> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jiri Pirko [Fri, 11 Nov 2011 22:16:48 +0000 (22:16 +0000)]
net: introduce ethernet teaming device
This patch introduces a new network device called team. It is supposed to
be a very fast, simple, userspace-driven alternative to the existing
bonding driver.
A userspace library called libteam, with a couple of demo apps, is
available here:
https://github.com/jpirko/libteam
Note it's still in its diapers atm.
team<->libteam use generic netlink for communication. That and rtnl
are supposed to be the only ways to configure a team device, no sysfs etc.
A Python binding of libteam was recently introduced.
A daemon providing arpmon/miimon active-backup functionality will be
introduced shortly. All that's necessary is already implemented in the
kernel team driver.
v7->v8:
- check ndo_vlan_rx_[add/kill]_vid functions before calling
them.
- use dev_kfree_skb_any() instead of dev_kfree_skb()
v6->v7:
- transmit and receive functions are not checked in hot paths.
That also resolves memory leak on transmit when no port is
present
v5->v6:
- changed couple of _rcu calls to non _rcu ones in non-readers
v4->v5:
- team_change_mtu() uses team->lock while traversing through port
list
- mac address changes are moved completely to jurisdiction of
userspace daemon. This way the daemon can do FOM1, FOM2 and
possibly other weird things with mac addresses.
Only round-robin mode sets all ports to the bond's address when
they are enslaved.
- Extended Kconfig text
v3->v4:
- remove redundant synchronize_rcu from __team_change_mode()
- revert "set and clear of mode_ops happens per pointer, not per
byte"
- extend comment of function __team_change_mode()
v2->v3:
- team_change_mtu() uses rcu version of list traversal to unwind
- set and clear of mode_ops happens per pointer, not per byte
- port hashlist changed to be embedded into team structure
- error branch in team_port_enter() does cleanup now
- fixed rtln->rtnl
v1->v2:
- modes are made as modules. Makes team more modular and
extendable.
- several commenters' nitpicks found on v1 were fixed
- several other bugs were fixed.
- note I ignored Eric's comment about roundrobin port selector
as Eric's way may be easily implemented as another mode (mode
"random") in future.
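For readers unfamiliar with the round-robin port selector mentioned above, a minimal sketch follows. It assumes a simple packet counter taken modulo the number of ports; struct rr_state and rr_select_port() are hypothetical names and this is not the team driver's code.

```c
/* Hypothetical round-robin state: one counter per team device. */
struct rr_state {
    unsigned int sent_packets;
};

/*
 * Pick the index of the port for the next packet, or -1 when no port is
 * present (in which case the caller has to drop and free the packet).
 */
int rr_select_port(struct rr_state *st, unsigned int port_count)
{
    if (port_count == 0)
        return -1;

    return (int)(st->sent_packets++ % port_count);
}
```

A "random" mode of the kind referred to above would differ only in how the index is chosen.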
Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:32 +0000 (04:34 +0000)]
bnx2x: update driver version to 1.70.35-0
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Sun, 13 Nov 2011 04:34:31 +0000 (04:34 +0000)]
bnx2x: Remove on-stack napi struct variable
Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:30 +0000 (04:34 +0000)]
bnx2x: prevent race in statistics flow
The race may cause access of registers while MAC hw block is
in reset state. As a result syslog will show error messages.
We can prevent this by using state from local variable.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Sun, 13 Nov 2011 04:34:29 +0000 (04:34 +0000)]
bnx2x: add fan failure event handling
Shut down the device in case of fan failure to prevent HW damage.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:28 +0000 (04:34 +0000)]
bnx2x: remove unused #define
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:27 +0000 (04:34 +0000)]
bnx2x: simplify definition of RX_SGE_MASK_LEN and use it.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:26 +0000 (04:34 +0000)]
bnx2x: DCBX: use #define instead of magic
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:25 +0000 (04:34 +0000)]
bnx2x: propagate DCBX negotiation
We need to propagate the DCBX results from the PMF to other functions
on the same port, in order to properly update the netdev structure
and allow subsequent new ETS and PFC configurations.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>