git.karo-electronics.de Git - linux-beck.git/log

igb_ptp: Include clocksource.h to get CLOCKSOURCE_MASK.

Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: Include clocksource.h to get CLOCKSOURCE_MASK.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fib_trie-next'

Alexander Duyck says:

====================
fib_trie: Reduce time spent in fib_table_lookup by 35 to 75%

These patches are meant to address several performance issues I have seen
in the fib_trie implementation, and fib_table_lookup specifically.  With
these changes in place I have seen a reduction of up to 35 to 75% for the
total time spent in fib_table_lookup depending on the type of search being
performed.

On a VM running in my Corei7-4930K system with a trie of maximum depth of 7
this resulted in a reduction of over 370ns per packet in the total time to
process packets received from an ixgbe interface and route them to a dummy
interface.  This represents a failed lookup in the local trie followed by
a successful search in the main trie.

Baseline Refactor
  ixgbe->dummy routing 1.20Mpps 2.21Mpps
  ------------------------------------------------------------
  processing time per packet 835ns 453ns
  fib_table_lookup 50.1% 418ns 25.0% 113ns
  check_leaf.isra.9 7.9% 66ns    -- --
  ixgbe_clean_rx_irq 5.3% 44ns 9.8% 44ns
  ip_route_input_noref 2.9% 25ns 4.6% 21ns
  pvclock_clocksource_read 2.6% 21ns 4.6% 21ns
  ip_rcv 2.6% 22ns 4.0% 18ns

In the simple case of receiving a frame and dropping it before it can reach
the socket layer I saw a reduction of 40ns per packet.  This represents a
trip through the local trie with the correct leaf found with no need for
any backtracing.

Baseline Refactor
  ixgbe->local receive 2.65Mpps 2.96Mpps
  ------------------------------------------------------------
  processing time per packet 377ns 337ns
  fib_table_lookup 25.1% 95ns 25.8% 87ns
  ixgbe_clean_rx_irq 8.7% 33ns 9.0% 30ns
  check_leaf.isra.9 7.2% 27ns    -- --
  ip_rcv 5.7% 21ns 6.5% 22ns

These changes have resulted in several functions being inlined such as
check_leaf and fib_find_node, but due to the code simplification the
overall size of the code has been reduced.

   text    data     bss     dec     hex filename
  16932     376      16   17324    43ac net/ipv4/fib_trie.o - before
  15259     376       8   15643    3d1b net/ipv4/fib_trie.o - after

Changes since RFC:
  Replaced this_cpu_ptr with correct call to this_cpu_inc in patch 1
  Changed test for leaf_info mismatch to (key ^ n->key) & li->mask_plen in patch 10
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Add tracking value for suffix length

This change adds a tracking value for the maximum suffix length of all
prefixes stored in any given tnode. With this value we can determine if we
need to backtrace or not based on if the suffix is greater than the pos
value.

By doing this we can reduce the CPU overhead for lookups in the local table
as many of the prefixes there are 32b long and have a suffix length of 0
meaning we can immediately backtrace to the root node without needing to
test any of the nodes between it and where we ended up.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Remove checks for index >= tnode_child_length from tnode_get_child

For some reason the compiler doesn't seem to understand that when we are in
a loop that runs from tnode_child_length - 1 to 0 we don't expect the value
of tn->bits to change.  As such every call to tnode_get_child was rerunning
tnode_chile_length which ended up consuming quite a bit of space in the
resultant assembly code.

I have gone though and verified that in all cases where tnode_get_child
is used we are either winding though a fixed loop from tnode_child_length -
1 to 0, or are in a fastpath case where we are verifying the value by
either checking for any remaining bits after shifting index by bits and
testing for leaf, or by using tnode_child_length.

size net/ipv4/fib_trie.o
Before:
   text    data     bss     dec     hex filename
  15506     376       8   15890    3e12 net/ipv4/fib_trie.o

After:
   text    data     bss     dec     hex filename
  14827     376       8   15211    3b6b net/ipv4/fib_trie.o

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: inflate/halve nodes in a more RCU friendly way

This change pulls the node_set_parent functionality out of put_child_reorg
and instead leaves that to the function to take care of as well. By doing
this we can fully construct the new cluster of tnodes and all of the
pointers out of it before we start routing pointers into it.

I am suspecting this will likely fix some concurency issues though I don't
have a good test to show as such.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Push tnode flushing down to inflate/halve

This change pushes the tnode freeing down into the inflate and halve
functions.  It makes more sense here as we have a better grasp of what is
going on and when a given cluster of nodes is ready to be freed.

I believe this may address a bug in the freeing logic as well.  For some
reason if the freelist got to a certain size we would call
synchronize_rcu().  I'm assuming that what they meant to do is call
synchronize_rcu() after they had handed off that much memory via
call_rcu().  As such that is what I have updated the behavior to be.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Push assignment of child to parent down into inflate/halve

This change makes it so that the assignment of the tnode to the parent is
handled directly within whatever function is currently handling the node be
it inflate, halve, or resize. By doing this we can avoid some of the need
to set NULL pointers in the tree while we are resizing the subnodes.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Add functions should_inflate and should_halve

This change pulls the logic for if we should inflate/halve the nodes out
into separate functions.  It also addresses what I believe is a bug where 1
full node is all that is needed to keep a node from ever being halved.

Simple script to reproduce the issue:
modprobe dummy; ifconfig dummy0 up
for i in `seq 0 255`; do ifconfig dummy0:$i 10.0.${i}.1/24 up; done
ifconfig dummy0:256 10.0.255.33/16 up
for i in `seq 0 254`; do ifconfig dummy0:$i down; done

Results from /proc/net/fib_triestat
Before:
Local:
Aver depth:     3.00
Max depth:      4
Leaves:         17
Prefixes:       18
Internal nodes: 11
  1: 8  2: 2  10: 1
Pointers: 1048
Null ptrs: 1021
Total size: 11  kB
After:
Local:
Aver depth:     3.41
Max depth:      5
Leaves:         17
Prefixes:       18
Internal nodes: 12
  1: 8  2: 3  3: 1
Pointers: 36
Null ptrs: 8
Total size: 3  kB

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Move resize to after inflate/halve

This change consists of a cut/paste of resize to behind inflate and halve
so that I could remove the two function prototypes.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Push rcu_read_lock/unlock to callers

This change is to start cleaning up some of the rcu_read_lock/unlock
handling. I realized while reviewing the code there are several spots that
I don't believe are being handled correctly or are masking warnings by
locally calling rcu_read_lock/unlock instead of calling them at the correct
level.

A common example is a call to fib_get_table followed by fib_table_lookup.
The rcu_read_lock/unlock ought to wrap both but there are several spots where
they were not wrapped.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Use unsigned long for anything dealing with a shift by bits

This change makes it so that anything that can be shifted by, or compared
to a value shifted by bits is updated to be an unsigned long. This is
mostly a precaution against an insanely huge address space that somehow
starts coming close to the 2^32 root node size which would require
something like 1.5 billion addresses.

I chose unsigned long instead of unsigned long long since I do not believe
it is possible to allocate a 32 bit tnode on a 32 bit system as the memory
consumed would be 16GB + 28B which exceeds the addressible space for any
one process.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Update meaning of pos to represent unchecked bits

This change moves the pos value to the other side of the "bits" field.  By
doing this it actually simplifies a significant amount of code in the trie.

For example when halving a tree we know that the bit lost exists at
oldnode->pos, and if we inflate the tree the new bit being add is at
tn->pos.  Previously to find those bits you would have to subtract pos and
bits from the keylength or start with a value of (1 << 31) and then shift
that.

There are a number of spots throughout the code that benefit from this.  In
the case of the hot-path searches the main advantage is that we can drop 2
or more operations from the search path as we no longer need to compute the
value for the index to be shifted by and can instead just use the raw pos
value.

In addition the tkey_extract_bits is now defunct and can be replaced by
get_index since the two operations were doing the same thing, but now
get_index does it much more quickly as it is only an xor and shift versus a
pair of shifts and a subtraction.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Optimize fib_table_insert

This patch updates the fib_table_insert function to take advantage of the
changes made to improve the performance of fib_table_lookup. As a result
the code should be smaller and run faster then the original.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Optimize fib_find_node

This patch makes use of the same features I made use of for
fib_table_lookup to streamline fib_find_node. The resultant code should be
smaller and run faster than the original.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Optimize fib_table_lookup to avoid wasting time on loops/variables

This patch is meant to reduce the complexity of fib_table_lookup by reducing
the number of variables to the bare minimum while still keeping the same if
not improved functionality versus the original.

Most of this change was started off by the desire to rid the function of
chopped_off and current_prefix_length as they actually added very little to
the function since they only applied when computing the cindex.  I was able
to replace them mostly with just a check for the prefix match.  As long as
the prefix between the key and the node being tested was the same we know
we can search the tnode fully versus just testing cindex 0.

The second portion of the change ended up being a massive reordering.
Originally the calls to check_leaf were up near the start of the loop, and
the backtracing and descending into lower levels of tnodes was later.  This
didn't make much sense as the structure of the tree means the leaves are
always the last thing to be tested.  As such I reordered things so that we
instead have a loop that will delve into the tree and only exit when we
have either found a leaf or we have exhausted the tree.  The advantage of
rearranging things like this is that we can fully inline check_leaf since
there is now only one reference to it in the function.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Merge leaf into tnode

This change makes it so that leaf and tnode are the same struct. As a
result there is no need for rt_trie_node anymore since everyting can be
merged into tnode.

On 32b systems this results in the leaf being 4 bytes larger, however I
don't know if that is really an issue as this and an eariler patch that
added bits & pos have increased the size from 20 to 28. If I am not
mistaken slub/slab allocate on power of 2 sizes so 20 was likely being
rounded up to 32 anyway.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Merge tnode_free and leaf_free into node_free

Both the leaf and the tnode had an rcu_head in them, but they had them in
slightly different places. Since we now have them in the same spot and
know that any node with bits == 0 is a leaf and the rest are either vmalloc
or kmalloc tnodes depending on the value of bits it makes it easy to combine
the functions and reduce overhead.

In addition I have taken advantage of the rcu_head pointer to go ahead and
put together a simple linked list instead of using the tnode pointer as
this way we can merge either type of structure for freeing.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Make leaf and tnode more uniform

This change makes some fundamental changes to the way leaves and tnodes are
constructed.  The big differences are:
1.  Leaves now populate pos and bits indicating their full key size.
2.  Trie nodes now mask out their lower bits to be consistent with the leaf
3.  Both structures have been reordered so that rt_trie_node now consisists
    of a much larger region including the pos, bits, and rcu portions of
    the tnode structure.

On 32b systems this will result in the leaf being 4B larger as the pos and
bits values were added to a hole created by the key as it was only 4B in
length.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Update usage stats to be percpu instead of global variables

The trie usage stats were currently being shared by all threads that were
calling fib_table_lookup. As a result when multiple threads were
performing lookups simultaneously the trie would begin to cache bounce
between those threads.

In order to prevent this I have updated the usage stats to use a set of
percpu variables. By doing this we should be able to avoid the cache
bouncing and still make use of these stats.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gre: allow live address change

The GRE tap device supports Ethernet over GRE, but doesn't
care about the source address of the tunnel, therefore it
can be changed without bring device down.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp : multicast notification to the registered listeners

Previously l2tp module did not provide any means for the user space to
get notified when tunnels/sessions are added/modified/deleted.
This change contains the following
- create a multicast group for the listeners to register.
- notify the registered listeners when the tunnels/sessions are
created/modified/deleted.

Signed-off-by: Bill Hong <bhong@brocade.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Sven-Thorsten Dietrich <sven@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: return proper error code from tun_do_read

Instead of -1 with EAGAIN, read on a O_NONBLOCK tun fd will return 0. This
fixes this by properly returning the error code from __skb_recv_datagram.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: Fixed unsigned/signed comparison

Validated that this was actually using the unsigned comparison with gdb.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: replace 0 by NULL for pointers

Fix sparse warning:
net/tipc/link.c:1924:40: warning: Using plain integer as NULL pointer

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'enic-next'

Govindarajulu Varadarajan says:

====================
enic: Check for DMA mapping error

After dma mapping the buffers, enic does not call dma_mapping_error() to check
if mapping is successful.

This series fixes the issue by checking return value of pci_dma_mapping_error()
after pci_map_single().

This is reported by redhat here
https://bugzilla.redhat.com/show_bug.cgi?id=1145016
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

enic: add stats for dma mapping error

This patch adds generic statistics for enic. As of now dma_map_error is the only
member. dma_map_erro is incremented every time dma maping error happens.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: check dma_mapping_error

This patch checks for pci_dma_mapping_error() after dma mapping the data.
If the dma mapping fails we remove the previously queued frags and return
NETDEV_TX_OK.

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: make vnic_wq_buf doubly linked

This patch makes vnic_wq_buf doubly liked list. This is needed for dma_mapping
error check, in case some frag's dma map fails, we need to move back and remove
previously queued buffers.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fec-next'

Fugang Duan says:

====================
net: fec: add Wake-on-LAN support

The patch series enable FEC Wake-on-LAN feature for i.MX6q/dl and i.MX6SX SOCs.
FEC HW support sleep mode, when system in suspend status with FEC all clock gate
off, magic packet can wake up system. For different SOCs, there have special SOC
GPR register to let FEC enter sleep mode or exit sleep mode, add these to platform
callback for driver' call.

Patch#1: add WOL interface supports.
Patch#2: add SOC special sleep of/off operations for driver's sleep callback.
Patch#3: add magic pattern support for devicetree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dts: imx6qdl: enable FEC magic-packet feature

Add FEC magic-packet feature support.

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: imx: add FEC sleep mode callback function

i.MX6q/dl, i.MX6SX SOCs enet support sleep mode that magic packet can
wake up system in suspend status. For different SOCs, there have some
SOC specifical GPR register to set sleep on/off mode. So add these to
callback function for driver.

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: add Wake-on-LAN support

Support for Wake-on-LAN using Magic Packet. ENET IP supports sleep mode
in low power status, when system enter suspend status, Magic packet can
wake up system even if all SOC clocks are gate. The patch doing below things:
- flagging the device as a wakeup source for the system, as well as
its Wake-on-LAN interrupt
- prepare the hardware for entering WoL mode
- add standard ethtool WOL interface
- enable the ENET interrupt to wake us

Tested on i.MX6q/dl sabresd, sabreauto boards, i.MX6SX arm2 boards.

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: gianfar: add missing __iomem annotation

Fix the following spare warning:
drivers/net/ethernet/freescale/gianfar.c:3521:60: warning: incorrect type in argument 1 (different address spaces)
drivers/net/ethernet/freescale/gianfar.c:3521:60:    expected unsigned int [noderef] <asn:2>*addr
drivers/net/ethernet/freescale/gianfar.c:3521:60:    got unsigned int [usertype] *rfbptr
drivers/net/ethernet/freescale/gianfar.c:205:16: warning: incorrect type in assignment (different address spaces)
drivers/net/ethernet/freescale/gianfar.c:205:16:    expected unsigned int [usertype] *rfbptr
drivers/net/ethernet/freescale/gianfar.c:205:16:    got unsigned int [noderef] <asn:2>*<noident>
drivers/net/ethernet/freescale/gianfar.c:2918:44: warning: incorrect type in argument 1 (different address spaces)
drivers/net/ethernet/freescale/gianfar.c:2918:44:    expected unsigned int [noderef] <asn:2>*addr
drivers/net/ethernet/freescale/gianfar.c:2918:44:    got unsigned int [usertype] *rfbptr

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: gianfar: mark the local functions static

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio-net: don't do header check for dodgy gso packets

There's no need to do header check for virtio-net since:

- Host sets dodgy for all gso packets from guest and check the header.
- Host should be prepared for all kinds of evil packets from guest, since
malicious guest can send any kinds of packet.

So this patch sets NETIF_F_GSO_ROBUST for virtio-net to skip the check.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

arm: sa1100: move irda header to linux/platform_data

In the end asm/mach/irda.h header is not used by anybody except sa1100.
Move the header to the platform data includes dir and rename it to
irda-sa11x0.h.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sxgbe: Use setup_timer

Convert a call to init_timer and accompanying intializations of
the timer's data and function fields to a call to setup_timer.

A simplified version of the semantic match that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression t,f,d;
@@

-init_timer(&t);
+setup_timer(&t,f,d);
-t.function = f;
-t.data = d;
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

ksz884x: Use setup_timer

Convert a call to init_timer and accompanying intializations of
the timer's data and function fields to a call to setup_timer.

A simplified version of the semantic match that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression t,f,d;
@@

-init_timer(&t);
+setup_timer(&t,f,d);
-t.function = f;
-t.data = d;
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1e: Use setup_timer

Convert a call to init_timer and accompanying intializations of
the timer's data and function fields to a call to setup_timer.

A simplified version of the semantic match that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression t,f,d;
@@

-init_timer(&t);
+setup_timer(&t,f,d);
-t.function = f;
-t.data = d;
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

atheros: atlx: Use setup_timer

Convert a call to init_timer and accompanying intializations of
the timer's data and function fields to a call to setup_timer.

A simplified version of the semantic match that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression t,f,d;
@@

-init_timer(&t);
+setup_timer(&t,f,d);
-t.function = f;
-t.data = d;
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'timecounter'

Richard Cochran says:

====================
Time Counter fixes and improvements

Several PTP Hardware Clock (PHC) drivers implement the clock in
software using the timecounter/cyclecounter code. This series adds one
simple improvement and one more subtle fix to the shared timecounter
facility. Credit for this series goes to Janusz Użycki, who pointed
the issues out to me off list.

Patch #1 simply move the timecounter code into its own file. When
working on this series, it was really annoying to see half the kernel
recompile after every tweak to the timecounter stuff. There is no
reason to keep this together with the clocksource code.

Patch #2 implements an improved adjtime() method, and patches 3-10
convert all of the drivers over to the new method.

Patch #11 fixes a subtle but important issue with the timecounter WRT
frequency adjustment. As it stands now, a timecounter based PHC will
exhibit a variable frequency resolution (and variable time error)
depending on how often the clock is read.

In timecounter_read_delta(), the expression

   (delta * cc->mult) >> cc->shift;

can lose resolution from the adjusted value of 'mult'. If the value
of 'delta' is too small, then small changes in 'mult' have no effect.
However, if the delta value is large enough, then small changes in
'mult' will have an effect.

Reading the clock too often means smaller 'delta' values which in turn
will spoil the fine adjustments made to 'mult'. Up until now, this
effect did not show up in my testing. The following example explains
why.

The CPTS has an input clock of 250 MHz, and the clock source uses
mult=0x80000000 and shift=29, making the ticks to nanoseconds
conversion like this:

   ticks * 2^31
   ------------
       2^29

Imagine what happens if the clock is read every 10 milliseconds. Ten
milliseconds are about 2500000 ticks, which corresponds to about 21
bits. The product in the numerator has then 52 bits. After the shift
operation, 23 bits are preserved. This results in a frequency
adjustment resolution of about 0.1 ppm (not _too_ bad.)

A frequency resolution of 1 ppm requires 20 bits.
A frequency resolution of 1 ppb requires 30 bits.

For the 250 MHz CPTS clock, reading every 4 seconds yields a 1 ppb
resolution (which is the finest that our API allows).

However, the error can be much higher if the clock is read too often
or if time stamps occur close in time to read operations. In general
it is really not acceptable to allow the rate of clock readings to
influence the clock accuracy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

timecounter: keep track of accumulated fractional nanoseconds

The current timecounter implementation will drop a variable amount
of resolution, depending on the magnitude of the time delta. In
other words, reading the clock too often or too close to a time
stamp conversion will introduce errors into the time values. This
patch fixes the issue by introducing a fractional nanosecond field
that accumulates the low order bits.

Reported-by: Janusz Użycki <j.uzycki@elproma.com.pl>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cpts: convert to timecounter adjtime.

This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mlx4: convert to timecounter adjtime.

This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ixgbe: convert to timecounter adjtime.

This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: igb: convert to timecounter adjtime.

This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: e1000e: convert to timecounter adjtime.

This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: convert to timecounter adjtime.

This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bnx2x: convert to timecounter adjtime.

This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: xgbe: convert to timecounter adjtime.

This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

timecounter: provide a helper function to shift the time.

Some PTP Hardware Clock drivers use a struct timecounter to represent
their clock. To adjust the time by a given offset, these drivers all
perform a two step read/write of their timecounter. However, it is
better and simpler just to adjust the offset in one step. This patch
introduces a little routine to help drivers implement the adjtime
method.

Suggested-by: Janusz Użycki <j.uzycki@elproma.com.pl>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

time: move the timecounter/cyclecounter code into its own file.

The timecounter code has almost nothing to do with the clocksource
code. Let it live in its own file. This will help isolate the
timecounter users from the clocksource users in the source tree.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix double SKB free in bluetooth 6lowpan layer, from Jukka Rissanen.

2) Fix receive checksum handling in enic driver, from Govindarajulu
    Varadarajan.

3) Fix NAPI poll list corruption in virtio_net and caif_virtio, from
    Herbert Xu.  Also, add code to detect drivers that have this mistake
    in the future.

4) Fix doorbell endianness handling in mlx4 driver, from Amir Vadai.

5) Don't clobber IP6CB() before xfrm6_policy_check() is called in TCP
    input path,f rom Nicolas Dichtel.

6) Fix MPLS action validation in openvswitch, from Pravin B Shelar.

7) Fix double SKB free in vxlan driver, also from Pravin.

8) When we scrub a packet, which happens when we are switching the
    context of the packet (namespace, etc.), we should reset the
    secmark.  From Thomas Graf.

9) ->ndo_gso_check() needs to do more than return true/false, it also
    has to allow the driver to clear netdev feature bits in order for
    the caller to be able to proceed properly.  From Jesse Gross.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
  genetlink: A genl_bind() to an out-of-range multicast group should not WARN().
  netlink/genetlink: pass network namespace to bind/unbind
  ne2k-pci: Add pci_disable_device in error handling
  bonding: change error message to debug message in __bond_release_one()
  genetlink: pass multicast bind/unbind to families
  netlink: call unbind when releasing socket
  netlink: update listeners directly when removing socket
  genetlink: pass only network namespace to genl_has_listeners()
  netlink: rename netlink_unbind() to netlink_undo_bind()
  net: Generalize ndo_gso_check to ndo_features_check
  net: incorrect use of init_completion fixup
  neigh: remove next ptr from struct neigh_table
  net: xilinx: Remove unnecessary temac_property in the driver
  net: phy: micrel: use generic config_init for KSZ8021/KSZ8031
  net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding
  openvswitch: fix odd_ptr_err.cocci warnings
  Bluetooth: Fix accepting connections when not using mgmt
  Bluetooth: Fix controller configuration with HCI_QUIRK_INVALID_BDADDR
  brcmfmac: Do not crash if platform data is not populated
  ipw2200: select CFG80211_WEXT
  ...

Merge tag 'linux-kselftest-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fix from Shuah Khan:
"Fix exec test compile warnings"

* tag 'linux-kselftest-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/exec: Use %zu to format size_t

Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6

Pull CIFS fixes from Steve French:
"A set of three minor cifs fixes"

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: make new inode cache when file type is different
  Fix signed/unsigned pointer warning
  Convert MessageID in smb2_hdr to LE

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull UDF & isofs fixes from Jan Kara:
"A couple of UDF fixes of handling of corrupted media and one iso9660
  fix of the same"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Reduce repeated dereferences
  udf: Check component length before reading it
  udf: Check path length when reading symlink
  udf: Verify symlink size before loading it
  udf: Verify i_size when loading inode
  isofs: Fix unchecked printing of ER records

Merge tag 'pm+acpi-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI material from Rafael J Wysocki:
"These are fixes (operating performance points library, cpufreq-dt
  driver, cpufreq core, ACPI backlight, cpupower tool), cleanups
  (cpuidle), new processor IDs for the RAPL (Running Average Power
  Limit) power capping driver, and a modification of the generic power
  domains framework allowing modular drivers to call one of its helper
  functions.

  Specifics:

   - Fix for a potential NULL pointer dereference in the cpufreq core
     due to an initialization race condition (Ethan Zhao).

   - Fixes for abuse of the OPP (Operating Performance Points) API
     related to RCU and other minor issues in the OPP library and the
     cpufreq-dt driver (Dmitry Torokhov).

   - cpuidle governors cleanup making them measure idle duration in a
     better way without using the CPUIDLE_FLAG_TIME_INVALID flag which
     allows that flag to be dropped from the ACPI cpuidle driver and
     from the core too (Len Brown).

   - New ACPI backlight blacklist entries for Samsung machines without a
     working native backlight interface that need to use the ACPI
     backlight instead (Aaron Lu).

   - New CPU IDs of future Intel Xeon CPUs for the Intel RAPL power
     capping driver (Jacob Pan).

   - Generic power domains framework modification to export the
     of_genpd_get_from_provider() function to modular drivers that will
     allow future driver modifications to be based on the mainline (Amit
     Daniel Kachhap).

   - Two fixes for the cpupower tool (Michal Privoznik, Prarit
     Bhargava)"

* tag 'pm+acpi-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / video: Add some Samsung models to disable_native_backlight list
  tools / cpupower: Fix no idle state information return value
  tools / cpupower: Correctly detect if running as root
  cpufreq: fix a NULL pointer dereference in __cpufreq_governor()
  cpufreq-dt: defer probing if OPP table is not ready
  PM / OPP: take RCU lock in dev_pm_opp_get_opp_count
  PM / OPP: fix warning in of_free_opp_table()
  PM / OPP: add some lockdep annotations
  powercap / RAPL: add IDs for future Xeon CPUs
  PM / Domains: Export of_genpd_get_from_provider function
  cpuidle / ACPI: remove unused CPUIDLE_FLAG_TIME_INVALID
  cpuidle: ladder: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID
  cpuidle: menu: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID

genetlink: A genl_bind() to an out-of-range multicast group should not WARN().

Users can request to bind to arbitrary multicast groups, so warning
when the requested group number is out of range is not appropriate.

And with the warning removed, and the 'err' variable properly given
an initial value, we can remove 'found' altogether.

Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'spi-v3.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A few driver specific fixes here, the DMA burst size increase in the
  spfi driver is a fix to make the hardware happier in some situations"

* tag 'spi-v3.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: img-spfi: Increase DMA burst size
  spi: img-spfi: Enable controller before starting TX DMA
  spi: sh-msiof: Add runtime PM lock in initializing

Merge tag 'regulator-v3.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull one regulator fix from Mark Brown:
"One fix here, a fix for the voltage mapping on one of the s2mps11
  regulators which broke systems using it including apparently the
  Gear 2 smartwatches"

* tag 'regulator-v3.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: s2mps11: Fix dw_mmc failure on Gear 2

Merge tag 'mmc-v3.19-2' of git://git.linaro.org/people/ulf.hansson/mmc

Pull one MMC fix from Ulf Hansson:
"MMC core:

- Fix selection of buswidth for mmc hosts supporting 1-bit only"

* tag 'mmc-v3.19-2' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: core: stop trying to switch width when only one bit is supported

Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux

Pull thermal management updates from Zhang Rui:
"First of all, the most important change is the thermal cpu cooling
  fixes.  The major fix here is to have proper sequencing between
  cpufreq layer and thermal cpu cooling registration.  A take away of
  this fix is an improvement in the thermal drivers code.  Thermal
  drivers that require cpu cooling do not need to check for cpufreq
  layer.  The requirement now is to propagate the error code, if any,
  while registering cpu cooling device.  Thanks to Viresh for
  implementing the required CPUfreq changes.

  Second, a new driver is introduced for int340x processor thermal
  device.  Given that int340x thermal is disabled by default, and this
  processor thermal device is only available on limited platforms, plus
  the driver does nothing but exposes some thermal limitation
  information for user space to use, thus I think it is safe to include
  it in this pull request after missing 3.19-rc2.

  Specifics:

   - Thermal cpu cooling fixes and cleanups.

   - introduce INT340X processor thermal reporting device driver.

   - several small fixes and cleanups for int340x thermal drivers"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (43 commits)
  Thermal/int340x/int3403: Free acpi notification handler
  Thermal/int340x/processor_thermal: Fix memory leak
  Thermal/int340x/int3403: Fix memory leak
  thermal: int340x: Introduce processor reporting device
  thermal: int340x_thermal: drop owner assignment from platform_drivers
  thermal: drop owner assignment from platform_drivers
  thermal: cpu_cooling: document node in struct cpufreq_cooling_device
  thermal/powerclamp: add ids for future xeon cpus
  Thermal/int340x: Handle properly the case when _trt or _art acpi entry is missing
  thermal: cpu_cooling: return ERR_PTR() for !CPU_THERMAL or !THERMAL_OF
  thermal: cpu_cooling: small memory leak on error
  thermal: ti-soc-thermal: Do not print error message in the EPROBE_DEFER case
  thermal: db8500: Do not print error message in the EPROBE_DEFER case
  thermal: imx: Do not print error message in the EPROBE_DEFER case
  thermal: Fix cdev registration with THERMAL_NO_LIMIT on 64bit
  drivers: thermal: Remove ARCH_HAS_BANDGAP dependency for samsung
  thermal:core:fix: Check return code of the ->get_max_state() callback
  thermal: cpu_cooling: update copyright tags
  thermal: cpu_cooling: Use cpufreq_dev->freq_table for finding level/freq
  thermal: cpu_cooling: Store frequencies in descending order
  ...

mm: get rid of radix tree gfp mask for pagecache_get_page

Commit 2457aec63745 ("mm: non-atomically mark page accessed during page
cache allocation where possible") has added a separate parameter for
specifying gfp mask for radix tree allocations.

Not only this is less than optimal from the API point of view because it
is error prone, it is also buggy currently because
grab_cache_page_write_begin is using GFP_KERNEL for radix tree and if
fgp_flags doesn't contain FGP_NOFS (mostly controlled by fs by
AOP_FLAG_NOFS flag) but the mapping_gfp_mask has __GFP_FS cleared then
the radix tree allocation wouldn't obey the restriction and might
recurse into filesystem and cause deadlocks.  This is the case for most
filesystems unfortunately because only ext4 and gfs2 are using
AOP_FLAG_NOFS.

Let's simply remove radix_gfp_mask parameter because the allocation
context is same for both page cache and for the radix tree.  Just make
sure that the radix tree gets only the sane subset of the mask (e.g.  do
not pass __GFP_WRITE).

Long term it is more preferable to convert remaining users of
AOP_FLAG_NOFS to use mapping_gfp_mask instead and simplify this
interface even further.

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'acpi-video'

* acpi-video:
ACPI / video: Add some Samsung models to disable_native_backlight list

Merge branches 'pm-domains', 'powercap' and 'pm-tools'

* pm-domains:
  PM / Domains: Export of_genpd_get_from_provider function

* powercap:
  powercap / RAPL: add IDs for future Xeon CPUs

* pm-tools:
  tools / cpupower: Fix no idle state information return value
  tools / cpupower: Correctly detect if running as root

Merge branches 'pm-cpufreq' and 'pm-cpuidle'

* pm-cpufreq:
  cpufreq: fix a NULL pointer dereference in __cpufreq_governor()
  cpufreq-dt: defer probing if OPP table is not ready

* pm-cpuidle:
  cpuidle / ACPI: remove unused CPUIDLE_FLAG_TIME_INVALID
  cpuidle: ladder: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID
  cpuidle: menu: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID

Merge branch 'pm-opp'

* pm-opp:
  PM / OPP: take RCU lock in dev_pm_opp_get_opp_count
  PM / OPP: fix warning in of_free_opp_table()
  PM / OPP: add some lockdep annotations

mmc: core: stop trying to switch width when only one bit is supported

mmc_select_bus_width() will try to switch to MMC_BUS_WIDTH_4 even if
MMC_CAP_4_BIT_DATA and MMC_CAP_8_BIT_DATA are not set in host->caps.
Return as soon as possible when those flags are not set

Fixes: 577fb13199b1 (mmc: rework selection of bus speed mode)
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: <stable@vger.kernel.org> # 3.17
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

Linux 3.19-rc2

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
"The important fixes are for two bugs introduced by the merge window.

  On top of this, add a couple of WARN_ONs and stop spamming dmesg on
  pretty much every boot of a virtual machine"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: warn on more invariant breakage
  kvm: fix sorting of memslots with base_gfn == 0
  kvm: x86: drop severity of "generation wraparound" message
  kvm: x86: vmx: reorder some msr writing

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fix from Al Viro:
"An embarrassing bug in lustre patches from this cycle ;-/"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
[regression] braino in "lustre: use is_root_inode()"

kvm: warn on more invariant breakage

Modifying a non-existent slot is not allowed. Also check that the
first loop doesn't move a deleted slot beyond the used part of
the mslots array.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: fix sorting of memslots with base_gfn == 0

Before commit 0e60b0799fed (kvm: change memslot sorting rule from size
to GFN, 2014-12-01), the memslots' sorting key was npages, meaning
that a valid memslot couldn't have its sorting key equal to zero.
On the other hand, a valid memslot can have base_gfn == 0, and invalid
memslots are identified by base_gfn == npages == 0.

Because of this, commit 0e60b0799fed broke the invariant that invalid
memslots are at the end of the mslots array. When a memslot with
base_gfn == 0 was created, any invalid memslot before it were left
in place.

This can be fixed by changing the insertion to use a ">=" comparison
instead of "<=", but some care is needed to avoid breaking the case
of deleting a memslot; see the comment in update_memslots.

Thanks to Tiejun Chen for posting an initial patch for this bug.

Reported-by: Jamie Heilman <jamie@audible.transient.net>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Tested-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'sound-3.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Just a couple of fixes for the new Intel Skylake HD-audio support"

* tag 'sound-3.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda_intel: apply the Seperate stream_tag for Skylake
ALSA: hda_controller: Separate stream_tag for input and output streams.

kvm: x86: drop severity of "generation wraparound" message

Since most virtual machines raise this message once, it is a bit annoying.
Make it KERN_DEBUG severity.

Cc: stable@vger.kernel.org
Fixes: 7a2e8aaf0f6873b47bc2347f216ea5b0e4c258ab
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: vmx: reorder some msr writing

The commit 34a1cd60d17f, "x86: vmx: move some vmx setting from
vmx_init() to hardware_setup()", tried to refactor some codes
specific to vmx hardware setting into hardware_setup(), but some
msr writing should depend on our previous setting condition like
enable_apicv, enable_ept and so on.

Reported-by: Jamie Heilman <jamie@audible.transient.net>
Tested-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

netlink/genetlink: pass network namespace to bind/unbind

Netlink families can exist in multiple namespaces, and for the most
part multicast subscriptions are per network namespace. Thus it only
makes sense to have bind/unbind notifications per network namespace.

To achieve this, pass the network namespace of a given client socket
to the bind/unbind functions.

Also do this in generic netlink, and there also make sure that any
bind for multicast groups that only exist in init_net is rejected.
This isn't really a problem if it is accepted since a client in a
different namespace will never receive any notifications from such
a group, but it can confuse the family if not rejected (it's also
possible to silently (without telling the family) accept it, but it
would also have to be ignored on unbind so families that take any
kind of action on bind/unbind won't do unnecessary work for invalid
clients like that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ne2k-pci: Add pci_disable_device in error handling

For linux-3.18.0
The driver lacks pci_disable_device in error handling code of
ne2k_pci_init_one, so the device enabled by pci_enable_device is not
disabled when errors occur.
This patch fixes this problem.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: change error message to debug message in __bond_release_one()

In __bond_release_one(), when the interface is not a slave or not a slave of
"this" master, it log error message.

The message actually should be a debug message matching what bond_enslave()
does.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'netlink_multicast'

Johannes Berg says:

====================
netlink/genetlink cleanups & multicast improvements

I'm looking at using the multicast group functionality in a way that would
benefit from knowing when there are subscribers to avoid collecting the
required data when there aren't any. During this I noticed that the unbind
for multicast groups doesn't actually work - it's never called when sockets
are closed. Luckily, nobody actually uses the functionality.

While looking at the code trying to find why it's not called and where the
multicast listeners are actually removed, I found the potential cleanup in
patch 3. Patch 2 also has a cleanup for a generic netlink API in this area.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: pass multicast bind/unbind to families

In order to make the newly fixed multicast bind/unbind
functionality in generic netlink, pass them down to the
appropriate family.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: call unbind when releasing socket

Currently, netlink_unbind() is only called when the socket
explicitly unbinds, which limits its usefulness (luckily
there are no users of it yet anyway.)

Call netlink_unbind() also when a socket is released, so it
becomes possible to track listeners with this callback and
without also implementing a netlink notifier (and checking
netlink_has_listeners() in there.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: update listeners directly when removing socket

The code is now confusing to read - first in one function down
(netlink_remove) any group subscriptions are implicitly removed
by calling __sk_del_bind_node(), but the subscriber database is
only updated far later by calling netlink_update_listeners().

Move the latter call to just after removal from the list so it
is easier to follow the code.

This also enables moving the locking inside the kernel-socket
conditional, which improves the normal socket destruction path.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: pass only network namespace to genl_has_listeners()

There's no point to force the caller to know about the internal
genl_sock to use inside struct net, just have them pass the network
namespace. This doesn't really change code generation since it's
an inline, but makes the caller less magic - there's never any
reason to pass another socket.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: rename netlink_unbind() to netlink_undo_bind()

The new name is more expressive - this isn't a generic unbind
function but rather only a little undo helper for use only in
netlink_bind().

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[regression] braino in "lustre: use is_root_inode()"

In one of the places (ll_md_blocking_ast()) we had open-coded
!is_root_inode(inode) and replaced it with is_root_inode(inode).
See the last chunk of f76c23:
-                   inode != inode->i_sb->s_root->d_inode)
+                   is_root_inode(inode))
should've been
+                   !is_root_inode(inode))
obviously...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Johan Hedberg says:

====================
Here's one more bluetooth pull request for 3.19. We've got two fixes:

- Fix for accepting connections with old user space versions of BlueZ
- Fix for Bluetooth controllers that don't have a public address

Both of these are regressions that were introduced in 3.17, so the
appropriate Cc: stable annotations are provided.

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-for-davem-2014-12-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

o Paul made a Kconfig dependency fix to ipw2200, it was not possible to
  enable that driver because Wireless Extensions is now disabled by default.

o Mika fixed brcmfmac not to crash when platform data is not populated

o Emmanuel provided few fixes to iwlwifi, he says:

  "I have here new device IDs and a fix for double free bug I
  introduced. I also fix an issue with the RFKILL interrupt - the HW
  needs us to ACK the interrupt again after we reset it. Liad fixes an
  issue with the firmware debugging infrastructure. While working on
  torture scenarios of firmware restarts, Eliad found an issue which
  he fixed."

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Generalize ndo_gso_check to ndo_features_check

GSO isn't the only offload feature with restrictions that
potentially can't be expressed with the current features mechanism.
Checksum is another although it's a general issue that could in
theory apply to anything. Even if it may be possible to
implement these restrictions in other ways, it can result in
duplicate code or inefficient per-packet behavior.

This generalizes ndo_gso_check so that drivers can remove any
features that don't make sense for a given packet, similar to
netif_skb_features(). It also converts existing driver
restrictions to the new format, completing the work that was
done to support tunnel protocols since the issues apply to
checksums as well.

By actually removing features from the set that are used to do
offloading, it solves another problem with the existing
interface. In these cases, GSO would run with the original set
of features and not do anything because it appears that
segmentation is not required.

CC: Tom Herbert <therbert@google.com>
CC: Joe Stringer <joestringer@nicira.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Tom Herbert <therbert@google.com>
Fixes: 04ffcb255f22 ("net: Add ndo_gso_check")
Tested-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: incorrect use of init_completion fixup

The second init_completion call should be a reinit_completion here.

patch is against 3.18.0 linux-next

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: remove next ptr from struct neigh_table

After commit
d7480fd3b173 ("neigh: remove dynamic neigh table registration support"),
this field is not used anymore.

CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: xilinx: Remove unnecessary temac_property in the driver

This property is no longer used in the code yet the code looks for it in the device tree.
It does not cause an error if it's not in the tree.

Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'parisc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc build fix from Helge Deller:
"This unbreaks the kernel compilation on parisc with gcc-4.9"

* 'parisc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix out-of-register compiler error in ldcw inline assembler function

net: phy: micrel: use generic config_init for KSZ8021/KSZ8031

Use generic config_init callback also for KSZ8021 and KSZ8031.

This has been avoided this far due to commit b838b4aced99 ("phy/micrel:
KSZ8031RNL RMII clock reconfiguration bug"), which claims that the PHY
becomes unresponsive if the broadcast-disable flag is set before
configuring the clock mode.

Turns out that the problem seemingly worked-around by the above
mentioned commit was really due to a hardware-configuration issue, where
the PHY was in fact strapped to address 3 rather than 0.

Tested-by: Bruno Thomsen <bth@kamstrup.dk>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding

When using VXLAN tunnels and a sky2 device, I have experienced
checksum failures of the following type:

[ 4297.761899] eth0: hw csum failure
[...]
[ 4297.765223] Call Trace:
[ 4297.765224]  <IRQ>  [<ffffffff8172f026>] dump_stack+0x46/0x58
[ 4297.765235]  [<ffffffff8162ba52>] netdev_rx_csum_fault+0x42/0x50
[ 4297.765238]  [<ffffffff8161c1a0>] ? skb_push+0x40/0x40
[ 4297.765240]  [<ffffffff8162325c>] __skb_checksum_complete+0xbc/0xd0
[ 4297.765243]  [<ffffffff8168c602>] tcp_v4_rcv+0x2e2/0x950
[ 4297.765246]  [<ffffffff81666ca0>] ? ip_rcv_finish+0x360/0x360

These are reliably reproduced in a network topology of:

container:eth0 == host(OVS VXLAN on VLAN) == bond0 == eth0 (sky2) -> switch

When VXLAN encapsulated traffic is received from a similarly
configured peer, the above warning is generated in the receive
processing of the encapsulated packet.  Note that the warning is
associated with the container eth0.

        The skbs from sky2 have ip_summed set to CHECKSUM_COMPLETE, and
because the packet is an encapsulated Ethernet frame, the checksum
generated by the hardware includes the inner protocol and Ethernet
headers.

The receive code is careful to update the skb->csum, except in
__dev_forward_skb, as called by dev_forward_skb.  __dev_forward_skb
calls eth_type_trans, which in turn calls skb_pull_inline(skb, ETH_HLEN)
to skip over the Ethernet header, but does not update skb->csum when
doing so.

This patch resolves the problem by adding a call to
skb_postpull_rcsum to update the skb->csum after the call to
eth_type_trans.

Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

parisc: fix out-of-register compiler error in ldcw inline assembler function

The __ldcw macro has a problem when its argument needs to be reloaded from
memory. The output memory operand and the input register operand both need to
be reloaded using a register in class R1_REGS when generating 64-bit code.
This fails because there's only a single register in the class. Instead, use a
memory clobber. This also makes the __ldcw macro a compiler memory barrier.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> [3.13+]
Signed-off-by: Helge Deller <deller@gmx.de>

ALSA: hda_intel: apply the Seperate stream_tag for Skylake

The total stream number of Skylake's input and output stream
exceeds 15, which will cause some streams do not work because
of the overflow on SDxCTL.STRM field if using the legacy
stream tag allocation method.

This patch uses the new stream tag allocation method by add
the flag AZX_DCAPS_SEPARATE_STREAM_TAG for Skylake platform.

Signed-off-by: Libin Yang <libin.yang@intel.com>
Reviewed-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda_controller: Separate stream_tag for input and output streams.

Implemented separate stream_tag assignment for input and output streams.
According to hda specification stream tag must be unique throughout the
input streams group, however an output stream might use a stream tag
which is already in use by an input stream. This change is necessary
to support HW which provides a total of more than 15 stream DMA engines
which with legacy implementation causes an overflow on SDxCTL.STRM
field (and the whole SDxCTL register) and as a result usage of
Reserved value 0 in the SDxCTL.STRM field which confuses HDA controller.

Signed-off-by: Rafal Redzimski <rafal.f.redzimski@intel.com>
Signed-off-by: Jayachandran B <jayachandran.b@intel.com>
Signed-off-by: Libin Yang <libin.yang@intel.com>
Reviewed-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Xmas fixes pull:

  core:
      one atomic fix, revert the WARN_ON dumb buffers patch.

  agp:
      fixup Dave J.

  nouveau:
      fix 3.18 regression for old userspace

  tegra fixes:
      vblank and iommu fixes

  amdkfd:
      fix bugs shown by testing with userspace, init apertures once

  msm:
      hdmi fixes and cleanup

  i915:
      misc fixes

  There is also a link ordering fix that I've asked to be cc'ed to you,
  putting iommu before gpu, it fixes an issue with amdkfd when things
  are all in the kernel, but I didn't like sending it via my tree
  without discussion.

  I'll probably be a bit on/off for a few weeks with pulls now, due to
  holidays and LCA, so don't be surprised if stuff gets a bit backed up,
  and things end up a bit large due to lag"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (28 commits)
  Revert "drm/gem: Warn on illegal use of the dumb buffer interface v2"
  agp: Fix up email address & attributions in AGP MODULE_AUTHOR tags
  nouveau: bring back legacy mmap handler
  drm/msm/hdmi: rework HDMI IRQ handler
  drm/msm/hdmi: enable regulators before clocks to avoid warnings
  drm/msm/mdp5: update irqs on crtc<->encoder link change
  drm/msm: block incoming update on pending updates
  drm/atomic: fix potential null ptr on plane enable
  drm/msm: Deletion of unnecessary checks before the function call "release_firmware"
  drm/msm: Deletion of unnecessary checks before two function calls
  drm/tegra: dc: Select root window for event dispatch
  drm/tegra: gem: Use the proper size for GEM objects
  drm/tegra: gem: Flush buffer objects upon allocation
  drm/tegra: dc: Fix a potential race on page-flip completion
  drm/tegra: dc: Consistently use the same pipe
  drm/irq: Add drm_crtc_vblank_count()
  drm/irq: Add drm_crtc_handle_vblank()
  drm/irq: Add drm_crtc_send_vblank_event()
  drm/i915: Disable PSMI sleep messages on all rings around context switches
  drm/i915: Force the CS stall for invalidate flushes
  ...