git.karo-electronics.de Git - linux-beck.git/log

veth: Don't segment multiple tagged packets on veth device

Veth devices don't need to segment multiple tagged packets.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvlan: Don't segment multiple tagged packets on macvlan device

Macvlan/macvtap devices don't need to segment multiple tagged packets
since the lower devices can segment them.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: add gro capability

Straightforward patch to add GRO processing to virtio_net.

napi_complete_done() usage allows more aggressive aggregation,
opted-in by setting /sys/class/net/xxx/gro_flush_timeout

Tested:

Setting /sys/class/net/xxx/gro_flush_timeout to 1000 nsec,
Rick Jones reported following results.

One VM of each on a pair of OpenStack compute nodes with E5-2650Lv3 CPUs
and Intel 82599ES-based NICs. So, two "before" and two "after" VMs.
The OpenStack compute nodes were running OpenStack Kilo, with VxLAN
encapsulation being used through OVS so no GRO coming-up the host
stack.  The compute nodes themselves were running a 3.14-based kernel.

Single-stream netperf, CPU utilizations and thus service demands are
based on intra-guest reported CPU.

Throughput Mbit/s, bigger is better
        Min     Median  Average Max
4.2.0-rc3+      1364    1686    1678    1938
4.2.0-rc3+flush1k       1824    2269    2275    2647

Send Service Demand, smaller is better
        Min     Median  Average Max
4.2.0-rc3+      0.236   0.558   0.524   0.802
4.2.0-rc3+flush1k       0.176   0.503   0.471   0.738

Receive Service Demand, smaller is better.
        Min     Median  Average Max
4.2.0-rc3+      1.906   2.188   2.191   2.531
4.2.0-rc3+flush1k       0.448   0.529   0.533   0.692

Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Rick Jones <rick.jones2@hp.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: linearize skb in case frags would not fit into tx descriptor

Suggested-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: enable support for scattered packets

rocker supports the transmission of scattered packets, so let the kernel
know about it by setting the NETIF_F_SG bit in the device's features.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebpf: add skb->hash to offset map for usage in {cls, act}_bpf or filters

Add skb->hash to the __sk_buff offset map, so it can be accessed from
an eBPF program. We currently already do this for classic BPF filters,
but not yet on eBPF, it might be useful as a demuxer in combination with
helpers like bpf_clone_redirect(), toy example:

  __section("cls-lb") int ingress_main(struct __sk_buff *skb)
  {
    unsigned int which = 3 + (skb->hash & 7);
    /* bpf_skb_store_bytes(skb, ...); */
    /* bpf_l{3,4}_csum_replace(skb, ...); */
    bpf_clone_redirect(skb, which, 0);
    return -1;
  }

I was thinking whether to add skb_get_hash(), but then concluded the
raw skb->hash seems fine in this case: we can directly access the hash
w/o extra eBPF helper function call, it's filled out by many NICs on
ingress, and in case the entropy level would not be sufficient, people
can still implement their own specific sw fallback hash mix anyway.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Correct logic for pvid configuration.

Commit 05cc5a39ddb7 ("bnx2x: add vlan filtering offload") has introduced
an incorrect logic for checking whether pvid should be configured for
a vf, causing the hypervisor driver to send unneeded ramrods for all of
the vfs each time a pvid has changed.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
arch/s390/net/bpf_jit_comp.c
drivers/net/ethernet/ti/netcp_ethss.c
net/bridge/br_multicast.c
net/ipv4/ip_fragment.c

All four conflicts were cases of simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Must teardown SR-IOV before unregistering netdev in igb driver, from
    Alex Williamson.

2) Fix ipv6 route unreachable crash in IPVS, from Alex Gartrell.

3) Default route selection in ipv4 should take the prefix length, table
    ID, and TOS into account, from Julian Anastasov.

4) sch_plug must have a reset method in order to purge all buffered
    packets when the qdisc is reset, likewise for sch_choke, from WANG
    Cong.

5) Fix deadlock and races in slave_changelink/br_setport in bridging.
    From Nikolay Aleksandrov.

6) mlx4 bug fixes (wrong index in port even propagation to VFs,
    overzealous BUG_ON assertion, etc.) from Ido Shamay, Jack
    Morgenstein, and Or Gerlitz.

7) Turn off klog message about SCTP userspace interface compat that
    makes no sense at all, from Daniel Borkmann.

8) Fix unbounded restarts of inet frag eviction process, causing NMI
    watchdog soft lockup messages, from Florian Westphal.

9) Suspend/resume fixes for r8152 from Hayes Wang.

10) Fix busy loop when MSG_WAITALL|MSG_PEEK is used in TCP recv, from
    Sabrina Dubroca.

11) Fix performance regression when removing a lot of routes from the
    ipv4 routing tables, from Alexander Duyck.

12) Fix device leak in AF_PACKET, from Lars Westerhoff.

13) AF_PACKET also has a header length comparison bug due to signedness,
    from Alexander Drozdov.

14) Fix bug in EBPF tail call generation on x86, from Daniel Borkmann.

15) Memory leaks, TSO stats, watchdog timeout and other fixes to
    thunderx driver from Sunil Goutham and Thanneeru Srinivasulu.

16) act_bpf can leak memory when replacing programs, from Daniel
    Borkmann.

17) WOL packet fixes in gianfar driver, from Claudiu Manoil.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
  stmmac: fix missing MODULE_LICENSE in stmmac_platform
  gianfar: Enable device wakeup when appropriate
  gianfar: Fix suspend/resume for wol magic packet
  gianfar: Fix warning when CONFIG_PM off
  act_pedit: check binding before calling tcf_hash_release()
  net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket
  net: sched: fix refcount imbalance in actions
  r8152: reset device when tx timeout
  r8152: add pre_reset and post_reset
  qlcnic: Fix corruption while copying
  act_bpf: fix memory leaks when replacing bpf programs
  net: thunderx: Fix for crash while BGX teardown
  net: thunderx: Add PCI driver shutdown routine
  net: thunderx: Fix crash when changing rss with mutliple traffic flows
  net: thunderx: Set watchdog timeout value
  net: thunderx: Wakeup TXQ only if CQE_TX are processed
  net: thunderx: Suppress alloc_pages() failure warnings
  net: thunderx: Fix TSO packet statistic
  net: thunderx: Fix memory leak when changing queue count
  net: thunderx: Fix RQ_DROP miscalculation
  ...

Merge branch 'ipv6-auto-flow-labels'

Tom Herbert says:

====================
ipv6: Turn on auto IPv6 flow labels by default

BSD (MacOS) has already turned on flow labels by default and this does
not seem to be causing any problems in the Internet. Let's go ahead
and turn them on by default. We'll continue to monitor for any devices
start choking on them.

Flow labels are important since they are the desired solution for
network devices to perform ECMP and RSS (RFC6437 and RFC6438).
Traditionally, devices perform a 5-tuple hash on packets that
includes port numbers. For the most part, these devices can only
compute 5-tuple hashes for TCP and UDP. This severely limits our ability
to get good network load balancing for other protocols (IPIP, GRE,ESP,
etc.), and hence we are limited in using other protocols. Unfortunately,
this method is accepted as the de facto standard to the extent that
there are several proposals to encapsulate protocols in UDP _just_ for
the purposes for getting ECMP to work. With hosts generating flow labels
and devices taking them as input into ECMP (several already do), we can
start to fix this fundamental problem.

This patch set:
- Changes IPV6_FLOWINFO sockopt to be opt-out of flow labels for
   connections rather than opt-in
- Disable flow label state ranges sysctl by default
- Enable auto flow labels sysctl by default

v2:
  - Added functions to create an skb->hash based on flowi4 and flowi6.
    These are called in output path when creating a packet
  - Call skb_get_hash_flowi6 in ip6_make_flowlabel
  - Implement the auto_flowlabels sysctl as a mode for auto flowlabels.
    There are four modes which correspond to flow labels being enabled
    and whether socket option can be used to opt in or opt out of
    using them
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Enable auto flow labels by default

Initialize auto_flowlabels to one. This enables automatic flow labels,
individual socket may disable them using the IPV6_AUTOFLOWLABEL socket
option.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Disable flowlabel state ranges by default

Per RFC6437 stateful flow labels (e.g. labels set by flow label manager)
cannot "disturb" nodes taking part in stateless flow labels. While the
ranges only reduce the flow label entropy by one bit, it is conceivable
that this might bias the algorithm on some routers causing a load
imbalance. For best results on the Internet we really need the full
20 bits.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Implement different admin modes for automatic flow labels

Change the meaning of net.ipv6.auto_flowlabels to provide a mode for
automatic flow labels generation. There are four modes:

0: flow labels are disabled
1: flow labels are enabled, sockets can opt-out
2: flow labels are allowed, sockets can opt-in
3: flow labels are enabled and enforced, no opt-out for sockets

np->autoflowlabel is initialized according to the sysctl value.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel

We can't call skb_get_hash here since the packet is not complete to do
flow_dissector. Create hash based on flowi6 instead.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add functions to get skb->hash based on flow structures

Add skb_get_hash_flowi6 and skb_get_hash_flowi4 which derive an sk_buff
hash from flowi6 and flowi4 structures respectively. These functions
can be called when creating a packet in the output path where the new
sk_buff does not yet contain a fully formed packet that is parsable by
flow dissector.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"Filipe fixed up a hard to trigger ENOSPC regression from our merge
  window pull, and we have a few other smaller fixes"

* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix quick exhaustion of the system array in the superblock
  btrfs: its btrfs_err() instead of btrfs_error()
  btrfs: Avoid NULL pointer dereference of free_extent_buffer when read_tree_block() fail
  btrfs: Fix lockdep warning of btrfs_run_delayed_iputs()

Merge tag 'sound-4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"This became a relative big update as it includes the collected ASoC
  fixes.  There are a few fixes in ASoC core side, mostly for DAPM and
  the new topology API.  The rest are various ASoC driver-specific
  fixes, as well as the usual HD-audio and USB-audio quirks"

* tag 'sound-4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (29 commits)
  ALSA: hda - Fix MacBook Pro 5,2 quirk
  ALSA: hda - Fix race between PM ops and HDA init/probe
  ALSA: usb-audio: add dB range mapping for some devices
  ALSA: hda - Apply a fixup to Dell Vostro 5480
  ALSA: hda - Add pin quirk for the headset mic jack detection on Dell laptop
  ALSA: hda - Apply fixup for another Toshiba Satellite S50D
  ALSA: fireworks: add support for AudioFire2 quirk
  ALSA: hda - Fix the headset mic that will not work on Dell desktop machine
  ALSA: hda - fix cs4210_spdif_automute()
  ASoC: pcm1681: Fix setting de-emphasis sampling rate selection
  ASoC: ssm4567: Keep TDM_BCLKS in ssm4567_set_dai_fmt
  ASoC: sgtl5000: Fix up define for SGTL5000_SMALL_POP
  ASoC: dapm: Don't add prefix to widget stream name
  ASoC: rt5645: Check if codec is initialized in workqueue handler
  ASoC: Intel: Get correct usage_count value to load firmware
  ASoC: topology: Fix to add dapm mixer info
  ASoC: zx: spdif: Fix devm_ioremap_resource return value check
  ASoC: zx: i2s: Fix devm_ioremap_resource return value check
  ASoC: mediatek: Use platform_of_node for machine drivers
  ASoC: Free card DAPM context on snd_soc_instantiate_card() error path
  ...

Merge branch 'dsa-netconsole'

Florian Fainelli says:

====================
net: GENET, SYSTEMPORT and DSA netconsole

This patch series adds support for netconsole in the GENET, SYSTEMPORT and DSA
drivers.

A small refactoring to the DSA transmit path is required to avoid duplicating
the dsa_netpoll_send_skb() into each and every tagging protocol supported.

Testing on e.g: mv643xx_eth and/or e1000e would be much appreciated!

Changes in v2:

- properly disable/enable interrupts in GENET and SYSTEMPORT

- pass the reallocated SKB back to dsa_slave_xmit() in case a tag protocol had to
alter the original SKB
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Add netconsole support

Add support for using DSA slave network devices with netconsole, which
requires us to allocate and free custom netpoll instances and invoke the
parent network device poll controller callback.

In order for netconsole to work, we need to construct the DSA tag, but
not queue the skb for transmission on the master network device xmit
function.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Refactor transmit path to eliminate duplication

All tagging protocols do the same thing: increment device statistics,
make room for the tag to be inserted, create the tag, invoke the parent
network device transmit function.

In order to prepare for adding netpoll support, which requires the tag
creation, but not using the parent network device transmit function, do
some little refactoring which eliminates duplication between the 4
tagging protocols supported.

We need to return a sk_buff pointer back to the caller because the tag
specific transmit function may have to reallocate the original skb (e.g:
tag_trailer.c) and this is the one we should be transmitting, not the
original sk_buff we were passed.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: systemport: Add netconsole support

Implement a poll controller for netconsole which invokes the RX
interrupt handler to poll for incoming packets, and cleans up all TX
queues by invoking the TX interrupt handler.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Add netconsole support

Implement a poll controller for netconsole which invokes both of our
interrupt handlers for the different RX/TX queues.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: fix missing MODULE_LICENSE in stmmac_platform

Commit 50649ab14982 ("stmmac: drop driver from stmmac platform code")
was a bit overzealous in removing code and dropped the MODULE_*
macro's that are still needed since stmmac_platform can be a module.
Fix this by putting the macro's remvoed in 50649ab14982 back.

This fixes the following errors when used as a module:
  stmmac_platform: module license 'unspecified' taints kernel.
  Disabling lock debugging due to kernel taint
  stmmac_platform: Unknown symbol devm_kmalloc (err 0)
  stmmac_platform: Unknown symbol stmmac_suspend (err 0)
  stmmac_platform: Unknown symbol platform_get_irq_byname (err 0)
  stmmac_platform: Unknown symbol stmmac_dvr_remove (err 0)
  stmmac_platform: Unknown symbol platform_get_resource (err 0)
  stmmac_platform: Unknown symbol of_get_phy_mode (err 0)
  stmmac_platform: Unknown symbol of_property_read_u32_array (err 0)
  stmmac_platform: Unknown symbol of_alias_get_id (err 0)
  stmmac_platform: Unknown symbol stmmac_resume (err 0)
  stmmac_platform: Unknown symbol stmmac_dvr_probe (err 0)

Fixes: 50649ab14982 ("stmmac: drop driver from stmmac platform code")
Reported-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'gianfar-wol-fixes'

Claudiu Manoil says:

====================
gianfar: wol magic packet fixes

These changes were already validated as part of FSL SDK.
Patch 2 fixes occasional wake-on magic packet failures during
traffic, probably due to incorrect traffic stop/ device halt
sequence and incorrect usage of txlock.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Enable device wakeup when appropriate

The wol_en flag is 0 by default anyway, and we have the
following inconsistency: a MAGIC packet wol capable eth
interface is registered as a wake-up source but unable
to wake-up the system as wol_en is 0 (wake-on flag set to 'd').
Calling set_wakeup_enable() at netdev open is just redundant
because wol_en is 0 by default.
Let only ethtool call set_wakeup_enable() for now.

The bflock is obviously obsoleted, its utility has been corroded
over time. The bitfield flags used today in gianfar are accessed
only on the init/ config path, with no real possibility of
concurrency - nothing that would justify smth. like bflock.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Fix suspend/resume for wol magic packet

If we disable NAPI in the first place we can mask the device's
interrupts (and halt it) without fearing that imask may be
concurrently accessed from interrupt context, so there's
no need to do local_irq_save() around gfar_halt_nodisable().
lock_rx_qs()/unlock_tx_qs() are just obsoleted and potentially
buggy routines.  The txlock is currently used in the driver only
to manage TX congestion, it has nothing to do with halting the
device.  With these changes, the TX processing is stopped before
gfar_halt().

Compact gfar_halt() is used instead of gfar_halt_nodisable(),
as it disables Rx/TX DMA h/w blocks and the Rx/TX h/w queues.
gfar_start() re-enables all these blocks on resume.  Enabling
the magic-packet mode remains the same, note that the RX block
is re-enabled just before entering sleep mode.

Add IRQF_NO_SUSPEND flag for the error interrupt line, to signal
that the interrupt line must remain active during sleep in order
to wake the system by magic packet (MAG) reception interrupt.
(On some systems the MAG interrupt did trigger w/o this flag
as well, but on others it didn't.)

Without these fixes, when suspended during fair Tx traffic the
interface occasionally failed to be woken up by magic packet.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Fix warning when CONFIG_PM off

CC      drivers/net/ethernet/freescale/gianfar.o
drivers/net/ethernet/freescale/gianfar.c:568:13: warning: 'lock_tx_qs'
defined but not used [-Wunused-function]
static void lock_tx_qs(struct gfar_private *priv)
             ^
drivers/net/ethernet/freescale/gianfar.c:576:13: warning: 'unlock_tx_qs'
defined but not used [-Wunused-function]
static void unlock_tx_qs(struct gfar_private *priv)
             ^

Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: add tlb_dynamic_lb netlink support

tlb_dynamic_lb could be set only via sysfs, this patch allows it to be
set via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-next-for-davem-2015-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
Major changes:

mwifiex:

* add TX DATA Pause support
* add multichannel and TDLS channel switch support

ath10k:

* enable VHT for IBSS
* initial work to support qca99x0 and the corresponding 10.4 firmware branch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: MC allocations must be restored following an entity reset

Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: allow ethtool selftest and MC reboot to complete on an unprivileged function

The policy in the net driver is to attempt MCDI commands and
then handle any EPERM error codes appropriately when returned
by unprivileged functions.
The ethtool selftest contains some tests which are useful on
an unprivileged function, such as the event queue interrupt
tests, but other tests cannot be performed as the function
does not have the required permissions.

If a test returns -EPERM, act as though the test was not run
and continue.

Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: add driver for aquantia phy

This patch added driver to support Aquantia PHYs AQ1202, AQ2104, AQR105,
AQR405, which accessed through clause 45.

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

br2684: Remove unnecessary formatting macros b1 and bs

Use vsprintf extension %pI4 instead.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: disable the capability of zero length

The UEFI driver would enable zero length, and the Linux driver doesn't
need it. Zero length let the hw complete the transfer with length 0,
when there is no received packet. It would add the load of USB host
controller and reduce the performance.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: expose COLLECT_METADATA flag to user space

Two vxlan driver flags FLOWBASED and COLLECT_METADATA need to be set to
make use of its new flow mode. The former already exposed. Expose the latter.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

act_pedit: check binding before calling tcf_hash_release()

When we share an action within a filter, the bind refcnt
should increase, therefore we should not call tcf_hash_release().

Fixes: 1a29321ed045 ("net_sched: act: Dont increment refcnt on replace")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mpls-build-fix'

Roopa Prabhu says:

====================
af_mpls: fix undefined reference to ip6_route_output with CONFIG_IPV6=n

This patch series uses ipv6_stub_impl.ipv6_dst_lookup instead of
ip6_route_output. Follows the vxlan drivers usage of
ipv6_stub_impl.ipv6_dst_lookup.

There is no sk in the af_mpls context from where
ipv6_stub_impl.ipv6_dst_lookup is used. sk appears to be needed
to get the namespace 'net' and is optional otherwise. This patch series
changes ipv6_stub_impl.ipv6_dst_lookup to take net argument. sk remains
optional.

v1 - v2: use IS_BUILTIN

v2 - v3: Use new Kconfig option that depends on (IPV6 || IPV6=n) as
suggested by Dave. Also uses IS_ERR as suggested by Thomas.

v3 - v4: Include missed case of (MPLS_ROUTING=y && IPV6=m) reported by
Dave.

v4 - v5: Use ipv6_stub_impl.ipv6_dst_lookup as suggested by Hannes

v5 - v6: protect against null ipv6_stub by statically declaring
a ipv6_dst_lookup NOP func
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

af_mpls: fix undefined reference to ip6_route_output

Undefined reference to ip6_route_output and ip_route_output
was reported with CONFIG_INET=n and CONFIG_IPV6=n.

This patch uses ipv6_stub_impl.ipv6_dst_lookup instead of
ip6_route_output. And wraps affected code under
IS_ENABLED(CONFIG_INET) and IS_ENABLED(CONFIG_IPV6).

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reported-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument

This patch adds net argument to ipv6_stub_impl.ipv6_dst_lookup
for use cases where sk is not available (like mpls).
sk appears to be needed to get the namespace 'net' and is optional
otherwise. This patch series changes ipv6_stub_impl.ipv6_dst_lookup
to take net argument. sk remains optional.

All callers of ipv6_stub_impl.ipv6_dst_lookup have been modified
to pass net. I have modified them to use already available
'net' in the scope of the call. I can change them to
sock_net(sk) to avoid any unintended change in behaviour if sock
namespace is different. They dont seem to be from code inspection.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add helpers to access tunnel metadata

Introduce helpers to let eBPF programs attached to TC manipulate tunnel metadata:
bpf_skb_[gs]et_tunnel_key(skb, key, size, flags)
skb: pointer to skb
key: pointer to 'struct bpf_tunnel_key'
size: size of 'struct bpf_tunnel_key'
flags: room for future extensions

First eBPF program that uses these helpers will allocate per_cpu
metadata_dst structures that will be used on TX.
On RX metadata_dst is allocated by tunnel driver.

Typical usage for TX:
struct bpf_tunnel_key tkey;
... populate tkey ...
bpf_skb_set_tunnel_key(skb, &tkey, sizeof(tkey), 0);
bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);

RX:
struct bpf_tunnel_key tkey = {};
bpf_skb_get_tunnel_key(skb, &tkey, sizeof(tkey), 0);
... lookup or redirect based on tkey ...

'struct bpf_tunnel_key' will be extended in the future by adding
elements to the end and the 'size' argument will indicate which fields
are populated, thereby keeping backwards compatibility.
The 'flags' argument may be used as well when the 'size' is not enough or
to indicate completely different layout of bpf_tunnel_key.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver

Repost patch of driver for LAN7800 family of USB 2.0 & USB 3.0 to Gigabit Ethernet.
- remove module param which can be configurable by standard mechanism.
- remove other module parms except msg_level per review comment.
- update to handle byte swap for statistics structure correctly.

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf_jit_disasm: also support reading jit dump from file

This patch adds support to read the dmesg BPF JIT dump also from a
file instead of the klog buffer. I found this quite useful when going
through some 'before/after patch' logs. It also fixes a regex leak
found by valgrind when no image dump was found.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'iommu-fixes-v4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
"These fixes are all for the AMD IOMMU driver:

   - A regression with HSA caused by the conversion of the driver to
     default domains.  The fixes make sure that an HSA device can still
     be attached to an IOMMUv2 domain and that these domains also allow
     non-IOMMUv2 capable devices.

   - Fix iommu=pt mode which did not work because the dma_ops where set
     to nommu_ops, which breaks devices that can only do 32bit DMA.

   - Fix an issue with non-PCI devices not working, because there are no
     dma_ops for them.  This issue was discovered recently as new AMD
     x86 platforms have non-PCI devices too"

* tag 'iommu-fixes-v4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Allow non-ATS devices in IOMMUv2 domains
  iommu/amd: Set global dma_ops if swiotlb is disabled
  iommu/amd: Use swiotlb in passthrough mode
  iommu/amd: Allow non-IOMMUv2 devices in IOMMUv2 domains
  iommu/amd: Use iommu core for passthrough mode
  iommu/amd: Use iommu_attach_group()

Merge tag 'drm-intel-fixes-2015-07-31' of git://anongit.freedesktop.org/drm-intel

Pull drm intel fixes from Daniel Vetter:
"I delayed my -fixes pull a bit hoping that I could include a fix for
  the dp mst stuff but looks a bit more nasty than that.  So just 3
  other regression fixes, one 4.2 other two cc: stable"

* tag 'drm-intel-fixes-2015-07-31' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Declare the swizzling unknown for L-shaped configurations
  drm/i915: Mark PIN_USER binding as GLOBAL_BIND without the aliasing ppgtt
  drm/i915: Replace WARN inside I915_READ64_2x32 with retry loop

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"This has a bunch of nouveau fixes, as Ben has been hibernating and has
  lots of small fixes for lots of bugs across nouveau.

  Radeon has one major fix for hdmi/dp audio regression that is larger
  than Alex would like, but seems to fix up a fair few bugs, along with
  some misc fixes.

  And a few msm fixes, one of which is also a bit large.

  But nothing in here seems insane or crazy for this stage, just more
  than I'd like"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (33 commits)
  drm/msm/mdp5: release SMB (shared memory blocks) in various cases
  drm/msm: change to uninterruptible wait in atomic commit
  drm/msm: mdp4: Fix drm_framebuffer dereference crash
  drm/msm: fix msm_gem_prime_get_sg_table()
  drm/amdgpu: add new parameter to seperate map and unmap
  drm/amdgpu: hdp_flush is not needed for inside IB
  drm/amdgpu: different emit_ib for gfx and compute
  drm/amdgpu: information leak in amdgpu_info_ioctl()
  drm/amdgpu: clean up init sequence for failures
  drm/radeon/combios: add some validation of lvds values
  drm/radeon: rework audio modeset to handle non-audio hdmi features
  drm/radeon: rework audio detect (v4)
  drm/amdgpu: Drop drm/ prefix for including drm.h in amdgpu_drm.h
  drm/radeon: Drop drm/ prefix for including drm.h in radeon_drm.h
  drm/nouveau/nouveau/ttm: fix tiled system memory with Maxwell
  drm/nouveau/kms/nv50-: guard against enabling cursor on disabled heads
  drm/nouveau/fbcon/g80: reduce PUSH_SPACE alloc, fire ring on accel init
  drm/nouveau/fbcon/gf100-: reduce RING_SPACE allocation
  drm/nouveau/fbcon/nv11-: correctly account for ring space usage
  drm/nouveau/bios: add proper support for opcode 0x59
  ...

iommu/amd: Allow non-ATS devices in IOMMUv2 domains

With the grouping of multi-function devices a non-ATS
capable device might also end up in the same domain as an
IOMMUv2 capable device.
So handle this situation gracefully and don't consider it a
bug anymore.

Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

Revert "ath9k: export HW random number generator"

This reverts commit 6301566e0b2dafa7d6779598621bca867962a0a2. Oleksij Rempel
noticed that the randomness doesn't look to be good enough and Stephan Mueller
commented:

"I would say that the discussed RNG does not seem fit for hooking it up with the
hwrandom framework."

http://lkml.kernel.org/g/3945775.m5HblJPgiO@tauon.atsec.com

So let's the revert the patch until we are sure that we can trust this random
generator.

Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

Merge tag 'xfs-for-linus-4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs fixes from Dave Chinner:
"There are a couple of recently found, long standing remote attribute
  corruption fixes caused by log recovery getting confused after a
  crash, and the new DAX code in XFS (merged in 4.2-rc1) needs to
  actually use the DAX fault path on read faults.

  Summary:

   - remote attribute log recovery corruption fixes

   - DAX page faults need to use direct mappings, not a page cache
     mapping"

* tag 'xfs-for-linus-4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: remote attributes need to be considered data
  xfs: remote attribute headers contain an invalid LSN
  xfs: call dax_fault on read page faults for DAX

Merge branch 'tipc-next'

Jon Maloy says:

====================
tipc: separate link aggregation from link layer

We continue the work on separating the roles of the link aggregation and
link layers, as well as making code cleanups in general.

This second commit batch focuses on moving the orchestration of link
failover and synchronization to the node level, as well as preparing the
node lock structure for further future impovements. We also make some
changes to message delivery between link and socket layer, in order to
make this mechanism safer and less obscure.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: clean up link creation

We simplify the link creation function tipc_link_create() and the way
the link struct it is connected to the node struct. In particular, we
remove the duplicate initialization of some fields which are anyway set
in tipc_link_reset().

Tested-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: use temporary, non-protected skb queue for bundle reception

Currently, when we extract small messages from a message bundle, or
when many messages have accumulated in the link arrival queue, those
messages are added one by one to the lock protected link input queue.
This may increase contention with the reader of that queue, in
the function tipc_sk_rcv().

This commit introduces a temporary, unprotected input queue in
tipc_link_rcv() for such cases. Only when the arrival queue has been
emptied, and the function is ready to return, does it splice the whole
temporary queue into the real input queue.

Tested-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: remove implicit message delivery in node_unlock()

After the most recent changes, all access calls to a link which
may entail addition of messages to the link's input queue are
postpended by an explicit call to tipc_sk_rcv(), using a reference
to the correct queue.

This means that the potentially hazardous implicit delivery, using
tipc_node_unlock() in combination with a binary flag and a cached
queue pointer, now has become redundant.

This commit removes this implicit delivery mechanism both for regular
data messages and for binding table update messages.

Tested-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: make resetting of links non-atomic

In order to facilitate future improvements to the locking structure, we
want to make resetting and establishing of links non-atomic. I.e., the
functions tipc_node_link_up() and tipc_node_link_down() should be called
from outside the node lock context, and grab/release the node lock
themselves. This requires that we can freeze the link state from the
moment it is set to RESETTING or PEER_RESET in one lock context until
it is set to RESET or ESTABLISHING in a later context. The recently
introduced link FSM makes this possible, so we are now ready to introduce
the above change.

This commit implements this.

Tested-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: move received discovery data evaluation inside node.c

The node lock is currently grabbed and and released in the function
tipc_disc_rcv() in the file discover.c. As a preparation for the next
commits, we need to move this node lock handling, along with the code
area it is covering, to node.c.

This commit introduces this change.

Tested-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: merge link->exec_mode and link->state into one FSM

Until now, we have been handling link failover and synchronization
by using an additional link state variable, "exec_mode". This variable
is not independent of the link FSM state, something causing a risk of
inconsistencies, apart from the fact that it clutters the code.

The conditions are now in place to define a new link FSM that covers
all existing use cases, including failover and synchronization, and
eliminate the "exec_mode" field altogether. The FSM must also support
non-atomic resetting of links, which will be introduced later.

The new link FSM is shown below, with 7 states and 8 events.
Only events leading to state change are shown as edges.

+------------------------------------+
|RESET_EVT                           |
|                                    |
|                             +--------------+
|           +-----------------|   SYNCHING   |-----------------+
|           |FAILURE_EVT      +--------------+   PEER_RESET_EVT|
|           |                  A            |                  |
|           |                  |            |                  |
|           |                  |            |                  |
|           |                  |SYNCH_      |SYNCH_            |
|           |                  |BEGIN_EVT   |END_EVT           |
|           |                  |            |                  |
|           V                  |            V                  V
|    +-------------+          +--------------+          +------------+
|    |  RESETTING  |<---------|  ESTABLISHED |--------->| PEER_RESET |
|    +-------------+ FAILURE_ +--------------+ PEER_    +------------+
|           |        EVT        |    A         RESET_EVT       |
|           |                   |    |                         |
|           |                   |    |                         |
|           |    +--------------+    |                         |
|  RESET_EVT|    |RESET_EVT          |ESTABLISH_EVT            |
|           |    |                   |                         |
|           |    |                   |                         |
|           V    V                   |                         |
|    +-------------+          +--------------+        RESET_EVT|
+--->|    RESET    |--------->| ESTABLISHING |<----------------+
     +-------------+ PEER_    +--------------+
      |           A  RESET_EVT       |
      |           |                  |
      |           |                  |
      |FAILOVER_  |FAILOVER_         |FAILOVER_
      |BEGIN_EVT  |END_EVT           |BEGIN_EVT
      |           |                  |
      V           |                  |
     +-------------+                 |
     | FAILINGOVER |<----------------+
     +-------------+

These changes are fully backwards compatible.

Tested-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: move protocol message sending away from link FSM

The implementation of the link FSM currently takes decisions about and
sends out link protocol messages. This is unnecessary, since such
actions are not the result of any link state change, and are even
decided based on non-FSM state information ("silent_intv_cnt").

We now move the sending of unicast link protocol messages to the
function tipc_link_timeout(), and the initial broadcast synchronization
message to tipc_node_link_up(). The latter is done because a link
instance should not need to know whether it is the first or second
link to a destination. Such information is now restricted to and
handled by the link aggregation layer in node.c

Tested-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: move link synch and failover to link aggregation level

Link failover and synchronization have until now been handled by the
links themselves, forcing them to have knowledge about and to access
parallel links in order to make the two algorithms work correctly.

In this commit, we move the control part of this functionality to the
link aggregation level in node.c, which is the right location for this.
As a result, the two algorithms become easier to follow, and the link
implementation becomes simpler.

Tested-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: extend node FSM

In the next commit, we will move link synch/failover orchestration to
the link aggregation level. In order to do this, we first need to extend
the node FSM with two more states, NODE_SYNCHING and NODE_FAILINGOVER,
plus four new events to enter and leave those states.

This commit introduces this change, without yet making use of it.
The node FSM now looks as follows:

                           +-----------------------------------------+
                           |                            PEER_DOWN_EVT|
                           |                                         |
  +------------------------+----------------+                        |
  |SELF_DOWN_EVT           |                |                        |
  |                        |                |                        |
  |              +-----------+          +-----------+                |
  |              |NODE_      |          |NODE_      |                |
  |   +----------|FAILINGOVER|<---------|SYNCHING   |------------+   |
  |   |SELF_     +-----------+ FAILOVER_+-----------+    PEER_   |   |
  |   |DOWN_EVT   |         A  BEGIN_EVT A         |     DOWN_EVT|   |
  |   |           |         |            |         |             |   |
  |   |           |         |            |         |             |   |
  |   |           |FAILOVER_|FAILOVER_   |SYNCH_   |SYNCH_       |   |
  |   |           |END_EVT  |BEGIN_EVT   |BEGIN_EVT|END_EVT      |   |
  |   |           |         |            |         |             |   |
  |   |           |         |            |         |             |   |
  |   |           |        +--------------+        |             |   |
  |   |           +------->|   SELF_UP_   |<-------+             |   |
  |   |   +----------------|   PEER_UP    |------------------+   |   |
  |   |   |SELF_DOWN_EVT   +--------------+     PEER_DOWN_EVT|   |   |
  |   |   |                   A          A                   |   |   |
  |   |   |                   |          |                   |   |   |
  |   |   |        PEER_UP_EVT|          |SELF_UP_EVT        |   |   |
  |   |   |                   |          |                   |   |   |
  V   V   V                   |          |                   V   V   V
+------------+       +-----------+    +-----------+       +------------+
|SELF_DOWN_  |       |SELF_UP_   |    |PEER_UP_   |       |PEER_DOWN   |
|PEER_LEAVING|<------|PEER_COMING|    |SELF_COMING|------>|SELF_LEAVING|
+------------+ SELF_ +-----------+    +-----------+ PEER_ +------------+
       |       DOWN_EVT       A          A          DOWN_EVT     |
       |                      |          |                       |
       |                      |          |                       |
       |           SELF_UP_EVT|          |PEER_UP_EVT            |
       |                      |          |                       |
       |                      |          |                       |
       |PEER_DOWN_EVT       +--------------+        SELF_DOWN_EVT|
       +------------------->|  SELF_DOWN_  |<--------------------+
                            |  PEER_DOWN   |
                            +--------------+

Tested-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: reverse call order for link_reset()->node_link_down()

In many cases the call order when a link is reset goes as follows:
tipc_node_xx()->tipc_link_reset()->tipc_node_link_down()

This is not the right order if we want the node to be in control,
so in this commit we change the order to:
tipc_node_xx()->tipc_node_link_down()->tipc_link_reset()

The fact that tipc_link_reset() now is called from only one
location with a well-defined state will also facilitate later
simplifications of tipc_link_reset() and the link FSM.

Tested-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: move all link_reset() calls to link aggregation level

In line with our effort to let the node level have full control over
its links, we want to move all link reset calls from link.c to node.c.
Some of the calls can be moved by simply moving the calling function,
when this is the right thing to do. For the remaining calls we use
the now established technique of returning a TIPC_LINK_DOWN_EVT
flag from tipc_link_rcv(), whereafter we perform the reset call when
the call returns.

This change serves as a preparation for the coming commits.

Tested-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: eliminate function tipc_link_activate()

The function tipc_link_activate() is redundant, since it mostly performs
settings that have already been done in a preceding tipc_link_reset().

There are three exceptions to this:
- The actual state change to TIPC_LINK_WORKING. This should anyway be done
  in the FSM, and not in a separate function.
- Registration of the link with the bearer. This should be done by the
  node, since we don't want the link to have any knowledge about its
  specific bearer.
- Call to tipc_node_link_up() for user access registration. With the new
  role distribution between link aggregation and link level this becomes
  the wrong call order; tipc_node_link_up() should instead be called
  directly as a result of a TIPC_LINK_UP event, hence by the node itself.

This commit implements those changes.

Tested-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2015-07-30

Here's a set of Bluetooth & 802.15.4 patches intended for the 4.3 kernel.

- Cleanups & fixes to mac802154
- Refactoring of Intel Bluetooth HCI driver
- Various coding style fixes to Bluetooth HCI drivers
- Support for Intel Lightning Peak Bluetooth devices
- Generic class code in interface descriptor in btusb to match more HW
- Refactoring of Bluetooth HS code together with a new config option
- Support for BCM4330B1 Broadcom UART controller

Let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket

The newsk returned by sk_clone_lock should hold a get_net()
reference if, and only if, the parent is not a kernel socket
(making this similar to sk_alloc()).

E.g,. for the SYN_RECV path, tcp_v4_syn_recv_sock->..inet_csk_clone_lock
sets up the syn_recv newsk from sk_clone_lock. When the parent (listen)
socket is a kernel socket (defined in sk_alloc() as having
sk_net_refcnt == 0), then the newsk should also have a 0 sk_net_refcnt
and should not hold a get_net() reference.

Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the
netns of kernel sockets.")
Acked-by: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: add sysctl option accept_ra_min_hop_limit

Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
disabled accept hop limit from RA if it is smaller than the current hop
limit for security stuff. But this behavior kind of break the RFC definition.

RFC 4861, 6.3.4.  Processing Received Router Advertisements
   A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
   and Retrans Timer) may contain a value denoting that it is
   unspecified.  In such cases, the parameter should be ignored and the
   host should continue using whatever value it is already using.

   If the received Cur Hop Limit value is non-zero, the host SHOULD set
   its CurHopLimit variable to the received value.

So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
hop limit value they can accept from RA. And set default to 1 to meet RFC
standards.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: fix refcount imbalance in actions

Since commit 55334a5db5cd ("net_sched: act: refuse to remove bound action
outside"), we end up with a wrong reference count for a tc action.

Test case 1:

  FOO="1,6 0 0 4294967295,"
  BAR="1,6 0 0 4294967294,"
  tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \
     action bpf bytecode "$FOO"
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 1 bind 1
  tc actions replace action bpf bytecode "$BAR" index 1
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
    index 1 ref 2 bind 1
  tc actions replace action bpf bytecode "$FOO" index 1
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 3 bind 1

Test case 2:

  FOO="1,6 0 0 4294967295,"
  tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
  tc actions show action gact
    action order 0: gact action pass
    random type none pass val 0
     index 1 ref 1 bind 1
  tc actions add action drop index 1
    RTNETLINK answers: File exists [...]
  tc actions show action gact
    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 2 bind 1
  tc actions add action drop index 1
    RTNETLINK answers: File exists [...]
  tc actions show action gact
    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 3 bind 1

What happens is that in tcf_hash_check(), we check tcf_common for a given
index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've
found an existing action. Now there are the following cases:

  1) We do a late binding of an action. In that case, we leave the
     tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init()
     handler. This is correctly handeled.

  2) We replace the given action, or we try to add one without replacing
     and find out that the action at a specific index already exists
     (thus, we go out with error in that case).

In case of 2), we have to undo the reference count increase from
tcf_hash_check() in the tcf_hash_check() function. Currently, we fail to
do so because of the 'tcfc_bindcnt > 0' check which bails out early with
an -EPERM error.

Now, while commit 55334a5db5cd prevents 'tc actions del action ...' on an
already classifier-bound action to drop the reference count (which could
then become negative, wrap around etc), this restriction only accounts for
invocations outside a specific action's ->init() handler.

One possible solution would be to add a flag thus we possibly trigger
the -EPERM ony in situations where it is indeed relevant.

After the patch, above test cases have correct reference count again.

Fixes: 55334a5db5cd ("net_sched: act: refuse to remove bound action outside")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: spi_ks8995: clean up ks8995_registers_read/write

The change removes redundant sysfs binary file boundary checks,
since this task is already done on caller side in fs/sysfs/file.c

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8152-fixes'

Hayes Wang says:

====================
r8152: device reset

v3:
For patch #2, remove cancel_delayed_work().

v2:
For patch #1, remove usb_autopm_get_interface(), usb_autopm_put_interface(), and
the checking of intf->condition.

For patch #2, replace the original method with usb_queue_reset_device() to reset
the device.

v1:
Although the driver works normally, we find the device may get all 0xff data when
transmitting packets on certain platforms. It would break the device and no packet
could be transmitted. The reset is necessary to recover the hw for this situation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: reset device when tx timeout

The device reset is necessary if the hw becomes abnormal and stops
transmitting packets.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: add pre_reset and post_reset

Add rtl8152_pre_reset() and rtl8152_post_reset() which are used when
calling usb_reset_device(). The two functions could reduce the time
of reset when calling usb_reset_device() after probe().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ALSA: hda - Fix MacBook Pro 5,2 quirk

MacBook Pro 5,2 with ALC889 codec had already a fixup entry, but this
seems not working correctly, a fix for pin NID 0x15 is needed in
addition. It's equivalent with the fixup for MacBook Air 1,1, so use
this instead.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102131
Reported-and-tested-by: Jeffery Miller <jefferym@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge branch 'bpf-next'

Daniel Borkmann says:

====================
Minor BPF updates

Various minor misc updates.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: also show process name/pid in bpf_jit_dump

It can be useful for testing to see the actual process/pid who is loading
a given filter. I was running some BPF test program and noticed unusual
filter loads from time to time, triggered by some other application in the
background. bpf_jit_disasm is still working after this change.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, x86/sparc: show actual number of passes in bpf_jit_dump

When bpf_jit_compile() got split into two functions via commit
f3c2af7ba17a ("net: filter: x86: split bpf_jit_compile()"), bpf_jit_dump()
was changed to always show 0 as number of compiler passes. Change it to
dump the actual number. Also on sparc, we count passes starting from 0,
so add 1 for the debug dump as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: provide helper that indicates eBPF was migrated

During recent discussions we had with Michael, we found that it would
be useful to have an indicator that tells the JIT that an eBPF program
had been migrated from classic instructions into eBPF instructions, as
only in that case A and X need to be cleared in the prologue. Such eBPF
programs do not set a particular type, but all have BPF_PROG_TYPE_UNSPEC.
Thus, introduce a small helper for cde66c2d88da ("s390/bpf: Only clear
A and X for converted BPF programs") and possibly others in future.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

test_bpf: assign type to native eBPF test cases

As JITs start to perform optimizations whether to clear A and X on eBPF
programs in the prologue, we should actually assign a program type to the
native eBPF test cases. It doesn't really matter which program type, as
these instructions don't go through the verifier, but it needs to be a
type != BPF_PROG_TYPE_UNSPEC. This reflects eBPF programs loaded via bpf(2)
system call (!= type unspec) vs. classic BPF to eBPF migrations (== type
unspec).

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
"The main change is support for keyboards and touchpads found in 2015
  editions of Macbooks"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Revert "Input: zforce - don't overwrite the stack"
  Input: bcm5974 - add support for the 2015 Macbook Pro
  HID: apple: Add support for the 2015 Macbook Pro
  Input: bcm5974 - prepare for a new trackpad generation
  Input: synaptics - dump ext10 capabilities as well

bnx2x: Fix compilation when CONFIG_BNX2X_SRIOV is not set

Commit 05cc5a39ddb7 ("bnx2x: add vlan filtering offload") has broken
compilation when CONFIG_BNX2X_SRIOV is not set.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'hwmon-for-linus-v4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
"Two patches headed for -stable.

  nct7802: Fix integer overflow seen when writing voltage limits
  nct7904: Rename pwm attributes to match hwmon ABI"

* tag 'hwmon-for-linus-v4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (nct7802) Fix integer overflow seen when writing voltage limits
  hwmon: (nct7904) Rename pwm attributes to match hwmon ABI

drm/i915: Declare the swizzling unknown for L-shaped configurations

The old style of memory interleaving swizzled upto the end of the
first even bank of memory, and then used the remainder as unswizzled on
the unpaired bank - i.e. swizzling is not constant for all memory. This
causes problems when we try to migrate memory and so the kernel prevents
migration at all when we detect L-shaped inconsistent swizzling.
However, this issue also extends to userspace who try to manually detile
into memory as the swizzling for an individual page is unknown (it
depends on its physical address only known to the kernel), userspace
cannot correctly swizzle.

Note that this is a new attempt for the previously merged one,
reverted in

commit d82c0ba6e306f079407f07003e53c262d683397b
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Jul 14 12:29:27 2015 +0200

Revert "drm/i915: Declare the swizzling unknown for L-shaped configurations"

This is cc: stable since we need it to fix up troubles with wc cpu
mmaps that userspace recently started to use widely.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91105
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
[danvet: Add note about previous (failed attempt).]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Mark PIN_USER binding as GLOBAL_BIND without the aliasing ppgtt

If the device does not support the aliasing ppgtt, we must translate
user bind requests (PIN_USER) from LOCAL_BIND to a GLOBAL_BIND. However,
since this is device specific we cannot do this conveniently in the
upper layers and so must manage the vma->bound flags in the backend.

Partial revert of commit 75d04a3773ecee617847de963ae4195d6aa74c28 [4.2-rc1]
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Tue Apr 28 17:56:17 2015 +0300

drm/i915/gtt: Allocate va range only if vma is not bound

Note this was spotted by Daniel originally, but we dropped the ball in
getting the fix in before the bug going wild. Sorry all.

Reported-by: Vincent Legoll vincent.legoll@gmail.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91133
References: https://bugs.freedesktop.org/show_bug.cgi?id=90224
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Bluetooth: 6lowpan: Fix possible race

This patch fix a possible race after calling register_netdev. After
calling netdev_register it could be possible that netdev_ops callbacks
use the uninitialized private data of lowpan_dev. By moving the
initialization of this data before netdev_register we can be sure that
initialized private data is be used after netdev_register.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

mac802154: Fix memory corruption with global deferred transmit state.

When transmitting a packet via a mac802154 driver that can sleep in
its transmit function, mac802154 defers the call to the driver's
transmit function to a per-device workqueue.

However, mac802154 uses a single global work_struct for this, which
means that if you have more than one registered mac802154 interface
in the system, and you transmit on more than one of them at the same
time, you'll very easily cause memory corruption.

This patch moves the deferred transmit processing state from global
variables to struct ieee802154_local, and this seems to fix the memory
corruption issue.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

at86rf230: remove hrtimer on 1 usec delay

According Documentation/timers/timers-howto.txt the usually case for
setting up a hrtimer takes > ~10us. So we should use udelay in this
case so we are sure that the state change was done, before doing the
state change assert.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: cmtp: Do not use list_for_each_safe when not needed

There is no need to use the safe version of list_for_each here.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: btusb: match generic class code in interface descriptor

btusb currently has a generic match on USB device descriptors:
        { USB_DEVICE_INFO(0xe0, 0x01, 0x01) },

However, http://www.usb.org/developers/defined_class states:

  Base Class E0h (Wireless Controller)
  This base class is defined for devices that are Wireless controllers.
  Values not shown in the table below are reserved. These class codes are
  to be used in Interface Descriptors, with the exception of the Bluetooth
  class code which can also be used in a Device Descriptor.

Add a match on the interface descriptors accordingly.

This fixes compatibility with the RTL8723AU device shown below.
This device conforms to the USB Interface Association Descriptor
specification, which requires the device to have class ef/02/01.
The extra IAD descriptor then specifies that interfaces 0 and 1
belong to the same function/driver, which is true. Provided that
the Bluetooth device class spec accepts use of the IAD, I imagine that
technically, all btusb devices should be configured like this.

T:  Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#=  3 Spd=480  MxCh= 0
D:  Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=0bda ProdID=0724 Rev= 2.00
S:  Manufacturer=Realtek
S:  Product=802.11n WLAN Adapter
S:  SerialNumber=00e04c000001
C:* #Ifs= 3 Cfg#= 1 Atr=e0 MxPwr=500mA
A:  FirstIf#= 0 IfCount= 2 Cls=e0(wlcon) Sub=01 Prot=01
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
I:* If#= 2 Alt= 0 #EPs= 4 Cls=ff(vend.) Sub=ff Prot=ff Driver=rtl8723au
E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=87(I) Atr=03(Int.) MxPS=  64 Ivl=500us

Signed-off-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Move create/accept phy link completed callback to amp.c

To avoid amp module hooks from hci_event.c

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Move amp assoc read/write completed callback to amp.c

To avoid amp module hooks from hci_event.c

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Move get info completed callback to a2mp.c

To avoid a2mp module hooks from hci_event.c and send
getinfo response operation only required by a2mp module,
we can move this callback to a2mp.c

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Move high speed specific event under BT_HS option

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Add BT_HS config option

Move A2MP Module under BT_HS config option and allow
the user have flexible option to choose the feature only
they need

a2mp_discover_amp() & a2mp_channel_create() are a2mp module
entry point for master and slave, and this is dynamic
invoked depends on the userspace or remote request, then
we defined their implementation depends on BT_HS config

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: btbcm: Add BCM4330B1 UART device

Add "waiting for configuration" address.
Add lmp_subver and firmware name for BCM4330B1 controller.

Signed-off-by: Frederic Danis <frederic.danis@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

iommu/amd: Set global dma_ops if swiotlb is disabled

Some AMD systems also have non-PCI devices which can do DMA.
Those can't be handled by the AMD IOMMU, as the hardware can
only handle PCI. These devices would end up with no dma_ops,
as neither the per-device nor the global dma_ops will get
set. SWIOTLB provides global dma_ops when it is active, so
make sure there are global dma_ops too when swiotlb is
disabled.

Signed-off-by: Joerg Roedel <jroedel@suse.de>

iommu/amd: Use swiotlb in passthrough mode

In passthrough mode (iommu=pt) all devices are identity
mapped. If a device does not support 64bit DMA it might
still need remapping. Make sure swiotlb is initialized to
provide this remapping.

Signed-off-by: Joerg Roedel <jroedel@suse.de>

iommu/amd: Allow non-IOMMUv2 devices in IOMMUv2 domains

Since devices with IOMMUv2 functionality might be in the
same group as devices without it, allow those devices in
IOMMUv2 domains too.
Otherwise attaching the group with the IOMMUv2 device to the
domain will fail.

Signed-off-by: Joerg Roedel <jroedel@suse.de>

iommu/amd: Use iommu core for passthrough mode

Remove the AMD IOMMU driver implementation for passthrough
mode and rely on the new iommu core features for that.

Signed-off-by: Joerg Roedel <jroedel@suse.de>

iommu/amd: Use iommu_attach_group()

Since the conversion to default domains the
iommu_attach_device function only works for devices with
their own group. But this isn't always true for current
IOMMUv2 capable devices, so use iommu_attach_group instead.

Signed-off-by: Joerg Roedel <jroedel@suse.de>

Merge branch 'mlxsw'

Jiri Pirko says:

====================
Introduce Mellanox Technologies Switch ASICs switchdev drivers

This patchset introduces Mellanox Technologies Switch driver infrastructure
and support for SwitchX-2 ASIC.

The driver is divided into 3 logical parts:
1) Bus - implements switch bus interface. Currently only PCI bus is
   implemented, but more buses will be added in the future. Namely I2C
   and SGMII.
   (patch #2)
2) Driver - implemements of ASIC-specific functions.
   Currently SwitchX-2 ASIC is supported, but a plan exists to introduce
   support for Spectrum ASIC in the near future.
   (patch #4)
3) Core - infrastructure that glues buses and drivers together.
   It implements register access logic (EMADs) and takes care of RX traps
   and events.
   (patch #1 and #3)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Introduce Mellanox SwitchX-2 ASIC support

Benefit from the previously introduced Mellanox Switch infrastructure and
add driver for SwitchX-2 ASIC. Note that this driver is very simple now.
It implements bare minimum for getting device to work on slow-path.
Fast-path offload functionality is going to be added soon.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Add interface to access registers and process events

Ethernet Management Datagrams (EMADs) are Ethernet packets sent between
the host and the device in order to configure the available device registers.
Another use case is notifications sent from the device to the host,
letting it know about certain events, such as port up / down.

Add the ability to construct EMADs with provisions to construct and
parse the registers' payloads. Implement EMAD transaction layer
which is responsible for the reliable transmission of EMADs. Also, add
an infrastructure used by the switch driver to register for particular
events generated by the device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Add PCI bus implementation

Add PCI bus implementation for Mellanox Technologies Switch ASICs. This
includes firmware initialization, async queues manipulation and command
interface implementation.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>