From: Ingo Molnar Date: Wed, 23 Oct 2013 07:51:02 +0000 (+0200) Subject: manual merge of x86/reboot X-Git-Tag: next-20131024~43^2~2 X-Git-Url: https://git.karo-electronics.de/?a=commitdiff_plain;h=a4ed8bf47d4ca36f5d8eef92c804a9ee6962e554;hp=e56e57f6613d5ed5c3127419341d1aa989a11971;p=karo-tx-linux.git manual merge of x86/reboot Signed-off-by: Ingo Molnar --- diff --git a/Documentation/ABI/stable/sysfs-bus-usb b/Documentation/ABI/stable/sysfs-bus-usb index 2be603c52a24..a6b685724740 100644 --- a/Documentation/ABI/stable/sysfs-bus-usb +++ b/Documentation/ABI/stable/sysfs-bus-usb @@ -37,8 +37,8 @@ Description: that the USB device has been connected to the machine. This file is read-only. Users: - PowerTOP - http://www.lesswatts.org/projects/powertop/ + PowerTOP + https://01.org/powertop/ What: /sys/bus/usb/device/.../power/active_duration Date: January 2008 @@ -57,8 +57,8 @@ Description: will give an integer percentage. Note that this does not account for counter wrap. Users: - PowerTOP - http://www.lesswatts.org/projects/powertop/ + PowerTOP + https://01.org/powertop/ What: /sys/bus/usb/devices/-...:-/supports_autosuspend Date: January 2008 diff --git a/Documentation/ABI/testing/sysfs-devices-power b/Documentation/ABI/testing/sysfs-devices-power index 9d43e7670841..efe449bdf811 100644 --- a/Documentation/ABI/testing/sysfs-devices-power +++ b/Documentation/ABI/testing/sysfs-devices-power @@ -1,6 +1,6 @@ What: /sys/devices/.../power/ Date: January 2009 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../power directory contains attributes allowing the user space to check and modify some power @@ -8,7 +8,7 @@ Description: What: /sys/devices/.../power/wakeup Date: January 2009 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../power/wakeup attribute allows the user space to check if the device is enabled to wake up the system @@ -34,7 +34,7 @@ Description: What: /sys/devices/.../power/control Date: January 2009 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../power/control attribute allows the user space to control the run-time power management of the device. @@ -53,7 +53,7 @@ Description: What: /sys/devices/.../power/async Date: January 2009 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../async attribute allows the user space to enable or diasble the device's suspend and resume callbacks to @@ -79,7 +79,7 @@ Description: What: /sys/devices/.../power/wakeup_count Date: September 2010 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../wakeup_count attribute contains the number of signaled wakeup events associated with the device. This @@ -88,7 +88,7 @@ Description: What: /sys/devices/.../power/wakeup_active_count Date: September 2010 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../wakeup_active_count attribute contains the number of times the processing of wakeup events associated with @@ -98,7 +98,7 @@ Description: What: /sys/devices/.../power/wakeup_abort_count Date: February 2012 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../wakeup_abort_count attribute contains the number of times the processing of a wakeup event associated with @@ -109,7 +109,7 @@ Description: What: /sys/devices/.../power/wakeup_expire_count Date: February 2012 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../wakeup_expire_count attribute contains the number of times a wakeup event associated with the device has @@ -119,7 +119,7 @@ Description: What: /sys/devices/.../power/wakeup_active Date: September 2010 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../wakeup_active attribute contains either 1, or 0, depending on whether or not a wakeup event associated with @@ -129,7 +129,7 @@ Description: What: /sys/devices/.../power/wakeup_total_time_ms Date: September 2010 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../wakeup_total_time_ms attribute contains the total time of processing wakeup events associated with the @@ -139,7 +139,7 @@ Description: What: /sys/devices/.../power/wakeup_max_time_ms Date: September 2010 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../wakeup_max_time_ms attribute contains the maximum time of processing a single wakeup event associated @@ -149,7 +149,7 @@ Description: What: /sys/devices/.../power/wakeup_last_time_ms Date: September 2010 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../wakeup_last_time_ms attribute contains the value of the monotonic clock corresponding to the time of @@ -160,7 +160,7 @@ Description: What: /sys/devices/.../power/wakeup_prevent_sleep_time_ms Date: February 2012 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../wakeup_prevent_sleep_time_ms attribute contains the total time the device has been preventing @@ -189,7 +189,7 @@ Description: What: /sys/devices/.../power/pm_qos_latency_us Date: March 2012 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../power/pm_qos_resume_latency_us attribute contains the PM QoS resume latency limit for the given device, @@ -207,7 +207,7 @@ Description: What: /sys/devices/.../power/pm_qos_no_power_off Date: September 2012 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../power/pm_qos_no_power_off attribute is used for manipulating the PM QoS "no power off" flag. If @@ -222,7 +222,7 @@ Description: What: /sys/devices/.../power/pm_qos_remote_wakeup Date: September 2012 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/devices/.../power/pm_qos_remote_wakeup attribute is used for manipulating the PM QoS "remote wakeup required" diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power index 217772615d02..205a73878441 100644 --- a/Documentation/ABI/testing/sysfs-power +++ b/Documentation/ABI/testing/sysfs-power @@ -1,6 +1,6 @@ What: /sys/power/ Date: August 2006 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/power directory will contain files that will provide a unified interface to the power management @@ -8,7 +8,7 @@ Description: What: /sys/power/state Date: August 2006 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/power/state file controls the system power state. Reading from this file returns what states are supported, @@ -22,7 +22,7 @@ Description: What: /sys/power/disk Date: September 2006 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/power/disk file controls the operating mode of the suspend-to-disk mechanism. Reading from this file returns @@ -67,7 +67,7 @@ Description: What: /sys/power/image_size Date: August 2006 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/power/image_size file controls the size of the image created by the suspend-to-disk mechanism. It can be written a @@ -84,7 +84,7 @@ Description: What: /sys/power/pm_trace Date: August 2006 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/power/pm_trace file controls the code which saves the last PM event point in the RTC across reboots, so that you can @@ -133,7 +133,7 @@ Description: What: /sys/power/pm_async Date: January 2009 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/power/pm_async file controls the switch allowing the user space to enable or disable asynchronous suspend and resume @@ -146,7 +146,7 @@ Description: What: /sys/power/wakeup_count Date: July 2010 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/power/wakeup_count file allows user space to put the system into a sleep state while taking into account the @@ -161,7 +161,7 @@ Description: What: /sys/power/reserved_size Date: May 2011 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/power/reserved_size file allows user space to control the amount of memory reserved for allocations made by device @@ -175,7 +175,7 @@ Description: What: /sys/power/autosleep Date: April 2012 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/power/autosleep file can be written one of the strings returned by reads from /sys/power/state. If that happens, a @@ -192,7 +192,7 @@ Description: What: /sys/power/wake_lock Date: February 2012 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/power/wake_lock file allows user space to create wakeup source objects and activate them on demand (if one of @@ -219,7 +219,7 @@ Description: What: /sys/power/wake_unlock Date: February 2012 -Contact: Rafael J. Wysocki +Contact: Rafael J. Wysocki Description: The /sys/power/wake_unlock file allows user space to deactivate wakeup sources created with the help of /sys/power/wake_lock. diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl index fe397f90a34f..6c9d9d37c83a 100644 --- a/Documentation/DocBook/device-drivers.tmpl +++ b/Documentation/DocBook/device-drivers.tmpl @@ -87,7 +87,10 @@ X!Iinclude/linux/kobject.h !Ekernel/printk/printk.c !Ekernel/panic.c !Ekernel/sys.c -!Ekernel/rcupdate.c +!Ekernel/rcu/srcu.c +!Ekernel/rcu/tree.c +!Ekernel/rcu/tree_plugin.h +!Ekernel/rcu/update.c Device Resource Management diff --git a/Documentation/DocBook/genericirq.tmpl b/Documentation/DocBook/genericirq.tmpl index d16d21b7a3b7..46347f603353 100644 --- a/Documentation/DocBook/genericirq.tmpl +++ b/Documentation/DocBook/genericirq.tmpl @@ -87,7 +87,7 @@ Rationale - The original implementation of interrupt handling in Linux is using + The original implementation of interrupt handling in Linux uses the __do_IRQ() super-handler, which is able to deal with every type of interrupt logic. @@ -111,19 +111,19 @@ - This split implementation of highlevel IRQ handlers allows us to + This split implementation of high-level IRQ handlers allows us to optimize the flow of the interrupt handling for each specific - interrupt type. This reduces complexity in that particular codepath + interrupt type. This reduces complexity in that particular code path and allows the optimized handling of a given type. The original general IRQ implementation used hw_interrupt_type structures and their ->ack(), ->end() [etc.] callbacks to differentiate the flow control in the super-handler. This leads to - a mix of flow logic and lowlevel hardware logic, and it also leads - to unnecessary code duplication: for example in i386, there is a - ioapic_level_irq and a ioapic_edge_irq irq-type which share many - of the lowlevel details but have different flow handling. + a mix of flow logic and low-level hardware logic, and it also leads + to unnecessary code duplication: for example in i386, there is an + ioapic_level_irq and an ioapic_edge_irq IRQ-type which share many + of the low-level details but have different flow handling. A more natural abstraction is the clean separation of the @@ -132,23 +132,23 @@ Analysing a couple of architecture's IRQ subsystem implementations reveals that most of them can use a generic set of 'irq flow' - methods and only need to add the chip level specific code. + methods and only need to add the chip-level specific code. The separation is also valuable for (sub)architectures - which need specific quirks in the irq flow itself but not in the - chip-details - and thus provides a more transparent IRQ subsystem + which need specific quirks in the IRQ flow itself but not in the + chip details - and thus provides a more transparent IRQ subsystem design. - Each interrupt descriptor is assigned its own highlevel flow + Each interrupt descriptor is assigned its own high-level flow handler, which is normally one of the generic - implementations. (This highlevel flow handler implementation also + implementations. (This high-level flow handler implementation also makes it simple to provide demultiplexing handlers which can be found in embedded platforms on various architectures.) The separation makes the generic interrupt handling layer more flexible and extensible. For example, an (sub)architecture can - use a generic irq-flow implementation for 'level type' interrupts + use a generic IRQ-flow implementation for 'level type' interrupts and add a (sub)architecture specific 'edge type' implementation. @@ -172,9 +172,9 @@ There are three main levels of abstraction in the interrupt code: - Highlevel driver API - Highlevel IRQ flow handlers - Chiplevel hardware encapsulation + High-level driver API + High-level IRQ flow handlers + Chip-level hardware encapsulation @@ -189,16 +189,16 @@ which are assigned to this interrupt. - Whenever an interrupt triggers, the lowlevel arch code calls into - the generic interrupt code by calling desc->handle_irq(). - This highlevel IRQ handling function only uses desc->irq_data.chip + Whenever an interrupt triggers, the low-level architecture code calls + into the generic interrupt code by calling desc->handle_irq(). + This high-level IRQ handling function only uses desc->irq_data.chip primitives referenced by the assigned chip descriptor structure. - Highlevel Driver API + High-level Driver API - The highlevel Driver API consists of following functions: + The high-level Driver API consists of following functions: request_irq() free_irq() @@ -216,7 +216,7 @@ - Highlevel IRQ flow handlers + High-level IRQ flow handlers The generic layer provides a set of pre-defined irq-flow methods: @@ -228,7 +228,7 @@ handle_edge_eoi_irq handle_bad_irq - The interrupt flow handlers (either predefined or architecture + The interrupt flow handlers (either pre-defined or architecture specific) are assigned to specific interrupts by the architecture either during bootup or during device initialization. @@ -297,7 +297,7 @@ desc->irq_data.chip->irq_unmask(); handle_fasteoi_irq provides a generic implementation for interrupts, which only need an EOI at the end of - the handler + the handler. The following control flow is implemented (simplified excerpt): @@ -394,7 +394,7 @@ if (desc->irq_data.chip->irq_eoi) The generic functions are intended for 'clean' architectures and chips, which have no platform-specific IRQ handling quirks. If an architecture needs to implement quirks on the 'flow' level then it can do so by - overriding the highlevel irq-flow handler. + overriding the high-level irq-flow handler. @@ -419,9 +419,9 @@ if (desc->irq_data.chip->irq_eoi) - Chiplevel hardware encapsulation + Chip-level hardware encapsulation - The chip level hardware descriptor structure irq_chip + The chip-level hardware descriptor structure irq_chip contains all the direct chip relevant functions, which can be utilized by the irq flow implementations. @@ -429,14 +429,14 @@ if (desc->irq_data.chip->irq_eoi) irq_mask_ack() - Optional, recommended for performance irq_mask() irq_unmask() - irq_eoi() - Optional, required for eoi flow handlers + irq_eoi() - Optional, required for EOI flow handlers irq_retrigger() - Optional irq_set_type() - Optional irq_set_wake() - Optional These primitives are strictly intended to mean what they say: ack means ACK, masking means masking of an IRQ line, etc. It is up to the flow - handler(s) to use these basic units of lowlevel functionality. + handler(s) to use these basic units of low-level functionality. @@ -445,7 +445,7 @@ if (desc->irq_data.chip->irq_eoi) __do_IRQ entry point The original implementation __do_IRQ() was an alternative entry - point for all types of interrupts. It not longer exists. + point for all types of interrupts. It no longer exists. This handler turned out to be not suitable for all @@ -468,11 +468,11 @@ if (desc->irq_data.chip->irq_eoi) Generic interrupt chip - To avoid copies of identical implementations of irq chips the + To avoid copies of identical implementations of IRQ chips the core provides a configurable generic interrupt chip implementation. Developers should check carefuly whether the generic chip fits their needs before implementing the same - functionality slightly different themself. + functionality slightly differently themselves. !Ekernel/irq/generic-chip.c diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 7703ec73a9bb..91266193b8f4 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -202,8 +202,8 @@ over a rather long period of time, but improvements are always welcome! updater uses call_rcu_sched() or synchronize_sched(), then the corresponding readers must disable preemption, possibly by calling rcu_read_lock_sched() and rcu_read_unlock_sched(). - If the updater uses synchronize_srcu() or call_srcu(), - the the corresponding readers must use srcu_read_lock() and + If the updater uses synchronize_srcu() or call_srcu(), then + the corresponding readers must use srcu_read_lock() and srcu_read_unlock(), and with the same srcu_struct. The rules for the expedited primitives are the same as for their non-expedited counterparts. Mixing things up will result in confusion and diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 8e9359de1d28..6f3a0057548e 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -12,12 +12,12 @@ CONFIG_RCU_CPU_STALL_TIMEOUT This kernel configuration parameter defines the period of time that RCU will wait from the beginning of a grace period until it issues an RCU CPU stall warning. This time period is normally - sixty seconds. + 21 seconds. This configuration parameter may be changed at runtime via the /sys/module/rcutree/parameters/rcu_cpu_stall_timeout, however this parameter is checked only at the beginning of a cycle. - So if you are 30 seconds into a 70-second stall, setting this + So if you are 10 seconds into a 40-second stall, setting this sysfs parameter to (say) five will shorten the timeout for the -next- stall, or the following warning for the current stall (assuming the stall lasts long enough). It will not affect the @@ -32,7 +32,7 @@ CONFIG_RCU_CPU_STALL_VERBOSE also dump the stacks of any tasks that are blocking the current RCU-preempt grace period. -RCU_CPU_STALL_INFO +CONFIG_RCU_CPU_STALL_INFO This kernel configuration parameter causes the stall warning to print out additional per-CPU diagnostic information, including @@ -43,7 +43,8 @@ RCU_STALL_DELAY_DELTA Although the lockdep facility is extremely useful, it does add some overhead. Therefore, under CONFIG_PROVE_RCU, the RCU_STALL_DELAY_DELTA macro allows five extra seconds before - giving an RCU CPU stall warning message. + giving an RCU CPU stall warning message. (This is a cpp + macro, not a kernel configuration parameter.) RCU_STALL_RAT_DELAY @@ -52,7 +53,8 @@ RCU_STALL_RAT_DELAY However, if the offending CPU does not detect its own stall in the number of jiffies specified by RCU_STALL_RAT_DELAY, then some other CPU will complain. This delay is normally set to - two jiffies. + two jiffies. (This is a cpp macro, not a kernel configuration + parameter.) When a CPU detects that it is stalling, it will print a message similar to the following: @@ -86,7 +88,12 @@ printing, there will be a spurious stall-warning message: INFO: rcu_bh_state detected stalls on CPUs/tasks: { } (detected by 4, 2502 jiffies) -This is rare, but does happen from time to time in real life. +This is rare, but does happen from time to time in real life. It is also +possible for a zero-jiffy stall to be flagged in this case, depending +on how the stall warning and the grace-period initialization happen to +interact. Please note that it is not possible to entirely eliminate this +sort of false positive without resorting to things like stop_machine(), +which is overkill for this sort of problem. If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set, more information is printed with the stall-warning message, for example: @@ -216,4 +223,5 @@ that portion of the stack which remains the same from trace to trace. If you can reliably trigger the stall, ftrace can be quite helpful. RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE -and with RCU's event tracing. +and with RCU's event tracing. For information on RCU's event tracing, +see include/trace/events/rcu.h. diff --git a/Documentation/acpi/dsdt-override.txt b/Documentation/acpi/dsdt-override.txt index febbb1ba4d23..784841caa6e6 100644 --- a/Documentation/acpi/dsdt-override.txt +++ b/Documentation/acpi/dsdt-override.txt @@ -4,4 +4,4 @@ CONFIG_ACPI_CUSTOM_DSDT builds the image into the kernel. When to use this method is described in detail on the Linux/ACPI home page: -http://www.lesswatts.org/projects/acpi/overridingDSDT.php +https://01.org/linux-acpi/documentation/overriding-dsdt diff --git a/Documentation/block/00-INDEX b/Documentation/block/00-INDEX index d18ecd827c40..929d9904f74b 100644 --- a/Documentation/block/00-INDEX +++ b/Documentation/block/00-INDEX @@ -6,6 +6,8 @@ capability.txt - Generic Block Device Capability (/sys/block//capability) cfq-iosched.txt - CFQ IO scheduler tunables +cmdline-partition.txt + - how to specify block device partitions on kernel command line data-integrity.txt - Block data integrity deadline-iosched.txt diff --git a/Documentation/block/cmdline-partition.txt b/Documentation/block/cmdline-partition.txt index 2bbf4cc40c3f..525b9f6d7fb4 100644 --- a/Documentation/block/cmdline-partition.txt +++ b/Documentation/block/cmdline-partition.txt @@ -1,9 +1,9 @@ -Embedded device command line partition +Embedded device command line partition parsing ===================================================================== -Read block device partition table from command line. -The partition used for fixed block device (eMMC) embedded device. -It is no MBR, save storage space. Bootloader can be easily accessed +Support for reading the block device partition table from the command line. +It is typically used for fixed block (eMMC) embedded devices. +It has no MBR, so saves storage space. Bootloader can be easily accessed by absolute address of data on the block device. Users can easily change the partition. diff --git a/Documentation/devicetree/bindings/memory.txt b/Documentation/devicetree/bindings/memory.txt deleted file mode 100644 index eb2469365593..000000000000 --- a/Documentation/devicetree/bindings/memory.txt +++ /dev/null @@ -1,168 +0,0 @@ -*** Memory binding *** - -The /memory node provides basic information about the address and size -of the physical memory. This node is usually filled or updated by the -bootloader, depending on the actual memory configuration of the given -hardware. - -The memory layout is described by the following node: - -/ { - #address-cells = <(n)>; - #size-cells = <(m)>; - memory { - device_type = "memory"; - reg = <(baseaddr1) (size1) - (baseaddr2) (size2) - ... - (baseaddrN) (sizeN)>; - }; - ... -}; - -A memory node follows the typical device tree rules for "reg" property: -n: number of cells used to store base address value -m: number of cells used to store size value -baseaddrX: defines a base address of the defined memory bank -sizeX: the size of the defined memory bank - - -More than one memory bank can be defined. - - -*** Reserved memory regions *** - -In /memory/reserved-memory node one can create child nodes describing -particular reserved (excluded from normal use) memory regions. Such -memory regions are usually designed for the special usage by various -device drivers. A good example are contiguous memory allocations or -memory sharing with other operating system on the same hardware board. -Those special memory regions might depend on the board configuration and -devices used on the target system. - -Parameters for each memory region can be encoded into the device tree -with the following convention: - -[(label):] (name) { - compatible = "linux,contiguous-memory-region", "reserved-memory-region"; - reg = <(address) (size)>; - (linux,default-contiguous-region); -}; - -compatible: one or more of: - - "linux,contiguous-memory-region" - enables binding of this - region to Contiguous Memory Allocator (special region for - contiguous memory allocations, shared with movable system - memory, Linux kernel-specific). - - "reserved-memory-region" - compatibility is defined, given - region is assigned for exclusive usage for by the respective - devices. - -reg: standard property defining the base address and size of - the memory region - -linux,default-contiguous-region: property indicating that the region - is the default region for all contiguous memory - allocations, Linux specific (optional) - -It is optional to specify the base address, so if one wants to use -autoconfiguration of the base address, '0' can be specified as a base -address in the 'reg' property. - -The /memory/reserved-memory node must contain the same #address-cells -and #size-cells value as the root node. - - -*** Device node's properties *** - -Once regions in the /memory/reserved-memory node have been defined, they -may be referenced by other device nodes. Bindings that wish to reference -memory regions should explicitly document their use of the following -property: - -memory-region = <&phandle_to_defined_region>; - -This property indicates that the device driver should use the memory -region pointed by the given phandle. - - -*** Example *** - -This example defines a memory consisting of 4 memory banks. 3 contiguous -regions are defined for Linux kernel, one default of all device drivers -(named contig_mem, placed at 0x72000000, 64MiB), one dedicated to the -framebuffer device (labelled display_mem, placed at 0x78000000, 8MiB) -and one for multimedia processing (labelled multimedia_mem, placed at -0x77000000, 64MiB). 'display_mem' region is then assigned to fb@12300000 -device for DMA memory allocations (Linux kernel drivers will use CMA is -available or dma-exclusive usage otherwise). 'multimedia_mem' is -assigned to scaler@12500000 and codec@12600000 devices for contiguous -memory allocations when CMA driver is enabled. - -The reason for creating a separate region for framebuffer device is to -match the framebuffer base address to the one configured by bootloader, -so once Linux kernel drivers starts no glitches on the displayed boot -logo appears. Scaller and codec drivers should share the memory -allocations. - -/ { - #address-cells = <1>; - #size-cells = <1>; - - /* ... */ - - memory { - reg = <0x40000000 0x10000000 - 0x50000000 0x10000000 - 0x60000000 0x10000000 - 0x70000000 0x10000000>; - - reserved-memory { - #address-cells = <1>; - #size-cells = <1>; - - /* - * global autoconfigured region for contiguous allocations - * (used only with Contiguous Memory Allocator) - */ - contig_region@0 { - compatible = "linux,contiguous-memory-region"; - reg = <0x0 0x4000000>; - linux,default-contiguous-region; - }; - - /* - * special region for framebuffer - */ - display_region: region@78000000 { - compatible = "linux,contiguous-memory-region", "reserved-memory-region"; - reg = <0x78000000 0x800000>; - }; - - /* - * special region for multimedia processing devices - */ - multimedia_region: region@77000000 { - compatible = "linux,contiguous-memory-region"; - reg = <0x77000000 0x4000000>; - }; - }; - }; - - /* ... */ - - fb0: fb@12300000 { - status = "okay"; - memory-region = <&display_region>; - }; - - scaler: scaler@12500000 { - status = "okay"; - memory-region = <&multimedia_region>; - }; - - codec: codec@12600000 { - status = "okay"; - memory-region = <&multimedia_region>; - }; -}; diff --git a/Documentation/devicetree/bindings/mmc/exynos-dw-mshc.txt b/Documentation/devicetree/bindings/mmc/exynos-dw-mshc.txt index 6d1c0988cfc7..c67b975c8906 100644 --- a/Documentation/devicetree/bindings/mmc/exynos-dw-mshc.txt +++ b/Documentation/devicetree/bindings/mmc/exynos-dw-mshc.txt @@ -1,11 +1,11 @@ -* Samsung Exynos specific extensions to the Synopsis Designware Mobile +* Samsung Exynos specific extensions to the Synopsys Designware Mobile Storage Host Controller -The Synopsis designware mobile storage host controller is used to interface +The Synopsys designware mobile storage host controller is used to interface a SoC with storage medium such as eMMC or SD/MMC cards. This file documents -differences between the core Synopsis dw mshc controller properties described -by synopsis-dw-mshc.txt and the properties used by the Samsung Exynos specific -extensions to the Synopsis Designware Mobile Storage Host Controller. +differences between the core Synopsys dw mshc controller properties described +by synopsys-dw-mshc.txt and the properties used by the Samsung Exynos specific +extensions to the Synopsys Designware Mobile Storage Host Controller. Required Properties: diff --git a/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt b/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt index 8a3d91d47b6a..c559f3f36309 100644 --- a/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt +++ b/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt @@ -1,11 +1,11 @@ -* Rockchip specific extensions to the Synopsis Designware Mobile +* Rockchip specific extensions to the Synopsys Designware Mobile Storage Host Controller -The Synopsis designware mobile storage host controller is used to interface +The Synopsys designware mobile storage host controller is used to interface a SoC with storage medium such as eMMC or SD/MMC cards. This file documents -differences between the core Synopsis dw mshc controller properties described -by synopsis-dw-mshc.txt and the properties used by the Rockchip specific -extensions to the Synopsis Designware Mobile Storage Host Controller. +differences between the core Synopsys dw mshc controller properties described +by synopsys-dw-mshc.txt and the properties used by the Rockchip specific +extensions to the Synopsys Designware Mobile Storage Host Controller. Required Properties: diff --git a/Documentation/devicetree/bindings/mmc/synopsis-dw-mshc.txt b/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt similarity index 93% rename from Documentation/devicetree/bindings/mmc/synopsis-dw-mshc.txt rename to Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt index cdcebea9c6f5..066a78b034ca 100644 --- a/Documentation/devicetree/bindings/mmc/synopsis-dw-mshc.txt +++ b/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt @@ -1,14 +1,14 @@ -* Synopsis Designware Mobile Storage Host Controller +* Synopsys Designware Mobile Storage Host Controller -The Synopsis designware mobile storage host controller is used to interface +The Synopsys designware mobile storage host controller is used to interface a SoC with storage medium such as eMMC or SD/MMC cards. This file documents differences between the core mmc properties described by mmc.txt and the -properties used by the Synopsis Designware Mobile Storage Host Controller. +properties used by the Synopsys Designware Mobile Storage Host Controller. Required Properties: * compatible: should be - - snps,dw-mshc: for controllers compliant with synopsis dw-mshc. + - snps,dw-mshc: for controllers compliant with synopsys dw-mshc. * #address-cells: should be 1. * #size-cells: should be 0. diff --git a/Documentation/devicetree/bindings/mmc/tmio_mmc.txt b/Documentation/devicetree/bindings/mmc/tmio_mmc.txt index df204e18e030..6a2a1160a70d 100644 --- a/Documentation/devicetree/bindings/mmc/tmio_mmc.txt +++ b/Documentation/devicetree/bindings/mmc/tmio_mmc.txt @@ -9,12 +9,15 @@ compulsory and any optional properties, common to all SD/MMC drivers, as described in mmc.txt, can be used. Additionally the following tmio_mmc-specific optional bindings can be used. +Required properties: +- compatible: "renesas,sdhi-shmobile" - a generic sh-mobile SDHI unit + "renesas,sdhi-sh7372" - SDHI IP on SH7372 SoC + "renesas,sdhi-sh73a0" - SDHI IP on SH73A0 SoC + "renesas,sdhi-r8a73a4" - SDHI IP on R8A73A4 SoC + "renesas,sdhi-r8a7740" - SDHI IP on R8A7740 SoC + "renesas,sdhi-r8a7778" - SDHI IP on R8A7778 SoC + "renesas,sdhi-r8a7779" - SDHI IP on R8A7779 SoC + "renesas,sdhi-r8a7790" - SDHI IP on R8A7790 SoC + Optional properties: - toshiba,mmc-wrprotect-disable: write-protect detection is unavailable - -When used with Renesas SDHI hardware, the following compatibility strings -configure various model-specific properties: - -"renesas,sh7372-sdhi": (default) compatible with SH7372 -"renesas,r8a7740-sdhi": compatible with R8A7740: certain MMC/SD commands have to - wait for the interface to become idle. diff --git a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt index 2c6be0377f55..d2ea4605d078 100644 --- a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt +++ b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt @@ -86,6 +86,7 @@ General Properties: Clock Properties: + - fsl,cksel Timer reference clock source. - fsl,tclk-period Timer reference clock period in nanoseconds. - fsl,tmr-prsc Prescaler, divides the output clock. - fsl,tmr-add Frequency compensation value. @@ -97,7 +98,7 @@ Clock Properties: clock. You must choose these carefully for the clock to work right. Here is how to figure good values: - TimerOsc = system clock MHz + TimerOsc = selected reference clock MHz tclk_period = desired clock period nanoseconds NominalFreq = 1000 / tclk_period MHz FreqDivRatio = TimerOsc / NominalFreq (must be greater that 1.0) @@ -114,6 +115,20 @@ Clock Properties: Pulse Per Second (PPS) signal, since this will be offered to the PPS subsystem to synchronize the Linux clock. + Reference clock source is determined by the value, which is holded + in CKSEL bits in TMR_CTRL register. "fsl,cksel" property keeps the + value, which will be directly written in those bits, that is why, + according to reference manual, the next clock sources can be used: + + <0> - external high precision timer reference clock (TSEC_TMR_CLK + input is used for this purpose); + <1> - eTSEC system clock; + <2> - eTSEC1 transmit clock; + <3> - RTC clock input. + + When this attribute is not used, eTSEC system clock will serve as + IEEE 1588 timer reference clock. + Example: ptp_clock@24E00 { @@ -121,6 +136,7 @@ Example: reg = <0x24E00 0xB0>; interrupts = <12 0x8 13 0x8>; interrupt-parent = < &ipic >; + fsl,cksel = <1>; fsl,tclk-period = <10>; fsl,tmr-prsc = <100>; fsl,tmr-add = <0x999999A4>; diff --git a/Documentation/devicetree/bindings/pci/designware-pcie.txt b/Documentation/devicetree/bindings/pci/designware-pcie.txt index eabcb4b5db6e..e216af356847 100644 --- a/Documentation/devicetree/bindings/pci/designware-pcie.txt +++ b/Documentation/devicetree/bindings/pci/designware-pcie.txt @@ -1,4 +1,4 @@ -* Synopsis Designware PCIe interface +* Synopsys Designware PCIe interface Required properties: - compatible: should contain "snps,dw-pcie" to identify the diff --git a/Documentation/devicetree/bindings/tty/serial/qca,ar9330-uart.txt b/Documentation/devicetree/bindings/serial/qca,ar9330-uart.txt similarity index 100% rename from Documentation/devicetree/bindings/tty/serial/qca,ar9330-uart.txt rename to Documentation/devicetree/bindings/serial/qca,ar9330-uart.txt diff --git a/Documentation/x86/efi-stub.txt b/Documentation/efi-stub.txt similarity index 100% rename from Documentation/x86/efi-stub.txt rename to Documentation/efi-stub.txt diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 1a036cd972fb..17431e91a1b1 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -480,6 +480,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. Format: ,, See header of drivers/net/hamradio/baycom_ser_hdx.c. + blkdevparts= Manual partition parsing of block device(s) for + embedded devices based on command line input. + See Documentation/block/cmdline-partition.txt + boot_delay= Milliseconds to delay each printk during boot. Values larger than 10 seconds (10000) are changed to no delay (0). @@ -1357,7 +1361,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. pages. In the event, a node is too small to have both kernelcore and Movable pages, kernelcore pages will take priority and other nodes will have a larger number - of kernelcore pages. The Movable zone is used for the + of Movable pages. The Movable zone is used for the allocation of pages that may be reclaimed or moved by the page migration subsystem. This means that HugeTLB pages may not be allocated from this zone. @@ -1971,6 +1975,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. noapic [SMP,APIC] Tells the kernel to not make use of any IOAPICs that may be present in the system. + nokaslr [X86] + Disable kernel base offset ASLR (Address Space + Layout Randomization) if built into the kernel. + noautogroup Disable scheduler automatic task group creation. nobats [PPC] Do not use BATs for mapping kernel lowmem @@ -2595,7 +2603,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. ramdisk_size= [RAM] Sizes of RAM disks in kilobytes See Documentation/blockdev/ramdisk.txt. - rcu_nocbs= [KNL,BOOT] + rcu_nocbs= [KNL] In kernels built with CONFIG_RCU_NOCB_CPU=y, set the specified list of CPUs to be no-callback CPUs. Invocation of these CPUs' RCU callbacks will @@ -2608,7 +2616,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. real-time workloads. It can also improve energy efficiency for asymmetric multiprocessors. - rcu_nocb_poll [KNL,BOOT] + rcu_nocb_poll [KNL] Rather than requiring that offloaded CPUs (specified by rcu_nocbs= above) explicitly awaken the corresponding "rcuoN" kthreads, @@ -2619,126 +2627,145 @@ bytes respectively. Such letter suffixes can also be entirely omitted. energy efficiency by requiring that the kthreads periodically wake up to do the polling. - rcutree.blimit= [KNL,BOOT] + rcutree.blimit= [KNL] Set maximum number of finished RCU callbacks to process in one batch. - rcutree.fanout_leaf= [KNL,BOOT] + rcutree.rcu_fanout_leaf= [KNL] Increase the number of CPUs assigned to each leaf rcu_node structure. Useful for very large systems. - rcutree.jiffies_till_first_fqs= [KNL,BOOT] + rcutree.jiffies_till_first_fqs= [KNL] Set delay from grace-period initialization to first attempt to force quiescent states. Units are jiffies, minimum value is zero, and maximum value is HZ. - rcutree.jiffies_till_next_fqs= [KNL,BOOT] + rcutree.jiffies_till_next_fqs= [KNL] Set delay between subsequent attempts to force quiescent states. Units are jiffies, minimum value is one, and maximum value is HZ. - rcutree.qhimark= [KNL,BOOT] + rcutree.qhimark= [KNL] Set threshold of queued RCU callbacks over which batch limiting is disabled. - rcutree.qlowmark= [KNL,BOOT] + rcutree.qlowmark= [KNL] Set threshold of queued RCU callbacks below which batch limiting is re-enabled. - rcutree.rcu_cpu_stall_suppress= [KNL,BOOT] - Suppress RCU CPU stall warning messages. - - rcutree.rcu_cpu_stall_timeout= [KNL,BOOT] - Set timeout for RCU CPU stall warning messages. - - rcutree.rcu_idle_gp_delay= [KNL,BOOT] + rcutree.rcu_idle_gp_delay= [KNL] Set wakeup interval for idle CPUs that have RCU callbacks (RCU_FAST_NO_HZ=y). - rcutree.rcu_idle_lazy_gp_delay= [KNL,BOOT] + rcutree.rcu_idle_lazy_gp_delay= [KNL] Set wakeup interval for idle CPUs that have only "lazy" RCU callbacks (RCU_FAST_NO_HZ=y). Lazy RCU callbacks are those which RCU can prove do nothing more than free memory. - rcutorture.fqs_duration= [KNL,BOOT] + rcutorture.fqs_duration= [KNL] Set duration of force_quiescent_state bursts. - rcutorture.fqs_holdoff= [KNL,BOOT] + rcutorture.fqs_holdoff= [KNL] Set holdoff time within force_quiescent_state bursts. - rcutorture.fqs_stutter= [KNL,BOOT] + rcutorture.fqs_stutter= [KNL] Set wait time between force_quiescent_state bursts. - rcutorture.irqreader= [KNL,BOOT] - Test RCU readers from irq handlers. + rcutorture.gp_exp= [KNL] + Use expedited update-side primitives. - rcutorture.n_barrier_cbs= [KNL,BOOT] + rcutorture.gp_normal= [KNL] + Use normal (non-expedited) update-side primitives. + If both gp_exp and gp_normal are set, do both. + If neither gp_exp nor gp_normal are set, still + do both. + + rcutorture.n_barrier_cbs= [KNL] Set callbacks/threads for rcu_barrier() testing. - rcutorture.nfakewriters= [KNL,BOOT] + rcutorture.nfakewriters= [KNL] Set number of concurrent RCU writers. These just stress RCU, they don't participate in the actual test, hence the "fake". - rcutorture.nreaders= [KNL,BOOT] + rcutorture.nreaders= [KNL] Set number of RCU readers. - rcutorture.onoff_holdoff= [KNL,BOOT] + rcutorture.object_debug= [KNL] + Enable debug-object double-call_rcu() testing. + + rcutorture.onoff_holdoff= [KNL] Set time (s) after boot for CPU-hotplug testing. - rcutorture.onoff_interval= [KNL,BOOT] + rcutorture.onoff_interval= [KNL] Set time (s) between CPU-hotplug operations, or zero to disable CPU-hotplug testing. - rcutorture.shuffle_interval= [KNL,BOOT] + rcutorture.rcutorture_runnable= [BOOT] + Start rcutorture running at boot time. + + rcutorture.shuffle_interval= [KNL] Set task-shuffle interval (s). Shuffling tasks allows some CPUs to go into dyntick-idle mode during the rcutorture test. - rcutorture.shutdown_secs= [KNL,BOOT] + rcutorture.shutdown_secs= [KNL] Set time (s) after boot system shutdown. This is useful for hands-off automated testing. - rcutorture.stall_cpu= [KNL,BOOT] + rcutorture.stall_cpu= [KNL] Duration of CPU stall (s) to test RCU CPU stall warnings, zero to disable. - rcutorture.stall_cpu_holdoff= [KNL,BOOT] + rcutorture.stall_cpu_holdoff= [KNL] Time to wait (s) after boot before inducing stall. - rcutorture.stat_interval= [KNL,BOOT] + rcutorture.stat_interval= [KNL] Time (s) between statistics printk()s. - rcutorture.stutter= [KNL,BOOT] + rcutorture.stutter= [KNL] Time (s) to stutter testing, for example, specifying five seconds causes the test to run for five seconds, wait for five seconds, and so on. This tests RCU's ability to transition abruptly to and from idle. - rcutorture.test_boost= [KNL,BOOT] + rcutorture.test_boost= [KNL] Test RCU priority boosting? 0=no, 1=maybe, 2=yes. "Maybe" means test if the RCU implementation under test support RCU priority boosting. - rcutorture.test_boost_duration= [KNL,BOOT] + rcutorture.test_boost_duration= [KNL] Duration (s) of each individual boost test. - rcutorture.test_boost_interval= [KNL,BOOT] + rcutorture.test_boost_interval= [KNL] Interval (s) between each boost test. - rcutorture.test_no_idle_hz= [KNL,BOOT] + rcutorture.test_no_idle_hz= [KNL] Test RCU's dyntick-idle handling. See also the rcutorture.shuffle_interval parameter. - rcutorture.torture_type= [KNL,BOOT] + rcutorture.torture_type= [KNL] Specify the RCU implementation to test. - rcutorture.verbose= [KNL,BOOT] + rcutorture.verbose= [KNL] Enable additional printk() statements. + rcupdate.rcu_expedited= [KNL] + Use expedited grace-period primitives, for + example, synchronize_rcu_expedited() instead + of synchronize_rcu(). This reduces latency, + but can increase CPU utilization, degrade + real-time latency, and degrade energy efficiency. + + rcupdate.rcu_cpu_stall_suppress= [KNL] + Suppress RCU CPU stall warning messages. + + rcupdate.rcu_cpu_stall_timeout= [KNL] + Set timeout for RCU CPU stall warning messages. + rdinit= [KNL] Format: Run specified binary instead of /init from the ramdisk, @@ -3467,11 +3494,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted. default x2apic cluster mode on platforms supporting x2apic. - x86_mrst_timer= [X86-32,APBT] - Choose timer option for x86 Moorestown MID platform. + x86_intel_mid_timer= [X86-32,APBT] + Choose timer option for x86 Intel MID platform. Two valid options are apbt timer only and lapic timer plus one apbt timer for broadcast timer. - x86_mrst_timer=apbt_only | lapic_and_apbt + x86_intel_mid_timer=apbt_only | lapic_and_apbt xen_emul_unplug= [HW,X86,XEN] Unplug Xen emulated devices @@ -3485,6 +3512,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. the unplug protocol never -- do not unplug even if version check succeeds + xen_nopvspin [X86,XEN] + Disables the ticketlock slowpath using Xen PV + optimizations. + xirc2ps_cs= [NET,PCMCIA] Format: ,,,,,[,[,[,]]] diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 32351bfabf20..827104fb9364 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -181,12 +181,17 @@ To reduce its OS jitter, do any of the following: make sure that this is safe on your particular system. d. It is not possible to entirely get rid of OS jitter from vmstat_update() on CONFIG_SMP=y systems, but you - can decrease its frequency by writing a large value to - /proc/sys/vm/stat_interval. The default value is HZ, - for an interval of one second. Of course, larger values - will make your virtual-memory statistics update more - slowly. Of course, you can also run your workload at - a real-time priority, thus preempting vmstat_update(). + can decrease its frequency by writing a large value + to /proc/sys/vm/stat_interval. The default value is + HZ, for an interval of one second. Of course, larger + values will make your virtual-memory statistics update + more slowly. Of course, you can also run your workload + at a real-time priority, thus preempting vmstat_update(), + but if your workload is CPU-bound, this is a bad idea. + However, there is an RFC patch from Christoph Lameter + (based on an earlier one from Gilad Ben-Yossef) that + reduces or even eliminates vmstat overhead for some + workloads at https://lkml.org/lkml/2013/9/4/379. e. If running on high-end powerpc servers, build with CONFIG_PPC_RTAS_DAEMON=n. This prevents the RTAS daemon from running on each CPU every second or so. diff --git a/Documentation/lockstat.txt b/Documentation/lockstat.txt index dd2f7b26ca30..72d010689751 100644 --- a/Documentation/lockstat.txt +++ b/Documentation/lockstat.txt @@ -46,16 +46,14 @@ With these hooks we provide the following statistics: contentions - number of lock acquisitions that had to wait wait time min - shortest (non-0) time we ever had to wait for a lock max - longest time we ever had to wait for a lock - total - total time we spend waiting on this lock + total - total time we spend waiting on this lock + avg - average time spent waiting on this lock acq-bounces - number of lock acquisitions that involved x-cpu data acquisitions - number of times we took the lock hold time min - shortest (non-0) time we ever held the lock - max - longest time we ever held the lock - total - total time this lock was held - -From these number various other statistics can be derived, such as: - - hold time average = hold time total / acquisitions + max - longest time we ever held the lock + total - total time this lock was held + avg - average time this lock was held These numbers are gathered per lock class, per read/write state (when applicable). @@ -84,37 +82,38 @@ Look at the current lock statistics: # less /proc/lock_stat -01 lock_stat version 0.3 -02 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -03 class name con-bounces contentions waittime-min waittime-max waittime-total acq-bounces acquisitions holdtime-min holdtime-max holdtime-total -04 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +01 lock_stat version 0.4 +02----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +03 class name con-bounces contentions waittime-min waittime-max waittime-total waittime-avg acq-bounces acquisitions holdtime-min holdtime-max holdtime-total holdtime-avg +04----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 05 -06 &mm->mmap_sem-W: 233 538 18446744073708 22924.27 607243.51 1342 45806 1.71 8595.89 1180582.34 -07 &mm->mmap_sem-R: 205 587 18446744073708 28403.36 731975.00 1940 412426 0.58 187825.45 6307502.88 -08 --------------- -09 &mm->mmap_sem 487 [] do_page_fault+0x466/0x928 -10 &mm->mmap_sem 179 [] sys_mprotect+0xcd/0x21d -11 &mm->mmap_sem 279 [] sys_mmap+0x75/0xce -12 &mm->mmap_sem 76 [] sys_munmap+0x32/0x59 -13 --------------- -14 &mm->mmap_sem 270 [] sys_mmap+0x75/0xce -15 &mm->mmap_sem 431 [] do_page_fault+0x466/0x928 -16 &mm->mmap_sem 138 [] sys_munmap+0x32/0x59 -17 &mm->mmap_sem 145 [] sys_mprotect+0xcd/0x21d +06 &mm->mmap_sem-W: 46 84 0.26 939.10 16371.53 194.90 47291 2922365 0.16 2220301.69 17464026916.32 5975.99 +07 &mm->mmap_sem-R: 37 100 1.31 299502.61 325629.52 3256.30 212344 34316685 0.10 7744.91 95016910.20 2.77 +08 --------------- +09 &mm->mmap_sem 1 [] khugepaged_scan_mm_slot+0x57/0x280 +19 &mm->mmap_sem 96 [] __do_page_fault+0x1d4/0x510 +11 &mm->mmap_sem 34 [] vm_mmap_pgoff+0x87/0xd0 +12 &mm->mmap_sem 17 [] vm_munmap+0x41/0x80 +13 --------------- +14 &mm->mmap_sem 1 [] dup_mmap+0x2a/0x3f0 +15 &mm->mmap_sem 60 [] SyS_mprotect+0xe9/0x250 +16 &mm->mmap_sem 41 [] __do_page_fault+0x1d4/0x510 +17 &mm->mmap_sem 68 [] vm_mmap_pgoff+0x87/0xd0 18 -19 ............................................................................................................................................................................................... +19............................................................................................................................................................................................................................. 20 -21 dcache_lock: 621 623 0.52 118.26 1053.02 6745 91930 0.29 316.29 118423.41 -22 ----------- -23 dcache_lock 179 [] _atomic_dec_and_lock+0x34/0x54 -24 dcache_lock 113 [] d_alloc+0x19a/0x1eb -25 dcache_lock 99 [] d_rehash+0x1b/0x44 -26 dcache_lock 104 [] d_instantiate+0x36/0x8a -27 ----------- -28 dcache_lock 192 [] _atomic_dec_and_lock+0x34/0x54 -29 dcache_lock 98 [] d_rehash+0x1b/0x44 -30 dcache_lock 72 [] d_alloc+0x19a/0x1eb -31 dcache_lock 112 [] d_instantiate+0x36/0x8a +21 unix_table_lock: 110 112 0.21 49.24 163.91 1.46 21094 66312 0.12 624.42 31589.81 0.48 +22 --------------- +23 unix_table_lock 45 [] unix_create1+0x16e/0x1b0 +24 unix_table_lock 47 [] unix_release_sock+0x31/0x250 +25 unix_table_lock 15 [] unix_find_other+0x117/0x230 +26 unix_table_lock 5 [] unix_autobind+0x11f/0x1b0 +27 --------------- +28 unix_table_lock 39 [] unix_release_sock+0x31/0x250 +29 unix_table_lock 49 [] unix_create1+0x16e/0x1b0 +30 unix_table_lock 20 [] unix_find_other+0x117/0x230 +31 unix_table_lock 4 [] unix_autobind+0x11f/0x1b0 + This excerpt shows the first two lock class statistics. Line 01 shows the output version - each time the format changes this will be updated. Line 02-04 @@ -131,30 +130,30 @@ The integer part of the time values is in us. Dealing with nested locks, subclasses may appear: -32............................................................................................................................................................................................... +32........................................................................................................................................................................................................................... 33 -34 &rq->lock: 13128 13128 0.43 190.53 103881.26 97454 3453404 0.00 401.11 13224683.11 +34 &rq->lock: 13128 13128 0.43 190.53 103881.26 7.91 97454 3453404 0.00 401.11 13224683.11 3.82 35 --------- -36 &rq->lock 645 [] task_rq_lock+0x43/0x75 -37 &rq->lock 297 [] try_to_wake_up+0x127/0x25a -38 &rq->lock 360 [] select_task_rq_fair+0x1f0/0x74a -39 &rq->lock 428 [] scheduler_tick+0x46/0x1fb +36 &rq->lock 645 [] task_rq_lock+0x43/0x75 +37 &rq->lock 297 [] try_to_wake_up+0x127/0x25a +38 &rq->lock 360 [] select_task_rq_fair+0x1f0/0x74a +39 &rq->lock 428 [] scheduler_tick+0x46/0x1fb 40 --------- -41 &rq->lock 77 [] task_rq_lock+0x43/0x75 -42 &rq->lock 174 [] try_to_wake_up+0x127/0x25a -43 &rq->lock 4715 [] double_rq_lock+0x42/0x54 -44 &rq->lock 893 [] schedule+0x157/0x7b8 +41 &rq->lock 77 [] task_rq_lock+0x43/0x75 +42 &rq->lock 174 [] try_to_wake_up+0x127/0x25a +43 &rq->lock 4715 [] double_rq_lock+0x42/0x54 +44 &rq->lock 893 [] schedule+0x157/0x7b8 45 -46............................................................................................................................................................................................... +46........................................................................................................................................................................................................................... 47 -48 &rq->lock/1: 11526 11488 0.33 388.73 136294.31 21461 38404 0.00 37.93 109388.53 +48 &rq->lock/1: 1526 11488 0.33 388.73 136294.31 11.86 21461 38404 0.00 37.93 109388.53 2.84 49 ----------- -50 &rq->lock/1 11526 [] double_rq_lock+0x4f/0x54 +50 &rq->lock/1 11526 [] double_rq_lock+0x4f/0x54 51 ----------- -52 &rq->lock/1 5645 [] double_rq_lock+0x42/0x54 -53 &rq->lock/1 1224 [] schedule+0x157/0x7b8 -54 &rq->lock/1 4336 [] double_rq_lock+0x4f/0x54 -55 &rq->lock/1 181 [] try_to_wake_up+0x127/0x25a +52 &rq->lock/1 5645 [] double_rq_lock+0x42/0x54 +53 &rq->lock/1 1224 [] schedule+0x157/0x7b8 +54 &rq->lock/1 4336 [] double_rq_lock+0x4f/0x54 +55 &rq->lock/1 181 [] try_to_wake_up+0x127/0x25a Line 48 shows statistics for the second subclass (/1) of &rq->lock class (subclass starts from 0), since in this case, as line 50 suggests, @@ -163,16 +162,16 @@ double_rq_lock actually acquires a nested lock of two spinlocks. View the top contending locks: # grep : /proc/lock_stat | head - &inode->i_data.tree_lock-W: 15 21657 0.18 1093295.30 11547131054.85 58 10415 0.16 87.51 6387.60 - &inode->i_data.tree_lock-R: 0 0 0.00 0.00 0.00 23302 231198 0.25 8.45 98023.38 - dcache_lock: 1037 1161 0.38 45.32 774.51 6611 243371 0.15 306.48 77387.24 - &inode->i_mutex: 161 286 18446744073709 62882.54 1244614.55 3653 20598 18446744073709 62318.60 1693822.74 - &zone->lru_lock: 94 94 0.53 7.33 92.10 4366 32690 0.29 59.81 16350.06 - &inode->i_data.i_mmap_mutex: 79 79 0.40 3.77 53.03 11779 87755 0.28 116.93 29898.44 - &q->__queue_lock: 48 50 0.52 31.62 86.31 774 13131 0.17 113.08 12277.52 - &rq->rq_lock_key: 43 47 0.74 68.50 170.63 3706 33929 0.22 107.99 17460.62 - &rq->rq_lock_key#2: 39 46 0.75 6.68 49.03 2979 32292 0.17 125.17 17137.63 - tasklist_lock-W: 15 15 1.45 10.87 32.70 1201 7390 0.58 62.55 13648.47 + clockevents_lock: 2926159 2947636 0.15 46882.81 1784540466.34 605.41 3381345 3879161 0.00 2260.97 53178395.68 13.71 + tick_broadcast_lock: 346460 346717 0.18 2257.43 39364622.71 113.54 3642919 4242696 0.00 2263.79 49173646.60 11.59 + &mapping->i_mmap_mutex: 203896 203899 3.36 645530.05 31767507988.39 155800.21 3361776 8893984 0.17 2254.15 14110121.02 1.59 + &rq->lock: 135014 136909 0.18 606.09 842160.68 6.15 1540728 10436146 0.00 728.72 17606683.41 1.69 + &(&zone->lru_lock)->rlock: 93000 94934 0.16 59.18 188253.78 1.98 1199912 3809894 0.15 391.40 3559518.81 0.93 + tasklist_lock-W: 40667 41130 0.23 1189.42 428980.51 10.43 270278 510106 0.16 653.51 3939674.91 7.72 + tasklist_lock-R: 21298 21305 0.20 1310.05 215511.12 10.12 186204 241258 0.14 1162.33 1179779.23 4.89 + rcu_node_1: 47656 49022 0.16 635.41 193616.41 3.95 844888 1865423 0.00 764.26 1656226.96 0.89 + &(&dentry->d_lockref.lock)->rlock: 39791 40179 0.15 1302.08 88851.96 2.21 2790851 12527025 0.10 1910.75 3379714.27 0.27 + rcu_node_0: 29203 30064 0.16 786.55 1555573.00 51.74 88963 244254 0.00 398.87 428872.51 1.76 Clear the statistics: diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt index a46ddb85e83a..85c362d8ea34 100644 --- a/Documentation/sound/alsa/HD-Audio-Models.txt +++ b/Documentation/sound/alsa/HD-Audio-Models.txt @@ -28,6 +28,7 @@ ALC269/270/275/276/28x/29x alc269-dmic Enable ALC269(VA) digital mic workaround alc271-dmic Enable ALC271X digital mic workaround inv-dmic Inverted internal mic workaround + headset-mic Indicates a combined headset (headphone+mic) jack lenovo-dock Enables docking station I/O for some Lenovos dell-headset-multi Headset jack, which can also be used as mic-in dell-headset-dock Headset jack (without mic-in), and also dock I/O @@ -296,6 +297,12 @@ Cirrus Logic CS4206/4207 imac27 IMac 27 Inch auto BIOS setup (default) +Cirrus Logic CS4208 +=================== + mba6 MacBook Air 6,1 and 6,2 + gpio0 Enable GPIO 0 amp + auto BIOS setup (default) + VIA VT17xx/VT18xx/VT20xx ======================== auto BIOS setup (default) diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 9d4c1d18ad44..4273b2d71a27 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -355,6 +355,82 @@ utilize. ============================================================== +numa_balancing + +Enables/disables automatic page fault based NUMA memory +balancing. Memory is moved automatically to nodes +that access it often. + +Enables/disables automatic NUMA memory balancing. On NUMA machines, there +is a performance penalty if remote memory is accessed by a CPU. When this +feature is enabled the kernel samples what task thread is accessing memory +by periodically unmapping pages and later trapping a page fault. At the +time of the page fault, it is determined if the data being accessed should +be migrated to a local memory node. + +The unmapping of pages and trapping faults incur additional overhead that +ideally is offset by improved memory locality but there is no universal +guarantee. If the target workload is already bound to NUMA nodes then this +feature should be disabled. Otherwise, if the system overhead from the +feature is too high then the rate the kernel samples for NUMA hinting +faults may be controlled by the numa_balancing_scan_period_min_ms, +numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, +numa_balancing_scan_size_mb, numa_balancing_settle_count sysctls and +numa_balancing_migrate_deferred. + +============================================================== + +numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, +numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb + +Automatic NUMA balancing scans tasks address space and unmaps pages to +detect if pages are properly placed or if the data should be migrated to a +memory node local to where the task is running. Every "scan delay" the task +scans the next "scan size" number of pages in its address space. When the +end of the address space is reached the scanner restarts from the beginning. + +In combination, the "scan delay" and "scan size" determine the scan rate. +When "scan delay" decreases, the scan rate increases. The scan delay and +hence the scan rate of every task is adaptive and depends on historical +behaviour. If pages are properly placed then the scan delay increases, +otherwise the scan delay decreases. The "scan size" is not adaptive but +the higher the "scan size", the higher the scan rate. + +Higher scan rates incur higher system overhead as page faults must be +trapped and potentially data must be migrated. However, the higher the scan +rate, the more quickly a tasks memory is migrated to a local node if the +workload pattern changes and minimises performance impact due to remote +memory accesses. These sysctls control the thresholds for scan delays and +the number of pages scanned. + +numa_balancing_scan_period_min_ms is the minimum time in milliseconds to +scan a tasks virtual memory. It effectively controls the maximum scanning +rate for each task. + +numa_balancing_scan_delay_ms is the starting "scan delay" used for a task +when it initially forks. + +numa_balancing_scan_period_max_ms is the maximum time in milliseconds to +scan a tasks virtual memory. It effectively controls the minimum scanning +rate for each task. + +numa_balancing_scan_size_mb is how many megabytes worth of pages are +scanned for a given scan. + +numa_balancing_settle_count is how many scan periods must complete before +the schedule balancer stops pushing the task towards a preferred node. This +gives the scheduler a chance to place the task on an alternative node if the +preferred node is overloaded. + +numa_balancing_migrate_deferred is how many page migrations get skipped +unconditionally, after a page migration is skipped because a page is shared +with other tasks. This reduces page migration overhead, and determines +how much stronger the "move task near its memory" policy scheduler becomes, +versus the "move memory near its task" memory management policy, for workloads +with shared memory. + +============================================================== + osrelease, ostype & version: # cat osrelease diff --git a/MAINTAINERS b/MAINTAINERS index c53fe9559642..016a5c44b0f3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -237,11 +237,11 @@ F: drivers/platform/x86/acer-wmi.c ACPI M: Len Brown -M: Rafael J. Wysocki +M: Rafael J. Wysocki L: linux-acpi@vger.kernel.org -W: http://www.lesswatts.org/projects/acpi/ -Q: http://patchwork.kernel.org/project/linux-acpi/list/ -T: git git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux +W: https://01.org/linux-acpi +Q: https://patchwork.kernel.org/project/linux-acpi/list/ +T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm S: Supported F: drivers/acpi/ F: drivers/pnp/pnpacpi/ @@ -256,21 +256,21 @@ F: drivers/pci/*/*/*acpi* ACPI FAN DRIVER M: Zhang Rui L: linux-acpi@vger.kernel.org -W: http://www.lesswatts.org/projects/acpi/ +W: https://01.org/linux-acpi S: Supported F: drivers/acpi/fan.c ACPI THERMAL DRIVER M: Zhang Rui L: linux-acpi@vger.kernel.org -W: http://www.lesswatts.org/projects/acpi/ +W: https://01.org/linux-acpi S: Supported F: drivers/acpi/*thermal* ACPI VIDEO DRIVER M: Zhang Rui L: linux-acpi@vger.kernel.org -W: http://www.lesswatts.org/projects/acpi/ +W: https://01.org/linux-acpi S: Supported F: drivers/acpi/video.c @@ -824,15 +824,21 @@ S: Maintained F: arch/arm/mach-gemini/ ARM/CSR SIRFPRIMA2 MACHINE SUPPORT -M: Barry Song +M: Barry Song L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) T: git git://git.kernel.org/pub/scm/linux/kernel/git/baohua/linux.git S: Maintained F: arch/arm/mach-prima2/ +F: drivers/clk/clk-prima2.c +F: drivers/clocksource/timer-prima2.c +F: drivers/clocksource/timer-marco.c F: drivers/dma/sirf-dma.c F: drivers/i2c/busses/i2c-sirf.c +F: drivers/input/misc/sirfsoc-onkey.c +F: drivers/irqchip/irq-sirfsoc.c F: drivers/mmc/host/sdhci-sirf.c F: drivers/pinctrl/sirf/ +F: drivers/rtc/rtc-sirfsoc.c F: drivers/spi/spi-sirf.c ARM/EBSA110 MACHINE SUPPORT @@ -2294,7 +2300,7 @@ S: Maintained F: drivers/net/ethernet/ti/cpmac.c CPU FREQUENCY DRIVERS -M: Rafael J. Wysocki +M: Rafael J. Wysocki M: Viresh Kumar L: cpufreq@vger.kernel.org L: linux-pm@vger.kernel.org @@ -2325,7 +2331,7 @@ S: Maintained F: drivers/cpuidle/cpuidle-big_little.c CPUIDLE DRIVERS -M: Rafael J. Wysocki +M: Rafael J. Wysocki M: Daniel Lezcano L: linux-pm@vger.kernel.org S: Maintained @@ -2640,6 +2646,18 @@ F: include/linux/device-mapper.h F: include/linux/dm-*.h F: include/uapi/linux/dm-*.h +DIGI NEO AND CLASSIC PCI PRODUCTS +M: Lidza Louina +L: driverdev-devel@linuxdriverproject.org +S: Maintained +F: drivers/staging/dgnc/ + +DIGI EPCA PCI PRODUCTS +M: Lidza Louina +L: driverdev-devel@linuxdriverproject.org +S: Maintained +F: drivers/staging/dgap/ + DIOLAN U2C-12 I2C DRIVER M: Guenter Roeck L: linux-i2c@vger.kernel.org @@ -3535,7 +3553,7 @@ F: fs/freevxfs/ FREEZER M: Pavel Machek -M: "Rafael J. Wysocki" +M: "Rafael J. Wysocki" L: linux-pm@vger.kernel.org S: Supported F: Documentation/power/freezing-of-tasks.txt @@ -3606,6 +3624,12 @@ L: linux-scsi@vger.kernel.org S: Odd Fixes (e.g., new signatures) F: drivers/scsi/fdomain.* +GCOV BASED KERNEL PROFILING +M: Peter Oberparleiter +S: Maintained +F: kernel/gcov/ +F: Documentation/gcov.txt + GDT SCSI DISK ARRAY CONTROLLER DRIVER M: Achim Leubner L: linux-scsi@vger.kernel.org @@ -3871,7 +3895,7 @@ F: drivers/video/hgafb.c HIBERNATION (aka Software Suspend, aka swsusp) M: Pavel Machek -M: "Rafael J. Wysocki" +M: "Rafael J. Wysocki" L: linux-pm@vger.kernel.org S: Supported F: arch/x86/power/ @@ -4321,7 +4345,7 @@ F: drivers/video/i810/ INTEL MENLOW THERMAL DRIVER M: Sujith Thomas L: platform-driver-x86@vger.kernel.org -W: http://www.lesswatts.org/projects/acpi/ +W: https://01.org/linux-acpi S: Supported F: drivers/platform/x86/intel_menlow.c @@ -4458,6 +4482,13 @@ L: linux-serial@vger.kernel.org S: Maintained F: drivers/tty/serial/ioc3_serial.c +IOMMU DRIVERS +M: Joerg Roedel +L: iommu@lists.linux-foundation.org +T: git git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git +S: Maintained +F: drivers/iommu/ + IP MASQUERADING M: Juanjo Ciarlante S: Maintained @@ -6904,7 +6935,7 @@ M: "Paul E. McKenney" S: Supported T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git F: Documentation/RCU/torture.txt -F: kernel/rcutorture.c +F: kernel/rcu/torture.c RDC R-321X SoC M: Florian Fainelli @@ -6931,8 +6962,9 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git F: Documentation/RCU/ X: Documentation/RCU/torture.txt F: include/linux/rcu* -F: kernel/rcu* -X: kernel/rcutorture.c +X: include/linux/srcu.h +F: kernel/rcu/ +X: kernel/rcu/torture.c REAL TIME CLOCK (RTC) SUBSYSTEM M: Alessandro Zummo @@ -7257,11 +7289,13 @@ S: Maintained F: kernel/sched/ F: include/linux/sched.h F: include/uapi/linux/sched.h +F: kernel/wait.c +F: include/linux/wait.h SCORE ARCHITECTURE -M: Chen Liqin +M: Chen Liqin M: Lennox Wu -W: http://www.sunplusct.com +W: http://www.sunplus.com S: Supported F: arch/score/ @@ -7619,8 +7653,8 @@ M: "Paul E. McKenney" W: http://www.rdrop.com/users/paulmck/RCU/ S: Supported T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git -F: include/linux/srcu* -F: kernel/srcu* +F: include/linux/srcu.h +F: kernel/rcu/srcu.c SMACK SECURITY MODULE M: Casey Schaufler @@ -8070,7 +8104,7 @@ F: drivers/sh/ SUSPEND TO RAM M: Len Brown M: Pavel Machek -M: "Rafael J. Wysocki" +M: "Rafael J. Wysocki" L: linux-pm@vger.kernel.org S: Supported F: Documentation/power/ @@ -8725,9 +8759,8 @@ F: Documentation/hid/hiddev.txt F: drivers/hid/usbhid/ USB/IP DRIVERS -M: Matt Mooney L: linux-usb@vger.kernel.org -S: Maintained +S: Orphan F: drivers/staging/usbip/ USB ISP116X DRIVER @@ -9367,6 +9400,7 @@ F: arch/arm64/include/asm/xen/ XEN NETWORK BACKEND DRIVER M: Ian Campbell +M: Wei Liu L: xen-devel@lists.xenproject.org (moderated for non-subscribers) L: netdev@vger.kernel.org S: Supported diff --git a/Makefile b/Makefile index 8d0668f473ba..126321d2e6ad 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ VERSION = 3 PATCHLEVEL = 12 SUBLEVEL = 0 -EXTRAVERSION = -rc2 +EXTRAVERSION = -rc6 NAME = One Giant Leap for Frogkind # *DOCUMENTATION* diff --git a/arch/Kconfig b/arch/Kconfig index 1feb169274fe..ded747c7b74c 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -286,9 +286,6 @@ config HAVE_PERF_USER_STACK_DUMP config HAVE_ARCH_JUMP_LABEL bool -config HAVE_ARCH_MUTEX_CPU_RELAX - bool - config HAVE_RCU_TABLE_FREE bool @@ -356,6 +353,18 @@ config HAVE_CONTEXT_TRACKING config HAVE_VIRT_CPU_ACCOUNTING bool +config HAVE_VIRT_CPU_ACCOUNTING_GEN + bool + default y if 64BIT + help + With VIRT_CPU_ACCOUNTING_GEN, cputime_t becomes 64-bit. + Before enabling this option, arch code must be audited + to ensure there are no races in concurrent read/write of + cputime_t. For example, reading/writing 64-bit cputime_t on + some 32-bit arches may require multiple accesses, so proper + locking is needed to protect against concurrent accesses. + + config HAVE_IRQ_TIME_ACCOUNTING bool help @@ -393,6 +402,16 @@ config HAVE_UNDERSCORE_SYMBOL_PREFIX Some architectures generate an _ in front of C symbols; things like module loading and assembly files need to know about this. +config HAVE_IRQ_EXIT_ON_IRQ_STACK + bool + help + Architecture doesn't only execute the irq handler on the irq stack + but also irq_exit(). This way we can process softirqs on this irq + stack instead of switching to a new one when we call __do_softirq() + in the end of an hardirq. + This spares a stack switch and improves cache usage on softirq + processing. + # # ABI hall of shame # diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild index a6e85f448c1c..f01fb505ad52 100644 --- a/arch/alpha/include/asm/Kbuild +++ b/arch/alpha/include/asm/Kbuild @@ -3,3 +3,4 @@ generic-y += clkdev.h generic-y += exec.h generic-y += trace_clock.h +generic-y += preempt.h diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild index d8dd660898b9..5943f7f9d325 100644 --- a/arch/arc/include/asm/Kbuild +++ b/arch/arc/include/asm/Kbuild @@ -46,3 +46,4 @@ generic-y += ucontext.h generic-y += user.h generic-y += vga.h generic-y += xor.h +generic-y += preempt.h diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h index f158197ac5b0..b6a8c2dfbe6e 100644 --- a/arch/arc/include/asm/spinlock.h +++ b/arch/arc/include/asm/spinlock.h @@ -45,7 +45,14 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) static inline void arch_spin_unlock(arch_spinlock_t *lock) { - lock->slock = __ARCH_SPIN_LOCK_UNLOCKED__; + unsigned int tmp = __ARCH_SPIN_LOCK_UNLOCKED__; + + __asm__ __volatile__( + " ex %0, [%1] \n" + : "+r" (tmp) + : "r"(&(lock->slock)) + : "memory"); + smp_mb(); } diff --git a/arch/arc/include/asm/uaccess.h b/arch/arc/include/asm/uaccess.h index 32420824375b..30c9baffa96f 100644 --- a/arch/arc/include/asm/uaccess.h +++ b/arch/arc/include/asm/uaccess.h @@ -43,7 +43,7 @@ * Because it essentially checks if buffer end is within limit and @len is * non-ngeative, which implies that buffer start will be within limit too. * - * The reason for rewriting being, for majorit yof cases, @len is generally + * The reason for rewriting being, for majority of cases, @len is generally * compile time constant, causing first sub-expression to be compile time * subsumed. * @@ -53,7 +53,7 @@ * */ #define __user_ok(addr, sz) (((sz) <= TASK_SIZE) && \ - (((addr)+(sz)) <= get_fs())) + ((addr) <= (get_fs() - (sz)))) #define __access_ok(addr, sz) (unlikely(__kernel_ok) || \ likely(__user_ok((addr), (sz)))) diff --git a/arch/arc/kernel/ptrace.c b/arch/arc/kernel/ptrace.c index 333238564b67..5d76706139dd 100644 --- a/arch/arc/kernel/ptrace.c +++ b/arch/arc/kernel/ptrace.c @@ -102,7 +102,7 @@ static int genregs_set(struct task_struct *target, REG_IGNORE_ONE(pad2); REG_IN_CHUNK(callee, efa, cregs); /* callee_regs[r25..r13] */ REG_IGNORE_ONE(efa); /* efa update invalid */ - REG_IN_ONE(stop_pc, &ptregs->ret); /* stop_pc: PC update */ + REG_IGNORE_ONE(stop_pc); /* PC updated via @ret */ return ret; } diff --git a/arch/arc/kernel/signal.c b/arch/arc/kernel/signal.c index ee6ef2f60a28..7e95e1a86510 100644 --- a/arch/arc/kernel/signal.c +++ b/arch/arc/kernel/signal.c @@ -101,7 +101,6 @@ SYSCALL_DEFINE0(rt_sigreturn) { struct rt_sigframe __user *sf; unsigned int magic; - int err; struct pt_regs *regs = current_pt_regs(); /* Always make any pending restarted system calls return -EINTR */ @@ -119,15 +118,16 @@ SYSCALL_DEFINE0(rt_sigreturn) if (!access_ok(VERIFY_READ, sf, sizeof(*sf))) goto badframe; - err = restore_usr_regs(regs, sf); - err |= __get_user(magic, &sf->sigret_magic); - if (err) + if (__get_user(magic, &sf->sigret_magic)) goto badframe; if (unlikely(is_do_ss_needed(magic))) if (restore_altstack(&sf->uc.uc_stack)) goto badframe; + if (restore_usr_regs(regs, sf)) + goto badframe; + /* Don't restart from sigreturn */ syscall_wont_restart(regs); @@ -190,6 +190,15 @@ setup_rt_frame(int signo, struct k_sigaction *ka, siginfo_t *info, if (!sf) return 1; + /* + * w/o SA_SIGINFO, struct ucontext is partially populated (only + * uc_mcontext/uc_sigmask) for kernel's normal user state preservation + * during signal handler execution. This works for SA_SIGINFO as well + * although the semantics are now overloaded (the same reg state can be + * inspected by userland: but are they allowed to fiddle with it ? + */ + err |= stash_usr_regs(sf, regs, set); + /* * SA_SIGINFO requires 3 args to signal handler: * #1: sig-no (common to any handler) @@ -213,14 +222,6 @@ setup_rt_frame(int signo, struct k_sigaction *ka, siginfo_t *info, magic = MAGIC_SIGALTSTK; } - /* - * w/o SA_SIGINFO, struct ucontext is partially populated (only - * uc_mcontext/uc_sigmask) for kernel's normal user state preservation - * during signal handler execution. This works for SA_SIGINFO as well - * although the semantics are now overloaded (the same reg state can be - * inspected by userland: but are they allowed to fiddle with it ? - */ - err |= stash_usr_regs(sf, regs, set); err |= __put_user(magic, &sf->sigret_magic); if (err) return err; diff --git a/arch/arc/kernel/time.c b/arch/arc/kernel/time.c index 0e51e69cf30d..3fde7de3ea67 100644 --- a/arch/arc/kernel/time.c +++ b/arch/arc/kernel/time.c @@ -227,12 +227,9 @@ void __attribute__((weak)) arc_local_timer_setup(unsigned int cpu) { struct clock_event_device *clk = &per_cpu(arc_clockevent_device, cpu); - clockevents_calc_mult_shift(clk, arc_get_core_freq(), 5); - - clk->max_delta_ns = clockevent_delta2ns(ARC_TIMER_MAX, clk); clk->cpumask = cpumask_of(cpu); - - clockevents_register_device(clk); + clockevents_config_and_register(clk, arc_get_core_freq(), + 0, ARC_TIMER_MAX); /* * setup the per-cpu timer IRQ handler - for all cpus diff --git a/arch/arc/kernel/unaligned.c b/arch/arc/kernel/unaligned.c index 28d170060747..7ff5b5c183bb 100644 --- a/arch/arc/kernel/unaligned.c +++ b/arch/arc/kernel/unaligned.c @@ -245,6 +245,12 @@ int misaligned_fixup(unsigned long address, struct pt_regs *regs, regs->status32 &= ~STATUS_DE_MASK; } else { regs->ret += state.instr_len; + + /* handle zero-overhead-loop */ + if ((regs->ret == regs->lp_end) && (regs->lp_count)) { + regs->ret = regs->lp_start; + regs->lp_count--; + } } return 0; diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 3f7714d8d2d2..323baf07fdce 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -54,6 +54,7 @@ config ARM select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_SYSCALL_TRACEPOINTS select HAVE_UID16 + select HAVE_VIRT_CPU_ACCOUNTING_GEN select IRQ_FORCED_THREADING select KTIME_SCALAR select MODULES_USE_ELF_REL @@ -2217,8 +2218,7 @@ config NEON config KERNEL_MODE_NEON bool "Support for NEON in kernel mode" - default n - depends on NEON + depends on NEON && AEABI help Say Y to include support for NEON in kernel mode. diff --git a/arch/arm/Makefile b/arch/arm/Makefile index a37a50f575a2..db50b626be98 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -296,10 +296,15 @@ archprepare: # Convert bzImage to zImage bzImage: zImage -zImage Image xipImage bootpImage uImage: vmlinux +BOOT_TARGETS = zImage Image xipImage bootpImage uImage +INSTALL_TARGETS = zinstall uinstall install + +PHONY += bzImage $(BOOT_TARGETS) $(INSTALL_TARGETS) + +$(BOOT_TARGETS): vmlinux $(Q)$(MAKE) $(build)=$(boot) MACHINE=$(MACHINE) $(boot)/$@ -zinstall uinstall install: vmlinux +$(INSTALL_TARGETS): $(Q)$(MAKE) $(build)=$(boot) MACHINE=$(MACHINE) $@ %.dtb: | scripts diff --git a/arch/arm/boot/Makefile b/arch/arm/boot/Makefile index 84aa2caf07ed..ec2f8065f955 100644 --- a/arch/arm/boot/Makefile +++ b/arch/arm/boot/Makefile @@ -95,24 +95,24 @@ initrd: @test "$(INITRD)" != "" || \ (echo You must specify INITRD; exit -1) -install: $(obj)/Image - $(CONFIG_SHELL) $(srctree)/$(src)/install.sh $(KERNELRELEASE) \ +install: + $(CONFIG_SHELL) $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" \ $(obj)/Image System.map "$(INSTALL_PATH)" -zinstall: $(obj)/zImage - $(CONFIG_SHELL) $(srctree)/$(src)/install.sh $(KERNELRELEASE) \ +zinstall: + $(CONFIG_SHELL) $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" \ $(obj)/zImage System.map "$(INSTALL_PATH)" -uinstall: $(obj)/uImage - $(CONFIG_SHELL) $(srctree)/$(src)/install.sh $(KERNELRELEASE) \ +uinstall: + $(CONFIG_SHELL) $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" \ $(obj)/uImage System.map "$(INSTALL_PATH)" zi: - $(CONFIG_SHELL) $(srctree)/$(src)/install.sh $(KERNELRELEASE) \ + $(CONFIG_SHELL) $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" \ $(obj)/zImage System.map "$(INSTALL_PATH)" i: - $(CONFIG_SHELL) $(srctree)/$(src)/install.sh $(KERNELRELEASE) \ + $(CONFIG_SHELL) $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" \ $(obj)/Image System.map "$(INSTALL_PATH)" subdir- := bootp compressed dts diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index e95af3f5433b..802720e3e8fd 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -41,6 +41,8 @@ dtb-$(CONFIG_ARCH_AT91) += sama5d33ek.dtb dtb-$(CONFIG_ARCH_AT91) += sama5d34ek.dtb dtb-$(CONFIG_ARCH_AT91) += sama5d35ek.dtb +dtb-$(CONFIG_ARCH_ATLAS6) += atlas6-evb.dtb + dtb-$(CONFIG_ARCH_BCM2835) += bcm2835-rpi-b.dtb dtb-$(CONFIG_ARCH_BCM) += bcm11351-brt.dtb \ bcm28155-ap.dtb diff --git a/arch/arm/boot/dts/armada-370-netgear-rn102.dts b/arch/arm/boot/dts/armada-370-netgear-rn102.dts index 05e4485a8225..8ac2ac1f69cc 100644 --- a/arch/arm/boot/dts/armada-370-netgear-rn102.dts +++ b/arch/arm/boot/dts/armada-370-netgear-rn102.dts @@ -27,6 +27,25 @@ }; soc { + ranges = ; + + pcie-controller { + status = "okay"; + + /* Connected to Marvell SATA controller */ + pcie@1,0 { + /* Port 0, Lane 0 */ + status = "okay"; + }; + + /* Connected to FL1009 USB 3.0 controller */ + pcie@2,0 { + /* Port 1, Lane 0 */ + status = "okay"; + }; + }; + internal-regs { serial@12000 { clock-frequency = <200000000>; @@ -57,6 +76,11 @@ marvell,pins = "mpp56"; marvell,function = "gpio"; }; + + poweroff: poweroff { + marvell,pins = "mpp8"; + marvell,function = "gpio"; + }; }; mdio { @@ -89,22 +113,6 @@ pwm_polarity = <0>; }; }; - - pcie-controller { - status = "okay"; - - /* Connected to Marvell SATA controller */ - pcie@1,0 { - /* Port 0, Lane 0 */ - status = "okay"; - }; - - /* Connected to FL1009 USB 3.0 controller */ - pcie@2,0 { - /* Port 1, Lane 0 */ - status = "okay"; - }; - }; }; }; @@ -160,7 +168,7 @@ button@1 { label = "Power Button"; linux,code = <116>; /* KEY_POWER */ - gpios = <&gpio1 30 1>; + gpios = <&gpio1 30 0>; }; button@2 { @@ -176,4 +184,11 @@ }; }; + gpio_poweroff { + compatible = "gpio-poweroff"; + pinctrl-0 = <&poweroff>; + pinctrl-names = "default"; + gpios = <&gpio0 8 1>; + }; + }; diff --git a/arch/arm/boot/dts/armada-xp.dtsi b/arch/arm/boot/dts/armada-xp.dtsi index def125c0eeaa..3058522f5aad 100644 --- a/arch/arm/boot/dts/armada-xp.dtsi +++ b/arch/arm/boot/dts/armada-xp.dtsi @@ -70,6 +70,8 @@ timer@20300 { compatible = "marvell,armada-xp-timer"; + clocks = <&coreclk 2>, <&refclk>; + clock-names = "nbclk", "fixed"; }; coreclk: mvebu-sar@18230 { @@ -169,4 +171,13 @@ }; }; }; + + clocks { + /* 25 MHz reference crystal */ + refclk: oscillator { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <25000000>; + }; + }; }; diff --git a/arch/arm/boot/dts/at91sam9x5.dtsi b/arch/arm/boot/dts/at91sam9x5.dtsi index cf78ac0b04b1..e74dc15efa9d 100644 --- a/arch/arm/boot/dts/at91sam9x5.dtsi +++ b/arch/arm/boot/dts/at91sam9x5.dtsi @@ -190,12 +190,12 @@ AT91_PIOA 8 AT91_PERIPH_A AT91_PINCTRL_NONE>; /* PA8 periph A */ }; - pinctrl_uart2_rts: uart2_rts-0 { + pinctrl_usart2_rts: usart2_rts-0 { atmel,pins = ; /* PB0 periph B */ }; - pinctrl_uart2_cts: uart2_cts-0 { + pinctrl_usart2_cts: usart2_cts-0 { atmel,pins = ; /* PB1 periph B */ }; @@ -556,6 +556,7 @@ interrupts = <12 IRQ_TYPE_LEVEL_HIGH 0>; dmas = <&dma0 1 AT91_DMA_CFG_PER_ID(0)>; dma-names = "rxtx"; + pinctrl-names = "default"; #address-cells = <1>; #size-cells = <0>; status = "disabled"; @@ -567,6 +568,7 @@ interrupts = <26 IRQ_TYPE_LEVEL_HIGH 0>; dmas = <&dma1 1 AT91_DMA_CFG_PER_ID(0)>; dma-names = "rxtx"; + pinctrl-names = "default"; #address-cells = <1>; #size-cells = <0>; status = "disabled"; diff --git a/arch/arm/boot/dts/atlas6.dtsi b/arch/arm/boot/dts/atlas6.dtsi index 8678e0c11119..6db4f81d4795 100644 --- a/arch/arm/boot/dts/atlas6.dtsi +++ b/arch/arm/boot/dts/atlas6.dtsi @@ -181,6 +181,8 @@ interrupts = <17>; fifosize = <128>; clocks = <&clks 13>; + sirf,uart-dma-rx-channel = <21>; + sirf,uart-dma-tx-channel = <2>; }; uart1: uart@b0060000 { @@ -199,6 +201,8 @@ interrupts = <19>; fifosize = <128>; clocks = <&clks 15>; + sirf,uart-dma-rx-channel = <6>; + sirf,uart-dma-tx-channel = <7>; }; usp0: usp@b0080000 { @@ -206,7 +210,10 @@ compatible = "sirf,prima2-usp"; reg = <0xb0080000 0x10000>; interrupts = <20>; + fifosize = <128>; clocks = <&clks 28>; + sirf,usp-dma-rx-channel = <17>; + sirf,usp-dma-tx-channel = <18>; }; usp1: usp@b0090000 { @@ -214,7 +221,10 @@ compatible = "sirf,prima2-usp"; reg = <0xb0090000 0x10000>; interrupts = <21>; + fifosize = <128>; clocks = <&clks 29>; + sirf,usp-dma-rx-channel = <14>; + sirf,usp-dma-tx-channel = <15>; }; dmac0: dma-controller@b00b0000 { @@ -237,6 +247,8 @@ compatible = "sirf,prima2-vip"; reg = <0xb00C0000 0x10000>; clocks = <&clks 31>; + interrupts = <14>; + sirf,vip-dma-rx-channel = <16>; }; spi0: spi@b00d0000 { diff --git a/arch/arm/boot/dts/exynos5250.dtsi b/arch/arm/boot/dts/exynos5250.dtsi index 7d7cc777ff7b..bbac42a78ce5 100644 --- a/arch/arm/boot/dts/exynos5250.dtsi +++ b/arch/arm/boot/dts/exynos5250.dtsi @@ -96,6 +96,11 @@ <1 14 0xf08>, <1 11 0xf08>, <1 10 0xf08>; + /* Unfortunately we need this since some versions of U-Boot + * on Exynos don't set the CNTFRQ register, so we need the + * value from DT. + */ + clock-frequency = <24000000>; }; mct@101C0000 { diff --git a/arch/arm/boot/dts/kirkwood.dtsi b/arch/arm/boot/dts/kirkwood.dtsi index cf7aeaf89e9c..1335b2e1bed4 100644 --- a/arch/arm/boot/dts/kirkwood.dtsi +++ b/arch/arm/boot/dts/kirkwood.dtsi @@ -13,6 +13,7 @@ cpu@0 { device_type = "cpu"; compatible = "marvell,feroceon"; + reg = <0>; clocks = <&core_clk 1>, <&core_clk 3>, <&gate_clk 11>; clock-names = "cpu_clk", "ddrclk", "powersave"; }; @@ -167,7 +168,7 @@ xor@60900 { compatible = "marvell,orion-xor"; reg = <0x60900 0x100 - 0xd0B00 0x100>; + 0x60B00 0x100>; status = "okay"; clocks = <&gate_clk 16>; diff --git a/arch/arm/boot/dts/omap3-beagle-xm.dts b/arch/arm/boot/dts/omap3-beagle-xm.dts index 0c514dc8460c..2816bf612672 100644 --- a/arch/arm/boot/dts/omap3-beagle-xm.dts +++ b/arch/arm/boot/dts/omap3-beagle-xm.dts @@ -11,7 +11,7 @@ / { model = "TI OMAP3 BeagleBoard xM"; - compatible = "ti,omap3-beagle-xm", "ti,omap3-beagle", "ti,omap3"; + compatible = "ti,omap3-beagle-xm", "ti,omap36xx", "ti,omap3"; cpus { cpu@0 { diff --git a/arch/arm/boot/dts/omap3.dtsi b/arch/arm/boot/dts/omap3.dtsi index 7d95cda1fae4..b41bd57f4328 100644 --- a/arch/arm/boot/dts/omap3.dtsi +++ b/arch/arm/boot/dts/omap3.dtsi @@ -108,7 +108,7 @@ #address-cells = <1>; #size-cells = <0>; pinctrl-single,register-width = <16>; - pinctrl-single,function-mask = <0x7f1f>; + pinctrl-single,function-mask = <0xff1f>; }; omap3_pmx_wkup: pinmux@0x48002a00 { @@ -117,7 +117,7 @@ #address-cells = <1>; #size-cells = <0>; pinctrl-single,register-width = <16>; - pinctrl-single,function-mask = <0x7f1f>; + pinctrl-single,function-mask = <0xff1f>; }; gpio1: gpio@48310000 { diff --git a/arch/arm/boot/dts/prima2.dtsi b/arch/arm/boot/dts/prima2.dtsi index bbeb623fc2c6..27ed9f5144bc 100644 --- a/arch/arm/boot/dts/prima2.dtsi +++ b/arch/arm/boot/dts/prima2.dtsi @@ -171,7 +171,8 @@ compatible = "simple-bus"; #address-cells = <1>; #size-cells = <1>; - ranges = <0xb0000000 0xb0000000 0x180000>; + ranges = <0xb0000000 0xb0000000 0x180000>, + <0x56000000 0x56000000 0x1b00000>; timer@b0020000 { compatible = "sirf,prima2-tick"; @@ -196,25 +197,32 @@ uart0: uart@b0050000 { cell-index = <0>; compatible = "sirf,prima2-uart"; - reg = <0xb0050000 0x10000>; + reg = <0xb0050000 0x1000>; interrupts = <17>; + fifosize = <128>; clocks = <&clks 13>; + sirf,uart-dma-rx-channel = <21>; + sirf,uart-dma-tx-channel = <2>; }; uart1: uart@b0060000 { cell-index = <1>; compatible = "sirf,prima2-uart"; - reg = <0xb0060000 0x10000>; + reg = <0xb0060000 0x1000>; interrupts = <18>; + fifosize = <32>; clocks = <&clks 14>; }; uart2: uart@b0070000 { cell-index = <2>; compatible = "sirf,prima2-uart"; - reg = <0xb0070000 0x10000>; + reg = <0xb0070000 0x1000>; interrupts = <19>; + fifosize = <128>; clocks = <&clks 15>; + sirf,uart-dma-rx-channel = <6>; + sirf,uart-dma-tx-channel = <7>; }; usp0: usp@b0080000 { @@ -222,7 +230,10 @@ compatible = "sirf,prima2-usp"; reg = <0xb0080000 0x10000>; interrupts = <20>; + fifosize = <128>; clocks = <&clks 28>; + sirf,usp-dma-rx-channel = <17>; + sirf,usp-dma-tx-channel = <18>; }; usp1: usp@b0090000 { @@ -230,7 +241,10 @@ compatible = "sirf,prima2-usp"; reg = <0xb0090000 0x10000>; interrupts = <21>; + fifosize = <128>; clocks = <&clks 29>; + sirf,usp-dma-rx-channel = <14>; + sirf,usp-dma-tx-channel = <15>; }; usp2: usp@b00a0000 { @@ -238,7 +252,10 @@ compatible = "sirf,prima2-usp"; reg = <0xb00a0000 0x10000>; interrupts = <22>; + fifosize = <128>; clocks = <&clks 30>; + sirf,usp-dma-rx-channel = <10>; + sirf,usp-dma-tx-channel = <11>; }; dmac0: dma-controller@b00b0000 { @@ -261,6 +278,8 @@ compatible = "sirf,prima2-vip"; reg = <0xb00C0000 0x10000>; clocks = <&clks 31>; + interrupts = <14>; + sirf,vip-dma-rx-channel = <16>; }; spi0: spi@b00d0000 { diff --git a/arch/arm/boot/dts/r8a73a4.dtsi b/arch/arm/boot/dts/r8a73a4.dtsi index 6c26caa880f2..658fcc537576 100644 --- a/arch/arm/boot/dts/r8a73a4.dtsi +++ b/arch/arm/boot/dts/r8a73a4.dtsi @@ -193,7 +193,7 @@ }; sdhi0: sdhi@ee100000 { - compatible = "renesas,r8a73a4-sdhi"; + compatible = "renesas,sdhi-r8a73a4"; reg = <0 0xee100000 0 0x100>; interrupt-parent = <&gic>; interrupts = <0 165 4>; @@ -202,7 +202,7 @@ }; sdhi1: sdhi@ee120000 { - compatible = "renesas,r8a73a4-sdhi"; + compatible = "renesas,sdhi-r8a73a4"; reg = <0 0xee120000 0 0x100>; interrupt-parent = <&gic>; interrupts = <0 166 4>; @@ -211,7 +211,7 @@ }; sdhi2: sdhi@ee140000 { - compatible = "renesas,r8a73a4-sdhi"; + compatible = "renesas,sdhi-r8a73a4"; reg = <0 0xee140000 0 0x100>; interrupt-parent = <&gic>; interrupts = <0 167 4>; diff --git a/arch/arm/boot/dts/r8a7778.dtsi b/arch/arm/boot/dts/r8a7778.dtsi index 45ac404ab6d8..3577aba82583 100644 --- a/arch/arm/boot/dts/r8a7778.dtsi +++ b/arch/arm/boot/dts/r8a7778.dtsi @@ -96,6 +96,5 @@ pfc: pfc@fffc0000 { compatible = "renesas,pfc-r8a7778"; reg = <0xfffc000 0x118>; - #gpio-range-cells = <3>; }; }; diff --git a/arch/arm/boot/dts/r8a7779.dtsi b/arch/arm/boot/dts/r8a7779.dtsi index 23a62447359c..ebbe507fcbfa 100644 --- a/arch/arm/boot/dts/r8a7779.dtsi +++ b/arch/arm/boot/dts/r8a7779.dtsi @@ -188,7 +188,6 @@ pfc: pfc@fffc0000 { compatible = "renesas,pfc-r8a7779"; reg = <0xfffc0000 0x23c>; - #gpio-range-cells = <3>; }; thermal@ffc48000 { diff --git a/arch/arm/boot/dts/r8a7790.dtsi b/arch/arm/boot/dts/r8a7790.dtsi index 3b879e7c697c..413b4c29e782 100644 --- a/arch/arm/boot/dts/r8a7790.dtsi +++ b/arch/arm/boot/dts/r8a7790.dtsi @@ -148,11 +148,10 @@ pfc: pfc@e6060000 { compatible = "renesas,pfc-r8a7790"; reg = <0 0xe6060000 0 0x250>; - #gpio-range-cells = <3>; }; sdhi0: sdhi@ee100000 { - compatible = "renesas,r8a7790-sdhi"; + compatible = "renesas,sdhi-r8a7790"; reg = <0 0xee100000 0 0x100>; interrupt-parent = <&gic>; interrupts = <0 165 4>; @@ -161,7 +160,7 @@ }; sdhi1: sdhi@ee120000 { - compatible = "renesas,r8a7790-sdhi"; + compatible = "renesas,sdhi-r8a7790"; reg = <0 0xee120000 0 0x100>; interrupt-parent = <&gic>; interrupts = <0 166 4>; @@ -170,7 +169,7 @@ }; sdhi2: sdhi@ee140000 { - compatible = "renesas,r8a7790-sdhi"; + compatible = "renesas,sdhi-r8a7790"; reg = <0 0xee140000 0 0x100>; interrupt-parent = <&gic>; interrupts = <0 167 4>; @@ -179,7 +178,7 @@ }; sdhi3: sdhi@ee160000 { - compatible = "renesas,r8a7790-sdhi"; + compatible = "renesas,sdhi-r8a7790"; reg = <0 0xee160000 0 0x100>; interrupt-parent = <&gic>; interrupts = <0 168 4>; diff --git a/arch/arm/boot/dts/sh73a0.dtsi b/arch/arm/boot/dts/sh73a0.dtsi index ba59a5875a10..3955c7606a6f 100644 --- a/arch/arm/boot/dts/sh73a0.dtsi +++ b/arch/arm/boot/dts/sh73a0.dtsi @@ -196,7 +196,7 @@ }; sdhi0: sdhi@ee100000 { - compatible = "renesas,r8a7740-sdhi"; + compatible = "renesas,sdhi-r8a7740"; reg = <0xee100000 0x100>; interrupt-parent = <&gic>; interrupts = <0 83 4 @@ -208,7 +208,7 @@ /* SDHI1 and SDHI2 have no CD pins, no need for CD IRQ */ sdhi1: sdhi@ee120000 { - compatible = "renesas,r8a7740-sdhi"; + compatible = "renesas,sdhi-r8a7740"; reg = <0xee120000 0x100>; interrupt-parent = <&gic>; interrupts = <0 88 4 @@ -219,7 +219,7 @@ }; sdhi2: sdhi@ee140000 { - compatible = "renesas,r8a7740-sdhi"; + compatible = "renesas,sdhi-r8a7740"; reg = <0xee140000 0x100>; interrupt-parent = <&gic>; interrupts = <0 104 4 diff --git a/arch/arm/boot/dts/zynq-7000.dtsi b/arch/arm/boot/dts/zynq-7000.dtsi index e32b92b949d2..e7f73b2e4550 100644 --- a/arch/arm/boot/dts/zynq-7000.dtsi +++ b/arch/arm/boot/dts/zynq-7000.dtsi @@ -92,6 +92,14 @@ }; }; + global_timer: timer@f8f00200 { + compatible = "arm,cortex-a9-global-timer"; + reg = <0xf8f00200 0x20>; + interrupts = <1 11 0x301>; + interrupt-parent = <&intc>; + clocks = <&clkc 4>; + }; + ttc0: ttc0@f8001000 { interrupt-parent = <&intc>; interrupts = < 0 10 4 0 11 4 0 12 4 >; diff --git a/arch/arm/boot/install.sh b/arch/arm/boot/install.sh index 06ea7d42ce8e..2a45092a40e3 100644 --- a/arch/arm/boot/install.sh +++ b/arch/arm/boot/install.sh @@ -20,6 +20,20 @@ # $4 - default install path (blank if root directory) # +verify () { + if [ ! -f "$1" ]; then + echo "" 1>&2 + echo " *** Missing file: $1" 1>&2 + echo ' *** You need to run "make" before "make install".' 1>&2 + echo "" 1>&2 + exit 1 + fi +} + +# Make sure the files actually exist +verify "$2" +verify "$3" + # User may have a custom install script if [ -x ~/bin/${INSTALLKERNEL} ]; then exec ~/bin/${INSTALLKERNEL} "$@"; fi if [ -x /sbin/${INSTALLKERNEL} ]; then exec /sbin/${INSTALLKERNEL} "$@"; fi diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c index 117f955a2a06..8e1a0245907f 100644 --- a/arch/arm/common/edma.c +++ b/arch/arm/common/edma.c @@ -269,6 +269,11 @@ static const struct edmacc_param dummy_paramset = { .ccnt = 1, }; +static const struct of_device_id edma_of_ids[] = { + { .compatible = "ti,edma3", }, + {} +}; + /*****************************************************************************/ static void map_dmach_queue(unsigned ctlr, unsigned ch_no, @@ -560,14 +565,38 @@ static int reserve_contiguous_slots(int ctlr, unsigned int id, static int prepare_unused_channel_list(struct device *dev, void *data) { struct platform_device *pdev = to_platform_device(dev); - int i, ctlr; + int i, count, ctlr; + struct of_phandle_args dma_spec; + if (dev->of_node) { + count = of_property_count_strings(dev->of_node, "dma-names"); + if (count < 0) + return 0; + for (i = 0; i < count; i++) { + if (of_parse_phandle_with_args(dev->of_node, "dmas", + "#dma-cells", i, + &dma_spec)) + continue; + + if (!of_match_node(edma_of_ids, dma_spec.np)) { + of_node_put(dma_spec.np); + continue; + } + + clear_bit(EDMA_CHAN_SLOT(dma_spec.args[0]), + edma_cc[0]->edma_unused); + of_node_put(dma_spec.np); + } + return 0; + } + + /* For non-OF case */ for (i = 0; i < pdev->num_resources; i++) { if ((pdev->resource[i].flags & IORESOURCE_DMA) && (int)pdev->resource[i].start >= 0) { ctlr = EDMA_CTLR(pdev->resource[i].start); clear_bit(EDMA_CHAN_SLOT(pdev->resource[i].start), - edma_cc[ctlr]->edma_unused); + edma_cc[ctlr]->edma_unused); } } @@ -1762,11 +1791,6 @@ static int edma_probe(struct platform_device *pdev) return 0; } -static const struct of_device_id edma_of_ids[] = { - { .compatible = "ti,edma3", }, - {} -}; - static struct platform_driver edma_driver = { .driver = { .name = "edma", diff --git a/arch/arm/common/mcpm_entry.c b/arch/arm/common/mcpm_entry.c index 370236dd1a03..990250965f2c 100644 --- a/arch/arm/common/mcpm_entry.c +++ b/arch/arm/common/mcpm_entry.c @@ -51,7 +51,8 @@ void mcpm_cpu_power_down(void) { phys_reset_t phys_reset; - BUG_ON(!platform_ops); + if (WARN_ON_ONCE(!platform_ops || !platform_ops->power_down)) + return; BUG_ON(!irqs_disabled()); /* @@ -93,7 +94,8 @@ void mcpm_cpu_suspend(u64 expected_residency) { phys_reset_t phys_reset; - BUG_ON(!platform_ops); + if (WARN_ON_ONCE(!platform_ops || !platform_ops->suspend)) + return; BUG_ON(!irqs_disabled()); /* Very similar to mcpm_cpu_power_down() */ diff --git a/arch/arm/common/sharpsl_param.c b/arch/arm/common/sharpsl_param.c index d56c932580eb..025f6ce38596 100644 --- a/arch/arm/common/sharpsl_param.c +++ b/arch/arm/common/sharpsl_param.c @@ -15,6 +15,7 @@ #include #include #include +#include /* * Certain hardware parameters determined at the time of device manufacture, @@ -25,8 +26,10 @@ */ #ifdef CONFIG_ARCH_SA1100 #define PARAM_BASE 0xe8ffc000 +#define param_start(x) (void *)(x) #else #define PARAM_BASE 0xa0000a00 +#define param_start(x) __va(x) #endif #define MAGIC_CHG(a,b,c,d) ( ( d << 24 ) | ( c << 16 ) | ( b << 8 ) | a ) @@ -41,7 +44,7 @@ EXPORT_SYMBOL(sharpsl_param); void sharpsl_save_param(void) { - memcpy(&sharpsl_param, (void *)PARAM_BASE, sizeof(struct sharpsl_param_info)); + memcpy(&sharpsl_param, param_start(PARAM_BASE), sizeof(struct sharpsl_param_info)); if (sharpsl_param.comadj_keyword != COMADJ_MAGIC) sharpsl_param.comadj=-1; diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig index f3935b46df29..119fc378fc52 100644 --- a/arch/arm/configs/multi_v7_defconfig +++ b/arch/arm/configs/multi_v7_defconfig @@ -135,6 +135,7 @@ CONFIG_MMC=y CONFIG_MMC_ARMMMCI=y CONFIG_MMC_SDHCI=y CONFIG_MMC_SDHCI_PLTFM=y +CONFIG_MMC_SDHCI_ESDHC_IMX=y CONFIG_MMC_SDHCI_TEGRA=y CONFIG_MMC_SDHCI_SPEAR=y CONFIG_MMC_OMAP=y diff --git a/arch/arm/crypto/aes-armv4.S b/arch/arm/crypto/aes-armv4.S index 19d6cd6f29f9..3a14ea8fe97e 100644 --- a/arch/arm/crypto/aes-armv4.S +++ b/arch/arm/crypto/aes-armv4.S @@ -148,7 +148,7 @@ AES_Te: @ const AES_KEY *key) { .align 5 ENTRY(AES_encrypt) - sub r3,pc,#8 @ AES_encrypt + adr r3,AES_encrypt stmdb sp!,{r1,r4-r12,lr} mov r12,r0 @ inp mov r11,r2 @@ -381,7 +381,7 @@ _armv4_AES_encrypt: .align 5 ENTRY(private_AES_set_encrypt_key) _armv4_AES_set_encrypt_key: - sub r3,pc,#8 @ AES_set_encrypt_key + adr r3,_armv4_AES_set_encrypt_key teq r0,#0 moveq r0,#-1 beq .Labrt @@ -843,7 +843,7 @@ AES_Td: @ const AES_KEY *key) { .align 5 ENTRY(AES_decrypt) - sub r3,pc,#8 @ AES_decrypt + adr r3,AES_decrypt stmdb sp!,{r1,r4-r12,lr} mov r12,r0 @ inp mov r11,r2 diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild index d3db39860b9c..1a7024b41351 100644 --- a/arch/arm/include/asm/Kbuild +++ b/arch/arm/include/asm/Kbuild @@ -31,5 +31,5 @@ generic-y += termbits.h generic-y += termios.h generic-y += timex.h generic-y += trace_clock.h -generic-y += types.h generic-y += unaligned.h +generic-y += preempt.h diff --git a/arch/arm/include/asm/arch_timer.h b/arch/arm/include/asm/arch_timer.h index 5665134bfa3e..0704e0cf5571 100644 --- a/arch/arm/include/asm/arch_timer.h +++ b/arch/arm/include/asm/arch_timer.h @@ -87,17 +87,43 @@ static inline u64 arch_counter_get_cntvct(void) return cval; } -static inline void arch_counter_set_user_access(void) +static inline u32 arch_timer_get_cntkctl(void) { u32 cntkctl; - asm volatile("mrc p15, 0, %0, c14, c1, 0" : "=r" (cntkctl)); + return cntkctl; +} - /* disable user access to everything */ - cntkctl &= ~((3 << 8) | (7 << 0)); - +static inline void arch_timer_set_cntkctl(u32 cntkctl) +{ asm volatile("mcr p15, 0, %0, c14, c1, 0" : : "r" (cntkctl)); } + +static inline void arch_counter_set_user_access(void) +{ + u32 cntkctl = arch_timer_get_cntkctl(); + + /* Disable user access to both physical/virtual counters/timers */ + /* Also disable virtual event stream */ + cntkctl &= ~(ARCH_TIMER_USR_PT_ACCESS_EN + | ARCH_TIMER_USR_VT_ACCESS_EN + | ARCH_TIMER_VIRT_EVT_EN + | ARCH_TIMER_USR_VCT_ACCESS_EN + | ARCH_TIMER_USR_PCT_ACCESS_EN); + arch_timer_set_cntkctl(cntkctl); +} + +static inline void arch_timer_evtstrm_enable(int divider) +{ + u32 cntkctl = arch_timer_get_cntkctl(); + cntkctl &= ~ARCH_TIMER_EVT_TRIGGER_MASK; + /* Set the divider and enable virtual event stream */ + cntkctl |= (divider << ARCH_TIMER_EVT_TRIGGER_SHIFT) + | ARCH_TIMER_VIRT_EVT_EN; + arch_timer_set_cntkctl(cntkctl); + elf_hwcap |= HWCAP_EVTSTRM; +} + #endif #endif diff --git a/arch/arm/include/asm/jump_label.h b/arch/arm/include/asm/jump_label.h index bfc198c75913..863c892b4aaa 100644 --- a/arch/arm/include/asm/jump_label.h +++ b/arch/arm/include/asm/jump_label.h @@ -16,7 +16,7 @@ static __always_inline bool arch_static_branch(struct static_key *key) { - asm goto("1:\n\t" + asm_volatile_goto("1:\n\t" JUMP_LABEL_NOP "\n\t" ".pushsection __jump_table, \"aw\"\n\t" ".word 1b, %l[l_yes], %c0\n\t" diff --git a/arch/arm/include/asm/mcpm.h b/arch/arm/include/asm/mcpm.h index 0f7b7620e9a5..fc82a88f5b69 100644 --- a/arch/arm/include/asm/mcpm.h +++ b/arch/arm/include/asm/mcpm.h @@ -76,8 +76,11 @@ int mcpm_cpu_power_up(unsigned int cpu, unsigned int cluster); * * This must be called with interrupts disabled. * - * This does not return. Re-entry in the kernel is expected via - * mcpm_entry_point. + * On success this does not return. Re-entry in the kernel is expected + * via mcpm_entry_point. + * + * This will return if mcpm_platform_register() has not been called + * previously in which case the caller should take appropriate action. */ void mcpm_cpu_power_down(void); @@ -98,8 +101,11 @@ void mcpm_cpu_power_down(void); * * This must be called with interrupts disabled. * - * This does not return. Re-entry in the kernel is expected via - * mcpm_entry_point. + * On success this does not return. Re-entry in the kernel is expected + * via mcpm_entry_point. + * + * This will return if mcpm_platform_register() has not been called + * previously in which case the caller should take appropriate action. */ void mcpm_cpu_suspend(u64 expected_residency); diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h index f1d96d4e8092..73ddd7239b33 100644 --- a/arch/arm/include/asm/syscall.h +++ b/arch/arm/include/asm/syscall.h @@ -57,6 +57,9 @@ static inline void syscall_get_arguments(struct task_struct *task, unsigned int i, unsigned int n, unsigned long *args) { + if (n == 0) + return; + if (i + n > SYSCALL_MAX_ARGS) { unsigned long *args_bad = args + SYSCALL_MAX_ARGS - i; unsigned int n_bad = n + i - SYSCALL_MAX_ARGS; @@ -81,6 +84,9 @@ static inline void syscall_set_arguments(struct task_struct *task, unsigned int i, unsigned int n, const unsigned long *args) { + if (n == 0) + return; + if (i + n > SYSCALL_MAX_ARGS) { pr_warning("%s called with max args %d, handling only %d\n", __func__, i + n, SYSCALL_MAX_ARGS); diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h index 7e1f76027f66..72abdc541f38 100644 --- a/arch/arm/include/asm/uaccess.h +++ b/arch/arm/include/asm/uaccess.h @@ -19,6 +19,13 @@ #include #include +#if __LINUX_ARM_ARCH__ < 6 +#include +#else +#define __get_user_unaligned __get_user +#define __put_user_unaligned __put_user +#endif + #define VERIFY_READ 0 #define VERIFY_WRITE 1 diff --git a/arch/arm/include/uapi/asm/hwcap.h b/arch/arm/include/uapi/asm/hwcap.h index 6d34d080372a..7dcc10d67253 100644 --- a/arch/arm/include/uapi/asm/hwcap.h +++ b/arch/arm/include/uapi/asm/hwcap.h @@ -26,5 +26,6 @@ #define HWCAP_VFPD32 (1 << 19) /* set if VFP has 32 regs (not 16) */ #define HWCAP_IDIV (HWCAP_IDIVA | HWCAP_IDIVT) #define HWCAP_LPAE (1 << 20) +#define HWCAP_EVTSTRM (1 << 21) #endif /* _UAPI__ASMARM_HWCAP_H */ diff --git a/arch/arm/kernel/arch_timer.c b/arch/arm/kernel/arch_timer.c index 221f07b11ccb..1791f12c180b 100644 --- a/arch/arm/kernel/arch_timer.c +++ b/arch/arm/kernel/arch_timer.c @@ -11,7 +11,6 @@ #include #include #include -#include #include @@ -22,13 +21,6 @@ static unsigned long arch_timer_read_counter_long(void) return arch_timer_read_counter(); } -static u32 sched_clock_mult __read_mostly; - -static unsigned long long notrace arch_timer_sched_clock(void) -{ - return arch_timer_read_counter() * sched_clock_mult; -} - static struct delay_timer arch_delay_timer; static void __init arch_timer_delay_timer_register(void) @@ -48,11 +40,5 @@ int __init arch_timer_arch_init(void) arch_timer_delay_timer_register(); - /* Cache the sched_clock multiplier to save a divide in the hot path. */ - sched_clock_mult = NSEC_PER_SEC / arch_timer_rate; - sched_clock_func = arch_timer_sched_clock; - pr_info("sched_clock: ARM arch timer >56 bits at %ukHz, resolution %uns\n", - arch_timer_rate / 1000, sched_clock_mult); - return 0; } diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S index 74ad15d1a065..bc6bd9683ba4 100644 --- a/arch/arm/kernel/entry-common.S +++ b/arch/arm/kernel/entry-common.S @@ -442,10 +442,10 @@ local_restart: ldrcc pc, [tbl, scno, lsl #2] @ call sys_* routine add r1, sp, #S_OFF - cmp scno, #(__ARM_NR_BASE - __NR_SYSCALL_BASE) +2: cmp scno, #(__ARM_NR_BASE - __NR_SYSCALL_BASE) eor r0, scno, #__NR_SYSCALL_BASE @ put OS number back bcs arm_syscall -2: mov why, #0 @ no longer a real syscall + mov why, #0 @ no longer a real syscall b sys_ni_syscall @ not private func #if defined(CONFIG_OABI_COMPAT) || !defined(CONFIG_AEABI) diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S index de23a9beed13..39f89fbd5111 100644 --- a/arch/arm/kernel/entry-header.S +++ b/arch/arm/kernel/entry-header.S @@ -329,10 +329,10 @@ #ifdef CONFIG_CONTEXT_TRACKING .if \save stmdb sp!, {r0-r3, ip, lr} - bl user_exit + bl context_tracking_user_exit ldmia sp!, {r0-r3, ip, lr} .else - bl user_exit + bl context_tracking_user_exit .endif #endif .endm @@ -341,10 +341,10 @@ #ifdef CONFIG_CONTEXT_TRACKING .if \save stmdb sp!, {r0-r3, ip, lr} - bl user_enter + bl context_tracking_user_enter ldmia sp!, {r0-r3, ip, lr} .else - bl user_enter + bl context_tracking_user_enter .endif #endif .endm diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S index 2c7cc1e03473..476de57dcef2 100644 --- a/arch/arm/kernel/head.S +++ b/arch/arm/kernel/head.S @@ -487,7 +487,26 @@ __fixup_smp: mrc p15, 0, r0, c0, c0, 5 @ read MPIDR and r0, r0, #0xc0000000 @ multiprocessing extensions and teq r0, #0x80000000 @ not part of a uniprocessor system? - moveq pc, lr @ yes, assume SMP + bne __fixup_smp_on_up @ no, assume UP + + @ Core indicates it is SMP. Check for Aegis SOC where a single + @ Cortex-A9 CPU is present but SMP operations fault. + mov r4, #0x41000000 + orr r4, r4, #0x0000c000 + orr r4, r4, #0x00000090 + teq r3, r4 @ Check for ARM Cortex-A9 + movne pc, lr @ Not ARM Cortex-A9, + + @ If a future SoC *does* use 0x0 as the PERIPH_BASE, then the + @ below address check will need to be #ifdef'd or equivalent + @ for the Aegis platform. + mrc p15, 4, r0, c15, c0 @ get SCU base address + teq r0, #0x0 @ '0' on actual UP A9 hardware + beq __fixup_smp_on_up @ So its an A9 UP + ldr r0, [r0, #4] @ read SCU Config + and r0, r0, #0x3 @ number of CPUs + teq r0, #0x0 @ is 1? + movne pc, lr __fixup_smp_on_up: adr r0, 1f diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 0e1e2b3afa45..5d65438685d8 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -975,6 +975,7 @@ static const char *hwcap_str[] = { "idivt", "vfpd32", "lpae", + "evtstrm", NULL }; diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c index 71e08baee209..c02ba4af599f 100644 --- a/arch/arm/kvm/reset.c +++ b/arch/arm/kvm/reset.c @@ -58,14 +58,14 @@ static const struct kvm_irq_level a15_vtimer_irq = { */ int kvm_reset_vcpu(struct kvm_vcpu *vcpu) { - struct kvm_regs *cpu_reset; + struct kvm_regs *reset_regs; const struct kvm_irq_level *cpu_vtimer_irq; switch (vcpu->arch.target) { case KVM_ARM_TARGET_CORTEX_A15: if (vcpu->vcpu_id > a15_max_cpu_idx) return -EINVAL; - cpu_reset = &a15_regs_reset; + reset_regs = &a15_regs_reset; vcpu->arch.midr = read_cpuid_id(); cpu_vtimer_irq = &a15_vtimer_irq; break; @@ -74,7 +74,7 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu) } /* Reset core registers */ - memcpy(&vcpu->arch.regs, cpu_reset, sizeof(vcpu->arch.regs)); + memcpy(&vcpu->arch.regs, reset_regs, sizeof(vcpu->arch.regs)); /* Reset CP15 registers */ kvm_reset_coprocs(vcpu); diff --git a/arch/arm/mach-at91/at91rm9200_time.c b/arch/arm/mach-at91/at91rm9200_time.c index 180b3024bec3..f607deb40f4d 100644 --- a/arch/arm/mach-at91/at91rm9200_time.c +++ b/arch/arm/mach-at91/at91rm9200_time.c @@ -93,7 +93,7 @@ static irqreturn_t at91rm9200_timer_interrupt(int irq, void *dev_id) static struct irqaction at91rm9200_timer_irq = { .name = "at91_tick", - .flags = IRQF_SHARED | IRQF_DISABLED | IRQF_TIMER | IRQF_IRQPOLL, + .flags = IRQF_SHARED | IRQF_TIMER | IRQF_IRQPOLL, .handler = at91rm9200_timer_interrupt, .irq = NR_IRQS_LEGACY + AT91_ID_SYS, }; diff --git a/arch/arm/mach-at91/at91sam926x_time.c b/arch/arm/mach-at91/at91sam926x_time.c index 3a4bc2e1a65e..bb392320a0dd 100644 --- a/arch/arm/mach-at91/at91sam926x_time.c +++ b/arch/arm/mach-at91/at91sam926x_time.c @@ -171,7 +171,7 @@ static irqreturn_t at91sam926x_pit_interrupt(int irq, void *dev_id) static struct irqaction at91sam926x_pit_irq = { .name = "at91_tick", - .flags = IRQF_SHARED | IRQF_DISABLED | IRQF_TIMER | IRQF_IRQPOLL, + .flags = IRQF_SHARED | IRQF_TIMER | IRQF_IRQPOLL, .handler = at91sam926x_pit_interrupt, .irq = NR_IRQS_LEGACY + AT91_ID_SYS, }; diff --git a/arch/arm/mach-at91/at91sam9g45_reset.S b/arch/arm/mach-at91/at91sam9g45_reset.S index 721a1a34dd1d..c40c1e2ef80f 100644 --- a/arch/arm/mach-at91/at91sam9g45_reset.S +++ b/arch/arm/mach-at91/at91sam9g45_reset.S @@ -16,11 +16,17 @@ #include "at91_rstc.h" .arm +/* + * at91_ramc_base is an array void* + * init at NULL if only one DDR controler is present in or DT + */ .globl at91sam9g45_restart at91sam9g45_restart: ldr r5, =at91_ramc_base @ preload constants ldr r0, [r5] + ldr r5, [r5, #4] @ ddr1 + cmp r5, #0 ldr r4, =at91_rstc_base ldr r1, [r4] @@ -30,6 +36,8 @@ at91sam9g45_restart: .balign 32 @ align to cache line + strne r2, [r5, #AT91_DDRSDRC_RTR] @ disable DDR1 access + strne r3, [r5, #AT91_DDRSDRC_LPR] @ power down DDR1 str r2, [r0, #AT91_DDRSDRC_RTR] @ disable DDR0 access str r3, [r0, #AT91_DDRSDRC_LPR] @ power down DDR0 str r4, [r1, #AT91_RSTC_CR] @ reset processor diff --git a/arch/arm/mach-at91/at91x40_time.c b/arch/arm/mach-at91/at91x40_time.c index 2919eba41ff4..c0e637adf65d 100644 --- a/arch/arm/mach-at91/at91x40_time.c +++ b/arch/arm/mach-at91/at91x40_time.c @@ -57,7 +57,7 @@ static irqreturn_t at91x40_timer_interrupt(int irq, void *dev_id) static struct irqaction at91x40_timer_irq = { .name = "at91_tick", - .flags = IRQF_DISABLED | IRQF_TIMER, + .flags = IRQF_TIMER, .handler = at91x40_timer_interrupt }; diff --git a/arch/arm/mach-davinci/board-dm365-evm.c b/arch/arm/mach-davinci/board-dm365-evm.c index 92b7f770615a..4078ba93776b 100644 --- a/arch/arm/mach-davinci/board-dm365-evm.c +++ b/arch/arm/mach-davinci/board-dm365-evm.c @@ -176,7 +176,7 @@ static struct at24_platform_data eeprom_info = { .context = (void *)0x7f00, }; -static struct snd_platform_data dm365_evm_snd_data = { +static struct snd_platform_data dm365_evm_snd_data __maybe_unused = { .asp_chan_q = EVENTQ_3, }; diff --git a/arch/arm/mach-davinci/include/mach/serial.h b/arch/arm/mach-davinci/include/mach/serial.h index 52b8571b2e70..ce402cd21fa0 100644 --- a/arch/arm/mach-davinci/include/mach/serial.h +++ b/arch/arm/mach-davinci/include/mach/serial.h @@ -15,8 +15,6 @@ #include -#include - #define DAVINCI_UART0_BASE (IO_PHYS + 0x20000) #define DAVINCI_UART1_BASE (IO_PHYS + 0x20400) #define DAVINCI_UART2_BASE (IO_PHYS + 0x20800) @@ -39,6 +37,8 @@ #define UART_DM646X_SCR_TX_WATERMARK 0x08 #ifndef __ASSEMBLY__ +#include + extern int davinci_serial_init(struct platform_device *); #endif diff --git a/arch/arm/mach-integrator/pci_v3.h b/arch/arm/mach-integrator/pci_v3.h index 755fd29fed4a..06a9e2e7d007 100644 --- a/arch/arm/mach-integrator/pci_v3.h +++ b/arch/arm/mach-integrator/pci_v3.h @@ -1,2 +1,9 @@ /* Simple oneliner include to the PCIv3 early init */ +#ifdef CONFIG_PCI extern int pci_v3_early_init(void); +#else +static inline int pci_v3_early_init(void) +{ + return 0; +} +#endif diff --git a/arch/arm/mach-msm/timer.c b/arch/arm/mach-msm/timer.c index 696fb73296d0..1e9c3383daba 100644 --- a/arch/arm/mach-msm/timer.c +++ b/arch/arm/mach-msm/timer.c @@ -274,7 +274,6 @@ static void __init msm_dt_timer_init(struct device_node *np) pr_err("Unknown frequency\n"); return; } - of_node_put(np); event_base = base + 0x4; sts_base = base + 0x88; diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c index 4c24303ec481..58adf2fd9cfc 100644 --- a/arch/arm/mach-mvebu/coherency.c +++ b/arch/arm/mach-mvebu/coherency.c @@ -140,6 +140,7 @@ int __init coherency_init(void) coherency_base = of_iomap(np, 0); coherency_cpu_base = of_iomap(np, 1); set_cpu_coherent(cpu_logical_map(smp_processor_id()), 0); + of_node_put(np); } return 0; @@ -147,9 +148,14 @@ int __init coherency_init(void) static int __init coherency_late_init(void) { - if (of_find_matching_node(NULL, of_coherency_table)) + struct device_node *np; + + np = of_find_matching_node(NULL, of_coherency_table); + if (np) { bus_register_notifier(&platform_bus_type, &mvebu_hwcc_platform_nb); + of_node_put(np); + } return 0; } diff --git a/arch/arm/mach-mvebu/pmsu.c b/arch/arm/mach-mvebu/pmsu.c index 3cc4bef6401c..27fc4f049474 100644 --- a/arch/arm/mach-mvebu/pmsu.c +++ b/arch/arm/mach-mvebu/pmsu.c @@ -67,6 +67,7 @@ int __init armada_370_xp_pmsu_init(void) pr_info("Initializing Power Management Service Unit\n"); pmsu_mp_base = of_iomap(np, 0); pmsu_reset_base = of_iomap(np, 1); + of_node_put(np); } return 0; diff --git a/arch/arm/mach-mvebu/system-controller.c b/arch/arm/mach-mvebu/system-controller.c index f875124ff4f9..5175083cdb34 100644 --- a/arch/arm/mach-mvebu/system-controller.c +++ b/arch/arm/mach-mvebu/system-controller.c @@ -98,6 +98,7 @@ static int __init mvebu_system_controller_init(void) BUG_ON(!match); system_controller_base = of_iomap(np, 0); mvebu_sc = (struct mvebu_system_controller *)match->data; + of_node_put(np); } return 0; diff --git a/arch/arm/mach-omap2/board-generic.c b/arch/arm/mach-omap2/board-generic.c index 39c78387ddec..87162e1b94a5 100644 --- a/arch/arm/mach-omap2/board-generic.c +++ b/arch/arm/mach-omap2/board-generic.c @@ -129,6 +129,24 @@ DT_MACHINE_START(OMAP3_DT, "Generic OMAP3 (Flattened Device Tree)") .restart = omap3xxx_restart, MACHINE_END +static const char *omap36xx_boards_compat[] __initdata = { + "ti,omap36xx", + NULL, +}; + +DT_MACHINE_START(OMAP36XX_DT, "Generic OMAP36xx (Flattened Device Tree)") + .reserve = omap_reserve, + .map_io = omap3_map_io, + .init_early = omap3630_init_early, + .init_irq = omap_intc_of_init, + .handle_irq = omap3_intc_handle_irq, + .init_machine = omap_generic_init, + .init_late = omap3_init_late, + .init_time = omap3_sync32k_timer_init, + .dt_compat = omap36xx_boards_compat, + .restart = omap3xxx_restart, +MACHINE_END + static const char *omap3_gp_boards_compat[] __initdata = { "ti,omap3-beagle", "timll,omap3-devkit8000", diff --git a/arch/arm/mach-omap2/board-rx51-peripherals.c b/arch/arm/mach-omap2/board-rx51-peripherals.c index c3270c0f1fce..f6fe388af989 100644 --- a/arch/arm/mach-omap2/board-rx51-peripherals.c +++ b/arch/arm/mach-omap2/board-rx51-peripherals.c @@ -167,38 +167,47 @@ static struct lp55xx_led_config rx51_lp5523_led_config[] = { .name = "lp5523:kb1", .chan_nr = 0, .led_current = 50, + .max_current = 100, }, { .name = "lp5523:kb2", .chan_nr = 1, .led_current = 50, + .max_current = 100, }, { .name = "lp5523:kb3", .chan_nr = 2, .led_current = 50, + .max_current = 100, }, { .name = "lp5523:kb4", .chan_nr = 3, .led_current = 50, + .max_current = 100, }, { .name = "lp5523:b", .chan_nr = 4, .led_current = 50, + .max_current = 100, }, { .name = "lp5523:g", .chan_nr = 5, .led_current = 50, + .max_current = 100, }, { .name = "lp5523:r", .chan_nr = 6, .led_current = 50, + .max_current = 100, }, { .name = "lp5523:kb5", .chan_nr = 7, .led_current = 50, + .max_current = 100, }, { .name = "lp5523:kb6", .chan_nr = 8, .led_current = 50, + .max_current = 100, } }; diff --git a/arch/arm/mach-omap2/gpmc-onenand.c b/arch/arm/mach-omap2/gpmc-onenand.c index 64b5a8346982..8b6876c98ce1 100644 --- a/arch/arm/mach-omap2/gpmc-onenand.c +++ b/arch/arm/mach-omap2/gpmc-onenand.c @@ -272,9 +272,19 @@ static int omap2_onenand_setup_async(void __iomem *onenand_base) struct gpmc_timings t; int ret; - if (gpmc_onenand_data->of_node) + if (gpmc_onenand_data->of_node) { gpmc_read_settings_dt(gpmc_onenand_data->of_node, &onenand_async); + if (onenand_async.sync_read || onenand_async.sync_write) { + if (onenand_async.sync_write) + gpmc_onenand_data->flags |= + ONENAND_SYNC_READWRITE; + else + gpmc_onenand_data->flags |= ONENAND_SYNC_READ; + onenand_async.sync_read = false; + onenand_async.sync_write = false; + } + } omap2_onenand_set_async_mode(onenand_base); diff --git a/arch/arm/mach-omap2/mux.h b/arch/arm/mach-omap2/mux.h index 5d2080ef7923..16f78a990d04 100644 --- a/arch/arm/mach-omap2/mux.h +++ b/arch/arm/mach-omap2/mux.h @@ -28,7 +28,7 @@ #define OMAP_PULL_UP (1 << 4) #define OMAP_ALTELECTRICALSEL (1 << 5) -/* 34xx specific mux bit defines */ +/* omap3/4/5 specific mux bit defines */ #define OMAP_INPUT_EN (1 << 8) #define OMAP_OFF_EN (1 << 9) #define OMAP_OFFOUT_EN (1 << 10) @@ -36,8 +36,6 @@ #define OMAP_OFF_PULL_EN (1 << 12) #define OMAP_OFF_PULL_UP (1 << 13) #define OMAP_WAKEUP_EN (1 << 14) - -/* 44xx specific mux bit defines */ #define OMAP_WAKEUP_EVENT (1 << 15) /* Active pin states */ diff --git a/arch/arm/mach-omap2/timer.c b/arch/arm/mach-omap2/timer.c index fa74a0625da1..ead48fa5715e 100644 --- a/arch/arm/mach-omap2/timer.c +++ b/arch/arm/mach-omap2/timer.c @@ -628,7 +628,7 @@ void __init omap4_local_timer_init(void) #endif /* CONFIG_HAVE_ARM_TWD */ #endif /* CONFIG_ARCH_OMAP4 */ -#ifdef CONFIG_SOC_OMAP5 +#if defined(CONFIG_SOC_OMAP5) || defined(CONFIG_SOC_DRA7XX) void __init omap5_realtime_timer_init(void) { omap4_sync32k_timer_init(); @@ -636,7 +636,7 @@ void __init omap5_realtime_timer_init(void) clocksource_of_init(); } -#endif /* CONFIG_SOC_OMAP5 */ +#endif /* CONFIG_SOC_OMAP5 || CONFIG_SOC_DRA7XX */ /** * omap_timer_init - build and register timer device with an diff --git a/arch/arm/mach-shmobile/board-armadillo800eva.c b/arch/arm/mach-shmobile/board-armadillo800eva.c index 5bd1479d3deb..7f8f6076d360 100644 --- a/arch/arm/mach-shmobile/board-armadillo800eva.c +++ b/arch/arm/mach-shmobile/board-armadillo800eva.c @@ -1108,9 +1108,9 @@ static const struct pinctrl_map eva_pinctrl_map[] = { PIN_MAP_MUX_GROUP_DEFAULT("asoc-simple-card.1", "pfc-r8a7740", "fsib_mclk_in", "fsib"), /* GETHER */ - PIN_MAP_MUX_GROUP_DEFAULT("sh-eth", "pfc-r8a7740", + PIN_MAP_MUX_GROUP_DEFAULT("r8a7740-gether", "pfc-r8a7740", "gether_mii", "gether"), - PIN_MAP_MUX_GROUP_DEFAULT("sh-eth", "pfc-r8a7740", + PIN_MAP_MUX_GROUP_DEFAULT("r8a7740-gether", "pfc-r8a7740", "gether_int", "gether"), /* HDMI */ PIN_MAP_MUX_GROUP_DEFAULT("sh-mobile-hdmi", "pfc-r8a7740", diff --git a/arch/arm/mach-shmobile/board-lager.c b/arch/arm/mach-shmobile/board-lager.c index ffb6f0ac7606..5930af8d434f 100644 --- a/arch/arm/mach-shmobile/board-lager.c +++ b/arch/arm/mach-shmobile/board-lager.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -155,6 +156,30 @@ static void __init lager_add_standard_devices(void) ðer_pdata, sizeof(ether_pdata)); } +/* + * Ether LEDs on the Lager board are named LINK and ACTIVE which corresponds + * to non-default 01 setting of the Micrel KSZ8041 PHY control register 1 bits + * 14-15. We have to set them back to 01 from the default 00 value each time + * the PHY is reset. It's also important because the PHY's LED0 signal is + * connected to SoC's ETH_LINK signal and in the PHY's default mode it will + * bounce on and off after each packet, which we apparently want to avoid. + */ +static int lager_ksz8041_fixup(struct phy_device *phydev) +{ + u16 phyctrl1 = phy_read(phydev, 0x1e); + + phyctrl1 &= ~0xc000; + phyctrl1 |= 0x4000; + return phy_write(phydev, 0x1e, phyctrl1); +} + +static void __init lager_init(void) +{ + lager_add_standard_devices(); + + phy_register_fixup_for_id("r8a7790-ether-ff:01", lager_ksz8041_fixup); +} + static const char *lager_boards_compat_dt[] __initdata = { "renesas,lager", NULL, @@ -163,6 +188,6 @@ static const char *lager_boards_compat_dt[] __initdata = { DT_MACHINE_START(LAGER_DT, "lager") .init_early = r8a7790_init_delay, .init_time = r8a7790_timer_init, - .init_machine = lager_add_standard_devices, + .init_machine = lager_init, .dt_compat = lager_boards_compat_dt, MACHINE_END diff --git a/arch/arm/mach-vexpress/tc2_pm.c b/arch/arm/mach-vexpress/tc2_pm.c index 7aeb5d60e484..e6eb48192912 100644 --- a/arch/arm/mach-vexpress/tc2_pm.c +++ b/arch/arm/mach-vexpress/tc2_pm.c @@ -131,6 +131,16 @@ static void tc2_pm_down(u64 residency) } else BUG(); + /* + * If the CPU is committed to power down, make sure + * the power controller will be in charge of waking it + * up upon IRQ, ie IRQ lines are cut from GIC CPU IF + * to the CPU by disabling the GIC CPU IF to prevent wfi + * from completing execution behind power controller back + */ + if (!skip_wfi) + gic_cpu_if_down(); + if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) { arch_spin_unlock(&tc2_pm_lock); @@ -231,7 +241,6 @@ static void tc2_pm_suspend(u64 residency) cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0); cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1); ve_spc_set_resume_addr(cluster, cpu, virt_to_phys(mcpm_entry_point)); - gic_cpu_if_down(); tc2_pm_down(residency); } diff --git a/arch/arm/mach-zynq/Kconfig b/arch/arm/mach-zynq/Kconfig index 04f8a4a6e755..6b04260aa142 100644 --- a/arch/arm/mach-zynq/Kconfig +++ b/arch/arm/mach-zynq/Kconfig @@ -13,5 +13,6 @@ config ARCH_ZYNQ select HAVE_SMP select SPARSE_IRQ select CADENCE_TTC_TIMER + select ARM_GLOBAL_TIMER help Support for Xilinx Zynq ARM Cortex A9 Platform diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index f5e1a8471714..1272ed202dde 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -1232,7 +1232,8 @@ __iommu_create_mapping(struct device *dev, struct page **pages, size_t size) break; len = (j - i) << PAGE_SHIFT; - ret = iommu_map(mapping->domain, iova, phys, len, 0); + ret = iommu_map(mapping->domain, iova, phys, len, + IOMMU_READ|IOMMU_WRITE); if (ret < 0) goto fail; iova += len; @@ -1431,6 +1432,27 @@ static int arm_iommu_get_sgtable(struct device *dev, struct sg_table *sgt, GFP_KERNEL); } +static int __dma_direction_to_prot(enum dma_data_direction dir) +{ + int prot; + + switch (dir) { + case DMA_BIDIRECTIONAL: + prot = IOMMU_READ | IOMMU_WRITE; + break; + case DMA_TO_DEVICE: + prot = IOMMU_READ; + break; + case DMA_FROM_DEVICE: + prot = IOMMU_WRITE; + break; + default: + prot = 0; + } + + return prot; +} + /* * Map a part of the scatter-gather list into contiguous io address space */ @@ -1444,6 +1466,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg, int ret = 0; unsigned int count; struct scatterlist *s; + int prot; size = PAGE_ALIGN(size); *handle = DMA_ERROR_CODE; @@ -1460,7 +1483,9 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg, !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs)) __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir); - ret = iommu_map(mapping->domain, iova, phys, len, 0); + prot = __dma_direction_to_prot(dir); + + ret = iommu_map(mapping->domain, iova, phys, len, prot); if (ret < 0) goto fail; count += len >> PAGE_SHIFT; @@ -1665,19 +1690,7 @@ static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *p if (dma_addr == DMA_ERROR_CODE) return dma_addr; - switch (dir) { - case DMA_BIDIRECTIONAL: - prot = IOMMU_READ | IOMMU_WRITE; - break; - case DMA_TO_DEVICE: - prot = IOMMU_READ; - break; - case DMA_FROM_DEVICE: - prot = IOMMU_WRITE; - break; - default: - prot = 0; - } + prot = __dma_direction_to_prot(dir); ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot); if (ret < 0) diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c index febaee7ca57b..18ec4c504abf 100644 --- a/arch/arm/mm/init.c +++ b/arch/arm/mm/init.c @@ -17,7 +17,6 @@ #include #include #include -#include #include #include #include @@ -379,8 +378,6 @@ void __init arm_memblock_init(struct meminfo *mi, if (mdesc->reserve) mdesc->reserve(); - early_init_dt_scan_reserved_mem(); - /* * reserve memory for DMA contigouos allocations, * must come from DMA area inside low memory diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index c04454876bcb..35fd0eb57270 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -14,6 +14,7 @@ config ARM64 select GENERIC_IOMAP select GENERIC_IRQ_PROBE select GENERIC_IRQ_SHOW + select GENERIC_SCHED_CLOCK select GENERIC_SMP_IDLE_THREAD select GENERIC_TIME_VSYSCALL select HARDIRQS_SW_RESEND diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug index 1a6bfe954d49..835c559786bd 100644 --- a/arch/arm64/Kconfig.debug +++ b/arch/arm64/Kconfig.debug @@ -6,13 +6,6 @@ config FRAME_POINTER bool default y -config DEBUG_STACK_USAGE - bool "Enable stack utilization instrumentation" - depends on DEBUG_KERNEL - help - Enables the display of the minimum amount of free stack which each - task has ever had available in the sysrq-T output. - config EARLY_PRINTK bool "Early printk support" default y diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig index 5b3e83217b03..31c81e9b792e 100644 --- a/arch/arm64/configs/defconfig +++ b/arch/arm64/configs/defconfig @@ -42,7 +42,7 @@ CONFIG_IP_PNP_BOOTP=y # CONFIG_WIRELESS is not set CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" CONFIG_DEVTMPFS=y -# CONFIG_BLK_DEV is not set +CONFIG_BLK_DEV=y CONFIG_SCSI=y # CONFIG_SCSI_PROC_FS is not set CONFIG_BLK_DEV_SD=y @@ -72,6 +72,7 @@ CONFIG_LOGO=y # CONFIG_IOMMU_SUPPORT is not set CONFIG_EXT2_FS=y CONFIG_EXT3_FS=y +CONFIG_EXT4_FS=y # CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set # CONFIG_EXT3_FS_XATTR is not set CONFIG_FUSE_FS=y @@ -90,3 +91,5 @@ CONFIG_DEBUG_KERNEL=y CONFIG_DEBUG_INFO=y # CONFIG_FTRACE is not set CONFIG_ATOMIC64_SELFTEST=y +CONFIG_VIRTIO_MMIO=y +CONFIG_VIRTIO_BLK=y diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild index 79a642d199f2..519f89f5b6a3 100644 --- a/arch/arm64/include/asm/Kbuild +++ b/arch/arm64/include/asm/Kbuild @@ -50,3 +50,4 @@ generic-y += unaligned.h generic-y += user.h generic-y += vga.h generic-y += xor.h +generic-y += preempt.h diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h index c9f1d2816c2b..9400596a0f39 100644 --- a/arch/arm64/include/asm/arch_timer.h +++ b/arch/arm64/include/asm/arch_timer.h @@ -92,19 +92,49 @@ static inline u32 arch_timer_get_cntfrq(void) return val; } -static inline void arch_counter_set_user_access(void) +static inline u32 arch_timer_get_cntkctl(void) { u32 cntkctl; - - /* Disable user access to the timers and the physical counter. */ asm volatile("mrs %0, cntkctl_el1" : "=r" (cntkctl)); - cntkctl &= ~((3 << 8) | (1 << 0)); + return cntkctl; +} - /* Enable user access to the virtual counter and frequency. */ - cntkctl |= (1 << 1); +static inline void arch_timer_set_cntkctl(u32 cntkctl) +{ asm volatile("msr cntkctl_el1, %0" : : "r" (cntkctl)); } +static inline void arch_counter_set_user_access(void) +{ + u32 cntkctl = arch_timer_get_cntkctl(); + + /* Disable user access to the timers and the physical counter */ + /* Also disable virtual event stream */ + cntkctl &= ~(ARCH_TIMER_USR_PT_ACCESS_EN + | ARCH_TIMER_USR_VT_ACCESS_EN + | ARCH_TIMER_VIRT_EVT_EN + | ARCH_TIMER_USR_PCT_ACCESS_EN); + + /* Enable user access to the virtual counter */ + cntkctl |= ARCH_TIMER_USR_VCT_ACCESS_EN; + + arch_timer_set_cntkctl(cntkctl); +} + +static inline void arch_timer_evtstrm_enable(int divider) +{ + u32 cntkctl = arch_timer_get_cntkctl(); + cntkctl &= ~ARCH_TIMER_EVT_TRIGGER_MASK; + /* Set the divider and enable virtual event stream */ + cntkctl |= (divider << ARCH_TIMER_EVT_TRIGGER_SHIFT) + | ARCH_TIMER_VIRT_EVT_EN; + arch_timer_set_cntkctl(cntkctl); + elf_hwcap |= HWCAP_EVTSTRM; +#ifdef CONFIG_COMPAT + compat_elf_hwcap |= COMPAT_HWCAP_EVTSTRM; +#endif +} + static inline u64 arch_counter_get_cntvct(void) { u64 cval; diff --git a/arch/arm64/include/asm/hwcap.h b/arch/arm64/include/asm/hwcap.h index e2950b098e76..6cddbb0c9f54 100644 --- a/arch/arm64/include/asm/hwcap.h +++ b/arch/arm64/include/asm/hwcap.h @@ -30,6 +30,7 @@ #define COMPAT_HWCAP_IDIVA (1 << 17) #define COMPAT_HWCAP_IDIVT (1 << 18) #define COMPAT_HWCAP_IDIV (COMPAT_HWCAP_IDIVA|COMPAT_HWCAP_IDIVT) +#define COMPAT_HWCAP_EVTSTRM (1 << 21) #ifndef __ASSEMBLY__ /* @@ -37,11 +38,11 @@ * instruction set this cpu supports. */ #define ELF_HWCAP (elf_hwcap) -#define COMPAT_ELF_HWCAP (COMPAT_HWCAP_HALF|COMPAT_HWCAP_THUMB|\ - COMPAT_HWCAP_FAST_MULT|COMPAT_HWCAP_EDSP|\ - COMPAT_HWCAP_TLS|COMPAT_HWCAP_VFP|\ - COMPAT_HWCAP_VFPv3|COMPAT_HWCAP_VFPv4|\ - COMPAT_HWCAP_NEON|COMPAT_HWCAP_IDIV) + +#ifdef CONFIG_COMPAT +#define COMPAT_ELF_HWCAP (compat_elf_hwcap) +extern unsigned int compat_elf_hwcap; +#endif extern unsigned long elf_hwcap; #endif diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h index edb3d5c73a32..7ecc2b23882e 100644 --- a/arch/arm64/include/asm/uaccess.h +++ b/arch/arm64/include/asm/uaccess.h @@ -166,9 +166,10 @@ do { \ #define get_user(x, ptr) \ ({ \ + __typeof__(*(ptr)) __user *__p = (ptr); \ might_fault(); \ - access_ok(VERIFY_READ, (ptr), sizeof(*(ptr))) ? \ - __get_user((x), (ptr)) : \ + access_ok(VERIFY_READ, __p, sizeof(*__p)) ? \ + __get_user((x), __p) : \ ((x) = 0, -EFAULT); \ }) @@ -227,9 +228,10 @@ do { \ #define put_user(x, ptr) \ ({ \ + __typeof__(*(ptr)) __user *__p = (ptr); \ might_fault(); \ - access_ok(VERIFY_WRITE, (ptr), sizeof(*(ptr))) ? \ - __put_user((x), (ptr)) : \ + access_ok(VERIFY_WRITE, __p, sizeof(*__p)) ? \ + __put_user((x), __p) : \ -EFAULT; \ }) diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h index eea497578b87..9b12476e9c85 100644 --- a/arch/arm64/include/uapi/asm/hwcap.h +++ b/arch/arm64/include/uapi/asm/hwcap.h @@ -21,6 +21,7 @@ */ #define HWCAP_FP (1 << 0) #define HWCAP_ASIMD (1 << 1) +#define HWCAP_EVTSTRM (1 << 2) #endif /* _UAPI__ASM_HWCAP_H */ diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 1f2e4d5a5c0f..bb785d23dbde 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -80,8 +80,10 @@ void fpsimd_thread_switch(struct task_struct *next) void fpsimd_flush_thread(void) { + preempt_disable(); memset(¤t->thread.fpsimd_state, 0, sizeof(struct fpsimd_state)); fpsimd_load_state(¤t->thread.fpsimd_state); + preempt_enable(); } #ifdef CONFIG_KERNEL_MODE_NEON diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 055cfb80e05c..d355b7b9710b 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -60,6 +60,16 @@ EXPORT_SYMBOL(processor_id); unsigned long elf_hwcap __read_mostly; EXPORT_SYMBOL_GPL(elf_hwcap); +#ifdef CONFIG_COMPAT +#define COMPAT_ELF_HWCAP_DEFAULT \ + (COMPAT_HWCAP_HALF|COMPAT_HWCAP_THUMB|\ + COMPAT_HWCAP_FAST_MULT|COMPAT_HWCAP_EDSP|\ + COMPAT_HWCAP_TLS|COMPAT_HWCAP_VFP|\ + COMPAT_HWCAP_VFPv3|COMPAT_HWCAP_VFPv4|\ + COMPAT_HWCAP_NEON|COMPAT_HWCAP_IDIV) +unsigned int compat_elf_hwcap __read_mostly = COMPAT_ELF_HWCAP_DEFAULT; +#endif + static const char *cpu_name; static const char *machine_name; phys_addr_t __fdt_pointer __initdata; @@ -304,6 +314,7 @@ subsys_initcall(topology_init); static const char *hwcap_str[] = { "fp", "asimd", + "evtstrm", NULL }; diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c index 03dc3718eb13..29c39d5d77e3 100644 --- a/arch/arm64/kernel/time.c +++ b/arch/arm64/kernel/time.c @@ -61,13 +61,6 @@ unsigned long profile_pc(struct pt_regs *regs) EXPORT_SYMBOL(profile_pc); #endif -static u64 sched_clock_mult __read_mostly; - -unsigned long long notrace sched_clock(void) -{ - return arch_timer_read_counter() * sched_clock_mult; -} - void __init time_init(void) { u32 arch_timer_rate; @@ -78,9 +71,6 @@ void __init time_init(void) if (!arch_timer_rate) panic("Unable to initialise architected timer.\n"); - /* Cache the sched_clock multiplier to save a divide in the hot path. */ - sched_clock_mult = NSEC_PER_SEC / arch_timer_rate; - /* Calibrate the delay loop directly */ lpj_fine = arch_timer_rate / HZ; } diff --git a/arch/arm64/mm/tlb.S b/arch/arm64/mm/tlb.S index 8ae80a18e8ec..19da91e0cd27 100644 --- a/arch/arm64/mm/tlb.S +++ b/arch/arm64/mm/tlb.S @@ -35,7 +35,7 @@ */ ENTRY(__cpu_flush_user_tlb_range) vma_vm_mm x3, x2 // get vma->vm_mm - mmid x3, x3 // get vm_mm->context.id + mmid w3, x3 // get vm_mm->context.id dsb sy lsr x0, x0, #12 // align address lsr x1, x1, #12 diff --git a/arch/avr32/include/asm/Kbuild b/arch/avr32/include/asm/Kbuild index d22af851f3f6..658001b52400 100644 --- a/arch/avr32/include/asm/Kbuild +++ b/arch/avr32/include/asm/Kbuild @@ -1,5 +1,20 @@ generic-y += clkdev.h +generic-y += cputime.h +generic-y += delay.h +generic-y += device.h +generic-y += div64.h +generic-y += emergency-restart.h generic-y += exec.h -generic-y += trace_clock.h +generic-y += futex.h +generic-y += preempt.h +generic-y += irq_regs.h generic-y += param.h +generic-y += local.h +generic-y += local64.h +generic-y += percpu.h +generic-y += scatterlist.h +generic-y += sections.h +generic-y += topology.h +generic-y += trace_clock.h +generic-y += xor.h diff --git a/arch/avr32/include/asm/cputime.h b/arch/avr32/include/asm/cputime.h deleted file mode 100644 index e87e0f81cbeb..000000000000 --- a/arch/avr32/include/asm/cputime.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef __ASM_AVR32_CPUTIME_H -#define __ASM_AVR32_CPUTIME_H - -#include - -#endif /* __ASM_AVR32_CPUTIME_H */ diff --git a/arch/avr32/include/asm/delay.h b/arch/avr32/include/asm/delay.h deleted file mode 100644 index 9670e127b7b2..000000000000 --- a/arch/avr32/include/asm/delay.h +++ /dev/null @@ -1 +0,0 @@ -#include diff --git a/arch/avr32/include/asm/device.h b/arch/avr32/include/asm/device.h deleted file mode 100644 index d8f9872b0e2d..000000000000 --- a/arch/avr32/include/asm/device.h +++ /dev/null @@ -1,7 +0,0 @@ -/* - * Arch specific extensions to struct device - * - * This file is released under the GPLv2 - */ -#include - diff --git a/arch/avr32/include/asm/div64.h b/arch/avr32/include/asm/div64.h deleted file mode 100644 index d7ddd4fdeca6..000000000000 --- a/arch/avr32/include/asm/div64.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef __ASM_AVR32_DIV64_H -#define __ASM_AVR32_DIV64_H - -#include - -#endif /* __ASM_AVR32_DIV64_H */ diff --git a/arch/avr32/include/asm/emergency-restart.h b/arch/avr32/include/asm/emergency-restart.h deleted file mode 100644 index 3e7e014776ba..000000000000 --- a/arch/avr32/include/asm/emergency-restart.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef __ASM_AVR32_EMERGENCY_RESTART_H -#define __ASM_AVR32_EMERGENCY_RESTART_H - -#include - -#endif /* __ASM_AVR32_EMERGENCY_RESTART_H */ diff --git a/arch/avr32/include/asm/futex.h b/arch/avr32/include/asm/futex.h deleted file mode 100644 index 10419f14a68a..000000000000 --- a/arch/avr32/include/asm/futex.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef __ASM_AVR32_FUTEX_H -#define __ASM_AVR32_FUTEX_H - -#include - -#endif /* __ASM_AVR32_FUTEX_H */ diff --git a/arch/avr32/include/asm/irq_regs.h b/arch/avr32/include/asm/irq_regs.h deleted file mode 100644 index 3dd9c0b70270..000000000000 --- a/arch/avr32/include/asm/irq_regs.h +++ /dev/null @@ -1 +0,0 @@ -#include diff --git a/arch/avr32/include/asm/local.h b/arch/avr32/include/asm/local.h deleted file mode 100644 index 1c1619694da3..000000000000 --- a/arch/avr32/include/asm/local.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef __ASM_AVR32_LOCAL_H -#define __ASM_AVR32_LOCAL_H - -#include - -#endif /* __ASM_AVR32_LOCAL_H */ diff --git a/arch/avr32/include/asm/local64.h b/arch/avr32/include/asm/local64.h deleted file mode 100644 index 36c93b5cc239..000000000000 --- a/arch/avr32/include/asm/local64.h +++ /dev/null @@ -1 +0,0 @@ -#include diff --git a/arch/avr32/include/asm/percpu.h b/arch/avr32/include/asm/percpu.h deleted file mode 100644 index 69227b4cd0d4..000000000000 --- a/arch/avr32/include/asm/percpu.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef __ASM_AVR32_PERCPU_H -#define __ASM_AVR32_PERCPU_H - -#include - -#endif /* __ASM_AVR32_PERCPU_H */ diff --git a/arch/avr32/include/asm/scatterlist.h b/arch/avr32/include/asm/scatterlist.h deleted file mode 100644 index a5902d9834e8..000000000000 --- a/arch/avr32/include/asm/scatterlist.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef __ASM_AVR32_SCATTERLIST_H -#define __ASM_AVR32_SCATTERLIST_H - -#include - -#endif /* __ASM_AVR32_SCATTERLIST_H */ diff --git a/arch/avr32/include/asm/sections.h b/arch/avr32/include/asm/sections.h deleted file mode 100644 index aa14252e4181..000000000000 --- a/arch/avr32/include/asm/sections.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef __ASM_AVR32_SECTIONS_H -#define __ASM_AVR32_SECTIONS_H - -#include - -#endif /* __ASM_AVR32_SECTIONS_H */ diff --git a/arch/avr32/include/asm/topology.h b/arch/avr32/include/asm/topology.h deleted file mode 100644 index 5b766cbb4806..000000000000 --- a/arch/avr32/include/asm/topology.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef __ASM_AVR32_TOPOLOGY_H -#define __ASM_AVR32_TOPOLOGY_H - -#include - -#endif /* __ASM_AVR32_TOPOLOGY_H */ diff --git a/arch/avr32/include/asm/xor.h b/arch/avr32/include/asm/xor.h deleted file mode 100644 index 99c87aa0af4f..000000000000 --- a/arch/avr32/include/asm/xor.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef _ASM_XOR_H -#define _ASM_XOR_H - -#include - -#endif diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c index c2731003edef..42a53e740a7e 100644 --- a/arch/avr32/kernel/process.c +++ b/arch/avr32/kernel/process.c @@ -289,7 +289,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, memset(childregs, 0, sizeof(struct pt_regs)); p->thread.cpu_context.r0 = arg; p->thread.cpu_context.r1 = usp; /* fn */ - p->thread.cpu_context.r2 = syscall_return; + p->thread.cpu_context.r2 = (unsigned long)syscall_return; p->thread.cpu_context.pc = (unsigned long)ret_from_kernel_thread; childregs->sr = MODE_SUPERVISOR; } else { diff --git a/arch/avr32/kernel/time.c b/arch/avr32/kernel/time.c index 869a1c6ffeee..12f828ad5058 100644 --- a/arch/avr32/kernel/time.c +++ b/arch/avr32/kernel/time.c @@ -98,7 +98,14 @@ static void comparator_mode(enum clock_event_mode mode, case CLOCK_EVT_MODE_SHUTDOWN: sysreg_write(COMPARE, 0); pr_debug("%s: stop\n", evdev->name); - cpu_idle_poll_ctrl(false); + if (evdev->mode == CLOCK_EVT_MODE_ONESHOT || + evdev->mode == CLOCK_EVT_MODE_RESUME) { + /* + * Only disable idle poll if we have forced that + * in a previous call. + */ + cpu_idle_poll_ctrl(false); + } break; default: BUG(); diff --git a/arch/blackfin/include/asm/Kbuild b/arch/blackfin/include/asm/Kbuild index 127826f8a375..f2b43474b0e2 100644 --- a/arch/blackfin/include/asm/Kbuild +++ b/arch/blackfin/include/asm/Kbuild @@ -44,3 +44,4 @@ generic-y += ucontext.h generic-y += unaligned.h generic-y += user.h generic-y += xor.h +generic-y += preempt.h diff --git a/arch/c6x/include/asm/Kbuild b/arch/c6x/include/asm/Kbuild index e49f918531ad..fc0b3c356027 100644 --- a/arch/c6x/include/asm/Kbuild +++ b/arch/c6x/include/asm/Kbuild @@ -56,3 +56,4 @@ generic-y += ucontext.h generic-y += user.h generic-y += vga.h generic-y += xor.h +generic-y += preempt.h diff --git a/arch/cris/include/asm/Kbuild b/arch/cris/include/asm/Kbuild index c8325455520e..b06caf649a95 100644 --- a/arch/cris/include/asm/Kbuild +++ b/arch/cris/include/asm/Kbuild @@ -11,3 +11,4 @@ generic-y += module.h generic-y += trace_clock.h generic-y += vga.h generic-y += xor.h +generic-y += preempt.h diff --git a/arch/frv/include/asm/Kbuild b/arch/frv/include/asm/Kbuild index c5d767028306..74742dc6a3da 100644 --- a/arch/frv/include/asm/Kbuild +++ b/arch/frv/include/asm/Kbuild @@ -2,3 +2,4 @@ generic-y += clkdev.h generic-y += exec.h generic-y += trace_clock.h +generic-y += preempt.h diff --git a/arch/h8300/include/asm/Kbuild b/arch/h8300/include/asm/Kbuild index 8ada3cf0c98d..7e0e7213a481 100644 --- a/arch/h8300/include/asm/Kbuild +++ b/arch/h8300/include/asm/Kbuild @@ -6,3 +6,4 @@ generic-y += mmu.h generic-y += module.h generic-y += trace_clock.h generic-y += xor.h +generic-y += preempt.h diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild index 1da17caac23c..67c3450309b7 100644 --- a/arch/hexagon/include/asm/Kbuild +++ b/arch/hexagon/include/asm/Kbuild @@ -53,3 +53,4 @@ generic-y += types.h generic-y += ucontext.h generic-y += unaligned.h generic-y += xor.h +generic-y += preempt.h diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild index a3456f34f672..f93ee087e8fe 100644 --- a/arch/ia64/include/asm/Kbuild +++ b/arch/ia64/include/asm/Kbuild @@ -3,4 +3,5 @@ generic-y += clkdev.h generic-y += exec.h generic-y += kvm_para.h generic-y += trace_clock.h +generic-y += preempt.h generic-y += vtime.h \ No newline at end of file diff --git a/arch/ia64/include/asm/io.h b/arch/ia64/include/asm/io.h index 74a7cc3293bc..0d2bcb37ec35 100644 --- a/arch/ia64/include/asm/io.h +++ b/arch/ia64/include/asm/io.h @@ -424,6 +424,7 @@ extern void __iomem * ioremap(unsigned long offset, unsigned long size); extern void __iomem * ioremap_nocache (unsigned long offset, unsigned long size); extern void iounmap (volatile void __iomem *addr); extern void __iomem * early_ioremap (unsigned long phys_addr, unsigned long size); +#define early_memremap(phys_addr, size) early_ioremap(phys_addr, size) extern void early_iounmap (volatile void __iomem *addr, unsigned long size); static inline void __iomem * ioremap_cache (unsigned long phys_addr, unsigned long size) { diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c index 51bce594eb83..da5b462e6de6 100644 --- a/arch/ia64/kernel/efi.c +++ b/arch/ia64/kernel/efi.c @@ -44,10 +44,15 @@ #define EFI_DEBUG 0 +static __initdata unsigned long palo_phys; + +static __initdata efi_config_table_type_t arch_tables[] = { + {PROCESSOR_ABSTRACTION_LAYER_OVERWRITE_GUID, "PALO", &palo_phys}, + {NULL_GUID, NULL, 0}, +}; + extern efi_status_t efi_call_phys (void *, ...); -struct efi efi; -EXPORT_SYMBOL(efi); static efi_runtime_services_t *runtime; static u64 mem_limit = ~0UL, max_addr = ~0UL, min_addr = 0UL; @@ -423,9 +428,9 @@ static u8 __init palo_checksum(u8 *buffer, u32 length) * Parse and handle PALO table which is published at: * http://www.dig64.org/home/DIG64_PALO_R1_0.pdf */ -static void __init handle_palo(unsigned long palo_phys) +static void __init handle_palo(unsigned long phys_addr) { - struct palo_table *palo = __va(palo_phys); + struct palo_table *palo = __va(phys_addr); u8 checksum; if (strncmp(palo->signature, PALO_SIG, sizeof(PALO_SIG) - 1)) { @@ -467,12 +472,10 @@ void __init efi_init (void) { void *efi_map_start, *efi_map_end; - efi_config_table_t *config_tables; efi_char16_t *c16; u64 efi_desc_size; char *cp, vendor[100] = "unknown"; int i; - unsigned long palo_phys; /* * It's too early to be able to use the standard kernel command line @@ -514,8 +517,6 @@ efi_init (void) efi.systab->hdr.revision >> 16, efi.systab->hdr.revision & 0xffff); - config_tables = __va(efi.systab->tables); - /* Show what we know for posterity */ c16 = __va(efi.systab->fw_vendor); if (c16) { @@ -528,43 +529,10 @@ efi_init (void) efi.systab->hdr.revision >> 16, efi.systab->hdr.revision & 0xffff, vendor); - efi.mps = EFI_INVALID_TABLE_ADDR; - efi.acpi = EFI_INVALID_TABLE_ADDR; - efi.acpi20 = EFI_INVALID_TABLE_ADDR; - efi.smbios = EFI_INVALID_TABLE_ADDR; - efi.sal_systab = EFI_INVALID_TABLE_ADDR; - efi.boot_info = EFI_INVALID_TABLE_ADDR; - efi.hcdp = EFI_INVALID_TABLE_ADDR; - efi.uga = EFI_INVALID_TABLE_ADDR; - palo_phys = EFI_INVALID_TABLE_ADDR; - for (i = 0; i < (int) efi.systab->nr_tables; i++) { - if (efi_guidcmp(config_tables[i].guid, MPS_TABLE_GUID) == 0) { - efi.mps = config_tables[i].table; - printk(" MPS=0x%lx", config_tables[i].table); - } else if (efi_guidcmp(config_tables[i].guid, ACPI_20_TABLE_GUID) == 0) { - efi.acpi20 = config_tables[i].table; - printk(" ACPI 2.0=0x%lx", config_tables[i].table); - } else if (efi_guidcmp(config_tables[i].guid, ACPI_TABLE_GUID) == 0) { - efi.acpi = config_tables[i].table; - printk(" ACPI=0x%lx", config_tables[i].table); - } else if (efi_guidcmp(config_tables[i].guid, SMBIOS_TABLE_GUID) == 0) { - efi.smbios = config_tables[i].table; - printk(" SMBIOS=0x%lx", config_tables[i].table); - } else if (efi_guidcmp(config_tables[i].guid, SAL_SYSTEM_TABLE_GUID) == 0) { - efi.sal_systab = config_tables[i].table; - printk(" SALsystab=0x%lx", config_tables[i].table); - } else if (efi_guidcmp(config_tables[i].guid, HCDP_TABLE_GUID) == 0) { - efi.hcdp = config_tables[i].table; - printk(" HCDP=0x%lx", config_tables[i].table); - } else if (efi_guidcmp(config_tables[i].guid, - PROCESSOR_ABSTRACTION_LAYER_OVERWRITE_GUID) == 0) { - palo_phys = config_tables[i].table; - printk(" PALO=0x%lx", config_tables[i].table); - } - } - printk("\n"); + if (efi_config_init(arch_tables) != 0) + return; if (palo_phys != EFI_INVALID_TABLE_ADDR) handle_palo(palo_phys); diff --git a/arch/m32r/include/asm/Kbuild b/arch/m32r/include/asm/Kbuild index bebdc36ebb0a..2b58c5f0bc38 100644 --- a/arch/m32r/include/asm/Kbuild +++ b/arch/m32r/include/asm/Kbuild @@ -3,3 +3,4 @@ generic-y += clkdev.h generic-y += exec.h generic-y += module.h generic-y += trace_clock.h +generic-y += preempt.h diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild index 09d77a862da3..a5d27f272a59 100644 --- a/arch/m68k/include/asm/Kbuild +++ b/arch/m68k/include/asm/Kbuild @@ -31,3 +31,4 @@ generic-y += trace_clock.h generic-y += types.h generic-y += word-at-a-time.h generic-y += xor.h +generic-y += preempt.h diff --git a/arch/metag/include/asm/Kbuild b/arch/metag/include/asm/Kbuild index 6ae0ccb632cb..84d0c1d6b9b3 100644 --- a/arch/metag/include/asm/Kbuild +++ b/arch/metag/include/asm/Kbuild @@ -52,3 +52,4 @@ generic-y += unaligned.h generic-y += user.h generic-y += vga.h generic-y += xor.h +generic-y += preempt.h diff --git a/arch/metag/include/asm/topology.h b/arch/metag/include/asm/topology.h index 23f5118f58db..8e9c0b3b9691 100644 --- a/arch/metag/include/asm/topology.h +++ b/arch/metag/include/asm/topology.h @@ -26,6 +26,8 @@ .last_balance = jiffies, \ .balance_interval = 1, \ .nr_balance_failed = 0, \ + .max_newidle_lb_cost = 0, \ + .next_decay_max_lb_cost = jiffies, \ } #define cpu_to_node(cpu) ((void)(cpu), 0) diff --git a/arch/metag/kernel/irq.c b/arch/metag/kernel/irq.c index 2a2c9d55187e..3b4b7f6c0950 100644 --- a/arch/metag/kernel/irq.c +++ b/arch/metag/kernel/irq.c @@ -159,44 +159,30 @@ void irq_ctx_exit(int cpu) extern asmlinkage void __do_softirq(void); -asmlinkage void do_softirq(void) +void do_softirq_own_stack(void) { - unsigned long flags; struct thread_info *curctx; union irq_ctx *irqctx; u32 *isp; - if (in_interrupt()) - return; - - local_irq_save(flags); - - if (local_softirq_pending()) { - curctx = current_thread_info(); - irqctx = softirq_ctx[smp_processor_id()]; - irqctx->tinfo.task = curctx->task; - - /* build the stack frame on the softirq stack */ - isp = (u32 *) ((char *)irqctx + sizeof(struct thread_info)); - - asm volatile ( - "MOV D0.5,%0\n" - "SWAP A0StP,D0.5\n" - "CALLR D1RtP,___do_softirq\n" - "MOV A0StP,D0.5\n" - : - : "r" (isp) - : "memory", "cc", "D1Ar1", "D0Ar2", "D1Ar3", "D0Ar4", - "D1Ar5", "D0Ar6", "D0Re0", "D1Re0", "D0.4", "D1RtP", - "D0.5" - ); - /* - * Shouldn't happen, we returned above if in_interrupt(): - */ - WARN_ON_ONCE(softirq_count()); - } - - local_irq_restore(flags); + curctx = current_thread_info(); + irqctx = softirq_ctx[smp_processor_id()]; + irqctx->tinfo.task = curctx->task; + + /* build the stack frame on the softirq stack */ + isp = (u32 *) ((char *)irqctx + sizeof(struct thread_info)); + + asm volatile ( + "MOV D0.5,%0\n" + "SWAP A0StP,D0.5\n" + "CALLR D1RtP,___do_softirq\n" + "MOV A0StP,D0.5\n" + : + : "r" (isp) + : "memory", "cc", "D1Ar1", "D0Ar2", "D1Ar3", "D0Ar4", + "D1Ar5", "D0Ar6", "D0Re0", "D1Re0", "D0.4", "D1RtP", + "D0.5" + ); } #endif diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild index d3c51a6a601d..ce0bbf8f5640 100644 --- a/arch/microblaze/include/asm/Kbuild +++ b/arch/microblaze/include/asm/Kbuild @@ -3,3 +3,4 @@ generic-y += clkdev.h generic-y += exec.h generic-y += trace_clock.h generic-y += syscalls.h +generic-y += preempt.h diff --git a/arch/mips/alchemy/board-mtx1.c b/arch/mips/alchemy/board-mtx1.c index 4a9baa9f6330..9969dbab19e3 100644 --- a/arch/mips/alchemy/board-mtx1.c +++ b/arch/mips/alchemy/board-mtx1.c @@ -276,7 +276,7 @@ static struct platform_device mtx1_pci_host = { .resource = alchemy_pci_host_res, }; -static struct __initdata platform_device * mtx1_devs[] = { +static struct platform_device *mtx1_devs[] __initdata = { &mtx1_pci_host, &mtx1_gpio_leds, &mtx1_wdt, diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild index 454ddf9bb76f..1acbb8b77a71 100644 --- a/arch/mips/include/asm/Kbuild +++ b/arch/mips/include/asm/Kbuild @@ -11,5 +11,6 @@ generic-y += sections.h generic-y += segment.h generic-y += serial.h generic-y += trace_clock.h +generic-y += preempt.h generic-y += ucontext.h generic-y += xor.h diff --git a/arch/mips/include/asm/cpu-features.h b/arch/mips/include/asm/cpu-features.h index 51680d15ca8e..d445d060e346 100644 --- a/arch/mips/include/asm/cpu-features.h +++ b/arch/mips/include/asm/cpu-features.h @@ -187,7 +187,7 @@ /* * MIPS32, MIPS64, VR5500, IDT32332, IDT32334 and maybe a few other - * pre-MIPS32/MIPS53 processors have CLO, CLZ. The IDT RC64574 is 64-bit and + * pre-MIPS32/MIPS64 processors have CLO, CLZ. The IDT RC64574 is 64-bit and * has CLO and CLZ but not DCLO nor DCLZ. For 64-bit kernels * cpu_has_clo_clz also indicates the availability of DCLO and DCLZ. */ diff --git a/arch/mips/include/asm/jump_label.h b/arch/mips/include/asm/jump_label.h index 4d6d77ed9b9d..e194f957ca8c 100644 --- a/arch/mips/include/asm/jump_label.h +++ b/arch/mips/include/asm/jump_label.h @@ -22,7 +22,7 @@ static __always_inline bool arch_static_branch(struct static_key *key) { - asm goto("1:\tnop\n\t" + asm_volatile_goto("1:\tnop\n\t" "nop\n\t" ".pushsection __jump_table, \"aw\"\n\t" WORD_INSN " 1b, %l[l_yes], %0\n\t" diff --git a/arch/mips/kernel/octeon_switch.S b/arch/mips/kernel/octeon_switch.S index 4204d76af854..029e002a4ea0 100644 --- a/arch/mips/kernel/octeon_switch.S +++ b/arch/mips/kernel/octeon_switch.S @@ -73,7 +73,7 @@ 3: #if defined(CONFIG_CC_STACKPROTECTOR) && !defined(CONFIG_SMP) - PTR_L t8, __stack_chk_guard + PTR_LA t8, __stack_chk_guard LONG_L t9, TASK_STACK_CANARY(a1) LONG_S t9, 0(t8) #endif diff --git a/arch/mips/kernel/r2300_switch.S b/arch/mips/kernel/r2300_switch.S index 38af83f84c4a..20b7b040e76f 100644 --- a/arch/mips/kernel/r2300_switch.S +++ b/arch/mips/kernel/r2300_switch.S @@ -67,7 +67,7 @@ LEAF(resume) 1: #if defined(CONFIG_CC_STACKPROTECTOR) && !defined(CONFIG_SMP) - PTR_L t8, __stack_chk_guard + PTR_LA t8, __stack_chk_guard LONG_L t9, TASK_STACK_CANARY(a1) LONG_S t9, 0(t8) #endif diff --git a/arch/mips/kernel/r4k_switch.S b/arch/mips/kernel/r4k_switch.S index 921238a6bd26..078de5eaca8f 100644 --- a/arch/mips/kernel/r4k_switch.S +++ b/arch/mips/kernel/r4k_switch.S @@ -69,7 +69,7 @@ 1: #if defined(CONFIG_CC_STACKPROTECTOR) && !defined(CONFIG_SMP) - PTR_L t8, __stack_chk_guard + PTR_LA t8, __stack_chk_guard LONG_L t9, TASK_STACK_CANARY(a1) LONG_S t9, 0(t8) #endif diff --git a/arch/mips/kernel/rtlx.c b/arch/mips/kernel/rtlx.c index d763f11e35e2..2c12ea1668d1 100644 --- a/arch/mips/kernel/rtlx.c +++ b/arch/mips/kernel/rtlx.c @@ -172,8 +172,9 @@ int rtlx_open(int index, int can_sleep) if (rtlx == NULL) { if( (p = vpe_get_shared(tclimit)) == NULL) { if (can_sleep) { - __wait_event_interruptible(channel_wqs[index].lx_queue, - (p = vpe_get_shared(tclimit)), ret); + ret = __wait_event_interruptible( + channel_wqs[index].lx_queue, + (p = vpe_get_shared(tclimit))); if (ret) goto out_fail; } else { @@ -263,11 +264,10 @@ unsigned int rtlx_read_poll(int index, int can_sleep) /* data available to read? */ if (chan->lx_read == chan->lx_write) { if (can_sleep) { - int ret = 0; - - __wait_event_interruptible(channel_wqs[index].lx_queue, + int ret = __wait_event_interruptible( + channel_wqs[index].lx_queue, (chan->lx_read != chan->lx_write) || - sp_stopping, ret); + sp_stopping); if (ret) return ret; @@ -440,14 +440,13 @@ static ssize_t file_write(struct file *file, const char __user * buffer, /* any space left... */ if (!rtlx_write_poll(minor)) { - int ret = 0; + int ret; if (file->f_flags & O_NONBLOCK) return -EAGAIN; - __wait_event_interruptible(channel_wqs[minor].rt_queue, - rtlx_write_poll(minor), - ret); + ret = __wait_event_interruptible(channel_wqs[minor].rt_queue, + rtlx_write_poll(minor)); if (ret) return ret; } diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c index 627883bc6d5f..bc6f96fcb529 100644 --- a/arch/mips/mm/c-r4k.c +++ b/arch/mips/mm/c-r4k.c @@ -609,6 +609,7 @@ static void r4k_dma_cache_wback_inv(unsigned long addr, unsigned long size) r4k_blast_scache(); else blast_scache_range(addr, addr + size); + preempt_enable(); __sync(); return; } @@ -650,6 +651,7 @@ static void r4k_dma_cache_inv(unsigned long addr, unsigned long size) */ blast_inv_scache_range(addr, addr + size); } + preempt_enable(); __sync(); return; } diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c index f25a7e9f8cbc..5f8b95512580 100644 --- a/arch/mips/mm/dma-default.c +++ b/arch/mips/mm/dma-default.c @@ -308,12 +308,10 @@ static void mips_dma_sync_sg_for_cpu(struct device *dev, { int i; - /* Make sure that gcc doesn't leave the empty loop body. */ - for (i = 0; i < nelems; i++, sg++) { - if (cpu_needs_post_dma_flush(dev)) + if (cpu_needs_post_dma_flush(dev)) + for (i = 0; i < nelems; i++, sg++) __dma_sync(sg_page(sg), sg->offset, sg->length, direction); - } } static void mips_dma_sync_sg_for_device(struct device *dev, @@ -321,12 +319,10 @@ static void mips_dma_sync_sg_for_device(struct device *dev, { int i; - /* Make sure that gcc doesn't leave the empty loop body. */ - for (i = 0; i < nelems; i++, sg++) { - if (!plat_device_is_coherent(dev)) + if (!plat_device_is_coherent(dev)) + for (i = 0; i < nelems; i++, sg++) __dma_sync(sg_page(sg), sg->offset, sg->length, direction); - } } int mips_dma_mapping_error(struct device *dev, dma_addr_t dma_addr) diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c index e205ef598e97..12156176c7ca 100644 --- a/arch/mips/mm/init.c +++ b/arch/mips/mm/init.c @@ -124,7 +124,7 @@ void *kmap_coherent(struct page *page, unsigned long addr) BUG_ON(Page_dcache_dirty(page)); - inc_preempt_count(); + pagefault_disable(); idx = (addr >> PAGE_SHIFT) & (FIX_N_COLOURS - 1); #ifdef CONFIG_MIPS_MT_SMTC idx += FIX_N_COLOURS * smp_processor_id() + @@ -193,8 +193,7 @@ void kunmap_coherent(void) write_c0_entryhi(old_ctx); EXIT_CRITICAL(flags); #endif - dec_preempt_count(); - preempt_check_resched(); + pagefault_enable(); } void copy_user_highpage(struct page *to, struct page *from, diff --git a/arch/mn10300/include/asm/Kbuild b/arch/mn10300/include/asm/Kbuild index c5d767028306..74742dc6a3da 100644 --- a/arch/mn10300/include/asm/Kbuild +++ b/arch/mn10300/include/asm/Kbuild @@ -2,3 +2,4 @@ generic-y += clkdev.h generic-y += exec.h generic-y += trace_clock.h +generic-y += preempt.h diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild index 195653e851da..78405625e799 100644 --- a/arch/openrisc/include/asm/Kbuild +++ b/arch/openrisc/include/asm/Kbuild @@ -67,3 +67,4 @@ generic-y += ucontext.h generic-y += user.h generic-y += word-at-a-time.h generic-y += xor.h +generic-y += preempt.h diff --git a/arch/openrisc/include/asm/prom.h b/arch/openrisc/include/asm/prom.h index eb59bfe23e85..93c9980e1b6b 100644 --- a/arch/openrisc/include/asm/prom.h +++ b/arch/openrisc/include/asm/prom.h @@ -14,53 +14,9 @@ * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. */ - -#include /* linux/of.h gets to determine #include ordering */ - #ifndef _ASM_OPENRISC_PROM_H #define _ASM_OPENRISC_PROM_H -#ifdef __KERNEL__ -#ifndef __ASSEMBLY__ -#include -#include -#include -#include -#include -#include -#include -#include -#include #define HAVE_ARCH_DEVTREE_FIXUPS -/* Other Prototypes */ -extern int early_uartlite_console(void); - -/* Parse the ibm,dma-window property of an OF node into the busno, phys and - * size parameters. - */ -void of_parse_dma_window(struct device_node *dn, const void *dma_window_prop, - unsigned long *busno, unsigned long *phys, unsigned long *size); - -extern void kdump_move_device_tree(void); - -/* Get the MAC address */ -extern const void *of_get_mac_address(struct device_node *np); - -/** - * of_irq_map_pci - Resolve the interrupt for a PCI device - * @pdev: the device whose interrupt is to be resolved - * @out_irq: structure of_irq filled by this function - * - * This function resolves the PCI interrupt for a given PCI device. If a - * device-node exists for a given pci_dev, it will use normal OF tree - * walking. If not, it will implement standard swizzling and walk up the - * PCI tree until an device-node is found, at which point it will finish - * resolving using the OF tree walking. - */ -struct pci_dev; -extern int of_irq_map_pci(struct pci_dev *pdev, struct of_irq *out_irq); - -#endif /* __ASSEMBLY__ */ -#endif /* __KERNEL__ */ #endif /* _ASM_OPENRISC_PROM_H */ diff --git a/arch/parisc/configs/712_defconfig b/arch/parisc/configs/712_defconfig index 0f90569b9d85..9387cc2693f6 100644 --- a/arch/parisc/configs/712_defconfig +++ b/arch/parisc/configs/712_defconfig @@ -40,6 +40,8 @@ CONFIG_IP_NF_QUEUE=m CONFIG_LLC2=m CONFIG_NET_PKTGEN=m CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" +CONFIG_DEVTMPFS=y +CONFIG_DEVTMPFS_MOUNT=y # CONFIG_STANDALONE is not set # CONFIG_PREVENT_FIRMWARE_BUILD is not set CONFIG_PARPORT=y diff --git a/arch/parisc/configs/a500_defconfig b/arch/parisc/configs/a500_defconfig index b647b182dacc..90025322b75e 100644 --- a/arch/parisc/configs/a500_defconfig +++ b/arch/parisc/configs/a500_defconfig @@ -79,6 +79,8 @@ CONFIG_IP_DCCP=m CONFIG_LLC2=m CONFIG_NET_PKTGEN=m CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" +CONFIG_DEVTMPFS=y +CONFIG_DEVTMPFS_MOUNT=y # CONFIG_STANDALONE is not set # CONFIG_PREVENT_FIRMWARE_BUILD is not set CONFIG_BLK_DEV_UMEM=m diff --git a/arch/parisc/configs/b180_defconfig b/arch/parisc/configs/b180_defconfig index e289f5bf3148..f1a0c25bef8d 100644 --- a/arch/parisc/configs/b180_defconfig +++ b/arch/parisc/configs/b180_defconfig @@ -4,6 +4,7 @@ CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=16 CONFIG_SYSFS_DEPRECATED_V2=y +CONFIG_BLK_DEV_INITRD=y CONFIG_SLAB=y CONFIG_MODULES=y CONFIG_MODVERSIONS=y @@ -27,6 +28,8 @@ CONFIG_IP_PNP_BOOTP=y # CONFIG_INET_LRO is not set CONFIG_IPV6=y CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" +CONFIG_DEVTMPFS=y +CONFIG_DEVTMPFS_MOUNT=y # CONFIG_PREVENT_FIRMWARE_BUILD is not set CONFIG_PARPORT=y CONFIG_PARPORT_PC=y diff --git a/arch/parisc/configs/c3000_defconfig b/arch/parisc/configs/c3000_defconfig index 311ca367b622..ec1b014952b6 100644 --- a/arch/parisc/configs/c3000_defconfig +++ b/arch/parisc/configs/c3000_defconfig @@ -5,6 +5,7 @@ CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=16 CONFIG_SYSFS_DEPRECATED_V2=y +CONFIG_BLK_DEV_INITRD=y # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_EXPERT=y CONFIG_KALLSYMS_ALL=y @@ -39,6 +40,8 @@ CONFIG_NETFILTER_DEBUG=y CONFIG_IP_NF_QUEUE=m CONFIG_NET_PKTGEN=m CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" +CONFIG_DEVTMPFS=y +CONFIG_DEVTMPFS_MOUNT=y # CONFIG_STANDALONE is not set # CONFIG_PREVENT_FIRMWARE_BUILD is not set CONFIG_BLK_DEV_UMEM=m diff --git a/arch/parisc/configs/c8000_defconfig b/arch/parisc/configs/c8000_defconfig index f11006361297..e1c8d2015c89 100644 --- a/arch/parisc/configs/c8000_defconfig +++ b/arch/parisc/configs/c8000_defconfig @@ -62,6 +62,8 @@ CONFIG_TIPC=m CONFIG_LLC2=m CONFIG_DNS_RESOLVER=y CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" +CONFIG_DEVTMPFS=y +CONFIG_DEVTMPFS_MOUNT=y # CONFIG_STANDALONE is not set CONFIG_PARPORT=y CONFIG_PARPORT_PC=y diff --git a/arch/parisc/configs/default_defconfig b/arch/parisc/configs/default_defconfig index dfe88f6c95c4..ba61495e1fa4 100644 --- a/arch/parisc/configs/default_defconfig +++ b/arch/parisc/configs/default_defconfig @@ -49,6 +49,8 @@ CONFIG_INET6_ESP=y CONFIG_INET6_IPCOMP=y CONFIG_LLC2=m CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" +CONFIG_DEVTMPFS=y +CONFIG_DEVTMPFS_MOUNT=y # CONFIG_STANDALONE is not set # CONFIG_PREVENT_FIRMWARE_BUILD is not set CONFIG_PARPORT=y diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild index ff4c9faed546..a603b9ebe54c 100644 --- a/arch/parisc/include/asm/Kbuild +++ b/arch/parisc/include/asm/Kbuild @@ -4,3 +4,4 @@ generic-y += word-at-a-time.h auxvec.h user.h cputime.h emergency-restart.h \ div64.h irq_regs.h kdebug.h kvm_para.h local64.h local.h param.h \ poll.h xor.h clkdev.h exec.h generic-y += trace_clock.h +generic-y += preempt.h diff --git a/arch/parisc/include/asm/traps.h b/arch/parisc/include/asm/traps.h index 1945f995f2df..4736020ba5ea 100644 --- a/arch/parisc/include/asm/traps.h +++ b/arch/parisc/include/asm/traps.h @@ -6,7 +6,7 @@ struct pt_regs; /* traps.c */ void parisc_terminate(char *msg, struct pt_regs *regs, - int code, unsigned long offset); + int code, unsigned long offset) __noreturn __cold; /* mm/fault.c */ void do_page_fault(struct pt_regs *regs, unsigned long code, diff --git a/arch/parisc/kernel/irq.c b/arch/parisc/kernel/irq.c index 2e6443b1e922..ef5927685299 100644 --- a/arch/parisc/kernel/irq.c +++ b/arch/parisc/kernel/irq.c @@ -499,22 +499,9 @@ static void execute_on_irq_stack(void *func, unsigned long param1) *irq_stack_in_use = 1; } -asmlinkage void do_softirq(void) +void do_softirq_own_stack(void) { - __u32 pending; - unsigned long flags; - - if (in_interrupt()) - return; - - local_irq_save(flags); - - pending = local_softirq_pending(); - - if (pending) - execute_on_irq_stack(__do_softirq, 0); - - local_irq_restore(flags); + execute_on_irq_stack(__do_softirq, 0); } #endif /* CONFIG_IRQSTACKS */ diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c index 8a252f2d6c08..2b96602e812f 100644 --- a/arch/parisc/kernel/smp.c +++ b/arch/parisc/kernel/smp.c @@ -72,7 +72,6 @@ enum ipi_message_type { IPI_NOP=0, IPI_RESCHEDULE=1, IPI_CALL_FUNC, - IPI_CALL_FUNC_SINGLE, IPI_CPU_START, IPI_CPU_STOP, IPI_CPU_TEST @@ -164,11 +163,6 @@ ipi_interrupt(int irq, void *dev_id) generic_smp_call_function_interrupt(); break; - case IPI_CALL_FUNC_SINGLE: - smp_debug(100, KERN_DEBUG "CPU%d IPI_CALL_FUNC_SINGLE\n", this_cpu); - generic_smp_call_function_single_interrupt(); - break; - case IPI_CPU_START: smp_debug(100, KERN_DEBUG "CPU%d IPI_CPU_START\n", this_cpu); break; @@ -260,7 +254,7 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask) void arch_send_call_function_single_ipi(int cpu) { - send_IPI_single(cpu, IPI_CALL_FUNC_SINGLE); + send_IPI_single(cpu, IPI_CALL_FUNC); } /* diff --git a/arch/parisc/kernel/traps.c b/arch/parisc/kernel/traps.c index 04e47c6a4562..1cd1d0c83b6d 100644 --- a/arch/parisc/kernel/traps.c +++ b/arch/parisc/kernel/traps.c @@ -291,11 +291,6 @@ void die_if_kernel(char *str, struct pt_regs *regs, long err) do_exit(SIGSEGV); } -int syscall_ipi(int (*syscall) (struct pt_regs *), struct pt_regs *regs) -{ - return syscall(regs); -} - /* gdb uses break 4,8 */ #define GDB_BREAK_INSN 0x10004 static void handle_gdb_break(struct pt_regs *regs, int wot) @@ -805,14 +800,14 @@ void notrace handle_interruption(int code, struct pt_regs *regs) else { /* - * The kernel should never fault on its own address space. + * The kernel should never fault on its own address space, + * unless pagefault_disable() was called before. */ - if (fault_space == 0) + if (fault_space == 0 && !in_atomic()) { pdc_chassis_send_status(PDC_CHASSIS_DIRECT_PANIC); parisc_terminate("Kernel Fault", regs, code, fault_address); - } } diff --git a/arch/parisc/lib/memcpy.c b/arch/parisc/lib/memcpy.c index ac4370b1ca40..b5507ec06b84 100644 --- a/arch/parisc/lib/memcpy.c +++ b/arch/parisc/lib/memcpy.c @@ -56,7 +56,7 @@ #ifdef __KERNEL__ #include #include -#include +#include #define s_space "%%sr1" #define d_space "%%sr2" #else @@ -524,4 +524,17 @@ EXPORT_SYMBOL(copy_to_user); EXPORT_SYMBOL(copy_from_user); EXPORT_SYMBOL(copy_in_user); EXPORT_SYMBOL(memcpy); + +long probe_kernel_read(void *dst, const void *src, size_t size) +{ + unsigned long addr = (unsigned long)src; + + if (size < 0 || addr < PAGE_SIZE) + return -EFAULT; + + /* check for I/O space F_EXTEND(0xfff00000) access as well? */ + + return __probe_kernel_read(dst, src, size); +} + #endif diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c index d10d27a720c0..0293588d5b8c 100644 --- a/arch/parisc/mm/fault.c +++ b/arch/parisc/mm/fault.c @@ -171,17 +171,25 @@ void do_page_fault(struct pt_regs *regs, unsigned long code, unsigned long address) { struct vm_area_struct *vma, *prev_vma; - struct task_struct *tsk = current; - struct mm_struct *mm = tsk->mm; + struct task_struct *tsk; + struct mm_struct *mm; unsigned long acc_type; int fault; - unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; + unsigned int flags; - if (in_atomic() || !mm) + if (in_atomic()) goto no_context; + tsk = current; + mm = tsk->mm; + if (!mm) + goto no_context; + + flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; if (user_mode(regs)) flags |= FAULT_FLAG_USER; + + acc_type = parisc_acctyp(code, regs->iir); if (acc_type & VM_WRITE) flags |= FAULT_FLAG_WRITE; retry: @@ -196,8 +204,6 @@ retry: good_area: - acc_type = parisc_acctyp(code,regs->iir); - if ((vma->vm_flags & acc_type) != acc_type) goto bad_area; diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 38f3b7e47ec5..b365d5cbb722 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -138,6 +138,7 @@ config PPC select OLD_SIGSUSPEND select OLD_SIGACTION if PPC32 select HAVE_DEBUG_STACKOVERFLOW + select HAVE_IRQ_EXIT_ON_IRQ_STACK config EARLY_PRINTK bool diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile index 6a15c968d214..15ca2255f438 100644 --- a/arch/powerpc/boot/Makefile +++ b/arch/powerpc/boot/Makefile @@ -74,7 +74,7 @@ src-wlib-$(CONFIG_8xx) += mpc8xx.c planetcore.c src-wlib-$(CONFIG_PPC_82xx) += pq2.c fsl-soc.c planetcore.c src-wlib-$(CONFIG_EMBEDDED6xx) += mv64x60.c mv64x60_i2c.c ugecon.c -src-plat-y := of.c +src-plat-y := of.c epapr.c src-plat-$(CONFIG_40x) += fixed-head.S ep405.c cuboot-hotfoot.c \ treeboot-walnut.c cuboot-acadia.c \ cuboot-kilauea.c simpleboot.c \ @@ -97,7 +97,7 @@ src-plat-$(CONFIG_EMBEDDED6xx) += cuboot-pq2.c cuboot-mpc7448hpc2.c \ prpmc2800.c src-plat-$(CONFIG_AMIGAONE) += cuboot-amigaone.c src-plat-$(CONFIG_PPC_PS3) += ps3-head.S ps3-hvcall.S ps3.c -src-plat-$(CONFIG_EPAPR_BOOT) += epapr.c +src-plat-$(CONFIG_EPAPR_BOOT) += epapr.c epapr-wrapper.c src-wlib := $(sort $(src-wlib-y)) src-plat := $(sort $(src-plat-y)) diff --git a/arch/powerpc/boot/epapr-wrapper.c b/arch/powerpc/boot/epapr-wrapper.c new file mode 100644 index 000000000000..c10191006673 --- /dev/null +++ b/arch/powerpc/boot/epapr-wrapper.c @@ -0,0 +1,9 @@ +extern void epapr_platform_init(unsigned long r3, unsigned long r4, + unsigned long r5, unsigned long r6, + unsigned long r7); + +void platform_init(unsigned long r3, unsigned long r4, unsigned long r5, + unsigned long r6, unsigned long r7) +{ + epapr_platform_init(r3, r4, r5, r6, r7); +} diff --git a/arch/powerpc/boot/epapr.c b/arch/powerpc/boot/epapr.c index 06c1961bd124..02e91aa2194a 100644 --- a/arch/powerpc/boot/epapr.c +++ b/arch/powerpc/boot/epapr.c @@ -48,8 +48,8 @@ static void platform_fixups(void) fdt_addr, fdt_totalsize((void *)fdt_addr), ima_size); } -void platform_init(unsigned long r3, unsigned long r4, unsigned long r5, - unsigned long r6, unsigned long r7) +void epapr_platform_init(unsigned long r3, unsigned long r4, unsigned long r5, + unsigned long r6, unsigned long r7) { epapr_magic = r6; ima_size = r7; diff --git a/arch/powerpc/boot/of.c b/arch/powerpc/boot/of.c index 61d9899aa0d0..62e2f43ec1df 100644 --- a/arch/powerpc/boot/of.c +++ b/arch/powerpc/boot/of.c @@ -26,6 +26,9 @@ static unsigned long claim_base; +void epapr_platform_init(unsigned long r3, unsigned long r4, unsigned long r5, + unsigned long r6, unsigned long r7); + static void *of_try_claim(unsigned long size) { unsigned long addr = 0; @@ -61,7 +64,7 @@ static void of_image_hdr(const void *hdr) } } -void platform_init(unsigned long a1, unsigned long a2, void *promptr) +static void of_platform_init(unsigned long a1, unsigned long a2, void *promptr) { platform_ops.image_hdr = of_image_hdr; platform_ops.malloc = of_try_claim; @@ -81,3 +84,14 @@ void platform_init(unsigned long a1, unsigned long a2, void *promptr) loader_info.initrd_size = a2; } } + +void platform_init(unsigned long r3, unsigned long r4, unsigned long r5, + unsigned long r6, unsigned long r7) +{ + /* Detect OF vs. ePAPR boot */ + if (r5) + of_platform_init(r3, r4, (void *)r5); + else + epapr_platform_init(r3, r4, r5, r6, r7); +} + diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper index 6761c746048d..cd7af841ba05 100755 --- a/arch/powerpc/boot/wrapper +++ b/arch/powerpc/boot/wrapper @@ -148,18 +148,18 @@ make_space=y case "$platform" in pseries) - platformo=$object/of.o + platformo="$object/of.o $object/epapr.o" link_address='0x4000000' ;; maple) - platformo=$object/of.o + platformo="$object/of.o $object/epapr.o" link_address='0x400000' ;; pmac|chrp) - platformo=$object/of.o + platformo="$object/of.o $object/epapr.o" ;; coff) - platformo="$object/crt0.o $object/of.o" + platformo="$object/crt0.o $object/of.o $object/epapr.o" lds=$object/zImage.coff.lds link_address='0x500000' pie= @@ -253,6 +253,7 @@ treeboot-iss4xx-mpic) platformo="$object/treeboot-iss4xx.o" ;; epapr) + platformo="$object/epapr.o $object/epapr-wrapper.o" link_address='0x20000000' pie=-pie ;; diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild index 704e6f10ae80..d8f9d2f18a23 100644 --- a/arch/powerpc/include/asm/Kbuild +++ b/arch/powerpc/include/asm/Kbuild @@ -2,4 +2,5 @@ generic-y += clkdev.h generic-y += rwsem.h generic-y += trace_clock.h +generic-y += preempt.h generic-y += vtime.h \ No newline at end of file diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h index 0e40843a1c6e..41f13cec8a8f 100644 --- a/arch/powerpc/include/asm/irq.h +++ b/arch/powerpc/include/asm/irq.h @@ -69,9 +69,9 @@ extern struct thread_info *softirq_ctx[NR_CPUS]; extern void irq_ctx_init(void); extern void call_do_softirq(struct thread_info *tp); -extern int call_handle_irq(int irq, void *p1, - struct thread_info *tp, void *func); +extern void call_do_irq(struct pt_regs *regs, struct thread_info *tp); extern void do_IRQ(struct pt_regs *regs); +extern void __do_irq(struct pt_regs *regs); int irq_choose_cpu(const struct cpumask *mask); diff --git a/arch/powerpc/include/asm/jump_label.h b/arch/powerpc/include/asm/jump_label.h index ae098c438f00..f016bb699b5f 100644 --- a/arch/powerpc/include/asm/jump_label.h +++ b/arch/powerpc/include/asm/jump_label.h @@ -19,7 +19,7 @@ static __always_inline bool arch_static_branch(struct static_key *key) { - asm goto("1:\n\t" + asm_volatile_goto("1:\n\t" "nop\n\t" ".pushsection __jump_table, \"aw\"\n\t" JUMP_ENTRY_TYPE "1b, %l[l_yes], %c0\n\t" diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index e378cccfca55..ce4de5aed7b5 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -149,8 +149,6 @@ typedef struct { struct thread_struct { unsigned long ksp; /* Kernel stack pointer */ - unsigned long ksp_limit; /* if ksp <= ksp_limit stack overflow */ - #ifdef CONFIG_PPC64 unsigned long ksp_vsid; #endif @@ -162,6 +160,7 @@ struct thread_struct { #endif #ifdef CONFIG_PPC32 void *pgdir; /* root of page-table tree */ + unsigned long ksp_limit; /* if ksp <= ksp_limit stack overflow */ #endif #ifdef CONFIG_PPC_ADV_DEBUG_REGS /* @@ -321,7 +320,6 @@ struct thread_struct { #else #define INIT_THREAD { \ .ksp = INIT_SP, \ - .ksp_limit = INIT_SP_LIMIT, \ .regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \ .fs = KERNEL_DS, \ .fpr = {{0}}, \ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index d8958be5f31a..502c7a4e73f7 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -80,10 +80,11 @@ int main(void) DEFINE(TASKTHREADPPR, offsetof(struct task_struct, thread.ppr)); #else DEFINE(THREAD_INFO, offsetof(struct task_struct, stack)); + DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16)); + DEFINE(KSP_LIMIT, offsetof(struct thread_struct, ksp_limit)); #endif /* CONFIG_PPC64 */ DEFINE(KSP, offsetof(struct thread_struct, ksp)); - DEFINE(KSP_LIMIT, offsetof(struct thread_struct, ksp_limit)); DEFINE(PT_REGS, offsetof(struct thread_struct, regs)); #ifdef CONFIG_BOOKE DEFINE(THREAD_NORMSAVES, offsetof(struct thread_struct, normsave[0])); diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index 0adab06ce5c0..572bb5b95f35 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -661,7 +661,7 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid) /* number of bytes needed for the bitmap */ sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long); - page = alloc_pages_node(nid, GFP_ATOMIC, get_order(sz)); + page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz)); if (!page) panic("iommu_init_table: Can't allocate %ld bytes\n", sz); tbl->it_map = page_address(page); diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index c69440cef7af..ba0165615215 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -441,50 +441,6 @@ void migrate_irqs(void) } #endif -static inline void handle_one_irq(unsigned int irq) -{ - struct thread_info *curtp, *irqtp; - unsigned long saved_sp_limit; - struct irq_desc *desc; - - desc = irq_to_desc(irq); - if (!desc) - return; - - /* Switch to the irq stack to handle this */ - curtp = current_thread_info(); - irqtp = hardirq_ctx[smp_processor_id()]; - - if (curtp == irqtp) { - /* We're already on the irq stack, just handle it */ - desc->handle_irq(irq, desc); - return; - } - - saved_sp_limit = current->thread.ksp_limit; - - irqtp->task = curtp->task; - irqtp->flags = 0; - - /* Copy the softirq bits in preempt_count so that the - * softirq checks work in the hardirq context. */ - irqtp->preempt_count = (irqtp->preempt_count & ~SOFTIRQ_MASK) | - (curtp->preempt_count & SOFTIRQ_MASK); - - current->thread.ksp_limit = (unsigned long)irqtp + - _ALIGN_UP(sizeof(struct thread_info), 16); - - call_handle_irq(irq, desc, irqtp, desc->handle_irq); - current->thread.ksp_limit = saved_sp_limit; - irqtp->task = NULL; - - /* Set any flag that may have been set on the - * alternate stack - */ - if (irqtp->flags) - set_bits(irqtp->flags, &curtp->flags); -} - static inline void check_stack_overflow(void) { #ifdef CONFIG_DEBUG_STACKOVERFLOW @@ -501,9 +457,9 @@ static inline void check_stack_overflow(void) #endif } -void do_IRQ(struct pt_regs *regs) +void __do_irq(struct pt_regs *regs) { - struct pt_regs *old_regs = set_irq_regs(regs); + struct irq_desc *desc; unsigned int irq; irq_enter(); @@ -519,18 +475,57 @@ void do_IRQ(struct pt_regs *regs) */ irq = ppc_md.get_irq(); - /* We can hard enable interrupts now */ + /* We can hard enable interrupts now to allow perf interrupts */ may_hard_irq_enable(); /* And finally process it */ - if (irq != NO_IRQ) - handle_one_irq(irq); - else + if (unlikely(irq == NO_IRQ)) __get_cpu_var(irq_stat).spurious_irqs++; + else { + desc = irq_to_desc(irq); + if (likely(desc)) + desc->handle_irq(irq, desc); + } trace_irq_exit(regs); irq_exit(); +} + +void do_IRQ(struct pt_regs *regs) +{ + struct pt_regs *old_regs = set_irq_regs(regs); + struct thread_info *curtp, *irqtp, *sirqtp; + + /* Switch to the irq stack to handle this */ + curtp = current_thread_info(); + irqtp = hardirq_ctx[raw_smp_processor_id()]; + sirqtp = softirq_ctx[raw_smp_processor_id()]; + + /* Already there ? */ + if (unlikely(curtp == irqtp || curtp == sirqtp)) { + __do_irq(regs); + set_irq_regs(old_regs); + return; + } + + /* Prepare the thread_info in the irq stack */ + irqtp->task = curtp->task; + irqtp->flags = 0; + + /* Copy the preempt_count so that the [soft]irq checks work. */ + irqtp->preempt_count = curtp->preempt_count; + + /* Switch stack and call */ + call_do_irq(regs, irqtp); + + /* Restore stack limit */ + irqtp->task = NULL; + + /* Copy back updates to the thread_info */ + if (irqtp->flags) + set_bits(irqtp->flags, &curtp->flags); + set_irq_regs(old_regs); } @@ -592,28 +587,22 @@ void irq_ctx_init(void) memset((void *)softirq_ctx[i], 0, THREAD_SIZE); tp = softirq_ctx[i]; tp->cpu = i; - tp->preempt_count = 0; memset((void *)hardirq_ctx[i], 0, THREAD_SIZE); tp = hardirq_ctx[i]; tp->cpu = i; - tp->preempt_count = HARDIRQ_OFFSET; } } -static inline void do_softirq_onstack(void) +void do_softirq_own_stack(void) { struct thread_info *curtp, *irqtp; - unsigned long saved_sp_limit = current->thread.ksp_limit; curtp = current_thread_info(); irqtp = softirq_ctx[smp_processor_id()]; irqtp->task = curtp->task; irqtp->flags = 0; - current->thread.ksp_limit = (unsigned long)irqtp + - _ALIGN_UP(sizeof(struct thread_info), 16); call_do_softirq(irqtp); - current->thread.ksp_limit = saved_sp_limit; irqtp->task = NULL; /* Set any flag that may have been set on the @@ -623,21 +612,6 @@ static inline void do_softirq_onstack(void) set_bits(irqtp->flags, &curtp->flags); } -void do_softirq(void) -{ - unsigned long flags; - - if (in_interrupt()) - return; - - local_irq_save(flags); - - if (local_softirq_pending()) - do_softirq_onstack(); - - local_irq_restore(flags); -} - irq_hw_number_t virq_to_hw(unsigned int virq) { struct irq_data *irq_data = irq_get_irq_data(virq); diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index 777d999f563b..2b0ad9845363 100644 --- a/arch/powerpc/kernel/misc_32.S +++ b/arch/powerpc/kernel/misc_32.S @@ -36,26 +36,41 @@ .text +/* + * We store the saved ksp_limit in the unused part + * of the STACK_FRAME_OVERHEAD + */ _GLOBAL(call_do_softirq) mflr r0 stw r0,4(r1) + lwz r10,THREAD+KSP_LIMIT(r2) + addi r11,r3,THREAD_INFO_GAP stwu r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r3) mr r1,r3 + stw r10,8(r1) + stw r11,THREAD+KSP_LIMIT(r2) bl __do_softirq + lwz r10,8(r1) lwz r1,0(r1) lwz r0,4(r1) + stw r10,THREAD+KSP_LIMIT(r2) mtlr r0 blr -_GLOBAL(call_handle_irq) +_GLOBAL(call_do_irq) mflr r0 stw r0,4(r1) - mtctr r6 - stwu r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r5) - mr r1,r5 - bctrl + lwz r10,THREAD+KSP_LIMIT(r2) + addi r11,r3,THREAD_INFO_GAP + stwu r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r4) + mr r1,r4 + stw r10,8(r1) + stw r11,THREAD+KSP_LIMIT(r2) + bl __do_irq + lwz r10,8(r1) lwz r1,0(r1) lwz r0,4(r1) + stw r10,THREAD+KSP_LIMIT(r2) mtlr r0 blr diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S index 971d7e78aff2..e59caf874d05 100644 --- a/arch/powerpc/kernel/misc_64.S +++ b/arch/powerpc/kernel/misc_64.S @@ -40,14 +40,12 @@ _GLOBAL(call_do_softirq) mtlr r0 blr -_GLOBAL(call_handle_irq) - ld r8,0(r6) +_GLOBAL(call_do_irq) mflr r0 std r0,16(r1) - mtctr r8 - stdu r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r5) - mr r1,r5 - bctrl + stdu r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r4) + mr r1,r4 + bl .__do_irq ld r1,0(r1) ld r0,16(r1) mtlr r0 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 6f428da53e20..96d2fdf3aa9e 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1000,9 +1000,10 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, kregs = (struct pt_regs *) sp; sp -= STACK_FRAME_OVERHEAD; p->thread.ksp = sp; +#ifdef CONFIG_PPC32 p->thread.ksp_limit = (unsigned long)task_stack_page(p) + _ALIGN_UP(sizeof(struct thread_info), 16); - +#endif #ifdef CONFIG_HAVE_HW_BREAKPOINT p->thread.ptrace_bps[0] = NULL; #endif diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index 12e656ffe60e..5fe2842e8bab 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -196,6 +196,8 @@ static int __initdata mem_reserve_cnt; static cell_t __initdata regbuf[1024]; +static bool rtas_has_query_cpu_stopped; + /* * Error results ... some OF calls will return "-1" on error, some @@ -1574,6 +1576,11 @@ static void __init prom_instantiate_rtas(void) prom_setprop(rtas_node, "/rtas", "linux,rtas-entry", &val, sizeof(val)); + /* Check if it supports "query-cpu-stopped-state" */ + if (prom_getprop(rtas_node, "query-cpu-stopped-state", + &val, sizeof(val)) != PROM_ERROR) + rtas_has_query_cpu_stopped = true; + #if defined(CONFIG_PPC_POWERNV) && defined(__BIG_ENDIAN__) /* PowerVN takeover hack */ prom_rtas_data = base; @@ -1815,6 +1822,18 @@ static void __init prom_hold_cpus(void) = (void *) LOW_ADDR(__secondary_hold_acknowledge); unsigned long secondary_hold = LOW_ADDR(__secondary_hold); + /* + * On pseries, if RTAS supports "query-cpu-stopped-state", + * we skip this stage, the CPUs will be started by the + * kernel using RTAS. + */ + if ((of_platform == PLATFORM_PSERIES || + of_platform == PLATFORM_PSERIES_LPAR) && + rtas_has_query_cpu_stopped) { + prom_printf("prom_hold_cpus: skipped\n"); + return; + } + prom_debug("prom_hold_cpus: start...\n"); prom_debug(" 1) spinloop = 0x%x\n", (unsigned long)spinloop); prom_debug(" 1) *spinloop = 0x%x\n", *spinloop); @@ -3011,6 +3030,8 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, * On non-powermacs, put all CPUs in spin-loops. * * PowerMacs use a different mechanism to spin CPUs + * + * (This must be done after instanciating RTAS) */ if (of_platform != PLATFORM_POWERMAC && of_platform != PLATFORM_OPAL) diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c index 27a90b99ef67..b4e667663d9b 100644 --- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -17,6 +17,7 @@ #include #include #include +#include #include "cacheinfo.h" @@ -179,15 +180,25 @@ SYSFS_PMCSETUP(spurr, SPRN_SPURR); SYSFS_PMCSETUP(dscr, SPRN_DSCR); SYSFS_PMCSETUP(pir, SPRN_PIR); +/* + Lets only enable read for phyp resources and + enable write when needed with a separate function. + Lets be conservative and default to pseries. +*/ static DEVICE_ATTR(mmcra, 0600, show_mmcra, store_mmcra); static DEVICE_ATTR(spurr, 0400, show_spurr, NULL); static DEVICE_ATTR(dscr, 0600, show_dscr, store_dscr); -static DEVICE_ATTR(purr, 0600, show_purr, store_purr); +static DEVICE_ATTR(purr, 0400, show_purr, store_purr); static DEVICE_ATTR(pir, 0400, show_pir, NULL); unsigned long dscr_default = 0; EXPORT_SYMBOL(dscr_default); +static void add_write_permission_dev_attr(struct device_attribute *attr) +{ + attr->attr.mode |= 0200; +} + static ssize_t show_dscr_default(struct device *dev, struct device_attribute *attr, char *buf) { @@ -394,8 +405,11 @@ static void register_cpu_online(unsigned int cpu) if (cpu_has_feature(CPU_FTR_MMCRA)) device_create_file(s, &dev_attr_mmcra); - if (cpu_has_feature(CPU_FTR_PURR)) + if (cpu_has_feature(CPU_FTR_PURR)) { + if (!firmware_has_feature(FW_FEATURE_LPAR)) + add_write_permission_dev_attr(&dev_attr_purr); device_create_file(s, &dev_attr_purr); + } if (cpu_has_feature(CPU_FTR_SPURR)) device_create_file(s, &dev_attr_spurr); diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S index 7b60b9851469..cd809eaa8b5c 100644 --- a/arch/powerpc/kernel/tm.S +++ b/arch/powerpc/kernel/tm.S @@ -79,6 +79,11 @@ _GLOBAL(tm_abort) TABORT(R3) blr + .section ".toc","aw" +DSCR_DEFAULT: + .tc dscr_default[TC],dscr_default + + .section ".text" /* void tm_reclaim(struct thread_struct *thread, * unsigned long orig_msr, @@ -123,6 +128,7 @@ _GLOBAL(tm_reclaim) mr r15, r14 ori r15, r15, MSR_FP li r16, MSR_RI + ori r16, r16, MSR_EE /* IRQs hard off */ andc r15, r15, r16 oris r15, r15, MSR_VEC@h #ifdef CONFIG_VSX @@ -187,11 +193,18 @@ dont_backup_fp: std r1, PACATMSCRATCH(r13) ld r1, PACAR1(r13) + /* Store the PPR in r11 and reset to decent value */ + std r11, GPR11(r1) /* Temporary stash */ + mfspr r11, SPRN_PPR + HMT_MEDIUM + /* Now get some more GPRS free */ std r7, GPR7(r1) /* Temporary stash */ std r12, GPR12(r1) /* '' '' '' */ ld r12, STACK_PARAM(0)(r1) /* Param 0, thread_struct * */ + std r11, THREAD_TM_PPR(r12) /* Store PPR and free r11 */ + addi r7, r12, PT_CKPT_REGS /* Thread's ckpt_regs */ /* Make r7 look like an exception frame so that we @@ -203,15 +216,19 @@ dont_backup_fp: SAVE_GPR(0, r7) /* user r0 */ SAVE_GPR(2, r7) /* user r2 */ SAVE_4GPRS(3, r7) /* user r3-r6 */ - SAVE_4GPRS(8, r7) /* user r8-r11 */ + SAVE_GPR(8, r7) /* user r8 */ + SAVE_GPR(9, r7) /* user r9 */ + SAVE_GPR(10, r7) /* user r10 */ ld r3, PACATMSCRATCH(r13) /* user r1 */ ld r4, GPR7(r1) /* user r7 */ - ld r5, GPR12(r1) /* user r12 */ - GET_SCRATCH0(6) /* user r13 */ + ld r5, GPR11(r1) /* user r11 */ + ld r6, GPR12(r1) /* user r12 */ + GET_SCRATCH0(8) /* user r13 */ std r3, GPR1(r7) std r4, GPR7(r7) - std r5, GPR12(r7) - std r6, GPR13(r7) + std r5, GPR11(r7) + std r6, GPR12(r7) + std r8, GPR13(r7) SAVE_NVGPRS(r7) /* user r14-r31 */ @@ -234,14 +251,12 @@ dont_backup_fp: std r6, _XER(r7) - /* ******************** TAR, PPR, DSCR ********** */ + /* ******************** TAR, DSCR ********** */ mfspr r3, SPRN_TAR - mfspr r4, SPRN_PPR - mfspr r5, SPRN_DSCR + mfspr r4, SPRN_DSCR std r3, THREAD_TM_TAR(r12) - std r4, THREAD_TM_PPR(r12) - std r5, THREAD_TM_DSCR(r12) + std r4, THREAD_TM_DSCR(r12) /* MSR and flags: We don't change CRs, and we don't need to alter * MSR. @@ -258,7 +273,7 @@ dont_backup_fp: std r3, THREAD_TM_TFHAR(r12) std r4, THREAD_TM_TFIAR(r12) - /* AMR and PPR are checkpointed too, but are unsupported by Linux. */ + /* AMR is checkpointed too, but is unsupported by Linux. */ /* Restore original MSR/IRQ state & clear TM mode */ ld r14, TM_FRAME_L0(r1) /* Orig MSR */ @@ -274,6 +289,12 @@ dont_backup_fp: mtcr r4 mtlr r0 ld r2, 40(r1) + + /* Load system default DSCR */ + ld r4, DSCR_DEFAULT@toc(r2) + ld r0, 0(r4) + mtspr SPRN_DSCR, r0 + blr @@ -358,25 +379,24 @@ dont_restore_fp: restore_gprs: - /* ******************** TAR, PPR, DSCR ********** */ - ld r4, THREAD_TM_TAR(r3) - ld r5, THREAD_TM_PPR(r3) - ld r6, THREAD_TM_DSCR(r3) + /* ******************** CR,LR,CCR,MSR ********** */ + ld r4, _CTR(r7) + ld r5, _LINK(r7) + ld r6, _CCR(r7) + ld r8, _XER(r7) - mtspr SPRN_TAR, r4 - mtspr SPRN_PPR, r5 - mtspr SPRN_DSCR, r6 + mtctr r4 + mtlr r5 + mtcr r6 + mtxer r8 - /* ******************** CR,LR,CCR,MSR ********** */ - ld r3, _CTR(r7) - ld r4, _LINK(r7) - ld r5, _CCR(r7) - ld r6, _XER(r7) + /* ******************** TAR ******************** */ + ld r4, THREAD_TM_TAR(r3) + mtspr SPRN_TAR, r4 - mtctr r3 - mtlr r4 - mtcr r5 - mtxer r6 + /* Load up the PPR and DSCR in GPRs only at this stage */ + ld r5, THREAD_TM_DSCR(r3) + ld r6, THREAD_TM_PPR(r3) /* Clear the MSR RI since we are about to change R1. EE is already off */ @@ -384,19 +404,26 @@ restore_gprs: mtmsrd r4, 1 REST_4GPRS(0, r7) /* GPR0-3 */ - REST_GPR(4, r7) /* GPR4-6 */ - REST_GPR(5, r7) - REST_GPR(6, r7) + REST_GPR(4, r7) /* GPR4 */ REST_4GPRS(8, r7) /* GPR8-11 */ REST_2GPRS(12, r7) /* GPR12-13 */ REST_NVGPRS(r7) /* GPR14-31 */ - ld r7, GPR7(r7) /* GPR7 */ + /* Load up PPR and DSCR here so we don't run with user values for long + */ + mtspr SPRN_DSCR, r5 + mtspr SPRN_PPR, r6 + + REST_GPR(5, r7) /* GPR5-7 */ + REST_GPR(6, r7) + ld r7, GPR7(r7) /* Commit register state as checkpointed state: */ TRECHKPT + HMT_MEDIUM + /* Our transactional state has now changed. * * Now just get out of here. Transactional (current) state will be @@ -419,6 +446,12 @@ restore_gprs: mtcr r4 mtlr r0 ld r2, 40(r1) + + /* Load system default DSCR */ + ld r4, DSCR_DEFAULT@toc(r2) + ld r0, 0(r4) + mtspr SPRN_DSCR, r0 + blr /* ****************************************************************** */ diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c index 78a350670de3..d38cc08b16c7 100644 --- a/arch/powerpc/kernel/vio.c +++ b/arch/powerpc/kernel/vio.c @@ -1530,11 +1530,15 @@ static ssize_t modalias_show(struct device *dev, struct device_attribute *attr, const char *cp; dn = dev->of_node; - if (!dn) - return -ENODEV; + if (!dn) { + strcat(buf, "\n"); + return strlen(buf); + } cp = of_get_property(dn, "compatible", NULL); - if (!cp) - return -ENODEV; + if (!cp) { + strcat(buf, "\n"); + return strlen(buf); + } return sprintf(buf, "vio:T%sS%s\n", vio_dev->type, cp); } diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 294b7af28cdd..c71103b8a748 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -1066,7 +1066,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206) BEGIN_FTR_SECTION mfspr r8, SPRN_DSCR ld r7, HSTATE_DSCR(r13) - std r8, VCPU_DSCR(r7) + std r8, VCPU_DSCR(r9) mtspr SPRN_DSCR, r7 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d729df4..c65593abae8e 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -332,6 +332,13 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, unsigned long hva; int pfnmap = 0; int tsize = BOOK3E_PAGESZ_4K; + int ret = 0; + unsigned long mmu_seq; + struct kvm *kvm = vcpu_e500->vcpu.kvm; + + /* used to check for invalidations in progress */ + mmu_seq = kvm->mmu_notifier_seq; + smp_rmb(); /* * Translate guest physical to true physical, acquiring @@ -449,6 +456,12 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, gvaddr &= ~((tsize_pages << PAGE_SHIFT) - 1); } + spin_lock(&kvm->mmu_lock); + if (mmu_notifier_retry(kvm, mmu_seq)) { + ret = -EAGAIN; + goto out; + } + kvmppc_e500_ref_setup(ref, gtlbe, pfn); kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize, @@ -457,10 +470,13 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, /* Clear i-cache for new pages */ kvmppc_mmu_flush_icache(pfn); +out: + spin_unlock(&kvm->mmu_lock); + /* Drop refcount on page, so that mmu notifiers can clear it */ kvm_release_pfn_clean(pfn); - return 0; + return ret; } /* XXX only map the one-one case, for now use TLB0 */ diff --git a/arch/powerpc/lib/checksum_64.S b/arch/powerpc/lib/checksum_64.S index 167f72555d60..57a072065057 100644 --- a/arch/powerpc/lib/checksum_64.S +++ b/arch/powerpc/lib/checksum_64.S @@ -226,19 +226,35 @@ _GLOBAL(csum_partial) blr - .macro source + .macro srcnr 100: .section __ex_table,"a" .align 3 - .llong 100b,.Lsrc_error + .llong 100b,.Lsrc_error_nr .previous .endm - .macro dest + .macro source +150: + .section __ex_table,"a" + .align 3 + .llong 150b,.Lsrc_error + .previous + .endm + + .macro dstnr 200: .section __ex_table,"a" .align 3 - .llong 200b,.Ldest_error + .llong 200b,.Ldest_error_nr + .previous + .endm + + .macro dest +250: + .section __ex_table,"a" + .align 3 + .llong 250b,.Ldest_error .previous .endm @@ -269,16 +285,16 @@ _GLOBAL(csum_partial_copy_generic) rldicl. r6,r3,64-1,64-2 /* r6 = (r3 & 0x3) >> 1 */ beq .Lcopy_aligned - li r7,4 - sub r6,r7,r6 + li r9,4 + sub r6,r9,r6 mtctr r6 1: -source; lhz r6,0(r3) /* align to doubleword */ +srcnr; lhz r6,0(r3) /* align to doubleword */ subi r5,r5,2 addi r3,r3,2 adde r0,r0,r6 -dest; sth r6,0(r4) +dstnr; sth r6,0(r4) addi r4,r4,2 bdnz 1b @@ -392,10 +408,10 @@ dest; std r16,56(r4) mtctr r6 3: -source; ld r6,0(r3) +srcnr; ld r6,0(r3) addi r3,r3,8 adde r0,r0,r6 -dest; std r6,0(r4) +dstnr; std r6,0(r4) addi r4,r4,8 bdnz 3b @@ -405,10 +421,10 @@ dest; std r6,0(r4) srdi. r6,r5,2 beq .Lcopy_tail_halfword -source; lwz r6,0(r3) +srcnr; lwz r6,0(r3) addi r3,r3,4 adde r0,r0,r6 -dest; stw r6,0(r4) +dstnr; stw r6,0(r4) addi r4,r4,4 subi r5,r5,4 @@ -416,10 +432,10 @@ dest; stw r6,0(r4) srdi. r6,r5,1 beq .Lcopy_tail_byte -source; lhz r6,0(r3) +srcnr; lhz r6,0(r3) addi r3,r3,2 adde r0,r0,r6 -dest; sth r6,0(r4) +dstnr; sth r6,0(r4) addi r4,r4,2 subi r5,r5,2 @@ -427,10 +443,10 @@ dest; sth r6,0(r4) andi. r6,r5,1 beq .Lcopy_finish -source; lbz r6,0(r3) +srcnr; lbz r6,0(r3) sldi r9,r6,8 /* Pad the byte out to 16 bits */ adde r0,r0,r9 -dest; stb r6,0(r4) +dstnr; stb r6,0(r4) .Lcopy_finish: addze r0,r0 /* add in final carry */ @@ -440,6 +456,11 @@ dest; stb r6,0(r4) blr .Lsrc_error: + ld r14,STK_REG(R14)(r1) + ld r15,STK_REG(R15)(r1) + ld r16,STK_REG(R16)(r1) + addi r1,r1,STACKFRAMESIZE +.Lsrc_error_nr: cmpdi 0,r7,0 beqlr li r6,-EFAULT @@ -447,6 +468,11 @@ dest; stb r6,0(r4) blr .Ldest_error: + ld r14,STK_REG(R14)(r1) + ld r15,STK_REG(R15)(r1) + ld r16,STK_REG(R16)(r1) + addi r1,r1,STACKFRAMESIZE +.Ldest_error_nr: cmpdi 0,r8,0 beqlr li r6,-EFAULT diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index a7ee978fb860..b1faa1593c90 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -1505,6 +1505,7 @@ int __kprobes emulate_step(struct pt_regs *regs, unsigned int instr) */ if ((ra == 1) && !(regs->msr & MSR_PR) \ && (val3 >= (regs->gpr[1] - STACK_INT_FRAME_SIZE))) { +#ifdef CONFIG_PPC32 /* * Check if we will touch kernel sack overflow */ @@ -1513,7 +1514,7 @@ int __kprobes emulate_step(struct pt_regs *regs, unsigned int instr) err = -EINVAL; break; } - +#endif /* CONFIG_PPC32 */ /* * Check if we already set since that means we'll * lose the previous value. diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c index d0cd9e4c6837..8ed035d2edb5 100644 --- a/arch/powerpc/mm/init_64.c +++ b/arch/powerpc/mm/init_64.c @@ -300,5 +300,9 @@ void vmemmap_free(unsigned long start, unsigned long end) { } +void register_page_bootmem_memmap(unsigned long section_nr, + struct page *start_page, unsigned long size) +{ +} #endif /* CONFIG_SPARSEMEM_VMEMMAP */ diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index 1cf9c5b67f24..3fa93dc7fe75 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -297,12 +297,21 @@ void __init paging_init(void) } #endif /* ! CONFIG_NEED_MULTIPLE_NODES */ +static void __init register_page_bootmem_info(void) +{ + int i; + + for_each_online_node(i) + register_page_bootmem_info_node(NODE_DATA(i)); +} + void __init mem_init(void) { #ifdef CONFIG_SWIOTLB swiotlb_init(0); #endif + register_page_bootmem_info(); high_memory = (void *) __va(max_low_pfn * PAGE_SIZE); set_max_mapnr(max_pfn); free_all_bootmem(); diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c index 2ee4a707f0df..a3f7abd2f13f 100644 --- a/arch/powerpc/perf/power8-pmu.c +++ b/arch/powerpc/perf/power8-pmu.c @@ -199,6 +199,7 @@ #define MMCR1_UNIT_SHIFT(pmc) (60 - (4 * ((pmc) - 1))) #define MMCR1_COMBINE_SHIFT(pmc) (35 - ((pmc) - 1)) #define MMCR1_PMCSEL_SHIFT(pmc) (24 - (((pmc) - 1)) * 8) +#define MMCR1_FAB_SHIFT 36 #define MMCR1_DC_QUAL_SHIFT 47 #define MMCR1_IC_QUAL_SHIFT 46 @@ -388,8 +389,8 @@ static int power8_compute_mmcr(u64 event[], int n_ev, * the threshold bits are used for the match value. */ if (event_is_fab_match(event[i])) { - mmcr1 |= (event[i] >> EVENT_THR_CTL_SHIFT) & - EVENT_THR_CTL_MASK; + mmcr1 |= ((event[i] >> EVENT_THR_CTL_SHIFT) & + EVENT_THR_CTL_MASK) << MMCR1_FAB_SHIFT; } else { val = (event[i] >> EVENT_THR_CTL_SHIFT) & EVENT_THR_CTL_MASK; mmcra |= val << MMCRA_THR_CTL_SHIFT; diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c index 1c1771a40250..24f58cb0a543 100644 --- a/arch/powerpc/platforms/pseries/smp.c +++ b/arch/powerpc/platforms/pseries/smp.c @@ -233,18 +233,24 @@ static void __init smp_init_pseries(void) alloc_bootmem_cpumask_var(&of_spin_mask); - /* Mark threads which are still spinning in hold loops. */ - if (cpu_has_feature(CPU_FTR_SMT)) { - for_each_present_cpu(i) { - if (cpu_thread_in_core(i) == 0) - cpumask_set_cpu(i, of_spin_mask); - } - } else { - cpumask_copy(of_spin_mask, cpu_present_mask); + /* + * Mark threads which are still spinning in hold loops + * + * We know prom_init will not have started them if RTAS supports + * query-cpu-stopped-state. + */ + if (rtas_token("query-cpu-stopped-state") == RTAS_UNKNOWN_SERVICE) { + if (cpu_has_feature(CPU_FTR_SMT)) { + for_each_present_cpu(i) { + if (cpu_thread_in_core(i) == 0) + cpumask_set_cpu(i, of_spin_mask); + } + } else + cpumask_copy(of_spin_mask, cpu_present_mask); + + cpumask_clear_cpu(boot_cpuid, of_spin_mask); } - cpumask_clear_cpu(boot_cpuid, of_spin_mask); - /* Non-lpar has additional take/give timebase */ if (rtas_token("freeze-time-base") != RTAS_UNKNOWN_SERVICE) { smp_ops->give_timebase = rtas_give_timebase; diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index dcc6ac2d8026..7143793859fa 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -93,6 +93,7 @@ config S390 select ARCH_INLINE_WRITE_UNLOCK_IRQ select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE select ARCH_SAVE_PAGE_KEYS if HIBERNATION + select ARCH_USE_CMPXCHG_LOCKREF select ARCH_WANT_IPC_PARSE_VERSION select BUILDTIME_EXTABLE_SORT select CLONE_BACKWARDS2 @@ -102,7 +103,6 @@ config S390 select GENERIC_TIME_VSYSCALL_OLD select HAVE_ALIGNED_STRUCT_PAGE if SLUB select HAVE_ARCH_JUMP_LABEL if !MARCH_G5 - select HAVE_ARCH_MUTEX_CPU_RELAX select HAVE_ARCH_SECCOMP_FILTER select HAVE_ARCH_TRACEHOOK select HAVE_ARCH_TRANSPARENT_HUGEPAGE if 64BIT diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild index f313f9cbcf44..7a5288f3479a 100644 --- a/arch/s390/include/asm/Kbuild +++ b/arch/s390/include/asm/Kbuild @@ -2,3 +2,4 @@ generic-y += clkdev.h generic-y += trace_clock.h +generic-y += preempt.h diff --git a/arch/s390/include/asm/jump_label.h b/arch/s390/include/asm/jump_label.h index 6c32190dc73e..346b1c85ffb4 100644 --- a/arch/s390/include/asm/jump_label.h +++ b/arch/s390/include/asm/jump_label.h @@ -15,7 +15,7 @@ static __always_inline bool arch_static_branch(struct static_key *key) { - asm goto("0: brcl 0,0\n" + asm_volatile_goto("0: brcl 0,0\n" ".pushsection __jump_table, \"aw\"\n" ASM_ALIGN "\n" ASM_PTR " 0b, %l[label], %0\n" diff --git a/arch/s390/include/asm/mutex.h b/arch/s390/include/asm/mutex.h index 688271f5f2e4..458c1f7fbc18 100644 --- a/arch/s390/include/asm/mutex.h +++ b/arch/s390/include/asm/mutex.h @@ -7,5 +7,3 @@ */ #include - -#define arch_mutex_cpu_relax() barrier() diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h index 0eb37505cab1..ca7821f07260 100644 --- a/arch/s390/include/asm/processor.h +++ b/arch/s390/include/asm/processor.h @@ -198,6 +198,8 @@ static inline void cpu_relax(void) barrier(); } +#define arch_mutex_cpu_relax() barrier() + static inline void psw_set_key(unsigned int key) { asm volatile("spka 0(%0)" : : "d" (key)); diff --git a/arch/s390/include/asm/spinlock.h b/arch/s390/include/asm/spinlock.h index 701fe8c59e1f..83e5d216105e 100644 --- a/arch/s390/include/asm/spinlock.h +++ b/arch/s390/include/asm/spinlock.h @@ -44,6 +44,11 @@ extern void arch_spin_lock_wait_flags(arch_spinlock_t *, unsigned long flags); extern int arch_spin_trylock_retry(arch_spinlock_t *); extern void arch_spin_relax(arch_spinlock_t *lock); +static inline int arch_spin_value_unlocked(arch_spinlock_t lock) +{ + return lock.owner_cpu == 0; +} + static inline void arch_spin_lock(arch_spinlock_t *lp) { int old; diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c index c84f33d51f7b..7dd21720e5b0 100644 --- a/arch/s390/kernel/crash_dump.c +++ b/arch/s390/kernel/crash_dump.c @@ -40,28 +40,26 @@ static inline void *load_real_addr(void *addr) } /* - * Copy up to one page to vmalloc or real memory + * Copy real to virtual or real memory */ -static ssize_t copy_page_real(void *buf, void *src, size_t csize) +static int copy_from_realmem(void *dest, void *src, size_t count) { - size_t size; + unsigned long size; + int rc; - if (is_vmalloc_addr(buf)) { - BUG_ON(csize >= PAGE_SIZE); - /* If buf is not page aligned, copy first part */ - size = min(roundup(__pa(buf), PAGE_SIZE) - __pa(buf), csize); - if (size) { - if (memcpy_real(load_real_addr(buf), src, size)) - return -EFAULT; - buf += size; - src += size; - } - /* Copy second part */ - size = csize - size; - return (size) ? memcpy_real(load_real_addr(buf), src, size) : 0; - } else { - return memcpy_real(buf, src, csize); - } + if (!count) + return 0; + if (!is_vmalloc_or_module_addr(dest)) + return memcpy_real(dest, src, count); + do { + size = min(count, PAGE_SIZE - (__pa(dest) & ~PAGE_MASK)); + if (memcpy_real(load_real_addr(dest), src, size)) + return -EFAULT; + count -= size; + dest += size; + src += size; + } while (count); + return 0; } /* @@ -114,7 +112,7 @@ static ssize_t copy_oldmem_page_kdump(char *buf, size_t csize, rc = copy_to_user_real((void __force __user *) buf, (void *) src, csize); else - rc = copy_page_real(buf, (void *) src, csize); + rc = copy_from_realmem(buf, (void *) src, csize); return (rc == 0) ? rc : csize; } @@ -210,7 +208,7 @@ int copy_from_oldmem(void *dest, void *src, size_t count) if (OLDMEM_BASE) { if ((unsigned long) src < OLDMEM_SIZE) { copied = min(count, OLDMEM_SIZE - (unsigned long) src); - rc = memcpy_real(dest, src + OLDMEM_BASE, copied); + rc = copy_from_realmem(dest, src + OLDMEM_BASE, copied); if (rc) return rc; } @@ -223,7 +221,7 @@ int copy_from_oldmem(void *dest, void *src, size_t count) return rc; } } - return memcpy_real(dest + copied, src + copied, count - copied); + return copy_from_realmem(dest + copied, src + copied, count - copied); } /* diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S index cc30d1fb000c..0dc2b6d0a1ec 100644 --- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -266,6 +266,7 @@ sysc_sigpending: tm __TI_flags+3(%r12),_TIF_SYSCALL jno sysc_return lm %r2,%r7,__PT_R2(%r11) # load svc arguments + l %r10,__TI_sysc_table(%r12) # 31 bit system call table xr %r8,%r8 # svc 0 returns -ENOSYS clc __PT_INT_CODE+2(2,%r11),BASED(.Lnr_syscalls+2) jnl sysc_nr_ok # invalid svc number -> do svc 0 diff --git a/arch/s390/kernel/entry64.S b/arch/s390/kernel/entry64.S index 2b2188b97c6a..e5b43c97a834 100644 --- a/arch/s390/kernel/entry64.S +++ b/arch/s390/kernel/entry64.S @@ -297,6 +297,7 @@ sysc_sigpending: tm __TI_flags+7(%r12),_TIF_SYSCALL jno sysc_return lmg %r2,%r7,__PT_R2(%r11) # load svc arguments + lg %r10,__TI_sysc_table(%r12) # address of system call table lghi %r8,0 # svc 0 returns -ENOSYS llgh %r1,__PT_INT_CODE+2(%r11) # load new svc number cghi %r1,NR_syscalls diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c index 8ac2097f13d4..bb27a262c44a 100644 --- a/arch/s390/kernel/irq.c +++ b/arch/s390/kernel/irq.c @@ -157,39 +157,29 @@ int arch_show_interrupts(struct seq_file *p, int prec) /* * Switch to the asynchronous interrupt stack for softirq execution. */ -asmlinkage void do_softirq(void) +void do_softirq_own_stack(void) { - unsigned long flags, old, new; - - if (in_interrupt()) - return; - - local_irq_save(flags); - - if (local_softirq_pending()) { - /* Get current stack pointer. */ - asm volatile("la %0,0(15)" : "=a" (old)); - /* Check against async. stack address range. */ - new = S390_lowcore.async_stack; - if (((new - old) >> (PAGE_SHIFT + THREAD_ORDER)) != 0) { - /* Need to switch to the async. stack. */ - new -= STACK_FRAME_OVERHEAD; - ((struct stack_frame *) new)->back_chain = old; - - asm volatile(" la 15,0(%0)\n" - " basr 14,%2\n" - " la 15,0(%1)\n" - : : "a" (new), "a" (old), - "a" (__do_softirq) - : "0", "1", "2", "3", "4", "5", "14", - "cc", "memory" ); - } else { - /* We are already on the async stack. */ - __do_softirq(); - } + unsigned long old, new; + + /* Get current stack pointer. */ + asm volatile("la %0,0(15)" : "=a" (old)); + /* Check against async. stack address range. */ + new = S390_lowcore.async_stack; + if (((new - old) >> (PAGE_SHIFT + THREAD_ORDER)) != 0) { + /* Need to switch to the async. stack. */ + new -= STACK_FRAME_OVERHEAD; + ((struct stack_frame *) new)->back_chain = old; + asm volatile(" la 15,0(%0)\n" + " basr 14,%2\n" + " la 15,0(%1)\n" + : : "a" (new), "a" (old), + "a" (__do_softirq) + : "0", "1", "2", "3", "4", "5", "14", + "cc", "memory" ); + } else { + /* We are already on the async stack. */ + __do_softirq(); } - - local_irq_restore(flags); } /* diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c index 0ce9fb245034..d86e64eddb42 100644 --- a/arch/s390/kernel/kprobes.c +++ b/arch/s390/kernel/kprobes.c @@ -67,6 +67,11 @@ static int __kprobes is_prohibited_opcode(kprobe_opcode_t *insn) case 0xac: /* stnsm */ case 0xad: /* stosm */ return -EINVAL; + case 0xc6: + switch (insn[0] & 0x0f) { + case 0x00: /* exrl */ + return -EINVAL; + } } switch (insn[0]) { case 0x0101: /* pr */ @@ -180,7 +185,6 @@ static int __kprobes is_insn_relative_long(kprobe_opcode_t *insn) break; case 0xc6: switch (insn[0] & 0x0f) { - case 0x00: /* exrl */ case 0x02: /* pfdrl */ case 0x04: /* cghrl */ case 0x05: /* chrl */ diff --git a/arch/score/Kconfig b/arch/score/Kconfig index a1be70db75fe..305f7ee1f382 100644 --- a/arch/score/Kconfig +++ b/arch/score/Kconfig @@ -2,6 +2,7 @@ menu "Machine selection" config SCORE def_bool y + select HAVE_GENERIC_HARDIRQS select GENERIC_IRQ_SHOW select GENERIC_IOMAP select GENERIC_ATOMIC64 @@ -110,3 +111,6 @@ source "security/Kconfig" source "crypto/Kconfig" source "lib/Kconfig" + +config NO_IOMEM + def_bool y diff --git a/arch/score/Makefile b/arch/score/Makefile index 974aefe86123..9e3e060290e0 100644 --- a/arch/score/Makefile +++ b/arch/score/Makefile @@ -20,8 +20,8 @@ cflags-y += -G0 -pipe -mel -mnhwloop -D__SCOREEL__ \ # KBUILD_AFLAGS += $(cflags-y) KBUILD_CFLAGS += $(cflags-y) -KBUILD_AFLAGS_MODULE += -mlong-calls -KBUILD_CFLAGS_MODULE += -mlong-calls +KBUILD_AFLAGS_MODULE += +KBUILD_CFLAGS_MODULE += LDFLAGS += --oformat elf32-littlescore LDFLAGS_vmlinux += -G0 -static -nostdlib diff --git a/arch/score/include/asm/Kbuild b/arch/score/include/asm/Kbuild index e1c7bb999b06..f3414ade77a3 100644 --- a/arch/score/include/asm/Kbuild +++ b/arch/score/include/asm/Kbuild @@ -4,3 +4,4 @@ header-y += generic-y += clkdev.h generic-y += trace_clock.h generic-y += xor.h +generic-y += preempt.h diff --git a/arch/score/include/asm/checksum.h b/arch/score/include/asm/checksum.h index f909ac3144a4..961bd64015a8 100644 --- a/arch/score/include/asm/checksum.h +++ b/arch/score/include/asm/checksum.h @@ -184,48 +184,57 @@ static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr, __wsum sum) { __asm__ __volatile__( - ".set\tnoreorder\t\t\t# csum_ipv6_magic\n\t" - ".set\tnoat\n\t" - "addu\t%0, %5\t\t\t# proto (long in network byte order)\n\t" - "sltu\t$1, %0, %5\n\t" - "addu\t%0, $1\n\t" - "addu\t%0, %6\t\t\t# csum\n\t" - "sltu\t$1, %0, %6\n\t" - "lw\t%1, 0(%2)\t\t\t# four words source address\n\t" - "addu\t%0, $1\n\t" - "addu\t%0, %1\n\t" - "sltu\t$1, %0, %1\n\t" - "lw\t%1, 4(%2)\n\t" - "addu\t%0, $1\n\t" - "addu\t%0, %1\n\t" - "sltu\t$1, %0, %1\n\t" - "lw\t%1, 8(%2)\n\t" - "addu\t%0, $1\n\t" - "addu\t%0, %1\n\t" - "sltu\t$1, %0, %1\n\t" - "lw\t%1, 12(%2)\n\t" - "addu\t%0, $1\n\t" - "addu\t%0, %1\n\t" - "sltu\t$1, %0, %1\n\t" - "lw\t%1, 0(%3)\n\t" - "addu\t%0, $1\n\t" - "addu\t%0, %1\n\t" - "sltu\t$1, %0, %1\n\t" - "lw\t%1, 4(%3)\n\t" - "addu\t%0, $1\n\t" - "addu\t%0, %1\n\t" - "sltu\t$1, %0, %1\n\t" - "lw\t%1, 8(%3)\n\t" - "addu\t%0, $1\n\t" - "addu\t%0, %1\n\t" - "sltu\t$1, %0, %1\n\t" - "lw\t%1, 12(%3)\n\t" - "addu\t%0, $1\n\t" - "addu\t%0, %1\n\t" - "sltu\t$1, %0, %1\n\t" - "addu\t%0, $1\t\t\t# Add final carry\n\t" - ".set\tnoat\n\t" - ".set\tnoreorder" + ".set\tvolatile\t\t\t# csum_ipv6_magic\n\t" + "add\t%0, %0, %5\t\t\t# proto (long in network byte order)\n\t" + "cmp.c\t%5, %0\n\t" + "bleu 1f\n\t" + "addi\t%0, 0x1\n\t" + "1:add\t%0, %0, %6\t\t\t# csum\n\t" + "cmp.c\t%6, %0\n\t" + "lw\t%1, [%2, 0]\t\t\t# four words source address\n\t" + "bleu 1f\n\t" + "addi\t%0, 0x1\n\t" + "1:add\t%0, %0, %1\n\t" + "cmp.c\t%1, %0\n\t" + "1:lw\t%1, [%2, 4]\n\t" + "bleu 1f\n\t" + "addi\t%0, 0x1\n\t" + "1:add\t%0, %0, %1\n\t" + "cmp.c\t%1, %0\n\t" + "lw\t%1, [%2,8]\n\t" + "bleu 1f\n\t" + "addi\t%0, 0x1\n\t" + "1:add\t%0, %0, %1\n\t" + "cmp.c\t%1, %0\n\t" + "lw\t%1, [%2, 12]\n\t" + "bleu 1f\n\t" + "addi\t%0, 0x1\n\t" + "1:add\t%0, %0,%1\n\t" + "cmp.c\t%1, %0\n\t" + "lw\t%1, [%3, 0]\n\t" + "bleu 1f\n\t" + "addi\t%0, 0x1\n\t" + "1:add\t%0, %0, %1\n\t" + "cmp.c\t%1, %0\n\t" + "lw\t%1, [%3, 4]\n\t" + "bleu 1f\n\t" + "addi\t%0, 0x1\n\t" + "1:add\t%0, %0, %1\n\t" + "cmp.c\t%1, %0\n\t" + "lw\t%1, [%3, 8]\n\t" + "bleu 1f\n\t" + "addi\t%0, 0x1\n\t" + "1:add\t%0, %0, %1\n\t" + "cmp.c\t%1, %0\n\t" + "lw\t%1, [%3, 12]\n\t" + "bleu 1f\n\t" + "addi\t%0, 0x1\n\t" + "1:add\t%0, %0, %1\n\t" + "cmp.c\t%1, %0\n\t" + "bleu 1f\n\t" + "addi\t%0, 0x1\n\t" + "1:\n\t" + ".set\toptimize" : "=r" (sum), "=r" (proto) : "r" (saddr), "r" (daddr), "0" (htonl(len)), "1" (htonl(proto)), "r" (sum)); diff --git a/arch/score/include/asm/io.h b/arch/score/include/asm/io.h index fbbfd7132e3b..574c8827abe2 100644 --- a/arch/score/include/asm/io.h +++ b/arch/score/include/asm/io.h @@ -5,5 +5,4 @@ #define virt_to_bus virt_to_phys #define bus_to_virt phys_to_virt - #endif /* _ASM_SCORE_IO_H */ diff --git a/arch/score/include/asm/pgalloc.h b/arch/score/include/asm/pgalloc.h index 059a61b7071b..716b3fd1d863 100644 --- a/arch/score/include/asm/pgalloc.h +++ b/arch/score/include/asm/pgalloc.h @@ -2,7 +2,7 @@ #define _ASM_SCORE_PGALLOC_H #include - +#include static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t *pte) { diff --git a/arch/score/kernel/entry.S b/arch/score/kernel/entry.S index 7234ed09b7b7..befb87d30a89 100644 --- a/arch/score/kernel/entry.S +++ b/arch/score/kernel/entry.S @@ -264,7 +264,7 @@ resume_kernel: disable_irq lw r8, [r28, TI_PRE_COUNT] cmpz.c r8 - bne r8, restore_all + bne restore_all need_resched: lw r8, [r28, TI_FLAGS] andri.c r9, r8, _TIF_NEED_RESCHED @@ -415,7 +415,7 @@ ENTRY(handle_sys) sw r9, [r0, PT_EPC] cmpi.c r27, __NR_syscalls # check syscall number - bgeu illegal_syscall + bcs illegal_syscall slli r8, r27, 2 # get syscall routine la r11, sys_call_table diff --git a/arch/score/kernel/process.c b/arch/score/kernel/process.c index f4c6d02421d3..a1519ad3d49d 100644 --- a/arch/score/kernel/process.c +++ b/arch/score/kernel/process.c @@ -78,8 +78,8 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, p->thread.reg0 = (unsigned long) childregs; if (unlikely(p->flags & PF_KTHREAD)) { memset(childregs, 0, sizeof(struct pt_regs)); - p->thread->reg12 = usp; - p->thread->reg13 = arg; + p->thread.reg12 = usp; + p->thread.reg13 = arg; p->thread.reg3 = (unsigned long) ret_from_kernel_thread; } else { *childregs = *current_pt_regs(); diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild index 280bea9e5e2b..231efbb68108 100644 --- a/arch/sh/include/asm/Kbuild +++ b/arch/sh/include/asm/Kbuild @@ -34,3 +34,4 @@ generic-y += termios.h generic-y += trace_clock.h generic-y += ucontext.h generic-y += xor.h +generic-y += preempt.h diff --git a/arch/sh/kernel/irq.c b/arch/sh/kernel/irq.c index 063af10ff3c1..0833736afa32 100644 --- a/arch/sh/kernel/irq.c +++ b/arch/sh/kernel/irq.c @@ -149,47 +149,32 @@ void irq_ctx_exit(int cpu) hardirq_ctx[cpu] = NULL; } -asmlinkage void do_softirq(void) +void do_softirq_own_stack(void) { - unsigned long flags; struct thread_info *curctx; union irq_ctx *irqctx; u32 *isp; - if (in_interrupt()) - return; - - local_irq_save(flags); - - if (local_softirq_pending()) { - curctx = current_thread_info(); - irqctx = softirq_ctx[smp_processor_id()]; - irqctx->tinfo.task = curctx->task; - irqctx->tinfo.previous_sp = current_stack_pointer; - - /* build the stack frame on the softirq stack */ - isp = (u32 *)((char *)irqctx + sizeof(*irqctx)); - - __asm__ __volatile__ ( - "mov r15, r9 \n" - "jsr @%0 \n" - /* switch to the softirq stack */ - " mov %1, r15 \n" - /* restore the thread stack */ - "mov r9, r15 \n" - : /* no outputs */ - : "r" (__do_softirq), "r" (isp) - : "memory", "r0", "r1", "r2", "r3", "r4", - "r5", "r6", "r7", "r8", "r9", "r15", "t", "pr" - ); - - /* - * Shouldn't happen, we returned above if in_interrupt(): - */ - WARN_ON_ONCE(softirq_count()); - } - - local_irq_restore(flags); + curctx = current_thread_info(); + irqctx = softirq_ctx[smp_processor_id()]; + irqctx->tinfo.task = curctx->task; + irqctx->tinfo.previous_sp = current_stack_pointer; + + /* build the stack frame on the softirq stack */ + isp = (u32 *)((char *)irqctx + sizeof(*irqctx)); + + __asm__ __volatile__ ( + "mov r15, r9 \n" + "jsr @%0 \n" + /* switch to the softirq stack */ + " mov %1, r15 \n" + /* restore the thread stack */ + "mov r9, r15 \n" + : /* no outputs */ + : "r" (__do_softirq), "r" (isp) + : "memory", "r0", "r1", "r2", "r3", "r4", + "r5", "r6", "r7", "r8", "r9", "r15", "t", "pr" + ); } #else static inline void handle_one_irq(unsigned int irq) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 2137ad667438..78c4fdb91bc5 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -506,12 +506,17 @@ config SUN_OPENPROMFS Only choose N if you know in advance that you will not need to modify OpenPROM settings on the running system. -# Makefile helper +# Makefile helpers config SPARC64_PCI bool default y depends on SPARC64 && PCI +config SPARC64_PCI_MSI + bool + default y + depends on SPARC64_PCI && PCI_MSI + endmenu menu "Executable file formats" diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild index 7e4a97fbded4..bf390667657a 100644 --- a/arch/sparc/include/asm/Kbuild +++ b/arch/sparc/include/asm/Kbuild @@ -16,3 +16,4 @@ generic-y += serial.h generic-y += trace_clock.h generic-y += types.h generic-y += word-at-a-time.h +generic-y += preempt.h diff --git a/arch/sparc/include/asm/floppy_64.h b/arch/sparc/include/asm/floppy_64.h index e204f902e6c9..7c90c50c200d 100644 --- a/arch/sparc/include/asm/floppy_64.h +++ b/arch/sparc/include/asm/floppy_64.h @@ -254,7 +254,7 @@ static int sun_fd_request_irq(void) once = 1; error = request_irq(FLOPPY_IRQ, sparc_floppy_irq, - IRQF_DISABLED, "floppy", NULL); + 0, "floppy", NULL); return ((error == 0) ? 0 : -1); } diff --git a/arch/sparc/include/asm/jump_label.h b/arch/sparc/include/asm/jump_label.h index 5080d16a832f..ec2e2e2aba7d 100644 --- a/arch/sparc/include/asm/jump_label.h +++ b/arch/sparc/include/asm/jump_label.h @@ -9,7 +9,7 @@ static __always_inline bool arch_static_branch(struct static_key *key) { - asm goto("1:\n\t" + asm_volatile_goto("1:\n\t" "nop\n\t" "nop\n\t" ".pushsection __jump_table, \"aw\"\n\t" diff --git a/arch/sparc/kernel/Makefile b/arch/sparc/kernel/Makefile index d432fb20358e..d15cc1794b0e 100644 --- a/arch/sparc/kernel/Makefile +++ b/arch/sparc/kernel/Makefile @@ -1,3 +1,4 @@ + # # Makefile for the linux kernel. # @@ -99,7 +100,7 @@ obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_SPARC64_PCI) += pci.o pci_common.o psycho_common.o obj-$(CONFIG_SPARC64_PCI) += pci_psycho.o pci_sabre.o pci_schizo.o obj-$(CONFIG_SPARC64_PCI) += pci_sun4v.o pci_sun4v_asm.o pci_fire.o -obj-$(CONFIG_PCI_MSI) += pci_msi.o +obj-$(CONFIG_SPARC64_PCI_MSI) += pci_msi.o obj-$(CONFIG_COMPAT) += sys32.o sys_sparc32.o signal32.o diff --git a/arch/sparc/kernel/ds.c b/arch/sparc/kernel/ds.c index 62d6b153ffa2..dff60abbea01 100644 --- a/arch/sparc/kernel/ds.c +++ b/arch/sparc/kernel/ds.c @@ -849,9 +849,8 @@ void ldom_reboot(const char *boot_command) if (boot_command && strlen(boot_command)) { unsigned long len; - strcpy(full_boot_str, "boot "); - strlcpy(full_boot_str + strlen("boot "), boot_command, - sizeof(full_boot_str + strlen("boot "))); + snprintf(full_boot_str, sizeof(full_boot_str), "boot %s", + boot_command); len = strlen(full_boot_str); if (reboot_data_supported) { diff --git a/arch/sparc/kernel/irq_64.c b/arch/sparc/kernel/irq_64.c index d4840cec2c55..666193f4e8bb 100644 --- a/arch/sparc/kernel/irq_64.c +++ b/arch/sparc/kernel/irq_64.c @@ -698,30 +698,19 @@ void __irq_entry handler_irq(int pil, struct pt_regs *regs) set_irq_regs(old_regs); } -void do_softirq(void) +void do_softirq_own_stack(void) { - unsigned long flags; - - if (in_interrupt()) - return; - - local_irq_save(flags); + void *orig_sp, *sp = softirq_stack[smp_processor_id()]; - if (local_softirq_pending()) { - void *orig_sp, *sp = softirq_stack[smp_processor_id()]; - - sp += THREAD_SIZE - 192 - STACK_BIAS; - - __asm__ __volatile__("mov %%sp, %0\n\t" - "mov %1, %%sp" - : "=&r" (orig_sp) - : "r" (sp)); - __do_softirq(); - __asm__ __volatile__("mov %0, %%sp" - : : "r" (orig_sp)); - } + sp += THREAD_SIZE - 192 - STACK_BIAS; - local_irq_restore(flags); + __asm__ __volatile__("mov %%sp, %0\n\t" + "mov %1, %%sp" + : "=&r" (orig_sp) + : "r" (sp)); + __do_softirq(); + __asm__ __volatile__("mov %0, %%sp" + : : "r" (orig_sp)); } #ifdef CONFIG_HOTPLUG_CPU diff --git a/arch/sparc/kernel/ldc.c b/arch/sparc/kernel/ldc.c index 54df554b82d9..e01d75d40329 100644 --- a/arch/sparc/kernel/ldc.c +++ b/arch/sparc/kernel/ldc.c @@ -1249,12 +1249,12 @@ int ldc_bind(struct ldc_channel *lp, const char *name) snprintf(lp->rx_irq_name, LDC_IRQ_NAME_MAX, "%s RX", name); snprintf(lp->tx_irq_name, LDC_IRQ_NAME_MAX, "%s TX", name); - err = request_irq(lp->cfg.rx_irq, ldc_rx, IRQF_DISABLED, + err = request_irq(lp->cfg.rx_irq, ldc_rx, 0, lp->rx_irq_name, lp); if (err) return err; - err = request_irq(lp->cfg.tx_irq, ldc_tx, IRQF_DISABLED, + err = request_irq(lp->cfg.tx_irq, ldc_tx, 0, lp->tx_irq_name, lp); if (err) { free_irq(lp->cfg.rx_irq, lp); diff --git a/arch/tile/include/asm/Kbuild b/arch/tile/include/asm/Kbuild index 664d6ad23f80..22f3bd147fa7 100644 --- a/arch/tile/include/asm/Kbuild +++ b/arch/tile/include/asm/Kbuild @@ -38,3 +38,4 @@ generic-y += termios.h generic-y += trace_clock.h generic-y += types.h generic-y += xor.h +generic-y += preempt.h diff --git a/arch/tile/include/asm/atomic.h b/arch/tile/include/asm/atomic.h index d385eaadece7..709798460763 100644 --- a/arch/tile/include/asm/atomic.h +++ b/arch/tile/include/asm/atomic.h @@ -166,7 +166,7 @@ static inline int atomic_cmpxchg(atomic_t *v, int o, int n) * * Atomically sets @v to @i and returns old @v */ -static inline u64 atomic64_xchg(atomic64_t *v, u64 n) +static inline long long atomic64_xchg(atomic64_t *v, long long n) { return xchg64(&v->counter, n); } @@ -180,7 +180,8 @@ static inline u64 atomic64_xchg(atomic64_t *v, u64 n) * Atomically checks if @v holds @o and replaces it with @n if so. * Returns the old value at @v. */ -static inline u64 atomic64_cmpxchg(atomic64_t *v, u64 o, u64 n) +static inline long long atomic64_cmpxchg(atomic64_t *v, long long o, + long long n) { return cmpxchg64(&v->counter, o, n); } diff --git a/arch/tile/include/asm/atomic_32.h b/arch/tile/include/asm/atomic_32.h index 0d0395b1b152..1ad4a1f7d42b 100644 --- a/arch/tile/include/asm/atomic_32.h +++ b/arch/tile/include/asm/atomic_32.h @@ -80,7 +80,7 @@ static inline void atomic_set(atomic_t *v, int n) /* A 64bit atomic type */ typedef struct { - u64 __aligned(8) counter; + long long counter; } atomic64_t; #define ATOMIC64_INIT(val) { (val) } @@ -91,14 +91,14 @@ typedef struct { * * Atomically reads the value of @v. */ -static inline u64 atomic64_read(const atomic64_t *v) +static inline long long atomic64_read(const atomic64_t *v) { /* * Requires an atomic op to read both 32-bit parts consistently. * Casting away const is safe since the atomic support routines * do not write to memory if the value has not been modified. */ - return _atomic64_xchg_add((u64 *)&v->counter, 0); + return _atomic64_xchg_add((long long *)&v->counter, 0); } /** @@ -108,7 +108,7 @@ static inline u64 atomic64_read(const atomic64_t *v) * * Atomically adds @i to @v. */ -static inline void atomic64_add(u64 i, atomic64_t *v) +static inline void atomic64_add(long long i, atomic64_t *v) { _atomic64_xchg_add(&v->counter, i); } @@ -120,7 +120,7 @@ static inline void atomic64_add(u64 i, atomic64_t *v) * * Atomically adds @i to @v and returns @i + @v */ -static inline u64 atomic64_add_return(u64 i, atomic64_t *v) +static inline long long atomic64_add_return(long long i, atomic64_t *v) { smp_mb(); /* barrier for proper semantics */ return _atomic64_xchg_add(&v->counter, i) + i; @@ -135,7 +135,8 @@ static inline u64 atomic64_add_return(u64 i, atomic64_t *v) * Atomically adds @a to @v, so long as @v was not already @u. * Returns non-zero if @v was not @u, and zero otherwise. */ -static inline u64 atomic64_add_unless(atomic64_t *v, u64 a, u64 u) +static inline long long atomic64_add_unless(atomic64_t *v, long long a, + long long u) { smp_mb(); /* barrier for proper semantics */ return _atomic64_xchg_add_unless(&v->counter, a, u) != u; @@ -151,7 +152,7 @@ static inline u64 atomic64_add_unless(atomic64_t *v, u64 a, u64 u) * atomic64_set() can't be just a raw store, since it would be lost if it * fell between the load and store of one of the other atomic ops. */ -static inline void atomic64_set(atomic64_t *v, u64 n) +static inline void atomic64_set(atomic64_t *v, long long n) { _atomic64_xchg(&v->counter, n); } @@ -236,11 +237,13 @@ extern struct __get_user __atomic_xchg_add_unless(volatile int *p, extern struct __get_user __atomic_or(volatile int *p, int *lock, int n); extern struct __get_user __atomic_andn(volatile int *p, int *lock, int n); extern struct __get_user __atomic_xor(volatile int *p, int *lock, int n); -extern u64 __atomic64_cmpxchg(volatile u64 *p, int *lock, u64 o, u64 n); -extern u64 __atomic64_xchg(volatile u64 *p, int *lock, u64 n); -extern u64 __atomic64_xchg_add(volatile u64 *p, int *lock, u64 n); -extern u64 __atomic64_xchg_add_unless(volatile u64 *p, - int *lock, u64 o, u64 n); +extern long long __atomic64_cmpxchg(volatile long long *p, int *lock, + long long o, long long n); +extern long long __atomic64_xchg(volatile long long *p, int *lock, long long n); +extern long long __atomic64_xchg_add(volatile long long *p, int *lock, + long long n); +extern long long __atomic64_xchg_add_unless(volatile long long *p, + int *lock, long long o, long long n); /* Return failure from the atomic wrappers. */ struct __get_user __atomic_bad_address(int __user *addr); diff --git a/arch/tile/include/asm/cmpxchg.h b/arch/tile/include/asm/cmpxchg.h index 4001d5eab4bb..0ccda3c425be 100644 --- a/arch/tile/include/asm/cmpxchg.h +++ b/arch/tile/include/asm/cmpxchg.h @@ -35,10 +35,10 @@ int _atomic_xchg(int *ptr, int n); int _atomic_xchg_add(int *v, int i); int _atomic_xchg_add_unless(int *v, int a, int u); int _atomic_cmpxchg(int *ptr, int o, int n); -u64 _atomic64_xchg(u64 *v, u64 n); -u64 _atomic64_xchg_add(u64 *v, u64 i); -u64 _atomic64_xchg_add_unless(u64 *v, u64 a, u64 u); -u64 _atomic64_cmpxchg(u64 *v, u64 o, u64 n); +long long _atomic64_xchg(long long *v, long long n); +long long _atomic64_xchg_add(long long *v, long long i); +long long _atomic64_xchg_add_unless(long long *v, long long a, long long u); +long long _atomic64_cmpxchg(long long *v, long long o, long long n); #define xchg(ptr, n) \ ({ \ @@ -53,7 +53,8 @@ u64 _atomic64_cmpxchg(u64 *v, u64 o, u64 n); if (sizeof(*(ptr)) != 4) \ __cmpxchg_called_with_bad_pointer(); \ smp_mb(); \ - (typeof(*(ptr)))_atomic_cmpxchg((int *)ptr, (int)o, (int)n); \ + (typeof(*(ptr)))_atomic_cmpxchg((int *)ptr, (int)o, \ + (int)n); \ }) #define xchg64(ptr, n) \ @@ -61,7 +62,8 @@ u64 _atomic64_cmpxchg(u64 *v, u64 o, u64 n); if (sizeof(*(ptr)) != 8) \ __xchg_called_with_bad_pointer(); \ smp_mb(); \ - (typeof(*(ptr)))_atomic64_xchg((u64 *)(ptr), (u64)(n)); \ + (typeof(*(ptr)))_atomic64_xchg((long long *)(ptr), \ + (long long)(n)); \ }) #define cmpxchg64(ptr, o, n) \ @@ -69,7 +71,8 @@ u64 _atomic64_cmpxchg(u64 *v, u64 o, u64 n); if (sizeof(*(ptr)) != 8) \ __cmpxchg_called_with_bad_pointer(); \ smp_mb(); \ - (typeof(*(ptr)))_atomic64_cmpxchg((u64 *)ptr, (u64)o, (u64)n); \ + (typeof(*(ptr)))_atomic64_cmpxchg((long long *)ptr, \ + (long long)o, (long long)n); \ }) #else @@ -81,10 +84,11 @@ u64 _atomic64_cmpxchg(u64 *v, u64 o, u64 n); switch (sizeof(*(ptr))) { \ case 4: \ __x = (typeof(__x))(unsigned long) \ - __insn_exch4((ptr), (u32)(unsigned long)(n)); \ + __insn_exch4((ptr), \ + (u32)(unsigned long)(n)); \ break; \ case 8: \ - __x = (typeof(__x)) \ + __x = (typeof(__x)) \ __insn_exch((ptr), (unsigned long)(n)); \ break; \ default: \ @@ -103,10 +107,12 @@ u64 _atomic64_cmpxchg(u64 *v, u64 o, u64 n); switch (sizeof(*(ptr))) { \ case 4: \ __x = (typeof(__x))(unsigned long) \ - __insn_cmpexch4((ptr), (u32)(unsigned long)(n)); \ + __insn_cmpexch4((ptr), \ + (u32)(unsigned long)(n)); \ break; \ case 8: \ - __x = (typeof(__x))__insn_cmpexch((ptr), (u64)(n)); \ + __x = (typeof(__x))__insn_cmpexch((ptr), \ + (long long)(n)); \ break; \ default: \ __cmpxchg_called_with_bad_pointer(); \ diff --git a/arch/tile/include/asm/percpu.h b/arch/tile/include/asm/percpu.h index 63294f5a8efb..4f7ae39fa202 100644 --- a/arch/tile/include/asm/percpu.h +++ b/arch/tile/include/asm/percpu.h @@ -15,9 +15,37 @@ #ifndef _ASM_TILE_PERCPU_H #define _ASM_TILE_PERCPU_H -register unsigned long __my_cpu_offset __asm__("tp"); -#define __my_cpu_offset __my_cpu_offset -#define set_my_cpu_offset(tp) (__my_cpu_offset = (tp)) +register unsigned long my_cpu_offset_reg asm("tp"); + +#ifdef CONFIG_PREEMPT +/* + * For full preemption, we can't just use the register variable + * directly, since we need barrier() to hazard against it, causing the + * compiler to reload anything computed from a previous "tp" value. + * But we also don't want to use volatile asm, since we'd like the + * compiler to be able to cache the value across multiple percpu reads. + * So we use a fake stack read as a hazard against barrier(). + * The 'U' constraint is like 'm' but disallows postincrement. + */ +static inline unsigned long __my_cpu_offset(void) +{ + unsigned long tp; + register unsigned long *sp asm("sp"); + asm("move %0, tp" : "=r" (tp) : "U" (*sp)); + return tp; +} +#define __my_cpu_offset __my_cpu_offset() +#else +/* + * We don't need to hazard against barrier() since "tp" doesn't ever + * change with PREEMPT_NONE, and with PREEMPT_VOLUNTARY it only + * changes at function call points, at which we are already re-reading + * the value of "tp" due to "my_cpu_offset_reg" being a global variable. + */ +#define __my_cpu_offset my_cpu_offset_reg +#endif + +#define set_my_cpu_offset(tp) (my_cpu_offset_reg = (tp)) #include diff --git a/arch/tile/kernel/hardwall.c b/arch/tile/kernel/hardwall.c index df27a1fd94a3..531f4c365351 100644 --- a/arch/tile/kernel/hardwall.c +++ b/arch/tile/kernel/hardwall.c @@ -66,7 +66,7 @@ static struct hardwall_type hardwall_types[] = { 0, "udn", LIST_HEAD_INIT(hardwall_types[HARDWALL_UDN].list), - __SPIN_LOCK_INITIALIZER(hardwall_types[HARDWALL_UDN].lock), + __SPIN_LOCK_UNLOCKED(hardwall_types[HARDWALL_UDN].lock), NULL }, #ifndef __tilepro__ @@ -77,7 +77,7 @@ static struct hardwall_type hardwall_types[] = { 1, /* disabled pending hypervisor support */ "idn", LIST_HEAD_INIT(hardwall_types[HARDWALL_IDN].list), - __SPIN_LOCK_INITIALIZER(hardwall_types[HARDWALL_IDN].lock), + __SPIN_LOCK_UNLOCKED(hardwall_types[HARDWALL_IDN].lock), NULL }, { /* access to user-space IPI */ @@ -87,7 +87,7 @@ static struct hardwall_type hardwall_types[] = { 0, "ipi", LIST_HEAD_INIT(hardwall_types[HARDWALL_IPI].list), - __SPIN_LOCK_INITIALIZER(hardwall_types[HARDWALL_IPI].lock), + __SPIN_LOCK_UNLOCKED(hardwall_types[HARDWALL_IPI].lock), NULL }, #endif diff --git a/arch/tile/kernel/intvec_32.S b/arch/tile/kernel/intvec_32.S index 088d5c141e68..2cbe6d5dd6b0 100644 --- a/arch/tile/kernel/intvec_32.S +++ b/arch/tile/kernel/intvec_32.S @@ -815,6 +815,9 @@ STD_ENTRY(interrupt_return) } bzt r28, 1f bnz r29, 1f + /* Disable interrupts explicitly for preemption. */ + IRQ_DISABLE(r20,r21) + TRACE_IRQS_OFF jal preempt_schedule_irq FEEDBACK_REENTER(interrupt_return) 1: diff --git a/arch/tile/kernel/intvec_64.S b/arch/tile/kernel/intvec_64.S index ec755d3f3734..b8fc497f2437 100644 --- a/arch/tile/kernel/intvec_64.S +++ b/arch/tile/kernel/intvec_64.S @@ -841,6 +841,9 @@ STD_ENTRY(interrupt_return) } beqzt r28, 1f bnez r29, 1f + /* Disable interrupts explicitly for preemption. */ + IRQ_DISABLE(r20,r21) + TRACE_IRQS_OFF jal preempt_schedule_irq FEEDBACK_REENTER(interrupt_return) 1: diff --git a/arch/tile/kernel/stack.c b/arch/tile/kernel/stack.c index 362284af3afd..c93977a62116 100644 --- a/arch/tile/kernel/stack.c +++ b/arch/tile/kernel/stack.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -332,21 +333,18 @@ static void describe_addr(struct KBacktraceIterator *kbt, } if (vma->vm_file) { - char *s; p = d_path(&vma->vm_file->f_path, buf, bufsize); if (IS_ERR(p)) p = "?"; - s = strrchr(p, '/'); - if (s) - p = s+1; + name = kbasename(p); } else { - p = "anon"; + name = "anon"; } /* Generate a string description of the vma info. */ - namelen = strlen(p); + namelen = strlen(name); remaining = (bufsize - 1) - namelen; - memmove(buf, p, namelen); + memmove(buf, name, namelen); snprintf(buf + namelen, remaining, "[%lx+%lx] ", vma->vm_start, vma->vm_end - vma->vm_start); } diff --git a/arch/tile/lib/atomic_32.c b/arch/tile/lib/atomic_32.c index 759efa337be8..c89b211fd9e7 100644 --- a/arch/tile/lib/atomic_32.c +++ b/arch/tile/lib/atomic_32.c @@ -107,19 +107,19 @@ unsigned long _atomic_xor(volatile unsigned long *p, unsigned long mask) EXPORT_SYMBOL(_atomic_xor); -u64 _atomic64_xchg(u64 *v, u64 n) +long long _atomic64_xchg(long long *v, long long n) { return __atomic64_xchg(v, __atomic_setup(v), n); } EXPORT_SYMBOL(_atomic64_xchg); -u64 _atomic64_xchg_add(u64 *v, u64 i) +long long _atomic64_xchg_add(long long *v, long long i) { return __atomic64_xchg_add(v, __atomic_setup(v), i); } EXPORT_SYMBOL(_atomic64_xchg_add); -u64 _atomic64_xchg_add_unless(u64 *v, u64 a, u64 u) +long long _atomic64_xchg_add_unless(long long *v, long long a, long long u) { /* * Note: argument order is switched here since it is easier @@ -130,7 +130,7 @@ u64 _atomic64_xchg_add_unless(u64 *v, u64 a, u64 u) } EXPORT_SYMBOL(_atomic64_xchg_add_unless); -u64 _atomic64_cmpxchg(u64 *v, u64 o, u64 n) +long long _atomic64_cmpxchg(long long *v, long long o, long long n) { return __atomic64_cmpxchg(v, __atomic_setup(v), o, n); } diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild index b30f34a79882..fdde187e6087 100644 --- a/arch/um/include/asm/Kbuild +++ b/arch/um/include/asm/Kbuild @@ -3,3 +3,4 @@ generic-y += hw_irq.h irq_regs.h kdebug.h percpu.h sections.h topology.h xor.h generic-y += ftrace.h pci.h io.h param.h delay.h mutex.h current.h exec.h generic-y += switch_to.h clkdev.h generic-y += trace_clock.h +generic-y += preempt.h diff --git a/arch/unicore32/include/asm/Kbuild b/arch/unicore32/include/asm/Kbuild index 89d8b6c4e39a..00045cbe5c63 100644 --- a/arch/unicore32/include/asm/Kbuild +++ b/arch/unicore32/include/asm/Kbuild @@ -60,3 +60,4 @@ generic-y += unaligned.h generic-y += user.h generic-y += vga.h generic-y += xor.h +generic-y += preempt.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index ee2fb9d37745..6af783f54725 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -123,6 +123,7 @@ config X86 select COMPAT_OLD_SIGACTION if IA32_EMULATION select RTC_LIB select HAVE_DEBUG_STACKOVERFLOW + select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64 config INSTRUCTION_DECODER def_bool y @@ -756,20 +757,25 @@ config DMI BIOS code. config GART_IOMMU - bool "GART IOMMU support" if EXPERT - default y + bool "Old AMD GART IOMMU support" select SWIOTLB depends on X86_64 && PCI && AMD_NB ---help--- - Support for full DMA access of devices with 32bit memory access only - on systems with more than 3GB. This is usually needed for USB, - sound, many IDE/SATA chipsets and some other devices. - Provides a driver for the AMD Athlon64/Opteron/Turion/Sempron GART - based hardware IOMMU and a software bounce buffer based IOMMU used - on Intel systems and as fallback. - The code is only active when needed (enough memory and limited - device) unless CONFIG_IOMMU_DEBUG or iommu=force is specified - too. + Provides a driver for older AMD Athlon64/Opteron/Turion/Sempron + GART based hardware IOMMUs. + + The GART supports full DMA access for devices with 32-bit access + limitations, on systems with more than 3 GB. This is usually needed + for USB, sound, many IDE/SATA chipsets and some other devices. + + Newer systems typically have a modern AMD IOMMU, supported via + the CONFIG_AMD_IOMMU=y config option. + + In normal configurations this driver is only active when needed: + there's more than 3 GB of memory and the system contains a + 32-bit limited device. + + If unsure, say Y. config CALGARY_IOMMU bool "IBM Calgary IOMMU support" @@ -860,7 +866,7 @@ source "kernel/Kconfig.preempt" config X86_UP_APIC bool "Local APIC support on uniprocessors" - depends on X86_32 && !SMP && !X86_32_NON_STANDARD + depends on X86_32 && !SMP && !X86_32_NON_STANDARD && !PCI_MSI ---help--- A local APIC (Advanced Programmable Interrupt Controller) is an integrated interrupt controller in the CPU. If you have a single-CPU @@ -885,11 +891,11 @@ config X86_UP_IOAPIC config X86_LOCAL_APIC def_bool y - depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC + depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC || PCI_MSI config X86_IO_APIC def_bool y - depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_IOAPIC + depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_IOAPIC || PCI_MSI config X86_VISWS_APIC def_bool y @@ -1033,6 +1039,7 @@ config X86_REBOOTFIXUPS config MICROCODE tristate "CPU microcode loading support" + depends on CPU_SUP_AMD || CPU_SUP_INTEL select FW_LOADER ---help--- @@ -1593,7 +1600,7 @@ config EFI_STUB This kernel feature allows a bzImage to be loaded directly by EFI firmware without the use of a bootloader. - See Documentation/x86/efi-stub.txt for more information. + See Documentation/efi-stub.txt for more information. config SECCOMP def_bool y @@ -1722,16 +1729,56 @@ config RELOCATABLE Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address it has been loaded at and the compile time physical address - (CONFIG_PHYSICAL_START) is ignored. + (CONFIG_PHYSICAL_START) is used as the minimum location. -# Relocation on x86-32 needs some additional build support +config RANDOMIZE_BASE + bool "Randomize the address of the kernel image" + depends on RELOCATABLE + depends on !HIBERNATION + default n + ---help--- + Randomizes the physical and virtual address at which the + kernel image is decompressed, as a security feature that + deters exploit attempts relying on knowledge of the location + of kernel internals. + + Entropy is generated using the RDRAND instruction if it + is supported. If not, then RDTSC is used, if supported. If + neither RDRAND nor RDTSC are supported, then no randomness + is introduced. + + The kernel will be offset by up to RANDOMIZE_BASE_MAX_OFFSET, + and aligned according to PHYSICAL_ALIGN. + +config RANDOMIZE_BASE_MAX_OFFSET + hex "Maximum ASLR offset allowed" + depends on RANDOMIZE_BASE + range 0x0 0x20000000 if X86_32 + default "0x20000000" if X86_32 + range 0x0 0x40000000 if X86_64 + default "0x40000000" if X86_64 + ---help--- + Determines the maximal offset in bytes that will be applied to the + kernel when Address Space Layout Randomization (ASLR) is active. + Must be less than or equal to the actual physical memory on the + system. This must be a multiple of CONFIG_PHYSICAL_ALIGN. + + On 32-bit this is limited to 512MiB. + + On 64-bit this is limited by how the kernel fixmap page table is + positioned, so this cannot be larger that 1GiB currently. Normally + there is a 512MiB to 1.5GiB split between kernel and modules. When + this is raised above the 512MiB default, the modules area will + shrink to compensate, up to the current maximum 1GiB to 1GiB split. + +# Relocation on x86 needs some additional build support config X86_NEED_RELOCS def_bool y - depends on X86_32 && RELOCATABLE + depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE) config PHYSICAL_ALIGN hex "Alignment value to which kernel should be aligned" - default "0x1000000" + default "0x200000" range 0x2000 0x1000000 if X86_32 range 0x200000 0x1000000 if X86_64 ---help--- diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile index 379814bc41e3..28f7db7993f5 100644 --- a/arch/x86/boot/Makefile +++ b/arch/x86/boot/Makefile @@ -20,7 +20,7 @@ targets := vmlinux.bin setup.bin setup.elf bzImage targets += fdimage fdimage144 fdimage288 image.iso mtools.conf subdir- := compressed -setup-y += a20.o bioscall.o cmdline.o copy.o cpu.o cpucheck.o +setup-y += a20.o bioscall.o cmdline.o copy.o cpu.o cpuflags.o cpucheck.o setup-y += early_serial_console.o edd.o header.o main.o mca.o memory.o setup-y += pm.o pmjump.o printf.o regs.o string.o tty.o video.o setup-y += video-mode.o version.o @@ -71,7 +71,8 @@ GCOV_PROFILE := n $(obj)/bzImage: asflags-y := $(SVGA_MODE) quiet_cmd_image = BUILD $@ -cmd_image = $(obj)/tools/build $(obj)/setup.bin $(obj)/vmlinux.bin $(obj)/zoffset.h > $@ +cmd_image = $(obj)/tools/build $(obj)/setup.bin $(obj)/vmlinux.bin \ + $(obj)/zoffset.h $@ $(obj)/bzImage: $(obj)/setup.bin $(obj)/vmlinux.bin $(obj)/tools/build FORCE $(call if_changed,image) diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h index ef72baeff484..50f8c5e0f37e 100644 --- a/arch/x86/boot/boot.h +++ b/arch/x86/boot/boot.h @@ -26,9 +26,8 @@ #include #include #include "bitops.h" -#include -#include #include "ctype.h" +#include "cpuflags.h" /* Useful macros */ #define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)])) @@ -307,14 +306,7 @@ static inline int cmdline_find_option_bool(const char *option) return __cmdline_find_option_bool(cmd_line_ptr, option); } - /* cpu.c, cpucheck.c */ -struct cpu_features { - int level; /* Family, or 64 for x86-64 */ - int model; - u32 flags[NCAPINTS]; -}; -extern struct cpu_features cpu; int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr); int validate_cpu(void); diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index dcd90df10ab4..ae8b5dbbd8c5 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -27,7 +27,7 @@ HOST_EXTRACFLAGS += -I$(srctree)/tools/include VMLINUX_OBJS = $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \ $(obj)/string.o $(obj)/cmdline.o $(obj)/early_serial_console.o \ - $(obj)/piggy.o + $(obj)/piggy.o $(obj)/cpuflags.o $(obj)/aslr.o $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c new file mode 100644 index 000000000000..05957986d123 --- /dev/null +++ b/arch/x86/boot/compressed/aslr.c @@ -0,0 +1,267 @@ +#include "misc.h" + +#ifdef CONFIG_RANDOMIZE_BASE +#include +#include +#include + +#define I8254_PORT_CONTROL 0x43 +#define I8254_PORT_COUNTER0 0x40 +#define I8254_CMD_READBACK 0xC0 +#define I8254_SELECT_COUNTER0 0x02 +#define I8254_STATUS_NOTREADY 0x40 +static inline u16 i8254(void) +{ + u16 status, timer; + + do { + outb(I8254_PORT_CONTROL, + I8254_CMD_READBACK | I8254_SELECT_COUNTER0); + status = inb(I8254_PORT_COUNTER0); + timer = inb(I8254_PORT_COUNTER0); + timer |= inb(I8254_PORT_COUNTER0) << 8; + } while (status & I8254_STATUS_NOTREADY); + + return timer; +} + +static unsigned long get_random_long(void) +{ + unsigned long random; + + if (has_cpuflag(X86_FEATURE_RDRAND)) { + debug_putstr("KASLR using RDRAND...\n"); + if (rdrand_long(&random)) + return random; + } + + if (has_cpuflag(X86_FEATURE_TSC)) { + uint32_t raw; + + debug_putstr("KASLR using RDTSC...\n"); + rdtscl(raw); + + /* Only use the low bits of rdtsc. */ + random = raw & 0xffff; + } else { + debug_putstr("KASLR using i8254...\n"); + random = i8254(); + } + + /* Extend timer bits poorly... */ + random |= (random << 16); +#ifdef CONFIG_X86_64 + random |= (random << 32); +#endif + return random; +} + +struct mem_vector { + unsigned long start; + unsigned long size; +}; + +#define MEM_AVOID_MAX 5 +struct mem_vector mem_avoid[MEM_AVOID_MAX]; + +static bool mem_contains(struct mem_vector *region, struct mem_vector *item) +{ + /* Item at least partially before region. */ + if (item->start < region->start) + return false; + /* Item at least partially after region. */ + if (item->start + item->size > region->start + region->size) + return false; + return true; +} + +static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two) +{ + /* Item one is entirely before item two. */ + if (one->start + one->size <= two->start) + return false; + /* Item one is entirely after item two. */ + if (one->start >= two->start + two->size) + return false; + return true; +} + +static void mem_avoid_init(unsigned long input, unsigned long input_size, + unsigned long output, unsigned long output_size) +{ + u64 initrd_start, initrd_size; + u64 cmd_line, cmd_line_size; + unsigned long unsafe, unsafe_len; + char *ptr; + + /* + * Avoid the region that is unsafe to overlap during + * decompression (see calculations at top of misc.c). + */ + unsafe_len = (output_size >> 12) + 32768 + 18; + unsafe = (unsigned long)input + input_size - unsafe_len; + mem_avoid[0].start = unsafe; + mem_avoid[0].size = unsafe_len; + + /* Avoid initrd. */ + initrd_start = (u64)real_mode->ext_ramdisk_image << 32; + initrd_start |= real_mode->hdr.ramdisk_image; + initrd_size = (u64)real_mode->ext_ramdisk_size << 32; + initrd_size |= real_mode->hdr.ramdisk_size; + mem_avoid[1].start = initrd_start; + mem_avoid[1].size = initrd_size; + + /* Avoid kernel command line. */ + cmd_line = (u64)real_mode->ext_cmd_line_ptr << 32; + cmd_line |= real_mode->hdr.cmd_line_ptr; + /* Calculate size of cmd_line. */ + ptr = (char *)(unsigned long)cmd_line; + for (cmd_line_size = 0; ptr[cmd_line_size++]; ) + ; + mem_avoid[2].start = cmd_line; + mem_avoid[2].size = cmd_line_size; + + /* Avoid heap memory. */ + mem_avoid[3].start = (unsigned long)free_mem_ptr; + mem_avoid[3].size = BOOT_HEAP_SIZE; + + /* Avoid stack memory. */ + mem_avoid[4].start = (unsigned long)free_mem_end_ptr; + mem_avoid[4].size = BOOT_STACK_SIZE; +} + +/* Does this memory vector overlap a known avoided area? */ +bool mem_avoid_overlap(struct mem_vector *img) +{ + int i; + + for (i = 0; i < MEM_AVOID_MAX; i++) { + if (mem_overlaps(img, &mem_avoid[i])) + return true; + } + + return false; +} + +unsigned long slots[CONFIG_RANDOMIZE_BASE_MAX_OFFSET / CONFIG_PHYSICAL_ALIGN]; +unsigned long slot_max = 0; + +static void slots_append(unsigned long addr) +{ + /* Overflowing the slots list should be impossible. */ + if (slot_max >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET / + CONFIG_PHYSICAL_ALIGN) + return; + + slots[slot_max++] = addr; +} + +static unsigned long slots_fetch_random(void) +{ + /* Handle case of no slots stored. */ + if (slot_max == 0) + return 0; + + return slots[get_random_long() % slot_max]; +} + +static void process_e820_entry(struct e820entry *entry, + unsigned long minimum, + unsigned long image_size) +{ + struct mem_vector region, img; + + /* Skip non-RAM entries. */ + if (entry->type != E820_RAM) + return; + + /* Ignore entries entirely above our maximum. */ + if (entry->addr >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET) + return; + + /* Ignore entries entirely below our minimum. */ + if (entry->addr + entry->size < minimum) + return; + + region.start = entry->addr; + region.size = entry->size; + + /* Potentially raise address to minimum location. */ + if (region.start < minimum) + region.start = minimum; + + /* Potentially raise address to meet alignment requirements. */ + region.start = ALIGN(region.start, CONFIG_PHYSICAL_ALIGN); + + /* Did we raise the address above the bounds of this e820 region? */ + if (region.start > entry->addr + entry->size) + return; + + /* Reduce size by any delta from the original address. */ + region.size -= region.start - entry->addr; + + /* Reduce maximum size to fit end of image within maximum limit. */ + if (region.start + region.size > CONFIG_RANDOMIZE_BASE_MAX_OFFSET) + region.size = CONFIG_RANDOMIZE_BASE_MAX_OFFSET - region.start; + + /* Walk each aligned slot and check for avoided areas. */ + for (img.start = region.start, img.size = image_size ; + mem_contains(®ion, &img) ; + img.start += CONFIG_PHYSICAL_ALIGN) { + if (mem_avoid_overlap(&img)) + continue; + slots_append(img.start); + } +} + +static unsigned long find_random_addr(unsigned long minimum, + unsigned long size) +{ + int i; + unsigned long addr; + + /* Make sure minimum is aligned. */ + minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN); + + /* Verify potential e820 positions, appending to slots list. */ + for (i = 0; i < real_mode->e820_entries; i++) { + process_e820_entry(&real_mode->e820_map[i], minimum, size); + } + + return slots_fetch_random(); +} + +unsigned char *choose_kernel_location(unsigned char *input, + unsigned long input_size, + unsigned char *output, + unsigned long output_size) +{ + unsigned long choice = (unsigned long)output; + unsigned long random; + + if (cmdline_find_option_bool("nokaslr")) { + debug_putstr("KASLR disabled...\n"); + goto out; + } + + /* Record the various known unsafe memory ranges. */ + mem_avoid_init((unsigned long)input, input_size, + (unsigned long)output, output_size); + + /* Walk e820 and find a random address. */ + random = find_random_addr(choice, output_size); + if (!random) { + debug_putstr("KASLR could not find suitable E820 region...\n"); + goto out; + } + + /* Always enforce the minimum. */ + if (random < choice) + goto out; + + choice = random; +out: + return (unsigned char *)choice; +} + +#endif /* CONFIG_RANDOMIZE_BASE */ diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c index bffd73b45b1f..b68e3033e6b9 100644 --- a/arch/x86/boot/compressed/cmdline.c +++ b/arch/x86/boot/compressed/cmdline.c @@ -1,6 +1,6 @@ #include "misc.h" -#ifdef CONFIG_EARLY_PRINTK +#if CONFIG_EARLY_PRINTK || CONFIG_RANDOMIZE_BASE static unsigned long fs; static inline void set_fs(unsigned long seg) diff --git a/arch/x86/boot/compressed/cpuflags.c b/arch/x86/boot/compressed/cpuflags.c new file mode 100644 index 000000000000..aa313466118b --- /dev/null +++ b/arch/x86/boot/compressed/cpuflags.c @@ -0,0 +1,12 @@ +#ifdef CONFIG_RANDOMIZE_BASE + +#include "../cpuflags.c" + +bool has_cpuflag(int flag) +{ + get_cpuflags(); + + return test_bit(flag, cpu.flags); +} + +#endif diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c index b7388a425f09..a7677babf946 100644 --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -19,214 +19,10 @@ static efi_system_table_t *sys_table; -static void efi_char16_printk(efi_char16_t *str) -{ - struct efi_simple_text_output_protocol *out; - - out = (struct efi_simple_text_output_protocol *)sys_table->con_out; - efi_call_phys2(out->output_string, out, str); -} - -static void efi_printk(char *str) -{ - char *s8; - - for (s8 = str; *s8; s8++) { - efi_char16_t ch[2] = { 0 }; - - ch[0] = *s8; - if (*s8 == '\n') { - efi_char16_t nl[2] = { '\r', 0 }; - efi_char16_printk(nl); - } - - efi_char16_printk(ch); - } -} - -static efi_status_t __get_map(efi_memory_desc_t **map, unsigned long *map_size, - unsigned long *desc_size) -{ - efi_memory_desc_t *m = NULL; - efi_status_t status; - unsigned long key; - u32 desc_version; - - *map_size = sizeof(*m) * 32; -again: - /* - * Add an additional efi_memory_desc_t because we're doing an - * allocation which may be in a new descriptor region. - */ - *map_size += sizeof(*m); - status = efi_call_phys3(sys_table->boottime->allocate_pool, - EFI_LOADER_DATA, *map_size, (void **)&m); - if (status != EFI_SUCCESS) - goto fail; - - status = efi_call_phys5(sys_table->boottime->get_memory_map, map_size, - m, &key, desc_size, &desc_version); - if (status == EFI_BUFFER_TOO_SMALL) { - efi_call_phys1(sys_table->boottime->free_pool, m); - goto again; - } - - if (status != EFI_SUCCESS) - efi_call_phys1(sys_table->boottime->free_pool, m); -fail: - *map = m; - return status; -} - -/* - * Allocate at the highest possible address that is not above 'max'. - */ -static efi_status_t high_alloc(unsigned long size, unsigned long align, - unsigned long *addr, unsigned long max) -{ - unsigned long map_size, desc_size; - efi_memory_desc_t *map; - efi_status_t status; - unsigned long nr_pages; - u64 max_addr = 0; - int i; - - status = __get_map(&map, &map_size, &desc_size); - if (status != EFI_SUCCESS) - goto fail; - - nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE; -again: - for (i = 0; i < map_size / desc_size; i++) { - efi_memory_desc_t *desc; - unsigned long m = (unsigned long)map; - u64 start, end; - - desc = (efi_memory_desc_t *)(m + (i * desc_size)); - if (desc->type != EFI_CONVENTIONAL_MEMORY) - continue; - - if (desc->num_pages < nr_pages) - continue; +#include "../../../../drivers/firmware/efi/efi-stub-helper.c" - start = desc->phys_addr; - end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT); - if ((start + size) > end || (start + size) > max) - continue; - - if (end - size > max) - end = max; - - if (round_down(end - size, align) < start) - continue; - - start = round_down(end - size, align); - - /* - * Don't allocate at 0x0. It will confuse code that - * checks pointers against NULL. - */ - if (start == 0x0) - continue; - - if (start > max_addr) - max_addr = start; - } - - if (!max_addr) - status = EFI_NOT_FOUND; - else { - status = efi_call_phys4(sys_table->boottime->allocate_pages, - EFI_ALLOCATE_ADDRESS, EFI_LOADER_DATA, - nr_pages, &max_addr); - if (status != EFI_SUCCESS) { - max = max_addr; - max_addr = 0; - goto again; - } - - *addr = max_addr; - } - -free_pool: - efi_call_phys1(sys_table->boottime->free_pool, map); - -fail: - return status; -} - -/* - * Allocate at the lowest possible address. - */ -static efi_status_t low_alloc(unsigned long size, unsigned long align, - unsigned long *addr) -{ - unsigned long map_size, desc_size; - efi_memory_desc_t *map; - efi_status_t status; - unsigned long nr_pages; - int i; - - status = __get_map(&map, &map_size, &desc_size); - if (status != EFI_SUCCESS) - goto fail; - - nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE; - for (i = 0; i < map_size / desc_size; i++) { - efi_memory_desc_t *desc; - unsigned long m = (unsigned long)map; - u64 start, end; - - desc = (efi_memory_desc_t *)(m + (i * desc_size)); - - if (desc->type != EFI_CONVENTIONAL_MEMORY) - continue; - - if (desc->num_pages < nr_pages) - continue; - - start = desc->phys_addr; - end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT); - - /* - * Don't allocate at 0x0. It will confuse code that - * checks pointers against NULL. Skip the first 8 - * bytes so we start at a nice even number. - */ - if (start == 0x0) - start += 8; - - start = round_up(start, align); - if ((start + size) > end) - continue; - - status = efi_call_phys4(sys_table->boottime->allocate_pages, - EFI_ALLOCATE_ADDRESS, EFI_LOADER_DATA, - nr_pages, &start); - if (status == EFI_SUCCESS) { - *addr = start; - break; - } - } - - if (i == map_size / desc_size) - status = EFI_NOT_FOUND; - -free_pool: - efi_call_phys1(sys_table->boottime->free_pool, map); -fail: - return status; -} - -static void low_free(unsigned long size, unsigned long addr) -{ - unsigned long nr_pages; - - nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE; - efi_call_phys2(sys_table->boottime->free_pages, addr, nr_pages); -} static void find_bits(unsigned long mask, u8 *pos, u8 *size) { @@ -624,242 +420,6 @@ void setup_graphics(struct boot_params *boot_params) } } -struct initrd { - efi_file_handle_t *handle; - u64 size; -}; - -/* - * Check the cmdline for a LILO-style initrd= arguments. - * - * We only support loading an initrd from the same filesystem as the - * kernel image. - */ -static efi_status_t handle_ramdisks(efi_loaded_image_t *image, - struct setup_header *hdr) -{ - struct initrd *initrds; - unsigned long initrd_addr; - efi_guid_t fs_proto = EFI_FILE_SYSTEM_GUID; - u64 initrd_total; - efi_file_io_interface_t *io; - efi_file_handle_t *fh; - efi_status_t status; - int nr_initrds; - char *str; - int i, j, k; - - initrd_addr = 0; - initrd_total = 0; - - str = (char *)(unsigned long)hdr->cmd_line_ptr; - - j = 0; /* See close_handles */ - - if (!str || !*str) - return EFI_SUCCESS; - - for (nr_initrds = 0; *str; nr_initrds++) { - str = strstr(str, "initrd="); - if (!str) - break; - - str += 7; - - /* Skip any leading slashes */ - while (*str == '/' || *str == '\\') - str++; - - while (*str && *str != ' ' && *str != '\n') - str++; - } - - if (!nr_initrds) - return EFI_SUCCESS; - - status = efi_call_phys3(sys_table->boottime->allocate_pool, - EFI_LOADER_DATA, - nr_initrds * sizeof(*initrds), - &initrds); - if (status != EFI_SUCCESS) { - efi_printk("Failed to alloc mem for initrds\n"); - goto fail; - } - - str = (char *)(unsigned long)hdr->cmd_line_ptr; - for (i = 0; i < nr_initrds; i++) { - struct initrd *initrd; - efi_file_handle_t *h; - efi_file_info_t *info; - efi_char16_t filename_16[256]; - unsigned long info_sz; - efi_guid_t info_guid = EFI_FILE_INFO_ID; - efi_char16_t *p; - u64 file_sz; - - str = strstr(str, "initrd="); - if (!str) - break; - - str += 7; - - initrd = &initrds[i]; - p = filename_16; - - /* Skip any leading slashes */ - while (*str == '/' || *str == '\\') - str++; - - while (*str && *str != ' ' && *str != '\n') { - if ((u8 *)p >= (u8 *)filename_16 + sizeof(filename_16)) - break; - - if (*str == '/') { - *p++ = '\\'; - *str++; - } else { - *p++ = *str++; - } - } - - *p = '\0'; - - /* Only open the volume once. */ - if (!i) { - efi_boot_services_t *boottime; - - boottime = sys_table->boottime; - - status = efi_call_phys3(boottime->handle_protocol, - image->device_handle, &fs_proto, &io); - if (status != EFI_SUCCESS) { - efi_printk("Failed to handle fs_proto\n"); - goto free_initrds; - } - - status = efi_call_phys2(io->open_volume, io, &fh); - if (status != EFI_SUCCESS) { - efi_printk("Failed to open volume\n"); - goto free_initrds; - } - } - - status = efi_call_phys5(fh->open, fh, &h, filename_16, - EFI_FILE_MODE_READ, (u64)0); - if (status != EFI_SUCCESS) { - efi_printk("Failed to open initrd file: "); - efi_char16_printk(filename_16); - efi_printk("\n"); - goto close_handles; - } - - initrd->handle = h; - - info_sz = 0; - status = efi_call_phys4(h->get_info, h, &info_guid, - &info_sz, NULL); - if (status != EFI_BUFFER_TOO_SMALL) { - efi_printk("Failed to get initrd info size\n"); - goto close_handles; - } - -grow: - status = efi_call_phys3(sys_table->boottime->allocate_pool, - EFI_LOADER_DATA, info_sz, &info); - if (status != EFI_SUCCESS) { - efi_printk("Failed to alloc mem for initrd info\n"); - goto close_handles; - } - - status = efi_call_phys4(h->get_info, h, &info_guid, - &info_sz, info); - if (status == EFI_BUFFER_TOO_SMALL) { - efi_call_phys1(sys_table->boottime->free_pool, info); - goto grow; - } - - file_sz = info->file_size; - efi_call_phys1(sys_table->boottime->free_pool, info); - - if (status != EFI_SUCCESS) { - efi_printk("Failed to get initrd info\n"); - goto close_handles; - } - - initrd->size = file_sz; - initrd_total += file_sz; - } - - if (initrd_total) { - unsigned long addr; - - /* - * Multiple initrd's need to be at consecutive - * addresses in memory, so allocate enough memory for - * all the initrd's. - */ - status = high_alloc(initrd_total, 0x1000, - &initrd_addr, hdr->initrd_addr_max); - if (status != EFI_SUCCESS) { - efi_printk("Failed to alloc highmem for initrds\n"); - goto close_handles; - } - - /* We've run out of free low memory. */ - if (initrd_addr > hdr->initrd_addr_max) { - efi_printk("We've run out of free low memory\n"); - status = EFI_INVALID_PARAMETER; - goto free_initrd_total; - } - - addr = initrd_addr; - for (j = 0; j < nr_initrds; j++) { - u64 size; - - size = initrds[j].size; - while (size) { - u64 chunksize; - if (size > EFI_READ_CHUNK_SIZE) - chunksize = EFI_READ_CHUNK_SIZE; - else - chunksize = size; - status = efi_call_phys3(fh->read, - initrds[j].handle, - &chunksize, addr); - if (status != EFI_SUCCESS) { - efi_printk("Failed to read initrd\n"); - goto free_initrd_total; - } - addr += chunksize; - size -= chunksize; - } - - efi_call_phys1(fh->close, initrds[j].handle); - } - - } - - efi_call_phys1(sys_table->boottime->free_pool, initrds); - - hdr->ramdisk_image = initrd_addr; - hdr->ramdisk_size = initrd_total; - - return status; - -free_initrd_total: - low_free(initrd_total, initrd_addr); - -close_handles: - for (k = j; k < i; k++) - efi_call_phys1(fh->close, initrds[k].handle); -free_initrds: - efi_call_phys1(sys_table->boottime->free_pool, initrds); -fail: - hdr->ramdisk_image = 0; - hdr->ramdisk_size = 0; - - return status; -} /* * Because the x86 boot code expects to be passed a boot_params we @@ -875,14 +435,15 @@ struct boot_params *make_boot_params(void *handle, efi_system_table_t *_table) struct efi_info *efi; efi_loaded_image_t *image; void *options; - u32 load_options_size; efi_guid_t proto = LOADED_IMAGE_PROTOCOL_GUID; int options_size = 0; efi_status_t status; - unsigned long cmdline; + char *cmdline_ptr; u16 *s2; u8 *s1; int i; + unsigned long ramdisk_addr; + unsigned long ramdisk_size; sys_table = _table; @@ -893,13 +454,14 @@ struct boot_params *make_boot_params(void *handle, efi_system_table_t *_table) status = efi_call_phys3(sys_table->boottime->handle_protocol, handle, &proto, (void *)&image); if (status != EFI_SUCCESS) { - efi_printk("Failed to get handle for LOADED_IMAGE_PROTOCOL\n"); + efi_printk(sys_table, "Failed to get handle for LOADED_IMAGE_PROTOCOL\n"); return NULL; } - status = low_alloc(0x4000, 1, (unsigned long *)&boot_params); + status = efi_low_alloc(sys_table, 0x4000, 1, + (unsigned long *)&boot_params); if (status != EFI_SUCCESS) { - efi_printk("Failed to alloc lowmem for boot params\n"); + efi_printk(sys_table, "Failed to alloc lowmem for boot params\n"); return NULL; } @@ -926,40 +488,11 @@ struct boot_params *make_boot_params(void *handle, efi_system_table_t *_table) hdr->type_of_loader = 0x21; /* Convert unicode cmdline to ascii */ - options = image->load_options; - load_options_size = image->load_options_size / 2; /* ASCII */ - cmdline = 0; - s2 = (u16 *)options; - - if (s2) { - while (*s2 && *s2 != '\n' && options_size < load_options_size) { - s2++; - options_size++; - } - - if (options_size) { - if (options_size > hdr->cmdline_size) - options_size = hdr->cmdline_size; - - options_size++; /* NUL termination */ - - status = low_alloc(options_size, 1, &cmdline); - if (status != EFI_SUCCESS) { - efi_printk("Failed to alloc mem for cmdline\n"); - goto fail; - } - - s1 = (u8 *)(unsigned long)cmdline; - s2 = (u16 *)options; - - for (i = 0; i < options_size - 1; i++) - *s1++ = *s2++; - - *s1 = '\0'; - } - } - - hdr->cmd_line_ptr = cmdline; + cmdline_ptr = efi_convert_cmdline_to_ascii(sys_table, image, + &options_size); + if (!cmdline_ptr) + goto fail; + hdr->cmd_line_ptr = (unsigned long)cmdline_ptr; hdr->ramdisk_image = 0; hdr->ramdisk_size = 0; @@ -969,96 +502,64 @@ struct boot_params *make_boot_params(void *handle, efi_system_table_t *_table) memset(sdt, 0, sizeof(*sdt)); - status = handle_ramdisks(image, hdr); + status = handle_cmdline_files(sys_table, image, + (char *)(unsigned long)hdr->cmd_line_ptr, + "initrd=", hdr->initrd_addr_max, + &ramdisk_addr, &ramdisk_size); if (status != EFI_SUCCESS) goto fail2; + hdr->ramdisk_image = ramdisk_addr; + hdr->ramdisk_size = ramdisk_size; return boot_params; fail2: - if (options_size) - low_free(options_size, hdr->cmd_line_ptr); + efi_free(sys_table, options_size, hdr->cmd_line_ptr); fail: - low_free(0x4000, (unsigned long)boot_params); + efi_free(sys_table, 0x4000, (unsigned long)boot_params); return NULL; } -static efi_status_t exit_boot(struct boot_params *boot_params, - void *handle) +static void add_e820ext(struct boot_params *params, + struct setup_data *e820ext, u32 nr_entries) { - struct efi_info *efi = &boot_params->efi_info; - struct e820entry *e820_map = &boot_params->e820_map[0]; - struct e820entry *prev = NULL; - unsigned long size, key, desc_size, _size; - efi_memory_desc_t *mem_map; + struct setup_data *data; efi_status_t status; - __u32 desc_version; - bool called_exit = false; - u8 nr_entries; - int i; - - size = sizeof(*mem_map) * 32; - -again: - size += sizeof(*mem_map) * 2; - _size = size; - status = low_alloc(size, 1, (unsigned long *)&mem_map); - if (status != EFI_SUCCESS) - return status; - -get_map: - status = efi_call_phys5(sys_table->boottime->get_memory_map, &size, - mem_map, &key, &desc_size, &desc_version); - if (status == EFI_BUFFER_TOO_SMALL) { - low_free(_size, (unsigned long)mem_map); - goto again; - } + unsigned long size; - if (status != EFI_SUCCESS) - goto free_mem_map; + e820ext->type = SETUP_E820_EXT; + e820ext->len = nr_entries * sizeof(struct e820entry); + e820ext->next = 0; - memcpy(&efi->efi_loader_signature, EFI_LOADER_SIGNATURE, sizeof(__u32)); - efi->efi_systab = (unsigned long)sys_table; - efi->efi_memdesc_size = desc_size; - efi->efi_memdesc_version = desc_version; - efi->efi_memmap = (unsigned long)mem_map; - efi->efi_memmap_size = size; - -#ifdef CONFIG_X86_64 - efi->efi_systab_hi = (unsigned long)sys_table >> 32; - efi->efi_memmap_hi = (unsigned long)mem_map >> 32; -#endif + data = (struct setup_data *)(unsigned long)params->hdr.setup_data; - /* Might as well exit boot services now */ - status = efi_call_phys2(sys_table->boottime->exit_boot_services, - handle, key); - if (status != EFI_SUCCESS) { - /* - * ExitBootServices() will fail if any of the event - * handlers change the memory map. In which case, we - * must be prepared to retry, but only once so that - * we're guaranteed to exit on repeated failures instead - * of spinning forever. - */ - if (called_exit) - goto free_mem_map; + while (data && data->next) + data = (struct setup_data *)(unsigned long)data->next; - called_exit = true; - goto get_map; - } + if (data) + data->next = (unsigned long)e820ext; + else + params->hdr.setup_data = (unsigned long)e820ext; +} - /* Historic? */ - boot_params->alt_mem_k = 32 * 1024; +static efi_status_t setup_e820(struct boot_params *params, + struct setup_data *e820ext, u32 e820ext_size) +{ + struct e820entry *e820_map = ¶ms->e820_map[0]; + struct efi_info *efi = ¶ms->efi_info; + struct e820entry *prev = NULL; + u32 nr_entries; + u32 nr_desc; + int i; - /* - * Convert the EFI memory map to E820. - */ nr_entries = 0; - for (i = 0; i < size / desc_size; i++) { + nr_desc = efi->efi_memmap_size / efi->efi_memdesc_size; + + for (i = 0; i < nr_desc; i++) { efi_memory_desc_t *d; unsigned int e820_type = 0; - unsigned long m = (unsigned long)mem_map; + unsigned long m = efi->efi_memmap; - d = (efi_memory_desc_t *)(m + (i * desc_size)); + d = (efi_memory_desc_t *)(m + (i * efi->efi_memdesc_size)); switch (d->type) { case EFI_RESERVED_TYPE: case EFI_RUNTIME_SERVICES_CODE: @@ -1095,61 +596,151 @@ get_map: /* Merge adjacent mappings */ if (prev && prev->type == e820_type && - (prev->addr + prev->size) == d->phys_addr) + (prev->addr + prev->size) == d->phys_addr) { prev->size += d->num_pages << 12; - else { - e820_map->addr = d->phys_addr; - e820_map->size = d->num_pages << 12; - e820_map->type = e820_type; - prev = e820_map++; - nr_entries++; + continue; + } + + if (nr_entries == ARRAY_SIZE(params->e820_map)) { + u32 need = (nr_desc - i) * sizeof(struct e820entry) + + sizeof(struct setup_data); + + if (!e820ext || e820ext_size < need) + return EFI_BUFFER_TOO_SMALL; + + /* boot_params map full, switch to e820 extended */ + e820_map = (struct e820entry *)e820ext->data; } + + e820_map->addr = d->phys_addr; + e820_map->size = d->num_pages << PAGE_SHIFT; + e820_map->type = e820_type; + prev = e820_map++; + nr_entries++; } - boot_params->e820_entries = nr_entries; + if (nr_entries > ARRAY_SIZE(params->e820_map)) { + u32 nr_e820ext = nr_entries - ARRAY_SIZE(params->e820_map); + + add_e820ext(params, e820ext, nr_e820ext); + nr_entries -= nr_e820ext; + } + + params->e820_entries = (u8)nr_entries; return EFI_SUCCESS; +} + +static efi_status_t alloc_e820ext(u32 nr_desc, struct setup_data **e820ext, + u32 *e820ext_size) +{ + efi_status_t status; + unsigned long size; + + size = sizeof(struct setup_data) + + sizeof(struct e820entry) * nr_desc; + + if (*e820ext) { + efi_call_phys1(sys_table->boottime->free_pool, *e820ext); + *e820ext = NULL; + *e820ext_size = 0; + } + + status = efi_call_phys3(sys_table->boottime->allocate_pool, + EFI_LOADER_DATA, size, e820ext); + + if (status == EFI_SUCCESS) + *e820ext_size = size; -free_mem_map: - low_free(_size, (unsigned long)mem_map); return status; } -static efi_status_t relocate_kernel(struct setup_header *hdr) +static efi_status_t exit_boot(struct boot_params *boot_params, + void *handle) { - unsigned long start, nr_pages; + struct efi_info *efi = &boot_params->efi_info; + unsigned long map_sz, key, desc_size; + efi_memory_desc_t *mem_map; + struct setup_data *e820ext; + __u32 e820ext_size; + __u32 nr_desc, prev_nr_desc; efi_status_t status; + __u32 desc_version; + bool called_exit = false; + u8 nr_entries; + int i; - /* - * The EFI firmware loader could have placed the kernel image - * anywhere in memory, but the kernel has various restrictions - * on the max physical address it can run at. Attempt to move - * the kernel to boot_params.pref_address, or as low as - * possible. - */ - start = hdr->pref_address; - nr_pages = round_up(hdr->init_size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE; + nr_desc = 0; + e820ext = NULL; + e820ext_size = 0; - status = efi_call_phys4(sys_table->boottime->allocate_pages, - EFI_ALLOCATE_ADDRESS, EFI_LOADER_DATA, - nr_pages, &start); - if (status != EFI_SUCCESS) { - status = low_alloc(hdr->init_size, hdr->kernel_alignment, - &start); +get_map: + status = efi_get_memory_map(sys_table, &mem_map, &map_sz, &desc_size, + &desc_version, &key); + + if (status != EFI_SUCCESS) + return status; + + prev_nr_desc = nr_desc; + nr_desc = map_sz / desc_size; + if (nr_desc > prev_nr_desc && + nr_desc > ARRAY_SIZE(boot_params->e820_map)) { + u32 nr_e820ext = nr_desc - ARRAY_SIZE(boot_params->e820_map); + + status = alloc_e820ext(nr_e820ext, &e820ext, &e820ext_size); if (status != EFI_SUCCESS) - efi_printk("Failed to alloc mem for kernel\n"); + goto free_mem_map; + + efi_call_phys1(sys_table->boottime->free_pool, mem_map); + goto get_map; /* Allocated memory, get map again */ } - if (status == EFI_SUCCESS) - memcpy((void *)start, (void *)(unsigned long)hdr->code32_start, - hdr->init_size); + memcpy(&efi->efi_loader_signature, EFI_LOADER_SIGNATURE, sizeof(__u32)); + efi->efi_systab = (unsigned long)sys_table; + efi->efi_memdesc_size = desc_size; + efi->efi_memdesc_version = desc_version; + efi->efi_memmap = (unsigned long)mem_map; + efi->efi_memmap_size = map_sz; + +#ifdef CONFIG_X86_64 + efi->efi_systab_hi = (unsigned long)sys_table >> 32; + efi->efi_memmap_hi = (unsigned long)mem_map >> 32; +#endif - hdr->pref_address = hdr->code32_start; - hdr->code32_start = (__u32)start; + /* Might as well exit boot services now */ + status = efi_call_phys2(sys_table->boottime->exit_boot_services, + handle, key); + if (status != EFI_SUCCESS) { + /* + * ExitBootServices() will fail if any of the event + * handlers change the memory map. In which case, we + * must be prepared to retry, but only once so that + * we're guaranteed to exit on repeated failures instead + * of spinning forever. + */ + if (called_exit) + goto free_mem_map; + called_exit = true; + efi_call_phys1(sys_table->boottime->free_pool, mem_map); + goto get_map; + } + + /* Historic? */ + boot_params->alt_mem_k = 32 * 1024; + + status = setup_e820(boot_params, e820ext, e820ext_size); + if (status != EFI_SUCCESS) + return status; + + return EFI_SUCCESS; + +free_mem_map: + efi_call_phys1(sys_table->boottime->free_pool, mem_map); return status; } + /* * On success we return a pointer to a boot_params structure, and NULL * on failure. @@ -1157,7 +748,7 @@ static efi_status_t relocate_kernel(struct setup_header *hdr) struct boot_params *efi_main(void *handle, efi_system_table_t *_table, struct boot_params *boot_params) { - struct desc_ptr *gdt, *idt; + struct desc_ptr *gdt; efi_loaded_image_t *image; struct setup_header *hdr = &boot_params->hdr; efi_status_t status; @@ -1177,37 +768,33 @@ struct boot_params *efi_main(void *handle, efi_system_table_t *_table, EFI_LOADER_DATA, sizeof(*gdt), (void **)&gdt); if (status != EFI_SUCCESS) { - efi_printk("Failed to alloc mem for gdt structure\n"); + efi_printk(sys_table, "Failed to alloc mem for gdt structure\n"); goto fail; } gdt->size = 0x800; - status = low_alloc(gdt->size, 8, (unsigned long *)&gdt->address); - if (status != EFI_SUCCESS) { - efi_printk("Failed to alloc mem for gdt\n"); - goto fail; - } - - status = efi_call_phys3(sys_table->boottime->allocate_pool, - EFI_LOADER_DATA, sizeof(*idt), - (void **)&idt); + status = efi_low_alloc(sys_table, gdt->size, 8, + (unsigned long *)&gdt->address); if (status != EFI_SUCCESS) { - efi_printk("Failed to alloc mem for idt structure\n"); + efi_printk(sys_table, "Failed to alloc mem for gdt\n"); goto fail; } - idt->size = 0; - idt->address = 0; - /* * If the kernel isn't already loaded at the preferred load * address, relocate it. */ if (hdr->pref_address != hdr->code32_start) { - status = relocate_kernel(hdr); - + unsigned long bzimage_addr = hdr->code32_start; + status = efi_relocate_kernel(sys_table, &bzimage_addr, + hdr->init_size, hdr->init_size, + hdr->pref_address, + hdr->kernel_alignment); if (status != EFI_SUCCESS) goto fail; + + hdr->pref_address = hdr->code32_start; + hdr->code32_start = bzimage_addr; } status = exit_boot(boot_params, handle); @@ -1267,10 +854,8 @@ struct boot_params *efi_main(void *handle, efi_system_table_t *_table, desc->base2 = 0x00; #endif /* CONFIG_X86_64 */ - asm volatile ("lidt %0" : : "m" (*idt)); - asm volatile ("lgdt %0" : : "m" (*gdt)); - asm volatile("cli"); + asm volatile ("lgdt %0" : : "m" (*gdt)); return boot_params; fail: diff --git a/arch/x86/boot/compressed/eboot.h b/arch/x86/boot/compressed/eboot.h index e5b0a8f91c5f..81b6b652b46a 100644 --- a/arch/x86/boot/compressed/eboot.h +++ b/arch/x86/boot/compressed/eboot.h @@ -11,9 +11,6 @@ #define DESC_TYPE_CODE_DATA (1 << 0) -#define EFI_PAGE_SIZE (1UL << EFI_PAGE_SHIFT) -#define EFI_READ_CHUNK_SIZE (1024 * 1024) - #define EFI_CONSOLE_OUT_DEVICE_GUID \ EFI_GUID(0xd3b36f2c, 0xd551, 0x11d4, 0x9a, 0x46, 0x0, 0x90, 0x27, \ 0x3f, 0xc1, 0x4d) @@ -62,10 +59,4 @@ struct efi_uga_draw_protocol { void *blt; }; -struct efi_simple_text_output_protocol { - void *reset; - void *output_string; - void *test_string; -}; - #endif /* BOOT_COMPRESSED_EBOOT_H */ diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S index 5d6f6891b188..9116aac232c7 100644 --- a/arch/x86/boot/compressed/head_32.S +++ b/arch/x86/boot/compressed/head_32.S @@ -117,9 +117,11 @@ preferred_addr: addl %eax, %ebx notl %eax andl %eax, %ebx -#else - movl $LOAD_PHYSICAL_ADDR, %ebx + cmpl $LOAD_PHYSICAL_ADDR, %ebx + jge 1f #endif + movl $LOAD_PHYSICAL_ADDR, %ebx +1: /* Target address to relocate to for decompression */ addl $z_extract_offset, %ebx @@ -191,14 +193,14 @@ relocated: leal boot_heap(%ebx), %eax pushl %eax /* heap area */ pushl %esi /* real mode pointer */ - call decompress_kernel + call decompress_kernel /* returns kernel location in %eax */ addl $24, %esp /* * Jump to the decompressed kernel. */ xorl %ebx, %ebx - jmp *%ebp + jmp *%eax /* * Stack and heap for uncompression diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S index c337422b575d..c5c1ae0997e7 100644 --- a/arch/x86/boot/compressed/head_64.S +++ b/arch/x86/boot/compressed/head_64.S @@ -94,9 +94,11 @@ ENTRY(startup_32) addl %eax, %ebx notl %eax andl %eax, %ebx -#else - movl $LOAD_PHYSICAL_ADDR, %ebx + cmpl $LOAD_PHYSICAL_ADDR, %ebx + jge 1f #endif + movl $LOAD_PHYSICAL_ADDR, %ebx +1: /* Target address to relocate to for decompression */ addl $z_extract_offset, %ebx @@ -269,9 +271,11 @@ preferred_addr: addq %rax, %rbp notq %rax andq %rax, %rbp -#else - movq $LOAD_PHYSICAL_ADDR, %rbp + cmpq $LOAD_PHYSICAL_ADDR, %rbp + jge 1f #endif + movq $LOAD_PHYSICAL_ADDR, %rbp +1: /* Target address to relocate to for decompression */ leaq z_extract_offset(%rbp), %rbx @@ -339,13 +343,13 @@ relocated: movl $z_input_len, %ecx /* input_len */ movq %rbp, %r8 /* output target address */ movq $z_output_len, %r9 /* decompressed length */ - call decompress_kernel + call decompress_kernel /* returns kernel location in %rax */ popq %rsi /* * Jump to the decompressed kernel. */ - jmp *%rbp + jmp *%rax .code32 no_longmode: diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c index 434f077d2c4d..196eaf373a06 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -112,14 +112,8 @@ struct boot_params *real_mode; /* Pointer to real-mode data */ void *memset(void *s, int c, size_t n); void *memcpy(void *dest, const void *src, size_t n); -#ifdef CONFIG_X86_64 -#define memptr long -#else -#define memptr unsigned -#endif - -static memptr free_mem_ptr; -static memptr free_mem_end_ptr; +memptr free_mem_ptr; +memptr free_mem_end_ptr; static char *vidmem; static int vidport; @@ -395,7 +389,7 @@ static void parse_elf(void *output) free(phdrs); } -asmlinkage void decompress_kernel(void *rmode, memptr heap, +asmlinkage void *decompress_kernel(void *rmode, memptr heap, unsigned char *input_data, unsigned long input_len, unsigned char *output, @@ -422,6 +416,10 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap, free_mem_ptr = heap; /* Heap */ free_mem_end_ptr = heap + BOOT_HEAP_SIZE; + output = choose_kernel_location(input_data, input_len, + output, output_len); + + /* Validate memory location choices. */ if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1)) error("Destination address inappropriately aligned"); #ifdef CONFIG_X86_64 @@ -441,5 +439,5 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap, parse_elf(output); handle_relocations(output, output_len); debug_putstr("done.\nBooting the kernel.\n"); - return; + return output; } diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h index 674019d8e235..24e3e569a13c 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -23,7 +23,15 @@ #define BOOT_BOOT_H #include "../ctype.h" +#ifdef CONFIG_X86_64 +#define memptr long +#else +#define memptr unsigned +#endif + /* misc.c */ +extern memptr free_mem_ptr; +extern memptr free_mem_end_ptr; extern struct boot_params *real_mode; /* Pointer to real-mode data */ void __putstr(const char *s); #define error_putstr(__x) __putstr(__x) @@ -39,23 +47,40 @@ static inline void debug_putstr(const char *s) #endif -#ifdef CONFIG_EARLY_PRINTK - +#if CONFIG_EARLY_PRINTK || CONFIG_RANDOMIZE_BASE /* cmdline.c */ int cmdline_find_option(const char *option, char *buffer, int bufsize); int cmdline_find_option_bool(const char *option); +#endif -/* early_serial_console.c */ -extern int early_serial_base; -void console_init(void); +#if CONFIG_RANDOMIZE_BASE +/* aslr.c */ +unsigned char *choose_kernel_location(unsigned char *input, + unsigned long input_size, + unsigned char *output, + unsigned long output_size); +/* cpuflags.c */ +bool has_cpuflag(int flag); #else +static inline +unsigned char *choose_kernel_location(unsigned char *input, + unsigned long input_size, + unsigned char *output, + unsigned long output_size) +{ + return output; +} +#endif +#ifdef CONFIG_EARLY_PRINTK /* early_serial_console.c */ +extern int early_serial_base; +void console_init(void); +#else static const int early_serial_base; static inline void console_init(void) { } - #endif #endif diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c index 958a641483dd..b669ab65bf6c 100644 --- a/arch/x86/boot/compressed/mkpiggy.c +++ b/arch/x86/boot/compressed/mkpiggy.c @@ -36,11 +36,12 @@ int main(int argc, char *argv[]) uint32_t olen; long ilen; unsigned long offs; - FILE *f; + FILE *f = NULL; + int retval = 1; if (argc < 2) { fprintf(stderr, "Usage: %s compressed_file\n", argv[0]); - return 1; + goto bail; } /* Get the information for the compressed kernel image first */ @@ -48,7 +49,7 @@ int main(int argc, char *argv[]) f = fopen(argv[1], "r"); if (!f) { perror(argv[1]); - return 1; + goto bail; } @@ -58,12 +59,11 @@ int main(int argc, char *argv[]) if (fread(&olen, sizeof(olen), 1, f) != 1) { perror(argv[1]); - return 1; + goto bail; } ilen = ftell(f); olen = get_unaligned_le32(&olen); - fclose(f); /* * Now we have the input (compressed) and output (uncompressed) @@ -91,5 +91,9 @@ int main(int argc, char *argv[]) printf(".incbin \"%s\"\n", argv[1]); printf("input_data_end:\n"); - return 0; + retval = 0; +bail: + if (f) + fclose(f); + return retval; } diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 4d3ff037201f..100a9a10076a 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -28,8 +28,6 @@ #include #include -struct cpu_features cpu; -static u32 cpu_vendor[3]; static u32 err_flags[NCAPINTS]; static const int req_level = CONFIG_X86_MINIMUM_CPU_FAMILY; @@ -69,92 +67,8 @@ static int is_transmeta(void) cpu_vendor[2] == A32('M', 'x', '8', '6'); } -static int has_fpu(void) -{ - u16 fcw = -1, fsw = -1; - u32 cr0; - - asm("movl %%cr0,%0" : "=r" (cr0)); - if (cr0 & (X86_CR0_EM|X86_CR0_TS)) { - cr0 &= ~(X86_CR0_EM|X86_CR0_TS); - asm volatile("movl %0,%%cr0" : : "r" (cr0)); - } - - asm volatile("fninit ; fnstsw %0 ; fnstcw %1" - : "+m" (fsw), "+m" (fcw)); - - return fsw == 0 && (fcw & 0x103f) == 0x003f; -} - -static int has_eflag(u32 mask) -{ - u32 f0, f1; - - asm("pushfl ; " - "pushfl ; " - "popl %0 ; " - "movl %0,%1 ; " - "xorl %2,%1 ; " - "pushl %1 ; " - "popfl ; " - "pushfl ; " - "popl %1 ; " - "popfl" - : "=&r" (f0), "=&r" (f1) - : "ri" (mask)); - - return !!((f0^f1) & mask); -} - -static void get_flags(void) -{ - u32 max_intel_level, max_amd_level; - u32 tfms; - - if (has_fpu()) - set_bit(X86_FEATURE_FPU, cpu.flags); - - if (has_eflag(X86_EFLAGS_ID)) { - asm("cpuid" - : "=a" (max_intel_level), - "=b" (cpu_vendor[0]), - "=d" (cpu_vendor[1]), - "=c" (cpu_vendor[2]) - : "a" (0)); - - if (max_intel_level >= 0x00000001 && - max_intel_level <= 0x0000ffff) { - asm("cpuid" - : "=a" (tfms), - "=c" (cpu.flags[4]), - "=d" (cpu.flags[0]) - : "a" (0x00000001) - : "ebx"); - cpu.level = (tfms >> 8) & 15; - cpu.model = (tfms >> 4) & 15; - if (cpu.level >= 6) - cpu.model += ((tfms >> 16) & 0xf) << 4; - } - - asm("cpuid" - : "=a" (max_amd_level) - : "a" (0x80000000) - : "ebx", "ecx", "edx"); - - if (max_amd_level >= 0x80000001 && - max_amd_level <= 0x8000ffff) { - u32 eax = 0x80000001; - asm("cpuid" - : "+a" (eax), - "=c" (cpu.flags[6]), - "=d" (cpu.flags[1]) - : : "ebx"); - } - } -} - /* Returns a bitmask of which words we have error bits in */ -static int check_flags(void) +static int check_cpuflags(void) { u32 err; int i; @@ -187,8 +101,8 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) if (has_eflag(X86_EFLAGS_AC)) cpu.level = 4; - get_flags(); - err = check_flags(); + get_cpuflags(); + err = check_cpuflags(); if (test_bit(X86_FEATURE_LM, cpu.flags)) cpu.level = 64; @@ -207,8 +121,8 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) eax &= ~(1 << 15); asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx)); - get_flags(); /* Make sure it really did something */ - err = check_flags(); + get_cpuflags(); /* Make sure it really did something */ + err = check_cpuflags(); } else if (err == 0x01 && !(err_flags[0] & ~(1 << X86_FEATURE_CX8)) && is_centaur() && cpu.model >= 6) { @@ -223,7 +137,7 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx)); set_bit(X86_FEATURE_CX8, cpu.flags); - err = check_flags(); + err = check_cpuflags(); } else if (err == 0x01 && is_transmeta()) { /* Transmeta might have masked feature bits in word 0 */ @@ -238,7 +152,7 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) : : "ecx", "ebx"); asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx)); - err = check_flags(); + err = check_cpuflags(); } if (err_flags_ptr) diff --git a/arch/x86/boot/cpuflags.c b/arch/x86/boot/cpuflags.c new file mode 100644 index 000000000000..a9fcb7cfb241 --- /dev/null +++ b/arch/x86/boot/cpuflags.c @@ -0,0 +1,104 @@ +#include +#include "bitops.h" + +#include +#include +#include +#include "cpuflags.h" + +struct cpu_features cpu; +u32 cpu_vendor[3]; + +static bool loaded_flags; + +static int has_fpu(void) +{ + u16 fcw = -1, fsw = -1; + unsigned long cr0; + + asm volatile("mov %%cr0,%0" : "=r" (cr0)); + if (cr0 & (X86_CR0_EM|X86_CR0_TS)) { + cr0 &= ~(X86_CR0_EM|X86_CR0_TS); + asm volatile("mov %0,%%cr0" : : "r" (cr0)); + } + + asm volatile("fninit ; fnstsw %0 ; fnstcw %1" + : "+m" (fsw), "+m" (fcw)); + + return fsw == 0 && (fcw & 0x103f) == 0x003f; +} + +int has_eflag(unsigned long mask) +{ + unsigned long f0, f1; + + asm volatile("pushf \n\t" + "pushf \n\t" + "pop %0 \n\t" + "mov %0,%1 \n\t" + "xor %2,%1 \n\t" + "push %1 \n\t" + "popf \n\t" + "pushf \n\t" + "pop %1 \n\t" + "popf" + : "=&r" (f0), "=&r" (f1) + : "ri" (mask)); + + return !!((f0^f1) & mask); +} + +/* Handle x86_32 PIC using ebx. */ +#if defined(__i386__) && defined(__PIC__) +# define EBX_REG "=r" +#else +# define EBX_REG "=b" +#endif + +static inline void cpuid(u32 id, u32 *a, u32 *b, u32 *c, u32 *d) +{ + asm volatile(".ifnc %%ebx,%3 ; movl %%ebx,%3 ; .endif \n\t" + "cpuid \n\t" + ".ifnc %%ebx,%3 ; xchgl %%ebx,%3 ; .endif \n\t" + : "=a" (*a), "=c" (*c), "=d" (*d), EBX_REG (*b) + : "a" (id) + ); +} + +void get_cpuflags(void) +{ + u32 max_intel_level, max_amd_level; + u32 tfms; + u32 ignored; + + if (loaded_flags) + return; + loaded_flags = true; + + if (has_fpu()) + set_bit(X86_FEATURE_FPU, cpu.flags); + + if (has_eflag(X86_EFLAGS_ID)) { + cpuid(0x0, &max_intel_level, &cpu_vendor[0], &cpu_vendor[2], + &cpu_vendor[1]); + + if (max_intel_level >= 0x00000001 && + max_intel_level <= 0x0000ffff) { + cpuid(0x1, &tfms, &ignored, &cpu.flags[4], + &cpu.flags[0]); + cpu.level = (tfms >> 8) & 15; + cpu.model = (tfms >> 4) & 15; + if (cpu.level >= 6) + cpu.model += ((tfms >> 16) & 0xf) << 4; + } + + cpuid(0x80000000, &max_amd_level, &ignored, &ignored, + &ignored); + + if (max_amd_level >= 0x80000001 && + max_amd_level <= 0x8000ffff) { + cpuid(0x80000001, &ignored, &ignored, &cpu.flags[6], + &cpu.flags[1]); + } + } +} diff --git a/arch/x86/boot/cpuflags.h b/arch/x86/boot/cpuflags.h new file mode 100644 index 000000000000..ea97697e51e4 --- /dev/null +++ b/arch/x86/boot/cpuflags.h @@ -0,0 +1,19 @@ +#ifndef BOOT_CPUFLAGS_H +#define BOOT_CPUFLAGS_H + +#include +#include + +struct cpu_features { + int level; /* Family, or 64 for x86-64 */ + int model; + u32 flags[NCAPINTS]; +}; + +extern struct cpu_features cpu; +extern u32 cpu_vendor[3]; + +int has_eflag(unsigned long mask); +void get_cpuflags(void); + +#endif diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c index c941d6a8887f..8e15b22391fc 100644 --- a/arch/x86/boot/tools/build.c +++ b/arch/x86/boot/tools/build.c @@ -5,14 +5,15 @@ */ /* - * This file builds a disk-image from two different files: + * This file builds a disk-image from three different files: * * - setup: 8086 machine code, sets up system parm * - system: 80386 code for actual system + * - zoffset.h: header with ZO_* defines * - * It does some checking that all files are of the correct type, and - * just writes the result to stdout, removing headers and padding to - * the right amount. It also writes some system data to stderr. + * It does some checking that all files are of the correct type, and writes + * the result to the specified destination, removing headers and padding to + * the right amount. It also writes some system data to stdout. */ /* @@ -136,7 +137,7 @@ static void die(const char * str, ...) static void usage(void) { - die("Usage: build setup system [zoffset.h] [> image]"); + die("Usage: build setup system zoffset.h image"); } #ifdef CONFIG_EFI_STUB @@ -265,7 +266,7 @@ int main(int argc, char ** argv) int c; u32 sys_size; struct stat sb; - FILE *file; + FILE *file, *dest; int fd; void *kernel; u32 crc = 0xffffffffUL; @@ -280,10 +281,13 @@ int main(int argc, char ** argv) startup_64 = 0x200; #endif - if (argc == 4) - parse_zoffset(argv[3]); - else if (argc != 3) + if (argc != 5) usage(); + parse_zoffset(argv[3]); + + dest = fopen(argv[4], "w"); + if (!dest) + die("Unable to write `%s': %m", argv[4]); /* Copy the setup code */ file = fopen(argv[1], "r"); @@ -318,7 +322,7 @@ int main(int argc, char ** argv) /* Set the default root device */ put_unaligned_le16(DEFAULT_ROOT_DEV, &buf[508]); - fprintf(stderr, "Setup is %d bytes (padded to %d bytes).\n", c, i); + printf("Setup is %d bytes (padded to %d bytes).\n", c, i); /* Open and stat the kernel file */ fd = open(argv[2], O_RDONLY); @@ -327,7 +331,7 @@ int main(int argc, char ** argv) if (fstat(fd, &sb)) die("Unable to stat `%s': %m", argv[2]); sz = sb.st_size; - fprintf (stderr, "System is %d kB\n", (sz+1023)/1024); + printf("System is %d kB\n", (sz+1023)/1024); kernel = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0); if (kernel == MAP_FAILED) die("Unable to mmap '%s': %m", argv[2]); @@ -348,27 +352,31 @@ int main(int argc, char ** argv) #endif crc = partial_crc32(buf, i, crc); - if (fwrite(buf, 1, i, stdout) != i) + if (fwrite(buf, 1, i, dest) != i) die("Writing setup failed"); /* Copy the kernel code */ crc = partial_crc32(kernel, sz, crc); - if (fwrite(kernel, 1, sz, stdout) != sz) + if (fwrite(kernel, 1, sz, dest) != sz) die("Writing kernel failed"); /* Add padding leaving 4 bytes for the checksum */ while (sz++ < (sys_size*16) - 4) { crc = partial_crc32_one('\0', crc); - if (fwrite("\0", 1, 1, stdout) != 1) + if (fwrite("\0", 1, 1, dest) != 1) die("Writing padding failed"); } /* Write the CRC */ - fprintf(stderr, "CRC %x\n", crc); + printf("CRC %x\n", crc); put_unaligned_le32(crc, buf); - if (fwrite(buf, 1, 4, stdout) != 4) + if (fwrite(buf, 1, 4, dest) != 4) die("Writing CRC failed"); + /* Catch any delayed write failures */ + if (fclose(dest)) + die("Writing image failed"); + close(fd); /* Everything is OK */ diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h index 0d9ec770f2f8..e6a92455740e 100644 --- a/arch/x86/include/asm/archrandom.h +++ b/arch/x86/include/asm/archrandom.h @@ -39,6 +39,20 @@ #ifdef CONFIG_ARCH_RANDOM +/* Instead of arch_get_random_long() when alternatives haven't run. */ +static inline int rdrand_long(unsigned long *v) +{ + int ok; + asm volatile("1: " RDRAND_LONG "\n\t" + "jc 2f\n\t" + "decl %0\n\t" + "jnz 1b\n\t" + "2:" + : "=r" (ok), "=a" (*v) + : "0" (RDRAND_RETRY_LOOPS)); + return ok; +} + #define GET_RANDOM(name, type, rdrand, nop) \ static inline int name(type *v) \ { \ @@ -68,6 +82,13 @@ GET_RANDOM(arch_get_random_int, unsigned int, RDRAND_INT, ASM_NOP3); #endif /* CONFIG_X86_64 */ +#else + +static inline int rdrand_long(unsigned long *v) +{ + return 0; +} + #endif /* CONFIG_ARCH_RANDOM */ extern void x86_init_rdrand(struct cpuinfo_x86 *c); diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h index 722aa3b04624..da31c8b8a92d 100644 --- a/arch/x86/include/asm/atomic.h +++ b/arch/x86/include/asm/atomic.h @@ -6,6 +6,7 @@ #include #include #include +#include /* * Atomic operations that C can't guarantee us. Useful for @@ -76,12 +77,7 @@ static inline void atomic_sub(int i, atomic_t *v) */ static inline int atomic_sub_and_test(int i, atomic_t *v) { - unsigned char c; - - asm volatile(LOCK_PREFIX "subl %2,%0; sete %1" - : "+m" (v->counter), "=qm" (c) - : "ir" (i) : "memory"); - return c; + GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, i, "%0", "e"); } /** @@ -118,12 +114,7 @@ static inline void atomic_dec(atomic_t *v) */ static inline int atomic_dec_and_test(atomic_t *v) { - unsigned char c; - - asm volatile(LOCK_PREFIX "decl %0; sete %1" - : "+m" (v->counter), "=qm" (c) - : : "memory"); - return c != 0; + GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, "%0", "e"); } /** @@ -136,12 +127,7 @@ static inline int atomic_dec_and_test(atomic_t *v) */ static inline int atomic_inc_and_test(atomic_t *v) { - unsigned char c; - - asm volatile(LOCK_PREFIX "incl %0; sete %1" - : "+m" (v->counter), "=qm" (c) - : : "memory"); - return c != 0; + GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, "%0", "e"); } /** @@ -155,12 +141,7 @@ static inline int atomic_inc_and_test(atomic_t *v) */ static inline int atomic_add_negative(int i, atomic_t *v) { - unsigned char c; - - asm volatile(LOCK_PREFIX "addl %2,%0; sets %1" - : "+m" (v->counter), "=qm" (c) - : "ir" (i) : "memory"); - return c; + GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, i, "%0", "s"); } /** diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h index 0e1cbfc8ee06..3f065c985aee 100644 --- a/arch/x86/include/asm/atomic64_64.h +++ b/arch/x86/include/asm/atomic64_64.h @@ -72,12 +72,7 @@ static inline void atomic64_sub(long i, atomic64_t *v) */ static inline int atomic64_sub_and_test(long i, atomic64_t *v) { - unsigned char c; - - asm volatile(LOCK_PREFIX "subq %2,%0; sete %1" - : "=m" (v->counter), "=qm" (c) - : "er" (i), "m" (v->counter) : "memory"); - return c; + GEN_BINARY_RMWcc(LOCK_PREFIX "subq", v->counter, i, "%0", "e"); } /** @@ -116,12 +111,7 @@ static inline void atomic64_dec(atomic64_t *v) */ static inline int atomic64_dec_and_test(atomic64_t *v) { - unsigned char c; - - asm volatile(LOCK_PREFIX "decq %0; sete %1" - : "=m" (v->counter), "=qm" (c) - : "m" (v->counter) : "memory"); - return c != 0; + GEN_UNARY_RMWcc(LOCK_PREFIX "decq", v->counter, "%0", "e"); } /** @@ -134,12 +124,7 @@ static inline int atomic64_dec_and_test(atomic64_t *v) */ static inline int atomic64_inc_and_test(atomic64_t *v) { - unsigned char c; - - asm volatile(LOCK_PREFIX "incq %0; sete %1" - : "=m" (v->counter), "=qm" (c) - : "m" (v->counter) : "memory"); - return c != 0; + GEN_UNARY_RMWcc(LOCK_PREFIX "incq", v->counter, "%0", "e"); } /** @@ -153,12 +138,7 @@ static inline int atomic64_inc_and_test(atomic64_t *v) */ static inline int atomic64_add_negative(long i, atomic64_t *v) { - unsigned char c; - - asm volatile(LOCK_PREFIX "addq %2,%0; sets %1" - : "=m" (v->counter), "=qm" (c) - : "er" (i), "m" (v->counter) : "memory"); - return c; + GEN_BINARY_RMWcc(LOCK_PREFIX "addq", v->counter, i, "%0", "s"); } /** diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h index 41639ce8fd63..6d76d0935989 100644 --- a/arch/x86/include/asm/bitops.h +++ b/arch/x86/include/asm/bitops.h @@ -14,6 +14,7 @@ #include #include +#include #if BITS_PER_LONG == 32 # define _BITOPS_LONG_SHIFT 5 @@ -204,12 +205,7 @@ static inline void change_bit(long nr, volatile unsigned long *addr) */ static inline int test_and_set_bit(long nr, volatile unsigned long *addr) { - int oldbit; - - asm volatile(LOCK_PREFIX "bts %2,%1\n\t" - "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); - - return oldbit; + GEN_BINARY_RMWcc(LOCK_PREFIX "bts", *addr, nr, "%0", "c"); } /** @@ -255,13 +251,7 @@ static inline int __test_and_set_bit(long nr, volatile unsigned long *addr) */ static inline int test_and_clear_bit(long nr, volatile unsigned long *addr) { - int oldbit; - - asm volatile(LOCK_PREFIX "btr %2,%1\n\t" - "sbb %0,%0" - : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); - - return oldbit; + GEN_BINARY_RMWcc(LOCK_PREFIX "btr", *addr, nr, "%0", "c"); } /** @@ -314,13 +304,7 @@ static inline int __test_and_change_bit(long nr, volatile unsigned long *addr) */ static inline int test_and_change_bit(long nr, volatile unsigned long *addr) { - int oldbit; - - asm volatile(LOCK_PREFIX "btc %2,%1\n\t" - "sbb %0,%0" - : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); - - return oldbit; + GEN_BINARY_RMWcc(LOCK_PREFIX "btc", *addr, nr, "%0", "c"); } static __always_inline int constant_test_bit(long nr, const volatile unsigned long *addr) diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h index 0fa675033912..cb4c73bfeb48 100644 --- a/arch/x86/include/asm/calling.h +++ b/arch/x86/include/asm/calling.h @@ -48,6 +48,8 @@ For 32-bit we have the following conventions - kernel is built with #include +#ifdef CONFIG_X86_64 + /* * 64-bit system call stack frame layout defines and helpers, * for assembly code: @@ -192,3 +194,51 @@ For 32-bit we have the following conventions - kernel is built with .macro icebp .byte 0xf1 .endm + +#else /* CONFIG_X86_64 */ + +/* + * For 32bit only simplified versions of SAVE_ALL/RESTORE_ALL. These + * are different from the entry_32.S versions in not changing the segment + * registers. So only suitable for in kernel use, not when transitioning + * from or to user space. The resulting stack frame is not a standard + * pt_regs frame. The main use case is calling C code from assembler + * when all the registers need to be preserved. + */ + + .macro SAVE_ALL + pushl_cfi %eax + CFI_REL_OFFSET eax, 0 + pushl_cfi %ebp + CFI_REL_OFFSET ebp, 0 + pushl_cfi %edi + CFI_REL_OFFSET edi, 0 + pushl_cfi %esi + CFI_REL_OFFSET esi, 0 + pushl_cfi %edx + CFI_REL_OFFSET edx, 0 + pushl_cfi %ecx + CFI_REL_OFFSET ecx, 0 + pushl_cfi %ebx + CFI_REL_OFFSET ebx, 0 + .endm + + .macro RESTORE_ALL + popl_cfi %ebx + CFI_RESTORE ebx + popl_cfi %ecx + CFI_RESTORE ecx + popl_cfi %edx + CFI_RESTORE edx + popl_cfi %esi + CFI_RESTORE esi + popl_cfi %edi + CFI_RESTORE edi + popl_cfi %ebp + CFI_RESTORE ebp + popl_cfi %eax + CFI_RESTORE eax + .endm + +#endif /* CONFIG_X86_64 */ + diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index d3f5c63078d8..89270b4318db 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -374,7 +374,7 @@ static __always_inline __pure bool __static_cpu_has(u16 bit) * Catch too early usage of this before alternatives * have run. */ - asm goto("1: jmp %l[t_warn]\n" + asm_volatile_goto("1: jmp %l[t_warn]\n" "2:\n" ".section .altinstructions,\"a\"\n" " .long 1b - .\n" @@ -388,7 +388,7 @@ static __always_inline __pure bool __static_cpu_has(u16 bit) #endif - asm goto("1: jmp %l[t_no]\n" + asm_volatile_goto("1: jmp %l[t_no]\n" "2:\n" ".section .altinstructions,\"a\"\n" " .long 1b - .\n" @@ -453,7 +453,7 @@ static __always_inline __pure bool _static_cpu_has_safe(u16 bit) * have. Thus, we force the jump to the widest, 4-byte, signed relative * offset even though the last would often fit in less bytes. */ - asm goto("1: .byte 0xe9\n .long %l[t_dynamic] - 2f\n" + asm_volatile_goto("1: .byte 0xe9\n .long %l[t_dynamic] - 2f\n" "2:\n" ".section .altinstructions,\"a\"\n" " .long 1b - .\n" /* src offset */ diff --git a/arch/x86/include/asm/intel-mid.h b/arch/x86/include/asm/intel-mid.h new file mode 100644 index 000000000000..459769d39263 --- /dev/null +++ b/arch/x86/include/asm/intel-mid.h @@ -0,0 +1,113 @@ +/* + * intel-mid.h: Intel MID specific setup code + * + * (C) Copyright 2009 Intel Corporation + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ +#ifndef _ASM_X86_INTEL_MID_H +#define _ASM_X86_INTEL_MID_H + +#include +#include + +extern int intel_mid_pci_init(void); +extern int get_gpio_by_name(const char *name); +extern void intel_scu_device_register(struct platform_device *pdev); +extern int __init sfi_parse_mrtc(struct sfi_table_header *table); +extern int __init sfi_parse_mtmr(struct sfi_table_header *table); +extern int sfi_mrtc_num; +extern struct sfi_rtc_table_entry sfi_mrtc_array[]; + +/* + * Here defines the array of devices platform data that IAFW would export + * through SFI "DEVS" table, we use name and type to match the device and + * its platform data. + */ +struct devs_id { + char name[SFI_NAME_LEN + 1]; + u8 type; + u8 delay; + void *(*get_platform_data)(void *info); + /* Custom handler for devices */ + void (*device_handler)(struct sfi_device_table_entry *pentry, + struct devs_id *dev); +}; + +#define sfi_device(i) \ + static const struct devs_id *const __intel_mid_sfi_##i##_dev __used \ + __attribute__((__section__(".x86_intel_mid_dev.init"))) = &i + +/* + * Medfield is the follow-up of Moorestown, it combines two chip solution into + * one. Other than that it also added always-on and constant tsc and lapic + * timers. Medfield is the platform name, and the chip name is called Penwell + * we treat Medfield/Penwell as a variant of Moorestown. Penwell can be + * identified via MSRs. + */ +enum intel_mid_cpu_type { + /* 1 was Moorestown */ + INTEL_MID_CPU_CHIP_PENWELL = 2, +}; + +extern enum intel_mid_cpu_type __intel_mid_cpu_chip; + +#ifdef CONFIG_X86_INTEL_MID + +static inline enum intel_mid_cpu_type intel_mid_identify_cpu(void) +{ + return __intel_mid_cpu_chip; +} + +static inline bool intel_mid_has_msic(void) +{ + return (intel_mid_identify_cpu() == INTEL_MID_CPU_CHIP_PENWELL); +} + +#else /* !CONFIG_X86_INTEL_MID */ + +#define intel_mid_identify_cpu() (0) +#define intel_mid_has_msic() (0) + +#endif /* !CONFIG_X86_INTEL_MID */ + +enum intel_mid_timer_options { + INTEL_MID_TIMER_DEFAULT, + INTEL_MID_TIMER_APBT_ONLY, + INTEL_MID_TIMER_LAPIC_APBT, +}; + +extern enum intel_mid_timer_options intel_mid_timer_options; + +/* + * Penwell uses spread spectrum clock, so the freq number is not exactly + * the same as reported by MSR based on SDM. + */ +#define PENWELL_FSB_FREQ_83SKU 83200 +#define PENWELL_FSB_FREQ_100SKU 99840 + +#define SFI_MTMR_MAX_NUM 8 +#define SFI_MRTC_MAX 8 + +extern struct console early_mrst_console; +extern void mrst_early_console_init(void); + +extern struct console early_hsu_console; +extern void hsu_early_console_init(const char *); + +extern void intel_scu_devices_create(void); +extern void intel_scu_devices_destroy(void); + +/* VRTC timer */ +#define MRST_VRTC_MAP_SZ (1024) +/*#define MRST_VRTC_PGOFFSET (0xc00) */ + +extern void intel_mid_rtc_init(void); + +/* the offset for the mapping of global gpio pin to irq */ +#define INTEL_MID_IRQ_OFFSET 0x100 + +#endif /* _ASM_X86_INTEL_MID_H */ diff --git a/arch/x86/include/asm/mrst-vrtc.h b/arch/x86/include/asm/intel_mid_vrtc.h similarity index 81% rename from arch/x86/include/asm/mrst-vrtc.h rename to arch/x86/include/asm/intel_mid_vrtc.h index 1e69a75412a4..86ff4685c409 100644 --- a/arch/x86/include/asm/mrst-vrtc.h +++ b/arch/x86/include/asm/intel_mid_vrtc.h @@ -1,5 +1,5 @@ -#ifndef _MRST_VRTC_H -#define _MRST_VRTC_H +#ifndef _INTEL_MID_VRTC_H +#define _INTEL_MID_VRTC_H extern unsigned char vrtc_cmos_read(unsigned char reg); extern void vrtc_cmos_write(unsigned char val, unsigned char reg); diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h index 64507f35800c..6a2cefb4395a 100644 --- a/arch/x86/include/asm/jump_label.h +++ b/arch/x86/include/asm/jump_label.h @@ -18,7 +18,7 @@ static __always_inline bool arch_static_branch(struct static_key *key) { - asm goto("1:" + asm_volatile_goto("1:" ".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t" ".pushsection __jump_table, \"aw\" \n\t" _ASM_ALIGN "\n\t" diff --git a/arch/x86/include/asm/local.h b/arch/x86/include/asm/local.h index 2d89e3980cbd..5b23e605e707 100644 --- a/arch/x86/include/asm/local.h +++ b/arch/x86/include/asm/local.h @@ -52,12 +52,7 @@ static inline void local_sub(long i, local_t *l) */ static inline int local_sub_and_test(long i, local_t *l) { - unsigned char c; - - asm volatile(_ASM_SUB "%2,%0; sete %1" - : "+m" (l->a.counter), "=qm" (c) - : "ir" (i) : "memory"); - return c; + GEN_BINARY_RMWcc(_ASM_SUB, l->a.counter, i, "%0", "e"); } /** @@ -70,12 +65,7 @@ static inline int local_sub_and_test(long i, local_t *l) */ static inline int local_dec_and_test(local_t *l) { - unsigned char c; - - asm volatile(_ASM_DEC "%0; sete %1" - : "+m" (l->a.counter), "=qm" (c) - : : "memory"); - return c != 0; + GEN_UNARY_RMWcc(_ASM_DEC, l->a.counter, "%0", "e"); } /** @@ -88,12 +78,7 @@ static inline int local_dec_and_test(local_t *l) */ static inline int local_inc_and_test(local_t *l) { - unsigned char c; - - asm volatile(_ASM_INC "%0; sete %1" - : "+m" (l->a.counter), "=qm" (c) - : : "memory"); - return c != 0; + GEN_UNARY_RMWcc(_ASM_INC, l->a.counter, "%0", "e"); } /** @@ -107,12 +92,7 @@ static inline int local_inc_and_test(local_t *l) */ static inline int local_add_negative(long i, local_t *l) { - unsigned char c; - - asm volatile(_ASM_ADD "%2,%0; sets %1" - : "+m" (l->a.counter), "=qm" (c) - : "ir" (i) : "memory"); - return c; + GEN_BINARY_RMWcc(_ASM_ADD, l->a.counter, i, "%0", "s"); } /** diff --git a/arch/x86/include/asm/misc.h b/arch/x86/include/asm/misc.h new file mode 100644 index 000000000000..475f5bbc7f53 --- /dev/null +++ b/arch/x86/include/asm/misc.h @@ -0,0 +1,6 @@ +#ifndef _ASM_X86_MISC_H +#define _ASM_X86_MISC_H + +int num_digits(int val); + +#endif /* _ASM_X86_MISC_H */ diff --git a/arch/x86/include/asm/mrst.h b/arch/x86/include/asm/mrst.h deleted file mode 100644 index fc18bf3ce7c8..000000000000 --- a/arch/x86/include/asm/mrst.h +++ /dev/null @@ -1,81 +0,0 @@ -/* - * mrst.h: Intel Moorestown platform specific setup code - * - * (C) Copyright 2009 Intel Corporation - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; version 2 - * of the License. - */ -#ifndef _ASM_X86_MRST_H -#define _ASM_X86_MRST_H - -#include - -extern int pci_mrst_init(void); -extern int __init sfi_parse_mrtc(struct sfi_table_header *table); -extern int sfi_mrtc_num; -extern struct sfi_rtc_table_entry sfi_mrtc_array[]; - -/* - * Medfield is the follow-up of Moorestown, it combines two chip solution into - * one. Other than that it also added always-on and constant tsc and lapic - * timers. Medfield is the platform name, and the chip name is called Penwell - * we treat Medfield/Penwell as a variant of Moorestown. Penwell can be - * identified via MSRs. - */ -enum mrst_cpu_type { - /* 1 was Moorestown */ - MRST_CPU_CHIP_PENWELL = 2, -}; - -extern enum mrst_cpu_type __mrst_cpu_chip; - -#ifdef CONFIG_X86_INTEL_MID - -static inline enum mrst_cpu_type mrst_identify_cpu(void) -{ - return __mrst_cpu_chip; -} - -#else /* !CONFIG_X86_INTEL_MID */ - -#define mrst_identify_cpu() (0) - -#endif /* !CONFIG_X86_INTEL_MID */ - -enum mrst_timer_options { - MRST_TIMER_DEFAULT, - MRST_TIMER_APBT_ONLY, - MRST_TIMER_LAPIC_APBT, -}; - -extern enum mrst_timer_options mrst_timer_options; - -/* - * Penwell uses spread spectrum clock, so the freq number is not exactly - * the same as reported by MSR based on SDM. - */ -#define PENWELL_FSB_FREQ_83SKU 83200 -#define PENWELL_FSB_FREQ_100SKU 99840 - -#define SFI_MTMR_MAX_NUM 8 -#define SFI_MRTC_MAX 8 - -extern struct console early_mrst_console; -extern void mrst_early_console_init(void); - -extern struct console early_hsu_console; -extern void hsu_early_console_init(const char *); - -extern void intel_scu_devices_create(void); -extern void intel_scu_devices_destroy(void); - -/* VRTC timer */ -#define MRST_VRTC_MAP_SZ (1024) -/*#define MRST_VRTC_PGOFFSET (0xc00) */ - -extern void mrst_rtc_init(void); - -#endif /* _ASM_X86_MRST_H */ diff --git a/arch/x86/include/asm/mutex_64.h b/arch/x86/include/asm/mutex_64.h index e7e6751648ed..07537a44216e 100644 --- a/arch/x86/include/asm/mutex_64.h +++ b/arch/x86/include/asm/mutex_64.h @@ -20,7 +20,7 @@ static inline void __mutex_fastpath_lock(atomic_t *v, void (*fail_fn)(atomic_t *)) { - asm volatile goto(LOCK_PREFIX " decl %0\n" + asm_volatile_goto(LOCK_PREFIX " decl %0\n" " jns %l[exit]\n" : : "m" (v->counter) : "memory", "cc" @@ -75,7 +75,7 @@ static inline int __mutex_fastpath_lock_retval(atomic_t *count) static inline void __mutex_fastpath_unlock(atomic_t *v, void (*fail_fn)(atomic_t *)) { - asm volatile goto(LOCK_PREFIX " incl %0\n" + asm_volatile_goto(LOCK_PREFIX " incl %0\n" " jg %l[exit]\n" : : "m" (v->counter) : "memory", "cc" diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h index 43dcd804ebd5..8de6d9cf3b95 100644 --- a/arch/x86/include/asm/page_64_types.h +++ b/arch/x86/include/asm/page_64_types.h @@ -39,9 +39,18 @@ #define __VIRTUAL_MASK_SHIFT 47 /* - * Kernel image size is limited to 512 MB (see level2_kernel_pgt in - * arch/x86/kernel/head_64.S), and it is mapped here: + * Kernel image size is limited to 1GiB due to the fixmap living in the + * next 1GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S). Use + * 512MiB by default, leaving 1.5GiB for modules once the page tables + * are fully set up. If kernel ASLR is configured, it can extend the + * kernel page table mapping, reducing the size of the modules area. */ -#define KERNEL_IMAGE_SIZE (512 * 1024 * 1024) +#define KERNEL_IMAGE_SIZE_DEFAULT (512 * 1024 * 1024) +#if defined(CONFIG_RANDOMIZE_BASE) && \ + CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE_DEFAULT +#define KERNEL_IMAGE_SIZE CONFIG_RANDOMIZE_BASE_MAX_OFFSET +#else +#define KERNEL_IMAGE_SIZE KERNEL_IMAGE_SIZE_DEFAULT +#endif #endif /* _ASM_X86_PAGE_64_DEFS_H */ diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 2d883440cb9a..c883bf726398 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -58,7 +58,7 @@ typedef struct { pteval_t pte; } pte_t; #define VMALLOC_START _AC(0xffffc90000000000, UL) #define VMALLOC_END _AC(0xffffe8ffffffffff, UL) #define VMEMMAP_START _AC(0xffffea0000000000, UL) -#define MODULES_VADDR _AC(0xffffffffa0000000, UL) +#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE) #define MODULES_END _AC(0xffffffffff000000, UL) #define MODULES_LEN (MODULES_END - MODULES_VADDR) diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h new file mode 100644 index 000000000000..8729723636fd --- /dev/null +++ b/arch/x86/include/asm/preempt.h @@ -0,0 +1,100 @@ +#ifndef __ASM_PREEMPT_H +#define __ASM_PREEMPT_H + +#include +#include +#include + +DECLARE_PER_CPU(int, __preempt_count); + +/* + * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users + * that think a non-zero value indicates we cannot preempt. + */ +static __always_inline int preempt_count(void) +{ + return __this_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED; +} + +static __always_inline void preempt_count_set(int pc) +{ + __this_cpu_write_4(__preempt_count, pc); +} + +/* + * must be macros to avoid header recursion hell + */ +#define task_preempt_count(p) \ + (task_thread_info(p)->saved_preempt_count & ~PREEMPT_NEED_RESCHED) + +#define init_task_preempt_count(p) do { \ + task_thread_info(p)->saved_preempt_count = PREEMPT_DISABLED; \ +} while (0) + +#define init_idle_preempt_count(p, cpu) do { \ + task_thread_info(p)->saved_preempt_count = PREEMPT_ENABLED; \ + per_cpu(__preempt_count, (cpu)) = PREEMPT_ENABLED; \ +} while (0) + +/* + * We fold the NEED_RESCHED bit into the preempt count such that + * preempt_enable() can decrement and test for needing to reschedule with a + * single instruction. + * + * We invert the actual bit, so that when the decrement hits 0 we know we both + * need to resched (the bit is cleared) and can resched (no preempt count). + */ + +static __always_inline void set_preempt_need_resched(void) +{ + __this_cpu_and_4(__preempt_count, ~PREEMPT_NEED_RESCHED); +} + +static __always_inline void clear_preempt_need_resched(void) +{ + __this_cpu_or_4(__preempt_count, PREEMPT_NEED_RESCHED); +} + +static __always_inline bool test_preempt_need_resched(void) +{ + return !(__this_cpu_read_4(__preempt_count) & PREEMPT_NEED_RESCHED); +} + +/* + * The various preempt_count add/sub methods + */ + +static __always_inline void __preempt_count_add(int val) +{ + __this_cpu_add_4(__preempt_count, val); +} + +static __always_inline void __preempt_count_sub(int val) +{ + __this_cpu_add_4(__preempt_count, -val); +} + +static __always_inline bool __preempt_count_dec_and_test(void) +{ + GEN_UNARY_RMWcc("decl", __preempt_count, __percpu_arg(0), "e"); +} + +/* + * Returns true when we need to resched and can (barring IRQ state). + */ +static __always_inline bool should_resched(void) +{ + return unlikely(!__this_cpu_read_4(__preempt_count)); +} + +#ifdef CONFIG_PREEMPT + extern asmlinkage void ___preempt_schedule(void); +# define __preempt_schedule() asm ("call ___preempt_schedule") + extern asmlinkage void preempt_schedule(void); +# ifdef CONFIG_CONTEXT_TRACKING + extern asmlinkage void ___preempt_schedule_context(void); +# define __preempt_schedule_context() asm ("call ___preempt_schedule_context") +# endif +#endif + +#endif /* __ASM_PREEMPT_H */ diff --git a/arch/x86/include/asm/rmwcc.h b/arch/x86/include/asm/rmwcc.h new file mode 100644 index 000000000000..1ff990f1de8e --- /dev/null +++ b/arch/x86/include/asm/rmwcc.h @@ -0,0 +1,41 @@ +#ifndef _ASM_X86_RMWcc +#define _ASM_X86_RMWcc + +#ifdef CC_HAVE_ASM_GOTO + +#define __GEN_RMWcc(fullop, var, cc, ...) \ +do { \ + asm_volatile_goto (fullop "; j" cc " %l[cc_label]" \ + : : "m" (var), ## __VA_ARGS__ \ + : "memory" : cc_label); \ + return 0; \ +cc_label: \ + return 1; \ +} while (0) + +#define GEN_UNARY_RMWcc(op, var, arg0, cc) \ + __GEN_RMWcc(op " " arg0, var, cc) + +#define GEN_BINARY_RMWcc(op, var, val, arg0, cc) \ + __GEN_RMWcc(op " %1, " arg0, var, cc, "er" (val)) + +#else /* !CC_HAVE_ASM_GOTO */ + +#define __GEN_RMWcc(fullop, var, cc, ...) \ +do { \ + char c; \ + asm volatile (fullop "; set" cc " %1" \ + : "+m" (var), "=qm" (c) \ + : __VA_ARGS__ : "memory"); \ + return c != 0; \ +} while (0) + +#define GEN_UNARY_RMWcc(op, var, arg0, cc) \ + __GEN_RMWcc(op " " arg0, var, cc) + +#define GEN_BINARY_RMWcc(op, var, val, arg0, cc) \ + __GEN_RMWcc(op " %2, " arg0, var, cc, "er" (val)) + +#endif /* CC_HAVE_ASM_GOTO */ + +#endif /* _ASM_X86_RMWcc */ diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h index 347555492dad..59bcf4e22418 100644 --- a/arch/x86/include/asm/setup.h +++ b/arch/x86/include/asm/setup.h @@ -51,9 +51,9 @@ extern void i386_reserve_resources(void); extern void setup_default_timer_irq(void); #ifdef CONFIG_X86_INTEL_MID -extern void x86_mrst_early_setup(void); +extern void x86_intel_mid_early_setup(void); #else -static inline void x86_mrst_early_setup(void) { } +static inline void x86_intel_mid_early_setup(void) { } #endif #ifdef CONFIG_X86_INTEL_CE diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index 27811190cbd7..c46a46be1ec6 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -28,8 +28,7 @@ struct thread_info { __u32 flags; /* low level flags */ __u32 status; /* thread synchronous flags */ __u32 cpu; /* current CPU */ - int preempt_count; /* 0 => preemptable, - <0 => BUG */ + int saved_preempt_count; mm_segment_t addr_limit; struct restart_block restart_block; void __user *sysenter_return; @@ -49,7 +48,7 @@ struct thread_info { .exec_domain = &default_exec_domain, \ .flags = 0, \ .cpu = 0, \ - .preempt_count = INIT_PREEMPT_COUNT, \ + .saved_preempt_count = INIT_PREEMPT_COUNT, \ .addr_limit = KERNEL_DS, \ .restart_block = { \ .fn = do_no_restart_syscall, \ diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index 6aef9fbc09b7..b913915e8e63 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -79,30 +79,38 @@ static inline int phys_to_machine_mapping_valid(unsigned long pfn) return get_phys_to_machine(pfn) != INVALID_P2M_ENTRY; } -static inline unsigned long mfn_to_pfn(unsigned long mfn) +static inline unsigned long mfn_to_pfn_no_overrides(unsigned long mfn) { unsigned long pfn; - int ret = 0; + int ret; if (xen_feature(XENFEAT_auto_translated_physmap)) return mfn; - if (unlikely(mfn >= machine_to_phys_nr)) { - pfn = ~0; - goto try_override; - } - pfn = 0; + if (unlikely(mfn >= machine_to_phys_nr)) + return ~0; + /* * The array access can fail (e.g., device space beyond end of RAM). * In such cases it doesn't matter what we return (we return garbage), * but we must handle the fault without crashing! */ ret = __get_user(pfn, &machine_to_phys_mapping[mfn]); -try_override: - /* ret might be < 0 if there are no entries in the m2p for mfn */ if (ret < 0) - pfn = ~0; - else if (get_phys_to_machine(pfn) != mfn) + return ~0; + + return pfn; +} + +static inline unsigned long mfn_to_pfn(unsigned long mfn) +{ + unsigned long pfn; + + if (xen_feature(XENFEAT_auto_translated_physmap)) + return mfn; + + pfn = mfn_to_pfn_no_overrides(mfn); + if (get_phys_to_machine(pfn) != mfn) { /* * If this appears to be a foreign mfn (because the pfn * doesn't map back to the mfn), then check the local override @@ -111,6 +119,7 @@ try_override: * m2p_find_override_pfn returns ~0 if it doesn't find anything. */ pfn = m2p_find_override_pfn(mfn, ~0); + } /* * pfn is ~0 if there are no entries in the m2p for mfn or if the diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index c15ddaf90710..9c3733c5f8f7 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -158,7 +158,7 @@ enum { X86_SUBARCH_PC = 0, X86_SUBARCH_LGUEST, X86_SUBARCH_XEN, - X86_SUBARCH_MRST, + X86_SUBARCH_INTEL_MID, X86_SUBARCH_CE4100, X86_NR_SUBARCHS, }; diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index a5408b965c9d..9b0a34e2cd79 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -36,6 +36,8 @@ obj-y += tsc.o io_delay.o rtc.o obj-y += pci-iommu_table.o obj-y += resource.o +obj-$(CONFIG_PREEMPT) += preempt.o + obj-y += process.o obj-y += i387.o xsave.o obj-y += ptrace.o diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 15e8563e5c24..df94598ad05a 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -402,17 +402,6 @@ void alternatives_enable_smp(void) { struct smp_alt_module *mod; -#ifdef CONFIG_LOCKDEP - /* - * Older binutils section handling bug prevented - * alternatives-replacement from working reliably. - * - * If this still occurs then you should see a hang - * or crash shortly after this line: - */ - pr_info("lockdep: fixing up alternatives\n"); -#endif - /* Why bother if there are no other CPUs? */ BUG_ON(num_possible_cpus() == 1); diff --git a/arch/x86/kernel/apb_timer.c b/arch/x86/kernel/apb_timer.c index c9876efecafb..af5b08ab3b71 100644 --- a/arch/x86/kernel/apb_timer.c +++ b/arch/x86/kernel/apb_timer.c @@ -40,7 +40,7 @@ #include #include -#include +#include #include #define APBT_CLOCKEVENT_RATING 110 @@ -157,13 +157,13 @@ static int __init apbt_clockevent_register(void) adev->num = smp_processor_id(); adev->timer = dw_apb_clockevent_init(smp_processor_id(), "apbt0", - mrst_timer_options == MRST_TIMER_LAPIC_APBT ? + intel_mid_timer_options == INTEL_MID_TIMER_LAPIC_APBT ? APBT_CLOCKEVENT_RATING - 100 : APBT_CLOCKEVENT_RATING, adev_virt_addr(adev), 0, apbt_freq); /* Firmware does EOI handling for us. */ adev->timer->eoi = NULL; - if (mrst_timer_options == MRST_TIMER_LAPIC_APBT) { + if (intel_mid_timer_options == INTEL_MID_TIMER_LAPIC_APBT) { global_clock_event = &adev->timer->ced; printk(KERN_DEBUG "%s clockevent registered as global\n", global_clock_event->name); @@ -253,7 +253,7 @@ static int apbt_cpuhp_notify(struct notifier_block *n, static __init int apbt_late_init(void) { - if (mrst_timer_options == MRST_TIMER_LAPIC_APBT || + if (intel_mid_timer_options == INTEL_MID_TIMER_LAPIC_APBT || !apb_timer_block_enabled) return 0; /* This notifier should be called after workqueue is ready */ @@ -340,7 +340,7 @@ void __init apbt_time_init(void) } #ifdef CONFIG_SMP /* kernel cmdline disable apb timer, so we will use lapic timers */ - if (mrst_timer_options == MRST_TIMER_LAPIC_APBT) { + if (intel_mid_timer_options == INTEL_MID_TIMER_LAPIC_APBT) { printk(KERN_INFO "apbt: disabled per cpu timer\n"); return; } diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c index 1191ac1c9d25..a419814cea57 100644 --- a/arch/x86/kernel/apic/x2apic_uv_x.c +++ b/arch/x86/kernel/apic/x2apic_uv_x.c @@ -113,7 +113,7 @@ static int __init early_get_pnodeid(void) break; case UV3_HUB_PART_NUMBER: case UV3_HUB_PART_NUMBER_X: - uv_min_hub_revision_id += UV3_HUB_REVISION_BASE - 1; + uv_min_hub_revision_id += UV3_HUB_REVISION_BASE; break; } diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 28610822fb3c..9f6b9341950f 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -32,7 +32,6 @@ void common(void) { OFFSET(TI_flags, thread_info, flags); OFFSET(TI_status, thread_info, status); OFFSET(TI_addr_limit, thread_info, addr_limit); - OFFSET(TI_preempt_count, thread_info, preempt_count); BLANK(); OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx); diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 2793d1f095a2..5223fe6dec7b 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1095,6 +1095,9 @@ DEFINE_PER_CPU(char *, irq_stack_ptr) = DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1; +DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT; +EXPORT_PER_CPU_SYMBOL(__preempt_count); + DEFINE_PER_CPU(struct task_struct *, fpu_owner_task); /* @@ -1169,6 +1172,8 @@ void debug_stack_reset(void) DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task; EXPORT_PER_CPU_SYMBOL(current_task); +DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT; +EXPORT_PER_CPU_SYMBOL(__preempt_count); DEFINE_PER_CPU(struct task_struct *, fpu_owner_task); #ifdef CONFIG_CC_STACKPROTECTOR diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index a9c606bb4945..9d8449158cf9 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -1506,7 +1506,7 @@ static int __init init_hw_perf_events(void) err = amd_pmu_init(); break; default: - return 0; + err = -ENOTSUPP; } if (err != 0) { pr_cont("no PMU driver, software events only.\n"); @@ -1888,10 +1888,7 @@ void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now) userpg->cap_user_rdpmc = x86_pmu.attr_rdpmc; userpg->pmc_width = x86_pmu.cntval_bits; - if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) - return; - - if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) + if (!sched_clock_stable) return; userpg->cap_user_time = 1; @@ -1899,10 +1896,8 @@ void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now) userpg->time_shift = CYC2NS_SCALE_FACTOR; userpg->time_offset = this_cpu_read(cyc2ns_offset) - now; - if (sched_clock_stable && !check_tsc_disabled()) { - userpg->cap_user_time_zero = 1; - userpg->time_zero = this_cpu_read(cyc2ns_offset); - } + userpg->cap_user_time_zero = 1; + userpg->time_zero = this_cpu_read(cyc2ns_offset); } /* diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h index cc16faae0538..fd00bb29425d 100644 --- a/arch/x86/kernel/cpu/perf_event.h +++ b/arch/x86/kernel/cpu/perf_event.h @@ -163,6 +163,11 @@ struct cpu_hw_events { u64 intel_ctrl_host_mask; struct perf_guest_switch_msr guest_switch_msrs[X86_PMC_IDX_MAX]; + /* + * Intel checkpoint mask + */ + u64 intel_cp_status; + /* * manage shared (per-core, per-cpu) registers * used on Intel NHM/WSM/SNB @@ -440,6 +445,7 @@ struct x86_pmu { int lbr_nr; /* hardware stack size */ u64 lbr_sel_mask; /* LBR_SELECT valid bits */ const int *lbr_sel_map; /* lbr_select mappings */ + bool lbr_double_abort; /* duplicated lbr aborts */ /* * Extra registers for events diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index f31a1655d1ff..0fa4f242f050 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -190,9 +190,9 @@ static struct extra_reg intel_snbep_extra_regs[] __read_mostly = { EVENT_EXTRA_END }; -EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x0b,umask=0x10,ldlat=3"); -EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3"); -EVENT_ATTR_STR(mem-stores, mem_st_snb, "event=0xcd,umask=0x2"); +EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x0b,umask=0x10,ldlat=3"); +EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3"); +EVENT_ATTR_STR(mem-stores, mem_st_snb, "event=0xcd,umask=0x2"); struct attribute *nhm_events_attrs[] = { EVENT_PTR(mem_ld_nhm), @@ -1184,6 +1184,11 @@ static void intel_pmu_disable_fixed(struct hw_perf_event *hwc) wrmsrl(hwc->config_base, ctrl_val); } +static inline bool event_is_checkpointed(struct perf_event *event) +{ + return (event->hw.config & HSW_IN_TX_CHECKPOINTED) != 0; +} + static void intel_pmu_disable_event(struct perf_event *event) { struct hw_perf_event *hwc = &event->hw; @@ -1197,6 +1202,7 @@ static void intel_pmu_disable_event(struct perf_event *event) cpuc->intel_ctrl_guest_mask &= ~(1ull << hwc->idx); cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->idx); + cpuc->intel_cp_status &= ~(1ull << hwc->idx); /* * must disable before any actual event @@ -1271,6 +1277,9 @@ static void intel_pmu_enable_event(struct perf_event *event) if (event->attr.exclude_guest) cpuc->intel_ctrl_host_mask |= (1ull << hwc->idx); + if (unlikely(event_is_checkpointed(event))) + cpuc->intel_cp_status |= (1ull << hwc->idx); + if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) { intel_pmu_enable_fixed(hwc); return; @@ -1289,6 +1298,17 @@ static void intel_pmu_enable_event(struct perf_event *event) int intel_pmu_save_and_restart(struct perf_event *event) { x86_perf_event_update(event); + /* + * For a checkpointed counter always reset back to 0. This + * avoids a situation where the counter overflows, aborts the + * transaction and is then set back to shortly before the + * overflow, and overflows and aborts again. + */ + if (unlikely(event_is_checkpointed(event))) { + /* No race with NMIs because the counter should not be armed */ + wrmsrl(event->hw.event_base, 0); + local64_set(&event->hw.prev_count, 0); + } return x86_perf_event_set_period(event); } @@ -1372,6 +1392,13 @@ again: x86_pmu.drain_pebs(regs); } + /* + * Checkpointed counters can lead to 'spurious' PMIs because the + * rollback caused by the PMI will have cleared the overflow status + * bit. Therefore always force probe these counters. + */ + status |= cpuc->intel_cp_status; + for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) { struct perf_event *event = cpuc->events[bit]; @@ -1837,6 +1864,20 @@ static int hsw_hw_config(struct perf_event *event) event->attr.precise_ip > 0)) return -EOPNOTSUPP; + if (event_is_checkpointed(event)) { + /* + * Sampling of checkpointed events can cause situations where + * the CPU constantly aborts because of a overflow, which is + * then checkpointed back and ignored. Forbid checkpointing + * for sampling. + * + * But still allow a long sampling period, so that perf stat + * from KVM works. + */ + if (event->attr.sample_period > 0 && + event->attr.sample_period < 0x7fffffff) + return -EOPNOTSUPP; + } return 0; } @@ -2182,10 +2223,36 @@ static __init void intel_nehalem_quirk(void) } } -EVENT_ATTR_STR(mem-loads, mem_ld_hsw, "event=0xcd,umask=0x1,ldlat=3"); -EVENT_ATTR_STR(mem-stores, mem_st_hsw, "event=0xd0,umask=0x82") +EVENT_ATTR_STR(mem-loads, mem_ld_hsw, "event=0xcd,umask=0x1,ldlat=3"); +EVENT_ATTR_STR(mem-stores, mem_st_hsw, "event=0xd0,umask=0x82") + +/* Haswell special events */ +EVENT_ATTR_STR(tx-start, tx_start, "event=0xc9,umask=0x1"); +EVENT_ATTR_STR(tx-commit, tx_commit, "event=0xc9,umask=0x2"); +EVENT_ATTR_STR(tx-abort, tx_abort, "event=0xc9,umask=0x4"); +EVENT_ATTR_STR(tx-capacity, tx_capacity, "event=0x54,umask=0x2"); +EVENT_ATTR_STR(tx-conflict, tx_conflict, "event=0x54,umask=0x1"); +EVENT_ATTR_STR(el-start, el_start, "event=0xc8,umask=0x1"); +EVENT_ATTR_STR(el-commit, el_commit, "event=0xc8,umask=0x2"); +EVENT_ATTR_STR(el-abort, el_abort, "event=0xc8,umask=0x4"); +EVENT_ATTR_STR(el-capacity, el_capacity, "event=0x54,umask=0x2"); +EVENT_ATTR_STR(el-conflict, el_conflict, "event=0x54,umask=0x1"); +EVENT_ATTR_STR(cycles-t, cycles_t, "event=0x3c,in_tx=1"); +EVENT_ATTR_STR(cycles-ct, cycles_ct, "event=0x3c,in_tx=1,in_tx_cp=1"); static struct attribute *hsw_events_attrs[] = { + EVENT_PTR(tx_start), + EVENT_PTR(tx_commit), + EVENT_PTR(tx_abort), + EVENT_PTR(tx_capacity), + EVENT_PTR(tx_conflict), + EVENT_PTR(el_start), + EVENT_PTR(el_commit), + EVENT_PTR(el_abort), + EVENT_PTR(el_capacity), + EVENT_PTR(el_conflict), + EVENT_PTR(cycles_t), + EVENT_PTR(cycles_ct), EVENT_PTR(mem_ld_hsw), EVENT_PTR(mem_st_hsw), NULL @@ -2452,6 +2519,7 @@ __init int intel_pmu_init(void) x86_pmu.hw_config = hsw_hw_config; x86_pmu.get_event_constraints = hsw_get_event_constraints; x86_pmu.cpu_events = hsw_events_attrs; + x86_pmu.lbr_double_abort = true; pr_cont("Haswell events, "); break; diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c index ab3ba1c1b7dd..c1760ff3c757 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c @@ -12,6 +12,7 @@ #define BTS_BUFFER_SIZE (PAGE_SIZE << 4) #define PEBS_BUFFER_SIZE PAGE_SIZE +#define PEBS_FIXUP_SIZE PAGE_SIZE /* * pebs_record_32 for p4 and core not supported @@ -182,18 +183,32 @@ struct pebs_record_nhm { * Same as pebs_record_nhm, with two additional fields. */ struct pebs_record_hsw { - struct pebs_record_nhm nhm; - /* - * Real IP of the event. In the Intel documentation this - * is called eventingrip. - */ - u64 real_ip; - /* - * TSX tuning information field: abort cycles and abort flags. - */ - u64 tsx_tuning; + u64 flags, ip; + u64 ax, bx, cx, dx; + u64 si, di, bp, sp; + u64 r8, r9, r10, r11; + u64 r12, r13, r14, r15; + u64 status, dla, dse, lat; + u64 real_ip, tsx_tuning; +}; + +union hsw_tsx_tuning { + struct { + u32 cycles_last_block : 32, + hle_abort : 1, + rtm_abort : 1, + instruction_abort : 1, + non_instruction_abort : 1, + retry : 1, + data_conflict : 1, + capacity_writes : 1, + capacity_reads : 1; + }; + u64 value; }; +#define PEBS_HSW_TSX_FLAGS 0xff00000000ULL + void init_debug_store_on_cpu(int cpu) { struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds; @@ -214,12 +229,14 @@ void fini_debug_store_on_cpu(int cpu) wrmsr_on_cpu(cpu, MSR_IA32_DS_AREA, 0, 0); } +static DEFINE_PER_CPU(void *, insn_buffer); + static int alloc_pebs_buffer(int cpu) { struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds; int node = cpu_to_node(cpu); int max, thresh = 1; /* always use a single PEBS record */ - void *buffer; + void *buffer, *ibuffer; if (!x86_pmu.pebs) return 0; @@ -228,6 +245,19 @@ static int alloc_pebs_buffer(int cpu) if (unlikely(!buffer)) return -ENOMEM; + /* + * HSW+ already provides us the eventing ip; no need to allocate this + * buffer then. + */ + if (x86_pmu.intel_cap.pebs_format < 2) { + ibuffer = kzalloc_node(PEBS_FIXUP_SIZE, GFP_KERNEL, node); + if (!ibuffer) { + kfree(buffer); + return -ENOMEM; + } + per_cpu(insn_buffer, cpu) = ibuffer; + } + max = PEBS_BUFFER_SIZE / x86_pmu.pebs_record_size; ds->pebs_buffer_base = (u64)(unsigned long)buffer; @@ -248,6 +278,9 @@ static void release_pebs_buffer(int cpu) if (!ds || !x86_pmu.pebs) return; + kfree(per_cpu(insn_buffer, cpu)); + per_cpu(insn_buffer, cpu) = NULL; + kfree((void *)(unsigned long)ds->pebs_buffer_base); ds->pebs_buffer_base = 0; } @@ -715,6 +748,7 @@ static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs) unsigned long old_to, to = cpuc->lbr_entries[0].to; unsigned long ip = regs->ip; int is_64bit = 0; + void *kaddr; /* * We don't need to fixup if the PEBS assist is fault like @@ -738,7 +772,7 @@ static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs) * unsigned math, either ip is before the start (impossible) or * the basic block is larger than 1 page (sanity) */ - if ((ip - to) > PAGE_SIZE) + if ((ip - to) > PEBS_FIXUP_SIZE) return 0; /* @@ -749,29 +783,33 @@ static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs) return 1; } + if (!kernel_ip(ip)) { + int size, bytes; + u8 *buf = this_cpu_read(insn_buffer); + + size = ip - to; /* Must fit our buffer, see above */ + bytes = copy_from_user_nmi(buf, (void __user *)to, size); + if (bytes != size) + return 0; + + kaddr = buf; + } else { + kaddr = (void *)to; + } + do { struct insn insn; - u8 buf[MAX_INSN_SIZE]; - void *kaddr; old_to = to; - if (!kernel_ip(ip)) { - int bytes, size = MAX_INSN_SIZE; - - bytes = copy_from_user_nmi(buf, (void __user *)to, size); - if (bytes != size) - return 0; - - kaddr = buf; - } else - kaddr = (void *)to; #ifdef CONFIG_X86_64 is_64bit = kernel_ip(to) || !test_thread_flag(TIF_IA32); #endif insn_init(&insn, kaddr, is_64bit); insn_get_length(&insn); + to += insn.length; + kaddr += insn.length; } while (to < ip); if (to == ip) { @@ -786,16 +824,34 @@ static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs) return 0; } +static inline u64 intel_hsw_weight(struct pebs_record_hsw *pebs) +{ + if (pebs->tsx_tuning) { + union hsw_tsx_tuning tsx = { .value = pebs->tsx_tuning }; + return tsx.cycles_last_block; + } + return 0; +} + +static inline u64 intel_hsw_transaction(struct pebs_record_hsw *pebs) +{ + u64 txn = (pebs->tsx_tuning & PEBS_HSW_TSX_FLAGS) >> 32; + + /* For RTM XABORTs also log the abort code from AX */ + if ((txn & PERF_TXN_TRANSACTION) && (pebs->ax & 1)) + txn |= ((pebs->ax >> 24) & 0xff) << PERF_TXN_ABORT_SHIFT; + return txn; +} + static void __intel_pmu_pebs_event(struct perf_event *event, struct pt_regs *iregs, void *__pebs) { /* - * We cast to pebs_record_nhm to get the load latency data - * if extra_reg MSR_PEBS_LD_LAT_THRESHOLD used + * We cast to the biggest pebs_record but are careful not to + * unconditionally access the 'extra' entries. */ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); - struct pebs_record_nhm *pebs = __pebs; - struct pebs_record_hsw *pebs_hsw = __pebs; + struct pebs_record_hsw *pebs = __pebs; struct perf_sample_data data; struct pt_regs regs; u64 sample_type; @@ -854,7 +910,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event, regs.sp = pebs->sp; if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format >= 2) { - regs.ip = pebs_hsw->real_ip; + regs.ip = pebs->real_ip; regs.flags |= PERF_EFLAGS_EXACT; } else if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip(®s)) regs.flags |= PERF_EFLAGS_EXACT; @@ -862,9 +918,18 @@ static void __intel_pmu_pebs_event(struct perf_event *event, regs.flags &= ~PERF_EFLAGS_EXACT; if ((event->attr.sample_type & PERF_SAMPLE_ADDR) && - x86_pmu.intel_cap.pebs_format >= 1) + x86_pmu.intel_cap.pebs_format >= 1) data.addr = pebs->dla; + if (x86_pmu.intel_cap.pebs_format >= 2) { + /* Only set the TSX weight when no memory weight. */ + if ((event->attr.sample_type & PERF_SAMPLE_WEIGHT) && !fll) + data.weight = intel_hsw_weight(pebs); + + if (event->attr.sample_type & PERF_SAMPLE_TRANSACTION) + data.txn = intel_hsw_transaction(pebs); + } + if (has_branch_stack(event)) data.br_stack = &cpuc->lbr_stack; @@ -913,17 +978,34 @@ static void intel_pmu_drain_pebs_core(struct pt_regs *iregs) __intel_pmu_pebs_event(event, iregs, at); } -static void __intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, void *at, - void *top) +static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs) { struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); struct debug_store *ds = cpuc->ds; struct perf_event *event = NULL; + void *at, *top; u64 status = 0; int bit; + if (!x86_pmu.pebs_active) + return; + + at = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base; + top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index; + ds->pebs_index = ds->pebs_buffer_base; + if (unlikely(at > top)) + return; + + /* + * Should not happen, we program the threshold at 1 and do not + * set a reset value. + */ + WARN_ONCE(top - at > x86_pmu.max_pebs_events * x86_pmu.pebs_record_size, + "Unexpected number of pebs records %ld\n", + (long)(top - at) / x86_pmu.pebs_record_size); + for (; at < top; at += x86_pmu.pebs_record_size) { struct pebs_record_nhm *p = at; @@ -951,61 +1033,6 @@ static void __intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, void *at, } } -static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs) -{ - struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); - struct debug_store *ds = cpuc->ds; - struct pebs_record_nhm *at, *top; - int n; - - if (!x86_pmu.pebs_active) - return; - - at = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base; - top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index; - - ds->pebs_index = ds->pebs_buffer_base; - - n = top - at; - if (n <= 0) - return; - - /* - * Should not happen, we program the threshold at 1 and do not - * set a reset value. - */ - WARN_ONCE(n > x86_pmu.max_pebs_events, - "Unexpected number of pebs records %d\n", n); - - return __intel_pmu_drain_pebs_nhm(iregs, at, top); -} - -static void intel_pmu_drain_pebs_hsw(struct pt_regs *iregs) -{ - struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); - struct debug_store *ds = cpuc->ds; - struct pebs_record_hsw *at, *top; - int n; - - if (!x86_pmu.pebs_active) - return; - - at = (struct pebs_record_hsw *)(unsigned long)ds->pebs_buffer_base; - top = (struct pebs_record_hsw *)(unsigned long)ds->pebs_index; - - n = top - at; - if (n <= 0) - return; - /* - * Should not happen, we program the threshold at 1 and do not - * set a reset value. - */ - WARN_ONCE(n > x86_pmu.max_pebs_events, - "Unexpected number of pebs records %d\n", n); - - return __intel_pmu_drain_pebs_nhm(iregs, at, top); -} - /* * BTS, PEBS probe and setup */ @@ -1040,7 +1067,7 @@ void intel_ds_init(void) case 2: pr_cont("PEBS fmt2%c, ", pebs_type); x86_pmu.pebs_record_size = sizeof(struct pebs_record_hsw); - x86_pmu.drain_pebs = intel_pmu_drain_pebs_hsw; + x86_pmu.drain_pebs = intel_pmu_drain_pebs_nhm; break; default: diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c index d5be06a5005e..90ee6c1d0542 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c @@ -284,6 +284,7 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc) int lbr_format = x86_pmu.intel_cap.lbr_format; u64 tos = intel_pmu_lbr_tos(); int i; + int out = 0; for (i = 0; i < x86_pmu.lbr_nr; i++) { unsigned long lbr_idx = (tos - i) & mask; @@ -306,15 +307,27 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc) } from = (u64)((((s64)from) << skip) >> skip); - cpuc->lbr_entries[i].from = from; - cpuc->lbr_entries[i].to = to; - cpuc->lbr_entries[i].mispred = mis; - cpuc->lbr_entries[i].predicted = pred; - cpuc->lbr_entries[i].in_tx = in_tx; - cpuc->lbr_entries[i].abort = abort; - cpuc->lbr_entries[i].reserved = 0; + /* + * Some CPUs report duplicated abort records, + * with the second entry not having an abort bit set. + * Skip them here. This loop runs backwards, + * so we need to undo the previous record. + * If the abort just happened outside the window + * the extra entry cannot be removed. + */ + if (abort && x86_pmu.lbr_double_abort && out > 0) + out--; + + cpuc->lbr_entries[out].from = from; + cpuc->lbr_entries[out].to = to; + cpuc->lbr_entries[out].mispred = mis; + cpuc->lbr_entries[out].predicted = pred; + cpuc->lbr_entries[out].in_tx = in_tx; + cpuc->lbr_entries[out].abort = abort; + cpuc->lbr_entries[out].reserved = 0; + out++; } - cpuc->lbr_stack.nr = i; + cpuc->lbr_stack.nr = out; } void intel_pmu_lbr_read(void) diff --git a/arch/x86/kernel/cpu/rdrand.c b/arch/x86/kernel/cpu/rdrand.c index 88db010845cb..384df5105fbc 100644 --- a/arch/x86/kernel/cpu/rdrand.c +++ b/arch/x86/kernel/cpu/rdrand.c @@ -31,20 +31,6 @@ static int __init x86_rdrand_setup(char *s) } __setup("nordrand", x86_rdrand_setup); -/* We can't use arch_get_random_long() here since alternatives haven't run */ -static inline int rdrand_long(unsigned long *v) -{ - int ok; - asm volatile("1: " RDRAND_LONG "\n\t" - "jc 2f\n\t" - "decl %0\n\t" - "jnz 1b\n\t" - "2:" - : "=r" (ok), "=a" (*v) - : "0" (RDRAND_RETRY_LOOPS)); - return ok; -} - /* * Force a reseed cycle; we are architecturally guaranteed a reseed * after no more than 512 128-bit chunks of random data. This also diff --git a/arch/x86/kernel/early_printk.c b/arch/x86/kernel/early_printk.c index d15f575a861b..38ca398fb787 100644 --- a/arch/x86/kernel/early_printk.c +++ b/arch/x86/kernel/early_printk.c @@ -14,7 +14,7 @@ #include #include #include -#include +#include #include #include diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S index f0dcb0ceb6a2..fd1bc1b15e6d 100644 --- a/arch/x86/kernel/entry_32.S +++ b/arch/x86/kernel/entry_32.S @@ -362,12 +362,9 @@ END(ret_from_exception) #ifdef CONFIG_PREEMPT ENTRY(resume_kernel) DISABLE_INTERRUPTS(CLBR_ANY) - cmpl $0,TI_preempt_count(%ebp) # non-zero preempt_count ? - jnz restore_all need_resched: - movl TI_flags(%ebp), %ecx # need_resched set ? - testb $_TIF_NEED_RESCHED, %cl - jz restore_all + cmpl $0,PER_CPU_VAR(__preempt_count) + jnz restore_all testl $X86_EFLAGS_IF,PT_EFLAGS(%esp) # interrupts off (exception path) ? jz restore_all call preempt_schedule_irq diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index b077f4cc225a..603be7c70675 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -1103,10 +1103,8 @@ retint_signal: /* Returning to kernel space. Check if we need preemption */ /* rcx: threadinfo. interrupts off. */ ENTRY(retint_kernel) - cmpl $0,TI_preempt_count(%rcx) + cmpl $0,PER_CPU_VAR(__preempt_count) jnz retint_restore_args - bt $TIF_NEED_RESCHED,TI_flags(%rcx) - jnc retint_restore_args bt $9,EFLAGS-ARGOFFSET(%rsp) /* interrupts off? */ jnc retint_restore_args call preempt_schedule_irq @@ -1342,7 +1340,7 @@ bad_gs: .previous /* Call softirq on interrupt stack. Interrupts are off. */ -ENTRY(call_softirq) +ENTRY(do_softirq_own_stack) CFI_STARTPROC pushq_cfi %rbp CFI_REL_OFFSET rbp,0 @@ -1359,7 +1357,7 @@ ENTRY(call_softirq) decl PER_CPU_VAR(irq_count) ret CFI_ENDPROC -END(call_softirq) +END(do_softirq_own_stack) #ifdef CONFIG_XEN zeroentry xen_hypervisor_callback xen_do_hypervisor_callback diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c index 06f87bece92a..c61a14a4a310 100644 --- a/arch/x86/kernel/head32.c +++ b/arch/x86/kernel/head32.c @@ -35,8 +35,8 @@ asmlinkage void __init i386_start_kernel(void) /* Call the subarch specific early setup function */ switch (boot_params.hdr.hardware_subarch) { - case X86_SUBARCH_MRST: - x86_mrst_early_setup(); + case X86_SUBARCH_INTEL_MID: + x86_intel_mid_early_setup(); break; case X86_SUBARCH_CE4100: x86_ce4100_early_setup(); diff --git a/arch/x86/kernel/i386_ksyms_32.c b/arch/x86/kernel/i386_ksyms_32.c index 0fa69127209a..05fd74f537d6 100644 --- a/arch/x86/kernel/i386_ksyms_32.c +++ b/arch/x86/kernel/i386_ksyms_32.c @@ -37,3 +37,10 @@ EXPORT_SYMBOL(strstr); EXPORT_SYMBOL(csum_partial); EXPORT_SYMBOL(empty_zero_page); + +#ifdef CONFIG_PREEMPT +EXPORT_SYMBOL(___preempt_schedule); +#ifdef CONFIG_CONTEXT_TRACKING +EXPORT_SYMBOL(___preempt_schedule_context); +#endif +#endif diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c index 9a5c460404dc..2e977b5d61dd 100644 --- a/arch/x86/kernel/i8259.c +++ b/arch/x86/kernel/i8259.c @@ -312,8 +312,7 @@ static void init_8259A(int auto_eoi) */ outb_pic(0x11, PIC_MASTER_CMD); /* ICW1: select 8259A-1 init */ - /* ICW2: 8259A-1 IR0-7 mapped to 0x30-0x37 on x86-64, - to 0x20-0x27 on i386 */ + /* ICW2: 8259A-1 IR0-7 mapped to 0x30-0x37 */ outb_pic(IRQ0_VECTOR, PIC_MASTER_IMR); /* 8259A-1 (the master) has a slave on IR2 */ diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c index 4186755f1d7c..d7fcbedc9c43 100644 --- a/arch/x86/kernel/irq_32.c +++ b/arch/x86/kernel/irq_32.c @@ -100,9 +100,6 @@ execute_on_irq_stack(int overflow, struct irq_desc *desc, int irq) irqctx->tinfo.task = curctx->tinfo.task; irqctx->tinfo.previous_esp = current_stack_pointer; - /* Copy the preempt_count so that the [soft]irq checks work. */ - irqctx->tinfo.preempt_count = curctx->tinfo.preempt_count; - if (unlikely(overflow)) call_on_stack(print_stack_overflow, isp); @@ -131,7 +128,6 @@ void irq_ctx_init(int cpu) THREAD_SIZE_ORDER)); memset(&irqctx->tinfo, 0, sizeof(struct thread_info)); irqctx->tinfo.cpu = cpu; - irqctx->tinfo.preempt_count = HARDIRQ_OFFSET; irqctx->tinfo.addr_limit = MAKE_MM_SEG(0); per_cpu(hardirq_ctx, cpu) = irqctx; @@ -149,35 +145,21 @@ void irq_ctx_init(int cpu) cpu, per_cpu(hardirq_ctx, cpu), per_cpu(softirq_ctx, cpu)); } -asmlinkage void do_softirq(void) +void do_softirq_own_stack(void) { - unsigned long flags; struct thread_info *curctx; union irq_ctx *irqctx; u32 *isp; - if (in_interrupt()) - return; - - local_irq_save(flags); - - if (local_softirq_pending()) { - curctx = current_thread_info(); - irqctx = __this_cpu_read(softirq_ctx); - irqctx->tinfo.task = curctx->task; - irqctx->tinfo.previous_esp = current_stack_pointer; - - /* build the stack frame on the softirq stack */ - isp = (u32 *) ((char *)irqctx + sizeof(*irqctx)); + curctx = current_thread_info(); + irqctx = __this_cpu_read(softirq_ctx); + irqctx->tinfo.task = curctx->task; + irqctx->tinfo.previous_esp = current_stack_pointer; - call_on_stack(__do_softirq, isp); - /* - * Shouldn't happen, we returned above if in_interrupt(): - */ - WARN_ON_ONCE(softirq_count()); - } + /* build the stack frame on the softirq stack */ + isp = (u32 *) ((char *)irqctx + sizeof(*irqctx)); - local_irq_restore(flags); + call_on_stack(__do_softirq, isp); } bool handle_irq(unsigned irq, struct pt_regs *regs) diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c index d04d3ecded62..4d1c746892eb 100644 --- a/arch/x86/kernel/irq_64.c +++ b/arch/x86/kernel/irq_64.c @@ -87,24 +87,3 @@ bool handle_irq(unsigned irq, struct pt_regs *regs) generic_handle_irq_desc(irq, desc); return true; } - - -extern void call_softirq(void); - -asmlinkage void do_softirq(void) -{ - __u32 pending; - unsigned long flags; - - if (in_interrupt()) - return; - - local_irq_save(flags); - pending = local_softirq_pending(); - /* Switch to interrupt stack */ - if (pending) { - call_softirq(); - WARN_ON_ONCE(softirq_count()); - } - local_irq_restore(flags); -} diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 697b93af02dd..a0e2a8a80c94 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -775,11 +775,22 @@ void __init kvm_spinlock_init(void) if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) return; - printk(KERN_INFO "KVM setup paravirtual spinlock\n"); + pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning); + pv_lock_ops.unlock_kick = kvm_unlock_kick; +} + +static __init int kvm_spinlock_init_jump(void) +{ + if (!kvm_para_available()) + return 0; + if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) + return 0; static_key_slow_inc(¶virt_ticketlocks_enabled); + printk(KERN_INFO "KVM setup paravirtual spinlock\n"); - pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning); - pv_lock_ops.unlock_kick = kvm_unlock_kick; + return 0; } +early_initcall(kvm_spinlock_init_jump); + #endif /* CONFIG_PARAVIRT_SPINLOCKS */ diff --git a/arch/x86/kernel/microcode_amd.c b/arch/x86/kernel/microcode_amd.c index 7123b5df479d..af99f71aeb7f 100644 --- a/arch/x86/kernel/microcode_amd.c +++ b/arch/x86/kernel/microcode_amd.c @@ -216,6 +216,7 @@ int apply_microcode_amd(int cpu) /* need to apply patch? */ if (rev >= mc_amd->hdr.patch_id) { c->microcode = rev; + uci->cpu_sig.rev = rev; return 0; } diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c index 88458faea2f8..05266b5aae22 100644 --- a/arch/x86/kernel/msr.c +++ b/arch/x86/kernel/msr.c @@ -46,7 +46,7 @@ static struct class *msr_class; static loff_t msr_seek(struct file *file, loff_t offset, int orig) { loff_t ret; - struct inode *inode = file->f_mapping->host; + struct inode *inode = file_inode(file); mutex_lock(&inode->i_mutex); switch (orig) { diff --git a/arch/x86/kernel/preempt.S b/arch/x86/kernel/preempt.S new file mode 100644 index 000000000000..ca7f0d58a87d --- /dev/null +++ b/arch/x86/kernel/preempt.S @@ -0,0 +1,25 @@ + +#include +#include +#include +#include + +ENTRY(___preempt_schedule) + CFI_STARTPROC + SAVE_ALL + call preempt_schedule + RESTORE_ALL + ret + CFI_ENDPROC + +#ifdef CONFIG_CONTEXT_TRACKING + +ENTRY(___preempt_schedule_context) + CFI_STARTPROC + SAVE_ALL + call preempt_schedule_context + RESTORE_ALL + ret + CFI_ENDPROC + +#endif diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index c83516be1052..3fb8d95ab8b5 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -391,9 +391,9 @@ static void amd_e400_idle(void) * The switch back from broadcast mode needs to be * called with interrupts disabled. */ - local_irq_disable(); - clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu); - local_irq_enable(); + local_irq_disable(); + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu); + local_irq_enable(); } else default_idle(); } diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index 884f98f69354..c2ec1aa6d454 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -291,6 +291,14 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl)) set_iopl_mask(next->iopl); + /* + * If it were not for PREEMPT_ACTIVE we could guarantee that the + * preempt_count of all tasks was equal here and this would not be + * needed. + */ + task_thread_info(prev_p)->saved_preempt_count = this_cpu_read(__preempt_count); + this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count); + /* * Now maybe handle debug registers and/or IO bitmaps */ diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index bb1dc51bab05..45ab4d6fc8a7 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -363,6 +363,14 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) this_cpu_write(old_rsp, next->usersp); this_cpu_write(current_task, next_p); + /* + * If it were not for PREEMPT_ACTIVE we could guarantee that the + * preempt_count of all tasks was equal here and this would not be + * needed. + */ + task_thread_info(prev_p)->saved_preempt_count = this_cpu_read(__preempt_count); + this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count); + this_cpu_write(kernel_stack, (unsigned long)task_stack_page(next_p) + THREAD_SIZE - KERNEL_STACK_OFFSET); diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index 76925208f50b..6bb3f95037c6 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -325,6 +325,14 @@ static struct dmi_system_id __initdata reboot_dmi_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "C6100"), }, }, + { /* Handle problems with rebooting on the Latitude E5410. */ + .callback = set_pci_reboot, + .ident = "Dell Latitude E5410", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Latitude E5410"), + }, + }, { /* Handle problems with rebooting on the Precision M6600. */ .callback = set_pci_reboot, .ident = "Dell Precision M6600", diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c index 0aa29394ed6f..e35cb18b8a00 100644 --- a/arch/x86/kernel/rtc.c +++ b/arch/x86/kernel/rtc.c @@ -12,7 +12,7 @@ #include #include #include -#include +#include #include #ifdef CONFIG_X86_32 @@ -189,7 +189,7 @@ static __init int add_rtc_cmos(void) return 0; /* Intel MID platforms don't have ioport rtc */ - if (mrst_identify_cpu()) + if (intel_mid_identify_cpu()) return -ENODEV; platform_device_register(&rtc_device); diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index f0de6294b955..1708862fc40d 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -823,6 +823,20 @@ static void __init trim_low_memory_range(void) memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE)); } +/* + * Dump out kernel offset information on panic. + */ +static int +dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p) +{ + pr_emerg("Kernel Offset: 0x%lx from 0x%lx " + "(relocation range: 0x%lx-0x%lx)\n", + (unsigned long)&_text - __START_KERNEL, __START_KERNEL, + __START_KERNEL_map, MODULES_VADDR-1); + + return 0; +} + /* * Determine if we were loaded by an EFI loader. If so, then we have also been * passed the efi memmap, systab, etc., so we should use these data structures @@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void) } #endif /* CONFIG_X86_32 */ + +static struct notifier_block kernel_offset_notifier = { + .notifier_call = dump_kernel_offset +}; + +static int __init register_kernel_offset_dumper(void) +{ + atomic_notifier_chain_register(&panic_notifier_list, + &kernel_offset_notifier); + return 0; +} +__initcall(register_kernel_offset_dumper); diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 6cacab671f9b..2a165580fa16 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -73,11 +73,10 @@ #include #include #include - #include #include - #include +#include /* State of each CPU */ DEFINE_PER_CPU(int, cpu_state) = { 0 }; @@ -648,22 +647,46 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip) return (send_status | accept_status); } +void smp_announce(void) +{ + int num_nodes = num_online_nodes(); + + printk(KERN_INFO "x86: Booted up %d node%s, %d CPUs\n", + num_nodes, (num_nodes > 1 ? "s" : ""), num_online_cpus()); +} + /* reduce the number of lines printed when booting a large cpu count system */ static void announce_cpu(int cpu, int apicid) { static int current_node = -1; int node = early_cpu_to_node(cpu); - int max_cpu_present = find_last_bit(cpumask_bits(cpu_present_mask), NR_CPUS); + static int width, node_width; + + if (!width) + width = num_digits(num_possible_cpus()) + 1; /* + '#' sign */ + + if (!node_width) + node_width = num_digits(num_possible_nodes()) + 1; /* + '#' */ + + if (cpu == 1) + printk(KERN_INFO "x86: Booting SMP configuration:\n"); if (system_state == SYSTEM_BOOTING) { if (node != current_node) { if (current_node > (-1)) - pr_cont(" OK\n"); + pr_cont("\n"); current_node = node; - pr_info("Booting Node %3d, Processors ", node); + + printk(KERN_INFO ".... node %*s#%d, CPUs: ", + node_width - num_digits(node), " ", node); } - pr_cont(" #%4d%s", cpu, cpu == max_cpu_present ? " OK\n" : ""); - return; + + /* Add padding for the BSP */ + if (cpu == 1) + pr_cont("%*s", width + 1, " "); + + pr_cont("%*s#%d", width - num_digits(cpu), " ", cpu); + } else pr_info("Booting Node %d Processor %d APIC 0x%x\n", node, cpu, apicid); diff --git a/arch/x86/kernel/sysfb_simplefb.c b/arch/x86/kernel/sysfb_simplefb.c index 22513e96b012..86179d409893 100644 --- a/arch/x86/kernel/sysfb_simplefb.c +++ b/arch/x86/kernel/sysfb_simplefb.c @@ -72,14 +72,14 @@ __init int create_simplefb(const struct screen_info *si, * the part that is occupied by the framebuffer */ len = mode->height * mode->stride; len = PAGE_ALIGN(len); - if (len > si->lfb_size << 16) { + if (len > (u64)si->lfb_size << 16) { printk(KERN_WARNING "sysfb: VRAM smaller than advertised\n"); return -EINVAL; } /* setup IORESOURCE_MEM as framebuffer memory */ memset(&res, 0, sizeof(res)); - res.flags = IORESOURCE_MEM; + res.flags = IORESOURCE_MEM | IORESOURCE_BUSY; res.name = simplefb_resname; res.start = si->lfb_base; res.end = si->lfb_base + len - 1; diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 8c8093b146ca..729aa779ff75 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -88,7 +88,7 @@ static inline void conditional_sti(struct pt_regs *regs) static inline void preempt_conditional_sti(struct pt_regs *regs) { - inc_preempt_count(); + preempt_count_inc(); if (regs->flags & X86_EFLAGS_IF) local_irq_enable(); } @@ -103,7 +103,7 @@ static inline void preempt_conditional_cli(struct pt_regs *regs) { if (regs->flags & X86_EFLAGS_IF) local_irq_disable(); - dec_preempt_count(); + preempt_count_dec(); } static int __kprobes diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index 10c4f3006afd..da6b35a98260 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -199,6 +199,15 @@ SECTIONS __x86_cpu_dev_end = .; } +#ifdef CONFIG_X86_INTEL_MID + .x86_intel_mid_dev.init : AT(ADDR(.x86_intel_mid_dev.init) - \ + LOAD_OFFSET) { + __x86_intel_mid_dev_start = .; + *(.x86_intel_mid_dev.init) + __x86_intel_mid_dev_end = .; + } +#endif + /* * start address and size of operations which during runtime * can be patched with virtualization friendly instructions or diff --git a/arch/x86/kernel/x8664_ksyms_64.c b/arch/x86/kernel/x8664_ksyms_64.c index b014d9414d08..040681928e9d 100644 --- a/arch/x86/kernel/x8664_ksyms_64.c +++ b/arch/x86/kernel/x8664_ksyms_64.c @@ -66,3 +66,10 @@ EXPORT_SYMBOL(empty_zero_page); #ifndef CONFIG_PARAVIRT EXPORT_SYMBOL(native_load_gs_index); #endif + +#ifdef CONFIG_PREEMPT +EXPORT_SYMBOL(___preempt_schedule); +#ifdef CONFIG_CONTEXT_TRACKING +EXPORT_SYMBOL(___preempt_schedule_context); +#endif +#endif diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index a1216de9ffda..2b2fce1b2009 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3255,25 +3255,29 @@ static void vmx_decache_cr4_guest_bits(struct kvm_vcpu *vcpu) static void ept_load_pdptrs(struct kvm_vcpu *vcpu) { + struct kvm_mmu *mmu = vcpu->arch.walk_mmu; + if (!test_bit(VCPU_EXREG_PDPTR, (unsigned long *)&vcpu->arch.regs_dirty)) return; if (is_paging(vcpu) && is_pae(vcpu) && !is_long_mode(vcpu)) { - vmcs_write64(GUEST_PDPTR0, vcpu->arch.mmu.pdptrs[0]); - vmcs_write64(GUEST_PDPTR1, vcpu->arch.mmu.pdptrs[1]); - vmcs_write64(GUEST_PDPTR2, vcpu->arch.mmu.pdptrs[2]); - vmcs_write64(GUEST_PDPTR3, vcpu->arch.mmu.pdptrs[3]); + vmcs_write64(GUEST_PDPTR0, mmu->pdptrs[0]); + vmcs_write64(GUEST_PDPTR1, mmu->pdptrs[1]); + vmcs_write64(GUEST_PDPTR2, mmu->pdptrs[2]); + vmcs_write64(GUEST_PDPTR3, mmu->pdptrs[3]); } } static void ept_save_pdptrs(struct kvm_vcpu *vcpu) { + struct kvm_mmu *mmu = vcpu->arch.walk_mmu; + if (is_paging(vcpu) && is_pae(vcpu) && !is_long_mode(vcpu)) { - vcpu->arch.mmu.pdptrs[0] = vmcs_read64(GUEST_PDPTR0); - vcpu->arch.mmu.pdptrs[1] = vmcs_read64(GUEST_PDPTR1); - vcpu->arch.mmu.pdptrs[2] = vmcs_read64(GUEST_PDPTR2); - vcpu->arch.mmu.pdptrs[3] = vmcs_read64(GUEST_PDPTR3); + mmu->pdptrs[0] = vmcs_read64(GUEST_PDPTR0); + mmu->pdptrs[1] = vmcs_read64(GUEST_PDPTR1); + mmu->pdptrs[2] = vmcs_read64(GUEST_PDPTR2); + mmu->pdptrs[3] = vmcs_read64(GUEST_PDPTR3); } __set_bit(VCPU_EXREG_PDPTR, @@ -5345,7 +5349,9 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu) * There are errata that may cause this bit to not be set: * AAK134, BY25. */ - if (exit_qualification & INTR_INFO_UNBLOCK_NMI) + if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) && + cpu_has_virtual_nmis() && + (exit_qualification & INTR_INFO_UNBLOCK_NMI)) vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI); gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS); @@ -7775,10 +7781,6 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) vmcs_write64(GUEST_PDPTR1, vmcs12->guest_pdptr1); vmcs_write64(GUEST_PDPTR2, vmcs12->guest_pdptr2); vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3); - __clear_bit(VCPU_EXREG_PDPTR, - (unsigned long *)&vcpu->arch.regs_avail); - __clear_bit(VCPU_EXREG_PDPTR, - (unsigned long *)&vcpu->arch.regs_dirty); } kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->guest_rsp); diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile index 96b2c6697c9d..992d63bb154f 100644 --- a/arch/x86/lib/Makefile +++ b/arch/x86/lib/Makefile @@ -16,7 +16,7 @@ clean-files := inat-tables.c obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o -lib-y := delay.o +lib-y := delay.o misc.o lib-y += thunk_$(BITS).o lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o lib-y += memcpy_$(BITS).o diff --git a/arch/x86/lib/misc.c b/arch/x86/lib/misc.c new file mode 100644 index 000000000000..76b373af03f0 --- /dev/null +++ b/arch/x86/lib/misc.c @@ -0,0 +1,21 @@ +/* + * Count the digits of @val including a possible sign. + * + * (Typed on and submitted from hpa's mobile phone.) + */ +int num_digits(int val) +{ + int m = 10; + int d = 1; + + if (val < 0) { + d++; + val = -val; + } + + while (val >= m) { + m *= 10; + d++; + } + return d; +} diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 04664cdb7fda..ce32017c5e38 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -399,8 +399,25 @@ static unsigned long __init init_range_memory_mapping( return mapped_ram_size; } -/* (PUD_SHIFT-PMD_SHIFT)/2 */ -#define STEP_SIZE_SHIFT 5 +static unsigned long __init get_new_step_size(unsigned long step_size) +{ + /* + * Explain why we shift by 5 and why we don't have to worry about + * 'step_size << 5' overflowing: + * + * initial mapped size is PMD_SIZE (2M). + * We can not set step_size to be PUD_SIZE (1G) yet. + * In worse case, when we cross the 1G boundary, and + * PG_LEVEL_2M is not set, we will need 1+1+512 pages (2M + 8k) + * to map 1G range with PTE. Use 5 as shift for now. + * + * Don't need to worry about overflow, on 32bit, when step_size + * is 0, round_down() returns 0 for start, and that turns it + * into 0x100000000ULL. + */ + return step_size << 5; +} + void __init init_mem_mapping(void) { unsigned long end, real_end, start, last_start; @@ -449,7 +466,7 @@ void __init init_mem_mapping(void) min_pfn_mapped = last_start >> PAGE_SHIFT; /* only increase step_size after big range get mapped */ if (new_mapped_ram_size > mapped_ram_size) - step_size <<= STEP_SIZE_SHIFT; + step_size = get_new_step_size(step_size); mapped_ram_size += new_mapped_ram_size; } diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index 4287f1ffba7e..5bdc5430597c 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -806,6 +806,9 @@ void __init mem_init(void) BUILD_BUG_ON(VMALLOC_START >= VMALLOC_END); #undef high_memory #undef __FIXADDR_TOP +#ifdef CONFIG_RANDOMIZE_BASE + BUILD_BUG_ON(CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE); +#endif #ifdef CONFIG_HIGHMEM BUG_ON(PKMAP_BASE + LAST_PKMAP*PAGE_SIZE > FIXADDR_START); diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile index ee0af58ca5bd..e063eed0f912 100644 --- a/arch/x86/pci/Makefile +++ b/arch/x86/pci/Makefile @@ -18,7 +18,7 @@ obj-$(CONFIG_X86_VISWS) += visws.o obj-$(CONFIG_X86_NUMAQ) += numaq_32.o obj-$(CONFIG_X86_NUMACHIP) += numachip.o -obj-$(CONFIG_X86_INTEL_MID) += mrst.o +obj-$(CONFIG_X86_INTEL_MID) += intel_mid_pci.o obj-y += common.o early.o obj-y += bus_numa.o diff --git a/arch/x86/pci/mrst.c b/arch/x86/pci/intel_mid_pci.c similarity index 96% rename from arch/x86/pci/mrst.c rename to arch/x86/pci/intel_mid_pci.c index 903fded50786..51384ca727ad 100644 --- a/arch/x86/pci/mrst.c +++ b/arch/x86/pci/intel_mid_pci.c @@ -1,5 +1,5 @@ /* - * Moorestown PCI support + * Intel MID PCI support * Copyright (c) 2008 Intel Corporation * Jesse Barnes * @@ -150,12 +150,12 @@ static bool type1_access_ok(unsigned int bus, unsigned int devfn, int reg) * shim. Therefore, use the header type in shim instead. */ if (reg >= 0x100 || reg == PCI_STATUS || reg == PCI_HEADER_TYPE) - return 0; + return false; if (bus == 0 && (devfn == PCI_DEVFN(2, 0) || devfn == PCI_DEVFN(0, 0) || devfn == PCI_DEVFN(3, 0))) - return 1; - return 0; /* Langwell on others */ + return true; + return false; /* Langwell on others */ } static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, @@ -205,7 +205,7 @@ static int pci_write(struct pci_bus *bus, unsigned int devfn, int where, where, size, value); } -static int mrst_pci_irq_enable(struct pci_dev *dev) +static int intel_mid_pci_irq_enable(struct pci_dev *dev) { u8 pin; struct io_apic_irq_attr irq_attr; @@ -225,23 +225,23 @@ static int mrst_pci_irq_enable(struct pci_dev *dev) return 0; } -struct pci_ops pci_mrst_ops = { +struct pci_ops intel_mid_pci_ops = { .read = pci_read, .write = pci_write, }; /** - * pci_mrst_init - installs pci_mrst_ops + * intel_mid_pci_init - installs intel_mid_pci_ops * * Moorestown has an interesting PCI implementation (see above). * Called when the early platform detection installs it. */ -int __init pci_mrst_init(void) +int __init intel_mid_pci_init(void) { pr_info("Intel MID platform detected, using MID PCI ops\n"); pci_mmcfg_late_init(); - pcibios_enable_irq = mrst_pci_irq_enable; - pci_root_ops = pci_mrst_ops; + pcibios_enable_irq = intel_mid_pci_irq_enable; + pci_root_ops = intel_mid_pci_ops; pci_soc_mode = 1; /* Continue with standard init */ return 1; diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c index 5596c7bdd327..082e88129712 100644 --- a/arch/x86/pci/mmconfig-shared.c +++ b/arch/x86/pci/mmconfig-shared.c @@ -700,7 +700,7 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, if (!(pci_probe & PCI_PROBE_MMCONF) || pci_mmcfg_arch_init_failed) return -ENODEV; - if (start > end || !addr) + if (start > end) return -EINVAL; mutex_lock(&pci_mmcfg_lock); @@ -716,6 +716,11 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, return -EEXIST; } + if (!addr) { + mutex_unlock(&pci_mmcfg_lock); + return -EINVAL; + } + rc = -EBUSY; cfg = pci_mmconfig_alloc(seg, start, end, addr); if (cfg == NULL) { diff --git a/arch/x86/platform/Makefile b/arch/x86/platform/Makefile index 01e0231a113e..20342d4c82ce 100644 --- a/arch/x86/platform/Makefile +++ b/arch/x86/platform/Makefile @@ -4,7 +4,7 @@ obj-y += efi/ obj-y += geode/ obj-y += goldfish/ obj-y += iris/ -obj-y += mrst/ +obj-y += intel-mid/ obj-y += olpc/ obj-y += scx200/ obj-y += sfi/ diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index c7e22ab29a5a..92c02344a060 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -60,19 +60,6 @@ static efi_char16_t efi_dummy_name[6] = { 'D', 'U', 'M', 'M', 'Y', 0 }; -struct efi __read_mostly efi = { - .mps = EFI_INVALID_TABLE_ADDR, - .acpi = EFI_INVALID_TABLE_ADDR, - .acpi20 = EFI_INVALID_TABLE_ADDR, - .smbios = EFI_INVALID_TABLE_ADDR, - .sal_systab = EFI_INVALID_TABLE_ADDR, - .boot_info = EFI_INVALID_TABLE_ADDR, - .hcdp = EFI_INVALID_TABLE_ADDR, - .uga = EFI_INVALID_TABLE_ADDR, - .uv_systab = EFI_INVALID_TABLE_ADDR, -}; -EXPORT_SYMBOL(efi); - struct efi_memory_map memmap; static struct efi efi_phys __initdata; @@ -80,6 +67,13 @@ static efi_system_table_t efi_systab __initdata; unsigned long x86_efi_facility; +static __initdata efi_config_table_type_t arch_tables[] = { +#ifdef CONFIG_X86_UV + {UV_SYSTEM_TABLE_GUID, "UVsystab", &efi.uv_systab}, +#endif + {NULL_GUID, NULL, NULL}, +}; + /* * Returns 1 if 'facility' is enabled, 0 otherwise. */ @@ -399,6 +393,8 @@ int __init efi_memblock_x86_reserve_range(void) memblock_reserve(pmap, memmap.nr_map * memmap.desc_size); + efi.memmap = &memmap; + return 0; } @@ -578,80 +574,6 @@ static int __init efi_systab_init(void *phys) return 0; } -static int __init efi_config_init(u64 tables, int nr_tables) -{ - void *config_tables, *tablep; - int i, sz; - - if (efi_enabled(EFI_64BIT)) - sz = sizeof(efi_config_table_64_t); - else - sz = sizeof(efi_config_table_32_t); - - /* - * Let's see what config tables the firmware passed to us. - */ - config_tables = early_ioremap(tables, nr_tables * sz); - if (config_tables == NULL) { - pr_err("Could not map Configuration table!\n"); - return -ENOMEM; - } - - tablep = config_tables; - pr_info(""); - for (i = 0; i < efi.systab->nr_tables; i++) { - efi_guid_t guid; - unsigned long table; - - if (efi_enabled(EFI_64BIT)) { - u64 table64; - guid = ((efi_config_table_64_t *)tablep)->guid; - table64 = ((efi_config_table_64_t *)tablep)->table; - table = table64; -#ifdef CONFIG_X86_32 - if (table64 >> 32) { - pr_cont("\n"); - pr_err("Table located above 4GB, disabling EFI.\n"); - early_iounmap(config_tables, - efi.systab->nr_tables * sz); - return -EINVAL; - } -#endif - } else { - guid = ((efi_config_table_32_t *)tablep)->guid; - table = ((efi_config_table_32_t *)tablep)->table; - } - if (!efi_guidcmp(guid, MPS_TABLE_GUID)) { - efi.mps = table; - pr_cont(" MPS=0x%lx ", table); - } else if (!efi_guidcmp(guid, ACPI_20_TABLE_GUID)) { - efi.acpi20 = table; - pr_cont(" ACPI 2.0=0x%lx ", table); - } else if (!efi_guidcmp(guid, ACPI_TABLE_GUID)) { - efi.acpi = table; - pr_cont(" ACPI=0x%lx ", table); - } else if (!efi_guidcmp(guid, SMBIOS_TABLE_GUID)) { - efi.smbios = table; - pr_cont(" SMBIOS=0x%lx ", table); -#ifdef CONFIG_X86_UV - } else if (!efi_guidcmp(guid, UV_SYSTEM_TABLE_GUID)) { - efi.uv_systab = table; - pr_cont(" UVsystab=0x%lx ", table); -#endif - } else if (!efi_guidcmp(guid, HCDP_TABLE_GUID)) { - efi.hcdp = table; - pr_cont(" HCDP=0x%lx ", table); - } else if (!efi_guidcmp(guid, UGA_IO_PROTOCOL_GUID)) { - efi.uga = table; - pr_cont(" UGA=0x%lx ", table); - } - tablep += sz; - } - pr_cont("\n"); - early_iounmap(config_tables, efi.systab->nr_tables * sz); - return 0; -} - static int __init efi_runtime_init(void) { efi_runtime_services_t *runtime; @@ -745,7 +667,7 @@ void __init efi_init(void) efi.systab->hdr.revision >> 16, efi.systab->hdr.revision & 0xffff, vendor); - if (efi_config_init(efi.systab->tables, efi.systab->nr_tables)) + if (efi_config_init(arch_tables)) return; set_bit(EFI_CONFIG_TABLES, &x86_efi_facility); @@ -816,34 +738,6 @@ static void __init runtime_code_page_mkexec(void) } } -/* - * We can't ioremap data in EFI boot services RAM, because we've already mapped - * it as RAM. So, look it up in the existing EFI memory map instead. Only - * callable after efi_enter_virtual_mode and before efi_free_boot_services. - */ -void __iomem *efi_lookup_mapped_addr(u64 phys_addr) -{ - void *p; - if (WARN_ON(!memmap.map)) - return NULL; - for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) { - efi_memory_desc_t *md = p; - u64 size = md->num_pages << EFI_PAGE_SHIFT; - u64 end = md->phys_addr + size; - if (!(md->attribute & EFI_MEMORY_RUNTIME) && - md->type != EFI_BOOT_SERVICES_CODE && - md->type != EFI_BOOT_SERVICES_DATA) - continue; - if (!md->virt_addr) - continue; - if (phys_addr >= md->phys_addr && phys_addr < end) { - phys_addr += md->virt_addr - md->phys_addr; - return (__force void __iomem *)(unsigned long)phys_addr; - } - } - return NULL; -} - void efi_memory_uc(u64 addr, unsigned long size) { unsigned long page_shift = 1UL << EFI_PAGE_SHIFT; diff --git a/arch/x86/platform/geode/alix.c b/arch/x86/platform/geode/alix.c index 90e23e7679a5..76b6632d3143 100644 --- a/arch/x86/platform/geode/alix.c +++ b/arch/x86/platform/geode/alix.c @@ -98,7 +98,7 @@ static struct platform_device alix_leds_dev = { .dev.platform_data = &alix_leds_data, }; -static struct __initdata platform_device *alix_devs[] = { +static struct platform_device *alix_devs[] __initdata = { &alix_buttons_dev, &alix_leds_dev, }; diff --git a/arch/x86/platform/geode/geos.c b/arch/x86/platform/geode/geos.c index c2e6d53558be..aa733fba2471 100644 --- a/arch/x86/platform/geode/geos.c +++ b/arch/x86/platform/geode/geos.c @@ -87,7 +87,7 @@ static struct platform_device geos_leds_dev = { .dev.platform_data = &geos_leds_data, }; -static struct __initdata platform_device *geos_devs[] = { +static struct platform_device *geos_devs[] __initdata = { &geos_buttons_dev, &geos_leds_dev, }; diff --git a/arch/x86/platform/geode/net5501.c b/arch/x86/platform/geode/net5501.c index 646e3b5b4bb6..927e38c0089f 100644 --- a/arch/x86/platform/geode/net5501.c +++ b/arch/x86/platform/geode/net5501.c @@ -78,7 +78,7 @@ static struct platform_device net5501_leds_dev = { .dev.platform_data = &net5501_leds_data, }; -static struct __initdata platform_device *net5501_devs[] = { +static struct platform_device *net5501_devs[] __initdata = { &net5501_buttons_dev, &net5501_leds_dev, }; diff --git a/arch/x86/platform/intel-mid/Makefile b/arch/x86/platform/intel-mid/Makefile new file mode 100644 index 000000000000..01cc29ea5ff7 --- /dev/null +++ b/arch/x86/platform/intel-mid/Makefile @@ -0,0 +1,7 @@ +obj-$(CONFIG_X86_INTEL_MID) += intel-mid.o +obj-$(CONFIG_X86_INTEL_MID) += intel_mid_vrtc.o +obj-$(CONFIG_EARLY_PRINTK_INTEL_MID) += early_printk_intel_mid.o +# SFI specific code +ifdef CONFIG_X86_INTEL_MID +obj-$(CONFIG_SFI) += sfi.o device_libs/ +endif diff --git a/arch/x86/platform/intel-mid/device_libs/Makefile b/arch/x86/platform/intel-mid/device_libs/Makefile new file mode 100644 index 000000000000..097e7a7940d8 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/Makefile @@ -0,0 +1,22 @@ +# IPC Devices +obj-y += platform_ipc.o +obj-$(subst m,y,$(CONFIG_MFD_INTEL_MSIC)) += platform_msic.o +obj-$(subst m,y,$(CONFIG_SND_MFLD_MACHINE)) += platform_msic_audio.o +obj-$(subst m,y,$(CONFIG_GPIO_MSIC)) += platform_msic_gpio.o +obj-$(subst m,y,$(CONFIG_MFD_INTEL_MSIC)) += platform_msic_ocd.o +obj-$(subst m,y,$(CONFIG_MFD_INTEL_MSIC)) += platform_msic_battery.o +obj-$(subst m,y,$(CONFIG_INTEL_MID_POWER_BUTTON)) += platform_msic_power_btn.o +obj-$(subst m,y,$(CONFIG_GPIO_INTEL_PMIC)) += platform_pmic_gpio.o +obj-$(subst m,y,$(CONFIG_INTEL_MFLD_THERMAL)) += platform_msic_thermal.o +# I2C Devices +obj-$(subst m,y,$(CONFIG_SENSORS_EMC1403)) += platform_emc1403.o +obj-$(subst m,y,$(CONFIG_SENSORS_LIS3LV02D)) += platform_lis331.o +obj-$(subst m,y,$(CONFIG_GPIO_PCA953X)) += platform_max7315.o +obj-$(subst m,y,$(CONFIG_INPUT_MPU3050)) += platform_mpu3050.o +obj-$(subst m,y,$(CONFIG_INPUT_BMA150)) += platform_bma023.o +obj-$(subst m,y,$(CONFIG_GPIO_PCA953X)) += platform_tca6416.o +obj-$(subst m,y,$(CONFIG_DRM_MEDFIELD)) += platform_tc35876x.o +# SPI Devices +obj-$(subst m,y,$(CONFIG_SERIAL_MRST_MAX3110)) += platform_max3111.o +# MISC Devices +obj-$(subst m,y,$(CONFIG_KEYBOARD_GPIO)) += platform_gpio_keys.o diff --git a/arch/x86/platform/intel-mid/device_libs/platform_bma023.c b/arch/x86/platform/intel-mid/device_libs/platform_bma023.c new file mode 100644 index 000000000000..0ae7f2ae2296 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_bma023.c @@ -0,0 +1,20 @@ +/* + * platform_bma023.c: bma023 platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include + +static const struct devs_id bma023_dev_id __initconst = { + .name = "bma023", + .type = SFI_DEV_TYPE_I2C, + .delay = 1, +}; + +sfi_device(bma023_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_emc1403.c b/arch/x86/platform/intel-mid/device_libs/platform_emc1403.c new file mode 100644 index 000000000000..0d942c1d26d5 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_emc1403.c @@ -0,0 +1,41 @@ +/* + * platform_emc1403.c: emc1403 platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include +#include + +static void __init *emc1403_platform_data(void *info) +{ + static short intr2nd_pdata; + struct i2c_board_info *i2c_info = info; + int intr = get_gpio_by_name("thermal_int"); + int intr2nd = get_gpio_by_name("thermal_alert"); + + if (intr == -1 || intr2nd == -1) + return NULL; + + i2c_info->irq = intr + INTEL_MID_IRQ_OFFSET; + intr2nd_pdata = intr2nd + INTEL_MID_IRQ_OFFSET; + + return &intr2nd_pdata; +} + +static const struct devs_id emc1403_dev_id __initconst = { + .name = "emc1403", + .type = SFI_DEV_TYPE_I2C, + .delay = 1, + .get_platform_data = &emc1403_platform_data, +}; + +sfi_device(emc1403_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_gpio_keys.c b/arch/x86/platform/intel-mid/device_libs/platform_gpio_keys.c new file mode 100644 index 000000000000..a013a4834bbe --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_gpio_keys.c @@ -0,0 +1,83 @@ +/* + * platform_gpio_keys.c: gpio_keys platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include +#include +#include +#include +#include + +#define DEVICE_NAME "gpio-keys" + +/* + * we will search these buttons in SFI GPIO table (by name) + * and register them dynamically. Please add all possible + * buttons here, we will shrink them if no GPIO found. + */ +static struct gpio_keys_button gpio_button[] = { + {KEY_POWER, -1, 1, "power_btn", EV_KEY, 0, 3000}, + {KEY_PROG1, -1, 1, "prog_btn1", EV_KEY, 0, 20}, + {KEY_PROG2, -1, 1, "prog_btn2", EV_KEY, 0, 20}, + {SW_LID, -1, 1, "lid_switch", EV_SW, 0, 20}, + {KEY_VOLUMEUP, -1, 1, "vol_up", EV_KEY, 0, 20}, + {KEY_VOLUMEDOWN, -1, 1, "vol_down", EV_KEY, 0, 20}, + {KEY_CAMERA, -1, 1, "camera_full", EV_KEY, 0, 20}, + {KEY_CAMERA_FOCUS, -1, 1, "camera_half", EV_KEY, 0, 20}, + {SW_KEYPAD_SLIDE, -1, 1, "MagSw1", EV_SW, 0, 20}, + {SW_KEYPAD_SLIDE, -1, 1, "MagSw2", EV_SW, 0, 20}, +}; + +static struct gpio_keys_platform_data gpio_keys = { + .buttons = gpio_button, + .rep = 1, + .nbuttons = -1, /* will fill it after search */ +}; + +static struct platform_device pb_device = { + .name = DEVICE_NAME, + .id = -1, + .dev = { + .platform_data = &gpio_keys, + }, +}; + +/* + * Shrink the non-existent buttons, register the gpio button + * device if there is some + */ +static int __init pb_keys_init(void) +{ + struct gpio_keys_button *gb = gpio_button; + int i, num, good = 0; + + num = sizeof(gpio_button) / sizeof(struct gpio_keys_button); + for (i = 0; i < num; i++) { + gb[i].gpio = get_gpio_by_name(gb[i].desc); + pr_debug("info[%2d]: name = %s, gpio = %d\n", i, gb[i].desc, + gb[i].gpio); + if (gb[i].gpio == -1) + continue; + + if (i != good) + gb[good] = gb[i]; + good++; + } + + if (good) { + gpio_keys.nbuttons = good; + return platform_device_register(&pb_device); + } + return 0; +} +late_initcall(pb_keys_init); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_ipc.c b/arch/x86/platform/intel-mid/device_libs/platform_ipc.c new file mode 100644 index 000000000000..a84b73d6c4a0 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_ipc.c @@ -0,0 +1,68 @@ +/* + * platform_ipc.c: IPC platform library file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include +#include +#include +#include +#include "platform_ipc.h" + +void __init ipc_device_handler(struct sfi_device_table_entry *pentry, + struct devs_id *dev) +{ + struct platform_device *pdev; + void *pdata = NULL; + static struct resource res __initdata = { + .name = "IRQ", + .flags = IORESOURCE_IRQ, + }; + + pr_debug("IPC bus, name = %16.16s, irq = 0x%2x\n", + pentry->name, pentry->irq); + + /* + * We need to call platform init of IPC devices to fill misc_pdata + * structure. It will be used in msic_init for initialization. + */ + if (dev != NULL) + pdata = dev->get_platform_data(pentry); + + /* + * On Medfield the platform device creation is handled by the MSIC + * MFD driver so we don't need to do it here. + */ + if (intel_mid_has_msic()) + return; + + pdev = platform_device_alloc(pentry->name, 0); + if (pdev == NULL) { + pr_err("out of memory for SFI platform device '%s'.\n", + pentry->name); + return; + } + res.start = pentry->irq; + platform_device_add_resources(pdev, &res, 1); + + pdev->dev.platform_data = pdata; + intel_scu_device_register(pdev); +} + +static const struct devs_id pmic_audio_dev_id __initconst = { + .name = "pmic_audio", + .type = SFI_DEV_TYPE_IPC, + .delay = 1, + .device_handler = &ipc_device_handler, +}; + +sfi_device(pmic_audio_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_ipc.h b/arch/x86/platform/intel-mid/device_libs/platform_ipc.h new file mode 100644 index 000000000000..8f568dd79605 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_ipc.h @@ -0,0 +1,17 @@ +/* + * platform_ipc.h: IPC platform library header file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ +#ifndef _PLATFORM_IPC_H_ +#define _PLATFORM_IPC_H_ + +extern void __init ipc_device_handler(struct sfi_device_table_entry *pentry, + struct devs_id *dev) __attribute__((weak)); +#endif diff --git a/arch/x86/platform/intel-mid/device_libs/platform_lis331.c b/arch/x86/platform/intel-mid/device_libs/platform_lis331.c new file mode 100644 index 000000000000..15278c11f714 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_lis331.c @@ -0,0 +1,39 @@ +/* + * platform_lis331.c: lis331 platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include + +static void __init *lis331dl_platform_data(void *info) +{ + static short intr2nd_pdata; + struct i2c_board_info *i2c_info = info; + int intr = get_gpio_by_name("accel_int"); + int intr2nd = get_gpio_by_name("accel_2"); + + if (intr == -1 || intr2nd == -1) + return NULL; + + i2c_info->irq = intr + INTEL_MID_IRQ_OFFSET; + intr2nd_pdata = intr2nd + INTEL_MID_IRQ_OFFSET; + + return &intr2nd_pdata; +} + +static const struct devs_id lis331dl_dev_id __initconst = { + .name = "i2c_accel", + .type = SFI_DEV_TYPE_I2C, + .get_platform_data = &lis331dl_platform_data, +}; + +sfi_device(lis331dl_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_max3111.c b/arch/x86/platform/intel-mid/device_libs/platform_max3111.c new file mode 100644 index 000000000000..afd1df94e0e5 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_max3111.c @@ -0,0 +1,35 @@ +/* + * platform_max3111.c: max3111 platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include + +static void __init *max3111_platform_data(void *info) +{ + struct spi_board_info *spi_info = info; + int intr = get_gpio_by_name("max3111_int"); + + spi_info->mode = SPI_MODE_0; + if (intr == -1) + return NULL; + spi_info->irq = intr + INTEL_MID_IRQ_OFFSET; + return NULL; +} + +static const struct devs_id max3111_dev_id __initconst = { + .name = "spi_max3111", + .type = SFI_DEV_TYPE_SPI, + .get_platform_data = &max3111_platform_data, +}; + +sfi_device(max3111_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_max7315.c b/arch/x86/platform/intel-mid/device_libs/platform_max7315.c new file mode 100644 index 000000000000..94ade10024ae --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_max7315.c @@ -0,0 +1,79 @@ +/* + * platform_max7315.c: max7315 platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include +#include +#include + +#define MAX7315_NUM 2 + +static void __init *max7315_platform_data(void *info) +{ + static struct pca953x_platform_data max7315_pdata[MAX7315_NUM]; + static int nr; + struct pca953x_platform_data *max7315 = &max7315_pdata[nr]; + struct i2c_board_info *i2c_info = info; + int gpio_base, intr; + char base_pin_name[SFI_NAME_LEN + 1]; + char intr_pin_name[SFI_NAME_LEN + 1]; + + if (nr == MAX7315_NUM) { + pr_err("too many max7315s, we only support %d\n", + MAX7315_NUM); + return NULL; + } + /* we have several max7315 on the board, we only need load several + * instances of the same pca953x driver to cover them + */ + strcpy(i2c_info->type, "max7315"); + if (nr++) { + sprintf(base_pin_name, "max7315_%d_base", nr); + sprintf(intr_pin_name, "max7315_%d_int", nr); + } else { + strcpy(base_pin_name, "max7315_base"); + strcpy(intr_pin_name, "max7315_int"); + } + + gpio_base = get_gpio_by_name(base_pin_name); + intr = get_gpio_by_name(intr_pin_name); + + if (gpio_base == -1) + return NULL; + max7315->gpio_base = gpio_base; + if (intr != -1) { + i2c_info->irq = intr + INTEL_MID_IRQ_OFFSET; + max7315->irq_base = gpio_base + INTEL_MID_IRQ_OFFSET; + } else { + i2c_info->irq = -1; + max7315->irq_base = -1; + } + return max7315; +} + +static const struct devs_id max7315_dev_id __initconst = { + .name = "i2c_max7315", + .type = SFI_DEV_TYPE_I2C, + .delay = 1, + .get_platform_data = &max7315_platform_data, +}; + +static const struct devs_id max7315_2_dev_id __initconst = { + .name = "i2c_max7315_2", + .type = SFI_DEV_TYPE_I2C, + .delay = 1, + .get_platform_data = &max7315_platform_data, +}; + +sfi_device(max7315_dev_id); +sfi_device(max7315_2_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_mpu3050.c b/arch/x86/platform/intel-mid/device_libs/platform_mpu3050.c new file mode 100644 index 000000000000..dd28d63c84fb --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_mpu3050.c @@ -0,0 +1,36 @@ +/* + * platform_mpu3050.c: mpu3050 platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include + +static void *mpu3050_platform_data(void *info) +{ + struct i2c_board_info *i2c_info = info; + int intr = get_gpio_by_name("mpu3050_int"); + + if (intr == -1) + return NULL; + + i2c_info->irq = intr + INTEL_MID_IRQ_OFFSET; + return NULL; +} + +static const struct devs_id mpu3050_dev_id __initconst = { + .name = "mpu3050", + .type = SFI_DEV_TYPE_I2C, + .delay = 1, + .get_platform_data = &mpu3050_platform_data, +}; + +sfi_device(mpu3050_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_msic.c b/arch/x86/platform/intel-mid/device_libs/platform_msic.c new file mode 100644 index 000000000000..9f4a775a69d6 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_msic.c @@ -0,0 +1,87 @@ +/* + * platform_msic.c: MSIC platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include "platform_msic.h" + +struct intel_msic_platform_data msic_pdata; + +static struct resource msic_resources[] = { + { + .start = INTEL_MSIC_IRQ_PHYS_BASE, + .end = INTEL_MSIC_IRQ_PHYS_BASE + 64 - 1, + .flags = IORESOURCE_MEM, + }, +}; + +static struct platform_device msic_device = { + .name = "intel_msic", + .id = -1, + .dev = { + .platform_data = &msic_pdata, + }, + .num_resources = ARRAY_SIZE(msic_resources), + .resource = msic_resources, +}; + +static int msic_scu_status_change(struct notifier_block *nb, + unsigned long code, void *data) +{ + if (code == SCU_DOWN) { + platform_device_unregister(&msic_device); + return 0; + } + + return platform_device_register(&msic_device); +} + +static int __init msic_init(void) +{ + static struct notifier_block msic_scu_notifier = { + .notifier_call = msic_scu_status_change, + }; + + /* + * We need to be sure that the SCU IPC is ready before MSIC device + * can be registered. + */ + if (intel_mid_has_msic()) + intel_scu_notifier_add(&msic_scu_notifier); + + return 0; +} +arch_initcall(msic_init); + +/* + * msic_generic_platform_data - sets generic platform data for the block + * @info: pointer to the SFI device table entry for this block + * @block: MSIC block + * + * Function sets IRQ number from the SFI table entry for given device to + * the MSIC platform data. + */ +void *msic_generic_platform_data(void *info, enum intel_msic_block block) +{ + struct sfi_device_table_entry *entry = info; + + BUG_ON(block < 0 || block >= INTEL_MSIC_BLOCK_LAST); + msic_pdata.irq[block] = entry->irq; + + return NULL; +} diff --git a/arch/x86/platform/intel-mid/device_libs/platform_msic.h b/arch/x86/platform/intel-mid/device_libs/platform_msic.h new file mode 100644 index 000000000000..917eb56d77da --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_msic.h @@ -0,0 +1,19 @@ +/* + * platform_msic.h: MSIC platform data header file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ +#ifndef _PLATFORM_MSIC_H_ +#define _PLATFORM_MSIC_H_ + +extern struct intel_msic_platform_data msic_pdata; + +extern void *msic_generic_platform_data(void *info, + enum intel_msic_block block) __attribute__((weak)); +#endif diff --git a/arch/x86/platform/intel-mid/device_libs/platform_msic_audio.c b/arch/x86/platform/intel-mid/device_libs/platform_msic_audio.c new file mode 100644 index 000000000000..29629397d2b3 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_msic_audio.c @@ -0,0 +1,47 @@ +/* + * platform_msic_audio.c: MSIC audio platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "platform_msic.h" +#include "platform_ipc.h" + +static void *msic_audio_platform_data(void *info) +{ + struct platform_device *pdev; + + pdev = platform_device_register_simple("sst-platform", -1, NULL, 0); + + if (IS_ERR(pdev)) { + pr_err("failed to create audio platform device\n"); + return NULL; + } + + return msic_generic_platform_data(info, INTEL_MSIC_BLOCK_AUDIO); +} + +static const struct devs_id msic_audio_dev_id __initconst = { + .name = "msic_audio", + .type = SFI_DEV_TYPE_IPC, + .delay = 1, + .get_platform_data = &msic_audio_platform_data, + .device_handler = &ipc_device_handler, +}; + +sfi_device(msic_audio_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_msic_battery.c b/arch/x86/platform/intel-mid/device_libs/platform_msic_battery.c new file mode 100644 index 000000000000..f446c33df1a8 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_msic_battery.c @@ -0,0 +1,37 @@ +/* + * platform_msic_battery.c: MSIC battery platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "platform_msic.h" +#include "platform_ipc.h" + +static void __init *msic_battery_platform_data(void *info) +{ + return msic_generic_platform_data(info, INTEL_MSIC_BLOCK_BATTERY); +} + +static const struct devs_id msic_battery_dev_id __initconst = { + .name = "msic_battery", + .type = SFI_DEV_TYPE_IPC, + .delay = 1, + .get_platform_data = &msic_battery_platform_data, + .device_handler = &ipc_device_handler, +}; + +sfi_device(msic_battery_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_msic_gpio.c b/arch/x86/platform/intel-mid/device_libs/platform_msic_gpio.c new file mode 100644 index 000000000000..2a4f7b1dd917 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_msic_gpio.c @@ -0,0 +1,48 @@ +/* + * platform_msic_gpio.c: MSIC GPIO platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "platform_msic.h" +#include "platform_ipc.h" + +static void __init *msic_gpio_platform_data(void *info) +{ + static struct intel_msic_gpio_pdata msic_gpio_pdata; + + int gpio = get_gpio_by_name("msic_gpio_base"); + + if (gpio < 0) + return NULL; + + msic_gpio_pdata.gpio_base = gpio; + msic_pdata.gpio = &msic_gpio_pdata; + + return msic_generic_platform_data(info, INTEL_MSIC_BLOCK_GPIO); +} + +static const struct devs_id msic_gpio_dev_id __initconst = { + .name = "msic_gpio", + .type = SFI_DEV_TYPE_IPC, + .delay = 1, + .get_platform_data = &msic_gpio_platform_data, + .device_handler = &ipc_device_handler, +}; + +sfi_device(msic_gpio_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_msic_ocd.c b/arch/x86/platform/intel-mid/device_libs/platform_msic_ocd.c new file mode 100644 index 000000000000..6497111ddb54 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_msic_ocd.c @@ -0,0 +1,49 @@ +/* + * platform_msic_ocd.c: MSIC OCD platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "platform_msic.h" +#include "platform_ipc.h" + +static void __init *msic_ocd_platform_data(void *info) +{ + static struct intel_msic_ocd_pdata msic_ocd_pdata; + int gpio; + + gpio = get_gpio_by_name("ocd_gpio"); + + if (gpio < 0) + return NULL; + + msic_ocd_pdata.gpio = gpio; + msic_pdata.ocd = &msic_ocd_pdata; + + return msic_generic_platform_data(info, INTEL_MSIC_BLOCK_OCD); +} + +static const struct devs_id msic_ocd_dev_id __initconst = { + .name = "msic_ocd", + .type = SFI_DEV_TYPE_IPC, + .delay = 1, + .get_platform_data = &msic_ocd_platform_data, + .device_handler = &ipc_device_handler, +}; + +sfi_device(msic_ocd_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_msic_power_btn.c b/arch/x86/platform/intel-mid/device_libs/platform_msic_power_btn.c new file mode 100644 index 000000000000..83a3459bc337 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_msic_power_btn.c @@ -0,0 +1,36 @@ +/* + * platform_msic_power_btn.c: MSIC power btn platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ +#include +#include +#include +#include +#include +#include +#include + +#include "platform_msic.h" +#include "platform_ipc.h" + +static void __init *msic_power_btn_platform_data(void *info) +{ + return msic_generic_platform_data(info, INTEL_MSIC_BLOCK_POWER_BTN); +} + +static const struct devs_id msic_power_btn_dev_id __initconst = { + .name = "msic_power_btn", + .type = SFI_DEV_TYPE_IPC, + .delay = 1, + .get_platform_data = &msic_power_btn_platform_data, + .device_handler = &ipc_device_handler, +}; + +sfi_device(msic_power_btn_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_msic_thermal.c b/arch/x86/platform/intel-mid/device_libs/platform_msic_thermal.c new file mode 100644 index 000000000000..a351878b96bc --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_msic_thermal.c @@ -0,0 +1,37 @@ +/* + * platform_msic_thermal.c: msic_thermal platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "platform_msic.h" +#include "platform_ipc.h" + +static void __init *msic_thermal_platform_data(void *info) +{ + return msic_generic_platform_data(info, INTEL_MSIC_BLOCK_THERMAL); +} + +static const struct devs_id msic_thermal_dev_id __initconst = { + .name = "msic_thermal", + .type = SFI_DEV_TYPE_IPC, + .delay = 1, + .get_platform_data = &msic_thermal_platform_data, + .device_handler = &ipc_device_handler, +}; + +sfi_device(msic_thermal_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_pmic_gpio.c b/arch/x86/platform/intel-mid/device_libs/platform_pmic_gpio.c new file mode 100644 index 000000000000..d87182a09263 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_pmic_gpio.c @@ -0,0 +1,54 @@ +/* + * platform_pmic_gpio.c: PMIC GPIO platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "platform_ipc.h" + +static void __init *pmic_gpio_platform_data(void *info) +{ + static struct intel_pmic_gpio_platform_data pmic_gpio_pdata; + int gpio_base = get_gpio_by_name("pmic_gpio_base"); + + if (gpio_base == -1) + gpio_base = 64; + pmic_gpio_pdata.gpio_base = gpio_base; + pmic_gpio_pdata.irq_base = gpio_base + INTEL_MID_IRQ_OFFSET; + pmic_gpio_pdata.gpiointr = 0xffffeff8; + + return &pmic_gpio_pdata; +} + +static const struct devs_id pmic_gpio_spi_dev_id __initconst = { + .name = "pmic_gpio", + .type = SFI_DEV_TYPE_SPI, + .delay = 1, + .get_platform_data = &pmic_gpio_platform_data, +}; + +static const struct devs_id pmic_gpio_ipc_dev_id __initconst = { + .name = "pmic_gpio", + .type = SFI_DEV_TYPE_IPC, + .delay = 1, + .get_platform_data = &pmic_gpio_platform_data, + .device_handler = &ipc_device_handler +}; + +sfi_device(pmic_gpio_spi_dev_id); +sfi_device(pmic_gpio_ipc_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_tc35876x.c b/arch/x86/platform/intel-mid/device_libs/platform_tc35876x.c new file mode 100644 index 000000000000..740fc757050c --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_tc35876x.c @@ -0,0 +1,36 @@ +/* + * platform_tc35876x.c: tc35876x platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include + +/*tc35876x DSI_LVDS bridge chip and panel platform data*/ +static void *tc35876x_platform_data(void *data) +{ + static struct tc35876x_platform_data pdata; + + /* gpio pins set to -1 will not be used by the driver */ + pdata.gpio_bridge_reset = get_gpio_by_name("LCMB_RXEN"); + pdata.gpio_panel_bl_en = get_gpio_by_name("6S6P_BL_EN"); + pdata.gpio_panel_vadd = get_gpio_by_name("EN_VREG_LCD_V3P3"); + + return &pdata; +} + +static const struct devs_id tc35876x_dev_id __initconst = { + .name = "i2c_disp_brig", + .type = SFI_DEV_TYPE_I2C, + .get_platform_data = &tc35876x_platform_data, +}; + +sfi_device(tc35876x_dev_id); diff --git a/arch/x86/platform/intel-mid/device_libs/platform_tca6416.c b/arch/x86/platform/intel-mid/device_libs/platform_tca6416.c new file mode 100644 index 000000000000..22881c9a6737 --- /dev/null +++ b/arch/x86/platform/intel-mid/device_libs/platform_tca6416.c @@ -0,0 +1,57 @@ +/* + * platform_tca6416.c: tca6416 platform data initilization file + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include +#include + +#define TCA6416_NAME "tca6416" +#define TCA6416_BASE "tca6416_base" +#define TCA6416_INTR "tca6416_int" + +static void *tca6416_platform_data(void *info) +{ + static struct pca953x_platform_data tca6416; + struct i2c_board_info *i2c_info = info; + int gpio_base, intr; + char base_pin_name[SFI_NAME_LEN + 1]; + char intr_pin_name[SFI_NAME_LEN + 1]; + + strcpy(i2c_info->type, TCA6416_NAME); + strcpy(base_pin_name, TCA6416_BASE); + strcpy(intr_pin_name, TCA6416_INTR); + + gpio_base = get_gpio_by_name(base_pin_name); + intr = get_gpio_by_name(intr_pin_name); + + if (gpio_base == -1) + return NULL; + tca6416.gpio_base = gpio_base; + if (intr != -1) { + i2c_info->irq = intr + INTEL_MID_IRQ_OFFSET; + tca6416.irq_base = gpio_base + INTEL_MID_IRQ_OFFSET; + } else { + i2c_info->irq = -1; + tca6416.irq_base = -1; + } + return &tca6416; +} + +static const struct devs_id tca6416_dev_id __initconst = { + .name = "tca6416", + .type = SFI_DEV_TYPE_I2C, + .delay = 1, + .get_platform_data = &tca6416_platform_data, +}; + +sfi_device(tca6416_dev_id); diff --git a/arch/x86/platform/mrst/early_printk_mrst.c b/arch/x86/platform/intel-mid/early_printk_intel_mid.c similarity index 97% rename from arch/x86/platform/mrst/early_printk_mrst.c rename to arch/x86/platform/intel-mid/early_printk_intel_mid.c index 028454f0c3a5..4f702f554f6e 100644 --- a/arch/x86/platform/mrst/early_printk_mrst.c +++ b/arch/x86/platform/intel-mid/early_printk_intel_mid.c @@ -1,5 +1,5 @@ /* - * early_printk_mrst.c - early consoles for Intel MID platforms + * early_printk_intel_mid.c - early consoles for Intel MID platforms * * Copyright (c) 2008-2010, Intel Corporation * @@ -27,7 +27,7 @@ #include #include -#include +#include #define MRST_SPI_TIMEOUT 0x200000 #define MRST_REGBASE_SPI0 0xff128000 @@ -152,7 +152,7 @@ void mrst_early_console_init(void) spi0_cdiv = ((*pclk_spi0) & 0xe00) >> 9; freq = 100000000 / (spi0_cdiv + 1); - if (mrst_identify_cpu() == MRST_CPU_CHIP_PENWELL) + if (intel_mid_identify_cpu() == INTEL_MID_CPU_CHIP_PENWELL) mrst_spi_paddr = MRST_REGBASE_SPI1; pspi = (void *)set_fixmap_offset_nocache(FIX_EARLYCON_MEM_BASE, @@ -213,13 +213,14 @@ static void early_mrst_spi_putc(char c) } if (!timeout) - pr_warning("MRST earlycon: timed out\n"); + pr_warn("MRST earlycon: timed out\n"); else max3110_write_data(c); } /* Early SPI only uses polling mode */ -static void early_mrst_spi_write(struct console *con, const char *str, unsigned n) +static void early_mrst_spi_write(struct console *con, const char *str, + unsigned n) { int i; diff --git a/arch/x86/platform/intel-mid/intel-mid.c b/arch/x86/platform/intel-mid/intel-mid.c new file mode 100644 index 000000000000..523a1c8f7f07 --- /dev/null +++ b/arch/x86/platform/intel-mid/intel-mid.c @@ -0,0 +1,213 @@ +/* + * intel-mid.c: Intel MID platform setup code + * + * (C) Copyright 2008, 2012 Intel Corporation + * Author: Jacob Pan (jacob.jun.pan@intel.com) + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#define pr_fmt(fmt) "intel_mid: " fmt + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * the clockevent devices on Moorestown/Medfield can be APBT or LAPIC clock, + * cmdline option x86_intel_mid_timer can be used to override the configuration + * to prefer one or the other. + * at runtime, there are basically three timer configurations: + * 1. per cpu apbt clock only + * 2. per cpu always-on lapic clocks only, this is Penwell/Medfield only + * 3. per cpu lapic clock (C3STOP) and one apbt clock, with broadcast. + * + * by default (without cmdline option), platform code first detects cpu type + * to see if we are on lincroft or penwell, then set up both lapic or apbt + * clocks accordingly. + * i.e. by default, medfield uses configuration #2, moorestown uses #1. + * config #3 is supported but not recommended on medfield. + * + * rating and feature summary: + * lapic (with C3STOP) --------- 100 + * apbt (always-on) ------------ 110 + * lapic (always-on,ARAT) ------ 150 + */ + +enum intel_mid_timer_options intel_mid_timer_options; + +enum intel_mid_cpu_type __intel_mid_cpu_chip; +EXPORT_SYMBOL_GPL(__intel_mid_cpu_chip); + +static void intel_mid_power_off(void) +{ +} + +static void intel_mid_reboot(void) +{ + intel_scu_ipc_simple_command(IPCMSG_COLD_BOOT, 0); +} + +static unsigned long __init intel_mid_calibrate_tsc(void) +{ + unsigned long fast_calibrate; + u32 lo, hi, ratio, fsb; + + rdmsr(MSR_IA32_PERF_STATUS, lo, hi); + pr_debug("IA32 perf status is 0x%x, 0x%0x\n", lo, hi); + ratio = (hi >> 8) & 0x1f; + pr_debug("ratio is %d\n", ratio); + if (!ratio) { + pr_err("read a zero ratio, should be incorrect!\n"); + pr_err("force tsc ratio to 16 ...\n"); + ratio = 16; + } + rdmsr(MSR_FSB_FREQ, lo, hi); + if ((lo & 0x7) == 0x7) + fsb = PENWELL_FSB_FREQ_83SKU; + else + fsb = PENWELL_FSB_FREQ_100SKU; + fast_calibrate = ratio * fsb; + pr_debug("read penwell tsc %lu khz\n", fast_calibrate); + lapic_timer_frequency = fsb * 1000 / HZ; + /* mark tsc clocksource as reliable */ + set_cpu_cap(&boot_cpu_data, X86_FEATURE_TSC_RELIABLE); + + if (fast_calibrate) + return fast_calibrate; + + return 0; +} + +static void __init intel_mid_time_init(void) +{ + sfi_table_parse(SFI_SIG_MTMR, NULL, NULL, sfi_parse_mtmr); + switch (intel_mid_timer_options) { + case INTEL_MID_TIMER_APBT_ONLY: + break; + case INTEL_MID_TIMER_LAPIC_APBT: + x86_init.timers.setup_percpu_clockev = setup_boot_APIC_clock; + x86_cpuinit.setup_percpu_clockev = setup_secondary_APIC_clock; + break; + default: + if (!boot_cpu_has(X86_FEATURE_ARAT)) + break; + x86_init.timers.setup_percpu_clockev = setup_boot_APIC_clock; + x86_cpuinit.setup_percpu_clockev = setup_secondary_APIC_clock; + return; + } + /* we need at least one APB timer */ + pre_init_apic_IRQ0(); + apbt_time_init(); +} + +static void __cpuinit intel_mid_arch_setup(void) +{ + if (boot_cpu_data.x86 == 6 && boot_cpu_data.x86_model == 0x27) + __intel_mid_cpu_chip = INTEL_MID_CPU_CHIP_PENWELL; + else { + pr_err("Unknown Intel MID CPU (%d:%d), default to Penwell\n", + boot_cpu_data.x86, boot_cpu_data.x86_model); + __intel_mid_cpu_chip = INTEL_MID_CPU_CHIP_PENWELL; + } +} + +/* MID systems don't have i8042 controller */ +static int intel_mid_i8042_detect(void) +{ + return 0; +} + +/* + * Moorestown does not have external NMI source nor port 0x61 to report + * NMI status. The possible NMI sources are from pmu as a result of NMI + * watchdog or lock debug. Reading io port 0x61 results in 0xff which + * misled NMI handler. + */ +static unsigned char intel_mid_get_nmi_reason(void) +{ + return 0; +} + +/* + * Moorestown specific x86_init function overrides and early setup + * calls. + */ +void __init x86_intel_mid_early_setup(void) +{ + x86_init.resources.probe_roms = x86_init_noop; + x86_init.resources.reserve_resources = x86_init_noop; + + x86_init.timers.timer_init = intel_mid_time_init; + x86_init.timers.setup_percpu_clockev = x86_init_noop; + + x86_init.irqs.pre_vector_init = x86_init_noop; + + x86_init.oem.arch_setup = intel_mid_arch_setup; + + x86_cpuinit.setup_percpu_clockev = apbt_setup_secondary_clock; + + x86_platform.calibrate_tsc = intel_mid_calibrate_tsc; + x86_platform.i8042_detect = intel_mid_i8042_detect; + x86_init.timers.wallclock_init = intel_mid_rtc_init; + x86_platform.get_nmi_reason = intel_mid_get_nmi_reason; + + x86_init.pci.init = intel_mid_pci_init; + x86_init.pci.fixup_irqs = x86_init_noop; + + legacy_pic = &null_legacy_pic; + + pm_power_off = intel_mid_power_off; + machine_ops.emergency_restart = intel_mid_reboot; + + /* Avoid searching for BIOS MP tables */ + x86_init.mpparse.find_smp_config = x86_init_noop; + x86_init.mpparse.get_smp_config = x86_init_uint_noop; + set_bit(MP_BUS_ISA, mp_bus_not_pci); +} + +/* + * if user does not want to use per CPU apb timer, just give it a lower rating + * than local apic timer and skip the late per cpu timer init. + */ +static inline int __init setup_x86_intel_mid_timer(char *arg) +{ + if (!arg) + return -EINVAL; + + if (strcmp("apbt_only", arg) == 0) + intel_mid_timer_options = INTEL_MID_TIMER_APBT_ONLY; + else if (strcmp("lapic_and_apbt", arg) == 0) + intel_mid_timer_options = INTEL_MID_TIMER_LAPIC_APBT; + else { + pr_warn("X86 INTEL_MID timer option %s not recognised" + " use x86_intel_mid_timer=apbt_only or lapic_and_apbt\n", + arg); + return -EINVAL; + } + return 0; +} +__setup("x86_intel_mid_timer=", setup_x86_intel_mid_timer); + diff --git a/arch/x86/platform/mrst/vrtc.c b/arch/x86/platform/intel-mid/intel_mid_vrtc.c similarity index 90% rename from arch/x86/platform/mrst/vrtc.c rename to arch/x86/platform/intel-mid/intel_mid_vrtc.c index 5e355b134ba4..4762cff7facd 100644 --- a/arch/x86/platform/mrst/vrtc.c +++ b/arch/x86/platform/intel-mid/intel_mid_vrtc.c @@ -1,5 +1,5 @@ /* - * vrtc.c: Driver for virtual RTC device on Intel MID platform + * intel_mid_vrtc.c: Driver for virtual RTC device on Intel MID platform * * (C) Copyright 2009 Intel Corporation * @@ -23,8 +23,8 @@ #include #include -#include -#include +#include +#include #include #include @@ -79,7 +79,7 @@ void vrtc_get_time(struct timespec *now) /* vRTC YEAR reg contains the offset to 1972 */ year += 1972; - printk(KERN_INFO "vRTC: sec: %d min: %d hour: %d day: %d " + pr_info("vRTC: sec: %d min: %d hour: %d day: %d " "mon: %d year: %d\n", sec, min, hour, mday, mon, year); now->tv_sec = mktime(year, mon, mday, hour, min, sec); @@ -109,15 +109,14 @@ int vrtc_set_mmss(const struct timespec *now) vrtc_cmos_write(tm.tm_sec, RTC_SECONDS); spin_unlock_irqrestore(&rtc_lock, flags); } else { - printk(KERN_ERR - "%s: Invalid vRTC value: write of %lx to vRTC failed\n", + pr_err("%s: Invalid vRTC value: write of %lx to vRTC failed\n", __FUNCTION__, now->tv_sec); retval = -EINVAL; } return retval; } -void __init mrst_rtc_init(void) +void __init intel_mid_rtc_init(void) { unsigned long vrtc_paddr; @@ -155,10 +154,10 @@ static struct platform_device vrtc_device = { }; /* Register the RTC device if appropriate */ -static int __init mrst_device_create(void) +static int __init intel_mid_device_create(void) { /* No Moorestown, no device */ - if (!mrst_identify_cpu()) + if (!intel_mid_identify_cpu()) return -ENODEV; /* No timer, no device */ if (!sfi_mrtc_num) @@ -175,4 +174,4 @@ static int __init mrst_device_create(void) return platform_device_register(&vrtc_device); } -module_init(mrst_device_create); +module_init(intel_mid_device_create); diff --git a/arch/x86/platform/intel-mid/sfi.c b/arch/x86/platform/intel-mid/sfi.c new file mode 100644 index 000000000000..c84c1ca396bf --- /dev/null +++ b/arch/x86/platform/intel-mid/sfi.c @@ -0,0 +1,488 @@ +/* + * intel_mid_sfi.c: Intel MID SFI initialization code + * + * (C) Copyright 2013 Intel Corporation + * Author: Sathyanarayanan Kuppuswamy + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define SFI_SIG_OEM0 "OEM0" +#define MAX_IPCDEVS 24 +#define MAX_SCU_SPI 24 +#define MAX_SCU_I2C 24 + +static struct platform_device *ipc_devs[MAX_IPCDEVS]; +static struct spi_board_info *spi_devs[MAX_SCU_SPI]; +static struct i2c_board_info *i2c_devs[MAX_SCU_I2C]; +static struct sfi_gpio_table_entry *gpio_table; +static struct sfi_timer_table_entry sfi_mtimer_array[SFI_MTMR_MAX_NUM]; +static int ipc_next_dev; +static int spi_next_dev; +static int i2c_next_dev; +static int i2c_bus[MAX_SCU_I2C]; +static int gpio_num_entry; +static u32 sfi_mtimer_usage[SFI_MTMR_MAX_NUM]; +int sfi_mrtc_num; +int sfi_mtimer_num; + +struct sfi_rtc_table_entry sfi_mrtc_array[SFI_MRTC_MAX]; +EXPORT_SYMBOL_GPL(sfi_mrtc_array); + +struct blocking_notifier_head intel_scu_notifier = + BLOCKING_NOTIFIER_INIT(intel_scu_notifier); +EXPORT_SYMBOL_GPL(intel_scu_notifier); + +#define intel_mid_sfi_get_pdata(dev, priv) \ + ((dev)->get_platform_data ? (dev)->get_platform_data(priv) : NULL) + +/* parse all the mtimer info to a static mtimer array */ +int __init sfi_parse_mtmr(struct sfi_table_header *table) +{ + struct sfi_table_simple *sb; + struct sfi_timer_table_entry *pentry; + struct mpc_intsrc mp_irq; + int totallen; + + sb = (struct sfi_table_simple *)table; + if (!sfi_mtimer_num) { + sfi_mtimer_num = SFI_GET_NUM_ENTRIES(sb, + struct sfi_timer_table_entry); + pentry = (struct sfi_timer_table_entry *) sb->pentry; + totallen = sfi_mtimer_num * sizeof(*pentry); + memcpy(sfi_mtimer_array, pentry, totallen); + } + + pr_debug("SFI MTIMER info (num = %d):\n", sfi_mtimer_num); + pentry = sfi_mtimer_array; + for (totallen = 0; totallen < sfi_mtimer_num; totallen++, pentry++) { + pr_debug("timer[%d]: paddr = 0x%08x, freq = %dHz, irq = %d\n", + totallen, (u32)pentry->phys_addr, + pentry->freq_hz, pentry->irq); + if (!pentry->irq) + continue; + mp_irq.type = MP_INTSRC; + mp_irq.irqtype = mp_INT; +/* triggering mode edge bit 2-3, active high polarity bit 0-1 */ + mp_irq.irqflag = 5; + mp_irq.srcbus = MP_BUS_ISA; + mp_irq.srcbusirq = pentry->irq; /* IRQ */ + mp_irq.dstapic = MP_APIC_ALL; + mp_irq.dstirq = pentry->irq; + mp_save_irq(&mp_irq); + } + + return 0; +} + +struct sfi_timer_table_entry *sfi_get_mtmr(int hint) +{ + int i; + if (hint < sfi_mtimer_num) { + if (!sfi_mtimer_usage[hint]) { + pr_debug("hint taken for timer %d irq %d\n", + hint, sfi_mtimer_array[hint].irq); + sfi_mtimer_usage[hint] = 1; + return &sfi_mtimer_array[hint]; + } + } + /* take the first timer available */ + for (i = 0; i < sfi_mtimer_num;) { + if (!sfi_mtimer_usage[i]) { + sfi_mtimer_usage[i] = 1; + return &sfi_mtimer_array[i]; + } + i++; + } + return NULL; +} + +void sfi_free_mtmr(struct sfi_timer_table_entry *mtmr) +{ + int i; + for (i = 0; i < sfi_mtimer_num;) { + if (mtmr->irq == sfi_mtimer_array[i].irq) { + sfi_mtimer_usage[i] = 0; + return; + } + i++; + } +} + +/* parse all the mrtc info to a global mrtc array */ +int __init sfi_parse_mrtc(struct sfi_table_header *table) +{ + struct sfi_table_simple *sb; + struct sfi_rtc_table_entry *pentry; + struct mpc_intsrc mp_irq; + + int totallen; + + sb = (struct sfi_table_simple *)table; + if (!sfi_mrtc_num) { + sfi_mrtc_num = SFI_GET_NUM_ENTRIES(sb, + struct sfi_rtc_table_entry); + pentry = (struct sfi_rtc_table_entry *)sb->pentry; + totallen = sfi_mrtc_num * sizeof(*pentry); + memcpy(sfi_mrtc_array, pentry, totallen); + } + + pr_debug("SFI RTC info (num = %d):\n", sfi_mrtc_num); + pentry = sfi_mrtc_array; + for (totallen = 0; totallen < sfi_mrtc_num; totallen++, pentry++) { + pr_debug("RTC[%d]: paddr = 0x%08x, irq = %d\n", + totallen, (u32)pentry->phys_addr, pentry->irq); + mp_irq.type = MP_INTSRC; + mp_irq.irqtype = mp_INT; + mp_irq.irqflag = 0xf; /* level trigger and active low */ + mp_irq.srcbus = MP_BUS_ISA; + mp_irq.srcbusirq = pentry->irq; /* IRQ */ + mp_irq.dstapic = MP_APIC_ALL; + mp_irq.dstirq = pentry->irq; + mp_save_irq(&mp_irq); + } + return 0; +} + + +/* + * Parsing GPIO table first, since the DEVS table will need this table + * to map the pin name to the actual pin. + */ +static int __init sfi_parse_gpio(struct sfi_table_header *table) +{ + struct sfi_table_simple *sb; + struct sfi_gpio_table_entry *pentry; + int num, i; + + if (gpio_table) + return 0; + sb = (struct sfi_table_simple *)table; + num = SFI_GET_NUM_ENTRIES(sb, struct sfi_gpio_table_entry); + pentry = (struct sfi_gpio_table_entry *)sb->pentry; + + gpio_table = kmalloc(num * sizeof(*pentry), GFP_KERNEL); + if (!gpio_table) + return -1; + memcpy(gpio_table, pentry, num * sizeof(*pentry)); + gpio_num_entry = num; + + pr_debug("GPIO pin info:\n"); + for (i = 0; i < num; i++, pentry++) + pr_debug("info[%2d]: controller = %16.16s, pin_name = %16.16s," + " pin = %d\n", i, + pentry->controller_name, + pentry->pin_name, + pentry->pin_no); + return 0; +} + +int get_gpio_by_name(const char *name) +{ + struct sfi_gpio_table_entry *pentry = gpio_table; + int i; + + if (!pentry) + return -1; + for (i = 0; i < gpio_num_entry; i++, pentry++) { + if (!strncmp(name, pentry->pin_name, SFI_NAME_LEN)) + return pentry->pin_no; + } + return -1; +} + +void __init intel_scu_device_register(struct platform_device *pdev) +{ + if (ipc_next_dev == MAX_IPCDEVS) + pr_err("too many SCU IPC devices"); + else + ipc_devs[ipc_next_dev++] = pdev; +} + +static void __init intel_scu_spi_device_register(struct spi_board_info *sdev) +{ + struct spi_board_info *new_dev; + + if (spi_next_dev == MAX_SCU_SPI) { + pr_err("too many SCU SPI devices"); + return; + } + + new_dev = kzalloc(sizeof(*sdev), GFP_KERNEL); + if (!new_dev) { + pr_err("failed to alloc mem for delayed spi dev %s\n", + sdev->modalias); + return; + } + memcpy(new_dev, sdev, sizeof(*sdev)); + + spi_devs[spi_next_dev++] = new_dev; +} + +static void __init intel_scu_i2c_device_register(int bus, + struct i2c_board_info *idev) +{ + struct i2c_board_info *new_dev; + + if (i2c_next_dev == MAX_SCU_I2C) { + pr_err("too many SCU I2C devices"); + return; + } + + new_dev = kzalloc(sizeof(*idev), GFP_KERNEL); + if (!new_dev) { + pr_err("failed to alloc mem for delayed i2c dev %s\n", + idev->type); + return; + } + memcpy(new_dev, idev, sizeof(*idev)); + + i2c_bus[i2c_next_dev] = bus; + i2c_devs[i2c_next_dev++] = new_dev; +} + +/* Called by IPC driver */ +void intel_scu_devices_create(void) +{ + int i; + + for (i = 0; i < ipc_next_dev; i++) + platform_device_add(ipc_devs[i]); + + for (i = 0; i < spi_next_dev; i++) + spi_register_board_info(spi_devs[i], 1); + + for (i = 0; i < i2c_next_dev; i++) { + struct i2c_adapter *adapter; + struct i2c_client *client; + + adapter = i2c_get_adapter(i2c_bus[i]); + if (adapter) { + client = i2c_new_device(adapter, i2c_devs[i]); + if (!client) + pr_err("can't create i2c device %s\n", + i2c_devs[i]->type); + } else + i2c_register_board_info(i2c_bus[i], i2c_devs[i], 1); + } + intel_scu_notifier_post(SCU_AVAILABLE, NULL); +} +EXPORT_SYMBOL_GPL(intel_scu_devices_create); + +/* Called by IPC driver */ +void intel_scu_devices_destroy(void) +{ + int i; + + intel_scu_notifier_post(SCU_DOWN, NULL); + + for (i = 0; i < ipc_next_dev; i++) + platform_device_del(ipc_devs[i]); +} +EXPORT_SYMBOL_GPL(intel_scu_devices_destroy); + +static void __init install_irq_resource(struct platform_device *pdev, int irq) +{ + /* Single threaded */ + static struct resource res __initdata = { + .name = "IRQ", + .flags = IORESOURCE_IRQ, + }; + res.start = irq; + platform_device_add_resources(pdev, &res, 1); +} + +static void __init sfi_handle_ipc_dev(struct sfi_device_table_entry *pentry, + struct devs_id *dev) +{ + struct platform_device *pdev; + void *pdata = NULL; + + pr_debug("IPC bus, name = %16.16s, irq = 0x%2x\n", + pentry->name, pentry->irq); + pdata = intel_mid_sfi_get_pdata(dev, pentry); + + pdev = platform_device_alloc(pentry->name, 0); + if (pdev == NULL) { + pr_err("out of memory for SFI platform device '%s'.\n", + pentry->name); + return; + } + install_irq_resource(pdev, pentry->irq); + + pdev->dev.platform_data = pdata; + platform_device_add(pdev); +} + +static void __init sfi_handle_spi_dev(struct sfi_device_table_entry *pentry, + struct devs_id *dev) +{ + struct spi_board_info spi_info; + void *pdata = NULL; + + memset(&spi_info, 0, sizeof(spi_info)); + strncpy(spi_info.modalias, pentry->name, SFI_NAME_LEN); + spi_info.irq = ((pentry->irq == (u8)0xff) ? 0 : pentry->irq); + spi_info.bus_num = pentry->host_num; + spi_info.chip_select = pentry->addr; + spi_info.max_speed_hz = pentry->max_freq; + pr_debug("SPI bus=%d, name=%16.16s, irq=0x%2x, max_freq=%d, cs=%d\n", + spi_info.bus_num, + spi_info.modalias, + spi_info.irq, + spi_info.max_speed_hz, + spi_info.chip_select); + + pdata = intel_mid_sfi_get_pdata(dev, &spi_info); + + spi_info.platform_data = pdata; + if (dev->delay) + intel_scu_spi_device_register(&spi_info); + else + spi_register_board_info(&spi_info, 1); +} + +static void __init sfi_handle_i2c_dev(struct sfi_device_table_entry *pentry, + struct devs_id *dev) +{ + struct i2c_board_info i2c_info; + void *pdata = NULL; + + memset(&i2c_info, 0, sizeof(i2c_info)); + strncpy(i2c_info.type, pentry->name, SFI_NAME_LEN); + i2c_info.irq = ((pentry->irq == (u8)0xff) ? 0 : pentry->irq); + i2c_info.addr = pentry->addr; + pr_debug("I2C bus = %d, name = %16.16s, irq = 0x%2x, addr = 0x%x\n", + pentry->host_num, + i2c_info.type, + i2c_info.irq, + i2c_info.addr); + pdata = intel_mid_sfi_get_pdata(dev, &i2c_info); + i2c_info.platform_data = pdata; + + if (dev->delay) + intel_scu_i2c_device_register(pentry->host_num, &i2c_info); + else + i2c_register_board_info(pentry->host_num, &i2c_info, 1); +} + +extern struct devs_id *const __x86_intel_mid_dev_start[], + *const __x86_intel_mid_dev_end[]; + +static struct devs_id __init *get_device_id(u8 type, char *name) +{ + struct devs_id *const *dev_table; + + for (dev_table = __x86_intel_mid_dev_start; + dev_table < __x86_intel_mid_dev_end; dev_table++) { + struct devs_id *dev = *dev_table; + if (dev->type == type && + !strncmp(dev->name, name, SFI_NAME_LEN)) { + return dev; + } + } + + return NULL; +} + +static int __init sfi_parse_devs(struct sfi_table_header *table) +{ + struct sfi_table_simple *sb; + struct sfi_device_table_entry *pentry; + struct devs_id *dev = NULL; + int num, i; + int ioapic; + struct io_apic_irq_attr irq_attr; + + sb = (struct sfi_table_simple *)table; + num = SFI_GET_NUM_ENTRIES(sb, struct sfi_device_table_entry); + pentry = (struct sfi_device_table_entry *)sb->pentry; + + for (i = 0; i < num; i++, pentry++) { + int irq = pentry->irq; + + if (irq != (u8)0xff) { /* native RTE case */ + /* these SPI2 devices are not exposed to system as PCI + * devices, but they have separate RTE entry in IOAPIC + * so we have to enable them one by one here + */ + ioapic = mp_find_ioapic(irq); + irq_attr.ioapic = ioapic; + irq_attr.ioapic_pin = irq; + irq_attr.trigger = 1; + irq_attr.polarity = 1; + io_apic_set_pci_routing(NULL, irq, &irq_attr); + } else + irq = 0; /* No irq */ + + dev = get_device_id(pentry->type, pentry->name); + + if (!dev) + continue; + + if (dev->device_handler) { + dev->device_handler(pentry, dev); + } else { + switch (pentry->type) { + case SFI_DEV_TYPE_IPC: + sfi_handle_ipc_dev(pentry, dev); + break; + case SFI_DEV_TYPE_SPI: + sfi_handle_spi_dev(pentry, dev); + break; + case SFI_DEV_TYPE_I2C: + sfi_handle_i2c_dev(pentry, dev); + break; + case SFI_DEV_TYPE_UART: + case SFI_DEV_TYPE_HSI: + default: + break; + } + } + } + return 0; +} + +static int __init intel_mid_platform_init(void) +{ + sfi_table_parse(SFI_SIG_GPIO, NULL, NULL, sfi_parse_gpio); + sfi_table_parse(SFI_SIG_DEVS, NULL, NULL, sfi_parse_devs); + return 0; +} +arch_initcall(intel_mid_platform_init); diff --git a/arch/x86/platform/mrst/Makefile b/arch/x86/platform/mrst/Makefile deleted file mode 100644 index af1da7e623f9..000000000000 --- a/arch/x86/platform/mrst/Makefile +++ /dev/null @@ -1,3 +0,0 @@ -obj-$(CONFIG_X86_INTEL_MID) += mrst.o -obj-$(CONFIG_X86_INTEL_MID) += vrtc.o -obj-$(CONFIG_EARLY_PRINTK_INTEL_MID) += early_printk_mrst.o diff --git a/arch/x86/platform/mrst/mrst.c b/arch/x86/platform/mrst/mrst.c deleted file mode 100644 index 3ca5957b7a34..000000000000 --- a/arch/x86/platform/mrst/mrst.c +++ /dev/null @@ -1,1052 +0,0 @@ -/* - * mrst.c: Intel Moorestown platform specific setup code - * - * (C) Copyright 2008 Intel Corporation - * Author: Jacob Pan (jacob.jun.pan@intel.com) - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; version 2 - * of the License. - */ - -#define pr_fmt(fmt) "mrst: " fmt - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -/* - * the clockevent devices on Moorestown/Medfield can be APBT or LAPIC clock, - * cmdline option x86_mrst_timer can be used to override the configuration - * to prefer one or the other. - * at runtime, there are basically three timer configurations: - * 1. per cpu apbt clock only - * 2. per cpu always-on lapic clocks only, this is Penwell/Medfield only - * 3. per cpu lapic clock (C3STOP) and one apbt clock, with broadcast. - * - * by default (without cmdline option), platform code first detects cpu type - * to see if we are on lincroft or penwell, then set up both lapic or apbt - * clocks accordingly. - * i.e. by default, medfield uses configuration #2, moorestown uses #1. - * config #3 is supported but not recommended on medfield. - * - * rating and feature summary: - * lapic (with C3STOP) --------- 100 - * apbt (always-on) ------------ 110 - * lapic (always-on,ARAT) ------ 150 - */ - -enum mrst_timer_options mrst_timer_options; - -static u32 sfi_mtimer_usage[SFI_MTMR_MAX_NUM]; -static struct sfi_timer_table_entry sfi_mtimer_array[SFI_MTMR_MAX_NUM]; -enum mrst_cpu_type __mrst_cpu_chip; -EXPORT_SYMBOL_GPL(__mrst_cpu_chip); - -int sfi_mtimer_num; - -struct sfi_rtc_table_entry sfi_mrtc_array[SFI_MRTC_MAX]; -EXPORT_SYMBOL_GPL(sfi_mrtc_array); -int sfi_mrtc_num; - -static void mrst_power_off(void) -{ -} - -static void mrst_reboot(void) -{ - intel_scu_ipc_simple_command(IPCMSG_COLD_BOOT, 0); -} - -/* parse all the mtimer info to a static mtimer array */ -static int __init sfi_parse_mtmr(struct sfi_table_header *table) -{ - struct sfi_table_simple *sb; - struct sfi_timer_table_entry *pentry; - struct mpc_intsrc mp_irq; - int totallen; - - sb = (struct sfi_table_simple *)table; - if (!sfi_mtimer_num) { - sfi_mtimer_num = SFI_GET_NUM_ENTRIES(sb, - struct sfi_timer_table_entry); - pentry = (struct sfi_timer_table_entry *) sb->pentry; - totallen = sfi_mtimer_num * sizeof(*pentry); - memcpy(sfi_mtimer_array, pentry, totallen); - } - - pr_debug("SFI MTIMER info (num = %d):\n", sfi_mtimer_num); - pentry = sfi_mtimer_array; - for (totallen = 0; totallen < sfi_mtimer_num; totallen++, pentry++) { - pr_debug("timer[%d]: paddr = 0x%08x, freq = %dHz," - " irq = %d\n", totallen, (u32)pentry->phys_addr, - pentry->freq_hz, pentry->irq); - if (!pentry->irq) - continue; - mp_irq.type = MP_INTSRC; - mp_irq.irqtype = mp_INT; -/* triggering mode edge bit 2-3, active high polarity bit 0-1 */ - mp_irq.irqflag = 5; - mp_irq.srcbus = MP_BUS_ISA; - mp_irq.srcbusirq = pentry->irq; /* IRQ */ - mp_irq.dstapic = MP_APIC_ALL; - mp_irq.dstirq = pentry->irq; - mp_save_irq(&mp_irq); - } - - return 0; -} - -struct sfi_timer_table_entry *sfi_get_mtmr(int hint) -{ - int i; - if (hint < sfi_mtimer_num) { - if (!sfi_mtimer_usage[hint]) { - pr_debug("hint taken for timer %d irq %d\n",\ - hint, sfi_mtimer_array[hint].irq); - sfi_mtimer_usage[hint] = 1; - return &sfi_mtimer_array[hint]; - } - } - /* take the first timer available */ - for (i = 0; i < sfi_mtimer_num;) { - if (!sfi_mtimer_usage[i]) { - sfi_mtimer_usage[i] = 1; - return &sfi_mtimer_array[i]; - } - i++; - } - return NULL; -} - -void sfi_free_mtmr(struct sfi_timer_table_entry *mtmr) -{ - int i; - for (i = 0; i < sfi_mtimer_num;) { - if (mtmr->irq == sfi_mtimer_array[i].irq) { - sfi_mtimer_usage[i] = 0; - return; - } - i++; - } -} - -/* parse all the mrtc info to a global mrtc array */ -int __init sfi_parse_mrtc(struct sfi_table_header *table) -{ - struct sfi_table_simple *sb; - struct sfi_rtc_table_entry *pentry; - struct mpc_intsrc mp_irq; - - int totallen; - - sb = (struct sfi_table_simple *)table; - if (!sfi_mrtc_num) { - sfi_mrtc_num = SFI_GET_NUM_ENTRIES(sb, - struct sfi_rtc_table_entry); - pentry = (struct sfi_rtc_table_entry *)sb->pentry; - totallen = sfi_mrtc_num * sizeof(*pentry); - memcpy(sfi_mrtc_array, pentry, totallen); - } - - pr_debug("SFI RTC info (num = %d):\n", sfi_mrtc_num); - pentry = sfi_mrtc_array; - for (totallen = 0; totallen < sfi_mrtc_num; totallen++, pentry++) { - pr_debug("RTC[%d]: paddr = 0x%08x, irq = %d\n", - totallen, (u32)pentry->phys_addr, pentry->irq); - mp_irq.type = MP_INTSRC; - mp_irq.irqtype = mp_INT; - mp_irq.irqflag = 0xf; /* level trigger and active low */ - mp_irq.srcbus = MP_BUS_ISA; - mp_irq.srcbusirq = pentry->irq; /* IRQ */ - mp_irq.dstapic = MP_APIC_ALL; - mp_irq.dstirq = pentry->irq; - mp_save_irq(&mp_irq); - } - return 0; -} - -static unsigned long __init mrst_calibrate_tsc(void) -{ - unsigned long fast_calibrate; - u32 lo, hi, ratio, fsb; - - rdmsr(MSR_IA32_PERF_STATUS, lo, hi); - pr_debug("IA32 perf status is 0x%x, 0x%0x\n", lo, hi); - ratio = (hi >> 8) & 0x1f; - pr_debug("ratio is %d\n", ratio); - if (!ratio) { - pr_err("read a zero ratio, should be incorrect!\n"); - pr_err("force tsc ratio to 16 ...\n"); - ratio = 16; - } - rdmsr(MSR_FSB_FREQ, lo, hi); - if ((lo & 0x7) == 0x7) - fsb = PENWELL_FSB_FREQ_83SKU; - else - fsb = PENWELL_FSB_FREQ_100SKU; - fast_calibrate = ratio * fsb; - pr_debug("read penwell tsc %lu khz\n", fast_calibrate); - lapic_timer_frequency = fsb * 1000 / HZ; - /* mark tsc clocksource as reliable */ - set_cpu_cap(&boot_cpu_data, X86_FEATURE_TSC_RELIABLE); - - if (fast_calibrate) - return fast_calibrate; - - return 0; -} - -static void __init mrst_time_init(void) -{ - sfi_table_parse(SFI_SIG_MTMR, NULL, NULL, sfi_parse_mtmr); - switch (mrst_timer_options) { - case MRST_TIMER_APBT_ONLY: - break; - case MRST_TIMER_LAPIC_APBT: - x86_init.timers.setup_percpu_clockev = setup_boot_APIC_clock; - x86_cpuinit.setup_percpu_clockev = setup_secondary_APIC_clock; - break; - default: - if (!boot_cpu_has(X86_FEATURE_ARAT)) - break; - x86_init.timers.setup_percpu_clockev = setup_boot_APIC_clock; - x86_cpuinit.setup_percpu_clockev = setup_secondary_APIC_clock; - return; - } - /* we need at least one APB timer */ - pre_init_apic_IRQ0(); - apbt_time_init(); -} - -static void mrst_arch_setup(void) -{ - if (boot_cpu_data.x86 == 6 && boot_cpu_data.x86_model == 0x27) - __mrst_cpu_chip = MRST_CPU_CHIP_PENWELL; - else { - pr_err("Unknown Intel MID CPU (%d:%d), default to Penwell\n", - boot_cpu_data.x86, boot_cpu_data.x86_model); - __mrst_cpu_chip = MRST_CPU_CHIP_PENWELL; - } -} - -/* MID systems don't have i8042 controller */ -static int mrst_i8042_detect(void) -{ - return 0; -} - -/* - * Moorestown does not have external NMI source nor port 0x61 to report - * NMI status. The possible NMI sources are from pmu as a result of NMI - * watchdog or lock debug. Reading io port 0x61 results in 0xff which - * misled NMI handler. - */ -static unsigned char mrst_get_nmi_reason(void) -{ - return 0; -} - -/* - * Moorestown specific x86_init function overrides and early setup - * calls. - */ -void __init x86_mrst_early_setup(void) -{ - x86_init.resources.probe_roms = x86_init_noop; - x86_init.resources.reserve_resources = x86_init_noop; - - x86_init.timers.timer_init = mrst_time_init; - x86_init.timers.setup_percpu_clockev = x86_init_noop; - - x86_init.irqs.pre_vector_init = x86_init_noop; - - x86_init.oem.arch_setup = mrst_arch_setup; - - x86_cpuinit.setup_percpu_clockev = apbt_setup_secondary_clock; - - x86_platform.calibrate_tsc = mrst_calibrate_tsc; - x86_platform.i8042_detect = mrst_i8042_detect; - x86_init.timers.wallclock_init = mrst_rtc_init; - x86_platform.get_nmi_reason = mrst_get_nmi_reason; - - x86_init.pci.init = pci_mrst_init; - x86_init.pci.fixup_irqs = x86_init_noop; - - legacy_pic = &null_legacy_pic; - - /* Moorestown specific power_off/restart method */ - pm_power_off = mrst_power_off; - machine_ops.emergency_restart = mrst_reboot; - - /* Avoid searching for BIOS MP tables */ - x86_init.mpparse.find_smp_config = x86_init_noop; - x86_init.mpparse.get_smp_config = x86_init_uint_noop; - set_bit(MP_BUS_ISA, mp_bus_not_pci); -} - -/* - * if user does not want to use per CPU apb timer, just give it a lower rating - * than local apic timer and skip the late per cpu timer init. - */ -static inline int __init setup_x86_mrst_timer(char *arg) -{ - if (!arg) - return -EINVAL; - - if (strcmp("apbt_only", arg) == 0) - mrst_timer_options = MRST_TIMER_APBT_ONLY; - else if (strcmp("lapic_and_apbt", arg) == 0) - mrst_timer_options = MRST_TIMER_LAPIC_APBT; - else { - pr_warning("X86 MRST timer option %s not recognised" - " use x86_mrst_timer=apbt_only or lapic_and_apbt\n", - arg); - return -EINVAL; - } - return 0; -} -__setup("x86_mrst_timer=", setup_x86_mrst_timer); - -/* - * Parsing GPIO table first, since the DEVS table will need this table - * to map the pin name to the actual pin. - */ -static struct sfi_gpio_table_entry *gpio_table; -static int gpio_num_entry; - -static int __init sfi_parse_gpio(struct sfi_table_header *table) -{ - struct sfi_table_simple *sb; - struct sfi_gpio_table_entry *pentry; - int num, i; - - if (gpio_table) - return 0; - sb = (struct sfi_table_simple *)table; - num = SFI_GET_NUM_ENTRIES(sb, struct sfi_gpio_table_entry); - pentry = (struct sfi_gpio_table_entry *)sb->pentry; - - gpio_table = kmalloc(num * sizeof(*pentry), GFP_KERNEL); - if (!gpio_table) - return -1; - memcpy(gpio_table, pentry, num * sizeof(*pentry)); - gpio_num_entry = num; - - pr_debug("GPIO pin info:\n"); - for (i = 0; i < num; i++, pentry++) - pr_debug("info[%2d]: controller = %16.16s, pin_name = %16.16s," - " pin = %d\n", i, - pentry->controller_name, - pentry->pin_name, - pentry->pin_no); - return 0; -} - -static int get_gpio_by_name(const char *name) -{ - struct sfi_gpio_table_entry *pentry = gpio_table; - int i; - - if (!pentry) - return -1; - for (i = 0; i < gpio_num_entry; i++, pentry++) { - if (!strncmp(name, pentry->pin_name, SFI_NAME_LEN)) - return pentry->pin_no; - } - return -1; -} - -/* - * Here defines the array of devices platform data that IAFW would export - * through SFI "DEVS" table, we use name and type to match the device and - * its platform data. - */ -struct devs_id { - char name[SFI_NAME_LEN + 1]; - u8 type; - u8 delay; - void *(*get_platform_data)(void *info); -}; - -/* the offset for the mapping of global gpio pin to irq */ -#define MRST_IRQ_OFFSET 0x100 - -static void __init *pmic_gpio_platform_data(void *info) -{ - static struct intel_pmic_gpio_platform_data pmic_gpio_pdata; - int gpio_base = get_gpio_by_name("pmic_gpio_base"); - - if (gpio_base == -1) - gpio_base = 64; - pmic_gpio_pdata.gpio_base = gpio_base; - pmic_gpio_pdata.irq_base = gpio_base + MRST_IRQ_OFFSET; - pmic_gpio_pdata.gpiointr = 0xffffeff8; - - return &pmic_gpio_pdata; -} - -static void __init *max3111_platform_data(void *info) -{ - struct spi_board_info *spi_info = info; - int intr = get_gpio_by_name("max3111_int"); - - spi_info->mode = SPI_MODE_0; - if (intr == -1) - return NULL; - spi_info->irq = intr + MRST_IRQ_OFFSET; - return NULL; -} - -/* we have multiple max7315 on the board ... */ -#define MAX7315_NUM 2 -static void __init *max7315_platform_data(void *info) -{ - static struct pca953x_platform_data max7315_pdata[MAX7315_NUM]; - static int nr; - struct pca953x_platform_data *max7315 = &max7315_pdata[nr]; - struct i2c_board_info *i2c_info = info; - int gpio_base, intr; - char base_pin_name[SFI_NAME_LEN + 1]; - char intr_pin_name[SFI_NAME_LEN + 1]; - - if (nr == MAX7315_NUM) { - pr_err("too many max7315s, we only support %d\n", - MAX7315_NUM); - return NULL; - } - /* we have several max7315 on the board, we only need load several - * instances of the same pca953x driver to cover them - */ - strcpy(i2c_info->type, "max7315"); - if (nr++) { - sprintf(base_pin_name, "max7315_%d_base", nr); - sprintf(intr_pin_name, "max7315_%d_int", nr); - } else { - strcpy(base_pin_name, "max7315_base"); - strcpy(intr_pin_name, "max7315_int"); - } - - gpio_base = get_gpio_by_name(base_pin_name); - intr = get_gpio_by_name(intr_pin_name); - - if (gpio_base == -1) - return NULL; - max7315->gpio_base = gpio_base; - if (intr != -1) { - i2c_info->irq = intr + MRST_IRQ_OFFSET; - max7315->irq_base = gpio_base + MRST_IRQ_OFFSET; - } else { - i2c_info->irq = -1; - max7315->irq_base = -1; - } - return max7315; -} - -static void *tca6416_platform_data(void *info) -{ - static struct pca953x_platform_data tca6416; - struct i2c_board_info *i2c_info = info; - int gpio_base, intr; - char base_pin_name[SFI_NAME_LEN + 1]; - char intr_pin_name[SFI_NAME_LEN + 1]; - - strcpy(i2c_info->type, "tca6416"); - strcpy(base_pin_name, "tca6416_base"); - strcpy(intr_pin_name, "tca6416_int"); - - gpio_base = get_gpio_by_name(base_pin_name); - intr = get_gpio_by_name(intr_pin_name); - - if (gpio_base == -1) - return NULL; - tca6416.gpio_base = gpio_base; - if (intr != -1) { - i2c_info->irq = intr + MRST_IRQ_OFFSET; - tca6416.irq_base = gpio_base + MRST_IRQ_OFFSET; - } else { - i2c_info->irq = -1; - tca6416.irq_base = -1; - } - return &tca6416; -} - -static void *mpu3050_platform_data(void *info) -{ - struct i2c_board_info *i2c_info = info; - int intr = get_gpio_by_name("mpu3050_int"); - - if (intr == -1) - return NULL; - - i2c_info->irq = intr + MRST_IRQ_OFFSET; - return NULL; -} - -static void __init *emc1403_platform_data(void *info) -{ - static short intr2nd_pdata; - struct i2c_board_info *i2c_info = info; - int intr = get_gpio_by_name("thermal_int"); - int intr2nd = get_gpio_by_name("thermal_alert"); - - if (intr == -1 || intr2nd == -1) - return NULL; - - i2c_info->irq = intr + MRST_IRQ_OFFSET; - intr2nd_pdata = intr2nd + MRST_IRQ_OFFSET; - - return &intr2nd_pdata; -} - -static void __init *lis331dl_platform_data(void *info) -{ - static short intr2nd_pdata; - struct i2c_board_info *i2c_info = info; - int intr = get_gpio_by_name("accel_int"); - int intr2nd = get_gpio_by_name("accel_2"); - - if (intr == -1 || intr2nd == -1) - return NULL; - - i2c_info->irq = intr + MRST_IRQ_OFFSET; - intr2nd_pdata = intr2nd + MRST_IRQ_OFFSET; - - return &intr2nd_pdata; -} - -static void __init *no_platform_data(void *info) -{ - return NULL; -} - -static struct resource msic_resources[] = { - { - .start = INTEL_MSIC_IRQ_PHYS_BASE, - .end = INTEL_MSIC_IRQ_PHYS_BASE + 64 - 1, - .flags = IORESOURCE_MEM, - }, -}; - -static struct intel_msic_platform_data msic_pdata; - -static struct platform_device msic_device = { - .name = "intel_msic", - .id = -1, - .dev = { - .platform_data = &msic_pdata, - }, - .num_resources = ARRAY_SIZE(msic_resources), - .resource = msic_resources, -}; - -static inline bool mrst_has_msic(void) -{ - return mrst_identify_cpu() == MRST_CPU_CHIP_PENWELL; -} - -static int msic_scu_status_change(struct notifier_block *nb, - unsigned long code, void *data) -{ - if (code == SCU_DOWN) { - platform_device_unregister(&msic_device); - return 0; - } - - return platform_device_register(&msic_device); -} - -static int __init msic_init(void) -{ - static struct notifier_block msic_scu_notifier = { - .notifier_call = msic_scu_status_change, - }; - - /* - * We need to be sure that the SCU IPC is ready before MSIC device - * can be registered. - */ - if (mrst_has_msic()) - intel_scu_notifier_add(&msic_scu_notifier); - - return 0; -} -arch_initcall(msic_init); - -/* - * msic_generic_platform_data - sets generic platform data for the block - * @info: pointer to the SFI device table entry for this block - * @block: MSIC block - * - * Function sets IRQ number from the SFI table entry for given device to - * the MSIC platform data. - */ -static void *msic_generic_platform_data(void *info, enum intel_msic_block block) -{ - struct sfi_device_table_entry *entry = info; - - BUG_ON(block < 0 || block >= INTEL_MSIC_BLOCK_LAST); - msic_pdata.irq[block] = entry->irq; - - return no_platform_data(info); -} - -static void *msic_battery_platform_data(void *info) -{ - return msic_generic_platform_data(info, INTEL_MSIC_BLOCK_BATTERY); -} - -static void *msic_gpio_platform_data(void *info) -{ - static struct intel_msic_gpio_pdata pdata; - int gpio = get_gpio_by_name("msic_gpio_base"); - - if (gpio < 0) - return NULL; - - pdata.gpio_base = gpio; - msic_pdata.gpio = &pdata; - - return msic_generic_platform_data(info, INTEL_MSIC_BLOCK_GPIO); -} - -static void *msic_audio_platform_data(void *info) -{ - struct platform_device *pdev; - - pdev = platform_device_register_simple("sst-platform", -1, NULL, 0); - if (IS_ERR(pdev)) { - pr_err("failed to create audio platform device\n"); - return NULL; - } - - return msic_generic_platform_data(info, INTEL_MSIC_BLOCK_AUDIO); -} - -static void *msic_power_btn_platform_data(void *info) -{ - return msic_generic_platform_data(info, INTEL_MSIC_BLOCK_POWER_BTN); -} - -static void *msic_ocd_platform_data(void *info) -{ - static struct intel_msic_ocd_pdata pdata; - int gpio = get_gpio_by_name("ocd_gpio"); - - if (gpio < 0) - return NULL; - - pdata.gpio = gpio; - msic_pdata.ocd = &pdata; - - return msic_generic_platform_data(info, INTEL_MSIC_BLOCK_OCD); -} - -static void *msic_thermal_platform_data(void *info) -{ - return msic_generic_platform_data(info, INTEL_MSIC_BLOCK_THERMAL); -} - -/* tc35876x DSI-LVDS bridge chip and panel platform data */ -static void *tc35876x_platform_data(void *data) -{ - static struct tc35876x_platform_data pdata; - - /* gpio pins set to -1 will not be used by the driver */ - pdata.gpio_bridge_reset = get_gpio_by_name("LCMB_RXEN"); - pdata.gpio_panel_bl_en = get_gpio_by_name("6S6P_BL_EN"); - pdata.gpio_panel_vadd = get_gpio_by_name("EN_VREG_LCD_V3P3"); - - return &pdata; -} - -static const struct devs_id __initconst device_ids[] = { - {"bma023", SFI_DEV_TYPE_I2C, 1, &no_platform_data}, - {"pmic_gpio", SFI_DEV_TYPE_SPI, 1, &pmic_gpio_platform_data}, - {"pmic_gpio", SFI_DEV_TYPE_IPC, 1, &pmic_gpio_platform_data}, - {"spi_max3111", SFI_DEV_TYPE_SPI, 0, &max3111_platform_data}, - {"i2c_max7315", SFI_DEV_TYPE_I2C, 1, &max7315_platform_data}, - {"i2c_max7315_2", SFI_DEV_TYPE_I2C, 1, &max7315_platform_data}, - {"tca6416", SFI_DEV_TYPE_I2C, 1, &tca6416_platform_data}, - {"emc1403", SFI_DEV_TYPE_I2C, 1, &emc1403_platform_data}, - {"i2c_accel", SFI_DEV_TYPE_I2C, 0, &lis331dl_platform_data}, - {"pmic_audio", SFI_DEV_TYPE_IPC, 1, &no_platform_data}, - {"mpu3050", SFI_DEV_TYPE_I2C, 1, &mpu3050_platform_data}, - {"i2c_disp_brig", SFI_DEV_TYPE_I2C, 0, &tc35876x_platform_data}, - - /* MSIC subdevices */ - {"msic_battery", SFI_DEV_TYPE_IPC, 1, &msic_battery_platform_data}, - {"msic_gpio", SFI_DEV_TYPE_IPC, 1, &msic_gpio_platform_data}, - {"msic_audio", SFI_DEV_TYPE_IPC, 1, &msic_audio_platform_data}, - {"msic_power_btn", SFI_DEV_TYPE_IPC, 1, &msic_power_btn_platform_data}, - {"msic_ocd", SFI_DEV_TYPE_IPC, 1, &msic_ocd_platform_data}, - {"msic_thermal", SFI_DEV_TYPE_IPC, 1, &msic_thermal_platform_data}, - - {}, -}; - -#define MAX_IPCDEVS 24 -static struct platform_device *ipc_devs[MAX_IPCDEVS]; -static int ipc_next_dev; - -#define MAX_SCU_SPI 24 -static struct spi_board_info *spi_devs[MAX_SCU_SPI]; -static int spi_next_dev; - -#define MAX_SCU_I2C 24 -static struct i2c_board_info *i2c_devs[MAX_SCU_I2C]; -static int i2c_bus[MAX_SCU_I2C]; -static int i2c_next_dev; - -static void __init intel_scu_device_register(struct platform_device *pdev) -{ - if(ipc_next_dev == MAX_IPCDEVS) - pr_err("too many SCU IPC devices"); - else - ipc_devs[ipc_next_dev++] = pdev; -} - -static void __init intel_scu_spi_device_register(struct spi_board_info *sdev) -{ - struct spi_board_info *new_dev; - - if (spi_next_dev == MAX_SCU_SPI) { - pr_err("too many SCU SPI devices"); - return; - } - - new_dev = kzalloc(sizeof(*sdev), GFP_KERNEL); - if (!new_dev) { - pr_err("failed to alloc mem for delayed spi dev %s\n", - sdev->modalias); - return; - } - memcpy(new_dev, sdev, sizeof(*sdev)); - - spi_devs[spi_next_dev++] = new_dev; -} - -static void __init intel_scu_i2c_device_register(int bus, - struct i2c_board_info *idev) -{ - struct i2c_board_info *new_dev; - - if (i2c_next_dev == MAX_SCU_I2C) { - pr_err("too many SCU I2C devices"); - return; - } - - new_dev = kzalloc(sizeof(*idev), GFP_KERNEL); - if (!new_dev) { - pr_err("failed to alloc mem for delayed i2c dev %s\n", - idev->type); - return; - } - memcpy(new_dev, idev, sizeof(*idev)); - - i2c_bus[i2c_next_dev] = bus; - i2c_devs[i2c_next_dev++] = new_dev; -} - -BLOCKING_NOTIFIER_HEAD(intel_scu_notifier); -EXPORT_SYMBOL_GPL(intel_scu_notifier); - -/* Called by IPC driver */ -void intel_scu_devices_create(void) -{ - int i; - - for (i = 0; i < ipc_next_dev; i++) - platform_device_add(ipc_devs[i]); - - for (i = 0; i < spi_next_dev; i++) - spi_register_board_info(spi_devs[i], 1); - - for (i = 0; i < i2c_next_dev; i++) { - struct i2c_adapter *adapter; - struct i2c_client *client; - - adapter = i2c_get_adapter(i2c_bus[i]); - if (adapter) { - client = i2c_new_device(adapter, i2c_devs[i]); - if (!client) - pr_err("can't create i2c device %s\n", - i2c_devs[i]->type); - } else - i2c_register_board_info(i2c_bus[i], i2c_devs[i], 1); - } - intel_scu_notifier_post(SCU_AVAILABLE, NULL); -} -EXPORT_SYMBOL_GPL(intel_scu_devices_create); - -/* Called by IPC driver */ -void intel_scu_devices_destroy(void) -{ - int i; - - intel_scu_notifier_post(SCU_DOWN, NULL); - - for (i = 0; i < ipc_next_dev; i++) - platform_device_del(ipc_devs[i]); -} -EXPORT_SYMBOL_GPL(intel_scu_devices_destroy); - -static void __init install_irq_resource(struct platform_device *pdev, int irq) -{ - /* Single threaded */ - static struct resource __initdata res = { - .name = "IRQ", - .flags = IORESOURCE_IRQ, - }; - res.start = irq; - platform_device_add_resources(pdev, &res, 1); -} - -static void __init sfi_handle_ipc_dev(struct sfi_device_table_entry *entry) -{ - const struct devs_id *dev = device_ids; - struct platform_device *pdev; - void *pdata = NULL; - - while (dev->name[0]) { - if (dev->type == SFI_DEV_TYPE_IPC && - !strncmp(dev->name, entry->name, SFI_NAME_LEN)) { - pdata = dev->get_platform_data(entry); - break; - } - dev++; - } - - /* - * On Medfield the platform device creation is handled by the MSIC - * MFD driver so we don't need to do it here. - */ - if (mrst_has_msic()) - return; - - pdev = platform_device_alloc(entry->name, 0); - if (pdev == NULL) { - pr_err("out of memory for SFI platform device '%s'.\n", - entry->name); - return; - } - install_irq_resource(pdev, entry->irq); - - pdev->dev.platform_data = pdata; - intel_scu_device_register(pdev); -} - -static void __init sfi_handle_spi_dev(struct spi_board_info *spi_info) -{ - const struct devs_id *dev = device_ids; - void *pdata = NULL; - - while (dev->name[0]) { - if (dev->type == SFI_DEV_TYPE_SPI && - !strncmp(dev->name, spi_info->modalias, SFI_NAME_LEN)) { - pdata = dev->get_platform_data(spi_info); - break; - } - dev++; - } - spi_info->platform_data = pdata; - if (dev->delay) - intel_scu_spi_device_register(spi_info); - else - spi_register_board_info(spi_info, 1); -} - -static void __init sfi_handle_i2c_dev(int bus, struct i2c_board_info *i2c_info) -{ - const struct devs_id *dev = device_ids; - void *pdata = NULL; - - while (dev->name[0]) { - if (dev->type == SFI_DEV_TYPE_I2C && - !strncmp(dev->name, i2c_info->type, SFI_NAME_LEN)) { - pdata = dev->get_platform_data(i2c_info); - break; - } - dev++; - } - i2c_info->platform_data = pdata; - - if (dev->delay) - intel_scu_i2c_device_register(bus, i2c_info); - else - i2c_register_board_info(bus, i2c_info, 1); - } - - -static int __init sfi_parse_devs(struct sfi_table_header *table) -{ - struct sfi_table_simple *sb; - struct sfi_device_table_entry *pentry; - struct spi_board_info spi_info; - struct i2c_board_info i2c_info; - int num, i, bus; - int ioapic; - struct io_apic_irq_attr irq_attr; - - sb = (struct sfi_table_simple *)table; - num = SFI_GET_NUM_ENTRIES(sb, struct sfi_device_table_entry); - pentry = (struct sfi_device_table_entry *)sb->pentry; - - for (i = 0; i < num; i++, pentry++) { - int irq = pentry->irq; - - if (irq != (u8)0xff) { /* native RTE case */ - /* these SPI2 devices are not exposed to system as PCI - * devices, but they have separate RTE entry in IOAPIC - * so we have to enable them one by one here - */ - ioapic = mp_find_ioapic(irq); - irq_attr.ioapic = ioapic; - irq_attr.ioapic_pin = irq; - irq_attr.trigger = 1; - irq_attr.polarity = 1; - io_apic_set_pci_routing(NULL, irq, &irq_attr); - } else - irq = 0; /* No irq */ - - switch (pentry->type) { - case SFI_DEV_TYPE_IPC: - pr_debug("info[%2d]: IPC bus, name = %16.16s, " - "irq = 0x%2x\n", i, pentry->name, pentry->irq); - sfi_handle_ipc_dev(pentry); - break; - case SFI_DEV_TYPE_SPI: - memset(&spi_info, 0, sizeof(spi_info)); - strncpy(spi_info.modalias, pentry->name, SFI_NAME_LEN); - spi_info.irq = irq; - spi_info.bus_num = pentry->host_num; - spi_info.chip_select = pentry->addr; - spi_info.max_speed_hz = pentry->max_freq; - pr_debug("info[%2d]: SPI bus = %d, name = %16.16s, " - "irq = 0x%2x, max_freq = %d, cs = %d\n", i, - spi_info.bus_num, - spi_info.modalias, - spi_info.irq, - spi_info.max_speed_hz, - spi_info.chip_select); - sfi_handle_spi_dev(&spi_info); - break; - case SFI_DEV_TYPE_I2C: - memset(&i2c_info, 0, sizeof(i2c_info)); - bus = pentry->host_num; - strncpy(i2c_info.type, pentry->name, SFI_NAME_LEN); - i2c_info.irq = irq; - i2c_info.addr = pentry->addr; - pr_debug("info[%2d]: I2C bus = %d, name = %16.16s, " - "irq = 0x%2x, addr = 0x%x\n", i, bus, - i2c_info.type, - i2c_info.irq, - i2c_info.addr); - sfi_handle_i2c_dev(bus, &i2c_info); - break; - case SFI_DEV_TYPE_UART: - case SFI_DEV_TYPE_HSI: - default: - ; - } - } - return 0; -} - -static int __init mrst_platform_init(void) -{ - sfi_table_parse(SFI_SIG_GPIO, NULL, NULL, sfi_parse_gpio); - sfi_table_parse(SFI_SIG_DEVS, NULL, NULL, sfi_parse_devs); - return 0; -} -arch_initcall(mrst_platform_init); - -/* - * we will search these buttons in SFI GPIO table (by name) - * and register them dynamically. Please add all possible - * buttons here, we will shrink them if no GPIO found. - */ -static struct gpio_keys_button gpio_button[] = { - {KEY_POWER, -1, 1, "power_btn", EV_KEY, 0, 3000}, - {KEY_PROG1, -1, 1, "prog_btn1", EV_KEY, 0, 20}, - {KEY_PROG2, -1, 1, "prog_btn2", EV_KEY, 0, 20}, - {SW_LID, -1, 1, "lid_switch", EV_SW, 0, 20}, - {KEY_VOLUMEUP, -1, 1, "vol_up", EV_KEY, 0, 20}, - {KEY_VOLUMEDOWN, -1, 1, "vol_down", EV_KEY, 0, 20}, - {KEY_CAMERA, -1, 1, "camera_full", EV_KEY, 0, 20}, - {KEY_CAMERA_FOCUS, -1, 1, "camera_half", EV_KEY, 0, 20}, - {SW_KEYPAD_SLIDE, -1, 1, "MagSw1", EV_SW, 0, 20}, - {SW_KEYPAD_SLIDE, -1, 1, "MagSw2", EV_SW, 0, 20}, -}; - -static struct gpio_keys_platform_data mrst_gpio_keys = { - .buttons = gpio_button, - .rep = 1, - .nbuttons = -1, /* will fill it after search */ -}; - -static struct platform_device pb_device = { - .name = "gpio-keys", - .id = -1, - .dev = { - .platform_data = &mrst_gpio_keys, - }, -}; - -/* - * Shrink the non-existent buttons, register the gpio button - * device if there is some - */ -static int __init pb_keys_init(void) -{ - struct gpio_keys_button *gb = gpio_button; - int i, num, good = 0; - - num = sizeof(gpio_button) / sizeof(struct gpio_keys_button); - for (i = 0; i < num; i++) { - gb[i].gpio = get_gpio_by_name(gb[i].desc); - pr_debug("info[%2d]: name = %s, gpio = %d\n", i, gb[i].desc, gb[i].gpio); - if (gb[i].gpio == -1) - continue; - - if (i != good) - gb[good] = gb[i]; - good++; - } - - if (good) { - mrst_gpio_keys.nbuttons = good; - return platform_device_register(&pb_device); - } - return 0; -} -late_initcall(pb_keys_init); diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c index f7bab68a4b83..11f9285a2ff6 100644 --- a/arch/x86/tools/relocs.c +++ b/arch/x86/tools/relocs.c @@ -722,15 +722,25 @@ static void percpu_init(void) /* * Check to see if a symbol lies in the .data..percpu section. - * For some as yet not understood reason the "__init_begin" - * symbol which immediately preceeds the .data..percpu section - * also shows up as it it were part of it so we do an explict - * check for that symbol name and ignore it. + * + * The linker incorrectly associates some symbols with the + * .data..percpu section so we also need to check the symbol + * name to make sure that we classify the symbol correctly. + * + * The GNU linker incorrectly associates: + * __init_begin + * __per_cpu_load + * + * The "gold" linker incorrectly associates: + * init_per_cpu__irq_stack_union + * init_per_cpu__gdt_page */ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname) { return (sym->st_shndx == per_cpu_shndx) && - strcmp(symname, "__init_begin"); + strcmp(symname, "__init_begin") && + strcmp(symname, "__per_cpu_load") && + strncmp(symname, "init_per_cpu_", 13); } diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index 8b901e8d782d..a61c7d5811be 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -879,7 +879,6 @@ int m2p_add_override(unsigned long mfn, struct page *page, unsigned long uninitialized_var(address); unsigned level; pte_t *ptep = NULL; - int ret = 0; pfn = page_to_pfn(page); if (!PageHighMem(page)) { @@ -926,8 +925,8 @@ int m2p_add_override(unsigned long mfn, struct page *page, * frontend pages while they are being shared with the backend, * because mfn_to_pfn (that ends up being called by GUPF) will * return the backend pfn rather than the frontend pfn. */ - ret = __get_user(pfn, &machine_to_phys_mapping[mfn]); - if (ret == 0 && get_phys_to_machine(pfn) == mfn) + pfn = mfn_to_pfn_no_overrides(mfn); + if (get_phys_to_machine(pfn) == mfn) set_phys_to_machine(pfn, FOREIGN_FRAME(mfn)); return 0; @@ -942,7 +941,6 @@ int m2p_remove_override(struct page *page, unsigned long uninitialized_var(address); unsigned level; pte_t *ptep = NULL; - int ret = 0; pfn = page_to_pfn(page); mfn = get_phys_to_machine(pfn); @@ -1029,8 +1027,8 @@ int m2p_remove_override(struct page *page, * the original pfn causes mfn_to_pfn(mfn) to return the frontend * pfn again. */ mfn &= ~FOREIGN_FRAME_BIT; - ret = __get_user(pfn, &machine_to_phys_mapping[mfn]); - if (ret == 0 && get_phys_to_machine(pfn) == FOREIGN_FRAME(mfn) && + pfn = mfn_to_pfn_no_overrides(mfn); + if (get_phys_to_machine(pfn) == FOREIGN_FRAME(mfn) && m2p_find_override(mfn) == NULL) set_phys_to_machine(pfn, mfn); diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c index d1e4777b4e75..31d04758b76f 100644 --- a/arch/x86/xen/smp.c +++ b/arch/x86/xen/smp.c @@ -278,6 +278,15 @@ static void __init xen_smp_prepare_boot_cpu(void) old memory can be recycled */ make_lowmem_page_readwrite(xen_initial_gdt); +#ifdef CONFIG_X86_32 + /* + * Xen starts us with XEN_FLAT_RING1_DS, but linux code + * expects __USER_DS + */ + loadsegment(ds, __USER_DS); + loadsegment(es, __USER_DS); +#endif + xen_filter_cpu_maps(); xen_setup_vcpu_info_placement(); } diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 253f63fceea1..be6b86078957 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -259,6 +259,14 @@ void xen_uninit_lock_cpu(int cpu) } +/* + * Our init of PV spinlocks is split in two init functions due to us + * using paravirt patching and jump labels patching and having to do + * all of this before SMP code is invoked. + * + * The paravirt patching needs to be done _before_ the alternative asm code + * is started, otherwise we would not patch the core kernel code. + */ void __init xen_init_spinlocks(void) { @@ -267,12 +275,26 @@ void __init xen_init_spinlocks(void) return; } - static_key_slow_inc(¶virt_ticketlocks_enabled); - pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning); pv_lock_ops.unlock_kick = xen_unlock_kick; } +/* + * While the jump_label init code needs to happend _after_ the jump labels are + * enabled and before SMP is started. Hence we use pre-SMP initcall level + * init. We cannot do it in xen_init_spinlocks as that is done before + * jump labels are activated. + */ +static __init int xen_init_spinlocks_jump(void) +{ + if (!xen_pvspin) + return 0; + + static_key_slow_inc(¶virt_ticketlocks_enabled); + return 0; +} +early_initcall(xen_init_spinlocks_jump); + static __init int xen_parse_nopvspin(char *arg) { xen_pvspin = false; diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild index 1b982641ec35..228d6aee3a16 100644 --- a/arch/xtensa/include/asm/Kbuild +++ b/arch/xtensa/include/asm/Kbuild @@ -28,3 +28,4 @@ generic-y += termios.h generic-y += topology.h generic-y += trace_clock.h generic-y += xor.h +generic-y += preempt.h diff --git a/block/Kconfig b/block/Kconfig index 7f38e40fee08..2429515c05c2 100644 --- a/block/Kconfig +++ b/block/Kconfig @@ -99,11 +99,16 @@ config BLK_DEV_THROTTLING See Documentation/cgroups/blkio-controller.txt for more information. -config CMDLINE_PARSER +config BLK_CMDLINE_PARSER bool "Block device command line partition parser" default n ---help--- - Parsing command line, get the partitions information. + Enabling this option allows you to specify the partition layout from + the kernel boot args. This is typically of use for embedded devices + which don't otherwise have any standardized method for listing the + partitions on a block device. + + See Documentation/block/cmdline-partition.txt for more information. menu "Partition Types" diff --git a/block/Makefile b/block/Makefile index 4fa4be544ece..671a83d063a5 100644 --- a/block/Makefile +++ b/block/Makefile @@ -18,4 +18,4 @@ obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o obj-$(CONFIG_BLK_DEV_INTEGRITY) += blk-integrity.o -obj-$(CONFIG_CMDLINE_PARSER) += cmdline-parser.o +obj-$(CONFIG_BLK_CMDLINE_PARSER) += cmdline-parser.o diff --git a/block/partitions/Kconfig b/block/partitions/Kconfig index 87a32086535d..9b29a996c311 100644 --- a/block/partitions/Kconfig +++ b/block/partitions/Kconfig @@ -263,7 +263,7 @@ config SYSV68_PARTITION config CMDLINE_PARTITION bool "Command line partition support" if PARTITION_ADVANCED - select CMDLINE_PARSER + select BLK_CMDLINE_PARSER help - Say Y here if you would read the partitions table from bootargs. + Say Y here if you want to read the partition table from bootargs. The format for the command line is just like mtdparts. diff --git a/block/partitions/cmdline.c b/block/partitions/cmdline.c index 56cf4ffad51e..5141b563adf1 100644 --- a/block/partitions/cmdline.c +++ b/block/partitions/cmdline.c @@ -2,15 +2,15 @@ * Copyright (C) 2013 HUAWEI * Author: Cai Zhiyong * - * Read block device partition table from command line. - * The partition used for fixed block device (eMMC) embedded device. - * It is no MBR, save storage space. Bootloader can be easily accessed + * Read block device partition table from the command line. + * Typically used for fixed block (eMMC) embedded devices. + * It has no MBR, so saves storage space. Bootloader can be easily accessed * by absolute address of data on the block device. * Users can easily change the partition. * * The format for the command line is just like mtdparts. * - * Verbose config please reference "Documentation/block/cmdline-partition.txt" + * For further information, see "Documentation/block/cmdline-partition.txt" * */ diff --git a/block/partitions/efi.c b/block/partitions/efi.c index 1eb09ee5311b..a8287b49d062 100644 --- a/block/partitions/efi.c +++ b/block/partitions/efi.c @@ -222,11 +222,16 @@ check_hybrid: * the disk size. * * Hybrid MBRs do not necessarily comply with this. + * + * Consider a bad value here to be a warning to support dd'ing + * an image from a smaller disk to a larger disk. */ if (ret == GPT_MBR_PROTECTIVE) { sz = le32_to_cpu(mbr->partition_record[part].size_in_lba); if (sz != (uint32_t) total_sectors - 1 && sz != 0xFFFFFFFF) - ret = 0; + pr_debug("GPT: mbr size in lba (%u) different than whole disk (%u).\n", + sz, min_t(uint32_t, + total_sectors - 1, 0xFFFFFFFF)); } done: return ret; diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 22327e6a7236..6efe2ac6902f 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -24,7 +24,7 @@ menuconfig ACPI are configured, ACPI is used. The project home page for the Linux ACPI subsystem is here: - + Linux support for ACPI is based on Intel Corporation's ACPI Component Architecture (ACPI CA). For more information on the @@ -123,9 +123,9 @@ config ACPI_BUTTON default y help This driver handles events on the power, sleep, and lid buttons. - A daemon reads /proc/acpi/event and perform user-defined actions - such as shutting down the system. This is necessary for - software-controlled poweroff. + A daemon reads events from input devices or via netlink and + performs user-defined actions such as shutting down the system. + This is necessary for software-controlled poweroff. To compile this driver as a module, choose M here: the module will be called button. diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index f40acef80269..a6977e12d574 100644 --- a/drivers/acpi/acpi_ipmi.c +++ b/drivers/acpi/acpi_ipmi.c @@ -39,6 +39,7 @@ #include #include #include +#include MODULE_AUTHOR("Zhao Yakui"); MODULE_DESCRIPTION("ACPI IPMI Opregion driver"); @@ -57,7 +58,7 @@ struct acpi_ipmi_device { struct list_head head; /* the IPMI request message list */ struct list_head tx_msg_list; - struct mutex tx_msg_lock; + spinlock_t tx_msg_lock; acpi_handle handle; struct pnp_dev *pnp_dev; ipmi_user_t user_interface; @@ -147,6 +148,7 @@ static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg, struct kernel_ipmi_msg *msg; struct acpi_ipmi_buffer *buffer; struct acpi_ipmi_device *device; + unsigned long flags; msg = &tx_msg->tx_message; /* @@ -177,10 +179,10 @@ static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg, /* Get the msgid */ device = tx_msg->device; - mutex_lock(&device->tx_msg_lock); + spin_lock_irqsave(&device->tx_msg_lock, flags); device->curr_msgid++; tx_msg->tx_msgid = device->curr_msgid; - mutex_unlock(&device->tx_msg_lock); + spin_unlock_irqrestore(&device->tx_msg_lock, flags); } static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg, @@ -242,6 +244,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data) int msg_found = 0; struct acpi_ipmi_msg *tx_msg; struct pnp_dev *pnp_dev = ipmi_device->pnp_dev; + unsigned long flags; if (msg->user != ipmi_device->user_interface) { dev_warn(&pnp_dev->dev, "Unexpected response is returned. " @@ -250,7 +253,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data) ipmi_free_recv_msg(msg); return; } - mutex_lock(&ipmi_device->tx_msg_lock); + spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags); list_for_each_entry(tx_msg, &ipmi_device->tx_msg_list, head) { if (msg->msgid == tx_msg->tx_msgid) { msg_found = 1; @@ -258,7 +261,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data) } } - mutex_unlock(&ipmi_device->tx_msg_lock); + spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags); if (!msg_found) { dev_warn(&pnp_dev->dev, "Unexpected response (msg id %ld) is " "returned.\n", msg->msgid); @@ -378,6 +381,7 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address, struct acpi_ipmi_device *ipmi_device = handler_context; int err, rem_time; acpi_status status; + unsigned long flags; /* * IPMI opregion message. * IPMI message is firstly written to the BMC and system software @@ -395,9 +399,9 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address, return AE_NO_MEMORY; acpi_format_ipmi_msg(tx_msg, address, value); - mutex_lock(&ipmi_device->tx_msg_lock); + spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags); list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list); - mutex_unlock(&ipmi_device->tx_msg_lock); + spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags); err = ipmi_request_settime(ipmi_device->user_interface, &tx_msg->addr, tx_msg->tx_msgid, @@ -413,9 +417,9 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address, status = AE_OK; end_label: - mutex_lock(&ipmi_device->tx_msg_lock); + spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags); list_del(&tx_msg->head); - mutex_unlock(&ipmi_device->tx_msg_lock); + spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags); kfree(tx_msg); return status; } @@ -457,7 +461,7 @@ static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device) INIT_LIST_HEAD(&ipmi_device->head); - mutex_init(&ipmi_device->tx_msg_lock); + spin_lock_init(&ipmi_device->tx_msg_lock); INIT_LIST_HEAD(&ipmi_device->tx_msg_list); ipmi_install_space_handler(ipmi_device); diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index 59d3202f6b36..a94383d1f350 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -1025,60 +1025,4 @@ void acpi_dev_pm_detach(struct device *dev, bool power_off) } } EXPORT_SYMBOL_GPL(acpi_dev_pm_detach); - -/** - * acpi_dev_pm_add_dependent - Add physical device depending for PM. - * @handle: Handle of ACPI device node. - * @depdev: Device depending on that node for PM. - */ -void acpi_dev_pm_add_dependent(acpi_handle handle, struct device *depdev) -{ - struct acpi_device_physical_node *dep; - struct acpi_device *adev; - - if (!depdev || acpi_bus_get_device(handle, &adev)) - return; - - mutex_lock(&adev->physical_node_lock); - - list_for_each_entry(dep, &adev->power_dependent, node) - if (dep->dev == depdev) - goto out; - - dep = kzalloc(sizeof(*dep), GFP_KERNEL); - if (dep) { - dep->dev = depdev; - list_add_tail(&dep->node, &adev->power_dependent); - } - - out: - mutex_unlock(&adev->physical_node_lock); -} -EXPORT_SYMBOL_GPL(acpi_dev_pm_add_dependent); - -/** - * acpi_dev_pm_remove_dependent - Remove physical device depending for PM. - * @handle: Handle of ACPI device node. - * @depdev: Device depending on that node for PM. - */ -void acpi_dev_pm_remove_dependent(acpi_handle handle, struct device *depdev) -{ - struct acpi_device_physical_node *dep; - struct acpi_device *adev; - - if (!depdev || acpi_bus_get_device(handle, &adev)) - return; - - mutex_lock(&adev->physical_node_lock); - - list_for_each_entry(dep, &adev->power_dependent, node) - if (dep->dev == depdev) { - list_del(&dep->node); - kfree(dep); - break; - } - - mutex_unlock(&adev->physical_node_lock); -} -EXPORT_SYMBOL_GPL(acpi_dev_pm_remove_dependent); #endif /* CONFIG_PM */ diff --git a/drivers/acpi/power.c b/drivers/acpi/power.c index 0dbe5cdf3396..c2ad391d8041 100644 --- a/drivers/acpi/power.c +++ b/drivers/acpi/power.c @@ -59,16 +59,9 @@ ACPI_MODULE_NAME("power"); #define ACPI_POWER_RESOURCE_STATE_ON 0x01 #define ACPI_POWER_RESOURCE_STATE_UNKNOWN 0xFF -struct acpi_power_dependent_device { - struct list_head node; - struct acpi_device *adev; - struct work_struct work; -}; - struct acpi_power_resource { struct acpi_device device; struct list_head list_node; - struct list_head dependent; char *name; u32 system_level; u32 order; @@ -233,32 +226,6 @@ static int acpi_power_get_list_state(struct list_head *list, int *state) return 0; } -static void acpi_power_resume_dependent(struct work_struct *work) -{ - struct acpi_power_dependent_device *dep; - struct acpi_device_physical_node *pn; - struct acpi_device *adev; - int state; - - dep = container_of(work, struct acpi_power_dependent_device, work); - adev = dep->adev; - if (acpi_power_get_inferred_state(adev, &state)) - return; - - if (state > ACPI_STATE_D0) - return; - - mutex_lock(&adev->physical_node_lock); - - list_for_each_entry(pn, &adev->physical_node_list, node) - pm_request_resume(pn->dev); - - list_for_each_entry(pn, &adev->power_dependent, node) - pm_request_resume(pn->dev); - - mutex_unlock(&adev->physical_node_lock); -} - static int __acpi_power_on(struct acpi_power_resource *resource) { acpi_status status = AE_OK; @@ -283,14 +250,8 @@ static int acpi_power_on_unlocked(struct acpi_power_resource *resource) resource->name)); } else { result = __acpi_power_on(resource); - if (result) { + if (result) resource->ref_count--; - } else { - struct acpi_power_dependent_device *dep; - - list_for_each_entry(dep, &resource->dependent, node) - schedule_work(&dep->work); - } } return result; } @@ -390,52 +351,6 @@ static int acpi_power_on_list(struct list_head *list) return result; } -static void acpi_power_add_dependent(struct acpi_power_resource *resource, - struct acpi_device *adev) -{ - struct acpi_power_dependent_device *dep; - - mutex_lock(&resource->resource_lock); - - list_for_each_entry(dep, &resource->dependent, node) - if (dep->adev == adev) - goto out; - - dep = kzalloc(sizeof(*dep), GFP_KERNEL); - if (!dep) - goto out; - - dep->adev = adev; - INIT_WORK(&dep->work, acpi_power_resume_dependent); - list_add_tail(&dep->node, &resource->dependent); - - out: - mutex_unlock(&resource->resource_lock); -} - -static void acpi_power_remove_dependent(struct acpi_power_resource *resource, - struct acpi_device *adev) -{ - struct acpi_power_dependent_device *dep; - struct work_struct *work = NULL; - - mutex_lock(&resource->resource_lock); - - list_for_each_entry(dep, &resource->dependent, node) - if (dep->adev == adev) { - list_del(&dep->node); - work = &dep->work; - break; - } - - mutex_unlock(&resource->resource_lock); - - if (work) { - cancel_work_sync(work); - kfree(dep); - } -} - static struct attribute *attrs[] = { NULL, }; @@ -524,8 +439,6 @@ static void acpi_power_expose_hide(struct acpi_device *adev, void acpi_power_add_remove_device(struct acpi_device *adev, bool add) { - struct acpi_device_power_state *ps; - struct acpi_power_resource_entry *entry; int state; if (adev->wakeup.flags.valid) @@ -535,16 +448,6 @@ void acpi_power_add_remove_device(struct acpi_device *adev, bool add) if (!adev->power.flags.power_resources) return; - ps = &adev->power.states[ACPI_STATE_D0]; - list_for_each_entry(entry, &ps->resources, node) { - struct acpi_power_resource *resource = entry->resource; - - if (add) - acpi_power_add_dependent(resource, adev); - else - acpi_power_remove_dependent(resource, adev); - } - for (state = ACPI_STATE_D0; state <= ACPI_STATE_D3_HOT; state++) acpi_power_expose_hide(adev, &adev->power.states[state].resources, @@ -882,7 +785,6 @@ int acpi_add_power_resource(acpi_handle handle) acpi_init_device_object(device, handle, ACPI_BUS_TYPE_POWER, ACPI_STA_DEFAULT); mutex_init(&resource->resource_lock); - INIT_LIST_HEAD(&resource->dependent); INIT_LIST_HEAD(&resource->list_node); resource->name = device->pnp.bus_id; strcpy(acpi_device_name(device), ACPI_POWER_DEVICE_NAME); @@ -936,8 +838,10 @@ void acpi_resume_power_resources(void) mutex_lock(&resource->resource_lock); result = acpi_power_get_state(resource->device.handle, &state); - if (result) + if (result) { + mutex_unlock(&resource->resource_lock); continue; + } if (state == ACPI_POWER_RESOURCE_STATE_OFF && resource->ref_count) { diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c index f98dd00b51a9..c7414a545a4f 100644 --- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -119,17 +119,10 @@ static struct dmi_system_id processor_power_dmi_table[] = { */ static void acpi_safe_halt(void) { - current_thread_info()->status &= ~TS_POLLING; - /* - * TS_POLLING-cleared state must be visible before we - * test NEED_RESCHED: - */ - smp_mb(); - if (!need_resched()) { + if (!tif_need_resched()) { safe_halt(); local_irq_disable(); } - current_thread_info()->status |= TS_POLLING; } #ifdef ARCH_APICTIMER_STOPS_ON_C3 @@ -737,6 +730,11 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev, if (unlikely(!pr)) return -EINVAL; + if (cx->entry_method == ACPI_CSTATE_FFH) { + if (current_set_polling_and_test()) + return -EINVAL; + } + lapic_timer_state_broadcast(pr, cx, 1); acpi_idle_do_entry(cx); @@ -790,18 +788,9 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev, if (unlikely(!pr)) return -EINVAL; - if (cx->entry_method != ACPI_CSTATE_FFH) { - current_thread_info()->status &= ~TS_POLLING; - /* - * TS_POLLING-cleared state must be visible before we test - * NEED_RESCHED: - */ - smp_mb(); - - if (unlikely(need_resched())) { - current_thread_info()->status |= TS_POLLING; + if (cx->entry_method == ACPI_CSTATE_FFH) { + if (current_set_polling_and_test()) return -EINVAL; - } } /* @@ -819,9 +808,6 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev, sched_clock_idle_wakeup_event(0); - if (cx->entry_method != ACPI_CSTATE_FFH) - current_thread_info()->status |= TS_POLLING; - lapic_timer_state_broadcast(pr, cx, 0); return index; } @@ -858,18 +844,9 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev, } } - if (cx->entry_method != ACPI_CSTATE_FFH) { - current_thread_info()->status &= ~TS_POLLING; - /* - * TS_POLLING-cleared state must be visible before we test - * NEED_RESCHED: - */ - smp_mb(); - - if (unlikely(need_resched())) { - current_thread_info()->status |= TS_POLLING; + if (cx->entry_method == ACPI_CSTATE_FFH) { + if (current_set_polling_and_test()) return -EINVAL; - } } acpi_unlazy_tlb(smp_processor_id()); @@ -915,9 +892,6 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev, sched_clock_idle_wakeup_event(0); - if (cx->entry_method != ACPI_CSTATE_FFH) - current_thread_info()->status |= TS_POLLING; - lapic_timer_state_broadcast(pr, cx, 0); return index; } diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c index fbdb82e70d10..fee8a297c7d9 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -968,7 +968,7 @@ int acpi_bus_get_device(acpi_handle handle, struct acpi_device **device) } return 0; } -EXPORT_SYMBOL_GPL(acpi_bus_get_device); +EXPORT_SYMBOL(acpi_bus_get_device); int acpi_device_add(struct acpi_device *device, void (*release)(struct device *)) @@ -999,7 +999,6 @@ int acpi_device_add(struct acpi_device *device, INIT_LIST_HEAD(&device->wakeup_list); INIT_LIST_HEAD(&device->physical_node_list); mutex_init(&device->physical_node_lock); - INIT_LIST_HEAD(&device->power_dependent); new_bus_id = kzalloc(sizeof(struct acpi_device_bus_id), GFP_KERNEL); if (!new_bus_id) { @@ -1121,7 +1120,7 @@ int acpi_bus_register_driver(struct acpi_driver *driver) EXPORT_SYMBOL(acpi_bus_register_driver); /** - * acpi_bus_unregister_driver - unregisters a driver with the APIC bus + * acpi_bus_unregister_driver - unregisters a driver with the ACPI bus * @driver: driver to unregister * * Unregisters a driver with the ACPI bus. Searches the namespace for all diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c index 4ba8b0405572..ab714d2ad978 100644 --- a/drivers/ata/libata-acpi.c +++ b/drivers/ata/libata-acpi.c @@ -1035,17 +1035,3 @@ void ata_acpi_on_disable(struct ata_device *dev) { ata_acpi_clear_gtf(dev); } - -void ata_scsi_acpi_bind(struct ata_device *dev) -{ - acpi_handle handle = ata_dev_acpi_handle(dev); - if (handle) - acpi_dev_pm_add_dependent(handle, &dev->sdev->sdev_gendev); -} - -void ata_scsi_acpi_unbind(struct ata_device *dev) -{ - acpi_handle handle = ata_dev_acpi_handle(dev); - if (handle) - acpi_dev_pm_remove_dependent(handle, &dev->sdev->sdev_gendev); -} diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 97a0cef12959..db6dfcfa3e2e 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -3679,7 +3679,6 @@ void ata_scsi_scan_host(struct ata_port *ap, int sync) if (!IS_ERR(sdev)) { dev->sdev = sdev; scsi_device_put(sdev); - ata_scsi_acpi_bind(dev); } else { dev->sdev = NULL; } @@ -3767,8 +3766,6 @@ static void ata_scsi_remove_dev(struct ata_device *dev) struct scsi_device *sdev; unsigned long flags; - ata_scsi_acpi_unbind(dev); - /* Alas, we need to grab scan_mutex to ensure SCSI device * state doesn't change underneath us and thus * scsi_device_get() always succeeds. The mutex locking can diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h index eeeb77845d48..45b5ab3a95d5 100644 --- a/drivers/ata/libata.h +++ b/drivers/ata/libata.h @@ -121,8 +121,6 @@ extern void ata_acpi_set_state(struct ata_port *ap, pm_message_t state); extern void ata_acpi_bind_port(struct ata_port *ap); extern void ata_acpi_bind_dev(struct ata_device *dev); extern acpi_handle ata_dev_acpi_handle(struct ata_device *dev); -extern void ata_scsi_acpi_bind(struct ata_device *dev); -extern void ata_scsi_acpi_unbind(struct ata_device *dev); #else static inline void ata_acpi_dissociate(struct ata_host *host) { } static inline int ata_acpi_on_suspend(struct ata_port *ap) { return 0; } @@ -133,8 +131,6 @@ static inline void ata_acpi_set_state(struct ata_port *ap, pm_message_t state) { } static inline void ata_acpi_bind_port(struct ata_port *ap) {} static inline void ata_acpi_bind_dev(struct ata_device *dev) {} -static inline void ata_scsi_acpi_bind(struct ata_device *dev) {} -static inline void ata_scsi_acpi_unbind(struct ata_device *dev) {} #endif /* libata-scsi.c */ diff --git a/drivers/base/core.c b/drivers/base/core.c index c7cfadcf6752..34abf4d8a45f 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2017,7 +2017,7 @@ EXPORT_SYMBOL_GPL(device_move); */ void device_shutdown(void) { - struct device *dev; + struct device *dev, *parent; spin_lock(&devices_kset->list_lock); /* @@ -2034,7 +2034,7 @@ void device_shutdown(void) * prevent it from being freed because parent's * lock is to be held */ - get_device(dev->parent); + parent = get_device(dev->parent); get_device(dev); /* * Make sure the device is off the kset list, in the @@ -2044,8 +2044,8 @@ void device_shutdown(void) spin_unlock(&devices_kset->list_lock); /* hold lock to avoid race with probe/release */ - if (dev->parent) - device_lock(dev->parent); + if (parent) + device_lock(parent); device_lock(dev); /* Don't allow any more runtime suspends */ @@ -2063,11 +2063,11 @@ void device_shutdown(void) } device_unlock(dev); - if (dev->parent) - device_unlock(dev->parent); + if (parent) + device_unlock(parent); put_device(dev); - put_device(dev->parent); + put_device(parent); spin_lock(&devices_kset->list_lock); } diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 9e59f6535c44..bece691cb5d9 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -333,8 +333,10 @@ store_mem_state(struct device *dev, online_type = ONLINE_KEEP; else if (!strncmp(buf, "offline", min_t(int, count, 7))) online_type = -1; - else - return -EINVAL; + else { + ret = -EINVAL; + goto err; + } switch (online_type) { case ONLINE_KERNEL: @@ -357,6 +359,7 @@ store_mem_state(struct device *dev, ret = -EINVAL; /* should never happen */ } +err: unlock_device_hotplug(); if (ret) diff --git a/drivers/bcma/driver_pci.c b/drivers/bcma/driver_pci.c index c9fd6943ce45..50329d1057ed 100644 --- a/drivers/bcma/driver_pci.c +++ b/drivers/bcma/driver_pci.c @@ -210,25 +210,6 @@ static void bcma_core_pci_config_fixup(struct bcma_drv_pci *pc) } } -static void bcma_core_pci_power_save(struct bcma_drv_pci *pc, bool up) -{ - u16 data; - - if (pc->core->id.rev >= 15 && pc->core->id.rev <= 20) { - data = up ? 0x74 : 0x7C; - bcma_pcie_mdio_writeread(pc, BCMA_CORE_PCI_MDIO_BLK1, - BCMA_CORE_PCI_MDIO_BLK1_MGMT1, 0x7F64); - bcma_pcie_mdio_writeread(pc, BCMA_CORE_PCI_MDIO_BLK1, - BCMA_CORE_PCI_MDIO_BLK1_MGMT3, data); - } else if (pc->core->id.rev >= 21 && pc->core->id.rev <= 22) { - data = up ? 0x75 : 0x7D; - bcma_pcie_mdio_writeread(pc, BCMA_CORE_PCI_MDIO_BLK1, - BCMA_CORE_PCI_MDIO_BLK1_MGMT1, 0x7E65); - bcma_pcie_mdio_writeread(pc, BCMA_CORE_PCI_MDIO_BLK1, - BCMA_CORE_PCI_MDIO_BLK1_MGMT3, data); - } -} - /************************************************** * Init. **************************************************/ @@ -255,6 +236,32 @@ void bcma_core_pci_init(struct bcma_drv_pci *pc) bcma_core_pci_clientmode_init(pc); } +void bcma_core_pci_power_save(struct bcma_bus *bus, bool up) +{ + struct bcma_drv_pci *pc; + u16 data; + + if (bus->hosttype != BCMA_HOSTTYPE_PCI) + return; + + pc = &bus->drv_pci[0]; + + if (pc->core->id.rev >= 15 && pc->core->id.rev <= 20) { + data = up ? 0x74 : 0x7C; + bcma_pcie_mdio_writeread(pc, BCMA_CORE_PCI_MDIO_BLK1, + BCMA_CORE_PCI_MDIO_BLK1_MGMT1, 0x7F64); + bcma_pcie_mdio_writeread(pc, BCMA_CORE_PCI_MDIO_BLK1, + BCMA_CORE_PCI_MDIO_BLK1_MGMT3, data); + } else if (pc->core->id.rev >= 21 && pc->core->id.rev <= 22) { + data = up ? 0x75 : 0x7D; + bcma_pcie_mdio_writeread(pc, BCMA_CORE_PCI_MDIO_BLK1, + BCMA_CORE_PCI_MDIO_BLK1_MGMT1, 0x7E65); + bcma_pcie_mdio_writeread(pc, BCMA_CORE_PCI_MDIO_BLK1, + BCMA_CORE_PCI_MDIO_BLK1_MGMT3, data); + } +} +EXPORT_SYMBOL_GPL(bcma_core_pci_power_save); + int bcma_core_pci_irq_ctl(struct bcma_drv_pci *pc, struct bcma_device *core, bool enable) { @@ -310,8 +317,6 @@ void bcma_core_pci_up(struct bcma_bus *bus) pc = &bus->drv_pci[0]; - bcma_core_pci_power_save(pc, true); - bcma_core_pci_extend_L1timer(pc, true); } EXPORT_SYMBOL_GPL(bcma_core_pci_up); @@ -326,7 +331,5 @@ void bcma_core_pci_down(struct bcma_bus *bus) pc = &bus->drv_pci[0]; bcma_core_pci_extend_L1timer(pc, false); - - bcma_core_pci_power_save(pc, false); } EXPORT_SYMBOL_GPL(bcma_core_pci_down); diff --git a/drivers/bluetooth/ath3k.c b/drivers/bluetooth/ath3k.c index a12b923bbaca..0a327f4154a2 100644 --- a/drivers/bluetooth/ath3k.c +++ b/drivers/bluetooth/ath3k.c @@ -85,6 +85,7 @@ static struct usb_device_id ath3k_table[] = { { USB_DEVICE(0x04CA, 0x3008) }, { USB_DEVICE(0x13d3, 0x3362) }, { USB_DEVICE(0x0CF3, 0xE004) }, + { USB_DEVICE(0x0CF3, 0xE005) }, { USB_DEVICE(0x0930, 0x0219) }, { USB_DEVICE(0x0489, 0xe057) }, { USB_DEVICE(0x13d3, 0x3393) }, @@ -126,6 +127,7 @@ static struct usb_device_id ath3k_blist_tbl[] = { { USB_DEVICE(0x04ca, 0x3008), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x13d3, 0x3362), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0xe004), .driver_info = BTUSB_ATH3012 }, + { USB_DEVICE(0x0cf3, 0xe005), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0930, 0x0219), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0489, 0xe057), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x13d3, 0x3393), .driver_info = BTUSB_ATH3012 }, diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index 8e16f0af6358..f3dfc0a88fdc 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -102,6 +102,7 @@ static struct usb_device_id btusb_table[] = { /* Broadcom BCM20702A0 */ { USB_DEVICE(0x0b05, 0x17b5) }, + { USB_DEVICE(0x0b05, 0x17cb) }, { USB_DEVICE(0x04ca, 0x2003) }, { USB_DEVICE(0x0489, 0xe042) }, { USB_DEVICE(0x413c, 0x8197) }, @@ -112,6 +113,9 @@ static struct usb_device_id btusb_table[] = { /*Broadcom devices with vendor specific id */ { USB_VENDOR_AND_INTERFACE_INFO(0x0a5c, 0xff, 0x01, 0x01) }, + /* Belkin F8065bf - Broadcom based */ + { USB_VENDOR_AND_INTERFACE_INFO(0x050d, 0xff, 0x01, 0x01) }, + { } /* Terminating entry */ }; @@ -148,6 +152,7 @@ static struct usb_device_id blacklist_table[] = { { USB_DEVICE(0x04ca, 0x3008), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x13d3, 0x3362), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0xe004), .driver_info = BTUSB_ATH3012 }, + { USB_DEVICE(0x0cf3, 0xe005), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0930, 0x0219), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0489, 0xe057), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x13d3, 0x3393), .driver_info = BTUSB_ATH3012 }, diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c index 19ab6ff53d59..2394e9753ef5 100644 --- a/drivers/bus/mvebu-mbus.c +++ b/drivers/bus/mvebu-mbus.c @@ -700,6 +700,7 @@ static int __init mvebu_mbus_common_init(struct mvebu_mbus_state *mbus, phys_addr_t sdramwins_phys_base, size_t sdramwins_size) { + struct device_node *np; int win; mbus->mbuswins_base = ioremap(mbuswins_phys_base, mbuswins_size); @@ -712,8 +713,11 @@ static int __init mvebu_mbus_common_init(struct mvebu_mbus_state *mbus, return -ENOMEM; } - if (of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric")) + np = of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric"); + if (np) { mbus->hw_io_coherency = 1; + of_node_put(np); + } for (win = 0; win < mbus->soc->num_wins; win++) mvebu_mbus_disable_window(mbus, win); @@ -861,11 +865,13 @@ static void __init mvebu_mbus_get_pcie_resources(struct device_node *np, int ret; /* - * These are optional, so we clear them and they'll - * be zero if they are missing from the DT. + * These are optional, so we make sure that resource_size(x) will + * return 0. */ memset(mem, 0, sizeof(struct resource)); + mem->end = -1; memset(io, 0, sizeof(struct resource)); + io->end = -1; ret = of_property_read_u32_array(np, "pcie-mem-aperture", reg, ARRAY_SIZE(reg)); if (!ret) { diff --git a/drivers/char/random.c b/drivers/char/random.c index 7737b5bd26af..7a744d391756 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -640,7 +640,7 @@ struct timer_rand_state { */ void add_device_randomness(const void *buf, unsigned int size) { - unsigned long time = get_cycles() ^ jiffies; + unsigned long time = random_get_entropy() ^ jiffies; mix_pool_bytes(&input_pool, buf, size, NULL); mix_pool_bytes(&input_pool, &time, sizeof(time), NULL); @@ -677,7 +677,7 @@ static void add_timer_randomness(struct timer_rand_state *state, unsigned num) goto out; sample.jiffies = jiffies; - sample.cycles = get_cycles(); + sample.cycles = random_get_entropy(); sample.num = num; mix_pool_bytes(&input_pool, &sample, sizeof(sample), NULL); @@ -744,7 +744,7 @@ void add_interrupt_randomness(int irq, int irq_flags) struct fast_pool *fast_pool = &__get_cpu_var(irq_randomness); struct pt_regs *regs = get_irq_regs(); unsigned long now = jiffies; - __u32 input[4], cycles = get_cycles(); + __u32 input[4], cycles = random_get_entropy(); input[0] = cycles ^ jiffies; input[1] = irq; @@ -1459,12 +1459,11 @@ struct ctl_table random_table[] = { static u32 random_int_secret[MD5_MESSAGE_BYTES / 4] ____cacheline_aligned; -static int __init random_int_secret_init(void) +int random_int_secret_init(void) { get_random_bytes(random_int_secret, sizeof(random_int_secret)); return 0; } -late_initcall(random_int_secret_init); /* * Get a random word for internal kernel use only. Similar to urandom but @@ -1483,7 +1482,7 @@ unsigned int get_random_int(void) hash = get_cpu_var(get_random_int_hash); - hash[0] += current->pid + jiffies + get_cycles(); + hash[0] += current->pid + jiffies + random_get_entropy(); md5_transform(hash, random_int_secret); ret = hash[0]; put_cpu_var(get_random_int_hash); diff --git a/drivers/char/tpm/xen-tpmfront.c b/drivers/char/tpm/xen-tpmfront.c index 7a7929ba2658..94c280d36e8b 100644 --- a/drivers/char/tpm/xen-tpmfront.c +++ b/drivers/char/tpm/xen-tpmfront.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -142,32 +143,6 @@ static int vtpm_recv(struct tpm_chip *chip, u8 *buf, size_t count) return length; } -ssize_t tpm_show_locality(struct device *dev, struct device_attribute *attr, - char *buf) -{ - struct tpm_chip *chip = dev_get_drvdata(dev); - struct tpm_private *priv = TPM_VPRIV(chip); - u8 locality = priv->shr->locality; - - return sprintf(buf, "%d\n", locality); -} - -ssize_t tpm_store_locality(struct device *dev, struct device_attribute *attr, - const char *buf, size_t len) -{ - struct tpm_chip *chip = dev_get_drvdata(dev); - struct tpm_private *priv = TPM_VPRIV(chip); - u8 val; - - int rv = kstrtou8(buf, 0, &val); - if (rv) - return rv; - - priv->shr->locality = val; - - return len; -} - static const struct file_operations vtpm_ops = { .owner = THIS_MODULE, .llseek = no_llseek, @@ -188,8 +163,6 @@ static DEVICE_ATTR(caps, S_IRUGO, tpm_show_caps, NULL); static DEVICE_ATTR(cancel, S_IWUSR | S_IWGRP, NULL, tpm_store_cancel); static DEVICE_ATTR(durations, S_IRUGO, tpm_show_durations, NULL); static DEVICE_ATTR(timeouts, S_IRUGO, tpm_show_timeouts, NULL); -static DEVICE_ATTR(locality, S_IRUGO | S_IWUSR, tpm_show_locality, - tpm_store_locality); static struct attribute *vtpm_attrs[] = { &dev_attr_pubek.attr, @@ -202,7 +175,6 @@ static struct attribute *vtpm_attrs[] = { &dev_attr_cancel.attr, &dev_attr_durations.attr, &dev_attr_timeouts.attr, - &dev_attr_locality.attr, NULL, }; @@ -210,8 +182,6 @@ static struct attribute_group vtpm_attr_grp = { .attrs = vtpm_attrs, }; -#define TPM_LONG_TIMEOUT (10 * 60 * HZ) - static const struct tpm_vendor_specific tpm_vtpm = { .status = vtpm_status, .recv = vtpm_recv, @@ -224,11 +194,6 @@ static const struct tpm_vendor_specific tpm_vtpm = { .miscdev = { .fops = &vtpm_ops, }, - .duration = { - TPM_LONG_TIMEOUT, - TPM_LONG_TIMEOUT, - TPM_LONG_TIMEOUT, - }, }; static irqreturn_t tpmif_interrupt(int dummy, void *dev_id) diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig index 41c69469ce20..5e940f839a2d 100644 --- a/drivers/clocksource/Kconfig +++ b/drivers/clocksource/Kconfig @@ -26,6 +26,7 @@ config DW_APB_TIMER_OF config ARMADA_370_XP_TIMER bool + select CLKSRC_OF config ORION_TIMER select CLKSRC_OF @@ -74,6 +75,21 @@ config ARM_ARCH_TIMER bool select CLKSRC_OF if OF +config ARM_ARCH_TIMER_EVTSTREAM + bool "Support for ARM architected timer event stream generation" + default y if ARM_ARCH_TIMER + help + This option enables support for event stream generation based on + the ARM architected timer. It is used for waking up CPUs executing + the wfe instruction at a frequency represented as a power-of-2 + divisor of the clock rate. + The main use of the event stream is wfe-based timeouts of userspace + locking implementations. It might also be useful for imposing timeout + on wfe to safeguard against any programming errors in case an expected + event is not generated. + This must be disabled for hardware validation purposes to detect any + hardware anomalies of missing events. + config ARM_GLOBAL_TIMER bool select CLKSRC_OF if OF diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c index fbd9ccd5e114..95fb944e15ee 100644 --- a/drivers/clocksource/arm_arch_timer.c +++ b/drivers/clocksource/arm_arch_timer.c @@ -13,12 +13,14 @@ #include #include #include +#include #include #include #include #include #include #include +#include #include #include @@ -294,6 +296,19 @@ static void __arch_timer_setup(unsigned type, clockevents_config_and_register(clk, arch_timer_rate, 0xf, 0x7fffffff); } +static void arch_timer_configure_evtstream(void) +{ + int evt_stream_div, pos; + + /* Find the closest power of two to the divisor */ + evt_stream_div = arch_timer_rate / ARCH_TIMER_EVT_STREAM_FREQ; + pos = fls(evt_stream_div); + if (pos > 1 && !(evt_stream_div & (1 << (pos - 2)))) + pos--; + /* enable event stream */ + arch_timer_evtstrm_enable(min(pos, 15)); +} + static int arch_timer_setup(struct clock_event_device *clk) { __arch_timer_setup(ARCH_CP15_TIMER, clk); @@ -307,6 +322,8 @@ static int arch_timer_setup(struct clock_event_device *clk) } arch_counter_set_user_access(); + if (IS_ENABLED(CONFIG_ARM_ARCH_TIMER_EVTSTREAM)) + arch_timer_configure_evtstream(); return 0; } @@ -389,7 +406,7 @@ static struct clocksource clocksource_counter = { .rating = 400, .read = arch_counter_read, .mask = CLOCKSOURCE_MASK(56), - .flags = CLOCK_SOURCE_IS_CONTINUOUS, + .flags = CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_SUSPEND_NONSTOP, }; static struct cyclecounter cyclecounter = { @@ -419,6 +436,9 @@ static void __init arch_counter_register(unsigned type) cyclecounter.mult = clocksource_counter.mult; cyclecounter.shift = clocksource_counter.shift; timecounter_init(&timecounter, &cyclecounter, start_count); + + /* 56 bits minimum, so we assume worst case rollover */ + sched_clock_register(arch_timer_read_counter, 56, arch_timer_rate); } static void arch_timer_stop(struct clock_event_device *clk) @@ -460,6 +480,33 @@ static struct notifier_block arch_timer_cpu_nb = { .notifier_call = arch_timer_cpu_notify, }; +#ifdef CONFIG_CPU_PM +static unsigned int saved_cntkctl; +static int arch_timer_cpu_pm_notify(struct notifier_block *self, + unsigned long action, void *hcpu) +{ + if (action == CPU_PM_ENTER) + saved_cntkctl = arch_timer_get_cntkctl(); + else if (action == CPU_PM_ENTER_FAILED || action == CPU_PM_EXIT) + arch_timer_set_cntkctl(saved_cntkctl); + return NOTIFY_OK; +} + +static struct notifier_block arch_timer_cpu_pm_notifier = { + .notifier_call = arch_timer_cpu_pm_notify, +}; + +static int __init arch_timer_cpu_pm_init(void) +{ + return cpu_pm_register_notifier(&arch_timer_cpu_pm_notifier); +} +#else +static int __init arch_timer_cpu_pm_init(void) +{ + return 0; +} +#endif + static int __init arch_timer_register(void) { int err; @@ -499,11 +546,17 @@ static int __init arch_timer_register(void) if (err) goto out_free_irq; + err = arch_timer_cpu_pm_init(); + if (err) + goto out_unreg_notify; + /* Immediately configure the timer on the boot CPU */ arch_timer_setup(this_cpu_ptr(arch_timer_evt)); return 0; +out_unreg_notify: + unregister_cpu_notifier(&arch_timer_cpu_nb); out_free_irq: if (arch_timer_use_virtual) free_percpu_irq(arch_timer_ppi[VIRT_PPI], arch_timer_evt); diff --git a/drivers/clocksource/arm_global_timer.c b/drivers/clocksource/arm_global_timer.c index b66c1f36066c..c639b1a9e996 100644 --- a/drivers/clocksource/arm_global_timer.c +++ b/drivers/clocksource/arm_global_timer.c @@ -169,7 +169,8 @@ static int gt_clockevents_init(struct clock_event_device *clk) int cpu = smp_processor_id(); clk->name = "arm_global_timer"; - clk->features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT; + clk->features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT | + CLOCK_EVT_FEAT_PERCPU; clk->set_mode = gt_clockevent_set_mode; clk->set_next_event = gt_clockevent_set_next_event; clk->cpumask = cpumask_of(cpu); diff --git a/drivers/clocksource/bcm2835_timer.c b/drivers/clocksource/bcm2835_timer.c index 07ea7ce900dc..26ed331b1aad 100644 --- a/drivers/clocksource/bcm2835_timer.c +++ b/drivers/clocksource/bcm2835_timer.c @@ -49,7 +49,7 @@ struct bcm2835_timer { static void __iomem *system_clock __read_mostly; -static u32 notrace bcm2835_sched_read(void) +static u64 notrace bcm2835_sched_read(void) { return readl_relaxed(system_clock); } @@ -110,7 +110,7 @@ static void __init bcm2835_timer_init(struct device_node *node) panic("Can't read clock-frequency"); system_clock = base + REG_COUNTER_LO; - setup_sched_clock(bcm2835_sched_read, 32, freq); + sched_clock_register(bcm2835_sched_read, 32, freq); clocksource_mmio_init(base + REG_COUNTER_LO, node->name, freq, 300, 32, clocksource_mmio_readl_up); diff --git a/drivers/clocksource/clksrc-dbx500-prcmu.c b/drivers/clocksource/clksrc-dbx500-prcmu.c index a9fd4ad25674..b375106844d8 100644 --- a/drivers/clocksource/clksrc-dbx500-prcmu.c +++ b/drivers/clocksource/clksrc-dbx500-prcmu.c @@ -53,7 +53,7 @@ static struct clocksource clocksource_dbx500_prcmu = { #ifdef CONFIG_CLKSRC_DBX500_PRCMU_SCHED_CLOCK -static u32 notrace dbx500_prcmu_sched_clock_read(void) +static u64 notrace dbx500_prcmu_sched_clock_read(void) { if (unlikely(!clksrc_dbx500_timer_base)) return 0; @@ -81,8 +81,7 @@ void __init clksrc_dbx500_prcmu_init(void __iomem *base) clksrc_dbx500_timer_base + PRCMU_TIMER_REF); } #ifdef CONFIG_CLKSRC_DBX500_PRCMU_SCHED_CLOCK - setup_sched_clock(dbx500_prcmu_sched_clock_read, - 32, RATE_32K); + sched_clock_register(dbx500_prcmu_sched_clock_read, 32, RATE_32K); #endif clocksource_register_hz(&clocksource_dbx500_prcmu, RATE_32K); } diff --git a/drivers/clocksource/clksrc-of.c b/drivers/clocksource/clksrc-of.c index 37f5325bec95..35639cf4e5a2 100644 --- a/drivers/clocksource/clksrc-of.c +++ b/drivers/clocksource/clksrc-of.c @@ -30,7 +30,11 @@ void __init clocksource_of_init(void) clocksource_of_init_fn init_func; for_each_matching_node_and_match(np, __clksrc_of_table, &match) { + if (!of_device_is_available(np)) + continue; + init_func = match->data; init_func(np); + of_node_put(np); } } diff --git a/drivers/clocksource/dw_apb_timer_of.c b/drivers/clocksource/dw_apb_timer_of.c index 4cbae4f762b1..45ba8aecc729 100644 --- a/drivers/clocksource/dw_apb_timer_of.c +++ b/drivers/clocksource/dw_apb_timer_of.c @@ -23,7 +23,7 @@ #include #include -static void timer_get_base_and_rate(struct device_node *np, +static void __init timer_get_base_and_rate(struct device_node *np, void __iomem **base, u32 *rate) { struct clk *timer_clk; @@ -55,11 +55,11 @@ static void timer_get_base_and_rate(struct device_node *np, try_clock_freq: if (of_property_read_u32(np, "clock-freq", rate) && - of_property_read_u32(np, "clock-frequency", rate)) + of_property_read_u32(np, "clock-frequency", rate)) panic("No clock nor clock-frequency property for %s", np->name); } -static void add_clockevent(struct device_node *event_timer) +static void __init add_clockevent(struct device_node *event_timer) { void __iomem *iobase; struct dw_apb_clock_event_device *ced; @@ -82,7 +82,7 @@ static void add_clockevent(struct device_node *event_timer) static void __iomem *sched_io_base; static u32 sched_rate; -static void add_clocksource(struct device_node *source_timer) +static void __init add_clocksource(struct device_node *source_timer) { void __iomem *iobase; struct dw_apb_clocksource *cs; @@ -106,7 +106,7 @@ static void add_clocksource(struct device_node *source_timer) sched_rate = rate; } -static u32 read_sched_clock(void) +static u64 read_sched_clock(void) { return __raw_readl(sched_io_base); } @@ -117,7 +117,7 @@ static const struct of_device_id sptimer_ids[] __initconst = { { /* Sentinel */ }, }; -static void init_sched_clock(void) +static void __init init_sched_clock(void) { struct device_node *sched_timer; @@ -128,7 +128,7 @@ static void init_sched_clock(void) of_node_put(sched_timer); } - setup_sched_clock(read_sched_clock, 32, sched_rate); + sched_clock_register(read_sched_clock, 32, sched_rate); } static int num_called; @@ -138,12 +138,10 @@ static void __init dw_apb_timer_init(struct device_node *timer) case 0: pr_debug("%s: found clockevent timer\n", __func__); add_clockevent(timer); - of_node_put(timer); break; case 1: pr_debug("%s: found clocksource timer\n", __func__); add_clocksource(timer); - of_node_put(timer); init_sched_clock(); break; default: diff --git a/drivers/clocksource/em_sti.c b/drivers/clocksource/em_sti.c index b9c81b7c3a3b..3a5909c12d42 100644 --- a/drivers/clocksource/em_sti.c +++ b/drivers/clocksource/em_sti.c @@ -301,7 +301,7 @@ static void em_sti_register_clockevent(struct em_sti_priv *p) ced->name = dev_name(&p->pdev->dev); ced->features = CLOCK_EVT_FEAT_ONESHOT; ced->rating = 200; - ced->cpumask = cpumask_of(0); + ced->cpumask = cpu_possible_mask; ced->set_next_event = em_sti_clock_event_next; ced->set_mode = em_sti_clock_event_mode; diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c index 5b34768f4d7c..62b0de6a1837 100644 --- a/drivers/clocksource/exynos_mct.c +++ b/drivers/clocksource/exynos_mct.c @@ -428,7 +428,6 @@ static int exynos4_local_timer_setup(struct clock_event_device *evt) evt->irq); return -EIO; } - irq_set_affinity(evt->irq, cpumask_of(cpu)); } else { enable_percpu_irq(mct_irqs[MCT_L0_IRQ], 0); } @@ -449,6 +448,7 @@ static int exynos4_mct_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu) { struct mct_clock_event_device *mevt; + unsigned int cpu; /* * Grab cpu pointer in each case to avoid spurious @@ -459,6 +459,12 @@ static int exynos4_mct_cpu_notify(struct notifier_block *self, mevt = this_cpu_ptr(&percpu_mct_tick); exynos4_local_timer_setup(&mevt->evt); break; + case CPU_ONLINE: + cpu = (unsigned long)hcpu; + if (mct_int_type == MCT_INT_SPI) + irq_set_affinity(mct_irqs[MCT_L0_IRQ + cpu], + cpumask_of(cpu)); + break; case CPU_DYING: mevt = this_cpu_ptr(&percpu_mct_tick); exynos4_local_timer_stop(&mevt->evt); @@ -500,6 +506,8 @@ static void __init exynos4_timer_resources(struct device_node *np, void __iomem &percpu_mct_tick); WARN(err, "MCT: can't request IRQ %d (%d)\n", mct_irqs[MCT_L0_IRQ], err); + } else { + irq_set_affinity(mct_irqs[MCT_L0_IRQ], cpumask_of(0)); } err = register_cpu_notifier(&exynos4_mct_cpu_nb); diff --git a/drivers/clocksource/mxs_timer.c b/drivers/clocksource/mxs_timer.c index 0f5e65f74dc3..445b68a01dc5 100644 --- a/drivers/clocksource/mxs_timer.c +++ b/drivers/clocksource/mxs_timer.c @@ -222,7 +222,7 @@ static struct clocksource clocksource_mxs = { .flags = CLOCK_SOURCE_IS_CONTINUOUS, }; -static u32 notrace mxs_read_sched_clock_v2(void) +static u64 notrace mxs_read_sched_clock_v2(void) { return ~readl_relaxed(mxs_timrot_base + HW_TIMROT_RUNNING_COUNTn(1)); } @@ -236,7 +236,7 @@ static int __init mxs_clocksource_init(struct clk *timer_clk) else { clocksource_mmio_init(mxs_timrot_base + HW_TIMROT_RUNNING_COUNTn(1), "mxs_timer", c, 200, 32, clocksource_mmio_readl_down); - setup_sched_clock(mxs_read_sched_clock_v2, 32, c); + sched_clock_register(mxs_read_sched_clock_v2, 32, c); } return 0; diff --git a/drivers/clocksource/nomadik-mtu.c b/drivers/clocksource/nomadik-mtu.c index 1b74bea12385..ed7b73b508e0 100644 --- a/drivers/clocksource/nomadik-mtu.c +++ b/drivers/clocksource/nomadik-mtu.c @@ -76,7 +76,7 @@ static struct delay_timer mtu_delay_timer; * local implementation which uses the clocksource to get some * better resolution when scheduling the kernel. */ -static u32 notrace nomadik_read_sched_clock(void) +static u64 notrace nomadik_read_sched_clock(void) { if (unlikely(!mtu_base)) return 0; @@ -231,7 +231,7 @@ static void __init __nmdk_timer_init(void __iomem *base, int irq, "mtu_0"); #ifdef CONFIG_CLKSRC_NOMADIK_MTU_SCHED_CLOCK - setup_sched_clock(nomadik_read_sched_clock, 32, rate); + sched_clock_register(nomadik_read_sched_clock, 32, rate); #endif /* Timer 1 is used for events, register irq and clockevents */ diff --git a/drivers/clocksource/samsung_pwm_timer.c b/drivers/clocksource/samsung_pwm_timer.c index ab29476ee5f9..85082e8d3052 100644 --- a/drivers/clocksource/samsung_pwm_timer.c +++ b/drivers/clocksource/samsung_pwm_timer.c @@ -331,7 +331,7 @@ static struct clocksource samsung_clocksource = { * this wraps around for now, since it is just a relative time * stamp. (Inspired by U300 implementation.) */ -static u32 notrace samsung_read_sched_clock(void) +static u64 notrace samsung_read_sched_clock(void) { return samsung_clocksource_read(NULL); } @@ -357,7 +357,7 @@ static void __init samsung_clocksource_init(void) else pwm.source_reg = pwm.base + pwm.source_id * 0x0c + 0x14; - setup_sched_clock(samsung_read_sched_clock, + sched_clock_register(samsung_read_sched_clock, pwm.variant.bits, clock_rate); samsung_clocksource.mask = CLOCKSOURCE_MASK(pwm.variant.bits); diff --git a/drivers/clocksource/tcb_clksrc.c b/drivers/clocksource/tcb_clksrc.c index 8a6187225dd0..00fdd1170284 100644 --- a/drivers/clocksource/tcb_clksrc.c +++ b/drivers/clocksource/tcb_clksrc.c @@ -100,7 +100,7 @@ static void tc_mode(enum clock_event_mode m, struct clock_event_device *d) || tcd->clkevt.mode == CLOCK_EVT_MODE_ONESHOT) { __raw_writel(0xff, regs + ATMEL_TC_REG(2, IDR)); __raw_writel(ATMEL_TC_CLKDIS, regs + ATMEL_TC_REG(2, CCR)); - clk_disable(tcd->clk); + clk_disable_unprepare(tcd->clk); } switch (m) { @@ -109,7 +109,7 @@ static void tc_mode(enum clock_event_mode m, struct clock_event_device *d) * of oneshot, we get lower overhead and improved accuracy. */ case CLOCK_EVT_MODE_PERIODIC: - clk_enable(tcd->clk); + clk_prepare_enable(tcd->clk); /* slow clock, count up to RC, then irq and restart */ __raw_writel(timer_clock @@ -126,7 +126,7 @@ static void tc_mode(enum clock_event_mode m, struct clock_event_device *d) break; case CLOCK_EVT_MODE_ONESHOT: - clk_enable(tcd->clk); + clk_prepare_enable(tcd->clk); /* slow clock, count up to RC, then irq and stop */ __raw_writel(timer_clock | ATMEL_TC_CPCSTOP @@ -180,15 +180,22 @@ static irqreturn_t ch2_irq(int irq, void *handle) static struct irqaction tc_irqaction = { .name = "tc_clkevt", - .flags = IRQF_TIMER | IRQF_DISABLED, + .flags = IRQF_TIMER, .handler = ch2_irq, }; -static void __init setup_clkevents(struct atmel_tc *tc, int clk32k_divisor_idx) +static int __init setup_clkevents(struct atmel_tc *tc, int clk32k_divisor_idx) { + int ret; struct clk *t2_clk = tc->clk[2]; int irq = tc->irq[2]; + /* try to enable t2 clk to avoid future errors in mode change */ + ret = clk_prepare_enable(t2_clk); + if (ret) + return ret; + clk_disable_unprepare(t2_clk); + clkevt.regs = tc->regs; clkevt.clk = t2_clk; tc_irqaction.dev_id = &clkevt; @@ -197,16 +204,21 @@ static void __init setup_clkevents(struct atmel_tc *tc, int clk32k_divisor_idx) clkevt.clkevt.cpumask = cpumask_of(0); + ret = setup_irq(irq, &tc_irqaction); + if (ret) + return ret; + clockevents_config_and_register(&clkevt.clkevt, 32768, 1, 0xffff); - setup_irq(irq, &tc_irqaction); + return ret; } #else /* !CONFIG_GENERIC_CLOCKEVENTS */ -static void __init setup_clkevents(struct atmel_tc *tc, int clk32k_divisor_idx) +static int __init setup_clkevents(struct atmel_tc *tc, int clk32k_divisor_idx) { /* NOTHING */ + return 0; } #endif @@ -265,6 +277,7 @@ static int __init tcb_clksrc_init(void) int best_divisor_idx = -1; int clk32k_divisor_idx = -1; int i; + int ret; tc = atmel_tc_alloc(CONFIG_ATMEL_TCB_CLKSRC_BLOCK, clksrc.name); if (!tc) { @@ -275,7 +288,11 @@ static int __init tcb_clksrc_init(void) pdev = tc->pdev; t0_clk = tc->clk[0]; - clk_enable(t0_clk); + ret = clk_prepare_enable(t0_clk); + if (ret) { + pr_debug("can't enable T0 clk\n"); + goto err_free_tc; + } /* How fast will we be counting? Pick something over 5 MHz. */ rate = (u32) clk_get_rate(t0_clk); @@ -313,17 +330,39 @@ static int __init tcb_clksrc_init(void) /* tclib will give us three clocks no matter what the * underlying platform supports. */ - clk_enable(tc->clk[1]); + ret = clk_prepare_enable(tc->clk[1]); + if (ret) { + pr_debug("can't enable T1 clk\n"); + goto err_disable_t0; + } /* setup both channel 0 & 1 */ tcb_setup_dual_chan(tc, best_divisor_idx); } /* and away we go! */ - clocksource_register_hz(&clksrc, divided_rate); + ret = clocksource_register_hz(&clksrc, divided_rate); + if (ret) + goto err_disable_t1; /* channel 2: periodic and oneshot timer support */ - setup_clkevents(tc, clk32k_divisor_idx); + ret = setup_clkevents(tc, clk32k_divisor_idx); + if (ret) + goto err_unregister_clksrc; return 0; + +err_unregister_clksrc: + clocksource_unregister(&clksrc); + +err_disable_t1: + if (!tc->tcb_config || tc->tcb_config->counter_width != 32) + clk_disable_unprepare(tc->clk[1]); + +err_disable_t0: + clk_disable_unprepare(t0_clk); + +err_free_tc: + atmel_tc_free(tc); + return ret; } arch_initcall(tcb_clksrc_init); diff --git a/drivers/clocksource/tegra20_timer.c b/drivers/clocksource/tegra20_timer.c index 93961703b887..642849256d82 100644 --- a/drivers/clocksource/tegra20_timer.c +++ b/drivers/clocksource/tegra20_timer.c @@ -98,7 +98,7 @@ static struct clock_event_device tegra_clockevent = { .set_mode = tegra_timer_set_mode, }; -static u32 notrace tegra_read_sched_clock(void) +static u64 notrace tegra_read_sched_clock(void) { return timer_readl(TIMERUS_CNTR_1US); } @@ -181,8 +181,6 @@ static void __init tegra20_init_timer(struct device_node *np) rate = clk_get_rate(clk); } - of_node_put(np); - switch (rate) { case 12000000: timer_writel(0x000b, TIMERUS_USEC_CFG); @@ -200,7 +198,7 @@ static void __init tegra20_init_timer(struct device_node *np) WARN(1, "Unknown clock rate"); } - setup_sched_clock(tegra_read_sched_clock, 32, 1000000); + sched_clock_register(tegra_read_sched_clock, 32, 1000000); if (clocksource_mmio_init(timer_reg_base + TIMERUS_CNTR_1US, "timer_us", 1000000, 300, 32, clocksource_mmio_readl_up)) { @@ -241,8 +239,6 @@ static void __init tegra20_init_rtc(struct device_node *np) else clk_prepare_enable(clk); - of_node_put(np); - register_persistent_clock(NULL, tegra_read_persistent_clock); } CLOCKSOURCE_OF_DECLARE(tegra20_rtc, "nvidia,tegra20-rtc", tegra20_init_rtc); diff --git a/drivers/clocksource/time-armada-370-xp.c b/drivers/clocksource/time-armada-370-xp.c index 0198504ef6b0..d8e47e502785 100644 --- a/drivers/clocksource/time-armada-370-xp.c +++ b/drivers/clocksource/time-armada-370-xp.c @@ -96,7 +96,7 @@ static void local_timer_ctrl_clrset(u32 clr, u32 set) local_base + TIMER_CTRL_OFF); } -static u32 notrace armada_370_xp_read_sched_clock(void) +static u64 notrace armada_370_xp_read_sched_clock(void) { return ~readl(timer_base + TIMER0_VAL_OFF); } @@ -258,7 +258,7 @@ static void __init armada_370_xp_timer_common_init(struct device_node *np) /* * Set scale and timer for sched_clock. */ - setup_sched_clock(armada_370_xp_read_sched_clock, 32, timer_clk); + sched_clock_register(armada_370_xp_read_sched_clock, 32, timer_clk); /* * Setup free-running clocksource timer (interrupts diff --git a/drivers/clocksource/timer-prima2.c b/drivers/clocksource/timer-prima2.c index ef3cfb269d8b..8a492d34ff9f 100644 --- a/drivers/clocksource/timer-prima2.c +++ b/drivers/clocksource/timer-prima2.c @@ -165,9 +165,9 @@ static struct irqaction sirfsoc_timer_irq = { }; /* Overwrite weak default sched_clock with more precise one */ -static u32 notrace sirfsoc_read_sched_clock(void) +static u64 notrace sirfsoc_read_sched_clock(void) { - return (u32)(sirfsoc_timer_read(NULL) & 0xffffffff); + return sirfsoc_timer_read(NULL); } static void __init sirfsoc_clockevent_init(void) @@ -206,7 +206,7 @@ static void __init sirfsoc_prima2_timer_init(struct device_node *np) BUG_ON(clocksource_register_hz(&sirfsoc_clocksource, CLOCK_TICK_RATE)); - setup_sched_clock(sirfsoc_read_sched_clock, 32, CLOCK_TICK_RATE); + sched_clock_register(sirfsoc_read_sched_clock, 64, CLOCK_TICK_RATE); BUG_ON(setup_irq(sirfsoc_timer_irq.irq, &sirfsoc_timer_irq)); diff --git a/drivers/clocksource/vf_pit_timer.c b/drivers/clocksource/vf_pit_timer.c index 587e0202a70b..02821b06a39e 100644 --- a/drivers/clocksource/vf_pit_timer.c +++ b/drivers/clocksource/vf_pit_timer.c @@ -52,7 +52,7 @@ static inline void pit_irq_acknowledge(void) __raw_writel(PITTFLG_TIF, clkevt_base + PITTFLG); } -static unsigned int pit_read_sched_clock(void) +static u64 pit_read_sched_clock(void) { return __raw_readl(clksrc_base + PITCVAL); } @@ -64,7 +64,7 @@ static int __init pit_clocksource_init(unsigned long rate) __raw_writel(~0UL, clksrc_base + PITLDVAL); __raw_writel(PITTCTRL_TEN, clksrc_base + PITTCTRL); - setup_sched_clock(pit_read_sched_clock, 32, rate); + sched_clock_register(pit_read_sched_clock, 32, rate); return clocksource_mmio_init(clksrc_base + PITCVAL, "vf-pit", rate, 300, 32, clocksource_mmio_readl_down); } diff --git a/drivers/clocksource/vt8500_timer.c b/drivers/clocksource/vt8500_timer.c index 64f553f04fa4..ad3c0e83a779 100644 --- a/drivers/clocksource/vt8500_timer.c +++ b/drivers/clocksource/vt8500_timer.c @@ -137,14 +137,12 @@ static void __init vt8500_timer_init(struct device_node *np) if (!regbase) { pr_err("%s: Missing iobase description in Device Tree\n", __func__); - of_node_put(np); return; } timer_irq = irq_of_parse_and_map(np, 0); if (!timer_irq) { pr_err("%s: Missing irq description in Device Tree\n", __func__); - of_node_put(np); return; } diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c index a1260b4549db..d2c3253e015e 100644 --- a/drivers/cpufreq/acpi-cpufreq.c +++ b/drivers/cpufreq/acpi-cpufreq.c @@ -986,6 +986,10 @@ static int __init acpi_cpufreq_init(void) { int ret; + /* don't keep reloading if cpufreq_driver exists */ + if (cpufreq_get_current_driver()) + return 0; + if (acpi_disabled) return 0; diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cpu0.c index 78c49d8e0f4a..c522a95c0e16 100644 --- a/drivers/cpufreq/cpufreq-cpu0.c +++ b/drivers/cpufreq/cpufreq-cpu0.c @@ -229,7 +229,7 @@ static int cpu0_cpufreq_probe(struct platform_device *pdev) if (of_property_read_u32(np, "clock-latency", &transition_latency)) transition_latency = CPUFREQ_ETERNAL; - if (cpu_reg) { + if (!IS_ERR(cpu_reg)) { struct opp *opp; unsigned long min_uV, max_uV; int i; diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 89b3c52cd5c3..04548f7023af 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1460,6 +1460,9 @@ unsigned int cpufreq_get(unsigned int cpu) { unsigned int ret_freq = 0; + if (cpufreq_disabled() || !cpufreq_driver) + return -ENOENT; + if (!down_read_trylock(&cpufreq_rwsem)) return 0; diff --git a/drivers/cpufreq/exynos5440-cpufreq.c b/drivers/cpufreq/exynos5440-cpufreq.c index d514c152fd1a..be5380ecdcd4 100644 --- a/drivers/cpufreq/exynos5440-cpufreq.c +++ b/drivers/cpufreq/exynos5440-cpufreq.c @@ -457,7 +457,7 @@ err_free_table: opp_free_cpufreq_table(dvfs_info->dev, &dvfs_info->freq_table); err_put_node: of_node_put(np); - dev_err(dvfs_info->dev, "%s: failed initialization\n", __func__); + dev_err(&pdev->dev, "%s: failed initialization\n", __func__); return ret; } diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 9733f29ed148..badf6206b2b2 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -383,6 +383,7 @@ static void intel_pstate_get_min_max(struct cpudata *cpu, int *min, int *max) static void intel_pstate_set_pstate(struct cpudata *cpu, int pstate) { int max_perf, min_perf; + u64 val; intel_pstate_get_min_max(cpu, &min_perf, &max_perf); @@ -394,8 +395,11 @@ static void intel_pstate_set_pstate(struct cpudata *cpu, int pstate) trace_cpu_frequency(pstate * 100000, cpu->cpu); cpu->pstate.current_pstate = pstate; - wrmsrl(MSR_IA32_PERF_CTL, pstate << 8); + val = pstate << 8; + if (limits.no_turbo) + val |= (u64)1 << 32; + wrmsrl(MSR_IA32_PERF_CTL, val); } static inline void intel_pstate_pstate_increase(struct cpudata *cpu, int steps) @@ -634,8 +638,8 @@ static int intel_pstate_cpu_exit(struct cpufreq_policy *policy) static int intel_pstate_cpu_init(struct cpufreq_policy *policy) { - int rc, min_pstate, max_pstate; struct cpudata *cpu; + int rc; rc = intel_pstate_init_cpu(policy->cpu); if (rc) @@ -649,9 +653,8 @@ static int intel_pstate_cpu_init(struct cpufreq_policy *policy) else policy->policy = CPUFREQ_POLICY_POWERSAVE; - intel_pstate_get_min_max(cpu, &min_pstate, &max_pstate); - policy->min = min_pstate * 100000; - policy->max = max_pstate * 100000; + policy->min = cpu->pstate.min_pstate * 100000; + policy->max = cpu->pstate.turbo_pstate * 100000; /* cpuinfo and default policy values */ policy->cpuinfo.min_freq = cpu->pstate.min_pstate * 100000; diff --git a/drivers/cpufreq/s3c64xx-cpufreq.c b/drivers/cpufreq/s3c64xx-cpufreq.c index 8a72b0c555f8..15631f92ab7d 100644 --- a/drivers/cpufreq/s3c64xx-cpufreq.c +++ b/drivers/cpufreq/s3c64xx-cpufreq.c @@ -166,7 +166,7 @@ static void __init s3c64xx_cpufreq_config_regulator(void) if (freq->frequency == CPUFREQ_ENTRY_INVALID) continue; - dvfs = &s3c64xx_dvfs_table[freq->index]; + dvfs = &s3c64xx_dvfs_table[freq->driver_data]; found = 0; for (i = 0; i < count; i++) { diff --git a/drivers/cpufreq/spear-cpufreq.c b/drivers/cpufreq/spear-cpufreq.c index 19e364fa5955..3f418166ce02 100644 --- a/drivers/cpufreq/spear-cpufreq.c +++ b/drivers/cpufreq/spear-cpufreq.c @@ -113,7 +113,7 @@ static int spear_cpufreq_target(struct cpufreq_policy *policy, unsigned int target_freq, unsigned int relation) { struct cpufreq_freqs freqs; - unsigned long newfreq; + long newfreq; struct clk *srcclk; int index, ret, mult = 1; diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 526ec77c7ba0..f238cfd33847 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -198,6 +198,7 @@ config TI_EDMA depends on ARCH_DAVINCI || ARCH_OMAP select DMA_ENGINE select DMA_VIRTUAL_CHANNELS + select TI_PRIV_EDMA default n help Enable support for the TI EDMA controller. This DMA diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c index ff50ff4c6a57..3519111c566b 100644 --- a/drivers/dma/edma.c +++ b/drivers/dma/edma.c @@ -306,6 +306,7 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg( EDMA_SLOT_ANY); if (echan->slot[i] < 0) { dev_err(dev, "Failed to allocate slot\n"); + kfree(edesc); return NULL; } } @@ -749,6 +750,6 @@ static void __exit edma_exit(void) } module_exit(edma_exit); -MODULE_AUTHOR("Matt Porter "); +MODULE_AUTHOR("Matt Porter "); MODULE_DESCRIPTION("TI EDMA DMA engine driver"); MODULE_LICENSE("GPL v2"); diff --git a/drivers/dma/imx-dma.c b/drivers/dma/imx-dma.c index 78f8ca5fccee..55852c026791 100644 --- a/drivers/dma/imx-dma.c +++ b/drivers/dma/imx-dma.c @@ -437,17 +437,18 @@ static void dma_irq_handle_channel(struct imxdma_channel *imxdmac) struct imxdma_engine *imxdma = imxdmac->imxdma; int chno = imxdmac->channel; struct imxdma_desc *desc; + unsigned long flags; - spin_lock(&imxdma->lock); + spin_lock_irqsave(&imxdma->lock, flags); if (list_empty(&imxdmac->ld_active)) { - spin_unlock(&imxdma->lock); + spin_unlock_irqrestore(&imxdma->lock, flags); goto out; } desc = list_first_entry(&imxdmac->ld_active, struct imxdma_desc, node); - spin_unlock(&imxdma->lock); + spin_unlock_irqrestore(&imxdma->lock, flags); if (desc->sg) { u32 tmp; @@ -519,7 +520,6 @@ static int imxdma_xfer_desc(struct imxdma_desc *d) { struct imxdma_channel *imxdmac = to_imxdma_chan(d->desc.chan); struct imxdma_engine *imxdma = imxdmac->imxdma; - unsigned long flags; int slot = -1; int i; @@ -527,7 +527,6 @@ static int imxdma_xfer_desc(struct imxdma_desc *d) switch (d->type) { case IMXDMA_DESC_INTERLEAVED: /* Try to get a free 2D slot */ - spin_lock_irqsave(&imxdma->lock, flags); for (i = 0; i < IMX_DMA_2D_SLOTS; i++) { if ((imxdma->slots_2d[i].count > 0) && ((imxdma->slots_2d[i].xsr != d->x) || @@ -537,10 +536,8 @@ static int imxdma_xfer_desc(struct imxdma_desc *d) slot = i; break; } - if (slot < 0) { - spin_unlock_irqrestore(&imxdma->lock, flags); + if (slot < 0) return -EBUSY; - } imxdma->slots_2d[slot].xsr = d->x; imxdma->slots_2d[slot].ysr = d->y; @@ -549,7 +546,6 @@ static int imxdma_xfer_desc(struct imxdma_desc *d) imxdmac->slot_2d = slot; imxdmac->enabled_2d = true; - spin_unlock_irqrestore(&imxdma->lock, flags); if (slot == IMX_DMA_2D_SLOT_A) { d->config_mem &= ~CCR_MSEL_B; @@ -625,18 +621,17 @@ static void imxdma_tasklet(unsigned long data) struct imxdma_channel *imxdmac = (void *)data; struct imxdma_engine *imxdma = imxdmac->imxdma; struct imxdma_desc *desc; + unsigned long flags; - spin_lock(&imxdma->lock); + spin_lock_irqsave(&imxdma->lock, flags); if (list_empty(&imxdmac->ld_active)) { /* Someone might have called terminate all */ - goto out; + spin_unlock_irqrestore(&imxdma->lock, flags); + return; } desc = list_first_entry(&imxdmac->ld_active, struct imxdma_desc, node); - if (desc->desc.callback) - desc->desc.callback(desc->desc.callback_param); - /* If we are dealing with a cyclic descriptor, keep it on ld_active * and dont mark the descriptor as complete. * Only in non-cyclic cases it would be marked as complete @@ -663,7 +658,11 @@ static void imxdma_tasklet(unsigned long data) __func__, imxdmac->channel); } out: - spin_unlock(&imxdma->lock); + spin_unlock_irqrestore(&imxdma->lock, flags); + + if (desc->desc.callback) + desc->desc.callback(desc->desc.callback_param); + } static int imxdma_control(struct dma_chan *chan, enum dma_ctrl_cmd cmd, @@ -883,7 +882,7 @@ static struct dma_async_tx_descriptor *imxdma_prep_dma_cyclic( kfree(imxdmac->sg_list); imxdmac->sg_list = kcalloc(periods + 1, - sizeof(struct scatterlist), GFP_KERNEL); + sizeof(struct scatterlist), GFP_ATOMIC); if (!imxdmac->sg_list) return NULL; diff --git a/drivers/dma/sh/rcar-hpbdma.c b/drivers/dma/sh/rcar-hpbdma.c index 45a520281ce1..ebad84591a6e 100644 --- a/drivers/dma/sh/rcar-hpbdma.c +++ b/drivers/dma/sh/rcar-hpbdma.c @@ -93,6 +93,7 @@ struct hpb_dmae_chan { void __iomem *base; const struct hpb_dmae_slave_config *cfg; char dev_id[16]; /* unique name per DMAC of channel */ + dma_addr_t slave_addr; }; struct hpb_dmae_device { @@ -432,7 +433,6 @@ hpb_dmae_alloc_chan_resources(struct hpb_dmae_chan *hpb_chan, hpb_chan->xfer_mode = XFER_DOUBLE; } else { dev_err(hpb_chan->shdma_chan.dev, "DCR setting error"); - shdma_free_irq(&hpb_chan->shdma_chan); return -EINVAL; } @@ -446,7 +446,8 @@ hpb_dmae_alloc_chan_resources(struct hpb_dmae_chan *hpb_chan, return 0; } -static int hpb_dmae_set_slave(struct shdma_chan *schan, int slave_id, bool try) +static int hpb_dmae_set_slave(struct shdma_chan *schan, int slave_id, + dma_addr_t slave_addr, bool try) { struct hpb_dmae_chan *chan = to_chan(schan); const struct hpb_dmae_slave_config *sc = @@ -457,6 +458,7 @@ static int hpb_dmae_set_slave(struct shdma_chan *schan, int slave_id, bool try) if (try) return 0; chan->cfg = sc; + chan->slave_addr = slave_addr ? : sc->addr; return hpb_dmae_alloc_chan_resources(chan, sc); } @@ -468,7 +470,7 @@ static dma_addr_t hpb_dmae_slave_addr(struct shdma_chan *schan) { struct hpb_dmae_chan *chan = to_chan(schan); - return chan->cfg->addr; + return chan->slave_addr; } static struct shdma_desc *hpb_dmae_embedded_desc(void *buf, int i) @@ -614,7 +616,6 @@ static void hpb_dmae_chan_remove(struct hpb_dmae_device *hpbdev) shdma_for_each_chan(schan, &hpbdev->shdma_dev, i) { BUG_ON(!schan); - shdma_free_irq(schan); shdma_chan_remove(schan); } dma_dev->chancnt = 0; diff --git a/drivers/firmware/efi/efi-stub-helper.c b/drivers/firmware/efi/efi-stub-helper.c new file mode 100644 index 000000000000..b6bffbfd3be7 --- /dev/null +++ b/drivers/firmware/efi/efi-stub-helper.c @@ -0,0 +1,636 @@ +/* + * Helper functions used by the EFI stub on multiple + * architectures. This should be #included by the EFI stub + * implementation files. + * + * Copyright 2011 Intel Corporation; author Matt Fleming + * + * This file is part of the Linux kernel, and is made available + * under the terms of the GNU General Public License version 2. + * + */ +#define EFI_READ_CHUNK_SIZE (1024 * 1024) + +struct file_info { + efi_file_handle_t *handle; + u64 size; +}; + + + + +static void efi_char16_printk(efi_system_table_t *sys_table_arg, + efi_char16_t *str) +{ + struct efi_simple_text_output_protocol *out; + + out = (struct efi_simple_text_output_protocol *)sys_table_arg->con_out; + efi_call_phys2(out->output_string, out, str); +} + +static void efi_printk(efi_system_table_t *sys_table_arg, char *str) +{ + char *s8; + + for (s8 = str; *s8; s8++) { + efi_char16_t ch[2] = { 0 }; + + ch[0] = *s8; + if (*s8 == '\n') { + efi_char16_t nl[2] = { '\r', 0 }; + efi_char16_printk(sys_table_arg, nl); + } + + efi_char16_printk(sys_table_arg, ch); + } +} + + +static efi_status_t efi_get_memory_map(efi_system_table_t *sys_table_arg, + efi_memory_desc_t **map, + unsigned long *map_size, + unsigned long *desc_size, + u32 *desc_ver, + unsigned long *key_ptr) +{ + efi_memory_desc_t *m = NULL; + efi_status_t status; + unsigned long key; + u32 desc_version; + + *map_size = sizeof(*m) * 32; +again: + /* + * Add an additional efi_memory_desc_t because we're doing an + * allocation which may be in a new descriptor region. + */ + *map_size += sizeof(*m); + status = efi_call_phys3(sys_table_arg->boottime->allocate_pool, + EFI_LOADER_DATA, *map_size, (void **)&m); + if (status != EFI_SUCCESS) + goto fail; + + status = efi_call_phys5(sys_table_arg->boottime->get_memory_map, + map_size, m, &key, desc_size, &desc_version); + if (status == EFI_BUFFER_TOO_SMALL) { + efi_call_phys1(sys_table_arg->boottime->free_pool, m); + goto again; + } + + if (status != EFI_SUCCESS) + efi_call_phys1(sys_table_arg->boottime->free_pool, m); + if (key_ptr && status == EFI_SUCCESS) + *key_ptr = key; + if (desc_ver && status == EFI_SUCCESS) + *desc_ver = desc_version; + +fail: + *map = m; + return status; +} + +/* + * Allocate at the highest possible address that is not above 'max'. + */ +static efi_status_t efi_high_alloc(efi_system_table_t *sys_table_arg, + unsigned long size, unsigned long align, + unsigned long *addr, unsigned long max) +{ + unsigned long map_size, desc_size; + efi_memory_desc_t *map; + efi_status_t status; + unsigned long nr_pages; + u64 max_addr = 0; + int i; + + status = efi_get_memory_map(sys_table_arg, &map, &map_size, &desc_size, + NULL, NULL); + if (status != EFI_SUCCESS) + goto fail; + + /* + * Enforce minimum alignment that EFI requires when requesting + * a specific address. We are doing page-based allocations, + * so we must be aligned to a page. + */ + if (align < EFI_PAGE_SIZE) + align = EFI_PAGE_SIZE; + + nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE; +again: + for (i = 0; i < map_size / desc_size; i++) { + efi_memory_desc_t *desc; + unsigned long m = (unsigned long)map; + u64 start, end; + + desc = (efi_memory_desc_t *)(m + (i * desc_size)); + if (desc->type != EFI_CONVENTIONAL_MEMORY) + continue; + + if (desc->num_pages < nr_pages) + continue; + + start = desc->phys_addr; + end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT); + + if ((start + size) > end || (start + size) > max) + continue; + + if (end - size > max) + end = max; + + if (round_down(end - size, align) < start) + continue; + + start = round_down(end - size, align); + + /* + * Don't allocate at 0x0. It will confuse code that + * checks pointers against NULL. + */ + if (start == 0x0) + continue; + + if (start > max_addr) + max_addr = start; + } + + if (!max_addr) + status = EFI_NOT_FOUND; + else { + status = efi_call_phys4(sys_table_arg->boottime->allocate_pages, + EFI_ALLOCATE_ADDRESS, EFI_LOADER_DATA, + nr_pages, &max_addr); + if (status != EFI_SUCCESS) { + max = max_addr; + max_addr = 0; + goto again; + } + + *addr = max_addr; + } + + efi_call_phys1(sys_table_arg->boottime->free_pool, map); + +fail: + return status; +} + +/* + * Allocate at the lowest possible address. + */ +static efi_status_t efi_low_alloc(efi_system_table_t *sys_table_arg, + unsigned long size, unsigned long align, + unsigned long *addr) +{ + unsigned long map_size, desc_size; + efi_memory_desc_t *map; + efi_status_t status; + unsigned long nr_pages; + int i; + + status = efi_get_memory_map(sys_table_arg, &map, &map_size, &desc_size, + NULL, NULL); + if (status != EFI_SUCCESS) + goto fail; + + /* + * Enforce minimum alignment that EFI requires when requesting + * a specific address. We are doing page-based allocations, + * so we must be aligned to a page. + */ + if (align < EFI_PAGE_SIZE) + align = EFI_PAGE_SIZE; + + nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE; + for (i = 0; i < map_size / desc_size; i++) { + efi_memory_desc_t *desc; + unsigned long m = (unsigned long)map; + u64 start, end; + + desc = (efi_memory_desc_t *)(m + (i * desc_size)); + + if (desc->type != EFI_CONVENTIONAL_MEMORY) + continue; + + if (desc->num_pages < nr_pages) + continue; + + start = desc->phys_addr; + end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT); + + /* + * Don't allocate at 0x0. It will confuse code that + * checks pointers against NULL. Skip the first 8 + * bytes so we start at a nice even number. + */ + if (start == 0x0) + start += 8; + + start = round_up(start, align); + if ((start + size) > end) + continue; + + status = efi_call_phys4(sys_table_arg->boottime->allocate_pages, + EFI_ALLOCATE_ADDRESS, EFI_LOADER_DATA, + nr_pages, &start); + if (status == EFI_SUCCESS) { + *addr = start; + break; + } + } + + if (i == map_size / desc_size) + status = EFI_NOT_FOUND; + + efi_call_phys1(sys_table_arg->boottime->free_pool, map); +fail: + return status; +} + +static void efi_free(efi_system_table_t *sys_table_arg, unsigned long size, + unsigned long addr) +{ + unsigned long nr_pages; + + if (!size) + return; + + nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE; + efi_call_phys2(sys_table_arg->boottime->free_pages, addr, nr_pages); +} + + +/* + * Check the cmdline for a LILO-style file= arguments. + * + * We only support loading a file from the same filesystem as + * the kernel image. + */ +static efi_status_t handle_cmdline_files(efi_system_table_t *sys_table_arg, + efi_loaded_image_t *image, + char *cmd_line, char *option_string, + unsigned long max_addr, + unsigned long *load_addr, + unsigned long *load_size) +{ + struct file_info *files; + unsigned long file_addr; + efi_guid_t fs_proto = EFI_FILE_SYSTEM_GUID; + u64 file_size_total; + efi_file_io_interface_t *io; + efi_file_handle_t *fh; + efi_status_t status; + int nr_files; + char *str; + int i, j, k; + + file_addr = 0; + file_size_total = 0; + + str = cmd_line; + + j = 0; /* See close_handles */ + + if (!load_addr || !load_size) + return EFI_INVALID_PARAMETER; + + *load_addr = 0; + *load_size = 0; + + if (!str || !*str) + return EFI_SUCCESS; + + for (nr_files = 0; *str; nr_files++) { + str = strstr(str, option_string); + if (!str) + break; + + str += strlen(option_string); + + /* Skip any leading slashes */ + while (*str == '/' || *str == '\\') + str++; + + while (*str && *str != ' ' && *str != '\n') + str++; + } + + if (!nr_files) + return EFI_SUCCESS; + + status = efi_call_phys3(sys_table_arg->boottime->allocate_pool, + EFI_LOADER_DATA, + nr_files * sizeof(*files), + (void **)&files); + if (status != EFI_SUCCESS) { + efi_printk(sys_table_arg, "Failed to alloc mem for file handle list\n"); + goto fail; + } + + str = cmd_line; + for (i = 0; i < nr_files; i++) { + struct file_info *file; + efi_file_handle_t *h; + efi_file_info_t *info; + efi_char16_t filename_16[256]; + unsigned long info_sz; + efi_guid_t info_guid = EFI_FILE_INFO_ID; + efi_char16_t *p; + u64 file_sz; + + str = strstr(str, option_string); + if (!str) + break; + + str += strlen(option_string); + + file = &files[i]; + p = filename_16; + + /* Skip any leading slashes */ + while (*str == '/' || *str == '\\') + str++; + + while (*str && *str != ' ' && *str != '\n') { + if ((u8 *)p >= (u8 *)filename_16 + sizeof(filename_16)) + break; + + if (*str == '/') { + *p++ = '\\'; + str++; + } else { + *p++ = *str++; + } + } + + *p = '\0'; + + /* Only open the volume once. */ + if (!i) { + efi_boot_services_t *boottime; + + boottime = sys_table_arg->boottime; + + status = efi_call_phys3(boottime->handle_protocol, + image->device_handle, &fs_proto, + (void **)&io); + if (status != EFI_SUCCESS) { + efi_printk(sys_table_arg, "Failed to handle fs_proto\n"); + goto free_files; + } + + status = efi_call_phys2(io->open_volume, io, &fh); + if (status != EFI_SUCCESS) { + efi_printk(sys_table_arg, "Failed to open volume\n"); + goto free_files; + } + } + + status = efi_call_phys5(fh->open, fh, &h, filename_16, + EFI_FILE_MODE_READ, (u64)0); + if (status != EFI_SUCCESS) { + efi_printk(sys_table_arg, "Failed to open file: "); + efi_char16_printk(sys_table_arg, filename_16); + efi_printk(sys_table_arg, "\n"); + goto close_handles; + } + + file->handle = h; + + info_sz = 0; + status = efi_call_phys4(h->get_info, h, &info_guid, + &info_sz, NULL); + if (status != EFI_BUFFER_TOO_SMALL) { + efi_printk(sys_table_arg, "Failed to get file info size\n"); + goto close_handles; + } + +grow: + status = efi_call_phys3(sys_table_arg->boottime->allocate_pool, + EFI_LOADER_DATA, info_sz, + (void **)&info); + if (status != EFI_SUCCESS) { + efi_printk(sys_table_arg, "Failed to alloc mem for file info\n"); + goto close_handles; + } + + status = efi_call_phys4(h->get_info, h, &info_guid, + &info_sz, info); + if (status == EFI_BUFFER_TOO_SMALL) { + efi_call_phys1(sys_table_arg->boottime->free_pool, + info); + goto grow; + } + + file_sz = info->file_size; + efi_call_phys1(sys_table_arg->boottime->free_pool, info); + + if (status != EFI_SUCCESS) { + efi_printk(sys_table_arg, "Failed to get file info\n"); + goto close_handles; + } + + file->size = file_sz; + file_size_total += file_sz; + } + + if (file_size_total) { + unsigned long addr; + + /* + * Multiple files need to be at consecutive addresses in memory, + * so allocate enough memory for all the files. This is used + * for loading multiple files. + */ + status = efi_high_alloc(sys_table_arg, file_size_total, 0x1000, + &file_addr, max_addr); + if (status != EFI_SUCCESS) { + efi_printk(sys_table_arg, "Failed to alloc highmem for files\n"); + goto close_handles; + } + + /* We've run out of free low memory. */ + if (file_addr > max_addr) { + efi_printk(sys_table_arg, "We've run out of free low memory\n"); + status = EFI_INVALID_PARAMETER; + goto free_file_total; + } + + addr = file_addr; + for (j = 0; j < nr_files; j++) { + unsigned long size; + + size = files[j].size; + while (size) { + unsigned long chunksize; + if (size > EFI_READ_CHUNK_SIZE) + chunksize = EFI_READ_CHUNK_SIZE; + else + chunksize = size; + status = efi_call_phys3(fh->read, + files[j].handle, + &chunksize, + (void *)addr); + if (status != EFI_SUCCESS) { + efi_printk(sys_table_arg, "Failed to read file\n"); + goto free_file_total; + } + addr += chunksize; + size -= chunksize; + } + + efi_call_phys1(fh->close, files[j].handle); + } + + } + + efi_call_phys1(sys_table_arg->boottime->free_pool, files); + + *load_addr = file_addr; + *load_size = file_size_total; + + return status; + +free_file_total: + efi_free(sys_table_arg, file_size_total, file_addr); + +close_handles: + for (k = j; k < i; k++) + efi_call_phys1(fh->close, files[k].handle); +free_files: + efi_call_phys1(sys_table_arg->boottime->free_pool, files); +fail: + *load_addr = 0; + *load_size = 0; + + return status; +} +/* + * Relocate a kernel image, either compressed or uncompressed. + * In the ARM64 case, all kernel images are currently + * uncompressed, and as such when we relocate it we need to + * allocate additional space for the BSS segment. Any low + * memory that this function should avoid needs to be + * unavailable in the EFI memory map, as if the preferred + * address is not available the lowest available address will + * be used. + */ +static efi_status_t efi_relocate_kernel(efi_system_table_t *sys_table_arg, + unsigned long *image_addr, + unsigned long image_size, + unsigned long alloc_size, + unsigned long preferred_addr, + unsigned long alignment) +{ + unsigned long cur_image_addr; + unsigned long new_addr = 0; + efi_status_t status; + unsigned long nr_pages; + efi_physical_addr_t efi_addr = preferred_addr; + + if (!image_addr || !image_size || !alloc_size) + return EFI_INVALID_PARAMETER; + if (alloc_size < image_size) + return EFI_INVALID_PARAMETER; + + cur_image_addr = *image_addr; + + /* + * The EFI firmware loader could have placed the kernel image + * anywhere in memory, but the kernel has restrictions on the + * max physical address it can run at. Some architectures + * also have a prefered address, so first try to relocate + * to the preferred address. If that fails, allocate as low + * as possible while respecting the required alignment. + */ + nr_pages = round_up(alloc_size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE; + status = efi_call_phys4(sys_table_arg->boottime->allocate_pages, + EFI_ALLOCATE_ADDRESS, EFI_LOADER_DATA, + nr_pages, &efi_addr); + new_addr = efi_addr; + /* + * If preferred address allocation failed allocate as low as + * possible. + */ + if (status != EFI_SUCCESS) { + status = efi_low_alloc(sys_table_arg, alloc_size, alignment, + &new_addr); + } + if (status != EFI_SUCCESS) { + efi_printk(sys_table_arg, "ERROR: Failed to allocate usable memory for kernel.\n"); + return status; + } + + /* + * We know source/dest won't overlap since both memory ranges + * have been allocated by UEFI, so we can safely use memcpy. + */ + memcpy((void *)new_addr, (void *)cur_image_addr, image_size); + + /* Return the new address of the relocated image. */ + *image_addr = new_addr; + + return status; +} + +/* + * Convert the unicode UEFI command line to ASCII to pass to kernel. + * Size of memory allocated return in *cmd_line_len. + * Returns NULL on error. + */ +static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg, + efi_loaded_image_t *image, + int *cmd_line_len) +{ + u16 *s2; + u8 *s1 = NULL; + unsigned long cmdline_addr = 0; + int load_options_size = image->load_options_size / 2; /* ASCII */ + void *options = image->load_options; + int options_size = 0; + efi_status_t status; + int i; + u16 zero = 0; + + if (options) { + s2 = options; + while (*s2 && *s2 != '\n' && options_size < load_options_size) { + s2++; + options_size++; + } + } + + if (options_size == 0) { + /* No command line options, so return empty string*/ + options_size = 1; + options = &zero; + } + + options_size++; /* NUL termination */ +#ifdef CONFIG_ARM + /* + * For ARM, allocate at a high address to avoid reserved + * regions at low addresses that we don't know the specfics of + * at the time we are processing the command line. + */ + status = efi_high_alloc(sys_table_arg, options_size, 0, + &cmdline_addr, 0xfffff000); +#else + status = efi_low_alloc(sys_table_arg, options_size, 0, + &cmdline_addr); +#endif + if (status != EFI_SUCCESS) + return NULL; + + s1 = (u8 *)cmdline_addr; + s2 = (u16 *)options; + + for (i = 0; i < options_size - 1; i++) + *s1++ = *s2++; + + *s1 = '\0'; + + *cmd_line_len = options_size; + return (char *)cmdline_addr; +} diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index 5145fa344ad5..2e2fbdec0845 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -13,11 +13,27 @@ * This file is released under the GPLv2. */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include #include #include #include #include +#include + +struct efi __read_mostly efi = { + .mps = EFI_INVALID_TABLE_ADDR, + .acpi = EFI_INVALID_TABLE_ADDR, + .acpi20 = EFI_INVALID_TABLE_ADDR, + .smbios = EFI_INVALID_TABLE_ADDR, + .sal_systab = EFI_INVALID_TABLE_ADDR, + .boot_info = EFI_INVALID_TABLE_ADDR, + .hcdp = EFI_INVALID_TABLE_ADDR, + .uga = EFI_INVALID_TABLE_ADDR, + .uv_systab = EFI_INVALID_TABLE_ADDR, +}; +EXPORT_SYMBOL(efi); static struct kobject *efi_kobj; static struct kobject *efivars_kobj; @@ -132,3 +148,127 @@ err_put: } subsys_initcall(efisubsys_init); + + +/* + * We can't ioremap data in EFI boot services RAM, because we've already mapped + * it as RAM. So, look it up in the existing EFI memory map instead. Only + * callable after efi_enter_virtual_mode and before efi_free_boot_services. + */ +void __iomem *efi_lookup_mapped_addr(u64 phys_addr) +{ + struct efi_memory_map *map; + void *p; + map = efi.memmap; + if (!map) + return NULL; + if (WARN_ON(!map->map)) + return NULL; + for (p = map->map; p < map->map_end; p += map->desc_size) { + efi_memory_desc_t *md = p; + u64 size = md->num_pages << EFI_PAGE_SHIFT; + u64 end = md->phys_addr + size; + if (!(md->attribute & EFI_MEMORY_RUNTIME) && + md->type != EFI_BOOT_SERVICES_CODE && + md->type != EFI_BOOT_SERVICES_DATA) + continue; + if (!md->virt_addr) + continue; + if (phys_addr >= md->phys_addr && phys_addr < end) { + phys_addr += md->virt_addr - md->phys_addr; + return (__force void __iomem *)(unsigned long)phys_addr; + } + } + return NULL; +} + +static __initdata efi_config_table_type_t common_tables[] = { + {ACPI_20_TABLE_GUID, "ACPI 2.0", &efi.acpi20}, + {ACPI_TABLE_GUID, "ACPI", &efi.acpi}, + {HCDP_TABLE_GUID, "HCDP", &efi.hcdp}, + {MPS_TABLE_GUID, "MPS", &efi.mps}, + {SAL_SYSTEM_TABLE_GUID, "SALsystab", &efi.sal_systab}, + {SMBIOS_TABLE_GUID, "SMBIOS", &efi.smbios}, + {UGA_IO_PROTOCOL_GUID, "UGA", &efi.uga}, + {NULL_GUID, NULL, 0}, +}; + +static __init int match_config_table(efi_guid_t *guid, + unsigned long table, + efi_config_table_type_t *table_types) +{ + u8 str[EFI_VARIABLE_GUID_LEN + 1]; + int i; + + if (table_types) { + efi_guid_unparse(guid, str); + + for (i = 0; efi_guidcmp(table_types[i].guid, NULL_GUID); i++) { + efi_guid_unparse(&table_types[i].guid, str); + + if (!efi_guidcmp(*guid, table_types[i].guid)) { + *(table_types[i].ptr) = table; + pr_cont(" %s=0x%lx ", + table_types[i].name, table); + return 1; + } + } + } + + return 0; +} + +int __init efi_config_init(efi_config_table_type_t *arch_tables) +{ + void *config_tables, *tablep; + int i, sz; + + if (efi_enabled(EFI_64BIT)) + sz = sizeof(efi_config_table_64_t); + else + sz = sizeof(efi_config_table_32_t); + + /* + * Let's see what config tables the firmware passed to us. + */ + config_tables = early_memremap(efi.systab->tables, + efi.systab->nr_tables * sz); + if (config_tables == NULL) { + pr_err("Could not map Configuration table!\n"); + return -ENOMEM; + } + + tablep = config_tables; + pr_info(""); + for (i = 0; i < efi.systab->nr_tables; i++) { + efi_guid_t guid; + unsigned long table; + + if (efi_enabled(EFI_64BIT)) { + u64 table64; + guid = ((efi_config_table_64_t *)tablep)->guid; + table64 = ((efi_config_table_64_t *)tablep)->table; + table = table64; +#ifndef CONFIG_64BIT + if (table64 >> 32) { + pr_cont("\n"); + pr_err("Table located above 4GB, disabling EFI.\n"); + early_iounmap(config_tables, + efi.systab->nr_tables * sz); + return -EINVAL; + } +#endif + } else { + guid = ((efi_config_table_32_t *)tablep)->guid; + table = ((efi_config_table_32_t *)tablep)->table; + } + + if (!match_config_table(&guid, table, common_tables)) + match_config_table(&guid, table, arch_tables); + + tablep += sz; + } + pr_cont("\n"); + early_iounmap(config_tables, efi.systab->nr_tables * sz); + return 0; +} diff --git a/drivers/firmware/efi/efivars.c b/drivers/firmware/efi/efivars.c index 8a7432a4b413..933eb027d527 100644 --- a/drivers/firmware/efi/efivars.c +++ b/drivers/firmware/efi/efivars.c @@ -564,7 +564,7 @@ static int efivar_sysfs_destroy(struct efivar_entry *entry, void *data) return 0; } -void efivars_sysfs_exit(void) +static void efivars_sysfs_exit(void) { /* Remove all entries and destroy */ __efivar_entry_iter(efivar_sysfs_destroy, &efivar_sysfs_list, NULL, NULL); diff --git a/drivers/gpio/gpio-lynxpoint.c b/drivers/gpio/gpio-lynxpoint.c index 2d9ca6055e5e..41b5913ddabe 100644 --- a/drivers/gpio/gpio-lynxpoint.c +++ b/drivers/gpio/gpio-lynxpoint.c @@ -248,14 +248,15 @@ static void lp_gpio_irq_handler(unsigned irq, struct irq_desc *desc) struct lp_gpio *lg = irq_data_get_irq_handler_data(data); struct irq_chip *chip = irq_data_get_irq_chip(data); u32 base, pin, mask; - unsigned long reg, pending; + unsigned long reg, ena, pending; unsigned virq; /* check from GPIO controller which pin triggered the interrupt */ for (base = 0; base < lg->chip.ngpio; base += 32) { reg = lp_gpio_reg(&lg->chip, base, LP_INT_STAT); + ena = lp_gpio_reg(&lg->chip, base, LP_INT_ENABLE); - while ((pending = inl(reg))) { + while ((pending = (inl(reg) & inl(ena)))) { pin = __ffs(pending); mask = BIT(pin); /* Clear before handling so we don't lose an edge */ diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c index 0ff43552d472..89675f862308 100644 --- a/drivers/gpio/gpio-omap.c +++ b/drivers/gpio/gpio-omap.c @@ -63,6 +63,7 @@ struct gpio_bank { struct gpio_chip chip; struct clk *dbck; u32 mod_usage; + u32 irq_usage; u32 dbck_enable_mask; bool dbck_enabled; struct device *dev; @@ -86,6 +87,9 @@ struct gpio_bank { #define GPIO_BIT(bank, gpio) (1 << GPIO_INDEX(bank, gpio)) #define GPIO_MOD_CTRL_BIT BIT(0) +#define BANK_USED(bank) (bank->mod_usage || bank->irq_usage) +#define LINE_USED(line, offset) (line & (1 << offset)) + static int irq_to_gpio(struct gpio_bank *bank, unsigned int gpio_irq) { return bank->chip.base + gpio_irq; @@ -420,15 +424,69 @@ static int _set_gpio_triggering(struct gpio_bank *bank, int gpio, return 0; } +static void _enable_gpio_module(struct gpio_bank *bank, unsigned offset) +{ + if (bank->regs->pinctrl) { + void __iomem *reg = bank->base + bank->regs->pinctrl; + + /* Claim the pin for MPU */ + __raw_writel(__raw_readl(reg) | (1 << offset), reg); + } + + if (bank->regs->ctrl && !BANK_USED(bank)) { + void __iomem *reg = bank->base + bank->regs->ctrl; + u32 ctrl; + + ctrl = __raw_readl(reg); + /* Module is enabled, clocks are not gated */ + ctrl &= ~GPIO_MOD_CTRL_BIT; + __raw_writel(ctrl, reg); + bank->context.ctrl = ctrl; + } +} + +static void _disable_gpio_module(struct gpio_bank *bank, unsigned offset) +{ + void __iomem *base = bank->base; + + if (bank->regs->wkup_en && + !LINE_USED(bank->mod_usage, offset) && + !LINE_USED(bank->irq_usage, offset)) { + /* Disable wake-up during idle for dynamic tick */ + _gpio_rmw(base, bank->regs->wkup_en, 1 << offset, 0); + bank->context.wake_en = + __raw_readl(bank->base + bank->regs->wkup_en); + } + + if (bank->regs->ctrl && !BANK_USED(bank)) { + void __iomem *reg = bank->base + bank->regs->ctrl; + u32 ctrl; + + ctrl = __raw_readl(reg); + /* Module is disabled, clocks are gated */ + ctrl |= GPIO_MOD_CTRL_BIT; + __raw_writel(ctrl, reg); + bank->context.ctrl = ctrl; + } +} + +static int gpio_is_input(struct gpio_bank *bank, int mask) +{ + void __iomem *reg = bank->base + bank->regs->direction; + + return __raw_readl(reg) & mask; +} + static int gpio_irq_type(struct irq_data *d, unsigned type) { struct gpio_bank *bank = irq_data_get_irq_chip_data(d); unsigned gpio = 0; int retval; unsigned long flags; + unsigned offset; - if (WARN_ON(!bank->mod_usage)) - return -EINVAL; + if (!BANK_USED(bank)) + pm_runtime_get_sync(bank->dev); #ifdef CONFIG_ARCH_OMAP1 if (d->irq > IH_MPUIO_BASE) @@ -446,7 +504,17 @@ static int gpio_irq_type(struct irq_data *d, unsigned type) return -EINVAL; spin_lock_irqsave(&bank->lock, flags); - retval = _set_gpio_triggering(bank, GPIO_INDEX(bank, gpio), type); + offset = GPIO_INDEX(bank, gpio); + retval = _set_gpio_triggering(bank, offset, type); + if (!LINE_USED(bank->mod_usage, offset)) { + _enable_gpio_module(bank, offset); + _set_gpio_direction(bank, offset, 1); + } else if (!gpio_is_input(bank, 1 << offset)) { + spin_unlock_irqrestore(&bank->lock, flags); + return -EINVAL; + } + + bank->irq_usage |= 1 << GPIO_INDEX(bank, gpio); spin_unlock_irqrestore(&bank->lock, flags); if (type & (IRQ_TYPE_LEVEL_LOW | IRQ_TYPE_LEVEL_HIGH)) @@ -603,35 +671,19 @@ static int omap_gpio_request(struct gpio_chip *chip, unsigned offset) * If this is the first gpio_request for the bank, * enable the bank module. */ - if (!bank->mod_usage) + if (!BANK_USED(bank)) pm_runtime_get_sync(bank->dev); spin_lock_irqsave(&bank->lock, flags); /* Set trigger to none. You need to enable the desired trigger with - * request_irq() or set_irq_type(). + * request_irq() or set_irq_type(). Only do this if the IRQ line has + * not already been requested. */ - _set_gpio_triggering(bank, offset, IRQ_TYPE_NONE); - - if (bank->regs->pinctrl) { - void __iomem *reg = bank->base + bank->regs->pinctrl; - - /* Claim the pin for MPU */ - __raw_writel(__raw_readl(reg) | (1 << offset), reg); - } - - if (bank->regs->ctrl && !bank->mod_usage) { - void __iomem *reg = bank->base + bank->regs->ctrl; - u32 ctrl; - - ctrl = __raw_readl(reg); - /* Module is enabled, clocks are not gated */ - ctrl &= ~GPIO_MOD_CTRL_BIT; - __raw_writel(ctrl, reg); - bank->context.ctrl = ctrl; + if (!LINE_USED(bank->irq_usage, offset)) { + _set_gpio_triggering(bank, offset, IRQ_TYPE_NONE); + _enable_gpio_module(bank, offset); } - bank->mod_usage |= 1 << offset; - spin_unlock_irqrestore(&bank->lock, flags); return 0; @@ -640,31 +692,11 @@ static int omap_gpio_request(struct gpio_chip *chip, unsigned offset) static void omap_gpio_free(struct gpio_chip *chip, unsigned offset) { struct gpio_bank *bank = container_of(chip, struct gpio_bank, chip); - void __iomem *base = bank->base; unsigned long flags; spin_lock_irqsave(&bank->lock, flags); - - if (bank->regs->wkup_en) { - /* Disable wake-up during idle for dynamic tick */ - _gpio_rmw(base, bank->regs->wkup_en, 1 << offset, 0); - bank->context.wake_en = - __raw_readl(bank->base + bank->regs->wkup_en); - } - bank->mod_usage &= ~(1 << offset); - - if (bank->regs->ctrl && !bank->mod_usage) { - void __iomem *reg = bank->base + bank->regs->ctrl; - u32 ctrl; - - ctrl = __raw_readl(reg); - /* Module is disabled, clocks are gated */ - ctrl |= GPIO_MOD_CTRL_BIT; - __raw_writel(ctrl, reg); - bank->context.ctrl = ctrl; - } - + _disable_gpio_module(bank, offset); _reset_gpio(bank, bank->chip.base + offset); spin_unlock_irqrestore(&bank->lock, flags); @@ -672,7 +704,7 @@ static void omap_gpio_free(struct gpio_chip *chip, unsigned offset) * If this is the last gpio to be freed in the bank, * disable the bank module. */ - if (!bank->mod_usage) + if (!BANK_USED(bank)) pm_runtime_put(bank->dev); } @@ -762,10 +794,20 @@ static void gpio_irq_shutdown(struct irq_data *d) struct gpio_bank *bank = irq_data_get_irq_chip_data(d); unsigned int gpio = irq_to_gpio(bank, d->hwirq); unsigned long flags; + unsigned offset = GPIO_INDEX(bank, gpio); spin_lock_irqsave(&bank->lock, flags); + bank->irq_usage &= ~(1 << offset); + _disable_gpio_module(bank, offset); _reset_gpio(bank, gpio); spin_unlock_irqrestore(&bank->lock, flags); + + /* + * If this is the last IRQ to be freed in the bank, + * disable the bank module. + */ + if (!BANK_USED(bank)) + pm_runtime_put(bank->dev); } static void gpio_ack_irq(struct irq_data *d) @@ -897,13 +939,6 @@ static int gpio_input(struct gpio_chip *chip, unsigned offset) return 0; } -static int gpio_is_input(struct gpio_bank *bank, int mask) -{ - void __iomem *reg = bank->base + bank->regs->direction; - - return __raw_readl(reg) & mask; -} - static int gpio_get(struct gpio_chip *chip, unsigned offset) { struct gpio_bank *bank; @@ -922,13 +957,22 @@ static int gpio_output(struct gpio_chip *chip, unsigned offset, int value) { struct gpio_bank *bank; unsigned long flags; + int retval = 0; bank = container_of(chip, struct gpio_bank, chip); spin_lock_irqsave(&bank->lock, flags); + + if (LINE_USED(bank->irq_usage, offset)) { + retval = -EINVAL; + goto exit; + } + bank->set_dataout(bank, offset, value); _set_gpio_direction(bank, offset, 0); + +exit: spin_unlock_irqrestore(&bank->lock, flags); - return 0; + return retval; } static int gpio_debounce(struct gpio_chip *chip, unsigned offset, @@ -1400,7 +1444,7 @@ void omap2_gpio_prepare_for_idle(int pwr_mode) struct gpio_bank *bank; list_for_each_entry(bank, &omap_gpio_list, node) { - if (!bank->mod_usage || !bank->loses_context) + if (!BANK_USED(bank) || !bank->loses_context) continue; bank->power_mode = pwr_mode; @@ -1414,7 +1458,7 @@ void omap2_gpio_resume_after_idle(void) struct gpio_bank *bank; list_for_each_entry(bank, &omap_gpio_list, node) { - if (!bank->mod_usage || !bank->loses_context) + if (!BANK_USED(bank) || !bank->loses_context) continue; pm_runtime_get_sync(bank->dev); diff --git a/drivers/gpio/gpio-rcar.c b/drivers/gpio/gpio-rcar.c index e3745eb07570..6038966ab045 100644 --- a/drivers/gpio/gpio-rcar.c +++ b/drivers/gpio/gpio-rcar.c @@ -293,10 +293,9 @@ static void gpio_rcar_parse_pdata(struct gpio_rcar_priv *p) if (pdata) { p->config = *pdata; } else if (IS_ENABLED(CONFIG_OF) && np) { - ret = of_parse_phandle_with_args(np, "gpio-ranges", - "#gpio-range-cells", 0, &args); - p->config.number_of_pins = ret == 0 && args.args_count == 3 - ? args.args[2] + ret = of_parse_phandle_with_fixed_args(np, "gpio-ranges", 3, 0, + &args); + p->config.number_of_pins = ret == 0 ? args.args[2] : RCAR_MAX_GPIO_PER_BANK; p->config.gpio_base = -1; } diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 86ef3461ec06..0dee0e0c247a 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -136,7 +136,7 @@ static struct gpio_desc *gpio_to_desc(unsigned gpio) */ static int desc_to_gpio(const struct gpio_desc *desc) { - return desc->chip->base + gpio_chip_hwgpio(desc); + return desc - &gpio_desc[0]; } @@ -1398,7 +1398,7 @@ static int gpiod_request(struct gpio_desc *desc, const char *label) int status = -EPROBE_DEFER; unsigned long flags; - if (!desc || !desc->chip) { + if (!desc) { pr_warn("%s: invalid GPIO\n", __func__); return -EINVAL; } @@ -1406,6 +1406,8 @@ static int gpiod_request(struct gpio_desc *desc, const char *label) spin_lock_irqsave(&gpio_lock, flags); chip = desc->chip; + if (chip == NULL) + goto done; if (!try_module_get(chip->owner)) goto done; diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c index 1688ff500513..830f7501cb4d 100644 --- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -2925,6 +2925,8 @@ int drm_edid_to_speaker_allocation(struct edid *edid, u8 **sadb) /* Speaker Allocation Data Block */ if (dbl == 3) { *sadb = kmalloc(dbl, GFP_KERNEL); + if (!*sadb) + return -ENOMEM; memcpy(*sadb, &db[1], dbl); count = dbl; break; diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index f6f6cc7fc133..3d13ca6e257f 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -407,14 +407,6 @@ static void drm_fb_helper_dpms(struct fb_info *info, int dpms_mode) struct drm_connector *connector; int i, j; - /* - * fbdev->blank can be called from irq context in case of a panic. - * Since we already have our own special panic handler which will - * restore the fbdev console mode completely, just bail out early. - */ - if (oops_in_progress) - return; - /* * fbdev->blank can be called from irq context in case of a panic. * Since we already have our own special panic handler which will diff --git a/drivers/gpu/drm/gma500/gtt.c b/drivers/gpu/drm/gma500/gtt.c index 92babac362ec..2db731f00930 100644 --- a/drivers/gpu/drm/gma500/gtt.c +++ b/drivers/gpu/drm/gma500/gtt.c @@ -204,6 +204,7 @@ static int psb_gtt_attach_pages(struct gtt_range *gt) if (IS_ERR(pages)) return PTR_ERR(pages); + gt->npage = gt->gem.size / PAGE_SIZE; gt->pages = pages; return 0; diff --git a/drivers/gpu/drm/gma500/mdfld_dsi_output.h b/drivers/gpu/drm/gma500/mdfld_dsi_output.h index 45d5af0546bf..5b646c1f0c3e 100644 --- a/drivers/gpu/drm/gma500/mdfld_dsi_output.h +++ b/drivers/gpu/drm/gma500/mdfld_dsi_output.h @@ -39,7 +39,7 @@ #include "psb_intel_reg.h" #include "mdfld_output.h" -#include +#include #define FLD_MASK(start, end) (((1 << ((start) - (end) + 1)) - 1) << (end)) #define FLD_VAL(val, start, end) (((val) << (end)) & FLD_MASK(start, end)) diff --git a/drivers/gpu/drm/gma500/oaktrail_device.c b/drivers/gpu/drm/gma500/oaktrail_device.c index 08747fd7105c..7a9ce000fd86 100644 --- a/drivers/gpu/drm/gma500/oaktrail_device.c +++ b/drivers/gpu/drm/gma500/oaktrail_device.c @@ -26,7 +26,7 @@ #include "psb_drv.h" #include "psb_reg.h" #include "psb_intel_reg.h" -#include +#include #include #include "mid_bios.h" #include "intel_bios.h" diff --git a/drivers/gpu/drm/gma500/oaktrail_lvds.c b/drivers/gpu/drm/gma500/oaktrail_lvds.c index e77d7214fca4..3ece553311fe 100644 --- a/drivers/gpu/drm/gma500/oaktrail_lvds.c +++ b/drivers/gpu/drm/gma500/oaktrail_lvds.c @@ -22,7 +22,7 @@ #include #include -#include +#include #include "intel_bios.h" #include "psb_drv.h" diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index c27a21034a5e..d5c784d48671 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -1290,12 +1290,9 @@ static int i915_load_modeset_init(struct drm_device *dev) * then we do not take part in VGA arbitration and the * vga_client_register() fails with -ENODEV. */ - if (!HAS_PCH_SPLIT(dev)) { - ret = vga_client_register(dev->pdev, dev, NULL, - i915_vga_set_decode); - if (ret && ret != -ENODEV) - goto out; - } + ret = vga_client_register(dev->pdev, dev, NULL, i915_vga_set_decode); + if (ret && ret != -ENODEV) + goto out; intel_register_dsm_handler(); @@ -1351,12 +1348,6 @@ static int i915_load_modeset_init(struct drm_device *dev) */ intel_fbdev_initial_config(dev); - /* - * Must do this after fbcon init so that - * vgacon_save_screen() works during the handover. - */ - i915_disable_vga_mem(dev); - /* Only enable hotplug handling once the fbdev is fully set up. */ dev_priv->enable_hotplug_processing = true; diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index df9253d890ee..cdfb9da0e4ce 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4800,10 +4800,10 @@ i915_gem_inactive_count(struct shrinker *shrinker, struct shrink_control *sc) if (!mutex_trylock(&dev->struct_mutex)) { if (!mutex_is_locked_by(&dev->struct_mutex, current)) - return SHRINK_STOP; + return 0; if (dev_priv->mm.shrinker_no_lock_stealing) - return SHRINK_STOP; + return 0; unlock = false; } @@ -4901,10 +4901,10 @@ i915_gem_inactive_scan(struct shrinker *shrinker, struct shrink_control *sc) if (!mutex_trylock(&dev->struct_mutex)) { if (!mutex_is_locked_by(&dev->struct_mutex, current)) - return 0; + return SHRINK_STOP; if (dev_priv->mm.shrinker_no_lock_stealing) - return 0; + return SHRINK_STOP; unlock = false; } diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index aba9d7498996..dae364f0028c 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -143,8 +143,10 @@ static void i915_error_vprintf(struct drm_i915_error_state_buf *e, /* Seek the first printf which is hits start position */ if (e->pos < e->start) { - len = vsnprintf(NULL, 0, f, args); - if (!__i915_error_seek(e, len)) + va_list tmp; + + va_copy(tmp, args); + if (!__i915_error_seek(e, vsnprintf(NULL, 0, f, tmp))) return; } diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index c159e1a6810f..38f96f65d87a 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -3881,6 +3881,9 @@ #define GEN7_SQ_CHICKEN_MBCUNIT_CONFIG 0x9030 #define GEN7_SQ_CHICKEN_MBCUNIT_SQINTMOB (1<<11) +#define HSW_SCRATCH1 0xb038 +#define HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE (1<<27) + #define HSW_FUSE_STRAP 0x42014 #define HSW_CDCLK_LIMIT (1 << 24) @@ -4728,6 +4731,9 @@ #define GEN7_ROW_CHICKEN2_GT2 0xf4f4 #define DOP_CLOCK_GATING_DISABLE (1<<0) +#define HSW_ROW_CHICKEN3 0xe49c +#define HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE (1 << 6) + #define G4X_AUD_VID_DID (dev_priv->info->display_mmio_offset + 0x62020) #define INTEL_AUDIO_DEVCL 0x808629FB #define INTEL_AUDIO_DEVBLC 0x80862801 diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index d8a1d98693e7..581fb4b2f766 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -3941,8 +3941,6 @@ static void intel_connector_check_state(struct intel_connector *connector) * consider. */ void intel_connector_dpms(struct drm_connector *connector, int mode) { - struct intel_encoder *encoder = intel_attached_encoder(connector); - /* All the simple cases only support two dpms states. */ if (mode != DRM_MODE_DPMS_ON) mode = DRM_MODE_DPMS_OFF; @@ -3953,10 +3951,8 @@ void intel_connector_dpms(struct drm_connector *connector, int mode) connector->dpms = mode; /* Only need to change hw state when actually enabled */ - if (encoder->base.crtc) - intel_encoder_dpms(encoder, mode); - else - WARN_ON(encoder->connectors_active != false); + if (connector->encoder) + intel_encoder_dpms(to_intel_encoder(connector->encoder), mode); intel_modeset_check_state(connector->dev); } @@ -4775,6 +4771,10 @@ static void i9xx_set_pipeconf(struct intel_crtc *intel_crtc) pipeconf = 0; + if (dev_priv->quirks & QUIRK_PIPEA_FORCE && + I915_READ(PIPECONF(intel_crtc->pipe)) & PIPECONF_ENABLE) + pipeconf |= PIPECONF_ENABLE; + if (intel_crtc->pipe == 0 && INTEL_INFO(dev)->gen < 4) { /* Enable pixel doubling when the dot clock is > 90% of the (display) * core speed. @@ -10045,33 +10045,6 @@ static void i915_disable_vga(struct drm_device *dev) POSTING_READ(vga_reg); } -static void i915_enable_vga_mem(struct drm_device *dev) -{ - /* Enable VGA memory on Intel HD */ - if (HAS_PCH_SPLIT(dev)) { - vga_get_uninterruptible(dev->pdev, VGA_RSRC_LEGACY_IO); - outb(inb(VGA_MSR_READ) | VGA_MSR_MEM_EN, VGA_MSR_WRITE); - vga_set_legacy_decoding(dev->pdev, VGA_RSRC_LEGACY_IO | - VGA_RSRC_LEGACY_MEM | - VGA_RSRC_NORMAL_IO | - VGA_RSRC_NORMAL_MEM); - vga_put(dev->pdev, VGA_RSRC_LEGACY_IO); - } -} - -void i915_disable_vga_mem(struct drm_device *dev) -{ - /* Disable VGA memory on Intel HD */ - if (HAS_PCH_SPLIT(dev)) { - vga_get_uninterruptible(dev->pdev, VGA_RSRC_LEGACY_IO); - outb(inb(VGA_MSR_READ) & ~VGA_MSR_MEM_EN, VGA_MSR_WRITE); - vga_set_legacy_decoding(dev->pdev, VGA_RSRC_LEGACY_IO | - VGA_RSRC_NORMAL_IO | - VGA_RSRC_NORMAL_MEM); - vga_put(dev->pdev, VGA_RSRC_LEGACY_IO); - } -} - void intel_modeset_init_hw(struct drm_device *dev) { intel_init_power_well(dev); @@ -10350,7 +10323,6 @@ void i915_redisable_vga(struct drm_device *dev) if (I915_READ(vga_reg) != VGA_DISP_DISABLE) { DRM_DEBUG_KMS("Something enabled VGA plane, disabling it\n"); i915_disable_vga(dev); - i915_disable_vga_mem(dev); } } @@ -10564,8 +10536,6 @@ void intel_modeset_cleanup(struct drm_device *dev) intel_disable_fbc(dev); - i915_enable_vga_mem(dev); - intel_disable_gt_powersave(dev); ironlake_teardown_rc6(dev); diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c index 2151d13772b8..2c555f91bfae 100644 --- a/drivers/gpu/drm/i915/intel_dp.c +++ b/drivers/gpu/drm/i915/intel_dp.c @@ -588,7 +588,18 @@ intel_dp_i2c_aux_ch(struct i2c_adapter *adapter, int mode, DRM_DEBUG_KMS("aux_ch native nack\n"); return -EREMOTEIO; case AUX_NATIVE_REPLY_DEFER: - udelay(100); + /* + * For now, just give more slack to branch devices. We + * could check the DPCD for I2C bit rate capabilities, + * and if available, adjust the interval. We could also + * be more careful with DP-to-Legacy adapters where a + * long legacy cable may force very low I2C bit rates. + */ + if (intel_dp->dpcd[DP_DOWNSTREAMPORT_PRESENT] & + DP_DWN_STRM_PORT_PRESENT) + usleep_range(500, 600); + else + usleep_range(300, 400); continue; default: DRM_ERROR("aux_ch invalid native reply 0x%02x\n", @@ -1456,7 +1467,7 @@ static void intel_edp_psr_setup(struct intel_dp *intel_dp) /* Avoid continuous PSR exit by masking memup and hpd */ I915_WRITE(EDP_PSR_DEBUG_CTL, EDP_PSR_DEBUG_MASK_MEMUP | - EDP_PSR_DEBUG_MASK_HPD); + EDP_PSR_DEBUG_MASK_HPD | EDP_PSR_DEBUG_MASK_LPSP); intel_dp->psr_setup_done = true; } diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h index 28cae80495e2..9b7b68fd5d47 100644 --- a/drivers/gpu/drm/i915/intel_drv.h +++ b/drivers/gpu/drm/i915/intel_drv.h @@ -793,6 +793,5 @@ extern void hsw_pc8_disable_interrupts(struct drm_device *dev); extern void hsw_pc8_restore_interrupts(struct drm_device *dev); extern void intel_aux_display_runtime_get(struct drm_i915_private *dev_priv); extern void intel_aux_display_runtime_put(struct drm_i915_private *dev_priv); -extern void i915_disable_vga_mem(struct drm_device *dev); #endif /* __INTEL_DRV_H__ */ diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index dd176b7296c1..f4c5e95b2d6f 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -3864,8 +3864,6 @@ static void valleyview_enable_rps(struct drm_device *dev) dev_priv->rps.rpe_delay), dev_priv->rps.rpe_delay); - INIT_DELAYED_WORK(&dev_priv->rps.vlv_work, vlv_rps_timer_work); - valleyview_set_rps(dev_priv->dev, dev_priv->rps.rpe_delay); gen6_enable_rps_interrupts(dev); @@ -4955,6 +4953,11 @@ static void haswell_init_clock_gating(struct drm_device *dev) I915_WRITE(GEN7_L3_CHICKEN_MODE_REGISTER, GEN7_WA_L3_CHICKEN_MODE); + /* L3 caching of data atomics doesn't work -- disable it. */ + I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE); + I915_WRITE(HSW_ROW_CHICKEN3, + _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE)); + /* This is required by WaCatErrorRejectionIssue:hsw */ I915_WRITE(GEN7_SQ_CHICKEN_MBCUNIT_CONFIG, I915_READ(GEN7_SQ_CHICKEN_MBCUNIT_CONFIG) | @@ -5681,5 +5684,7 @@ void intel_pm_init(struct drm_device *dev) INIT_DELAYED_WORK(&dev_priv->rps.delayed_resume_work, intel_gen6_powersave_work); + + INIT_DELAYED_WORK(&dev_priv->rps.vlv_work, vlv_rps_timer_work); } diff --git a/drivers/gpu/drm/i915/intel_tv.c b/drivers/gpu/drm/i915/intel_tv.c index f2c6d7909ae2..dd6f84bf6c22 100644 --- a/drivers/gpu/drm/i915/intel_tv.c +++ b/drivers/gpu/drm/i915/intel_tv.c @@ -916,6 +916,14 @@ intel_tv_compute_config(struct intel_encoder *encoder, DRM_DEBUG_KMS("forcing bpc to 8 for TV\n"); pipe_config->pipe_bpp = 8*3; + /* TV has it's own notion of sync and other mode flags, so clear them. */ + pipe_config->adjusted_mode.flags = 0; + + /* + * FIXME: We don't check whether the input mode is actually what we want + * or whether userspace is doing something stupid. + */ + return true; } diff --git a/drivers/gpu/drm/msm/mdp4/mdp4_kms.c b/drivers/gpu/drm/msm/mdp4/mdp4_kms.c index 5db5bbaedae2..bc7fd11ad8be 100644 --- a/drivers/gpu/drm/msm/mdp4/mdp4_kms.c +++ b/drivers/gpu/drm/msm/mdp4/mdp4_kms.c @@ -19,8 +19,6 @@ #include "msm_drv.h" #include "mdp4_kms.h" -#include - static struct mdp4_platform_config *mdp4_get_config(struct platform_device *dev); static int mdp4_hw_init(struct msm_kms *kms) diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 008d772384c7..b3a2f1629041 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -18,8 +18,6 @@ #include "msm_drv.h" #include "msm_gpu.h" -#include - static void msm_fb_output_poll_changed(struct drm_device *dev) { struct msm_drm_private *priv = dev->dev_private; @@ -62,6 +60,8 @@ int msm_iommu_attach(struct drm_device *dev, struct iommu_domain *iommu, int i, ret; for (i = 0; i < cnt; i++) { + /* TODO maybe some day msm iommu won't require this hack: */ + struct device *msm_iommu_get_ctx(const char *ctx_name); struct device *ctx = msm_iommu_get_ctx(names[i]); if (!ctx) continue; @@ -199,7 +199,7 @@ static int msm_load(struct drm_device *dev, unsigned long flags) * imx drm driver on iMX5 */ dev_err(dev->dev, "failed to load kms\n"); - ret = PTR_ERR(priv->kms); + ret = PTR_ERR(kms); goto fail; } @@ -697,7 +697,7 @@ static struct drm_driver msm_driver = { .gem_vm_ops = &vm_ops, .dumb_create = msm_gem_dumb_create, .dumb_map_offset = msm_gem_dumb_map_offset, - .dumb_destroy = msm_gem_dumb_destroy, + .dumb_destroy = drm_gem_dumb_destroy, #ifdef CONFIG_DEBUG_FS .debugfs_init = msm_debugfs_init, .debugfs_cleanup = msm_debugfs_cleanup, diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c index 29eacfa29cfb..2bae46c66a30 100644 --- a/drivers/gpu/drm/msm/msm_gem.c +++ b/drivers/gpu/drm/msm/msm_gem.c @@ -319,13 +319,6 @@ int msm_gem_dumb_create(struct drm_file *file, struct drm_device *dev, MSM_BO_SCANOUT | MSM_BO_WC, &args->handle); } -int msm_gem_dumb_destroy(struct drm_file *file, struct drm_device *dev, - uint32_t handle) -{ - /* No special work needed, drop the reference and see what falls out */ - return drm_gem_handle_delete(file, handle); -} - int msm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev, uint32_t handle, uint64_t *offset) { diff --git a/drivers/gpu/drm/nouveau/core/subdev/mc/base.c b/drivers/gpu/drm/nouveau/core/subdev/mc/base.c index 37712a6df923..e290cfa4acee 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/mc/base.c +++ b/drivers/gpu/drm/nouveau/core/subdev/mc/base.c @@ -113,7 +113,7 @@ nouveau_mc_create_(struct nouveau_object *parent, struct nouveau_object *engine, pmc->use_msi = false; break; default: - pmc->use_msi = nouveau_boolopt(device->cfgopt, "NvMSI", true); + pmc->use_msi = nouveau_boolopt(device->cfgopt, "NvMSI", false); if (pmc->use_msi) { pmc->use_msi = pci_enable_msi(device->pdev) == 0; if (pmc->use_msi) { diff --git a/drivers/gpu/drm/radeon/btc_dpm.c b/drivers/gpu/drm/radeon/btc_dpm.c index 05ff315e8e9e..9b6950d9b3c0 100644 --- a/drivers/gpu/drm/radeon/btc_dpm.c +++ b/drivers/gpu/drm/radeon/btc_dpm.c @@ -1168,6 +1168,23 @@ static const struct radeon_blacklist_clocks btc_blacklist_clocks[] = { 25000, 30000, RADEON_SCLK_UP } }; +void btc_get_max_clock_from_voltage_dependency_table(struct radeon_clock_voltage_dependency_table *table, + u32 *max_clock) +{ + u32 i, clock = 0; + + if ((table == NULL) || (table->count == 0)) { + *max_clock = clock; + return; + } + + for (i = 0; i < table->count; i++) { + if (clock < table->entries[i].clk) + clock = table->entries[i].clk; + } + *max_clock = clock; +} + void btc_apply_voltage_dependency_rules(struct radeon_clock_voltage_dependency_table *table, u32 clock, u16 max_voltage, u16 *voltage) { @@ -1913,7 +1930,7 @@ static int btc_set_mc_special_registers(struct radeon_device *rdev, } j++; - if (j > SMC_EVERGREEN_MC_REGISTER_ARRAY_SIZE) + if (j >= SMC_EVERGREEN_MC_REGISTER_ARRAY_SIZE) return -EINVAL; tmp = RREG32(MC_PMG_CMD_MRS); @@ -1928,7 +1945,7 @@ static int btc_set_mc_special_registers(struct radeon_device *rdev, } j++; - if (j > SMC_EVERGREEN_MC_REGISTER_ARRAY_SIZE) + if (j >= SMC_EVERGREEN_MC_REGISTER_ARRAY_SIZE) return -EINVAL; break; case MC_SEQ_RESERVE_M >> 2: @@ -1942,7 +1959,7 @@ static int btc_set_mc_special_registers(struct radeon_device *rdev, } j++; - if (j > SMC_EVERGREEN_MC_REGISTER_ARRAY_SIZE) + if (j >= SMC_EVERGREEN_MC_REGISTER_ARRAY_SIZE) return -EINVAL; break; default: @@ -2080,6 +2097,7 @@ static void btc_apply_state_adjust_rules(struct radeon_device *rdev, bool disable_mclk_switching; u32 mclk, sclk; u16 vddc, vddci; + u32 max_sclk_vddc, max_mclk_vddci, max_mclk_vddc; if ((rdev->pm.dpm.new_active_crtc_count > 1) || btc_dpm_vblank_too_short(rdev)) @@ -2121,6 +2139,39 @@ static void btc_apply_state_adjust_rules(struct radeon_device *rdev, ps->low.vddci = max_limits->vddci; } + /* limit clocks to max supported clocks based on voltage dependency tables */ + btc_get_max_clock_from_voltage_dependency_table(&rdev->pm.dpm.dyn_state.vddc_dependency_on_sclk, + &max_sclk_vddc); + btc_get_max_clock_from_voltage_dependency_table(&rdev->pm.dpm.dyn_state.vddci_dependency_on_mclk, + &max_mclk_vddci); + btc_get_max_clock_from_voltage_dependency_table(&rdev->pm.dpm.dyn_state.vddc_dependency_on_mclk, + &max_mclk_vddc); + + if (max_sclk_vddc) { + if (ps->low.sclk > max_sclk_vddc) + ps->low.sclk = max_sclk_vddc; + if (ps->medium.sclk > max_sclk_vddc) + ps->medium.sclk = max_sclk_vddc; + if (ps->high.sclk > max_sclk_vddc) + ps->high.sclk = max_sclk_vddc; + } + if (max_mclk_vddci) { + if (ps->low.mclk > max_mclk_vddci) + ps->low.mclk = max_mclk_vddci; + if (ps->medium.mclk > max_mclk_vddci) + ps->medium.mclk = max_mclk_vddci; + if (ps->high.mclk > max_mclk_vddci) + ps->high.mclk = max_mclk_vddci; + } + if (max_mclk_vddc) { + if (ps->low.mclk > max_mclk_vddc) + ps->low.mclk = max_mclk_vddc; + if (ps->medium.mclk > max_mclk_vddc) + ps->medium.mclk = max_mclk_vddc; + if (ps->high.mclk > max_mclk_vddc) + ps->high.mclk = max_mclk_vddc; + } + /* XXX validate the min clocks required for display */ if (disable_mclk_switching) { diff --git a/drivers/gpu/drm/radeon/btc_dpm.h b/drivers/gpu/drm/radeon/btc_dpm.h index 1a15e0e41950..3b6f12b7760b 100644 --- a/drivers/gpu/drm/radeon/btc_dpm.h +++ b/drivers/gpu/drm/radeon/btc_dpm.h @@ -46,6 +46,8 @@ void btc_adjust_clock_combinations(struct radeon_device *rdev, struct rv7xx_pl *pl); void btc_apply_voltage_dependency_rules(struct radeon_clock_voltage_dependency_table *table, u32 clock, u16 max_voltage, u16 *voltage); +void btc_get_max_clock_from_voltage_dependency_table(struct radeon_clock_voltage_dependency_table *table, + u32 *max_clock); void btc_apply_voltage_delta_rules(struct radeon_device *rdev, u16 max_vddc, u16 max_vddci, u16 *vddc, u16 *vddci); diff --git a/drivers/gpu/drm/radeon/ci_dpm.c b/drivers/gpu/drm/radeon/ci_dpm.c index 899627443030..51e947a97edf 100644 --- a/drivers/gpu/drm/radeon/ci_dpm.c +++ b/drivers/gpu/drm/radeon/ci_dpm.c @@ -146,6 +146,8 @@ static const struct ci_pt_config_reg didt_config_ci[] = }; extern u8 rv770_get_memory_module_index(struct radeon_device *rdev); +extern void btc_get_max_clock_from_voltage_dependency_table(struct radeon_clock_voltage_dependency_table *table, + u32 *max_clock); extern int ni_copy_and_switch_arb_sets(struct radeon_device *rdev, u32 arb_freq_src, u32 arb_freq_dest); extern u8 si_get_ddr3_mclk_frequency_ratio(u32 memory_clock); @@ -712,6 +714,7 @@ static void ci_apply_state_adjust_rules(struct radeon_device *rdev, struct radeon_clock_and_voltage_limits *max_limits; bool disable_mclk_switching; u32 sclk, mclk; + u32 max_sclk_vddc, max_mclk_vddci, max_mclk_vddc; int i; if ((rdev->pm.dpm.new_active_crtc_count > 1) || @@ -739,6 +742,29 @@ static void ci_apply_state_adjust_rules(struct radeon_device *rdev, } } + /* limit clocks to max supported clocks based on voltage dependency tables */ + btc_get_max_clock_from_voltage_dependency_table(&rdev->pm.dpm.dyn_state.vddc_dependency_on_sclk, + &max_sclk_vddc); + btc_get_max_clock_from_voltage_dependency_table(&rdev->pm.dpm.dyn_state.vddci_dependency_on_mclk, + &max_mclk_vddci); + btc_get_max_clock_from_voltage_dependency_table(&rdev->pm.dpm.dyn_state.vddc_dependency_on_mclk, + &max_mclk_vddc); + + for (i = 0; i < ps->performance_level_count; i++) { + if (max_sclk_vddc) { + if (ps->performance_levels[i].sclk > max_sclk_vddc) + ps->performance_levels[i].sclk = max_sclk_vddc; + } + if (max_mclk_vddci) { + if (ps->performance_levels[i].mclk > max_mclk_vddci) + ps->performance_levels[i].mclk = max_mclk_vddci; + } + if (max_mclk_vddc) { + if (ps->performance_levels[i].mclk > max_mclk_vddc) + ps->performance_levels[i].mclk = max_mclk_vddc; + } + } + /* XXX validate the min clocks required for display */ if (disable_mclk_switching) { diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c index adbdb6503b05..b874ccdf52f7 100644 --- a/drivers/gpu/drm/radeon/cik.c +++ b/drivers/gpu/drm/radeon/cik.c @@ -77,6 +77,8 @@ static void cik_pcie_gen3_enable(struct radeon_device *rdev); static void cik_program_aspm(struct radeon_device *rdev); static void cik_init_pg(struct radeon_device *rdev); static void cik_init_cg(struct radeon_device *rdev); +static void cik_fini_pg(struct radeon_device *rdev); +static void cik_fini_cg(struct radeon_device *rdev); static void cik_enable_gui_idle_interrupt(struct radeon_device *rdev, bool enable); @@ -2845,10 +2847,8 @@ static void cik_gpu_init(struct radeon_device *rdev) rdev->config.cik.tile_config |= (3 << 0); break; } - if ((mc_arb_ramcfg & NOOFBANK_MASK) >> NOOFBANK_SHIFT) - rdev->config.cik.tile_config |= 1 << 4; - else - rdev->config.cik.tile_config |= 0 << 4; + rdev->config.cik.tile_config |= + ((mc_arb_ramcfg & NOOFBANK_MASK) >> NOOFBANK_SHIFT) << 4; rdev->config.cik.tile_config |= ((gb_addr_config & PIPE_INTERLEAVE_SIZE_MASK) >> PIPE_INTERLEAVE_SIZE_SHIFT) << 8; rdev->config.cik.tile_config |= @@ -4187,6 +4187,10 @@ static void cik_gpu_soft_reset(struct radeon_device *rdev, u32 reset_mask) dev_info(rdev->dev, " VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x%08X\n", RREG32(VM_CONTEXT1_PROTECTION_FAULT_STATUS)); + /* disable CG/PG */ + cik_fini_pg(rdev); + cik_fini_cg(rdev); + /* stop the rlc */ cik_rlc_stop(rdev); @@ -4456,8 +4460,8 @@ static int cik_mc_init(struct radeon_device *rdev) rdev->mc.aper_base = pci_resource_start(rdev->pdev, 0); rdev->mc.aper_size = pci_resource_len(rdev->pdev, 0); /* size in MB on si */ - rdev->mc.mc_vram_size = RREG32(CONFIG_MEMSIZE) * 1024 * 1024; - rdev->mc.real_vram_size = RREG32(CONFIG_MEMSIZE) * 1024 * 1024; + rdev->mc.mc_vram_size = RREG32(CONFIG_MEMSIZE) * 1024ULL * 1024ULL; + rdev->mc.real_vram_size = RREG32(CONFIG_MEMSIZE) * 1024ULL * 1024ULL; rdev->mc.visible_vram_size = rdev->mc.aper_size; si_vram_gtt_location(rdev, &rdev->mc); radeon_update_bandwidth_info(rdev); @@ -4735,12 +4739,13 @@ static void cik_vm_decode_fault(struct radeon_device *rdev, u32 mc_id = (status & MEMORY_CLIENT_ID_MASK) >> MEMORY_CLIENT_ID_SHIFT; u32 vmid = (status & FAULT_VMID_MASK) >> FAULT_VMID_SHIFT; u32 protections = (status & PROTECTIONS_MASK) >> PROTECTIONS_SHIFT; - char *block = (char *)&mc_client; + char block[5] = { mc_client >> 24, (mc_client >> 16) & 0xff, + (mc_client >> 8) & 0xff, mc_client & 0xff, 0 }; - printk("VM fault (0x%02x, vmid %d) at page %u, %s from %s (%d)\n", + printk("VM fault (0x%02x, vmid %d) at page %u, %s from '%s' (0x%08x) (%d)\n", protections, vmid, addr, (status & MEMORY_CLIENT_RW_MASK) ? "write" : "read", - block, mc_id); + block, mc_client, mc_id); } /** diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index 555164e270a7..b5c67a99dda9 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3131,7 +3131,7 @@ static void evergreen_gpu_init(struct radeon_device *rdev) rdev->config.evergreen.sx_max_export_size = 256; rdev->config.evergreen.sx_max_export_pos_size = 64; rdev->config.evergreen.sx_max_export_smx_size = 192; - rdev->config.evergreen.max_hw_contexts = 8; + rdev->config.evergreen.max_hw_contexts = 4; rdev->config.evergreen.sq_num_cf_insts = 2; rdev->config.evergreen.sc_prim_fifo_size = 0x40; diff --git a/drivers/gpu/drm/radeon/evergreen_hdmi.c b/drivers/gpu/drm/radeon/evergreen_hdmi.c index f71ce390aebe..f815c20640bd 100644 --- a/drivers/gpu/drm/radeon/evergreen_hdmi.c +++ b/drivers/gpu/drm/radeon/evergreen_hdmi.c @@ -288,8 +288,7 @@ void evergreen_hdmi_setmode(struct drm_encoder *encoder, struct drm_display_mode /* fglrx clears sth in AFMT_AUDIO_PACKET_CONTROL2 here */ WREG32(HDMI_ACR_PACKET_CONTROL + offset, - HDMI_ACR_AUTO_SEND | /* allow hw to sent ACR packets when required */ - HDMI_ACR_SOURCE); /* select SW CTS value */ + HDMI_ACR_AUTO_SEND); /* allow hw to sent ACR packets when required */ evergreen_hdmi_update_ACR(encoder, mode->clock); diff --git a/drivers/gpu/drm/radeon/evergreend.h b/drivers/gpu/drm/radeon/evergreend.h index 8768fd6a1e27..4f6d2962767d 100644 --- a/drivers/gpu/drm/radeon/evergreend.h +++ b/drivers/gpu/drm/radeon/evergreend.h @@ -1501,7 +1501,7 @@ * 6. COMMAND [29:22] | BYTE_COUNT [20:0] */ # define PACKET3_CP_DMA_DST_SEL(x) ((x) << 20) - /* 0 - SRC_ADDR + /* 0 - DST_ADDR * 1 - GDS */ # define PACKET3_CP_DMA_ENGINE(x) ((x) << 27) @@ -1516,7 +1516,7 @@ # define PACKET3_CP_DMA_CP_SYNC (1 << 31) /* COMMAND */ # define PACKET3_CP_DMA_DIS_WC (1 << 21) -# define PACKET3_CP_DMA_CMD_SRC_SWAP(x) ((x) << 23) +# define PACKET3_CP_DMA_CMD_SRC_SWAP(x) ((x) << 22) /* 0 - none * 1 - 8 in 16 * 2 - 8 in 32 diff --git a/drivers/gpu/drm/radeon/ni_dpm.c b/drivers/gpu/drm/radeon/ni_dpm.c index 6c398a456d78..f26339028154 100644 --- a/drivers/gpu/drm/radeon/ni_dpm.c +++ b/drivers/gpu/drm/radeon/ni_dpm.c @@ -787,6 +787,7 @@ static void ni_apply_state_adjust_rules(struct radeon_device *rdev, bool disable_mclk_switching; u32 mclk, sclk; u16 vddc, vddci; + u32 max_sclk_vddc, max_mclk_vddci, max_mclk_vddc; int i; if ((rdev->pm.dpm.new_active_crtc_count > 1) || @@ -813,6 +814,29 @@ static void ni_apply_state_adjust_rules(struct radeon_device *rdev, } } + /* limit clocks to max supported clocks based on voltage dependency tables */ + btc_get_max_clock_from_voltage_dependency_table(&rdev->pm.dpm.dyn_state.vddc_dependency_on_sclk, + &max_sclk_vddc); + btc_get_max_clock_from_voltage_dependency_table(&rdev->pm.dpm.dyn_state.vddci_dependency_on_mclk, + &max_mclk_vddci); + btc_get_max_clock_from_voltage_dependency_table(&rdev->pm.dpm.dyn_state.vddc_dependency_on_mclk, + &max_mclk_vddc); + + for (i = 0; i < ps->performance_level_count; i++) { + if (max_sclk_vddc) { + if (ps->performance_levels[i].sclk > max_sclk_vddc) + ps->performance_levels[i].sclk = max_sclk_vddc; + } + if (max_mclk_vddci) { + if (ps->performance_levels[i].mclk > max_mclk_vddci) + ps->performance_levels[i].mclk = max_mclk_vddci; + } + if (max_mclk_vddc) { + if (ps->performance_levels[i].mclk > max_mclk_vddc) + ps->performance_levels[i].mclk = max_mclk_vddc; + } + } + /* XXX validate the min clocks required for display */ if (disable_mclk_switching) { diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index 24175717307b..d71333033b2b 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -2933,9 +2933,11 @@ static int r100_debugfs_cp_ring_info(struct seq_file *m, void *data) seq_printf(m, "CP_RB_RPTR 0x%08x\n", rdp); seq_printf(m, "%u free dwords in ring\n", ring->ring_free_dw); seq_printf(m, "%u dwords in ring\n", count); - for (j = 0; j <= count; j++) { - i = (rdp + j) & ring->ptr_mask; - seq_printf(m, "r[%04d]=0x%08x\n", i, ring->ring[i]); + if (ring->ready) { + for (j = 0; j <= count; j++) { + i = (rdp + j) & ring->ptr_mask; + seq_printf(m, "r[%04d]=0x%08x\n", i, ring->ring[i]); + } } return 0; } diff --git a/drivers/gpu/drm/radeon/r600_dpm.c b/drivers/gpu/drm/radeon/r600_dpm.c index e65f211a7be0..5513d8f06252 100644 --- a/drivers/gpu/drm/radeon/r600_dpm.c +++ b/drivers/gpu/drm/radeon/r600_dpm.c @@ -1084,7 +1084,7 @@ int r600_parse_extended_power_table(struct radeon_device *rdev) rdev->pm.dpm.dyn_state.uvd_clock_voltage_dependency_table.entries[i].dclk = le16_to_cpu(uvd_clk->usDClkLow) | (uvd_clk->ucDClkHigh << 16); rdev->pm.dpm.dyn_state.uvd_clock_voltage_dependency_table.entries[i].v = - le16_to_cpu(limits->entries[i].usVoltage); + le16_to_cpu(entry->usVoltage); entry = (ATOM_PPLIB_UVD_Clock_Voltage_Limit_Record *) ((u8 *)entry + sizeof(ATOM_PPLIB_UVD_Clock_Voltage_Limit_Record)); } diff --git a/drivers/gpu/drm/radeon/r600_hdmi.c b/drivers/gpu/drm/radeon/r600_hdmi.c index f443010ce90b..5b729319f27b 100644 --- a/drivers/gpu/drm/radeon/r600_hdmi.c +++ b/drivers/gpu/drm/radeon/r600_hdmi.c @@ -57,15 +57,15 @@ enum r600_hdmi_iec_status_bits { static const struct radeon_hdmi_acr r600_hdmi_predefined_acr[] = { /* 32kHz 44.1kHz 48kHz */ /* Clock N CTS N CTS N CTS */ - { 25174, 4576, 28125, 7007, 31250, 6864, 28125 }, /* 25,20/1.001 MHz */ + { 25175, 4576, 28125, 7007, 31250, 6864, 28125 }, /* 25,20/1.001 MHz */ { 25200, 4096, 25200, 6272, 28000, 6144, 25200 }, /* 25.20 MHz */ { 27000, 4096, 27000, 6272, 30000, 6144, 27000 }, /* 27.00 MHz */ { 27027, 4096, 27027, 6272, 30030, 6144, 27027 }, /* 27.00*1.001 MHz */ { 54000, 4096, 54000, 6272, 60000, 6144, 54000 }, /* 54.00 MHz */ { 54054, 4096, 54054, 6272, 60060, 6144, 54054 }, /* 54.00*1.001 MHz */ - { 74175, 11648, 210937, 17836, 234375, 11648, 140625 }, /* 74.25/1.001 MHz */ + { 74176, 11648, 210937, 17836, 234375, 11648, 140625 }, /* 74.25/1.001 MHz */ { 74250, 4096, 74250, 6272, 82500, 6144, 74250 }, /* 74.25 MHz */ - { 148351, 11648, 421875, 8918, 234375, 5824, 140625 }, /* 148.50/1.001 MHz */ + { 148352, 11648, 421875, 8918, 234375, 5824, 140625 }, /* 148.50/1.001 MHz */ { 148500, 4096, 148500, 6272, 165000, 6144, 148500 }, /* 148.50 MHz */ { 0, 4096, 0, 6272, 0, 6144, 0 } /* Other */ }; @@ -75,8 +75,15 @@ static const struct radeon_hdmi_acr r600_hdmi_predefined_acr[] = { */ static void r600_hdmi_calc_cts(uint32_t clock, int *CTS, int N, int freq) { - if (*CTS == 0) - *CTS = clock * N / (128 * freq) * 1000; + u64 n; + u32 d; + + if (*CTS == 0) { + n = (u64)clock * (u64)N * 1000ULL; + d = 128 * freq; + do_div(n, d); + *CTS = n; + } DRM_DEBUG("Using ACR timing N=%d CTS=%d for frequency %d\n", N, *CTS, freq); } @@ -257,10 +264,7 @@ void r600_audio_set_dto(struct drm_encoder *encoder, u32 clock) * number (coefficient of two integer numbers. DCCG_AUDIO_DTOx_PHASE * is the numerator, DCCG_AUDIO_DTOx_MODULE is the denominator */ - if (ASIC_IS_DCE3(rdev)) { - /* according to the reg specs, this should DCE3.2 only, but in - * practice it seems to cover DCE3.0 as well. - */ + if (ASIC_IS_DCE32(rdev)) { if (dig->dig_encoder == 0) { dto_cntl = RREG32(DCCG_AUDIO_DTO0_CNTL) & ~DCCG_AUDIO_DTO_WALLCLOCK_RATIO_MASK; dto_cntl |= DCCG_AUDIO_DTO_WALLCLOCK_RATIO(wallclock_ratio); @@ -276,8 +280,21 @@ void r600_audio_set_dto(struct drm_encoder *encoder, u32 clock) WREG32(DCCG_AUDIO_DTO1_MODULE, dto_modulo); WREG32(DCCG_AUDIO_DTO_SELECT, 1); /* select DTO1 */ } + } else if (ASIC_IS_DCE3(rdev)) { + /* according to the reg specs, this should DCE3.2 only, but in + * practice it seems to cover DCE3.0/3.1 as well. + */ + if (dig->dig_encoder == 0) { + WREG32(DCCG_AUDIO_DTO0_PHASE, base_rate * 100); + WREG32(DCCG_AUDIO_DTO0_MODULE, clock * 100); + WREG32(DCCG_AUDIO_DTO_SELECT, 0); /* select DTO0 */ + } else { + WREG32(DCCG_AUDIO_DTO1_PHASE, base_rate * 100); + WREG32(DCCG_AUDIO_DTO1_MODULE, clock * 100); + WREG32(DCCG_AUDIO_DTO_SELECT, 1); /* select DTO1 */ + } } else { - /* according to the reg specs, this should be DCE2.0 and DCE3.0 */ + /* according to the reg specs, this should be DCE2.0 and DCE3.0/3.1 */ WREG32(AUDIO_DTO, AUDIO_DTO_PHASE(base_rate / 10) | AUDIO_DTO_MODULE(clock / 10)); } @@ -434,8 +451,8 @@ void r600_hdmi_setmode(struct drm_encoder *encoder, struct drm_display_mode *mod } WREG32(HDMI0_ACR_PACKET_CONTROL + offset, - HDMI0_ACR_AUTO_SEND | /* allow hw to sent ACR packets when required */ - HDMI0_ACR_SOURCE); /* select SW CTS value */ + HDMI0_ACR_SOURCE | /* select SW CTS value - XXX verify that hw CTS works on all families */ + HDMI0_ACR_AUTO_SEND); /* allow hw to sent ACR packets when required */ WREG32(HDMI0_VBI_PACKET_CONTROL + offset, HDMI0_NULL_SEND | /* send null packets when required */ diff --git a/drivers/gpu/drm/radeon/r600d.h b/drivers/gpu/drm/radeon/r600d.h index e673fe26ea84..7b3c7b5932c5 100644 --- a/drivers/gpu/drm/radeon/r600d.h +++ b/drivers/gpu/drm/radeon/r600d.h @@ -1523,7 +1523,7 @@ */ # define PACKET3_CP_DMA_CP_SYNC (1 << 31) /* COMMAND */ -# define PACKET3_CP_DMA_CMD_SRC_SWAP(x) ((x) << 23) +# define PACKET3_CP_DMA_CMD_SRC_SWAP(x) ((x) << 22) /* 0 - none * 1 - 8 in 16 * 2 - 8 in 32 diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c index 5003385a7512..8f7e04538fd6 100644 --- a/drivers/gpu/drm/radeon/radeon_asic.c +++ b/drivers/gpu/drm/radeon/radeon_asic.c @@ -1004,6 +1004,8 @@ static struct radeon_asic rv6xx_asic = { .wait_for_vblank = &avivo_wait_for_vblank, .set_backlight_level = &atombios_set_backlight_level, .get_backlight_level = &atombios_get_backlight_level, + .hdmi_enable = &r600_hdmi_enable, + .hdmi_setmode = &r600_hdmi_setmode, }, .copy = { .blit = &r600_copy_cpdma, diff --git a/drivers/gpu/drm/radeon/radeon_atombios.c b/drivers/gpu/drm/radeon/radeon_atombios.c index 404e25d285ba..f79ee184ffd5 100644 --- a/drivers/gpu/drm/radeon/radeon_atombios.c +++ b/drivers/gpu/drm/radeon/radeon_atombios.c @@ -1367,6 +1367,7 @@ bool radeon_atombios_get_ppll_ss_info(struct radeon_device *rdev, int index = GetIndexIntoMasterTable(DATA, PPLL_SS_Info); uint16_t data_offset, size; struct _ATOM_SPREAD_SPECTRUM_INFO *ss_info; + struct _ATOM_SPREAD_SPECTRUM_ASSIGNMENT *ss_assign; uint8_t frev, crev; int i, num_indices; @@ -1378,18 +1379,21 @@ bool radeon_atombios_get_ppll_ss_info(struct radeon_device *rdev, num_indices = (size - sizeof(ATOM_COMMON_TABLE_HEADER)) / sizeof(ATOM_SPREAD_SPECTRUM_ASSIGNMENT); - + ss_assign = (struct _ATOM_SPREAD_SPECTRUM_ASSIGNMENT*) + ((u8 *)&ss_info->asSS_Info[0]); for (i = 0; i < num_indices; i++) { - if (ss_info->asSS_Info[i].ucSS_Id == id) { + if (ss_assign->ucSS_Id == id) { ss->percentage = - le16_to_cpu(ss_info->asSS_Info[i].usSpreadSpectrumPercentage); - ss->type = ss_info->asSS_Info[i].ucSpreadSpectrumType; - ss->step = ss_info->asSS_Info[i].ucSS_Step; - ss->delay = ss_info->asSS_Info[i].ucSS_Delay; - ss->range = ss_info->asSS_Info[i].ucSS_Range; - ss->refdiv = ss_info->asSS_Info[i].ucRecommendedRef_Div; + le16_to_cpu(ss_assign->usSpreadSpectrumPercentage); + ss->type = ss_assign->ucSpreadSpectrumType; + ss->step = ss_assign->ucSS_Step; + ss->delay = ss_assign->ucSS_Delay; + ss->range = ss_assign->ucSS_Range; + ss->refdiv = ss_assign->ucRecommendedRef_Div; return true; } + ss_assign = (struct _ATOM_SPREAD_SPECTRUM_ASSIGNMENT*) + ((u8 *)ss_assign + sizeof(struct _ATOM_SPREAD_SPECTRUM_ASSIGNMENT)); } } return false; @@ -1477,6 +1481,12 @@ union asic_ss_info { struct _ATOM_ASIC_INTERNAL_SS_INFO_V3 info_3; }; +union asic_ss_assignment { + struct _ATOM_ASIC_SS_ASSIGNMENT v1; + struct _ATOM_ASIC_SS_ASSIGNMENT_V2 v2; + struct _ATOM_ASIC_SS_ASSIGNMENT_V3 v3; +}; + bool radeon_atombios_get_asic_ss_info(struct radeon_device *rdev, struct radeon_atom_ss *ss, int id, u32 clock) @@ -1485,6 +1495,7 @@ bool radeon_atombios_get_asic_ss_info(struct radeon_device *rdev, int index = GetIndexIntoMasterTable(DATA, ASIC_InternalSS_Info); uint16_t data_offset, size; union asic_ss_info *ss_info; + union asic_ss_assignment *ss_assign; uint8_t frev, crev; int i, num_indices; @@ -1509,45 +1520,52 @@ bool radeon_atombios_get_asic_ss_info(struct radeon_device *rdev, num_indices = (size - sizeof(ATOM_COMMON_TABLE_HEADER)) / sizeof(ATOM_ASIC_SS_ASSIGNMENT); + ss_assign = (union asic_ss_assignment *)((u8 *)&ss_info->info.asSpreadSpectrum[0]); for (i = 0; i < num_indices; i++) { - if ((ss_info->info.asSpreadSpectrum[i].ucClockIndication == id) && - (clock <= le32_to_cpu(ss_info->info.asSpreadSpectrum[i].ulTargetClockRange))) { + if ((ss_assign->v1.ucClockIndication == id) && + (clock <= le32_to_cpu(ss_assign->v1.ulTargetClockRange))) { ss->percentage = - le16_to_cpu(ss_info->info.asSpreadSpectrum[i].usSpreadSpectrumPercentage); - ss->type = ss_info->info.asSpreadSpectrum[i].ucSpreadSpectrumMode; - ss->rate = le16_to_cpu(ss_info->info.asSpreadSpectrum[i].usSpreadRateInKhz); + le16_to_cpu(ss_assign->v1.usSpreadSpectrumPercentage); + ss->type = ss_assign->v1.ucSpreadSpectrumMode; + ss->rate = le16_to_cpu(ss_assign->v1.usSpreadRateInKhz); return true; } + ss_assign = (union asic_ss_assignment *) + ((u8 *)ss_assign + sizeof(ATOM_ASIC_SS_ASSIGNMENT)); } break; case 2: num_indices = (size - sizeof(ATOM_COMMON_TABLE_HEADER)) / sizeof(ATOM_ASIC_SS_ASSIGNMENT_V2); + ss_assign = (union asic_ss_assignment *)((u8 *)&ss_info->info_2.asSpreadSpectrum[0]); for (i = 0; i < num_indices; i++) { - if ((ss_info->info_2.asSpreadSpectrum[i].ucClockIndication == id) && - (clock <= le32_to_cpu(ss_info->info_2.asSpreadSpectrum[i].ulTargetClockRange))) { + if ((ss_assign->v2.ucClockIndication == id) && + (clock <= le32_to_cpu(ss_assign->v2.ulTargetClockRange))) { ss->percentage = - le16_to_cpu(ss_info->info_2.asSpreadSpectrum[i].usSpreadSpectrumPercentage); - ss->type = ss_info->info_2.asSpreadSpectrum[i].ucSpreadSpectrumMode; - ss->rate = le16_to_cpu(ss_info->info_2.asSpreadSpectrum[i].usSpreadRateIn10Hz); + le16_to_cpu(ss_assign->v2.usSpreadSpectrumPercentage); + ss->type = ss_assign->v2.ucSpreadSpectrumMode; + ss->rate = le16_to_cpu(ss_assign->v2.usSpreadRateIn10Hz); if ((crev == 2) && ((id == ASIC_INTERNAL_ENGINE_SS) || (id == ASIC_INTERNAL_MEMORY_SS))) ss->rate /= 100; return true; } + ss_assign = (union asic_ss_assignment *) + ((u8 *)ss_assign + sizeof(ATOM_ASIC_SS_ASSIGNMENT_V2)); } break; case 3: num_indices = (size - sizeof(ATOM_COMMON_TABLE_HEADER)) / sizeof(ATOM_ASIC_SS_ASSIGNMENT_V3); + ss_assign = (union asic_ss_assignment *)((u8 *)&ss_info->info_3.asSpreadSpectrum[0]); for (i = 0; i < num_indices; i++) { - if ((ss_info->info_3.asSpreadSpectrum[i].ucClockIndication == id) && - (clock <= le32_to_cpu(ss_info->info_3.asSpreadSpectrum[i].ulTargetClockRange))) { + if ((ss_assign->v3.ucClockIndication == id) && + (clock <= le32_to_cpu(ss_assign->v3.ulTargetClockRange))) { ss->percentage = - le16_to_cpu(ss_info->info_3.asSpreadSpectrum[i].usSpreadSpectrumPercentage); - ss->type = ss_info->info_3.asSpreadSpectrum[i].ucSpreadSpectrumMode; - ss->rate = le16_to_cpu(ss_info->info_3.asSpreadSpectrum[i].usSpreadRateIn10Hz); + le16_to_cpu(ss_assign->v3.usSpreadSpectrumPercentage); + ss->type = ss_assign->v3.ucSpreadSpectrumMode; + ss->rate = le16_to_cpu(ss_assign->v3.usSpreadRateIn10Hz); if ((id == ASIC_INTERNAL_ENGINE_SS) || (id == ASIC_INTERNAL_MEMORY_SS)) ss->rate /= 100; @@ -1555,6 +1573,8 @@ bool radeon_atombios_get_asic_ss_info(struct radeon_device *rdev, radeon_atombios_get_igp_ss_overrides(rdev, ss, id); return true; } + ss_assign = (union asic_ss_assignment *) + ((u8 *)ss_assign + sizeof(ATOM_ASIC_SS_ASSIGNMENT_V3)); } break; default: diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index ac6ece61a476..66c222836631 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -85,8 +85,9 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p) VRAM, also but everything into VRAM on AGP cards to avoid image corruptions */ if (p->ring == R600_RING_TYPE_UVD_INDEX && - (i == 0 || p->rdev->flags & RADEON_IS_AGP)) { - /* TODO: is this still needed for NI+ ? */ + p->rdev->family < CHIP_PALM && + (i == 0 || drm_pci_device_is_agp(p->rdev->ddev))) { + p->relocs[i].lobj.domain = RADEON_GEM_DOMAIN_VRAM; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index e29faa73b574..841d0e09be3e 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -1320,13 +1320,22 @@ int radeon_device_init(struct radeon_device *rdev, return r; } if ((radeon_testing & 1)) { - radeon_test_moves(rdev); + if (rdev->accel_working) + radeon_test_moves(rdev); + else + DRM_INFO("radeon: acceleration disabled, skipping move tests\n"); } if ((radeon_testing & 2)) { - radeon_test_syncing(rdev); + if (rdev->accel_working) + radeon_test_syncing(rdev); + else + DRM_INFO("radeon: acceleration disabled, skipping sync tests\n"); } if (radeon_benchmarking) { - radeon_benchmark(rdev, radeon_benchmarking); + if (rdev->accel_working) + radeon_benchmark(rdev, radeon_benchmarking); + else + DRM_INFO("radeon: acceleration disabled, skipping benchmarks\n"); } return 0; } diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c index 87e1d69e8fdb..4f6b7fc7ad3c 100644 --- a/drivers/gpu/drm/radeon/radeon_pm.c +++ b/drivers/gpu/drm/radeon/radeon_pm.c @@ -945,6 +945,8 @@ void radeon_dpm_enable_uvd(struct radeon_device *rdev, bool enable) if (enable) { mutex_lock(&rdev->pm.mutex); rdev->pm.dpm.uvd_active = true; + /* disable this for now */ +#if 0 if ((rdev->pm.dpm.sd == 1) && (rdev->pm.dpm.hd == 0)) dpm_state = POWER_STATE_TYPE_INTERNAL_UVD_SD; else if ((rdev->pm.dpm.sd == 2) && (rdev->pm.dpm.hd == 0)) @@ -954,6 +956,7 @@ void radeon_dpm_enable_uvd(struct radeon_device *rdev, bool enable) else if ((rdev->pm.dpm.sd == 0) && (rdev->pm.dpm.hd == 2)) dpm_state = POWER_STATE_TYPE_INTERNAL_UVD_HD2; else +#endif dpm_state = POWER_STATE_TYPE_INTERNAL_UVD; rdev->pm.dpm.state = dpm_state; mutex_unlock(&rdev->pm.mutex); @@ -1002,7 +1005,7 @@ static void radeon_pm_resume_old(struct radeon_device *rdev) { /* set up the default clocks if the MC ucode is loaded */ if ((rdev->family >= CHIP_BARTS) && - (rdev->family <= CHIP_HAINAN) && + (rdev->family <= CHIP_CAYMAN) && rdev->mc_fw) { if (rdev->pm.default_vddc) radeon_atom_set_voltage(rdev, rdev->pm.default_vddc, @@ -1046,7 +1049,7 @@ static void radeon_pm_resume_dpm(struct radeon_device *rdev) if (ret) { DRM_ERROR("radeon: dpm resume failed\n"); if ((rdev->family >= CHIP_BARTS) && - (rdev->family <= CHIP_HAINAN) && + (rdev->family <= CHIP_CAYMAN) && rdev->mc_fw) { if (rdev->pm.default_vddc) radeon_atom_set_voltage(rdev, rdev->pm.default_vddc, @@ -1097,7 +1100,7 @@ static int radeon_pm_init_old(struct radeon_device *rdev) radeon_pm_init_profile(rdev); /* set up the default clocks if the MC ucode is loaded */ if ((rdev->family >= CHIP_BARTS) && - (rdev->family <= CHIP_HAINAN) && + (rdev->family <= CHIP_CAYMAN) && rdev->mc_fw) { if (rdev->pm.default_vddc) radeon_atom_set_voltage(rdev, rdev->pm.default_vddc, @@ -1183,7 +1186,7 @@ static int radeon_pm_init_dpm(struct radeon_device *rdev) if (ret) { rdev->pm.dpm_enabled = false; if ((rdev->family >= CHIP_BARTS) && - (rdev->family <= CHIP_HAINAN) && + (rdev->family <= CHIP_CAYMAN) && rdev->mc_fw) { if (rdev->pm.default_vddc) radeon_atom_set_voltage(rdev, rdev->pm.default_vddc, diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 46a25f037b84..18254e1c3e71 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -839,9 +839,11 @@ static int radeon_debugfs_ring_info(struct seq_file *m, void *data) * packet that is the root issue */ i = (ring->rptr + ring->ptr_mask + 1 - 32) & ring->ptr_mask; - for (j = 0; j <= (count + 32); j++) { - seq_printf(m, "r[%5d]=0x%08x\n", i, ring->ring[i]); - i = (i + 1) & ring->ptr_mask; + if (ring->ready) { + for (j = 0; j <= (count + 32); j++) { + seq_printf(m, "r[%5d]=0x%08x\n", i, ring->ring[i]); + i = (i + 1) & ring->ptr_mask; + } } return 0; } diff --git a/drivers/gpu/drm/radeon/radeon_test.c b/drivers/gpu/drm/radeon/radeon_test.c index f4d6bcee9006..12e8099a0823 100644 --- a/drivers/gpu/drm/radeon/radeon_test.c +++ b/drivers/gpu/drm/radeon/radeon_test.c @@ -36,8 +36,8 @@ static void radeon_do_test_moves(struct radeon_device *rdev, int flag) struct radeon_bo *vram_obj = NULL; struct radeon_bo **gtt_obj = NULL; uint64_t gtt_addr, vram_addr; - unsigned i, n, size; - int r, ring; + unsigned n, size; + int i, r, ring; switch (flag) { case RADEON_TEST_COPY_DMA: diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c index 1a01bbff9bfa..4f2e73f79638 100644 --- a/drivers/gpu/drm/radeon/radeon_uvd.c +++ b/drivers/gpu/drm/radeon/radeon_uvd.c @@ -476,8 +476,7 @@ static int radeon_uvd_cs_reloc(struct radeon_cs_parser *p, return -EINVAL; } - /* TODO: is this still necessary on NI+ ? */ - if ((cmd == 0 || cmd == 0x3) && + if (p->rdev->family < CHIP_PALM && (cmd == 0 || cmd == 0x3) && (start >> 28) != (p->rdev->uvd.gpu_addr >> 28)) { DRM_ERROR("msg/fb buffer %LX-%LX out of 256MB segment!\n", start, end); @@ -799,7 +798,8 @@ void radeon_uvd_note_usage(struct radeon_device *rdev) (rdev->pm.dpm.hd != hd)) { rdev->pm.dpm.sd = sd; rdev->pm.dpm.hd = hd; - streams_changed = true; + /* disable this for now */ + /*streams_changed = true;*/ } } diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c index c354c1094967..d4652af425b8 100644 --- a/drivers/gpu/drm/radeon/si.c +++ b/drivers/gpu/drm/radeon/si.c @@ -85,6 +85,9 @@ extern void si_dma_vm_set_page(struct radeon_device *rdev, uint32_t incr, uint32_t flags); static void si_enable_gui_idle_interrupt(struct radeon_device *rdev, bool enable); +static void si_fini_pg(struct radeon_device *rdev); +static void si_fini_cg(struct radeon_device *rdev); +static void si_rlc_stop(struct radeon_device *rdev); static const u32 verde_rlc_save_restore_register_list[] = { @@ -3608,6 +3611,13 @@ static void si_gpu_soft_reset(struct radeon_device *rdev, u32 reset_mask) dev_info(rdev->dev, " VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x%08X\n", RREG32(VM_CONTEXT1_PROTECTION_FAULT_STATUS)); + /* disable PG/CG */ + si_fini_pg(rdev); + si_fini_cg(rdev); + + /* stop the rlc */ + si_rlc_stop(rdev); + /* Disable CP parsing/prefetching */ WREG32(CP_ME_CNTL, CP_ME_HALT | CP_PFP_HALT | CP_CE_HALT); diff --git a/drivers/gpu/drm/radeon/si_dpm.c b/drivers/gpu/drm/radeon/si_dpm.c index cfe5d4d28915..2332aa1bf93c 100644 --- a/drivers/gpu/drm/radeon/si_dpm.c +++ b/drivers/gpu/drm/radeon/si_dpm.c @@ -2910,6 +2910,7 @@ static void si_apply_state_adjust_rules(struct radeon_device *rdev, bool disable_sclk_switching = false; u32 mclk, sclk; u16 vddc, vddci; + u32 max_sclk_vddc, max_mclk_vddci, max_mclk_vddc; int i; if ((rdev->pm.dpm.new_active_crtc_count > 1) || @@ -2943,6 +2944,29 @@ static void si_apply_state_adjust_rules(struct radeon_device *rdev, } } + /* limit clocks to max supported clocks based on voltage dependency tables */ + btc_get_max_clock_from_voltage_dependency_table(&rdev->pm.dpm.dyn_state.vddc_dependency_on_sclk, + &max_sclk_vddc); + btc_get_max_clock_from_voltage_dependency_table(&rdev->pm.dpm.dyn_state.vddci_dependency_on_mclk, + &max_mclk_vddci); + btc_get_max_clock_from_voltage_dependency_table(&rdev->pm.dpm.dyn_state.vddc_dependency_on_mclk, + &max_mclk_vddc); + + for (i = 0; i < ps->performance_level_count; i++) { + if (max_sclk_vddc) { + if (ps->performance_levels[i].sclk > max_sclk_vddc) + ps->performance_levels[i].sclk = max_sclk_vddc; + } + if (max_mclk_vddci) { + if (ps->performance_levels[i].mclk > max_mclk_vddci) + ps->performance_levels[i].mclk = max_mclk_vddci; + } + if (max_mclk_vddc) { + if (ps->performance_levels[i].mclk > max_mclk_vddc) + ps->performance_levels[i].mclk = max_mclk_vddc; + } + } + /* XXX validate the min clocks required for display */ if (disable_mclk_switching) { @@ -5184,7 +5208,7 @@ static int si_set_mc_special_registers(struct radeon_device *rdev, table->mc_reg_table_entry[k].mc_data[j] |= 0x100; } j++; - if (j > SMC_SISLANDS_MC_REGISTER_ARRAY_SIZE) + if (j >= SMC_SISLANDS_MC_REGISTER_ARRAY_SIZE) return -EINVAL; if (!pi->mem_gddr5) { @@ -5194,7 +5218,7 @@ static int si_set_mc_special_registers(struct radeon_device *rdev, table->mc_reg_table_entry[k].mc_data[j] = (table->mc_reg_table_entry[k].mc_data[i] & 0xffff0000) >> 16; j++; - if (j > SMC_SISLANDS_MC_REGISTER_ARRAY_SIZE) + if (j >= SMC_SISLANDS_MC_REGISTER_ARRAY_SIZE) return -EINVAL; } break; @@ -5207,7 +5231,7 @@ static int si_set_mc_special_registers(struct radeon_device *rdev, (temp_reg & 0xffff0000) | (table->mc_reg_table_entry[k].mc_data[i] & 0x0000ffff); j++; - if (j > SMC_SISLANDS_MC_REGISTER_ARRAY_SIZE) + if (j >= SMC_SISLANDS_MC_REGISTER_ARRAY_SIZE) return -EINVAL; break; default: diff --git a/drivers/gpu/drm/radeon/sid.h b/drivers/gpu/drm/radeon/sid.h index 52d2ab6b67a0..7e2e0ea66a00 100644 --- a/drivers/gpu/drm/radeon/sid.h +++ b/drivers/gpu/drm/radeon/sid.h @@ -1553,7 +1553,7 @@ * 6. COMMAND [30:21] | BYTE_COUNT [20:0] */ # define PACKET3_CP_DMA_DST_SEL(x) ((x) << 20) - /* 0 - SRC_ADDR + /* 0 - DST_ADDR * 1 - GDS */ # define PACKET3_CP_DMA_ENGINE(x) ((x) << 27) @@ -1568,7 +1568,7 @@ # define PACKET3_CP_DMA_CP_SYNC (1 << 31) /* COMMAND */ # define PACKET3_CP_DMA_DIS_WC (1 << 21) -# define PACKET3_CP_DMA_CMD_SRC_SWAP(x) ((x) << 23) +# define PACKET3_CP_DMA_CMD_SRC_SWAP(x) ((x) << 22) /* 0 - none * 1 - 8 in 16 * 2 - 8 in 32 diff --git a/drivers/gpu/drm/radeon/trinity_dpm.c b/drivers/gpu/drm/radeon/trinity_dpm.c index 7f998bf1cc9d..9364129ba292 100644 --- a/drivers/gpu/drm/radeon/trinity_dpm.c +++ b/drivers/gpu/drm/radeon/trinity_dpm.c @@ -1868,7 +1868,7 @@ int trinity_dpm_init(struct radeon_device *rdev) for (i = 0; i < SUMO_MAX_HARDWARE_POWERLEVELS; i++) pi->at[i] = TRINITY_AT_DFLT; - pi->enable_bapm = true; + pi->enable_bapm = false; pi->enable_nbps_policy = true; pi->enable_sclk_ds = true; pi->enable_gfx_power_gating = true; diff --git a/drivers/gpu/drm/radeon/uvd_v1_0.c b/drivers/gpu/drm/radeon/uvd_v1_0.c index 7266805d9786..3100fa9cb52f 100644 --- a/drivers/gpu/drm/radeon/uvd_v1_0.c +++ b/drivers/gpu/drm/radeon/uvd_v1_0.c @@ -212,8 +212,8 @@ int uvd_v1_0_start(struct radeon_device *rdev) /* enable VCPU clock */ WREG32(UVD_VCPU_CNTL, 1 << 9); - /* enable UMC */ - WREG32_P(UVD_LMI_CTRL2, 0, ~(1 << 8)); + /* enable UMC and NC0 */ + WREG32_P(UVD_LMI_CTRL2, 1 << 13, ~((1 << 8) | (1 << 13))); /* boot up the VCPU */ WREG32(UVD_SOFT_RESET, 0); diff --git a/drivers/hid/Kconfig b/drivers/hid/Kconfig index 71b70e3a7a71..c91d547191dd 100644 --- a/drivers/hid/Kconfig +++ b/drivers/hid/Kconfig @@ -241,6 +241,7 @@ config HID_HOLTEK - Sharkoon Drakonia / Perixx MX-2000 gaming mice - Tracer Sniper TRM-503 / NOVA Gaming Slider X200 / Zalman ZM-GM1 + - SHARKOON DarkGlider Gaming mouse config HOLTEK_FF bool "Holtek On Line Grip force feedback support" diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c index b8470b1a10fe..5a8c01112a23 100644 --- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -1715,6 +1715,7 @@ static const struct hid_device_id hid_have_special_driver[] = { { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, USB_DEVICE_ID_HOLTEK_ALT_KEYBOARD) }, { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A04A) }, { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A067) }, + { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A081) }, { HID_USB_DEVICE(USB_VENDOR_ID_HUION, USB_DEVICE_ID_HUION_580) }, { HID_USB_DEVICE(USB_VENDOR_ID_JESS2, USB_DEVICE_ID_JESS2_COLOR_RUMBLE_PAD) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_ION, USB_DEVICE_ID_ICADE) }, diff --git a/drivers/hid/hid-holtek-mouse.c b/drivers/hid/hid-holtek-mouse.c index 7e6db3cf46f9..e696566cde46 100644 --- a/drivers/hid/hid-holtek-mouse.c +++ b/drivers/hid/hid-holtek-mouse.c @@ -27,6 +27,7 @@ * - USB ID 04d9:a067, sold as Sharkoon Drakonia and Perixx MX-2000 * - USB ID 04d9:a04a, sold as Tracer Sniper TRM-503, NOVA Gaming Slider X200 * and Zalman ZM-GM1 + * - USB ID 04d9:a081, sold as SHARKOON DarkGlider Gaming mouse */ static __u8 *holtek_mouse_report_fixup(struct hid_device *hdev, __u8 *rdesc, @@ -46,6 +47,7 @@ static __u8 *holtek_mouse_report_fixup(struct hid_device *hdev, __u8 *rdesc, } break; case USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A04A: + case USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A081: if (*rsize >= 113 && rdesc[106] == 0xff && rdesc[107] == 0x7f && rdesc[111] == 0xff && rdesc[112] == 0x7f) { hid_info(hdev, "Fixing up report descriptor\n"); @@ -63,6 +65,8 @@ static const struct hid_device_id holtek_mouse_devices[] = { USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A067) }, { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A04A) }, + { HID_USB_DEVICE(USB_VENDOR_ID_HOLTEK_ALT, + USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A081) }, { } }; MODULE_DEVICE_TABLE(hid, holtek_mouse_devices); diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index e60e8d530697..9cbc7ab07dfa 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -450,6 +450,7 @@ #define USB_DEVICE_ID_HOLTEK_ALT_KEYBOARD 0xa055 #define USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A067 0xa067 #define USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A04A 0xa04a +#define USB_DEVICE_ID_HOLTEK_ALT_MOUSE_A081 0xa081 #define USB_VENDOR_ID_IMATION 0x0718 #define USB_DEVICE_ID_DISC_STAKKA 0xd000 diff --git a/drivers/hid/hid-roccat-kone.c b/drivers/hid/hid-roccat-kone.c index 602c188e9d86..6101816a7ddd 100644 --- a/drivers/hid/hid-roccat-kone.c +++ b/drivers/hid/hid-roccat-kone.c @@ -382,7 +382,7 @@ static ssize_t kone_sysfs_write_profilex(struct file *fp, } #define PROFILE_ATTR(number) \ static struct bin_attribute bin_attr_profile##number = { \ - .attr = { .name = "profile##number", .mode = 0660 }, \ + .attr = { .name = "profile" #number, .mode = 0660 }, \ .size = sizeof(struct kone_profile), \ .read = kone_sysfs_read_profilex, \ .write = kone_sysfs_write_profilex, \ diff --git a/drivers/hid/hid-roccat-koneplus.c b/drivers/hid/hid-roccat-koneplus.c index 5ddf605b6b89..5e99fcdc71b9 100644 --- a/drivers/hid/hid-roccat-koneplus.c +++ b/drivers/hid/hid-roccat-koneplus.c @@ -229,13 +229,13 @@ static ssize_t koneplus_sysfs_read_profilex_buttons(struct file *fp, #define PROFILE_ATTR(number) \ static struct bin_attribute bin_attr_profile##number##_settings = { \ - .attr = { .name = "profile##number##_settings", .mode = 0440 }, \ + .attr = { .name = "profile" #number "_settings", .mode = 0440 }, \ .size = KONEPLUS_SIZE_PROFILE_SETTINGS, \ .read = koneplus_sysfs_read_profilex_settings, \ .private = &profile_numbers[number-1], \ }; \ static struct bin_attribute bin_attr_profile##number##_buttons = { \ - .attr = { .name = "profile##number##_buttons", .mode = 0440 }, \ + .attr = { .name = "profile" #number "_buttons", .mode = 0440 }, \ .size = KONEPLUS_SIZE_PROFILE_BUTTONS, \ .read = koneplus_sysfs_read_profilex_buttons, \ .private = &profile_numbers[number-1], \ diff --git a/drivers/hid/hid-roccat-kovaplus.c b/drivers/hid/hid-roccat-kovaplus.c index 515bc03136c0..0c8e1ef0b67d 100644 --- a/drivers/hid/hid-roccat-kovaplus.c +++ b/drivers/hid/hid-roccat-kovaplus.c @@ -257,13 +257,13 @@ static ssize_t kovaplus_sysfs_read_profilex_buttons(struct file *fp, #define PROFILE_ATTR(number) \ static struct bin_attribute bin_attr_profile##number##_settings = { \ - .attr = { .name = "profile##number##_settings", .mode = 0440 }, \ + .attr = { .name = "profile" #number "_settings", .mode = 0440 }, \ .size = KOVAPLUS_SIZE_PROFILE_SETTINGS, \ .read = kovaplus_sysfs_read_profilex_settings, \ .private = &profile_numbers[number-1], \ }; \ static struct bin_attribute bin_attr_profile##number##_buttons = { \ - .attr = { .name = "profile##number##_buttons", .mode = 0440 }, \ + .attr = { .name = "profile" #number "_buttons", .mode = 0440 }, \ .size = KOVAPLUS_SIZE_PROFILE_BUTTONS, \ .read = kovaplus_sysfs_read_profilex_buttons, \ .private = &profile_numbers[number-1], \ diff --git a/drivers/hid/hid-roccat-pyra.c b/drivers/hid/hid-roccat-pyra.c index 5a6dbbeee790..1a07e07d99a0 100644 --- a/drivers/hid/hid-roccat-pyra.c +++ b/drivers/hid/hid-roccat-pyra.c @@ -225,13 +225,13 @@ static ssize_t pyra_sysfs_read_profilex_buttons(struct file *fp, #define PROFILE_ATTR(number) \ static struct bin_attribute bin_attr_profile##number##_settings = { \ - .attr = { .name = "profile##number##_settings", .mode = 0440 }, \ + .attr = { .name = "profile" #number "_settings", .mode = 0440 }, \ .size = PYRA_SIZE_PROFILE_SETTINGS, \ .read = pyra_sysfs_read_profilex_settings, \ .private = &profile_numbers[number-1], \ }; \ static struct bin_attribute bin_attr_profile##number##_buttons = { \ - .attr = { .name = "profile##number##_buttons", .mode = 0440 }, \ + .attr = { .name = "profile" #number "_buttons", .mode = 0440 }, \ .size = PYRA_SIZE_PROFILE_BUTTONS, \ .read = pyra_sysfs_read_profilex_buttons, \ .private = &profile_numbers[number-1], \ diff --git a/drivers/hid/hid-wiimote-modules.c b/drivers/hid/hid-wiimote-modules.c index 2e7d644dba18..71adf9e60b13 100644 --- a/drivers/hid/hid-wiimote-modules.c +++ b/drivers/hid/hid-wiimote-modules.c @@ -119,12 +119,22 @@ static const struct wiimod_ops wiimod_keys = { * the rumble motor, this flag shouldn't be set. */ +/* used by wiimod_rumble and wiipro_rumble */ +static void wiimod_rumble_worker(struct work_struct *work) +{ + struct wiimote_data *wdata = container_of(work, struct wiimote_data, + rumble_worker); + + spin_lock_irq(&wdata->state.lock); + wiiproto_req_rumble(wdata, wdata->state.cache_rumble); + spin_unlock_irq(&wdata->state.lock); +} + static int wiimod_rumble_play(struct input_dev *dev, void *data, struct ff_effect *eff) { struct wiimote_data *wdata = input_get_drvdata(dev); __u8 value; - unsigned long flags; /* * The wiimote supports only a single rumble motor so if any magnitude @@ -137,9 +147,10 @@ static int wiimod_rumble_play(struct input_dev *dev, void *data, else value = 0; - spin_lock_irqsave(&wdata->state.lock, flags); - wiiproto_req_rumble(wdata, value); - spin_unlock_irqrestore(&wdata->state.lock, flags); + /* Locking state.lock here might deadlock with input_event() calls. + * schedule_work acts as barrier. Merging multiple changes is fine. */ + wdata->state.cache_rumble = value; + schedule_work(&wdata->rumble_worker); return 0; } @@ -147,6 +158,8 @@ static int wiimod_rumble_play(struct input_dev *dev, void *data, static int wiimod_rumble_probe(const struct wiimod_ops *ops, struct wiimote_data *wdata) { + INIT_WORK(&wdata->rumble_worker, wiimod_rumble_worker); + set_bit(FF_RUMBLE, wdata->input->ffbit); if (input_ff_create_memless(wdata->input, NULL, wiimod_rumble_play)) return -ENOMEM; @@ -159,6 +172,8 @@ static void wiimod_rumble_remove(const struct wiimod_ops *ops, { unsigned long flags; + cancel_work_sync(&wdata->rumble_worker); + spin_lock_irqsave(&wdata->state.lock, flags); wiiproto_req_rumble(wdata, 0); spin_unlock_irqrestore(&wdata->state.lock, flags); @@ -1731,7 +1746,6 @@ static int wiimod_pro_play(struct input_dev *dev, void *data, { struct wiimote_data *wdata = input_get_drvdata(dev); __u8 value; - unsigned long flags; /* * The wiimote supports only a single rumble motor so if any magnitude @@ -1744,9 +1758,10 @@ static int wiimod_pro_play(struct input_dev *dev, void *data, else value = 0; - spin_lock_irqsave(&wdata->state.lock, flags); - wiiproto_req_rumble(wdata, value); - spin_unlock_irqrestore(&wdata->state.lock, flags); + /* Locking state.lock here might deadlock with input_event() calls. + * schedule_work acts as barrier. Merging multiple changes is fine. */ + wdata->state.cache_rumble = value; + schedule_work(&wdata->rumble_worker); return 0; } @@ -1756,6 +1771,8 @@ static int wiimod_pro_probe(const struct wiimod_ops *ops, { int ret, i; + INIT_WORK(&wdata->rumble_worker, wiimod_rumble_worker); + wdata->extension.input = input_allocate_device(); if (!wdata->extension.input) return -ENOMEM; @@ -1817,12 +1834,13 @@ static void wiimod_pro_remove(const struct wiimod_ops *ops, if (!wdata->extension.input) return; + input_unregister_device(wdata->extension.input); + wdata->extension.input = NULL; + cancel_work_sync(&wdata->rumble_worker); + spin_lock_irqsave(&wdata->state.lock, flags); wiiproto_req_rumble(wdata, 0); spin_unlock_irqrestore(&wdata->state.lock, flags); - - input_unregister_device(wdata->extension.input); - wdata->extension.input = NULL; } static const struct wiimod_ops wiimod_pro = { diff --git a/drivers/hid/hid-wiimote.h b/drivers/hid/hid-wiimote.h index f1474f372c0b..75db0c400037 100644 --- a/drivers/hid/hid-wiimote.h +++ b/drivers/hid/hid-wiimote.h @@ -133,13 +133,15 @@ struct wiimote_state { __u8 *cmd_read_buf; __u8 cmd_read_size; - /* calibration data */ + /* calibration/cache data */ __u16 calib_bboard[4][3]; + __u8 cache_rumble; }; struct wiimote_data { struct hid_device *hdev; struct input_dev *input; + struct work_struct rumble_worker; struct led_classdev *leds[4]; struct input_dev *accel; struct input_dev *ir; diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c index 8918dd12bb69..6a6dd5cd7833 100644 --- a/drivers/hid/hidraw.c +++ b/drivers/hid/hidraw.c @@ -308,18 +308,25 @@ static int hidraw_fasync(int fd, struct file *file, int on) static void drop_ref(struct hidraw *hidraw, int exists_bit) { if (exists_bit) { - hid_hw_close(hidraw->hid); hidraw->exist = 0; - if (hidraw->open) + if (hidraw->open) { + hid_hw_close(hidraw->hid); wake_up_interruptible(&hidraw->wait); + } } else { --hidraw->open; } - - if (!hidraw->open && !hidraw->exist) { - device_destroy(hidraw_class, MKDEV(hidraw_major, hidraw->minor)); - hidraw_table[hidraw->minor] = NULL; - kfree(hidraw); + if (!hidraw->open) { + if (!hidraw->exist) { + device_destroy(hidraw_class, + MKDEV(hidraw_major, hidraw->minor)); + hidraw_table[hidraw->minor] = NULL; + kfree(hidraw); + } else { + /* close device for last reader */ + hid_hw_power(hidraw->hid, PM_HINT_NORMAL); + hid_hw_close(hidraw->hid); + } } } diff --git a/drivers/hid/uhid.c b/drivers/hid/uhid.c index 5bf2fb785844..93b00d76374c 100644 --- a/drivers/hid/uhid.c +++ b/drivers/hid/uhid.c @@ -615,7 +615,7 @@ static const struct file_operations uhid_fops = { static struct miscdevice uhid_misc = { .fops = &uhid_fops, - .minor = MISC_DYNAMIC_MINOR, + .minor = UHID_MINOR, .name = UHID_NAME, }; @@ -634,4 +634,5 @@ module_exit(uhid_exit); MODULE_LICENSE("GPL"); MODULE_AUTHOR("David Herrmann "); MODULE_DESCRIPTION("User-space I/O driver support for HID subsystem"); +MODULE_ALIAS_MISCDEV(UHID_MINOR); MODULE_ALIAS("devname:" UHID_NAME); diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c index 8f4743ab5fb2..936093e0271e 100644 --- a/drivers/hv/connection.c +++ b/drivers/hv/connection.c @@ -195,7 +195,7 @@ int vmbus_connect(void) do { ret = vmbus_negotiate_version(msginfo, version); - if (ret) + if (ret == -ETIMEDOUT) goto cleanup; if (vmbus_connection.conn_state == CONNECTED) diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c index 28b03325b872..09988b289622 100644 --- a/drivers/hv/hv_kvp.c +++ b/drivers/hv/hv_kvp.c @@ -32,13 +32,17 @@ /* * Pre win8 version numbers used in ws2008 and ws 2008 r2 (win7) */ +#define WS2008_SRV_MAJOR 1 +#define WS2008_SRV_MINOR 0 +#define WS2008_SRV_VERSION (WS2008_SRV_MAJOR << 16 | WS2008_SRV_MINOR) + #define WIN7_SRV_MAJOR 3 #define WIN7_SRV_MINOR 0 -#define WIN7_SRV_MAJOR_MINOR (WIN7_SRV_MAJOR << 16 | WIN7_SRV_MINOR) +#define WIN7_SRV_VERSION (WIN7_SRV_MAJOR << 16 | WIN7_SRV_MINOR) #define WIN8_SRV_MAJOR 4 #define WIN8_SRV_MINOR 0 -#define WIN8_SRV_MAJOR_MINOR (WIN8_SRV_MAJOR << 16 | WIN8_SRV_MINOR) +#define WIN8_SRV_VERSION (WIN8_SRV_MAJOR << 16 | WIN8_SRV_MINOR) /* * Global state maintained for transaction that is being processed. @@ -587,6 +591,8 @@ void hv_kvp_onchannelcallback(void *context) struct icmsg_hdr *icmsghdrp; struct icmsg_negotiate *negop = NULL; + int util_fw_version; + int kvp_srv_version; if (kvp_transaction.active) { /* @@ -606,17 +612,26 @@ void hv_kvp_onchannelcallback(void *context) if (icmsghdrp->icmsgtype == ICMSGTYPE_NEGOTIATE) { /* - * We start with win8 version and if the host cannot - * support that we use the previous version. + * Based on the host, select appropriate + * framework and service versions we will + * negotiate. */ - if (vmbus_prep_negotiate_resp(icmsghdrp, negop, - recv_buffer, UTIL_FW_MAJOR_MINOR, - WIN8_SRV_MAJOR_MINOR)) - goto done; - + switch (vmbus_proto_version) { + case (VERSION_WS2008): + util_fw_version = UTIL_WS2K8_FW_VERSION; + kvp_srv_version = WS2008_SRV_VERSION; + break; + case (VERSION_WIN7): + util_fw_version = UTIL_FW_VERSION; + kvp_srv_version = WIN7_SRV_VERSION; + break; + default: + util_fw_version = UTIL_FW_VERSION; + kvp_srv_version = WIN8_SRV_VERSION; + } vmbus_prep_negotiate_resp(icmsghdrp, negop, - recv_buffer, UTIL_FW_MAJOR_MINOR, - WIN7_SRV_MAJOR_MINOR); + recv_buffer, util_fw_version, + kvp_srv_version); } else { kvp_msg = (struct hv_kvp_msg *)&recv_buffer[ @@ -649,7 +664,6 @@ void hv_kvp_onchannelcallback(void *context) return; } -done: icmsghdrp->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE; diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c index e4572f3f2834..0c3546224376 100644 --- a/drivers/hv/hv_snapshot.c +++ b/drivers/hv/hv_snapshot.c @@ -26,7 +26,7 @@ #define VSS_MAJOR 5 #define VSS_MINOR 0 -#define VSS_MAJOR_MINOR (VSS_MAJOR << 16 | VSS_MINOR) +#define VSS_VERSION (VSS_MAJOR << 16 | VSS_MINOR) @@ -190,8 +190,8 @@ void hv_vss_onchannelcallback(void *context) if (icmsghdrp->icmsgtype == ICMSGTYPE_NEGOTIATE) { vmbus_prep_negotiate_resp(icmsghdrp, negop, - recv_buffer, UTIL_FW_MAJOR_MINOR, - VSS_MAJOR_MINOR); + recv_buffer, UTIL_FW_VERSION, + VSS_VERSION); } else { vss_msg = (struct hv_vss_msg *)&recv_buffer[ sizeof(struct vmbuspipe_hdr) + diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c index cb82233541b1..273e3ddb3a20 100644 --- a/drivers/hv/hv_util.c +++ b/drivers/hv/hv_util.c @@ -28,17 +28,32 @@ #include #include -#define SHUTDOWN_MAJOR 3 -#define SHUTDOWN_MINOR 0 -#define SHUTDOWN_MAJOR_MINOR (SHUTDOWN_MAJOR << 16 | SHUTDOWN_MINOR) -#define TIMESYNCH_MAJOR 3 -#define TIMESYNCH_MINOR 0 -#define TIMESYNCH_MAJOR_MINOR (TIMESYNCH_MAJOR << 16 | TIMESYNCH_MINOR) +#define SD_MAJOR 3 +#define SD_MINOR 0 +#define SD_VERSION (SD_MAJOR << 16 | SD_MINOR) -#define HEARTBEAT_MAJOR 3 -#define HEARTBEAT_MINOR 0 -#define HEARTBEAT_MAJOR_MINOR (HEARTBEAT_MAJOR << 16 | HEARTBEAT_MINOR) +#define SD_WS2008_MAJOR 1 +#define SD_WS2008_VERSION (SD_WS2008_MAJOR << 16 | SD_MINOR) + +#define TS_MAJOR 3 +#define TS_MINOR 0 +#define TS_VERSION (TS_MAJOR << 16 | TS_MINOR) + +#define TS_WS2008_MAJOR 1 +#define TS_WS2008_VERSION (TS_WS2008_MAJOR << 16 | TS_MINOR) + +#define HB_MAJOR 3 +#define HB_MINOR 0 +#define HB_VERSION (HB_MAJOR << 16 | HB_MINOR) + +#define HB_WS2008_MAJOR 1 +#define HB_WS2008_VERSION (HB_WS2008_MAJOR << 16 | HB_MINOR) + +static int sd_srv_version; +static int ts_srv_version; +static int hb_srv_version; +static int util_fw_version; static void shutdown_onchannelcallback(void *context); static struct hv_util_service util_shutdown = { @@ -99,8 +114,8 @@ static void shutdown_onchannelcallback(void *context) if (icmsghdrp->icmsgtype == ICMSGTYPE_NEGOTIATE) { vmbus_prep_negotiate_resp(icmsghdrp, negop, - shut_txf_buf, UTIL_FW_MAJOR_MINOR, - SHUTDOWN_MAJOR_MINOR); + shut_txf_buf, util_fw_version, + sd_srv_version); } else { shutdown_msg = (struct shutdown_msg_data *)&shut_txf_buf[ @@ -216,6 +231,7 @@ static void timesync_onchannelcallback(void *context) struct icmsg_hdr *icmsghdrp; struct ictimesync_data *timedatap; u8 *time_txf_buf = util_timesynch.recv_buffer; + struct icmsg_negotiate *negop = NULL; vmbus_recvpacket(channel, time_txf_buf, PAGE_SIZE, &recvlen, &requestid); @@ -225,9 +241,10 @@ static void timesync_onchannelcallback(void *context) sizeof(struct vmbuspipe_hdr)]; if (icmsghdrp->icmsgtype == ICMSGTYPE_NEGOTIATE) { - vmbus_prep_negotiate_resp(icmsghdrp, NULL, time_txf_buf, - UTIL_FW_MAJOR_MINOR, - TIMESYNCH_MAJOR_MINOR); + vmbus_prep_negotiate_resp(icmsghdrp, negop, + time_txf_buf, + util_fw_version, + ts_srv_version); } else { timedatap = (struct ictimesync_data *)&time_txf_buf[ sizeof(struct vmbuspipe_hdr) + @@ -257,6 +274,7 @@ static void heartbeat_onchannelcallback(void *context) struct icmsg_hdr *icmsghdrp; struct heartbeat_msg_data *heartbeat_msg; u8 *hbeat_txf_buf = util_heartbeat.recv_buffer; + struct icmsg_negotiate *negop = NULL; vmbus_recvpacket(channel, hbeat_txf_buf, PAGE_SIZE, &recvlen, &requestid); @@ -266,9 +284,9 @@ static void heartbeat_onchannelcallback(void *context) sizeof(struct vmbuspipe_hdr)]; if (icmsghdrp->icmsgtype == ICMSGTYPE_NEGOTIATE) { - vmbus_prep_negotiate_resp(icmsghdrp, NULL, - hbeat_txf_buf, UTIL_FW_MAJOR_MINOR, - HEARTBEAT_MAJOR_MINOR); + vmbus_prep_negotiate_resp(icmsghdrp, negop, + hbeat_txf_buf, util_fw_version, + hb_srv_version); } else { heartbeat_msg = (struct heartbeat_msg_data *)&hbeat_txf_buf[ @@ -321,6 +339,25 @@ static int util_probe(struct hv_device *dev, goto error; hv_set_drvdata(dev, srv); + /* + * Based on the host; initialize the framework and + * service version numbers we will negotiate. + */ + switch (vmbus_proto_version) { + case (VERSION_WS2008): + util_fw_version = UTIL_WS2K8_FW_VERSION; + sd_srv_version = SD_WS2008_VERSION; + ts_srv_version = TS_WS2008_VERSION; + hb_srv_version = HB_WS2008_VERSION; + break; + + default: + util_fw_version = UTIL_FW_VERSION; + sd_srv_version = SD_VERSION; + ts_srv_version = TS_VERSION; + hb_srv_version = HB_VERSION; + } + return 0; error: diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c index 62c2e32e25ef..3288f13d2d87 100644 --- a/drivers/hwmon/applesmc.c +++ b/drivers/hwmon/applesmc.c @@ -230,6 +230,7 @@ static int send_argument(const char *key) static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) { + u8 status, data = 0; int i; if (send_command(cmd) || send_argument(key)) { @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) return -EIO; } + /* This has no effect on newer (2012) SMCs */ if (send_byte(len, APPLESMC_DATA_PORT)) { pr_warn("%.4s: read len fail\n", key); return -EIO; @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) buffer[i] = inb(APPLESMC_DATA_PORT); } + /* Read the data port until bit0 is cleared */ + for (i = 0; i < 16; i++) { + udelay(APPLESMC_MIN_WAIT); + status = inb(APPLESMC_CMD_PORT); + if (!(status & 0x01)) + break; + data = inb(APPLESMC_DATA_PORT); + } + if (i) + pr_warn("flushed %d bytes, last value is: %d\n", i, data); + return 0; } @@ -525,16 +538,25 @@ static int applesmc_init_smcreg_try(void) { struct applesmc_registers *s = &smcreg; bool left_light_sensor, right_light_sensor; + unsigned int count; u8 tmp[1]; int ret; if (s->init_complete) return 0; - ret = read_register_count(&s->key_count); + ret = read_register_count(&count); if (ret) return ret; + if (s->cache && s->key_count != count) { + pr_warn("key count changed from %d to %d\n", + s->key_count, count); + kfree(s->cache); + s->cache = NULL; + } + s->key_count = count; + if (!s->cache) s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL); if (!s->cache) diff --git a/drivers/i2c/busses/i2c-designware-core.c b/drivers/i2c/busses/i2c-designware-core.c index dbecf08399f8..5888feef1ac5 100644 --- a/drivers/i2c/busses/i2c-designware-core.c +++ b/drivers/i2c/busses/i2c-designware-core.c @@ -98,6 +98,8 @@ #define DW_IC_ERR_TX_ABRT 0x1 +#define DW_IC_TAR_10BITADDR_MASTER BIT(12) + /* * status codes */ @@ -388,22 +390,34 @@ static int i2c_dw_wait_bus_not_busy(struct dw_i2c_dev *dev) static void i2c_dw_xfer_init(struct dw_i2c_dev *dev) { struct i2c_msg *msgs = dev->msgs; - u32 ic_con; + u32 ic_con, ic_tar = 0; /* Disable the adapter */ __i2c_dw_enable(dev, false); - /* set the slave (target) address */ - dw_writel(dev, msgs[dev->msg_write_idx].addr, DW_IC_TAR); - /* if the slave address is ten bit address, enable 10BITADDR */ ic_con = dw_readl(dev, DW_IC_CON); - if (msgs[dev->msg_write_idx].flags & I2C_M_TEN) + if (msgs[dev->msg_write_idx].flags & I2C_M_TEN) { ic_con |= DW_IC_CON_10BITADDR_MASTER; - else + /* + * If I2C_DYNAMIC_TAR_UPDATE is set, the 10-bit addressing + * mode has to be enabled via bit 12 of IC_TAR register. + * We set it always as I2C_DYNAMIC_TAR_UPDATE can't be + * detected from registers. + */ + ic_tar = DW_IC_TAR_10BITADDR_MASTER; + } else { ic_con &= ~DW_IC_CON_10BITADDR_MASTER; + } + dw_writel(dev, ic_con, DW_IC_CON); + /* + * Set the slave (target) address and enable 10-bit addressing mode + * if applicable. + */ + dw_writel(dev, msgs[dev->msg_write_idx].addr | ic_tar, DW_IC_TAR); + /* Enable the adapter */ __i2c_dw_enable(dev, true); diff --git a/drivers/i2c/busses/i2c-designware-platdrv.c b/drivers/i2c/busses/i2c-designware-platdrv.c index 4c1b60539a25..0aa01136f8d9 100644 --- a/drivers/i2c/busses/i2c-designware-platdrv.c +++ b/drivers/i2c/busses/i2c-designware-platdrv.c @@ -270,7 +270,8 @@ static SIMPLE_DEV_PM_OPS(dw_i2c_dev_pm_ops, dw_i2c_suspend, dw_i2c_resume); MODULE_ALIAS("platform:i2c_designware"); static struct platform_driver dw_i2c_driver = { - .remove = dw_i2c_remove, + .probe = dw_i2c_probe, + .remove = dw_i2c_remove, .driver = { .name = "i2c_designware", .owner = THIS_MODULE, @@ -282,7 +283,7 @@ static struct platform_driver dw_i2c_driver = { static int __init dw_i2c_init_driver(void) { - return platform_driver_probe(&dw_i2c_driver, dw_i2c_probe); + return platform_driver_register(&dw_i2c_driver); } subsys_initcall(dw_i2c_init_driver); diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c index ccf46656bdad..1d7efa3169cd 100644 --- a/drivers/i2c/busses/i2c-imx.c +++ b/drivers/i2c/busses/i2c-imx.c @@ -365,7 +365,7 @@ static void i2c_imx_stop(struct imx_i2c_struct *i2c_imx) clk_disable_unprepare(i2c_imx->clk); } -static void __init i2c_imx_set_clk(struct imx_i2c_struct *i2c_imx, +static void i2c_imx_set_clk(struct imx_i2c_struct *i2c_imx, unsigned int rate) { struct imx_i2c_clk_pair *i2c_clk_div = i2c_imx->hwdata->clk_div; @@ -589,7 +589,7 @@ static struct i2c_algorithm i2c_imx_algo = { .functionality = i2c_imx_func, }; -static int __init i2c_imx_probe(struct platform_device *pdev) +static int i2c_imx_probe(struct platform_device *pdev) { const struct of_device_id *of_id = of_match_device(i2c_imx_dt_ids, &pdev->dev); @@ -697,7 +697,7 @@ static int __init i2c_imx_probe(struct platform_device *pdev) return 0; /* Return OK */ } -static int __exit i2c_imx_remove(struct platform_device *pdev) +static int i2c_imx_remove(struct platform_device *pdev) { struct imx_i2c_struct *i2c_imx = platform_get_drvdata(pdev); @@ -715,7 +715,8 @@ static int __exit i2c_imx_remove(struct platform_device *pdev) } static struct platform_driver i2c_imx_driver = { - .remove = __exit_p(i2c_imx_remove), + .probe = i2c_imx_probe, + .remove = i2c_imx_remove, .driver = { .name = DRIVER_NAME, .owner = THIS_MODULE, @@ -726,7 +727,7 @@ static struct platform_driver i2c_imx_driver = { static int __init i2c_adap_imx_init(void) { - return platform_driver_probe(&i2c_imx_driver, i2c_imx_probe); + return platform_driver_register(&i2c_imx_driver); } subsys_initcall(i2c_adap_imx_init); diff --git a/drivers/i2c/busses/i2c-ismt.c b/drivers/i2c/busses/i2c-ismt.c index 8ed79a086f85..1672effbcebb 100644 --- a/drivers/i2c/busses/i2c-ismt.c +++ b/drivers/i2c/busses/i2c-ismt.c @@ -393,6 +393,9 @@ static int ismt_access(struct i2c_adapter *adap, u16 addr, desc = &priv->hw[priv->head]; + /* Initialize the DMA buffer */ + memset(priv->dma_buffer, 0, sizeof(priv->dma_buffer)); + /* Initialize the descriptor */ memset(desc, 0, sizeof(struct ismt_desc)); desc->tgtaddr_rw = ISMT_DESC_ADDR_RW(addr, read_write); diff --git a/drivers/i2c/busses/i2c-mv64xxx.c b/drivers/i2c/busses/i2c-mv64xxx.c index 7f3a47443494..d3e9cc3153a9 100644 --- a/drivers/i2c/busses/i2c-mv64xxx.c +++ b/drivers/i2c/busses/i2c-mv64xxx.c @@ -234,9 +234,9 @@ static int mv64xxx_i2c_offload_msg(struct mv64xxx_i2c_data *drv_data) ctrl_reg |= MV64XXX_I2C_BRIDGE_CONTROL_WR | (msg->len - 1) << MV64XXX_I2C_BRIDGE_CONTROL_TX_SIZE_SHIFT; - writel_relaxed(data_reg_lo, + writel(data_reg_lo, drv_data->reg_base + MV64XXX_I2C_REG_TX_DATA_LO); - writel_relaxed(data_reg_hi, + writel(data_reg_hi, drv_data->reg_base + MV64XXX_I2C_REG_TX_DATA_HI); } else { @@ -697,6 +697,7 @@ static const struct of_device_id mv64xxx_i2c_of_match_table[] = { MODULE_DEVICE_TABLE(of, mv64xxx_i2c_of_match_table); #ifdef CONFIG_OF +#ifdef CONFIG_HAVE_CLK static int mv64xxx_calc_freq(const int tclk, const int n, const int m) { @@ -726,16 +727,12 @@ mv64xxx_find_baud_factors(const u32 req_freq, const u32 tclk, u32 *best_n, return false; return true; } +#endif /* CONFIG_HAVE_CLK */ static int mv64xxx_of_config(struct mv64xxx_i2c_data *drv_data, struct device *dev) { - const struct of_device_id *device; - struct device_node *np = dev->of_node; - u32 bus_freq, tclk; - int rc = 0; - /* CLK is mandatory when using DT to describe the i2c bus. We * need to know tclk in order to calculate bus clock * factors. @@ -744,6 +741,11 @@ mv64xxx_of_config(struct mv64xxx_i2c_data *drv_data, /* Have OF but no CLK */ return -ENODEV; #else + const struct of_device_id *device; + struct device_node *np = dev->of_node; + u32 bus_freq, tclk; + int rc = 0; + if (IS_ERR(drv_data->clk)) { rc = -ENODEV; goto out; diff --git a/drivers/i2c/busses/i2c-mxs.c b/drivers/i2c/busses/i2c-mxs.c index f4a01675fa71..b7c857774708 100644 --- a/drivers/i2c/busses/i2c-mxs.c +++ b/drivers/i2c/busses/i2c-mxs.c @@ -780,12 +780,13 @@ static struct platform_driver mxs_i2c_driver = { .owner = THIS_MODULE, .of_match_table = mxs_i2c_dt_ids, }, + .probe = mxs_i2c_probe, .remove = mxs_i2c_remove, }; static int __init mxs_i2c_init(void) { - return platform_driver_probe(&mxs_i2c_driver, mxs_i2c_probe); + return platform_driver_register(&mxs_i2c_driver); } subsys_initcall(mxs_i2c_init); diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c index 6d8308d5dc4e..9967a6f9c2ff 100644 --- a/drivers/i2c/busses/i2c-omap.c +++ b/drivers/i2c/busses/i2c-omap.c @@ -939,6 +939,9 @@ omap_i2c_isr_thread(int this_irq, void *dev_id) /* * ProDB0017052: Clear ARDY bit twice */ + if (stat & OMAP_I2C_STAT_ARDY) + omap_i2c_ack_stat(dev, OMAP_I2C_STAT_ARDY); + if (stat & (OMAP_I2C_STAT_ARDY | OMAP_I2C_STAT_NACK | OMAP_I2C_STAT_AL)) { omap_i2c_ack_stat(dev, (OMAP_I2C_STAT_RRDY | diff --git a/drivers/i2c/busses/i2c-s3c2410.c b/drivers/i2c/busses/i2c-s3c2410.c index 3535f3c0f7b4..3747b9bf67d6 100644 --- a/drivers/i2c/busses/i2c-s3c2410.c +++ b/drivers/i2c/busses/i2c-s3c2410.c @@ -1178,8 +1178,6 @@ static int s3c24xx_i2c_remove(struct platform_device *pdev) i2c_del_adapter(&i2c->adap); - clk_disable_unprepare(i2c->clk); - if (pdev->dev.of_node && IS_ERR(i2c->pctrl)) s3c24xx_i2c_dt_gpio_free(i2c); diff --git a/drivers/i2c/busses/i2c-stu300.c b/drivers/i2c/busses/i2c-stu300.c index f8f6f2e552db..04a17b9b38bb 100644 --- a/drivers/i2c/busses/i2c-stu300.c +++ b/drivers/i2c/busses/i2c-stu300.c @@ -859,8 +859,7 @@ static const struct i2c_algorithm stu300_algo = { .functionality = stu300_func, }; -static int __init -stu300_probe(struct platform_device *pdev) +static int stu300_probe(struct platform_device *pdev) { struct stu300_dev *dev; struct i2c_adapter *adap; @@ -966,8 +965,7 @@ static SIMPLE_DEV_PM_OPS(stu300_pm, stu300_suspend, stu300_resume); #define STU300_I2C_PM NULL #endif -static int __exit -stu300_remove(struct platform_device *pdev) +static int stu300_remove(struct platform_device *pdev) { struct stu300_dev *dev = platform_get_drvdata(pdev); @@ -989,13 +987,14 @@ static struct platform_driver stu300_i2c_driver = { .pm = STU300_I2C_PM, .of_match_table = stu300_dt_match, }, - .remove = __exit_p(stu300_remove), + .probe = stu300_probe, + .remove = stu300_remove, }; static int __init stu300_init(void) { - return platform_driver_probe(&stu300_i2c_driver, stu300_probe); + return platform_driver_register(&stu300_i2c_driver); } static void __exit stu300_exit(void) diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c index 29d3f045a2bf..3be58f89ac77 100644 --- a/drivers/i2c/i2c-core.c +++ b/drivers/i2c/i2c-core.c @@ -1134,6 +1134,9 @@ static void acpi_i2c_register_devices(struct i2c_adapter *adap) acpi_handle handle; acpi_status status; + if (!adap->dev.parent) + return; + handle = ACPI_HANDLE(adap->dev.parent); if (!handle) return; diff --git a/drivers/i2c/muxes/i2c-arb-gpio-challenge.c b/drivers/i2c/muxes/i2c-arb-gpio-challenge.c index 74b41ae690f3..928656e241dd 100644 --- a/drivers/i2c/muxes/i2c-arb-gpio-challenge.c +++ b/drivers/i2c/muxes/i2c-arb-gpio-challenge.c @@ -200,7 +200,7 @@ static int i2c_arbitrator_probe(struct platform_device *pdev) arb->parent = of_find_i2c_adapter_by_node(parent_np); if (!arb->parent) { dev_err(dev, "Cannot find parent bus\n"); - return -EINVAL; + return -EPROBE_DEFER; } /* Actually add the mux adapter */ diff --git a/drivers/i2c/muxes/i2c-mux-gpio.c b/drivers/i2c/muxes/i2c-mux-gpio.c index 5d4a99ba743e..a764da777f08 100644 --- a/drivers/i2c/muxes/i2c-mux-gpio.c +++ b/drivers/i2c/muxes/i2c-mux-gpio.c @@ -66,7 +66,7 @@ static int i2c_mux_gpio_probe_dt(struct gpiomux *mux, struct device_node *adapter_np, *child; struct i2c_adapter *adapter; unsigned *values, *gpios; - int i = 0; + int i = 0, ret; if (!np) return -ENODEV; @@ -79,7 +79,7 @@ static int i2c_mux_gpio_probe_dt(struct gpiomux *mux, adapter = of_find_i2c_adapter_by_node(adapter_np); if (!adapter) { dev_err(&pdev->dev, "Cannot find parent bus\n"); - return -ENODEV; + return -EPROBE_DEFER; } mux->data.parent = i2c_adapter_id(adapter); put_device(&adapter->dev); @@ -116,8 +116,12 @@ static int i2c_mux_gpio_probe_dt(struct gpiomux *mux, return -ENOMEM; } - for (i = 0; i < mux->data.n_gpios; i++) - gpios[i] = of_get_named_gpio(np, "mux-gpios", i); + for (i = 0; i < mux->data.n_gpios; i++) { + ret = of_get_named_gpio(np, "mux-gpios", i); + if (ret < 0) + return ret; + gpios[i] = ret; + } mux->data.gpios = gpios; @@ -177,7 +181,7 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev) if (!parent) { dev_err(&pdev->dev, "Parent adapter (%d) not found\n", mux->data.parent); - return -ENODEV; + return -EPROBE_DEFER; } mux->parent = parent; diff --git a/drivers/i2c/muxes/i2c-mux-pinctrl.c b/drivers/i2c/muxes/i2c-mux-pinctrl.c index 69a91732ae65..68a37157377d 100644 --- a/drivers/i2c/muxes/i2c-mux-pinctrl.c +++ b/drivers/i2c/muxes/i2c-mux-pinctrl.c @@ -113,7 +113,7 @@ static int i2c_mux_pinctrl_parse_dt(struct i2c_mux_pinctrl *mux, adapter = of_find_i2c_adapter_by_node(adapter_np); if (!adapter) { dev_err(mux->dev, "Cannot find parent bus\n"); - return -ENODEV; + return -EPROBE_DEFER; } mux->pdata->parent_bus_num = i2c_adapter_id(adapter); put_device(&adapter->dev); @@ -211,7 +211,7 @@ static int i2c_mux_pinctrl_probe(struct platform_device *pdev) if (!mux->parent) { dev_err(&pdev->dev, "Parent adapter (%d) not found\n", mux->pdata->parent_bus_num); - ret = -ENODEV; + ret = -EPROBE_DEFER; goto err; } diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index fa6964d8681a..f116d664b473 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -359,7 +359,7 @@ static int intel_idle(struct cpuidle_device *dev, if (!(lapic_timer_reliable_states & (1 << (cstate)))) clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu); - if (!need_resched()) { + if (!current_set_polling_and_test()) { __monitor((void *)¤t_thread_info()->flags, 0, 0); smp_mb(); diff --git a/drivers/iio/amplifiers/ad8366.c b/drivers/iio/amplifiers/ad8366.c index d0a79a4bce1c..ba6f6a91dfff 100644 --- a/drivers/iio/amplifiers/ad8366.c +++ b/drivers/iio/amplifiers/ad8366.c @@ -185,10 +185,8 @@ static int ad8366_remove(struct spi_device *spi) iio_device_unregister(indio_dev); - if (!IS_ERR(reg)) { + if (!IS_ERR(reg)) regulator_disable(reg); - regulator_put(reg); - } return 0; } diff --git a/drivers/iio/frequency/adf4350.c b/drivers/iio/frequency/adf4350.c index a7b30be86ae0..52605c0ea3a6 100644 --- a/drivers/iio/frequency/adf4350.c +++ b/drivers/iio/frequency/adf4350.c @@ -525,8 +525,10 @@ static int adf4350_probe(struct spi_device *spi) } indio_dev = devm_iio_device_alloc(&spi->dev, sizeof(*st)); - if (indio_dev == NULL) - return -ENOMEM; + if (indio_dev == NULL) { + ret = -ENOMEM; + goto error_disable_clk; + } st = iio_priv(indio_dev); diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c index 2710f7245c3b..2db7dcd826b9 100644 --- a/drivers/iio/industrialio-buffer.c +++ b/drivers/iio/industrialio-buffer.c @@ -477,6 +477,9 @@ void iio_disable_all_buffers(struct iio_dev *indio_dev) indio_dev->currentmode = INDIO_DIRECT_MODE; if (indio_dev->setup_ops->postdisable) indio_dev->setup_ops->postdisable(indio_dev); + + if (indio_dev->available_scan_masks == NULL) + kfree(indio_dev->active_scan_mask); } int iio_update_buffers(struct iio_dev *indio_dev, diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c index 8e84cd522e49..f95c6979efd8 100644 --- a/drivers/iio/industrialio-core.c +++ b/drivers/iio/industrialio-core.c @@ -852,7 +852,6 @@ static void iio_dev_release(struct device *device) iio_device_unregister_trigger_consumer(indio_dev); iio_device_unregister_eventset(indio_dev); iio_device_unregister_sysfs(indio_dev); - iio_device_unregister_debugfs(indio_dev); ida_simple_remove(&iio_ida, indio_dev->id); kfree(indio_dev); @@ -1087,6 +1086,7 @@ void iio_device_unregister(struct iio_dev *indio_dev) if (indio_dev->chrdev.dev) cdev_del(&indio_dev->chrdev); + iio_device_unregister_debugfs(indio_dev); iio_disable_all_buffers(indio_dev); diff --git a/drivers/iio/magnetometer/st_magn_core.c b/drivers/iio/magnetometer/st_magn_core.c index e8d2849cc81d..cab3bc7494a2 100644 --- a/drivers/iio/magnetometer/st_magn_core.c +++ b/drivers/iio/magnetometer/st_magn_core.c @@ -29,9 +29,9 @@ #define ST_MAGN_NUMBER_DATA_CHANNELS 3 /* DEFAULT VALUE FOR SENSORS */ -#define ST_MAGN_DEFAULT_OUT_X_L_ADDR 0X04 -#define ST_MAGN_DEFAULT_OUT_Y_L_ADDR 0X08 -#define ST_MAGN_DEFAULT_OUT_Z_L_ADDR 0X06 +#define ST_MAGN_DEFAULT_OUT_X_H_ADDR 0X03 +#define ST_MAGN_DEFAULT_OUT_Y_H_ADDR 0X07 +#define ST_MAGN_DEFAULT_OUT_Z_H_ADDR 0X05 /* FULLSCALE */ #define ST_MAGN_FS_AVL_1300MG 1300 @@ -117,16 +117,16 @@ static const struct iio_chan_spec st_magn_16bit_channels[] = { ST_SENSORS_LSM_CHANNELS(IIO_MAGN, BIT(IIO_CHAN_INFO_RAW) | BIT(IIO_CHAN_INFO_SCALE), - ST_SENSORS_SCAN_X, 1, IIO_MOD_X, 's', IIO_LE, 16, 16, - ST_MAGN_DEFAULT_OUT_X_L_ADDR), + ST_SENSORS_SCAN_X, 1, IIO_MOD_X, 's', IIO_BE, 16, 16, + ST_MAGN_DEFAULT_OUT_X_H_ADDR), ST_SENSORS_LSM_CHANNELS(IIO_MAGN, BIT(IIO_CHAN_INFO_RAW) | BIT(IIO_CHAN_INFO_SCALE), - ST_SENSORS_SCAN_Y, 1, IIO_MOD_Y, 's', IIO_LE, 16, 16, - ST_MAGN_DEFAULT_OUT_Y_L_ADDR), + ST_SENSORS_SCAN_Y, 1, IIO_MOD_Y, 's', IIO_BE, 16, 16, + ST_MAGN_DEFAULT_OUT_Y_H_ADDR), ST_SENSORS_LSM_CHANNELS(IIO_MAGN, BIT(IIO_CHAN_INFO_RAW) | BIT(IIO_CHAN_INFO_SCALE), - ST_SENSORS_SCAN_Z, 1, IIO_MOD_Z, 's', IIO_LE, 16, 16, - ST_MAGN_DEFAULT_OUT_Z_L_ADDR), + ST_SENSORS_SCAN_Z, 1, IIO_MOD_Z, 's', IIO_BE, 16, 16, + ST_MAGN_DEFAULT_OUT_Z_H_ADDR), IIO_CHAN_SOFT_TIMESTAMP(3) }; diff --git a/drivers/infiniband/hw/amso1100/c2_ae.c b/drivers/infiniband/hw/amso1100/c2_ae.c index d5d1929753e4..cedda25232be 100644 --- a/drivers/infiniband/hw/amso1100/c2_ae.c +++ b/drivers/infiniband/hw/amso1100/c2_ae.c @@ -141,7 +141,7 @@ static const char *to_qp_state_str(int state) return "C2_QP_STATE_ERROR"; default: return ""; - }; + } } void c2_ae_event(struct c2_dev *c2dev, u32 mq_index) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 3f831de9a4d8..b1a6cb3a2809 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -164,6 +164,7 @@ int mlx5_vector2eqn(struct mlx5_ib_dev *dev, int vector, int *eqn, int *irqn) static int alloc_comp_eqs(struct mlx5_ib_dev *dev) { struct mlx5_eq_table *table = &dev->mdev.priv.eq_table; + char name[MLX5_MAX_EQ_NAME]; struct mlx5_eq *eq, *n; int ncomp_vec; int nent; @@ -180,11 +181,10 @@ static int alloc_comp_eqs(struct mlx5_ib_dev *dev) goto clean; } - snprintf(eq->name, MLX5_MAX_EQ_NAME, "mlx5_comp%d", i); + snprintf(name, MLX5_MAX_EQ_NAME, "mlx5_comp%d", i); err = mlx5_create_map_eq(&dev->mdev, eq, i + MLX5_EQ_VEC_COMP_BASE, nent, 0, - eq->name, - &dev->mdev.priv.uuari.uars[0]); + name, &dev->mdev.priv.uuari.uars[0]); if (err) { kfree(eq); goto clean; @@ -301,9 +301,8 @@ static int mlx5_ib_query_device(struct ib_device *ibdev, props->max_srq_sge = max_rq_sg - 1; props->max_fast_reg_page_list_len = (unsigned int)-1; props->local_ca_ack_delay = dev->mdev.caps.local_ca_ack_delay; - props->atomic_cap = dev->mdev.caps.flags & MLX5_DEV_CAP_FLAG_ATOMIC ? - IB_ATOMIC_HCA : IB_ATOMIC_NONE; - props->masked_atomic_cap = IB_ATOMIC_HCA; + props->atomic_cap = IB_ATOMIC_NONE; + props->masked_atomic_cap = IB_ATOMIC_NONE; props->max_pkeys = be16_to_cpup((__be16 *)(out_mad->data + 28)); props->max_mcast_grp = 1 << dev->mdev.caps.log_max_mcg; props->max_mcast_qp_attach = dev->mdev.caps.max_qp_mcg; @@ -1006,6 +1005,11 @@ static void mlx5_ib_event(struct mlx5_core_dev *dev, enum mlx5_dev_event event, ibev.device = &ibdev->ib_dev; ibev.element.port_num = port; + if (port < 1 || port > ibdev->num_ports) { + mlx5_ib_warn(ibdev, "warning: event on port %d\n", port); + return; + } + if (ibdev->ib_active) ib_dispatch_event(&ibev); } diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index bd41df95b6f0..3453580b1eb2 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -42,6 +42,10 @@ enum { DEF_CACHE_SIZE = 10, }; +enum { + MLX5_UMR_ALIGN = 2048 +}; + static __be64 *mr_align(__be64 *ptr, int align) { unsigned long mask = align - 1; @@ -61,13 +65,11 @@ static int order2idx(struct mlx5_ib_dev *dev, int order) static int add_keys(struct mlx5_ib_dev *dev, int c, int num) { - struct device *ddev = dev->ib_dev.dma_device; struct mlx5_mr_cache *cache = &dev->cache; struct mlx5_cache_ent *ent = &cache->ent[c]; struct mlx5_create_mkey_mbox_in *in; struct mlx5_ib_mr *mr; int npages = 1 << ent->order; - int size = sizeof(u64) * npages; int err = 0; int i; @@ -83,21 +85,6 @@ static int add_keys(struct mlx5_ib_dev *dev, int c, int num) } mr->order = ent->order; mr->umred = 1; - mr->pas = kmalloc(size + 0x3f, GFP_KERNEL); - if (!mr->pas) { - kfree(mr); - err = -ENOMEM; - goto out; - } - mr->dma = dma_map_single(ddev, mr_align(mr->pas, 0x40), size, - DMA_TO_DEVICE); - if (dma_mapping_error(ddev, mr->dma)) { - kfree(mr->pas); - kfree(mr); - err = -ENOMEM; - goto out; - } - in->seg.status = 1 << 6; in->seg.xlt_oct_size = cpu_to_be32((npages + 1) / 2); in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8); @@ -108,8 +95,6 @@ static int add_keys(struct mlx5_ib_dev *dev, int c, int num) sizeof(*in)); if (err) { mlx5_ib_warn(dev, "create mkey failed %d\n", err); - dma_unmap_single(ddev, mr->dma, size, DMA_TO_DEVICE); - kfree(mr->pas); kfree(mr); goto out; } @@ -129,11 +114,9 @@ out: static void remove_keys(struct mlx5_ib_dev *dev, int c, int num) { - struct device *ddev = dev->ib_dev.dma_device; struct mlx5_mr_cache *cache = &dev->cache; struct mlx5_cache_ent *ent = &cache->ent[c]; struct mlx5_ib_mr *mr; - int size; int err; int i; @@ -149,14 +132,10 @@ static void remove_keys(struct mlx5_ib_dev *dev, int c, int num) ent->size--; spin_unlock(&ent->lock); err = mlx5_core_destroy_mkey(&dev->mdev, &mr->mmr); - if (err) { + if (err) mlx5_ib_warn(dev, "failed destroy mkey\n"); - } else { - size = ALIGN(sizeof(u64) * (1 << mr->order), 0x40); - dma_unmap_single(ddev, mr->dma, size, DMA_TO_DEVICE); - kfree(mr->pas); + else kfree(mr); - } } } @@ -408,13 +387,12 @@ static void free_cached_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr) static void clean_keys(struct mlx5_ib_dev *dev, int c) { - struct device *ddev = dev->ib_dev.dma_device; struct mlx5_mr_cache *cache = &dev->cache; struct mlx5_cache_ent *ent = &cache->ent[c]; struct mlx5_ib_mr *mr; - int size; int err; + cancel_delayed_work(&ent->dwork); while (1) { spin_lock(&ent->lock); if (list_empty(&ent->head)) { @@ -427,14 +405,10 @@ static void clean_keys(struct mlx5_ib_dev *dev, int c) ent->size--; spin_unlock(&ent->lock); err = mlx5_core_destroy_mkey(&dev->mdev, &mr->mmr); - if (err) { + if (err) mlx5_ib_warn(dev, "failed destroy mkey\n"); - } else { - size = ALIGN(sizeof(u64) * (1 << mr->order), 0x40); - dma_unmap_single(ddev, mr->dma, size, DMA_TO_DEVICE); - kfree(mr->pas); + else kfree(mr); - } } } @@ -540,13 +514,15 @@ int mlx5_mr_cache_cleanup(struct mlx5_ib_dev *dev) int i; dev->cache.stopped = 1; - destroy_workqueue(dev->cache.wq); + flush_workqueue(dev->cache.wq); mlx5_mr_cache_debugfs_cleanup(dev); for (i = 0; i < MAX_MR_CACHE_ENTRIES; i++) clean_keys(dev, i); + destroy_workqueue(dev->cache.wq); + return 0; } @@ -675,10 +651,12 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem, int page_shift, int order, int access_flags) { struct mlx5_ib_dev *dev = to_mdev(pd->device); + struct device *ddev = dev->ib_dev.dma_device; struct umr_common *umrc = &dev->umrc; struct ib_send_wr wr, *bad; struct mlx5_ib_mr *mr; struct ib_sge sg; + int size = sizeof(u64) * npages; int err; int i; @@ -697,7 +675,22 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem, if (!mr) return ERR_PTR(-EAGAIN); - mlx5_ib_populate_pas(dev, umem, page_shift, mr_align(mr->pas, 0x40), 1); + mr->pas = kmalloc(size + MLX5_UMR_ALIGN - 1, GFP_KERNEL); + if (!mr->pas) { + err = -ENOMEM; + goto error; + } + + mlx5_ib_populate_pas(dev, umem, page_shift, + mr_align(mr->pas, MLX5_UMR_ALIGN), 1); + + mr->dma = dma_map_single(ddev, mr_align(mr->pas, MLX5_UMR_ALIGN), size, + DMA_TO_DEVICE); + if (dma_mapping_error(ddev, mr->dma)) { + kfree(mr->pas); + err = -ENOMEM; + goto error; + } memset(&wr, 0, sizeof(wr)); wr.wr_id = (u64)(unsigned long)mr; @@ -718,6 +711,9 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem, wait_for_completion(&mr->done); up(&umrc->sem); + dma_unmap_single(ddev, mr->dma, size, DMA_TO_DEVICE); + kfree(mr->pas); + if (mr->status != IB_WC_SUCCESS) { mlx5_ib_warn(dev, "reg umr failed\n"); err = -EFAULT; diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 045f8cdbd303..5659ea880741 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -203,7 +203,7 @@ static int sq_overhead(enum ib_qp_type qp_type) switch (qp_type) { case IB_QPT_XRC_INI: - size = sizeof(struct mlx5_wqe_xrc_seg); + size += sizeof(struct mlx5_wqe_xrc_seg); /* fall through */ case IB_QPT_RC: size += sizeof(struct mlx5_wqe_ctrl_seg) + @@ -211,20 +211,23 @@ static int sq_overhead(enum ib_qp_type qp_type) sizeof(struct mlx5_wqe_raddr_seg); break; + case IB_QPT_XRC_TGT: + return 0; + case IB_QPT_UC: - size = sizeof(struct mlx5_wqe_ctrl_seg) + + size += sizeof(struct mlx5_wqe_ctrl_seg) + sizeof(struct mlx5_wqe_raddr_seg); break; case IB_QPT_UD: case IB_QPT_SMI: case IB_QPT_GSI: - size = sizeof(struct mlx5_wqe_ctrl_seg) + + size += sizeof(struct mlx5_wqe_ctrl_seg) + sizeof(struct mlx5_wqe_datagram_seg); break; case MLX5_IB_QPT_REG_UMR: - size = sizeof(struct mlx5_wqe_ctrl_seg) + + size += sizeof(struct mlx5_wqe_ctrl_seg) + sizeof(struct mlx5_wqe_umr_ctrl_seg) + sizeof(struct mlx5_mkey_seg); break; @@ -270,7 +273,8 @@ static int calc_sq_size(struct mlx5_ib_dev *dev, struct ib_qp_init_attr *attr, return wqe_size; if (wqe_size > dev->mdev.caps.max_sq_desc_sz) { - mlx5_ib_dbg(dev, "\n"); + mlx5_ib_dbg(dev, "wqe_size(%d) > max_sq_desc_sz(%d)\n", + wqe_size, dev->mdev.caps.max_sq_desc_sz); return -EINVAL; } @@ -280,9 +284,15 @@ static int calc_sq_size(struct mlx5_ib_dev *dev, struct ib_qp_init_attr *attr, wq_size = roundup_pow_of_two(attr->cap.max_send_wr * wqe_size); qp->sq.wqe_cnt = wq_size / MLX5_SEND_WQE_BB; + if (qp->sq.wqe_cnt > dev->mdev.caps.max_wqes) { + mlx5_ib_dbg(dev, "wqe count(%d) exceeds limits(%d)\n", + qp->sq.wqe_cnt, dev->mdev.caps.max_wqes); + return -ENOMEM; + } qp->sq.wqe_shift = ilog2(MLX5_SEND_WQE_BB); qp->sq.max_gs = attr->cap.max_send_sge; - qp->sq.max_post = 1 << ilog2(wq_size / wqe_size); + qp->sq.max_post = wq_size / wqe_size; + attr->cap.max_send_wr = qp->sq.max_post; return wq_size; } @@ -1280,6 +1290,11 @@ static enum mlx5_qp_optpar opt_mask[MLX5_QP_NUM_STATE][MLX5_QP_NUM_STATE][MLX5_Q MLX5_QP_OPTPAR_Q_KEY, [MLX5_QP_ST_MLX] = MLX5_QP_OPTPAR_PKEY_INDEX | MLX5_QP_OPTPAR_Q_KEY, + [MLX5_QP_ST_XRC] = MLX5_QP_OPTPAR_ALT_ADDR_PATH | + MLX5_QP_OPTPAR_RRE | + MLX5_QP_OPTPAR_RAE | + MLX5_QP_OPTPAR_RWE | + MLX5_QP_OPTPAR_PKEY_INDEX, }, }, [MLX5_QP_STATE_RTR] = { @@ -1314,6 +1329,11 @@ static enum mlx5_qp_optpar opt_mask[MLX5_QP_NUM_STATE][MLX5_QP_NUM_STATE][MLX5_Q [MLX5_QP_STATE_RTS] = { [MLX5_QP_ST_UD] = MLX5_QP_OPTPAR_Q_KEY, [MLX5_QP_ST_MLX] = MLX5_QP_OPTPAR_Q_KEY, + [MLX5_QP_ST_UC] = MLX5_QP_OPTPAR_RWE, + [MLX5_QP_ST_RC] = MLX5_QP_OPTPAR_RNR_TIMEOUT | + MLX5_QP_OPTPAR_RWE | + MLX5_QP_OPTPAR_RAE | + MLX5_QP_OPTPAR_RRE, }, }, }; @@ -1651,29 +1671,6 @@ static __always_inline void set_raddr_seg(struct mlx5_wqe_raddr_seg *rseg, rseg->reserved = 0; } -static void set_atomic_seg(struct mlx5_wqe_atomic_seg *aseg, struct ib_send_wr *wr) -{ - if (wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP) { - aseg->swap_add = cpu_to_be64(wr->wr.atomic.swap); - aseg->compare = cpu_to_be64(wr->wr.atomic.compare_add); - } else if (wr->opcode == IB_WR_MASKED_ATOMIC_FETCH_AND_ADD) { - aseg->swap_add = cpu_to_be64(wr->wr.atomic.compare_add); - aseg->compare = cpu_to_be64(wr->wr.atomic.compare_add_mask); - } else { - aseg->swap_add = cpu_to_be64(wr->wr.atomic.compare_add); - aseg->compare = 0; - } -} - -static void set_masked_atomic_seg(struct mlx5_wqe_masked_atomic_seg *aseg, - struct ib_send_wr *wr) -{ - aseg->swap_add = cpu_to_be64(wr->wr.atomic.swap); - aseg->swap_add_mask = cpu_to_be64(wr->wr.atomic.swap_mask); - aseg->compare = cpu_to_be64(wr->wr.atomic.compare_add); - aseg->compare_mask = cpu_to_be64(wr->wr.atomic.compare_add_mask); -} - static void set_datagram_seg(struct mlx5_wqe_datagram_seg *dseg, struct ib_send_wr *wr) { @@ -2063,28 +2060,11 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, case IB_WR_ATOMIC_CMP_AND_SWP: case IB_WR_ATOMIC_FETCH_AND_ADD: - set_raddr_seg(seg, wr->wr.atomic.remote_addr, - wr->wr.atomic.rkey); - seg += sizeof(struct mlx5_wqe_raddr_seg); - - set_atomic_seg(seg, wr); - seg += sizeof(struct mlx5_wqe_atomic_seg); - - size += (sizeof(struct mlx5_wqe_raddr_seg) + - sizeof(struct mlx5_wqe_atomic_seg)) / 16; - break; - case IB_WR_MASKED_ATOMIC_CMP_AND_SWP: - set_raddr_seg(seg, wr->wr.atomic.remote_addr, - wr->wr.atomic.rkey); - seg += sizeof(struct mlx5_wqe_raddr_seg); - - set_masked_atomic_seg(seg, wr); - seg += sizeof(struct mlx5_wqe_masked_atomic_seg); - - size += (sizeof(struct mlx5_wqe_raddr_seg) + - sizeof(struct mlx5_wqe_masked_atomic_seg)) / 16; - break; + mlx5_ib_warn(dev, "Atomic operations are not supported yet\n"); + err = -ENOSYS; + *bad_wr = wr; + goto out; case IB_WR_LOCAL_INV: next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL; diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c index 84d297afd6a9..0aa478bc291a 100644 --- a/drivers/infiniband/hw/mlx5/srq.c +++ b/drivers/infiniband/hw/mlx5/srq.c @@ -295,7 +295,7 @@ struct ib_srq *mlx5_ib_create_srq(struct ib_pd *pd, mlx5_vfree(in); if (err) { mlx5_ib_dbg(dev, "create SRQ failed, err %d\n", err); - goto err_srq; + goto err_usr_kern_srq; } mlx5_ib_dbg(dev, "create SRQ with srqn 0x%x\n", srq->msrq.srqn); @@ -316,6 +316,8 @@ struct ib_srq *mlx5_ib_create_srq(struct ib_pd *pd, err_core: mlx5_core_destroy_srq(&dev->mdev, &srq->msrq); + +err_usr_kern_srq: if (pd->uobject) destroy_srq_user(pd, srq); else diff --git a/drivers/infiniband/hw/mthca/mthca_eq.c b/drivers/infiniband/hw/mthca/mthca_eq.c index 7c9d35f39d75..690201738993 100644 --- a/drivers/infiniband/hw/mthca/mthca_eq.c +++ b/drivers/infiniband/hw/mthca/mthca_eq.c @@ -357,7 +357,7 @@ static int mthca_eq_int(struct mthca_dev *dev, struct mthca_eq *eq) mthca_warn(dev, "Unhandled event %02x(%02x) on EQ %d\n", eqe->type, eqe->subtype, eq->eqn); break; - }; + } set_eqe_hw(eqe); ++eq->cons_index; diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c index 4ed8235d2d36..50219ab2279d 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c @@ -150,7 +150,7 @@ enum ib_qp_state get_ibqp_state(enum ocrdma_qp_state qps) return IB_QPS_SQE; case OCRDMA_QPS_ERR: return IB_QPS_ERR; - }; + } return IB_QPS_ERR; } @@ -171,7 +171,7 @@ static enum ocrdma_qp_state get_ocrdma_qp_state(enum ib_qp_state qps) return OCRDMA_QPS_SQE; case IB_QPS_ERR: return OCRDMA_QPS_ERR; - }; + } return OCRDMA_QPS_ERR; } @@ -1982,7 +1982,7 @@ int ocrdma_mbx_create_qp(struct ocrdma_qp *qp, struct ib_qp_init_attr *attrs, break; default: return -EINVAL; - }; + } cmd = ocrdma_init_emb_mqe(OCRDMA_CMD_CREATE_QP, sizeof(*cmd)); if (!cmd) diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c index 56e004940f18..0ce7674621ea 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c @@ -531,7 +531,7 @@ static void ocrdma_event_handler(struct ocrdma_dev *dev, u32 event) case BE_DEV_DOWN: ocrdma_close(dev); break; - }; + } } static struct ocrdma_driver ocrdma_drv = { diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c index 6e982bb43c31..69f1d1221a6b 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c @@ -141,7 +141,7 @@ static inline void get_link_speed_and_width(struct ocrdma_dev *dev, /* Unsupported */ *ib_speed = IB_SPEED_SDR; *ib_width = IB_WIDTH_1X; - }; + } } @@ -2331,7 +2331,7 @@ static enum ib_wc_status ocrdma_to_ibwc_err(u16 status) default: ibwc_status = IB_WC_GENERAL_ERR; break; - }; + } return ibwc_status; } @@ -2370,7 +2370,7 @@ static void ocrdma_update_wc(struct ocrdma_qp *qp, struct ib_wc *ibwc, pr_err("%s() invalid opcode received = 0x%x\n", __func__, hdr->cw & OCRDMA_WQE_OPCODE_MASK); break; - }; + } } static void ocrdma_set_cqe_status_flushed(struct ocrdma_qp *qp, diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 653ac6bfc57a..6c923c7039a1 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1588,7 +1588,7 @@ static int srpt_build_tskmgmt_rsp(struct srpt_rdma_ch *ch, int resp_data_len; int resp_len; - resp_data_len = (rsp_code == SRP_TSK_MGMT_SUCCESS) ? 0 : 4; + resp_data_len = 4; resp_len = sizeof(*srp_rsp) + resp_data_len; srp_rsp = ioctx->ioctx.buf; @@ -1600,11 +1600,9 @@ static int srpt_build_tskmgmt_rsp(struct srpt_rdma_ch *ch, + atomic_xchg(&ch->req_lim_delta, 0)); srp_rsp->tag = tag; - if (rsp_code != SRP_TSK_MGMT_SUCCESS) { - srp_rsp->flags |= SRP_RSP_FLAG_RSPVALID; - srp_rsp->resp_data_len = cpu_to_be32(resp_data_len); - srp_rsp->data[3] = rsp_code; - } + srp_rsp->flags |= SRP_RSP_FLAG_RSPVALID; + srp_rsp->resp_data_len = cpu_to_be32(resp_data_len); + srp_rsp->data[3] = rsp_code; return resp_len; } @@ -2358,6 +2356,8 @@ static void srpt_release_channel_work(struct work_struct *w) transport_deregister_session(se_sess); ch->sess = NULL; + ib_destroy_cm_id(ch->cm_id); + srpt_destroy_ch_ib(ch); srpt_free_ioctx_ring((struct srpt_ioctx **)ch->ioctx_ring, @@ -2368,8 +2368,6 @@ static void srpt_release_channel_work(struct work_struct *w) list_del(&ch->list); spin_unlock_irq(&sdev->spinlock); - ib_destroy_cm_id(ch->cm_id); - if (ch->release_done) complete(ch->release_done); diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index fe302e33f72e..c880ebaf1553 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -52,7 +52,7 @@ config AMD_IOMMU select PCI_PRI select PCI_PASID select IOMMU_API - depends on X86_64 && PCI && ACPI && X86_IO_APIC + depends on X86_64 && PCI && ACPI ---help--- With this option you can enable support for AMD IOMMU hardware in your system. An IOMMU is a hardware component which provides diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index f417e89e1e7e..181c9ba929cd 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -377,6 +377,7 @@ struct arm_smmu_cfg { u32 cbar; pgd_t *pgd; }; +#define INVALID_IRPTNDX 0xff #define ARM_SMMU_CB_ASID(cfg) ((cfg)->cbndx) #define ARM_SMMU_CB_VMID(cfg) ((cfg)->cbndx + 1) @@ -840,7 +841,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, if (IS_ERR_VALUE(ret)) { dev_err(smmu->dev, "failed to request context IRQ %d (%u)\n", root_cfg->irptndx, irq); - root_cfg->irptndx = -1; + root_cfg->irptndx = INVALID_IRPTNDX; goto out_free_context; } @@ -869,7 +870,7 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain) writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR); arm_smmu_tlb_inv_context(root_cfg); - if (root_cfg->irptndx != -1) { + if (root_cfg->irptndx != INVALID_IRPTNDX) { irq = smmu->irqs[smmu->num_global_irqs + root_cfg->irptndx]; free_irq(irq, domain); } @@ -1857,8 +1858,6 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev) goto out_put_parent; } - arm_smmu_device_reset(smmu); - for (i = 0; i < smmu->num_global_irqs; ++i) { err = request_irq(smmu->irqs[i], arm_smmu_global_fault, @@ -1876,6 +1875,8 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev) spin_lock(&arm_smmu_devices_lock); list_add(&smmu->list, &arm_smmu_devices); spin_unlock(&arm_smmu_devices_lock); + + arm_smmu_device_reset(smmu); return 0; out_free_irqs: @@ -1966,10 +1967,10 @@ static int __init arm_smmu_init(void) return ret; /* Oh, for a proper bus abstraction */ - if (!iommu_present(&platform_bus_type)); + if (!iommu_present(&platform_bus_type)) bus_set_iommu(&platform_bus_type, &arm_smmu_ops); - if (!iommu_present(&amba_bustype)); + if (!iommu_present(&amba_bustype)) bus_set_iommu(&amba_bustype, &arm_smmu_ops); return 0; diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 71eb233b9ace..b6a74bcbb08f 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -996,6 +996,7 @@ static void request_write(struct cached_dev *dc, struct search *s) closure_bio_submit(bio, cl, s->d); } else { bch_writeback_add(dc); + s->op.cache_bio = bio; if (bio->bi_rw & REQ_FLUSH) { /* Also need to send a flush to the backing device */ @@ -1008,8 +1009,6 @@ static void request_write(struct cached_dev *dc, struct search *s) flush->bi_private = cl; closure_bio_submit(flush, cl, s->d); - } else { - s->op.cache_bio = bio; } } out: diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c index ea49834377c8..2a20986a2fec 100644 --- a/drivers/md/dm-io.c +++ b/drivers/md/dm-io.c @@ -19,8 +19,6 @@ #define DM_MSG_PREFIX "io" #define DM_IO_MAX_REGIONS BITS_PER_LONG -#define MIN_IOS 16 -#define MIN_BIOS 16 struct dm_io_client { mempool_t *pool; @@ -50,16 +48,17 @@ static struct kmem_cache *_dm_io_cache; struct dm_io_client *dm_io_client_create(void) { struct dm_io_client *client; + unsigned min_ios = dm_get_reserved_bio_based_ios(); client = kmalloc(sizeof(*client), GFP_KERNEL); if (!client) return ERR_PTR(-ENOMEM); - client->pool = mempool_create_slab_pool(MIN_IOS, _dm_io_cache); + client->pool = mempool_create_slab_pool(min_ios, _dm_io_cache); if (!client->pool) goto bad; - client->bios = bioset_create(MIN_BIOS, 0); + client->bios = bioset_create(min_ios, 0); if (!client->bios) goto bad; diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index b759a127f9c3..de570a558764 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -7,6 +7,7 @@ #include +#include "dm.h" #include "dm-path-selector.h" #include "dm-uevent.h" @@ -116,8 +117,6 @@ struct dm_mpath_io { typedef int (*action_fn) (struct pgpath *pgpath); -#define MIN_IOS 256 /* Mempool size */ - static struct kmem_cache *_mpio_cache; static struct workqueue_struct *kmultipathd, *kmpath_handlerd; @@ -190,6 +189,7 @@ static void free_priority_group(struct priority_group *pg, static struct multipath *alloc_multipath(struct dm_target *ti) { struct multipath *m; + unsigned min_ios = dm_get_reserved_rq_based_ios(); m = kzalloc(sizeof(*m), GFP_KERNEL); if (m) { @@ -202,7 +202,7 @@ static struct multipath *alloc_multipath(struct dm_target *ti) INIT_WORK(&m->trigger_event, trigger_event); init_waitqueue_head(&m->pg_init_wait); mutex_init(&m->work_mutex); - m->mpio_pool = mempool_create_slab_pool(MIN_IOS, _mpio_cache); + m->mpio_pool = mempool_create_slab_pool(min_ios, _mpio_cache); if (!m->mpio_pool) { kfree(m); return NULL; @@ -1268,6 +1268,7 @@ static int noretry_error(int error) case -EREMOTEIO: case -EILSEQ: case -ENODATA: + case -ENOSPC: return 1; } @@ -1298,8 +1299,17 @@ static int do_end_io(struct multipath *m, struct request *clone, if (!error && !clone->errors) return 0; /* I/O complete */ - if (noretry_error(error)) + if (noretry_error(error)) { + if ((clone->cmd_flags & REQ_WRITE_SAME) && + !clone->q->limits.max_write_same_sectors) { + struct queue_limits *limits; + + /* device doesn't really support WRITE SAME, disable it */ + limits = dm_get_queue_limits(dm_table_get_md(m->ti->table)); + limits->max_write_same_sectors = 0; + } return error; + } if (mpio->pgpath) fail_path(mpio->pgpath); diff --git a/drivers/md/dm-snap-persistent.c b/drivers/md/dm-snap-persistent.c index 3ac415675b6c..2d2b1b7588d7 100644 --- a/drivers/md/dm-snap-persistent.c +++ b/drivers/md/dm-snap-persistent.c @@ -256,7 +256,7 @@ static int chunk_io(struct pstore *ps, void *area, chunk_t chunk, int rw, */ INIT_WORK_ONSTACK(&req.work, do_metadata); queue_work(ps->metadata_wq, &req.work); - flush_work(&req.work); + flush_workqueue(ps->metadata_wq); return req.result; } @@ -269,6 +269,14 @@ static chunk_t area_location(struct pstore *ps, chunk_t area) return NUM_SNAPSHOT_HDR_CHUNKS + ((ps->exceptions_per_area + 1) * area); } +static void skip_metadata(struct pstore *ps) +{ + uint32_t stride = ps->exceptions_per_area + 1; + chunk_t next_free = ps->next_free; + if (sector_div(next_free, stride) == NUM_SNAPSHOT_HDR_CHUNKS) + ps->next_free++; +} + /* * Read or write a metadata area. Remembering to skip the first * chunk which holds the header. @@ -502,6 +510,8 @@ static int read_exceptions(struct pstore *ps, ps->current_area--; + skip_metadata(ps); + return 0; } @@ -616,8 +626,6 @@ static int persistent_prepare_exception(struct dm_exception_store *store, struct dm_exception *e) { struct pstore *ps = get_info(store); - uint32_t stride; - chunk_t next_free; sector_t size = get_dev_size(dm_snap_cow(store->snap)->bdev); /* Is there enough room ? */ @@ -630,10 +638,8 @@ static int persistent_prepare_exception(struct dm_exception_store *store, * Move onto the next free pending, making sure to take * into account the location of the metadata chunks. */ - stride = (ps->exceptions_per_area + 1); - next_free = ++ps->next_free; - if (sector_div(next_free, stride) == 1) - ps->next_free++; + ps->next_free++; + skip_metadata(ps); atomic_inc(&ps->pending_count); return 0; diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index c434e5aab2df..aec57d76db5d 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -725,17 +725,16 @@ static int calc_max_buckets(void) */ static int init_hash_tables(struct dm_snapshot *s) { - sector_t hash_size, cow_dev_size, origin_dev_size, max_buckets; + sector_t hash_size, cow_dev_size, max_buckets; /* * Calculate based on the size of the original volume or * the COW volume... */ cow_dev_size = get_dev_size(s->cow->bdev); - origin_dev_size = get_dev_size(s->origin->bdev); max_buckets = calc_max_buckets(); - hash_size = min(origin_dev_size, cow_dev_size) >> s->store->chunk_shift; + hash_size = cow_dev_size >> s->store->chunk_shift; hash_size = min(hash_size, max_buckets); if (hash_size < 64) diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c index 8ae31e8d3d64..3d404c1371ed 100644 --- a/drivers/md/dm-stats.c +++ b/drivers/md/dm-stats.c @@ -451,19 +451,26 @@ static void dm_stat_for_entry(struct dm_stat *s, size_t entry, struct dm_stat_percpu *p; /* - * For strict correctness we should use local_irq_disable/enable + * For strict correctness we should use local_irq_save/restore * instead of preempt_disable/enable. * - * This is racy if the driver finishes bios from non-interrupt - * context as well as from interrupt context or from more different - * interrupts. + * preempt_disable/enable is racy if the driver finishes bios + * from non-interrupt context as well as from interrupt context + * or from more different interrupts. * - * However, the race only results in not counting some events, - * so it is acceptable. + * On 64-bit architectures the race only results in not counting some + * events, so it is acceptable. On 32-bit architectures the race could + * cause the counter going off by 2^32, so we need to do proper locking + * there. * * part_stat_lock()/part_stat_unlock() have this race too. */ +#if BITS_PER_LONG == 32 + unsigned long flags; + local_irq_save(flags); +#else preempt_disable(); +#endif p = &s->stat_percpu[smp_processor_id()][entry]; if (!end) { @@ -478,7 +485,11 @@ static void dm_stat_for_entry(struct dm_stat *s, size_t entry, p->ticks[idx] += duration; } +#if BITS_PER_LONG == 32 + local_irq_restore(flags); +#else preempt_enable(); +#endif } static void __dm_stat_bio(struct dm_stat *s, unsigned long bi_rw, diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c index ed063427d676..2c0cf511ec23 100644 --- a/drivers/md/dm-thin.c +++ b/drivers/md/dm-thin.c @@ -2095,6 +2095,7 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv) * them down to the data device. The thin device's discard * processing will cause mappings to be removed from the btree. */ + ti->discard_zeroes_data_unsupported = true; if (pf.discard_enabled && pf.discard_passdown) { ti->num_discard_bios = 1; @@ -2104,7 +2105,6 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv) * thin devices' discard limits consistent). */ ti->discards_supported = true; - ti->discard_zeroes_data_unsupported = true; } ti->private = pt; @@ -2689,8 +2689,16 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits) * They get transferred to the live pool in bind_control_target() * called from pool_preresume(). */ - if (!pt->adjusted_pf.discard_enabled) + if (!pt->adjusted_pf.discard_enabled) { + /* + * Must explicitly disallow stacking discard limits otherwise the + * block layer will stack them if pool's data device has support. + * QUEUE_FLAG_DISCARD wouldn't be set but there is no way for the + * user to see that, so make sure to set all discard limits to 0. + */ + limits->discard_granularity = 0; return; + } disable_passdown_if_not_supported(pt); @@ -2826,10 +2834,10 @@ static int thin_ctr(struct dm_target *ti, unsigned argc, char **argv) ti->per_bio_data_size = sizeof(struct dm_thin_endio_hook); /* In case the pool supports discards, pass them on. */ + ti->discard_zeroes_data_unsupported = true; if (tc->pool->pf.discard_enabled) { ti->discards_supported = true; ti->num_discard_bios = 1; - ti->discard_zeroes_data_unsupported = true; /* Discard bios must be split on a block boundary */ ti->split_discard_bios = true; } diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 6a5e9ed2fcc3..b3e26c7d1417 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -211,10 +211,55 @@ struct dm_md_mempools { struct bio_set *bs; }; -#define MIN_IOS 256 +#define RESERVED_BIO_BASED_IOS 16 +#define RESERVED_REQUEST_BASED_IOS 256 +#define RESERVED_MAX_IOS 1024 static struct kmem_cache *_io_cache; static struct kmem_cache *_rq_tio_cache; +/* + * Bio-based DM's mempools' reserved IOs set by the user. + */ +static unsigned reserved_bio_based_ios = RESERVED_BIO_BASED_IOS; + +/* + * Request-based DM's mempools' reserved IOs set by the user. + */ +static unsigned reserved_rq_based_ios = RESERVED_REQUEST_BASED_IOS; + +static unsigned __dm_get_reserved_ios(unsigned *reserved_ios, + unsigned def, unsigned max) +{ + unsigned ios = ACCESS_ONCE(*reserved_ios); + unsigned modified_ios = 0; + + if (!ios) + modified_ios = def; + else if (ios > max) + modified_ios = max; + + if (modified_ios) { + (void)cmpxchg(reserved_ios, ios, modified_ios); + ios = modified_ios; + } + + return ios; +} + +unsigned dm_get_reserved_bio_based_ios(void) +{ + return __dm_get_reserved_ios(&reserved_bio_based_ios, + RESERVED_BIO_BASED_IOS, RESERVED_MAX_IOS); +} +EXPORT_SYMBOL_GPL(dm_get_reserved_bio_based_ios); + +unsigned dm_get_reserved_rq_based_ios(void) +{ + return __dm_get_reserved_ios(&reserved_rq_based_ios, + RESERVED_REQUEST_BASED_IOS, RESERVED_MAX_IOS); +} +EXPORT_SYMBOL_GPL(dm_get_reserved_rq_based_ios); + static int __init local_init(void) { int r = -ENOMEM; @@ -2277,6 +2322,17 @@ struct target_type *dm_get_immutable_target_type(struct mapped_device *md) return md->immutable_target_type; } +/* + * The queue_limits are only valid as long as you have a reference + * count on 'md'. + */ +struct queue_limits *dm_get_queue_limits(struct mapped_device *md) +{ + BUG_ON(!atomic_read(&md->holders)); + return &md->queue->limits; +} +EXPORT_SYMBOL_GPL(dm_get_queue_limits); + /* * Fully initialize a request-based queue (->elevator, ->request_fn, etc). */ @@ -2862,18 +2918,18 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity, u if (type == DM_TYPE_BIO_BASED) { cachep = _io_cache; - pool_size = 16; + pool_size = dm_get_reserved_bio_based_ios(); front_pad = roundup(per_bio_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone); } else if (type == DM_TYPE_REQUEST_BASED) { cachep = _rq_tio_cache; - pool_size = MIN_IOS; + pool_size = dm_get_reserved_rq_based_ios(); front_pad = offsetof(struct dm_rq_clone_bio_info, clone); /* per_bio_data_size is not used. See __bind_mempools(). */ WARN_ON(per_bio_data_size != 0); } else goto out; - pools->io_pool = mempool_create_slab_pool(MIN_IOS, cachep); + pools->io_pool = mempool_create_slab_pool(pool_size, cachep); if (!pools->io_pool) goto out; @@ -2924,6 +2980,13 @@ module_exit(dm_exit); module_param(major, uint, 0); MODULE_PARM_DESC(major, "The major number of the device mapper"); + +module_param(reserved_bio_based_ios, uint, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(reserved_bio_based_ios, "Reserved IOs in bio-based mempools"); + +module_param(reserved_rq_based_ios, uint, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(reserved_rq_based_ios, "Reserved IOs in request-based mempools"); + MODULE_DESCRIPTION(DM_NAME " driver"); MODULE_AUTHOR("Joe Thornber "); MODULE_LICENSE("GPL"); diff --git a/drivers/md/dm.h b/drivers/md/dm.h index 5e604cc7b4aa..1d1ad7b7e527 100644 --- a/drivers/md/dm.h +++ b/drivers/md/dm.h @@ -184,6 +184,9 @@ void dm_free_md_mempools(struct dm_md_mempools *pools); /* * Helpers that are used by DM core */ +unsigned dm_get_reserved_bio_based_ios(void); +unsigned dm_get_reserved_rq_based_ios(void); + static inline bool dm_message_test_buffer_overflow(char *result, unsigned maxlen) { return !maxlen || strlen(result) + 1 >= maxlen; diff --git a/drivers/misc/mei/amthif.c b/drivers/misc/mei/amthif.c index d0fdc134068a..f6ff711aa5bb 100644 --- a/drivers/misc/mei/amthif.c +++ b/drivers/misc/mei/amthif.c @@ -57,6 +57,7 @@ void mei_amthif_reset_params(struct mei_device *dev) dev->iamthif_ioctl = false; dev->iamthif_state = MEI_IAMTHIF_IDLE; dev->iamthif_timer = 0; + dev->iamthif_stall_timer = 0; } /** diff --git a/drivers/misc/mei/bus.c b/drivers/misc/mei/bus.c index 6d0282c08a06..cd2033cd7120 100644 --- a/drivers/misc/mei/bus.c +++ b/drivers/misc/mei/bus.c @@ -297,10 +297,13 @@ int __mei_cl_recv(struct mei_cl *cl, u8 *buf, size_t length) if (cl->reading_state != MEI_READ_COMPLETE && !waitqueue_active(&cl->rx_wait)) { + mutex_unlock(&dev->device_lock); if (wait_event_interruptible(cl->rx_wait, - (MEI_READ_COMPLETE == cl->reading_state))) { + cl->reading_state == MEI_READ_COMPLETE || + mei_cl_is_transitioning(cl))) { + if (signal_pending(current)) return -EINTR; return -ERESTARTSYS; diff --git a/drivers/misc/mei/client.h b/drivers/misc/mei/client.h index 9eb031e92070..892cc4207fa2 100644 --- a/drivers/misc/mei/client.h +++ b/drivers/misc/mei/client.h @@ -90,6 +90,12 @@ static inline bool mei_cl_is_connected(struct mei_cl *cl) cl->dev->dev_state == MEI_DEV_ENABLED && cl->state == MEI_FILE_CONNECTED); } +static inline bool mei_cl_is_transitioning(struct mei_cl *cl) +{ + return (MEI_FILE_INITIALIZING == cl->state || + MEI_FILE_DISCONNECTED == cl->state || + MEI_FILE_DISCONNECTING == cl->state); +} bool mei_cl_is_other_connecting(struct mei_cl *cl); int mei_cl_disconnect(struct mei_cl *cl); diff --git a/drivers/misc/mei/hbm.c b/drivers/misc/mei/hbm.c index 6127ab64bb39..0a0448326e9d 100644 --- a/drivers/misc/mei/hbm.c +++ b/drivers/misc/mei/hbm.c @@ -35,11 +35,15 @@ static void mei_hbm_me_cl_allocate(struct mei_device *dev) struct mei_me_client *clients; int b; + dev->me_clients_num = 0; + dev->me_client_presentation_num = 0; + dev->me_client_index = 0; + /* count how many ME clients we have */ for_each_set_bit(b, dev->me_clients_map, MEI_CLIENTS_MAX) dev->me_clients_num++; - if (dev->me_clients_num <= 0) + if (dev->me_clients_num == 0) return; kfree(dev->me_clients); @@ -221,7 +225,7 @@ static int mei_hbm_prop_req(struct mei_device *dev) struct hbm_props_request *prop_req; const size_t len = sizeof(struct hbm_props_request); unsigned long next_client_index; - u8 client_num; + unsigned long client_num; client_num = dev->me_client_presentation_num; @@ -677,8 +681,6 @@ void mei_hbm_dispatch(struct mei_device *dev, struct mei_msg_hdr *hdr) if (dev->dev_state == MEI_DEV_INIT_CLIENTS && dev->hbm_state == MEI_HBM_ENUM_CLIENTS) { dev->init_clients_timer = 0; - dev->me_client_presentation_num = 0; - dev->me_client_index = 0; mei_hbm_me_cl_allocate(dev); dev->hbm_state = MEI_HBM_CLIENT_PROPERTIES; diff --git a/drivers/misc/mei/init.c b/drivers/misc/mei/init.c index 92c73118b13c..6197018e2f16 100644 --- a/drivers/misc/mei/init.c +++ b/drivers/misc/mei/init.c @@ -175,6 +175,9 @@ void mei_reset(struct mei_device *dev, int interrupts_enabled) memset(&dev->wr_ext_msg, 0, sizeof(dev->wr_ext_msg)); } + /* we're already in reset, cancel the init timer */ + dev->init_clients_timer = 0; + dev->me_clients_num = 0; dev->rd_msg_hdr = 0; dev->wd_pending = false; diff --git a/drivers/misc/mei/main.c b/drivers/misc/mei/main.c index 173ff095be0d..cabeddd66c1f 100644 --- a/drivers/misc/mei/main.c +++ b/drivers/misc/mei/main.c @@ -249,19 +249,16 @@ static ssize_t mei_read(struct file *file, char __user *ubuf, mutex_unlock(&dev->device_lock); if (wait_event_interruptible(cl->rx_wait, - (MEI_READ_COMPLETE == cl->reading_state || - MEI_FILE_INITIALIZING == cl->state || - MEI_FILE_DISCONNECTED == cl->state || - MEI_FILE_DISCONNECTING == cl->state))) { + MEI_READ_COMPLETE == cl->reading_state || + mei_cl_is_transitioning(cl))) { + if (signal_pending(current)) return -EINTR; return -ERESTARTSYS; } mutex_lock(&dev->device_lock); - if (MEI_FILE_INITIALIZING == cl->state || - MEI_FILE_DISCONNECTED == cl->state || - MEI_FILE_DISCONNECTING == cl->state) { + if (mei_cl_is_transitioning(cl)) { rets = -EBUSY; goto out; } diff --git a/drivers/misc/mei/mei_dev.h b/drivers/misc/mei/mei_dev.h index 7b918b2fb894..456b322013e2 100644 --- a/drivers/misc/mei/mei_dev.h +++ b/drivers/misc/mei/mei_dev.h @@ -396,9 +396,9 @@ struct mei_device { struct mei_me_client *me_clients; /* Note: memory has to be allocated */ DECLARE_BITMAP(me_clients_map, MEI_CLIENTS_MAX); DECLARE_BITMAP(host_clients_map, MEI_CLIENTS_MAX); - u8 me_clients_num; - u8 me_client_presentation_num; - u8 me_client_index; + unsigned long me_clients_num; + unsigned long me_client_presentation_num; + unsigned long me_client_index; struct mei_cl wd_cl; enum mei_wd_states wd_state; diff --git a/drivers/mmc/host/sh_mobile_sdhi.c b/drivers/mmc/host/sh_mobile_sdhi.c index 87ed3fb5149a..f344659dceac 100644 --- a/drivers/mmc/host/sh_mobile_sdhi.c +++ b/drivers/mmc/host/sh_mobile_sdhi.c @@ -113,14 +113,14 @@ static const struct sh_mobile_sdhi_ops sdhi_ops = { }; static const struct of_device_id sh_mobile_sdhi_of_match[] = { - { .compatible = "renesas,shmobile-sdhi" }, - { .compatible = "renesas,sh7372-sdhi" }, - { .compatible = "renesas,sh73a0-sdhi", .data = &sh_mobile_sdhi_of_cfg[0], }, - { .compatible = "renesas,r8a73a4-sdhi", .data = &sh_mobile_sdhi_of_cfg[0], }, - { .compatible = "renesas,r8a7740-sdhi", .data = &sh_mobile_sdhi_of_cfg[0], }, - { .compatible = "renesas,r8a7778-sdhi", .data = &sh_mobile_sdhi_of_cfg[0], }, - { .compatible = "renesas,r8a7779-sdhi", .data = &sh_mobile_sdhi_of_cfg[0], }, - { .compatible = "renesas,r8a7790-sdhi", .data = &sh_mobile_sdhi_of_cfg[0], }, + { .compatible = "renesas,sdhi-shmobile" }, + { .compatible = "renesas,sdhi-sh7372" }, + { .compatible = "renesas,sdhi-sh73a0", .data = &sh_mobile_sdhi_of_cfg[0], }, + { .compatible = "renesas,sdhi-r8a73a4", .data = &sh_mobile_sdhi_of_cfg[0], }, + { .compatible = "renesas,sdhi-r8a7740", .data = &sh_mobile_sdhi_of_cfg[0], }, + { .compatible = "renesas,sdhi-r8a7778", .data = &sh_mobile_sdhi_of_cfg[0], }, + { .compatible = "renesas,sdhi-r8a7779", .data = &sh_mobile_sdhi_of_cfg[0], }, + { .compatible = "renesas,sdhi-r8a7790", .data = &sh_mobile_sdhi_of_cfg[0], }, {}, }; MODULE_DEVICE_TABLE(of, sh_mobile_sdhi_of_match); diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c index 26b14f9fcac6..6bc9618af094 100644 --- a/drivers/mtd/devices/m25p80.c +++ b/drivers/mtd/devices/m25p80.c @@ -168,12 +168,25 @@ static inline int write_disable(struct m25p *flash) */ static inline int set_4byte(struct m25p *flash, u32 jedec_id, int enable) { + int status; + bool need_wren = false; + switch (JEDEC_MFR(jedec_id)) { - case CFI_MFR_MACRONIX: case CFI_MFR_ST: /* Micron, actually */ + /* Some Micron need WREN command; all will accept it */ + need_wren = true; + case CFI_MFR_MACRONIX: case 0xEF /* winbond */: + if (need_wren) + write_enable(flash); + flash->command[0] = enable ? OPCODE_EN4B : OPCODE_EX4B; - return spi_write(flash->spi, flash->command, 1); + status = spi_write(flash->spi, flash->command, 1); + + if (need_wren) + write_disable(flash); + + return status; default: /* Spansion style */ flash->command[0] = OPCODE_BRWR; diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 7ed4841327f2..d340b2f198c6 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -2869,10 +2869,8 @@ static int nand_flash_detect_ext_param_page(struct mtd_info *mtd, len = le16_to_cpu(p->ext_param_page_length) * 16; ep = kmalloc(len, GFP_KERNEL); - if (!ep) { - ret = -ENOMEM; - goto ext_out; - } + if (!ep) + return -ENOMEM; /* Send our own NAND_CMD_PARAM. */ chip->cmdfunc(mtd, NAND_CMD_PARAM, 0, -1); @@ -2920,7 +2918,7 @@ static int nand_flash_detect_ext_param_page(struct mtd_info *mtd, } pr_info("ONFI extended param page detected.\n"); - return 0; + ret = 0; ext_out: kfree(ep); diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 55bbb8b8200c..e883bfe2e727 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1724,6 +1724,7 @@ static int __bond_release_one(struct net_device *bond_dev, struct bonding *bond = netdev_priv(bond_dev); struct slave *slave, *oldcurrent; struct sockaddr addr; + int old_flags = bond_dev->flags; netdev_features_t old_features = bond_dev->features; /* slave is not a slave or master is not master of this slave */ @@ -1855,12 +1856,18 @@ static int __bond_release_one(struct net_device *bond_dev, * bond_change_active_slave(..., NULL) */ if (!USES_PRIMARY(bond->params.mode)) { - /* unset promiscuity level from slave */ - if (bond_dev->flags & IFF_PROMISC) + /* unset promiscuity level from slave + * NOTE: The NETDEV_CHANGEADDR call above may change the value + * of the IFF_PROMISC flag in the bond_dev, but we need the + * value of that flag before that change, as that was the value + * when this slave was attached, so we cache at the start of the + * function and use it here. Same goes for ALLMULTI below + */ + if (old_flags & IFF_PROMISC) dev_set_promiscuity(slave_dev, -1); /* unset allmulti level from slave */ - if (bond_dev->flags & IFF_ALLMULTI) + if (old_flags & IFF_ALLMULTI) dev_set_allmulti(slave_dev, -1); bond_hw_addr_flush(bond_dev, slave_dev); diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c index 71c677e651d7..3f21142138b7 100644 --- a/drivers/net/can/flexcan.c +++ b/drivers/net/can/flexcan.c @@ -702,7 +702,6 @@ static int flexcan_chip_start(struct net_device *dev) { struct flexcan_priv *priv = netdev_priv(dev); struct flexcan_regs __iomem *regs = priv->base; - unsigned int i; int err; u32 reg_mcr, reg_ctrl; @@ -772,17 +771,6 @@ static int flexcan_chip_start(struct net_device *dev) netdev_dbg(dev, "%s: writing ctrl=0x%08x", __func__, reg_ctrl); flexcan_write(reg_ctrl, ®s->ctrl); - for (i = 0; i < ARRAY_SIZE(regs->cantxfg); i++) { - flexcan_write(0, ®s->cantxfg[i].can_ctrl); - flexcan_write(0, ®s->cantxfg[i].can_id); - flexcan_write(0, ®s->cantxfg[i].data[0]); - flexcan_write(0, ®s->cantxfg[i].data[1]); - - /* put MB into rx queue */ - flexcan_write(FLEXCAN_MB_CNT_CODE(0x4), - ®s->cantxfg[i].can_ctrl); - } - /* acceptance mask/acceptance code (accept everything) */ flexcan_write(0x0, ®s->rxgmask); flexcan_write(0x0, ®s->rx14mask); diff --git a/drivers/net/can/slcan.c b/drivers/net/can/slcan.c index 874188ba06f7..25377e547f9b 100644 --- a/drivers/net/can/slcan.c +++ b/drivers/net/can/slcan.c @@ -76,6 +76,10 @@ MODULE_PARM_DESC(maxdev, "Maximum number of slcan interfaces"); /* maximum rx buffer len: extended CAN frame with timestamp */ #define SLC_MTU (sizeof("T1111222281122334455667788EA5F\r")+1) +#define SLC_CMD_LEN 1 +#define SLC_SFF_ID_LEN 3 +#define SLC_EFF_ID_LEN 8 + struct slcan { int magic; @@ -142,47 +146,63 @@ static void slc_bump(struct slcan *sl) { struct sk_buff *skb; struct can_frame cf; - int i, dlc_pos, tmp; - unsigned long ultmp; - char cmd = sl->rbuff[0]; - - if ((cmd != 't') && (cmd != 'T') && (cmd != 'r') && (cmd != 'R')) + int i, tmp; + u32 tmpid; + char *cmd = sl->rbuff; + + cf.can_id = 0; + + switch (*cmd) { + case 'r': + cf.can_id = CAN_RTR_FLAG; + /* fallthrough */ + case 't': + /* store dlc ASCII value and terminate SFF CAN ID string */ + cf.can_dlc = sl->rbuff[SLC_CMD_LEN + SLC_SFF_ID_LEN]; + sl->rbuff[SLC_CMD_LEN + SLC_SFF_ID_LEN] = 0; + /* point to payload data behind the dlc */ + cmd += SLC_CMD_LEN + SLC_SFF_ID_LEN + 1; + break; + case 'R': + cf.can_id = CAN_RTR_FLAG; + /* fallthrough */ + case 'T': + cf.can_id |= CAN_EFF_FLAG; + /* store dlc ASCII value and terminate EFF CAN ID string */ + cf.can_dlc = sl->rbuff[SLC_CMD_LEN + SLC_EFF_ID_LEN]; + sl->rbuff[SLC_CMD_LEN + SLC_EFF_ID_LEN] = 0; + /* point to payload data behind the dlc */ + cmd += SLC_CMD_LEN + SLC_EFF_ID_LEN + 1; + break; + default: return; + } - if (cmd & 0x20) /* tiny chars 'r' 't' => standard frame format */ - dlc_pos = 4; /* dlc position tiiid */ - else - dlc_pos = 9; /* dlc position Tiiiiiiiid */ - - if (!((sl->rbuff[dlc_pos] >= '0') && (sl->rbuff[dlc_pos] < '9'))) + if (kstrtou32(sl->rbuff + SLC_CMD_LEN, 16, &tmpid)) return; - cf.can_dlc = sl->rbuff[dlc_pos] - '0'; /* get can_dlc from ASCII val */ + cf.can_id |= tmpid; - sl->rbuff[dlc_pos] = 0; /* terminate can_id string */ - - if (kstrtoul(sl->rbuff+1, 16, &ultmp)) + /* get can_dlc from sanitized ASCII value */ + if (cf.can_dlc >= '0' && cf.can_dlc < '9') + cf.can_dlc -= '0'; + else return; - cf.can_id = ultmp; - - if (!(cmd & 0x20)) /* NO tiny chars => extended frame format */ - cf.can_id |= CAN_EFF_FLAG; - - if ((cmd | 0x20) == 'r') /* RTR frame */ - cf.can_id |= CAN_RTR_FLAG; - *(u64 *) (&cf.data) = 0; /* clear payload */ - for (i = 0, dlc_pos++; i < cf.can_dlc; i++) { - tmp = hex_to_bin(sl->rbuff[dlc_pos++]); - if (tmp < 0) - return; - cf.data[i] = (tmp << 4); - tmp = hex_to_bin(sl->rbuff[dlc_pos++]); - if (tmp < 0) - return; - cf.data[i] |= tmp; + /* RTR frames may have a dlc > 0 but they never have any data bytes */ + if (!(cf.can_id & CAN_RTR_FLAG)) { + for (i = 0; i < cf.can_dlc; i++) { + tmp = hex_to_bin(*cmd++); + if (tmp < 0) + return; + cf.data[i] = (tmp << 4); + tmp = hex_to_bin(*cmd++); + if (tmp < 0) + return; + cf.data[i] |= tmp; + } } skb = dev_alloc_skb(sizeof(struct can_frame) + @@ -209,7 +229,6 @@ static void slc_bump(struct slcan *sl) /* parse tty input stream */ static void slcan_unesc(struct slcan *sl, unsigned char s) { - if ((s == '\r') || (s == '\a')) { /* CR or BEL ends the pdu */ if (!test_and_clear_bit(SLF_ERROR, &sl->flags) && (sl->rcount > 4)) { @@ -236,27 +255,46 @@ static void slcan_unesc(struct slcan *sl, unsigned char s) /* Encapsulate one can_frame and stuff into a TTY queue. */ static void slc_encaps(struct slcan *sl, struct can_frame *cf) { - int actual, idx, i; - char cmd; + int actual, i; + unsigned char *pos; + unsigned char *endpos; + canid_t id = cf->can_id; + + pos = sl->xbuff; if (cf->can_id & CAN_RTR_FLAG) - cmd = 'R'; /* becomes 'r' in standard frame format */ + *pos = 'R'; /* becomes 'r' in standard frame format (SFF) */ else - cmd = 'T'; /* becomes 't' in standard frame format */ + *pos = 'T'; /* becomes 't' in standard frame format (SSF) */ - if (cf->can_id & CAN_EFF_FLAG) - sprintf(sl->xbuff, "%c%08X%d", cmd, - cf->can_id & CAN_EFF_MASK, cf->can_dlc); - else - sprintf(sl->xbuff, "%c%03X%d", cmd | 0x20, - cf->can_id & CAN_SFF_MASK, cf->can_dlc); + /* determine number of chars for the CAN-identifier */ + if (cf->can_id & CAN_EFF_FLAG) { + id &= CAN_EFF_MASK; + endpos = pos + SLC_EFF_ID_LEN; + } else { + *pos |= 0x20; /* convert R/T to lower case for SFF */ + id &= CAN_SFF_MASK; + endpos = pos + SLC_SFF_ID_LEN; + } - idx = strlen(sl->xbuff); + /* build 3 (SFF) or 8 (EFF) digit CAN identifier */ + pos++; + while (endpos >= pos) { + *endpos-- = hex_asc_upper[id & 0xf]; + id >>= 4; + } + + pos += (cf->can_id & CAN_EFF_FLAG) ? SLC_EFF_ID_LEN : SLC_SFF_ID_LEN; - for (i = 0; i < cf->can_dlc; i++) - sprintf(&sl->xbuff[idx + 2*i], "%02X", cf->data[i]); + *pos++ = cf->can_dlc + '0'; + + /* RTR frames may have a dlc > 0 but they never have any data bytes */ + if (!(cf->can_id & CAN_RTR_FLAG)) { + for (i = 0; i < cf->can_dlc; i++) + pos = hex_byte_pack_upper(pos, cf->data[i]); + } - strcat(sl->xbuff, "\r"); /* add terminating character */ + *pos++ = '\r'; /* Order of next two lines is *very* important. * When we are sending a little amount of data, @@ -267,8 +305,8 @@ static void slc_encaps(struct slcan *sl, struct can_frame *cf) * 14 Oct 1994 Dmitry Gorodchanin. */ set_bit(TTY_DO_WRITE_WAKEUP, &sl->tty->flags); - actual = sl->tty->ops->write(sl->tty, sl->xbuff, strlen(sl->xbuff)); - sl->xleft = strlen(sl->xbuff) - actual; + actual = sl->tty->ops->write(sl->tty, sl->xbuff, pos - sl->xbuff); + sl->xleft = (pos - sl->xbuff) - actual; sl->xhead = sl->xbuff + actual; sl->dev->stats.tx_bytes += cf->can_dlc; } @@ -286,11 +324,13 @@ static void slcan_write_wakeup(struct tty_struct *tty) if (!sl || sl->magic != SLCAN_MAGIC || !netif_running(sl->dev)) return; + spin_lock(&sl->lock); if (sl->xleft <= 0) { /* Now serial buffer is almost free & we can start * transmission of another packet */ sl->dev->stats.tx_packets++; clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags); + spin_unlock(&sl->lock); netif_wake_queue(sl->dev); return; } @@ -298,6 +338,7 @@ static void slcan_write_wakeup(struct tty_struct *tty) actual = tty->ops->write(tty, sl->xhead, sl->xleft); sl->xleft -= actual; sl->xhead += actual; + spin_unlock(&sl->lock); } /* Send a can_frame to a TTY queue. */ diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c b/drivers/net/can/usb/peak_usb/pcan_usb_core.c index a0f647f92bf5..0b7a4c3b01a2 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c +++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c @@ -463,7 +463,7 @@ static int peak_usb_start(struct peak_usb_device *dev) if (i < PCAN_USB_MAX_TX_URBS) { if (i == 0) { netdev_err(netdev, "couldn't setup any tx URB\n"); - return err; + goto err_tx; } netdev_warn(netdev, "tx performance may be slow\n"); @@ -472,7 +472,7 @@ static int peak_usb_start(struct peak_usb_device *dev) if (dev->adapter->dev_start) { err = dev->adapter->dev_start(dev); if (err) - goto failed; + goto err_adapter; } dev->state |= PCAN_USB_STATE_STARTED; @@ -481,19 +481,26 @@ static int peak_usb_start(struct peak_usb_device *dev) if (dev->adapter->dev_set_bus) { err = dev->adapter->dev_set_bus(dev, 1); if (err) - goto failed; + goto err_adapter; } dev->can.state = CAN_STATE_ERROR_ACTIVE; return 0; -failed: +err_adapter: if (err == -ENODEV) netif_device_detach(dev->netdev); netdev_warn(netdev, "couldn't submit control: %d\n", err); + for (i = 0; i < PCAN_USB_MAX_TX_URBS; i++) { + usb_free_urb(dev->tx_contexts[i].urb); + dev->tx_contexts[i].urb = NULL; + } +err_tx: + usb_kill_anchored_urbs(&dev->rx_submitted); + return err; } diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c index 61726af1de6e..e66beff2704d 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c @@ -2481,8 +2481,7 @@ load_error_cnic2: load_error_cnic1: bnx2x_napi_disable_cnic(bp); /* Update the number of queues without the cnic queues */ - rc = bnx2x_set_real_num_queues(bp, 0); - if (rc) + if (bnx2x_set_real_num_queues(bp, 0)) BNX2X_ERR("Unable to set real_num_queues not including cnic\n"); load_error_cnic0: BNX2X_ERR("CNIC-related load failed\n"); diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c index d60a2ea3da19..51468227bf3b 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c @@ -175,6 +175,7 @@ typedef int (*read_sfp_module_eeprom_func_p)(struct bnx2x_phy *phy, #define EDC_MODE_LINEAR 0x0022 #define EDC_MODE_LIMITING 0x0044 #define EDC_MODE_PASSIVE_DAC 0x0055 +#define EDC_MODE_ACTIVE_DAC 0x0066 /* ETS defines*/ #define DCBX_INVALID_COS (0xFF) @@ -3684,6 +3685,41 @@ static void bnx2x_warpcore_enable_AN_KR2(struct bnx2x_phy *phy, bnx2x_update_link_attr(params, vars->link_attr_sync); } +static void bnx2x_disable_kr2(struct link_params *params, + struct link_vars *vars, + struct bnx2x_phy *phy) +{ + struct bnx2x *bp = params->bp; + int i; + static struct bnx2x_reg_set reg_set[] = { + /* Step 1 - Program the TX/RX alignment markers */ + {MDIO_WC_DEVAD, MDIO_WC_REG_CL82_USERB1_TX_CTRL5, 0x7690}, + {MDIO_WC_DEVAD, MDIO_WC_REG_CL82_USERB1_TX_CTRL7, 0xe647}, + {MDIO_WC_DEVAD, MDIO_WC_REG_CL82_USERB1_TX_CTRL6, 0xc4f0}, + {MDIO_WC_DEVAD, MDIO_WC_REG_CL82_USERB1_TX_CTRL9, 0x7690}, + {MDIO_WC_DEVAD, MDIO_WC_REG_CL82_USERB1_RX_CTRL11, 0xe647}, + {MDIO_WC_DEVAD, MDIO_WC_REG_CL82_USERB1_RX_CTRL10, 0xc4f0}, + {MDIO_WC_DEVAD, MDIO_WC_REG_CL73_USERB0_CTRL, 0x000c}, + {MDIO_WC_DEVAD, MDIO_WC_REG_CL73_BAM_CTRL1, 0x6000}, + {MDIO_WC_DEVAD, MDIO_WC_REG_CL73_BAM_CTRL3, 0x0000}, + {MDIO_WC_DEVAD, MDIO_WC_REG_CL73_BAM_CODE_FIELD, 0x0002}, + {MDIO_WC_DEVAD, MDIO_WC_REG_ETA_CL73_OUI1, 0x0000}, + {MDIO_WC_DEVAD, MDIO_WC_REG_ETA_CL73_OUI2, 0x0af7}, + {MDIO_WC_DEVAD, MDIO_WC_REG_ETA_CL73_OUI3, 0x0af7}, + {MDIO_WC_DEVAD, MDIO_WC_REG_ETA_CL73_LD_BAM_CODE, 0x0002}, + {MDIO_WC_DEVAD, MDIO_WC_REG_ETA_CL73_LD_UD_CODE, 0x0000} + }; + DP(NETIF_MSG_LINK, "Disabling 20G-KR2\n"); + + for (i = 0; i < ARRAY_SIZE(reg_set); i++) + bnx2x_cl45_write(bp, phy, reg_set[i].devad, reg_set[i].reg, + reg_set[i].val); + vars->link_attr_sync &= ~LINK_ATTR_SYNC_KR2_ENABLE; + bnx2x_update_link_attr(params, vars->link_attr_sync); + + vars->check_kr2_recovery_cnt = CHECK_KR2_RECOVERY_CNT; +} + static void bnx2x_warpcore_set_lpi_passthrough(struct bnx2x_phy *phy, struct link_params *params) { @@ -3715,7 +3751,6 @@ static void bnx2x_warpcore_enable_AN_KR(struct bnx2x_phy *phy, struct link_params *params, struct link_vars *vars) { u16 lane, i, cl72_ctrl, an_adv = 0; - u16 ucode_ver; struct bnx2x *bp = params->bp; static struct bnx2x_reg_set reg_set[] = { {MDIO_WC_DEVAD, MDIO_WC_REG_SERDESDIGITAL_CONTROL1000X2, 0x7}, @@ -3806,15 +3841,7 @@ static void bnx2x_warpcore_enable_AN_KR(struct bnx2x_phy *phy, /* Advertise pause */ bnx2x_ext_phy_set_pause(params, phy, vars); - /* Set KR Autoneg Work-Around flag for Warpcore version older than D108 - */ - bnx2x_cl45_read(bp, phy, MDIO_WC_DEVAD, - MDIO_WC_REG_UC_INFO_B1_VERSION, &ucode_ver); - if (ucode_ver < 0xd108) { - DP(NETIF_MSG_LINK, "Enable AN KR work-around. WC ver:0x%x\n", - ucode_ver); - vars->rx_tx_asic_rst = MAX_KR_LINK_RETRY; - } + vars->rx_tx_asic_rst = MAX_KR_LINK_RETRY; bnx2x_cl45_read_or_write(bp, phy, MDIO_WC_DEVAD, MDIO_WC_REG_DIGITAL5_MISC7, 0x100); @@ -3838,6 +3865,8 @@ static void bnx2x_warpcore_enable_AN_KR(struct bnx2x_phy *phy, bnx2x_set_aer_mmd(params, phy); bnx2x_warpcore_enable_AN_KR2(phy, params, vars); + } else { + bnx2x_disable_kr2(params, vars, phy); } /* Enable Autoneg: only on the main lane */ @@ -4347,20 +4376,14 @@ static void bnx2x_warpcore_config_runtime(struct bnx2x_phy *phy, struct bnx2x *bp = params->bp; u32 serdes_net_if; u16 gp_status1 = 0, lnkup = 0, lnkup_kr = 0; - u16 lane = bnx2x_get_warpcore_lane(phy, params); vars->turn_to_run_wc_rt = vars->turn_to_run_wc_rt ? 0 : 1; if (!vars->turn_to_run_wc_rt) return; - /* Return if there is no link partner */ - if (!(bnx2x_warpcore_get_sigdet(phy, params))) { - DP(NETIF_MSG_LINK, "bnx2x_warpcore_get_sigdet false\n"); - return; - } - if (vars->rx_tx_asic_rst) { + u16 lane = bnx2x_get_warpcore_lane(phy, params); serdes_net_if = (REG_RD(bp, params->shmem_base + offsetof(struct shmem_region, dev_info. port_hw_config[params->port].default_cfg)) & @@ -4375,14 +4398,8 @@ static void bnx2x_warpcore_config_runtime(struct bnx2x_phy *phy, /*10G KR*/ lnkup_kr = (gp_status1 >> (12+lane)) & 0x1; - DP(NETIF_MSG_LINK, - "gp_status1 0x%x\n", gp_status1); - if (lnkup_kr || lnkup) { - vars->rx_tx_asic_rst = 0; - DP(NETIF_MSG_LINK, - "link up, rx_tx_asic_rst 0x%x\n", - vars->rx_tx_asic_rst); + vars->rx_tx_asic_rst = 0; } else { /* Reset the lane to see if link comes up.*/ bnx2x_warpcore_reset_lane(bp, phy, 1); @@ -4507,10 +4524,14 @@ static void bnx2x_warpcore_config_init(struct bnx2x_phy *phy, * enabled transmitter to avoid current leakage in case * no module is connected */ - if (bnx2x_is_sfp_module_plugged(phy, params)) - bnx2x_sfp_module_detection(phy, params); - else - bnx2x_sfp_e3_set_transmitter(params, phy, 1); + if ((params->loopback_mode == LOOPBACK_NONE) || + (params->loopback_mode == LOOPBACK_EXT)) { + if (bnx2x_is_sfp_module_plugged(phy, params)) + bnx2x_sfp_module_detection(phy, params); + else + bnx2x_sfp_e3_set_transmitter(params, + phy, 1); + } bnx2x_warpcore_config_sfi(phy, params); break; @@ -5757,6 +5778,11 @@ static int bnx2x_warpcore_read_status(struct bnx2x_phy *phy, rc = bnx2x_get_link_speed_duplex(phy, params, vars, link_up, gp_speed, duplex); + /* In case of KR link down, start up the recovering procedure */ + if ((!link_up) && (phy->media_type == ETH_PHY_KR) && + (!(phy->flags & FLAGS_WC_DUAL_MODE))) + vars->rx_tx_asic_rst = MAX_KR_LINK_RETRY; + DP(NETIF_MSG_LINK, "duplex %x flow_ctrl 0x%x link_status 0x%x\n", vars->duplex, vars->flow_ctrl, vars->link_status); return rc; @@ -6507,6 +6533,11 @@ static int bnx2x_link_initialize(struct link_params *params, params->phy[INT_PHY].config_init(phy, params, vars); } + /* Re-read this value in case it was changed inside config_init due to + * limitations of optic module + */ + vars->line_speed = params->phy[INT_PHY].req_line_speed; + /* Init external phy*/ if (non_ext_phy) { if (params->phy[INT_PHY].supported & @@ -8080,7 +8111,10 @@ static int bnx2x_get_edc_mode(struct bnx2x_phy *phy, if (copper_module_type & SFP_EEPROM_FC_TX_TECH_BITMASK_COPPER_ACTIVE) { DP(NETIF_MSG_LINK, "Active Copper cable detected\n"); - check_limiting_mode = 1; + if (phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_DIRECT) + *edc_mode = EDC_MODE_ACTIVE_DAC; + else + check_limiting_mode = 1; } else if (copper_module_type & SFP_EEPROM_FC_TX_TECH_BITMASK_COPPER_PASSIVE) { DP(NETIF_MSG_LINK, @@ -8555,6 +8589,7 @@ static void bnx2x_warpcore_set_limiting_mode(struct link_params *params, mode = MDIO_WC_REG_UC_INFO_B1_FIRMWARE_MODE_DEFAULT; break; case EDC_MODE_PASSIVE_DAC: + case EDC_MODE_ACTIVE_DAC: mode = MDIO_WC_REG_UC_INFO_B1_FIRMWARE_MODE_SFP_DAC; break; default: @@ -9730,32 +9765,41 @@ static int bnx2x_848xx_cmn_config_init(struct bnx2x_phy *phy, MDIO_AN_DEVAD, MDIO_AN_REG_8481_1000T_CTRL, an_1000_val); - /* set 100 speed advertisement */ - if ((phy->req_line_speed == SPEED_AUTO_NEG) && - (phy->speed_cap_mask & - (PORT_HW_CFG_SPEED_CAPABILITY_D0_100M_FULL | - PORT_HW_CFG_SPEED_CAPABILITY_D0_100M_HALF))) { - an_10_100_val |= (1<<7); - /* Enable autoneg and restart autoneg for legacy speeds */ - autoneg_val |= (1<<9 | 1<<12); - - if (phy->req_duplex == DUPLEX_FULL) + /* Set 10/100 speed advertisement */ + if (phy->req_line_speed == SPEED_AUTO_NEG) { + if (phy->speed_cap_mask & + PORT_HW_CFG_SPEED_CAPABILITY_D0_100M_FULL) { + /* Enable autoneg and restart autoneg for legacy speeds + */ + autoneg_val |= (1<<9 | 1<<12); an_10_100_val |= (1<<8); - DP(NETIF_MSG_LINK, "Advertising 100M\n"); - } - /* set 10 speed advertisement */ - if (((phy->req_line_speed == SPEED_AUTO_NEG) && - (phy->speed_cap_mask & - (PORT_HW_CFG_SPEED_CAPABILITY_D0_10M_FULL | - PORT_HW_CFG_SPEED_CAPABILITY_D0_10M_HALF)) && - (phy->supported & - (SUPPORTED_10baseT_Half | - SUPPORTED_10baseT_Full)))) { - an_10_100_val |= (1<<5); - autoneg_val |= (1<<9 | 1<<12); - if (phy->req_duplex == DUPLEX_FULL) + DP(NETIF_MSG_LINK, "Advertising 100M-FD\n"); + } + + if (phy->speed_cap_mask & + PORT_HW_CFG_SPEED_CAPABILITY_D0_100M_HALF) { + /* Enable autoneg and restart autoneg for legacy speeds + */ + autoneg_val |= (1<<9 | 1<<12); + an_10_100_val |= (1<<7); + DP(NETIF_MSG_LINK, "Advertising 100M-HD\n"); + } + + if ((phy->speed_cap_mask & + PORT_HW_CFG_SPEED_CAPABILITY_D0_10M_FULL) && + (phy->supported & SUPPORTED_10baseT_Full)) { an_10_100_val |= (1<<6); - DP(NETIF_MSG_LINK, "Advertising 10M\n"); + autoneg_val |= (1<<9 | 1<<12); + DP(NETIF_MSG_LINK, "Advertising 10M-FD\n"); + } + + if ((phy->speed_cap_mask & + PORT_HW_CFG_SPEED_CAPABILITY_D0_10M_HALF) && + (phy->supported & SUPPORTED_10baseT_Half)) { + an_10_100_val |= (1<<5); + autoneg_val |= (1<<9 | 1<<12); + DP(NETIF_MSG_LINK, "Advertising 10M-HD\n"); + } } /* Only 10/100 are allowed to work in FORCE mode */ @@ -13432,43 +13476,6 @@ static void bnx2x_sfp_tx_fault_detection(struct bnx2x_phy *phy, } } } -static void bnx2x_disable_kr2(struct link_params *params, - struct link_vars *vars, - struct bnx2x_phy *phy) -{ - struct bnx2x *bp = params->bp; - int i; - static struct bnx2x_reg_set reg_set[] = { - /* Step 1 - Program the TX/RX alignment markers */ - {MDIO_WC_DEVAD, MDIO_WC_REG_CL82_USERB1_TX_CTRL5, 0x7690}, - {MDIO_WC_DEVAD, MDIO_WC_REG_CL82_USERB1_TX_CTRL7, 0xe647}, - {MDIO_WC_DEVAD, MDIO_WC_REG_CL82_USERB1_TX_CTRL6, 0xc4f0}, - {MDIO_WC_DEVAD, MDIO_WC_REG_CL82_USERB1_TX_CTRL9, 0x7690}, - {MDIO_WC_DEVAD, MDIO_WC_REG_CL82_USERB1_RX_CTRL11, 0xe647}, - {MDIO_WC_DEVAD, MDIO_WC_REG_CL82_USERB1_RX_CTRL10, 0xc4f0}, - {MDIO_WC_DEVAD, MDIO_WC_REG_CL73_USERB0_CTRL, 0x000c}, - {MDIO_WC_DEVAD, MDIO_WC_REG_CL73_BAM_CTRL1, 0x6000}, - {MDIO_WC_DEVAD, MDIO_WC_REG_CL73_BAM_CTRL3, 0x0000}, - {MDIO_WC_DEVAD, MDIO_WC_REG_CL73_BAM_CODE_FIELD, 0x0002}, - {MDIO_WC_DEVAD, MDIO_WC_REG_ETA_CL73_OUI1, 0x0000}, - {MDIO_WC_DEVAD, MDIO_WC_REG_ETA_CL73_OUI2, 0x0af7}, - {MDIO_WC_DEVAD, MDIO_WC_REG_ETA_CL73_OUI3, 0x0af7}, - {MDIO_WC_DEVAD, MDIO_WC_REG_ETA_CL73_LD_BAM_CODE, 0x0002}, - {MDIO_WC_DEVAD, MDIO_WC_REG_ETA_CL73_LD_UD_CODE, 0x0000} - }; - DP(NETIF_MSG_LINK, "Disabling 20G-KR2\n"); - - for (i = 0; i < ARRAY_SIZE(reg_set); i++) - bnx2x_cl45_write(bp, phy, reg_set[i].devad, reg_set[i].reg, - reg_set[i].val); - vars->link_attr_sync &= ~LINK_ATTR_SYNC_KR2_ENABLE; - bnx2x_update_link_attr(params, vars->link_attr_sync); - - vars->check_kr2_recovery_cnt = CHECK_KR2_RECOVERY_CNT; - /* Restart AN on leading lane */ - bnx2x_warpcore_restart_AN_KR(phy, params); -} - static void bnx2x_kr2_recovery(struct link_params *params, struct link_vars *vars, struct bnx2x_phy *phy) @@ -13546,6 +13553,8 @@ static void bnx2x_check_kr2_wa(struct link_params *params, /* Disable KR2 on both lanes */ DP(NETIF_MSG_LINK, "BP=0x%x, NP=0x%x\n", base_page, next_page); bnx2x_disable_kr2(params, vars, phy); + /* Restart AN on leading lane */ + bnx2x_warpcore_restart_AN_KR(phy, params); return; } } diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c index a6704b555042..82b658d8c04c 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c @@ -4703,6 +4703,14 @@ bool bnx2x_chk_parity_attn(struct bnx2x *bp, bool *global, bool print) attn.sig[3] = REG_RD(bp, MISC_REG_AEU_AFTER_INVERT_4_FUNC_0 + port*4); + /* Since MCP attentions can't be disabled inside the block, we need to + * read AEU registers to see whether they're currently disabled + */ + attn.sig[3] &= ((REG_RD(bp, + !port ? MISC_REG_AEU_ENABLE4_FUNC_0_OUT_0 + : MISC_REG_AEU_ENABLE4_FUNC_1_OUT_0) & + MISC_AEU_ENABLE_MCP_PRTY_BITS) | + ~MISC_AEU_ENABLE_MCP_PRTY_BITS); if (!CHIP_IS_E1x(bp)) attn.sig[4] = REG_RD(bp, @@ -5447,26 +5455,24 @@ static void bnx2x_timer(unsigned long data) if (IS_PF(bp) && !BP_NOMCP(bp)) { int mb_idx = BP_FW_MB_IDX(bp); - u32 drv_pulse; - u32 mcp_pulse; + u16 drv_pulse; + u16 mcp_pulse; ++bp->fw_drv_pulse_wr_seq; bp->fw_drv_pulse_wr_seq &= DRV_PULSE_SEQ_MASK; - /* TBD - add SYSTEM_TIME */ drv_pulse = bp->fw_drv_pulse_wr_seq; bnx2x_drv_pulse(bp); mcp_pulse = (SHMEM_RD(bp, func_mb[mb_idx].mcp_pulse_mb) & MCP_PULSE_SEQ_MASK); /* The delta between driver pulse and mcp response - * should be 1 (before mcp response) or 0 (after mcp response) + * should not get too big. If the MFW is more than 5 pulses + * behind, we should worry about it enough to generate an error + * log. */ - if ((drv_pulse != mcp_pulse) && - (drv_pulse != ((mcp_pulse + 1) & MCP_PULSE_SEQ_MASK))) { - /* someone lost a heartbeat... */ - BNX2X_ERR("drv_pulse (0x%x) != mcp_pulse (0x%x)\n", + if (((drv_pulse - mcp_pulse) & MCP_PULSE_SEQ_MASK) > 5) + BNX2X_ERR("MFW seems hanged: drv_pulse (0x%x) != mcp_pulse (0x%x)\n", drv_pulse, mcp_pulse); - } } if (bp->state == BNX2X_STATE_OPEN) diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c index 2604b6204abe..9ad012bdd915 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c @@ -1819,7 +1819,7 @@ bnx2x_get_vf_igu_cam_info(struct bnx2x *bp) fid = GET_FIELD((val), IGU_REG_MAPPING_MEMORY_FID); if (fid & IGU_FID_ENCODE_IS_PF) current_pf = fid & IGU_FID_PF_NUM_MASK; - else if (current_pf == BP_ABS_FUNC(bp)) + else if (current_pf == BP_FUNC(bp)) bnx2x_vf_set_igu_info(bp, sb_id, (fid & IGU_FID_VF_NUM_MASK)); DP(BNX2X_MSG_IOV, "%s[%d], igu_sb_id=%d, msix=%d\n", @@ -3180,6 +3180,7 @@ int bnx2x_enable_sriov(struct bnx2x *bp) /* set local queue arrays */ vf->vfqs = &bp->vfdb->vfqs[qcount]; qcount += vf_sb_count(vf); + bnx2x_iov_static_resc(bp, vf); } /* prepare msix vectors in VF configuration space */ @@ -3187,6 +3188,8 @@ int bnx2x_enable_sriov(struct bnx2x *bp) bnx2x_pretend_func(bp, HW_VF_HANDLE(bp, vf_idx)); REG_WR(bp, PCICFG_OFFSET + GRC_CONFIG_REG_VF_MSIX_CONTROL, num_vf_queues); + DP(BNX2X_MSG_IOV, "set msix vec num in VF %d cfg space to %d\n", + vf_idx, num_vf_queues); } bnx2x_pretend_func(bp, BP_ABS_FUNC(bp)); diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c index 6cfb88732452..da16953eb2ec 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c @@ -1765,28 +1765,28 @@ static void bnx2x_vf_mbx_request(struct bnx2x *bp, struct bnx2x_virtf *vf, switch (mbx->first_tlv.tl.type) { case CHANNEL_TLV_ACQUIRE: bnx2x_vf_mbx_acquire(bp, vf, mbx); - break; + return; case CHANNEL_TLV_INIT: bnx2x_vf_mbx_init_vf(bp, vf, mbx); - break; + return; case CHANNEL_TLV_SETUP_Q: bnx2x_vf_mbx_setup_q(bp, vf, mbx); - break; + return; case CHANNEL_TLV_SET_Q_FILTERS: bnx2x_vf_mbx_set_q_filters(bp, vf, mbx); - break; + return; case CHANNEL_TLV_TEARDOWN_Q: bnx2x_vf_mbx_teardown_q(bp, vf, mbx); - break; + return; case CHANNEL_TLV_CLOSE: bnx2x_vf_mbx_close_vf(bp, vf, mbx); - break; + return; case CHANNEL_TLV_RELEASE: bnx2x_vf_mbx_release_vf(bp, vf, mbx); - break; + return; case CHANNEL_TLV_UPDATE_RSS: bnx2x_vf_mbx_update_rss(bp, vf, mbx); - break; + return; } } else { @@ -1802,26 +1802,24 @@ static void bnx2x_vf_mbx_request(struct bnx2x *bp, struct bnx2x_virtf *vf, for (i = 0; i < 20; i++) DP_CONT(BNX2X_MSG_IOV, "%x ", mbx->msg->req.tlv_buf_size.tlv_buffer[i]); + } - /* test whether we can respond to the VF (do we have an address - * for it?) - */ - if (vf->state == VF_ACQUIRED || vf->state == VF_ENABLED) { - /* mbx_resp uses the op_rc of the VF */ - vf->op_rc = PFVF_STATUS_NOT_SUPPORTED; + /* can we respond to VF (do we have an address for it?) */ + if (vf->state == VF_ACQUIRED || vf->state == VF_ENABLED) { + /* mbx_resp uses the op_rc of the VF */ + vf->op_rc = PFVF_STATUS_NOT_SUPPORTED; - /* notify the VF that we do not support this request */ - bnx2x_vf_mbx_resp(bp, vf); - } else { - /* can't send a response since this VF is unknown to us - * just ack the FW to release the mailbox and unlock - * the channel. - */ - storm_memset_vf_mbx_ack(bp, vf->abs_vfid); - mmiowb(); - bnx2x_unlock_vf_pf_channel(bp, vf, - mbx->first_tlv.tl.type); - } + /* notify the VF that we do not support this request */ + bnx2x_vf_mbx_resp(bp, vf); + } else { + /* can't send a response since this VF is unknown to us + * just ack the FW to release the mailbox and unlock + * the channel. + */ + storm_memset_vf_mbx_ack(bp, vf->abs_vfid); + /* Firmware ack should be written before unlocking channel */ + mmiowb(); + bnx2x_unlock_vf_pf_channel(bp, vf, mbx->first_tlv.tl.type); } } diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h index ace5050dba38..db020230bd0b 100644 --- a/drivers/net/ethernet/emulex/benet/be.h +++ b/drivers/net/ethernet/emulex/benet/be.h @@ -88,6 +88,7 @@ static inline char *nic_name(struct pci_dev *pdev) #define BE_MIN_MTU 256 #define BE_NUM_VLANS_SUPPORTED 64 +#define BE_UMC_NUM_VLANS_SUPPORTED 15 #define BE_MAX_EQD 96u #define BE_MAX_TX_FRAG_COUNT 30 @@ -333,6 +334,7 @@ enum vf_state { #define BE_FLAGS_LINK_STATUS_INIT 1 #define BE_FLAGS_WORKER_SCHEDULED (1 << 3) +#define BE_FLAGS_VLAN_PROMISC (1 << 4) #define BE_FLAGS_NAPI_ENABLED (1 << 9) #define BE_UC_PMAC_COUNT 30 #define BE_VF_UC_PMAC_COUNT 2 diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c index 1ab5dab11eff..bd0e0c0bbcd8 100644 --- a/drivers/net/ethernet/emulex/benet/be_cmds.c +++ b/drivers/net/ethernet/emulex/benet/be_cmds.c @@ -180,6 +180,9 @@ static int be_mcc_compl_process(struct be_adapter *adapter, dev_err(&adapter->pdev->dev, "opcode %d-%d failed:status %d-%d\n", opcode, subsystem, compl_status, extd_status); + + if (extd_status == MCC_ADDL_STS_INSUFFICIENT_RESOURCES) + return extd_status; } } done: @@ -1812,6 +1815,12 @@ int be_cmd_rx_filter(struct be_adapter *adapter, u32 flags, u32 value) } else if (flags & IFF_ALLMULTI) { req->if_flags_mask = req->if_flags = cpu_to_le32(BE_IF_FLAGS_MCAST_PROMISCUOUS); + } else if (flags & BE_FLAGS_VLAN_PROMISC) { + req->if_flags_mask = cpu_to_le32(BE_IF_FLAGS_VLAN_PROMISCUOUS); + + if (value == ON) + req->if_flags = + cpu_to_le32(BE_IF_FLAGS_VLAN_PROMISCUOUS); } else { struct netdev_hw_addr *ha; int i = 0; diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.h b/drivers/net/ethernet/emulex/benet/be_cmds.h index d026226db88c..108ca8abf0af 100644 --- a/drivers/net/ethernet/emulex/benet/be_cmds.h +++ b/drivers/net/ethernet/emulex/benet/be_cmds.h @@ -60,6 +60,8 @@ enum { MCC_STATUS_NOT_SUPPORTED = 66 }; +#define MCC_ADDL_STS_INSUFFICIENT_RESOURCES 0x16 + #define CQE_STATUS_COMPL_MASK 0xFFFF #define CQE_STATUS_COMPL_SHIFT 0 /* bits 0 - 15 */ #define CQE_STATUS_EXTD_MASK 0xFFFF @@ -1791,7 +1793,7 @@ struct be_nic_res_desc { u8 acpi_params; u8 wol_param; u16 rsvd7; - u32 rsvd8[3]; + u32 rsvd8[7]; } __packed; struct be_cmd_req_get_func_config { diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index 100b528b9bd0..2c38cc402119 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -855,11 +855,11 @@ static struct sk_buff *be_xmit_workarounds(struct be_adapter *adapter, unsigned int eth_hdr_len; struct iphdr *ip; - /* Lancer ASIC has a bug wherein packets that are 32 bytes or less + /* Lancer, SH-R ASICs have a bug wherein Packets that are 32 bytes or less * may cause a transmit stall on that port. So the work-around is to - * pad such packets to a 36-byte length. + * pad short packets (<= 32 bytes) to a 36-byte length. */ - if (unlikely(lancer_chip(adapter) && skb->len <= 32)) { + if (unlikely(!BEx_chip(adapter) && skb->len <= 32)) { if (skb_padto(skb, 36)) goto tx_drop; skb->len = 36; @@ -1013,18 +1013,40 @@ static int be_vid_config(struct be_adapter *adapter) status = be_cmd_vlan_config(adapter, adapter->if_handle, vids, num, 1, 0); - /* Set to VLAN promisc mode as setting VLAN filter failed */ if (status) { - dev_info(&adapter->pdev->dev, "Exhausted VLAN HW filters.\n"); - dev_info(&adapter->pdev->dev, "Disabling HW VLAN filtering.\n"); - goto set_vlan_promisc; + /* Set to VLAN promisc mode as setting VLAN filter failed */ + if (status == MCC_ADDL_STS_INSUFFICIENT_RESOURCES) + goto set_vlan_promisc; + dev_err(&adapter->pdev->dev, + "Setting HW VLAN filtering failed.\n"); + } else { + if (adapter->flags & BE_FLAGS_VLAN_PROMISC) { + /* hw VLAN filtering re-enabled. */ + status = be_cmd_rx_filter(adapter, + BE_FLAGS_VLAN_PROMISC, OFF); + if (!status) { + dev_info(&adapter->pdev->dev, + "Disabling VLAN Promiscuous mode.\n"); + adapter->flags &= ~BE_FLAGS_VLAN_PROMISC; + dev_info(&adapter->pdev->dev, + "Re-Enabling HW VLAN filtering\n"); + } + } } return status; set_vlan_promisc: - status = be_cmd_vlan_config(adapter, adapter->if_handle, - NULL, 0, 1, 1); + dev_warn(&adapter->pdev->dev, "Exhausted VLAN HW filters.\n"); + + status = be_cmd_rx_filter(adapter, BE_FLAGS_VLAN_PROMISC, ON); + if (!status) { + dev_info(&adapter->pdev->dev, "Enable VLAN Promiscuous mode\n"); + dev_info(&adapter->pdev->dev, "Disabling HW VLAN filtering\n"); + adapter->flags |= BE_FLAGS_VLAN_PROMISC; + } else + dev_err(&adapter->pdev->dev, + "Failed to enable VLAN Promiscuous mode.\n"); return status; } @@ -1033,10 +1055,6 @@ static int be_vlan_add_vid(struct net_device *netdev, __be16 proto, u16 vid) struct be_adapter *adapter = netdev_priv(netdev); int status = 0; - if (!lancer_chip(adapter) && !be_physfn(adapter)) { - status = -EINVAL; - goto ret; - } /* Packets with VID 0 are always received by Lancer by default */ if (lancer_chip(adapter) && vid == 0) @@ -1059,11 +1077,6 @@ static int be_vlan_rem_vid(struct net_device *netdev, __be16 proto, u16 vid) struct be_adapter *adapter = netdev_priv(netdev); int status = 0; - if (!lancer_chip(adapter) && !be_physfn(adapter)) { - status = -EINVAL; - goto ret; - } - /* Packets with VID 0 are always received by Lancer by default */ if (lancer_chip(adapter) && vid == 0) goto ret; @@ -1188,8 +1201,8 @@ static int be_get_vf_config(struct net_device *netdev, int vf, vi->vf = vf; vi->tx_rate = vf_cfg->tx_rate; - vi->vlan = vf_cfg->vlan_tag; - vi->qos = 0; + vi->vlan = vf_cfg->vlan_tag & VLAN_VID_MASK; + vi->qos = vf_cfg->vlan_tag >> VLAN_PRIO_SHIFT; memcpy(&vi->mac, vf_cfg->mac_addr, ETH_ALEN); return 0; @@ -1199,28 +1212,29 @@ static int be_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan, u8 qos) { struct be_adapter *adapter = netdev_priv(netdev); + struct be_vf_cfg *vf_cfg = &adapter->vf_cfg[vf]; int status = 0; if (!sriov_enabled(adapter)) return -EPERM; - if (vf >= adapter->num_vfs || vlan > 4095) + if (vf >= adapter->num_vfs || vlan > 4095 || qos > 7) return -EINVAL; - if (vlan) { - if (adapter->vf_cfg[vf].vlan_tag != vlan) { + if (vlan || qos) { + vlan |= qos << VLAN_PRIO_SHIFT; + if (vf_cfg->vlan_tag != vlan) { /* If this is new value, program it. Else skip. */ - adapter->vf_cfg[vf].vlan_tag = vlan; - - status = be_cmd_set_hsw_config(adapter, vlan, - vf + 1, adapter->vf_cfg[vf].if_handle, 0); + vf_cfg->vlan_tag = vlan; + status = be_cmd_set_hsw_config(adapter, vlan, vf + 1, + vf_cfg->if_handle, 0); } } else { /* Reset Transparent Vlan Tagging. */ - adapter->vf_cfg[vf].vlan_tag = 0; - vlan = adapter->vf_cfg[vf].def_vid; + vf_cfg->vlan_tag = 0; + vlan = vf_cfg->def_vid; status = be_cmd_set_hsw_config(adapter, vlan, vf + 1, - adapter->vf_cfg[vf].if_handle, 0); + vf_cfg->if_handle, 0); } @@ -2963,6 +2977,8 @@ static void BEx_get_resources(struct be_adapter *adapter, if (adapter->function_mode & FLEX10_MODE) res->max_vlans = BE_NUM_VLANS_SUPPORTED/8; + else if (adapter->function_mode & UMC_ENABLED) + res->max_vlans = BE_UMC_NUM_VLANS_SUPPORTED; else res->max_vlans = BE_NUM_VLANS_SUPPORTED; res->max_mcast_mac = BE_MAX_MC; diff --git a/drivers/net/ethernet/freescale/gianfar_ptp.c b/drivers/net/ethernet/freescale/gianfar_ptp.c index 098f133908ae..e006a09ba899 100644 --- a/drivers/net/ethernet/freescale/gianfar_ptp.c +++ b/drivers/net/ethernet/freescale/gianfar_ptp.c @@ -452,7 +452,9 @@ static int gianfar_ptp_probe(struct platform_device *dev) err = -ENODEV; etsects->caps = ptp_gianfar_caps; - etsects->cksel = DEFAULT_CKSEL; + + if (get_of_u32(node, "fsl,cksel", &etsects->cksel)) + etsects->cksel = DEFAULT_CKSEL; if (get_of_u32(node, "fsl,tclk-period", &etsects->tclk_period) || get_of_u32(node, "fsl,tmr-prsc", &etsects->tmr_prsc) || diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.c b/drivers/net/ethernet/intel/i40e/i40e_adminq.c index 0c524fa9f811..cfef7fc32cdd 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.c +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.c @@ -701,8 +701,7 @@ i40e_status i40e_asq_send_command(struct i40e_hw *hw, details = I40E_ADMINQ_DETAILS(hw->aq.asq, hw->aq.asq.next_to_use); if (cmd_details) { - memcpy(details, cmd_details, - sizeof(struct i40e_asq_cmd_details)); + *details = *cmd_details; /* If the cmd_details are defined copy the cookie. The * cpu_to_le32 is not needed here because the data is ignored @@ -760,7 +759,7 @@ i40e_status i40e_asq_send_command(struct i40e_hw *hw, desc_on_ring = I40E_ADMINQ_DESC(hw->aq.asq, hw->aq.asq.next_to_use); /* if the desc is available copy the temp desc to the right place */ - memcpy(desc_on_ring, desc, sizeof(struct i40e_aq_desc)); + *desc_on_ring = *desc; /* if buff is not NULL assume indirect command */ if (buff != NULL) { @@ -807,7 +806,7 @@ i40e_status i40e_asq_send_command(struct i40e_hw *hw, /* if ready, copy the desc back to temp */ if (i40e_asq_done(hw)) { - memcpy(desc, desc_on_ring, sizeof(struct i40e_aq_desc)); + *desc = *desc_on_ring; if (buff != NULL) memcpy(buff, dma_buff->va, buff_size); retval = le16_to_cpu(desc->retval); diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c index c21df7bc3b1d..1e4ea134975a 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_common.c +++ b/drivers/net/ethernet/intel/i40e/i40e_common.c @@ -507,7 +507,7 @@ i40e_status i40e_aq_get_link_info(struct i40e_hw *hw, /* save link status information */ if (link) - memcpy(link, hw_link_info, sizeof(struct i40e_link_status)); + *link = *hw_link_info; /* flag cleared so helper functions don't call AQ again */ hw->phy.get_link_info = false; diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 601d482694ea..221aa4795017 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -101,10 +101,10 @@ int i40e_allocate_dma_mem_d(struct i40e_hw *hw, struct i40e_dma_mem *mem, mem->size = ALIGN(size, alignment); mem->va = dma_zalloc_coherent(&pf->pdev->dev, mem->size, &mem->pa, GFP_KERNEL); - if (mem->va) - return 0; + if (!mem->va) + return -ENOMEM; - return -ENOMEM; + return 0; } /** @@ -136,10 +136,10 @@ int i40e_allocate_virt_mem_d(struct i40e_hw *hw, struct i40e_virt_mem *mem, mem->size = size; mem->va = kzalloc(size, GFP_KERNEL); - if (mem->va) - return 0; + if (!mem->va) + return -ENOMEM; - return -ENOMEM; + return 0; } /** @@ -174,8 +174,7 @@ static int i40e_get_lump(struct i40e_pf *pf, struct i40e_lump_tracking *pile, u16 needed, u16 id) { int ret = -ENOMEM; - int i = 0; - int j = 0; + int i, j; if (!pile || needed == 0 || id >= I40E_PILE_VALID_BIT) { dev_info(&pf->pdev->dev, @@ -186,7 +185,7 @@ static int i40e_get_lump(struct i40e_pf *pf, struct i40e_lump_tracking *pile, /* start the linear search with an imperfect hint */ i = pile->search_hint; - while (i < pile->num_entries && ret < 0) { + while (i < pile->num_entries) { /* skip already allocated entries */ if (pile->list[i] & I40E_PILE_VALID_BIT) { i++; @@ -205,6 +204,7 @@ static int i40e_get_lump(struct i40e_pf *pf, struct i40e_lump_tracking *pile, pile->list[i+j] = id | I40E_PILE_VALID_BIT; ret = i; pile->search_hint = i + j; + break; } else { /* not enough, so skip over it and continue looking */ i += j; @@ -1388,7 +1388,7 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi) bool add_happened = false; int filter_list_len = 0; u32 changed_flags = 0; - i40e_status ret = 0; + i40e_status aq_ret = 0; struct i40e_pf *pf; int num_add = 0; int num_del = 0; @@ -1449,28 +1449,28 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi) /* flush a full buffer */ if (num_del == filter_list_len) { - ret = i40e_aq_remove_macvlan(&pf->hw, + aq_ret = i40e_aq_remove_macvlan(&pf->hw, vsi->seid, del_list, num_del, NULL); num_del = 0; memset(del_list, 0, sizeof(*del_list)); - if (ret) + if (aq_ret) dev_info(&pf->pdev->dev, "ignoring delete macvlan error, err %d, aq_err %d while flushing a full buffer\n", - ret, + aq_ret, pf->hw.aq.asq_last_status); } } if (num_del) { - ret = i40e_aq_remove_macvlan(&pf->hw, vsi->seid, + aq_ret = i40e_aq_remove_macvlan(&pf->hw, vsi->seid, del_list, num_del, NULL); num_del = 0; - if (ret) + if (aq_ret) dev_info(&pf->pdev->dev, "ignoring delete macvlan error, err %d, aq_err %d\n", - ret, pf->hw.aq.asq_last_status); + aq_ret, pf->hw.aq.asq_last_status); } kfree(del_list); @@ -1515,32 +1515,30 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi) /* flush a full buffer */ if (num_add == filter_list_len) { - ret = i40e_aq_add_macvlan(&pf->hw, - vsi->seid, - add_list, - num_add, - NULL); + aq_ret = i40e_aq_add_macvlan(&pf->hw, vsi->seid, + add_list, num_add, + NULL); num_add = 0; - if (ret) + if (aq_ret) break; memset(add_list, 0, sizeof(*add_list)); } } if (num_add) { - ret = i40e_aq_add_macvlan(&pf->hw, vsi->seid, - add_list, num_add, NULL); + aq_ret = i40e_aq_add_macvlan(&pf->hw, vsi->seid, + add_list, num_add, NULL); num_add = 0; } kfree(add_list); add_list = NULL; - if (add_happened && (!ret)) { + if (add_happened && (!aq_ret)) { /* do nothing */; - } else if (add_happened && (ret)) { + } else if (add_happened && (aq_ret)) { dev_info(&pf->pdev->dev, "add filter failed, err %d, aq_err %d\n", - ret, pf->hw.aq.asq_last_status); + aq_ret, pf->hw.aq.asq_last_status); if ((pf->hw.aq.asq_last_status == I40E_AQ_RC_ENOSPC) && !test_bit(__I40E_FILTER_OVERFLOW_PROMISC, &vsi->state)) { @@ -1556,28 +1554,27 @@ int i40e_sync_vsi_filters(struct i40e_vsi *vsi) if (changed_flags & IFF_ALLMULTI) { bool cur_multipromisc; cur_multipromisc = !!(vsi->current_netdev_flags & IFF_ALLMULTI); - ret = i40e_aq_set_vsi_multicast_promiscuous(&vsi->back->hw, - vsi->seid, - cur_multipromisc, - NULL); - if (ret) + aq_ret = i40e_aq_set_vsi_multicast_promiscuous(&vsi->back->hw, + vsi->seid, + cur_multipromisc, + NULL); + if (aq_ret) dev_info(&pf->pdev->dev, "set multi promisc failed, err %d, aq_err %d\n", - ret, pf->hw.aq.asq_last_status); + aq_ret, pf->hw.aq.asq_last_status); } if ((changed_flags & IFF_PROMISC) || promisc_forced_on) { bool cur_promisc; cur_promisc = (!!(vsi->current_netdev_flags & IFF_PROMISC) || test_bit(__I40E_FILTER_OVERFLOW_PROMISC, &vsi->state)); - ret = i40e_aq_set_vsi_unicast_promiscuous(&vsi->back->hw, - vsi->seid, - cur_promisc, - NULL); - if (ret) + aq_ret = i40e_aq_set_vsi_unicast_promiscuous(&vsi->back->hw, + vsi->seid, + cur_promisc, NULL); + if (aq_ret) dev_info(&pf->pdev->dev, "set uni promisc failed, err %d, aq_err %d\n", - ret, pf->hw.aq.asq_last_status); + aq_ret, pf->hw.aq.asq_last_status); } clear_bit(__I40E_CONFIG_BUSY, &vsi->state); @@ -1790,6 +1787,8 @@ int i40e_vsi_add_vlan(struct i40e_vsi *vsi, s16 vid) * i40e_vsi_kill_vlan - Remove vsi membership for given vlan * @vsi: the vsi being configured * @vid: vlan id to be removed (0 = untagged only , -1 = any) + * + * Return: 0 on success or negative otherwise **/ int i40e_vsi_kill_vlan(struct i40e_vsi *vsi, s16 vid) { @@ -1863,37 +1862,39 @@ int i40e_vsi_kill_vlan(struct i40e_vsi *vsi, s16 vid) * i40e_vlan_rx_add_vid - Add a vlan id filter to HW offload * @netdev: network interface to be adjusted * @vid: vlan id to be added + * + * net_device_ops implementation for adding vlan ids **/ static int i40e_vlan_rx_add_vid(struct net_device *netdev, __always_unused __be16 proto, u16 vid) { struct i40e_netdev_priv *np = netdev_priv(netdev); struct i40e_vsi *vsi = np->vsi; - int ret; + int ret = 0; if (vid > 4095) - return 0; + return -EINVAL; + + netdev_info(netdev, "adding %pM vid=%d\n", netdev->dev_addr, vid); - netdev_info(vsi->netdev, "adding %pM vid=%d\n", - netdev->dev_addr, vid); /* If the network stack called us with vid = 0, we should * indicate to i40e_vsi_add_vlan() that we want to receive * any traffic (i.e. with any vlan tag, or untagged) */ ret = i40e_vsi_add_vlan(vsi, vid ? vid : I40E_VLAN_ANY); - if (!ret) { - if (vid < VLAN_N_VID) - set_bit(vid, vsi->active_vlans); - } + if (!ret && (vid < VLAN_N_VID)) + set_bit(vid, vsi->active_vlans); - return 0; + return ret; } /** * i40e_vlan_rx_kill_vid - Remove a vlan id filter from HW offload * @netdev: network interface to be adjusted * @vid: vlan id to be removed + * + * net_device_ops implementation for adding vlan ids **/ static int i40e_vlan_rx_kill_vid(struct net_device *netdev, __always_unused __be16 proto, u16 vid) @@ -1901,15 +1902,16 @@ static int i40e_vlan_rx_kill_vid(struct net_device *netdev, struct i40e_netdev_priv *np = netdev_priv(netdev); struct i40e_vsi *vsi = np->vsi; - netdev_info(vsi->netdev, "removing %pM vid=%d\n", - netdev->dev_addr, vid); + netdev_info(netdev, "removing %pM vid=%d\n", netdev->dev_addr, vid); + /* return code is ignored as there is nothing a user * can do about failure to remove and a log message was - * already printed from another function + * already printed from the other function */ i40e_vsi_kill_vlan(vsi, vid); clear_bit(vid, vsi->active_vlans); + return 0; } @@ -1936,10 +1938,10 @@ static void i40e_restore_vlan(struct i40e_vsi *vsi) * @vsi: the vsi being adjusted * @vid: the vlan id to set as a PVID **/ -i40e_status i40e_vsi_add_pvid(struct i40e_vsi *vsi, u16 vid) +int i40e_vsi_add_pvid(struct i40e_vsi *vsi, u16 vid) { struct i40e_vsi_context ctxt; - i40e_status ret; + i40e_status aq_ret; vsi->info.valid_sections = cpu_to_le16(I40E_AQ_VSI_PROP_VLAN_VALID); vsi->info.pvid = cpu_to_le16(vid); @@ -1948,14 +1950,15 @@ i40e_status i40e_vsi_add_pvid(struct i40e_vsi *vsi, u16 vid) ctxt.seid = vsi->seid; memcpy(&ctxt.info, &vsi->info, sizeof(vsi->info)); - ret = i40e_aq_update_vsi_params(&vsi->back->hw, &ctxt, NULL); - if (ret) { + aq_ret = i40e_aq_update_vsi_params(&vsi->back->hw, &ctxt, NULL); + if (aq_ret) { dev_info(&vsi->back->pdev->dev, "%s: update vsi failed, aq_err=%d\n", __func__, vsi->back->hw.aq.asq_last_status); + return -ENOENT; } - return ret; + return 0; } /** @@ -3326,7 +3329,8 @@ static void i40e_pf_unquiesce_all_vsi(struct i40e_pf *pf) **/ static u8 i40e_dcb_get_num_tc(struct i40e_dcbx_config *dcbcfg) { - int num_tc = 0, i; + u8 num_tc = 0; + int i; /* Scan the ETS Config Priority Table to find * traffic class enabled for a given priority @@ -3341,9 +3345,7 @@ static u8 i40e_dcb_get_num_tc(struct i40e_dcbx_config *dcbcfg) /* Traffic class index starts from zero so * increment to return the actual count */ - num_tc++; - - return num_tc; + return num_tc + 1; } /** @@ -3451,28 +3453,27 @@ static int i40e_vsi_get_bw_info(struct i40e_vsi *vsi) struct i40e_aqc_query_vsi_bw_config_resp bw_config = {0}; struct i40e_pf *pf = vsi->back; struct i40e_hw *hw = &pf->hw; + i40e_status aq_ret; u32 tc_bw_max; - int ret; int i; /* Get the VSI level BW configuration */ - ret = i40e_aq_query_vsi_bw_config(hw, vsi->seid, &bw_config, NULL); - if (ret) { + aq_ret = i40e_aq_query_vsi_bw_config(hw, vsi->seid, &bw_config, NULL); + if (aq_ret) { dev_info(&pf->pdev->dev, "couldn't get pf vsi bw config, err %d, aq_err %d\n", - ret, pf->hw.aq.asq_last_status); - return ret; + aq_ret, pf->hw.aq.asq_last_status); + return -EINVAL; } /* Get the VSI level BW configuration per TC */ - ret = i40e_aq_query_vsi_ets_sla_config(hw, vsi->seid, - &bw_ets_config, - NULL); - if (ret) { + aq_ret = i40e_aq_query_vsi_ets_sla_config(hw, vsi->seid, &bw_ets_config, + NULL); + if (aq_ret) { dev_info(&pf->pdev->dev, "couldn't get pf vsi ets bw config, err %d, aq_err %d\n", - ret, pf->hw.aq.asq_last_status); - return ret; + aq_ret, pf->hw.aq.asq_last_status); + return -EINVAL; } if (bw_config.tc_valid_bits != bw_ets_config.tc_valid_bits) { @@ -3494,7 +3495,8 @@ static int i40e_vsi_get_bw_info(struct i40e_vsi *vsi) /* 3 bits out of 4 for each TC */ vsi->bw_ets_max_quanta[i] = (u8)((tc_bw_max >> (i*4)) & 0x7); } - return ret; + + return 0; } /** @@ -3505,30 +3507,30 @@ static int i40e_vsi_get_bw_info(struct i40e_vsi *vsi) * * Returns 0 on success, negative value on failure **/ -static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, - u8 enabled_tc, +static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc, u8 *bw_share) { struct i40e_aqc_configure_vsi_tc_bw_data bw_data; - int i, ret = 0; + i40e_status aq_ret; + int i; bw_data.tc_valid_bits = enabled_tc; for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) bw_data.tc_bw_credits[i] = bw_share[i]; - ret = i40e_aq_config_vsi_tc_bw(&vsi->back->hw, vsi->seid, - &bw_data, NULL); - if (ret) { + aq_ret = i40e_aq_config_vsi_tc_bw(&vsi->back->hw, vsi->seid, &bw_data, + NULL); + if (aq_ret) { dev_info(&vsi->back->pdev->dev, "%s: AQ command Config VSI BW allocation per TC failed = %d\n", __func__, vsi->back->hw.aq.asq_last_status); - return ret; + return -EINVAL; } for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) vsi->info.qs_handle[i] = bw_data.qs_handles[i]; - return ret; + return 0; } /** diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c index 48cbc833b051..86d51429a189 100644 --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c @@ -1607,6 +1607,9 @@ static int igb_integrated_phy_loopback(struct igb_adapter *adapter) igb_write_phy_reg(hw, I347AT4_PAGE_SELECT, 0); igb_write_phy_reg(hw, PHY_CONTROL, 0x4140); } + } else if (hw->phy.type == e1000_phy_82580) { + /* enable MII loopback */ + igb_write_phy_reg(hw, I82580_PHY_LBK_CTRL, 0x8041); } /* add small delay to avoid loopback test failure */ diff --git a/drivers/net/ethernet/marvell/skge.c b/drivers/net/ethernet/marvell/skge.c index 1a9c4f6269ea..ecc7f7b696b8 100644 --- a/drivers/net/ethernet/marvell/skge.c +++ b/drivers/net/ethernet/marvell/skge.c @@ -3086,13 +3086,16 @@ static struct sk_buff *skge_rx_get(struct net_device *dev, PCI_DMA_FROMDEVICE); skge_rx_reuse(e, skge->rx_buf_size); } else { + struct skge_element ee; struct sk_buff *nskb; nskb = netdev_alloc_skb_ip_align(dev, skge->rx_buf_size); if (!nskb) goto resubmit; - skb = e->skb; + ee = *e; + + skb = ee.skb; prefetch(skb->data); if (skge_rx_setup(skge, e, nskb, skge->rx_buf_size) < 0) { @@ -3101,8 +3104,8 @@ static struct sk_buff *skge_rx_get(struct net_device *dev, } pci_unmap_single(skge->hw->pdev, - dma_unmap_addr(e, mapaddr), - dma_unmap_len(e, maplen), + dma_unmap_addr(&ee, mapaddr), + dma_unmap_len(&ee, maplen), PCI_DMA_FROMDEVICE); } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index 5472cbd34028..6ca30739625f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -180,28 +180,32 @@ static int verify_block_sig(struct mlx5_cmd_prot_block *block) return 0; } -static void calc_block_sig(struct mlx5_cmd_prot_block *block, u8 token) +static void calc_block_sig(struct mlx5_cmd_prot_block *block, u8 token, + int csum) { block->token = token; - block->ctrl_sig = ~xor8_buf(block->rsvd0, sizeof(*block) - sizeof(block->data) - 2); - block->sig = ~xor8_buf(block, sizeof(*block) - 1); + if (csum) { + block->ctrl_sig = ~xor8_buf(block->rsvd0, sizeof(*block) - + sizeof(block->data) - 2); + block->sig = ~xor8_buf(block, sizeof(*block) - 1); + } } -static void calc_chain_sig(struct mlx5_cmd_msg *msg, u8 token) +static void calc_chain_sig(struct mlx5_cmd_msg *msg, u8 token, int csum) { struct mlx5_cmd_mailbox *next = msg->next; while (next) { - calc_block_sig(next->buf, token); + calc_block_sig(next->buf, token, csum); next = next->next; } } -static void set_signature(struct mlx5_cmd_work_ent *ent) +static void set_signature(struct mlx5_cmd_work_ent *ent, int csum) { ent->lay->sig = ~xor8_buf(ent->lay, sizeof(*ent->lay)); - calc_chain_sig(ent->in, ent->token); - calc_chain_sig(ent->out, ent->token); + calc_chain_sig(ent->in, ent->token, csum); + calc_chain_sig(ent->out, ent->token, csum); } static void poll_timeout(struct mlx5_cmd_work_ent *ent) @@ -539,8 +543,7 @@ static void cmd_work_handler(struct work_struct *work) lay->type = MLX5_PCI_CMD_XPORT; lay->token = ent->token; lay->status_own = CMD_OWNER_HW; - if (!cmd->checksum_disabled) - set_signature(ent); + set_signature(ent, !cmd->checksum_disabled); dump_command(dev, ent, 1); ktime_get_ts(&ent->ts1); @@ -773,8 +776,6 @@ static int mlx5_copy_from_msg(void *to, struct mlx5_cmd_msg *from, int size) copy = min_t(int, size, MLX5_CMD_DATA_BLOCK_SIZE); block = next->buf; - if (xor8_buf(block, sizeof(*block)) != 0xff) - return -EINVAL; memcpy(to, block->data, copy); to += copy; @@ -1361,6 +1362,7 @@ int mlx5_cmd_init(struct mlx5_core_dev *dev) goto err_map; } + cmd->checksum_disabled = 1; cmd->max_reg_cmds = (1 << cmd->log_sz) - 1; cmd->bitmask = (1 << cmd->max_reg_cmds) - 1; @@ -1510,7 +1512,7 @@ int mlx5_cmd_status_to_err(struct mlx5_outbox_hdr *hdr) case MLX5_CMD_STAT_BAD_SYS_STATE_ERR: return -EIO; case MLX5_CMD_STAT_BAD_RES_ERR: return -EINVAL; case MLX5_CMD_STAT_RES_BUSY: return -EBUSY; - case MLX5_CMD_STAT_LIM_ERR: return -EINVAL; + case MLX5_CMD_STAT_LIM_ERR: return -ENOMEM; case MLX5_CMD_STAT_BAD_RES_STATE_ERR: return -EINVAL; case MLX5_CMD_STAT_IX_ERR: return -EINVAL; case MLX5_CMD_STAT_NO_RES_ERR: return -EAGAIN; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c index 443cc4d7b024..2231d93cc7ad 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c @@ -366,9 +366,11 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx, goto err_in; } + snprintf(eq->name, MLX5_MAX_EQ_NAME, "%s@pci:%s", + name, pci_name(dev->pdev)); eq->eqn = out.eq_number; err = request_irq(table->msix_arr[vecidx].vector, mlx5_msix_handler, 0, - name, eq); + eq->name, eq); if (err) goto err_eq; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index b47739b0b5f6..bc0f5fb66e24 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -165,9 +165,7 @@ static int handle_hca_cap(struct mlx5_core_dev *dev) struct mlx5_cmd_set_hca_cap_mbox_in *set_ctx = NULL; struct mlx5_cmd_query_hca_cap_mbox_in query_ctx; struct mlx5_cmd_set_hca_cap_mbox_out set_out; - struct mlx5_profile *prof = dev->profile; u64 flags; - int csum = 1; int err; memset(&query_ctx, 0, sizeof(query_ctx)); @@ -197,20 +195,14 @@ static int handle_hca_cap(struct mlx5_core_dev *dev) memcpy(&set_ctx->hca_cap, &query_out->hca_cap, sizeof(set_ctx->hca_cap)); - if (prof->mask & MLX5_PROF_MASK_CMDIF_CSUM) { - csum = !!prof->cmdif_csum; - flags = be64_to_cpu(set_ctx->hca_cap.flags); - if (csum) - flags |= MLX5_DEV_CAP_FLAG_CMDIF_CSUM; - else - flags &= ~MLX5_DEV_CAP_FLAG_CMDIF_CSUM; - - set_ctx->hca_cap.flags = cpu_to_be64(flags); - } - if (dev->profile->mask & MLX5_PROF_MASK_QP_SIZE) set_ctx->hca_cap.log_max_qp = dev->profile->log_max_qp; + flags = be64_to_cpu(query_out->hca_cap.flags); + /* disable checksum */ + flags &= ~MLX5_DEV_CAP_FLAG_CMDIF_CSUM; + + set_ctx->hca_cap.flags = cpu_to_be64(flags); memset(&set_out, 0, sizeof(set_out)); set_ctx->hca_cap.log_uar_page_sz = cpu_to_be16(PAGE_SHIFT - 12); set_ctx->hdr.opcode = cpu_to_be16(MLX5_CMD_OP_SET_HCA_CAP); @@ -225,9 +217,6 @@ static int handle_hca_cap(struct mlx5_core_dev *dev) if (err) goto query_ex; - if (!csum) - dev->cmd.checksum_disabled = 1; - query_ex: kfree(query_out); kfree(set_ctx); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c index 3a2408d44820..7b12acf210f8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c @@ -90,6 +90,10 @@ struct mlx5_manage_pages_outbox { __be64 pas[0]; }; +enum { + MAX_RECLAIM_TIME_MSECS = 5000, +}; + static int insert_page(struct mlx5_core_dev *dev, u64 addr, struct page *page, u16 func_id) { struct rb_root *root = &dev->priv.page_root; @@ -279,6 +283,9 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u32 func_id, int npages, int err; int i; + if (nclaimed) + *nclaimed = 0; + memset(&in, 0, sizeof(in)); outlen = sizeof(*out) + npages * sizeof(out->pas[0]); out = mlx5_vzalloc(outlen); @@ -388,20 +395,25 @@ static int optimal_reclaimed_pages(void) int mlx5_reclaim_startup_pages(struct mlx5_core_dev *dev) { - unsigned long end = jiffies + msecs_to_jiffies(5000); + unsigned long end = jiffies + msecs_to_jiffies(MAX_RECLAIM_TIME_MSECS); struct fw_page *fwp; struct rb_node *p; + int nclaimed = 0; int err; do { p = rb_first(&dev->priv.page_root); if (p) { fwp = rb_entry(p, struct fw_page, rb_node); - err = reclaim_pages(dev, fwp->func_id, optimal_reclaimed_pages(), NULL); + err = reclaim_pages(dev, fwp->func_id, + optimal_reclaimed_pages(), + &nclaimed); if (err) { mlx5_core_warn(dev, "failed reclaiming pages (%d)\n", err); return err; } + if (nclaimed) + end = jiffies + msecs_to_jiffies(MAX_RECLAIM_TIME_MSECS); } if (time_after(jiffies, end)) { mlx5_core_warn(dev, "FW did not return all pages. giving up...\n"); diff --git a/drivers/net/ethernet/moxa/moxart_ether.c b/drivers/net/ethernet/moxa/moxart_ether.c index 83c2091c9c23..bd1a2d2bc2ae 100644 --- a/drivers/net/ethernet/moxa/moxart_ether.c +++ b/drivers/net/ethernet/moxa/moxart_ether.c @@ -543,7 +543,7 @@ static const struct of_device_id moxart_mac_match[] = { { } }; -struct __initdata platform_driver moxart_mac_driver = { +static struct platform_driver moxart_mac_driver = { .probe = moxart_mac_probe, .remove = moxart_remove, .driver = { diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c index 4d7ad0074d1c..ebe4c86e5230 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c @@ -1794,3 +1794,11 @@ const struct ethtool_ops qlcnic_sriov_vf_ethtool_ops = { .set_msglevel = qlcnic_set_msglevel, .get_msglevel = qlcnic_get_msglevel, }; + +const struct ethtool_ops qlcnic_ethtool_failed_ops = { + .get_settings = qlcnic_get_settings, + .get_drvinfo = qlcnic_get_drvinfo, + .set_msglevel = qlcnic_set_msglevel, + .get_msglevel = qlcnic_get_msglevel, + .set_dump = qlcnic_set_dump, +}; diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c index c4c5023e1fdf..21d00a0449a1 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c @@ -431,6 +431,9 @@ static void qlcnic_82xx_cancel_idc_work(struct qlcnic_adapter *adapter) while (test_and_set_bit(__QLCNIC_RESETTING, &adapter->state)) usleep_range(10000, 11000); + if (!adapter->fw_work.work.func) + return; + cancel_delayed_work_sync(&adapter->fw_work); } @@ -2275,8 +2278,9 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent) adapter->portnum = adapter->ahw->pci_func; err = qlcnic_start_firmware(adapter); if (err) { - dev_err(&pdev->dev, "Loading fw failed.Please Reboot\n"); - goto err_out_free_hw; + dev_err(&pdev->dev, "Loading fw failed.Please Reboot\n" + "\t\tIf reboot doesn't help, try flashing the card\n"); + goto err_out_maintenance_mode; } qlcnic_get_multiq_capability(adapter); @@ -2408,6 +2412,22 @@ err_out_disable_pdev: pci_set_drvdata(pdev, NULL); pci_disable_device(pdev); return err; + +err_out_maintenance_mode: + netdev->netdev_ops = &qlcnic_netdev_failed_ops; + SET_ETHTOOL_OPS(netdev, &qlcnic_ethtool_failed_ops); + err = register_netdev(netdev); + + if (err) { + dev_err(&pdev->dev, "Failed to register net device\n"); + qlcnic_clr_all_drv_state(adapter, 0); + goto err_out_free_hw; + } + + pci_set_drvdata(pdev, adapter); + qlcnic_add_sysfs(adapter); + + return 0; } static void qlcnic_remove(struct pci_dev *pdev) @@ -2518,8 +2538,16 @@ static int qlcnic_resume(struct pci_dev *pdev) static int qlcnic_open(struct net_device *netdev) { struct qlcnic_adapter *adapter = netdev_priv(netdev); + u32 state; int err; + state = QLC_SHARED_REG_RD32(adapter, QLCNIC_CRB_DEV_STATE); + if (state == QLCNIC_DEV_FAILED || state == QLCNIC_DEV_BADBAD) { + netdev_err(netdev, "%s: Device is in FAILED state\n", __func__); + + return -EIO; + } + netif_carrier_off(netdev); err = qlcnic_attach(adapter); @@ -3228,6 +3256,13 @@ void qlcnic_82xx_dev_request_reset(struct qlcnic_adapter *adapter, u32 key) return; state = QLC_SHARED_REG_RD32(adapter, QLCNIC_CRB_DEV_STATE); + if (state == QLCNIC_DEV_FAILED || state == QLCNIC_DEV_BADBAD) { + netdev_err(adapter->netdev, "%s: Device is in FAILED state\n", + __func__); + qlcnic_api_unlock(adapter); + + return; + } if (state == QLCNIC_DEV_READY) { QLC_SHARED_REG_WR32(adapter, QLCNIC_CRB_DEV_STATE, diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c index 330d9a8774ad..686f460b1502 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c @@ -397,6 +397,7 @@ static int qlcnic_pci_sriov_disable(struct qlcnic_adapter *adapter) { struct net_device *netdev = adapter->netdev; + rtnl_lock(); if (netif_running(netdev)) __qlcnic_down(adapter, netdev); @@ -407,12 +408,15 @@ static int qlcnic_pci_sriov_disable(struct qlcnic_adapter *adapter) /* After disabling SRIOV re-init the driver in default mode configure opmode based on op_mode of function */ - if (qlcnic_83xx_configure_opmode(adapter)) + if (qlcnic_83xx_configure_opmode(adapter)) { + rtnl_unlock(); return -EIO; + } if (netif_running(netdev)) __qlcnic_up(adapter, netdev); + rtnl_unlock(); return 0; } @@ -533,6 +537,7 @@ static int qlcnic_pci_sriov_enable(struct qlcnic_adapter *adapter, int num_vfs) return -EIO; } + rtnl_lock(); if (netif_running(netdev)) __qlcnic_down(adapter, netdev); @@ -555,6 +560,7 @@ static int qlcnic_pci_sriov_enable(struct qlcnic_adapter *adapter, int num_vfs) __qlcnic_up(adapter, netdev); error: + rtnl_unlock(); return err; } diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c index c6165d05cc13..019f4377307f 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c @@ -1272,6 +1272,7 @@ void qlcnic_remove_sysfs_entries(struct qlcnic_adapter *adapter) void qlcnic_create_diag_entries(struct qlcnic_adapter *adapter) { struct device *dev = &adapter->pdev->dev; + u32 state; if (device_create_bin_file(dev, &bin_attr_port_stats)) dev_info(dev, "failed to create port stats sysfs entry"); @@ -1285,8 +1286,13 @@ void qlcnic_create_diag_entries(struct qlcnic_adapter *adapter) if (device_create_bin_file(dev, &bin_attr_mem)) dev_info(dev, "failed to create mem sysfs entry\n"); + state = QLC_SHARED_REG_RD32(adapter, QLCNIC_CRB_DEV_STATE); + if (state == QLCNIC_DEV_FAILED || state == QLCNIC_DEV_BADBAD) + return; + if (device_create_bin_file(dev, &bin_attr_pci_config)) dev_info(dev, "failed to create pci config sysfs entry"); + if (device_create_file(dev, &dev_attr_beacon)) dev_info(dev, "failed to create beacon sysfs entry"); @@ -1307,6 +1313,7 @@ void qlcnic_create_diag_entries(struct qlcnic_adapter *adapter) void qlcnic_remove_diag_entries(struct qlcnic_adapter *adapter) { struct device *dev = &adapter->pdev->dev; + u32 state; device_remove_bin_file(dev, &bin_attr_port_stats); @@ -1315,6 +1322,11 @@ void qlcnic_remove_diag_entries(struct qlcnic_adapter *adapter) device_remove_file(dev, &dev_attr_diag_mode); device_remove_bin_file(dev, &bin_attr_crb); device_remove_bin_file(dev, &bin_attr_mem); + + state = QLC_SHARED_REG_RD32(adapter, QLCNIC_CRB_DEV_STATE); + if (state == QLCNIC_DEV_FAILED || state == QLCNIC_DEV_BADBAD) + return; + device_remove_bin_file(dev, &bin_attr_pci_config); device_remove_file(dev, &dev_attr_beacon); if (!(adapter->flags & QLCNIC_ESWITCH_ENABLED)) diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_dbg.c b/drivers/net/ethernet/qlogic/qlge/qlge_dbg.c index 10093f0c4c0f..6bc5db703920 100644 --- a/drivers/net/ethernet/qlogic/qlge/qlge_dbg.c +++ b/drivers/net/ethernet/qlogic/qlge/qlge_dbg.c @@ -740,8 +740,8 @@ int ql_core_dump(struct ql_adapter *qdev, struct ql_mpi_coredump *mpi_coredump) int i; if (!mpi_coredump) { - netif_err(qdev, drv, qdev->ndev, "No memory available\n"); - return -ENOMEM; + netif_err(qdev, drv, qdev->ndev, "No memory allocated\n"); + return -EINVAL; } /* Try to get the spinlock, but dont worry if diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_mpi.c b/drivers/net/ethernet/qlogic/qlge/qlge_mpi.c index ff2bf8a4e247..7ad146080c36 100644 --- a/drivers/net/ethernet/qlogic/qlge/qlge_mpi.c +++ b/drivers/net/ethernet/qlogic/qlge/qlge_mpi.c @@ -1274,7 +1274,7 @@ void ql_mpi_reset_work(struct work_struct *work) return; } - if (!ql_core_dump(qdev, qdev->mpi_coredump)) { + if (qdev->mpi_coredump && !ql_core_dump(qdev, qdev->mpi_coredump)) { netif_err(qdev, drv, qdev->ndev, "Core is dumped!\n"); qdev->core_is_dumped = 1; queue_delayed_work(qdev->workqueue, diff --git a/drivers/net/ethernet/sfc/mcdi.c b/drivers/net/ethernet/sfc/mcdi.c index 128d7cdf9eb2..c082562dbf4e 100644 --- a/drivers/net/ethernet/sfc/mcdi.c +++ b/drivers/net/ethernet/sfc/mcdi.c @@ -27,10 +27,10 @@ /* A reboot/assertion causes the MCDI status word to be set after the * command word is set or a REBOOT event is sent. If we notice a reboot - * via these mechanisms then wait 20ms for the status word to be set. + * via these mechanisms then wait 250ms for the status word to be set. */ #define MCDI_STATUS_DELAY_US 100 -#define MCDI_STATUS_DELAY_COUNT 200 +#define MCDI_STATUS_DELAY_COUNT 2500 #define MCDI_STATUS_SLEEP_MS \ (MCDI_STATUS_DELAY_US * MCDI_STATUS_DELAY_COUNT / 1000) @@ -800,9 +800,6 @@ static void efx_mcdi_ev_death(struct efx_nic *efx, int rc) } else { int count; - /* Nobody was waiting for an MCDI request, so trigger a reset */ - efx_schedule_reset(efx, RESET_TYPE_MC_FAILURE); - /* Consume the status word since efx_mcdi_rpc_finish() won't */ for (count = 0; count < MCDI_STATUS_DELAY_COUNT; ++count) { if (efx_mcdi_poll_reboot(efx)) @@ -810,6 +807,9 @@ static void efx_mcdi_ev_death(struct efx_nic *efx, int rc) udelay(MCDI_STATUS_DELAY_US); } mcdi->new_epoch = true; + + /* Nobody was waiting for an MCDI request, so trigger a reset */ + efx_schedule_reset(efx, RESET_TYPE_MC_FAILURE); } spin_unlock(&mcdi->iface_lock); diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c index c8f088ab5fdf..bdf697b184ae 100644 --- a/drivers/net/ethernet/via/via-rhine.c +++ b/drivers/net/ethernet/via/via-rhine.c @@ -32,7 +32,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #define DRV_NAME "via-rhine" -#define DRV_VERSION "1.5.0" +#define DRV_VERSION "1.5.1" #define DRV_RELDATE "2010-10-09" #include @@ -1704,7 +1704,12 @@ static netdev_tx_t rhine_start_tx(struct sk_buff *skb, cpu_to_le32(TXDESC | (skb->len >= ETH_ZLEN ? skb->len : ETH_ZLEN)); if (unlikely(vlan_tx_tag_present(skb))) { - rp->tx_ring[entry].tx_status = cpu_to_le32((vlan_tx_tag_get(skb)) << 16); + u16 vid_pcp = vlan_tx_tag_get(skb); + + /* drop CFI/DEI bit, register needs VID and PCP */ + vid_pcp = (vid_pcp & VLAN_VID_MASK) | + ((vid_pcp & VLAN_PRIO_MASK) >> 1); + rp->tx_ring[entry].tx_status = cpu_to_le32((vid_pcp) << 16); /* request tagging */ rp->tx_ring[entry].desc_length |= cpu_to_le32(0x020000); } diff --git a/drivers/net/ethernet/xilinx/ll_temac_main.c b/drivers/net/ethernet/xilinx/ll_temac_main.c index b88121f240ca..0029148077a9 100644 --- a/drivers/net/ethernet/xilinx/ll_temac_main.c +++ b/drivers/net/ethernet/xilinx/ll_temac_main.c @@ -297,6 +297,12 @@ static int temac_dma_bd_init(struct net_device *ndev) lp->rx_bd_p + (sizeof(*lp->rx_bd_v) * (RX_BD_NUM - 1))); lp->dma_out(lp, TX_CURDESC_PTR, lp->tx_bd_p); + /* Init descriptor indexes */ + lp->tx_bd_ci = 0; + lp->tx_bd_next = 0; + lp->tx_bd_tail = 0; + lp->rx_bd_ci = 0; + return 0; out: diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c index a34d6bf5e43b..cc70ecfc7062 100644 --- a/drivers/net/slip/slip.c +++ b/drivers/net/slip/slip.c @@ -429,11 +429,13 @@ static void slip_write_wakeup(struct tty_struct *tty) if (!sl || sl->magic != SLIP_MAGIC || !netif_running(sl->dev)) return; + spin_lock(&sl->lock); if (sl->xleft <= 0) { /* Now serial buffer is almost free & we can start * transmission of another packet */ sl->dev->stats.tx_packets++; clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags); + spin_unlock(&sl->lock); sl_unlock(sl); return; } @@ -441,6 +443,7 @@ static void slip_write_wakeup(struct tty_struct *tty) actual = tty->ops->write(tty, sl->xhead, sl->xleft); sl->xleft -= actual; sl->xhead += actual; + spin_unlock(&sl->lock); } static void sl_tx_timeout(struct net_device *dev) diff --git a/drivers/net/usb/dm9601.c b/drivers/net/usb/dm9601.c index 2dbb9460349d..c6867f926cff 100644 --- a/drivers/net/usb/dm9601.c +++ b/drivers/net/usb/dm9601.c @@ -303,7 +303,7 @@ static void dm9601_set_multicast(struct net_device *net) rx_ctl |= 0x02; } else if (net->flags & IFF_ALLMULTI || netdev_mc_count(net) > DM_MAX_MCAST) { - rx_ctl |= 0x04; + rx_ctl |= 0x08; } else if (!netdev_mc_empty(net)) { struct netdev_hw_addr *ha; diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 6312332afeba..3d6aaf79d8b2 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -714,7 +714,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */ {QMI_FIXED_INTF(0x2357, 0x9000, 4)}, /* TP-LINK MA260 */ {QMI_FIXED_INTF(0x1bc7, 0x1200, 5)}, /* Telit LE920 */ - {QMI_FIXED_INTF(0x1e2d, 0x12d1, 4)}, /* Cinterion PLxx */ + {QMI_FIXED_INTF(0x1e2d, 0x0060, 4)}, /* Cinterion PLxx */ /* 4. Gobi 1000 devices */ {QMI_GOBI1K_DEVICE(0x05c6, 0x9212)}, /* Acer Gobi Modem Device */ diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c index 7b331e613e02..bf94e10a37c8 100644 --- a/drivers/net/usb/usbnet.c +++ b/drivers/net/usb/usbnet.c @@ -1241,7 +1241,9 @@ static int build_dma_sg(const struct sk_buff *skb, struct urb *urb) if (num_sgs == 1) return 0; - urb->sg = kmalloc(num_sgs * sizeof(struct scatterlist), GFP_ATOMIC); + /* reserve one for zero packet */ + urb->sg = kmalloc((num_sgs + 1) * sizeof(struct scatterlist), + GFP_ATOMIC); if (!urb->sg) return -ENOMEM; @@ -1305,7 +1307,7 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb, if (build_dma_sg(skb, urb) < 0) goto drop; } - entry->length = length = urb->transfer_buffer_length; + length = urb->transfer_buffer_length; /* don't assume the hardware handles USB_ZERO_PACKET * NOTE: strictly conforming cdc-ether devices should expect @@ -1317,15 +1319,18 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb, if (length % dev->maxpacket == 0) { if (!(info->flags & FLAG_SEND_ZLP)) { if (!(info->flags & FLAG_MULTI_PACKET)) { - urb->transfer_buffer_length++; - if (skb_tailroom(skb)) { + length++; + if (skb_tailroom(skb) && !urb->num_sgs) { skb->data[skb->len] = 0; __skb_put(skb, 1); - } + } else if (urb->num_sgs) + sg_set_buf(&urb->sg[urb->num_sgs++], + dev->padding_pkt, 1); } } else urb->transfer_flags |= URB_ZERO_PACKET; } + entry->length = urb->transfer_buffer_length = length; spin_lock_irqsave(&dev->txq.lock, flags); retval = usb_autopm_get_interface_async(dev->intf); @@ -1509,6 +1514,7 @@ void usbnet_disconnect (struct usb_interface *intf) usb_kill_urb(dev->interrupt); usb_free_urb(dev->interrupt); + kfree(dev->padding_pkt); free_netdev(net); } @@ -1679,9 +1685,16 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod) /* initialize max rx_qlen and tx_qlen */ usbnet_update_max_qlen(dev); + if (dev->can_dma_sg && !(info->flags & FLAG_SEND_ZLP) && + !(info->flags & FLAG_MULTI_PACKET)) { + dev->padding_pkt = kzalloc(1, GFP_KERNEL); + if (!dev->padding_pkt) + goto out4; + } + status = register_netdev (net); if (status) - goto out4; + goto out5; netif_info(dev, probe, dev->net, "register '%s' at usb-%s-%s, %s, %pM\n", udev->dev.driver->name, @@ -1699,6 +1712,8 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod) return 0; +out5: + kfree(dev->padding_pkt); out4: usb_free_urb(dev->interrupt); out3: diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index d1292fe746bc..2ef5b6219f3f 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -952,8 +952,7 @@ void vxlan_sock_release(struct vxlan_sock *vs) spin_lock(&vn->sock_lock); hlist_del_rcu(&vs->hlist); - smp_wmb(); - vs->sock->sk->sk_user_data = NULL; + rcu_assign_sk_user_data(vs->sock->sk, NULL); vxlan_notify_del_rx_port(sk); spin_unlock(&vn->sock_lock); @@ -1048,8 +1047,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb) port = inet_sk(sk)->inet_sport; - smp_read_barrier_depends(); - vs = (struct vxlan_sock *)sk->sk_user_data; + vs = rcu_dereference_sk_user_data(sk); if (!vs) goto drop; @@ -2302,8 +2300,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port, atomic_set(&vs->refcnt, 1); vs->rcv = rcv; vs->data = data; - smp_wmb(); - vs->sock->sk->sk_user_data = vs; + rcu_assign_sk_user_data(vs->sock->sk, vs); spin_lock(&vn->sock_lock); hlist_add_head_rcu(&vs->hlist, vs_head(net, port)); diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c index 4ee472a5a4e4..ab9e3a8410bc 100644 --- a/drivers/net/wireless/ath/ath9k/recv.c +++ b/drivers/net/wireless/ath/ath9k/recv.c @@ -1269,13 +1269,6 @@ static void ath9k_antenna_check(struct ath_softc *sc, if (!(ah->caps.hw_caps & ATH9K_HW_CAP_ANT_DIV_COMB)) return; - /* - * All MPDUs in an aggregate will use the same LNA - * as the first MPDU. - */ - if (rs->rs_isaggr && !rs->rs_firstaggr) - return; - /* * Change the default rx antenna if rx diversity * chooses the other antenna 3 times in a row. diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c index 35b515fe3ffa..5ac713d2ff5d 100644 --- a/drivers/net/wireless/ath/ath9k/xmit.c +++ b/drivers/net/wireless/ath/ath9k/xmit.c @@ -399,6 +399,7 @@ static struct ath_buf* ath_clone_txbuf(struct ath_softc *sc, struct ath_buf *bf) tbf->bf_buf_addr = bf->bf_buf_addr; memcpy(tbf->bf_desc, bf->bf_desc, sc->sc_ah->caps.tx_desc_len); tbf->bf_state = bf->bf_state; + tbf->bf_state.stale = false; return tbf; } @@ -1389,11 +1390,15 @@ int ath_tx_aggr_start(struct ath_softc *sc, struct ieee80211_sta *sta, u16 tid, u16 *ssn) { struct ath_atx_tid *txtid; + struct ath_txq *txq; struct ath_node *an; u8 density; an = (struct ath_node *)sta->drv_priv; txtid = ATH_AN_2_TID(an, tid); + txq = txtid->ac->txq; + + ath_txq_lock(sc, txq); /* update ampdu factor/density, they may have changed. This may happen * in HT IBSS when a beacon with HT-info is received after the station @@ -1417,6 +1422,8 @@ int ath_tx_aggr_start(struct ath_softc *sc, struct ieee80211_sta *sta, memset(txtid->tx_buf, 0, sizeof(txtid->tx_buf)); txtid->baw_head = txtid->baw_tail = 0; + ath_txq_unlock_complete(sc, txq); + return 0; } @@ -1555,8 +1562,10 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw, __skb_unlink(bf->bf_mpdu, tid_q); list_add_tail(&bf->list, &bf_q); ath_set_rates(tid->an->vif, tid->an->sta, bf); - ath_tx_addto_baw(sc, tid, bf); - bf->bf_state.bf_type &= ~BUF_AGGR; + if (bf_isampdu(bf)) { + ath_tx_addto_baw(sc, tid, bf); + bf->bf_state.bf_type &= ~BUF_AGGR; + } if (bf_tail) bf_tail->bf_next = bf; @@ -1950,7 +1959,9 @@ static void ath_tx_txqaddbuf(struct ath_softc *sc, struct ath_txq *txq, if (bf_is_ampdu_not_probing(bf)) txq->axq_ampdu_depth++; - bf = bf->bf_lastbf->bf_next; + bf_last = bf->bf_lastbf; + bf = bf_last->bf_next; + bf_last->bf_next = NULL; } } } diff --git a/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh_sdmmc.c b/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh_sdmmc.c index 64f4a2bc8dde..c3462b75bd08 100644 --- a/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh_sdmmc.c +++ b/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh_sdmmc.c @@ -464,8 +464,6 @@ static struct sdio_driver brcmf_sdmmc_driver = { static int brcmf_sdio_pd_probe(struct platform_device *pdev) { - int ret; - brcmf_dbg(SDIO, "Enter\n"); brcmfmac_sdio_pdata = pdev->dev.platform_data; @@ -473,11 +471,7 @@ static int brcmf_sdio_pd_probe(struct platform_device *pdev) if (brcmfmac_sdio_pdata->power_on) brcmfmac_sdio_pdata->power_on(); - ret = sdio_register_driver(&brcmf_sdmmc_driver); - if (ret) - brcmf_err("sdio_register_driver failed: %d\n", ret); - - return ret; + return 0; } static int brcmf_sdio_pd_remove(struct platform_device *pdev) @@ -500,6 +494,15 @@ static struct platform_driver brcmf_sdio_pd = { } }; +void brcmf_sdio_register(void) +{ + int ret; + + ret = sdio_register_driver(&brcmf_sdmmc_driver); + if (ret) + brcmf_err("sdio_register_driver failed: %d\n", ret); +} + void brcmf_sdio_exit(void) { brcmf_dbg(SDIO, "Enter\n"); @@ -510,18 +513,13 @@ void brcmf_sdio_exit(void) sdio_unregister_driver(&brcmf_sdmmc_driver); } -void brcmf_sdio_init(void) +void __init brcmf_sdio_init(void) { int ret; brcmf_dbg(SDIO, "Enter\n"); ret = platform_driver_probe(&brcmf_sdio_pd, brcmf_sdio_pd_probe); - if (ret == -ENODEV) { - brcmf_dbg(SDIO, "No platform data available, registering without.\n"); - ret = sdio_register_driver(&brcmf_sdmmc_driver); - } - - if (ret) - brcmf_err("driver registration failed: %d\n", ret); + if (ret == -ENODEV) + brcmf_dbg(SDIO, "No platform data available.\n"); } diff --git a/drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h b/drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h index f7c1985844e4..74156f84180c 100644 --- a/drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h +++ b/drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h @@ -156,10 +156,11 @@ extern int brcmf_bus_start(struct device *dev); #ifdef CONFIG_BRCMFMAC_SDIO extern void brcmf_sdio_exit(void); extern void brcmf_sdio_init(void); +extern void brcmf_sdio_register(void); #endif #ifdef CONFIG_BRCMFMAC_USB extern void brcmf_usb_exit(void); -extern void brcmf_usb_init(void); +extern void brcmf_usb_register(void); #endif #endif /* _BRCMF_BUS_H_ */ diff --git a/drivers/net/wireless/brcm80211/brcmfmac/dhd_linux.c b/drivers/net/wireless/brcm80211/brcmfmac/dhd_linux.c index e067aec1fbf1..40e7f854e10f 100644 --- a/drivers/net/wireless/brcm80211/brcmfmac/dhd_linux.c +++ b/drivers/net/wireless/brcm80211/brcmfmac/dhd_linux.c @@ -1231,21 +1231,23 @@ u32 brcmf_get_chip_info(struct brcmf_if *ifp) return bus->chip << 4 | bus->chiprev; } -static void brcmf_driver_init(struct work_struct *work) +static void brcmf_driver_register(struct work_struct *work) { - brcmf_debugfs_init(); - #ifdef CONFIG_BRCMFMAC_SDIO - brcmf_sdio_init(); + brcmf_sdio_register(); #endif #ifdef CONFIG_BRCMFMAC_USB - brcmf_usb_init(); + brcmf_usb_register(); #endif } -static DECLARE_WORK(brcmf_driver_work, brcmf_driver_init); +static DECLARE_WORK(brcmf_driver_work, brcmf_driver_register); static int __init brcmfmac_module_init(void) { + brcmf_debugfs_init(); +#ifdef CONFIG_BRCMFMAC_SDIO + brcmf_sdio_init(); +#endif if (!schedule_work(&brcmf_driver_work)) return -EBUSY; diff --git a/drivers/net/wireless/brcm80211/brcmfmac/usb.c b/drivers/net/wireless/brcm80211/brcmfmac/usb.c index 39e01a7c8556..f4aea47e0730 100644 --- a/drivers/net/wireless/brcm80211/brcmfmac/usb.c +++ b/drivers/net/wireless/brcm80211/brcmfmac/usb.c @@ -1539,7 +1539,7 @@ void brcmf_usb_exit(void) brcmf_release_fw(&fw_image_list); } -void brcmf_usb_init(void) +void brcmf_usb_register(void) { brcmf_dbg(USB, "Enter\n"); INIT_LIST_HEAD(&fw_image_list); diff --git a/drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c b/drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c index 3a6544710c8a..edc5d105ff98 100644 --- a/drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c +++ b/drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c @@ -457,6 +457,8 @@ static int brcms_ops_start(struct ieee80211_hw *hw) if (err != 0) brcms_err(wl->wlc->hw->d11core, "%s: brcms_up() returned %d\n", __func__, err); + + bcma_core_pci_power_save(wl->wlc->hw->d11core->bus, true); return err; } @@ -479,6 +481,8 @@ static void brcms_ops_stop(struct ieee80211_hw *hw) return; } + bcma_core_pci_power_save(wl->wlc->hw->d11core->bus, false); + /* put driver in down state */ spin_lock_bh(&wl->lock); brcms_down(wl); diff --git a/drivers/net/wireless/cw1200/cw1200_spi.c b/drivers/net/wireless/cw1200/cw1200_spi.c index f5e6b489ed32..899cad34ccd3 100644 --- a/drivers/net/wireless/cw1200/cw1200_spi.c +++ b/drivers/net/wireless/cw1200/cw1200_spi.c @@ -42,7 +42,6 @@ struct hwbus_priv { spinlock_t lock; /* Serialize all bus operations */ wait_queue_head_t wq; int claimed; - int irq_disabled; }; #define SDIO_TO_SPI_ADDR(addr) ((addr & 0x1f)>>2) @@ -238,8 +237,6 @@ static irqreturn_t cw1200_spi_irq_handler(int irq, void *dev_id) struct hwbus_priv *self = dev_id; if (self->core) { - disable_irq_nosync(self->func->irq); - self->irq_disabled = 1; cw1200_irq_handler(self->core); return IRQ_HANDLED; } else { @@ -253,9 +250,10 @@ static int cw1200_spi_irq_subscribe(struct hwbus_priv *self) pr_debug("SW IRQ subscribe\n"); - ret = request_any_context_irq(self->func->irq, cw1200_spi_irq_handler, - IRQF_TRIGGER_HIGH, - "cw1200_wlan_irq", self); + ret = request_threaded_irq(self->func->irq, NULL, + cw1200_spi_irq_handler, + IRQF_TRIGGER_HIGH | IRQF_ONESHOT, + "cw1200_wlan_irq", self); if (WARN_ON(ret < 0)) goto exit; @@ -273,22 +271,13 @@ exit: static int cw1200_spi_irq_unsubscribe(struct hwbus_priv *self) { + int ret = 0; + pr_debug("SW IRQ unsubscribe\n"); disable_irq_wake(self->func->irq); free_irq(self->func->irq, self); - return 0; -} - -static int cw1200_spi_irq_enable(struct hwbus_priv *self, int enable) -{ - /* Disables are handled by the interrupt handler */ - if (enable && self->irq_disabled) { - enable_irq(self->func->irq); - self->irq_disabled = 0; - } - - return 0; + return ret; } static int cw1200_spi_off(const struct cw1200_platform_data_spi *pdata) @@ -368,7 +357,6 @@ static struct hwbus_ops cw1200_spi_hwbus_ops = { .unlock = cw1200_spi_unlock, .align_size = cw1200_spi_align_size, .power_mgmt = cw1200_spi_pm, - .irq_enable = cw1200_spi_irq_enable, }; /* Probe Function to be called by SPI stack when device is discovered */ diff --git a/drivers/net/wireless/cw1200/fwio.c b/drivers/net/wireless/cw1200/fwio.c index 0b2061bbc68b..acdff0f7f952 100644 --- a/drivers/net/wireless/cw1200/fwio.c +++ b/drivers/net/wireless/cw1200/fwio.c @@ -485,7 +485,7 @@ int cw1200_load_firmware(struct cw1200_common *priv) /* Enable interrupt signalling */ priv->hwbus_ops->lock(priv->hwbus_priv); - ret = __cw1200_irq_enable(priv, 2); + ret = __cw1200_irq_enable(priv, 1); priv->hwbus_ops->unlock(priv->hwbus_priv); if (ret < 0) goto unsubscribe; diff --git a/drivers/net/wireless/cw1200/hwbus.h b/drivers/net/wireless/cw1200/hwbus.h index 51dfb3a90735..8b2fc831c3de 100644 --- a/drivers/net/wireless/cw1200/hwbus.h +++ b/drivers/net/wireless/cw1200/hwbus.h @@ -28,7 +28,6 @@ struct hwbus_ops { void (*unlock)(struct hwbus_priv *self); size_t (*align_size)(struct hwbus_priv *self, size_t size); int (*power_mgmt)(struct hwbus_priv *self, bool suspend); - int (*irq_enable)(struct hwbus_priv *self, int enable); }; #endif /* CW1200_HWBUS_H */ diff --git a/drivers/net/wireless/cw1200/hwio.c b/drivers/net/wireless/cw1200/hwio.c index 41bd7615ccaa..ff230b7aeedd 100644 --- a/drivers/net/wireless/cw1200/hwio.c +++ b/drivers/net/wireless/cw1200/hwio.c @@ -273,21 +273,6 @@ int __cw1200_irq_enable(struct cw1200_common *priv, int enable) u16 val16; int ret; - /* We need to do this hack because the SPI layer can sleep on I/O - and the general path involves I/O to the device in interrupt - context. - - However, the initial enable call needs to go to the hardware. - - We don't worry about shutdown because we do a full reset which - clears the interrupt enabled bits. - */ - if (priv->hwbus_ops->irq_enable) { - ret = priv->hwbus_ops->irq_enable(priv->hwbus_priv, enable); - if (ret || enable < 2) - return ret; - } - if (HIF_8601_SILICON == priv->hw_type) { ret = __cw1200_reg_read_32(priv, ST90TDS_CONFIG_REG_ID, &val32); if (ret < 0) { diff --git a/drivers/net/wireless/mwifiex/11n_aggr.c b/drivers/net/wireless/mwifiex/11n_aggr.c index 21c688264708..1214c587fd08 100644 --- a/drivers/net/wireless/mwifiex/11n_aggr.c +++ b/drivers/net/wireless/mwifiex/11n_aggr.c @@ -150,7 +150,7 @@ mwifiex_11n_form_amsdu_txpd(struct mwifiex_private *priv, */ int mwifiex_11n_aggregate_pkt(struct mwifiex_private *priv, - struct mwifiex_ra_list_tbl *pra_list, int headroom, + struct mwifiex_ra_list_tbl *pra_list, int ptrindex, unsigned long ra_list_flags) __releases(&priv->wmm.ra_list_spinlock) { @@ -160,6 +160,7 @@ mwifiex_11n_aggregate_pkt(struct mwifiex_private *priv, int pad = 0, ret; struct mwifiex_tx_param tx_param; struct txpd *ptx_pd = NULL; + int headroom = adapter->iface_type == MWIFIEX_USB ? 0 : INTF_HEADER_LEN; skb_src = skb_peek(&pra_list->skb_head); if (!skb_src) { diff --git a/drivers/net/wireless/mwifiex/11n_aggr.h b/drivers/net/wireless/mwifiex/11n_aggr.h index 900e1c62a0cc..892098d6a696 100644 --- a/drivers/net/wireless/mwifiex/11n_aggr.h +++ b/drivers/net/wireless/mwifiex/11n_aggr.h @@ -26,7 +26,7 @@ int mwifiex_11n_deaggregate_pkt(struct mwifiex_private *priv, struct sk_buff *skb); int mwifiex_11n_aggregate_pkt(struct mwifiex_private *priv, - struct mwifiex_ra_list_tbl *ptr, int headroom, + struct mwifiex_ra_list_tbl *ptr, int ptr_index, unsigned long flags) __releases(&priv->wmm.ra_list_spinlock); diff --git a/drivers/net/wireless/mwifiex/cmdevt.c b/drivers/net/wireless/mwifiex/cmdevt.c index 2d761477d15e..a6c46f3b6e3a 100644 --- a/drivers/net/wireless/mwifiex/cmdevt.c +++ b/drivers/net/wireless/mwifiex/cmdevt.c @@ -1155,7 +1155,7 @@ int mwifiex_ret_802_11_hs_cfg(struct mwifiex_private *priv, uint32_t conditions = le32_to_cpu(phs_cfg->params.hs_config.conditions); if (phs_cfg->action == cpu_to_le16(HS_ACTIVATE) && - adapter->iface_type == MWIFIEX_SDIO) { + adapter->iface_type != MWIFIEX_USB) { mwifiex_hs_activated_event(priv, true); return 0; } else { @@ -1167,8 +1167,7 @@ int mwifiex_ret_802_11_hs_cfg(struct mwifiex_private *priv, } if (conditions != HS_CFG_CANCEL) { adapter->is_hs_configured = true; - if (adapter->iface_type == MWIFIEX_USB || - adapter->iface_type == MWIFIEX_PCIE) + if (adapter->iface_type == MWIFIEX_USB) mwifiex_hs_activated_event(priv, true); } else { adapter->is_hs_configured = false; diff --git a/drivers/net/wireless/mwifiex/usb.c b/drivers/net/wireless/mwifiex/usb.c index 2472d4b7f00e..1c70b8d09227 100644 --- a/drivers/net/wireless/mwifiex/usb.c +++ b/drivers/net/wireless/mwifiex/usb.c @@ -447,9 +447,6 @@ static int mwifiex_usb_suspend(struct usb_interface *intf, pm_message_t message) */ adapter->is_suspended = true; - for (i = 0; i < adapter->priv_num; i++) - netif_carrier_off(adapter->priv[i]->netdev); - if (atomic_read(&card->rx_cmd_urb_pending) && card->rx_cmd.urb) usb_kill_urb(card->rx_cmd.urb); @@ -509,10 +506,6 @@ static int mwifiex_usb_resume(struct usb_interface *intf) MWIFIEX_RX_CMD_BUF_SIZE); } - for (i = 0; i < adapter->priv_num; i++) - if (adapter->priv[i]->media_connected) - netif_carrier_on(adapter->priv[i]->netdev); - /* Disable Host Sleep */ if (adapter->hs_activated) mwifiex_cancel_hs(mwifiex_get_priv(adapter, diff --git a/drivers/net/wireless/mwifiex/wmm.c b/drivers/net/wireless/mwifiex/wmm.c index 2e8f9cdea54d..95fa3599b407 100644 --- a/drivers/net/wireless/mwifiex/wmm.c +++ b/drivers/net/wireless/mwifiex/wmm.c @@ -1239,8 +1239,7 @@ mwifiex_dequeue_tx_packet(struct mwifiex_adapter *adapter) if (enable_tx_amsdu && mwifiex_is_amsdu_allowed(priv, tid) && mwifiex_is_11n_aggragation_possible(priv, ptr, adapter->tx_buf_size)) - mwifiex_11n_aggregate_pkt(priv, ptr, INTF_HEADER_LEN, - ptr_index, flags); + mwifiex_11n_aggregate_pkt(priv, ptr, ptr_index, flags); /* ra_list_spinlock has been freed in mwifiex_11n_aggregate_pkt() */ else diff --git a/drivers/net/wireless/p54/p54usb.c b/drivers/net/wireless/p54/p54usb.c index b9deef66cf4b..e328d3058c41 100644 --- a/drivers/net/wireless/p54/p54usb.c +++ b/drivers/net/wireless/p54/p54usb.c @@ -83,6 +83,7 @@ static struct usb_device_id p54u_table[] = { {USB_DEVICE(0x06a9, 0x000e)}, /* Westell 802.11g USB (A90-211WG-01) */ {USB_DEVICE(0x06b9, 0x0121)}, /* Thomson SpeedTouch 121g */ {USB_DEVICE(0x0707, 0xee13)}, /* SMC 2862W-G version 2 */ + {USB_DEVICE(0x07aa, 0x0020)}, /* Corega WLUSB2GTST USB */ {USB_DEVICE(0x0803, 0x4310)}, /* Zoom 4410a */ {USB_DEVICE(0x083a, 0x4521)}, /* Siemens Gigaset USB Adapter 54 version 2 */ {USB_DEVICE(0x083a, 0x4531)}, /* T-Com Sinus 154 data II */ @@ -979,6 +980,7 @@ static int p54u_load_firmware(struct ieee80211_hw *dev, if (err) { dev_err(&priv->udev->dev, "(p54usb) cannot load firmware %s " "(%d)!\n", p54u_fwlist[i].fw, err); + usb_put_dev(udev); } return err; diff --git a/drivers/net/wireless/rtlwifi/wifi.h b/drivers/net/wireless/rtlwifi/wifi.h index cc03e7c87cbe..703258742d28 100644 --- a/drivers/net/wireless/rtlwifi/wifi.h +++ b/drivers/net/wireless/rtlwifi/wifi.h @@ -2057,7 +2057,7 @@ struct rtl_priv { that it points to the data allocated beyond this structure like: rtl_pci_priv or rtl_usb_priv */ - u8 priv[0]; + u8 priv[0] __aligned(sizeof(void *)); }; #define rtl_priv(hw) (((struct rtl_priv *)(hw)->priv)) diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c index a53782ef1540..b45bce20ad76 100644 --- a/drivers/net/xen-netback/xenbus.c +++ b/drivers/net/xen-netback/xenbus.c @@ -24,6 +24,12 @@ struct backend_info { struct xenbus_device *dev; struct xenvif *vif; + + /* This is the state that will be reflected in xenstore when any + * active hotplug script completes. + */ + enum xenbus_state state; + enum xenbus_state frontend_state; struct xenbus_watch hotplug_status_watch; u8 have_hotplug_status_watch:1; @@ -136,6 +142,8 @@ static int netback_probe(struct xenbus_device *dev, if (err) goto fail; + be->state = XenbusStateInitWait; + /* This kicks hotplug scripts, so do it immediately. */ backend_create_xenvif(be); @@ -208,24 +216,113 @@ static void backend_create_xenvif(struct backend_info *be) kobject_uevent(&dev->dev.kobj, KOBJ_ONLINE); } - -static void disconnect_backend(struct xenbus_device *dev) +static void backend_disconnect(struct backend_info *be) { - struct backend_info *be = dev_get_drvdata(&dev->dev); - if (be->vif) xenvif_disconnect(be->vif); } -static void destroy_backend(struct xenbus_device *dev) +static void backend_connect(struct backend_info *be) { - struct backend_info *be = dev_get_drvdata(&dev->dev); + if (be->vif) + connect(be); +} - if (be->vif) { - kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE); - xenbus_rm(XBT_NIL, dev->nodename, "hotplug-status"); - xenvif_free(be->vif); - be->vif = NULL; +static inline void backend_switch_state(struct backend_info *be, + enum xenbus_state state) +{ + struct xenbus_device *dev = be->dev; + + pr_debug("%s -> %s\n", dev->nodename, xenbus_strstate(state)); + be->state = state; + + /* If we are waiting for a hotplug script then defer the + * actual xenbus state change. + */ + if (!be->have_hotplug_status_watch) + xenbus_switch_state(dev, state); +} + +/* Handle backend state transitions: + * + * The backend state starts in InitWait and the following transitions are + * allowed. + * + * InitWait -> Connected + * + * ^ \ | + * | \ | + * | \ | + * | \ | + * | \ | + * | \ | + * | V V + * + * Closed <-> Closing + * + * The state argument specifies the eventual state of the backend and the + * function transitions to that state via the shortest path. + */ +static void set_backend_state(struct backend_info *be, + enum xenbus_state state) +{ + while (be->state != state) { + switch (be->state) { + case XenbusStateClosed: + switch (state) { + case XenbusStateInitWait: + case XenbusStateConnected: + pr_info("%s: prepare for reconnect\n", + be->dev->nodename); + backend_switch_state(be, XenbusStateInitWait); + break; + case XenbusStateClosing: + backend_switch_state(be, XenbusStateClosing); + break; + default: + BUG(); + } + break; + case XenbusStateInitWait: + switch (state) { + case XenbusStateConnected: + backend_connect(be); + backend_switch_state(be, XenbusStateConnected); + break; + case XenbusStateClosing: + case XenbusStateClosed: + backend_switch_state(be, XenbusStateClosing); + break; + default: + BUG(); + } + break; + case XenbusStateConnected: + switch (state) { + case XenbusStateInitWait: + case XenbusStateClosing: + case XenbusStateClosed: + backend_disconnect(be); + backend_switch_state(be, XenbusStateClosing); + break; + default: + BUG(); + } + break; + case XenbusStateClosing: + switch (state) { + case XenbusStateInitWait: + case XenbusStateConnected: + case XenbusStateClosed: + backend_switch_state(be, XenbusStateClosed); + break; + default: + BUG(); + } + break; + default: + BUG(); + } } } @@ -237,40 +334,33 @@ static void frontend_changed(struct xenbus_device *dev, { struct backend_info *be = dev_get_drvdata(&dev->dev); - pr_debug("frontend state %s\n", xenbus_strstate(frontend_state)); + pr_debug("%s -> %s\n", dev->otherend, xenbus_strstate(frontend_state)); be->frontend_state = frontend_state; switch (frontend_state) { case XenbusStateInitialising: - if (dev->state == XenbusStateClosed) { - pr_info("%s: prepare for reconnect\n", dev->nodename); - xenbus_switch_state(dev, XenbusStateInitWait); - } + set_backend_state(be, XenbusStateInitWait); break; case XenbusStateInitialised: break; case XenbusStateConnected: - if (dev->state == XenbusStateConnected) - break; - if (be->vif) - connect(be); + set_backend_state(be, XenbusStateConnected); break; case XenbusStateClosing: - disconnect_backend(dev); - xenbus_switch_state(dev, XenbusStateClosing); + set_backend_state(be, XenbusStateClosing); break; case XenbusStateClosed: - xenbus_switch_state(dev, XenbusStateClosed); + set_backend_state(be, XenbusStateClosed); if (xenbus_dev_is_online(dev)) break; - destroy_backend(dev); /* fall through if not online */ case XenbusStateUnknown: + set_backend_state(be, XenbusStateClosed); device_unregister(&dev->dev); break; @@ -363,7 +453,9 @@ static void hotplug_status_changed(struct xenbus_watch *watch, if (IS_ERR(str)) return; if (len == sizeof("connected")-1 && !memcmp(str, "connected", len)) { - xenbus_switch_state(be->dev, XenbusStateConnected); + /* Complete any pending state change */ + xenbus_switch_state(be->dev, be->state); + /* Not interested in this watch anymore. */ unregister_hotplug_status_watch(be); } @@ -393,12 +485,8 @@ static void connect(struct backend_info *be) err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch, hotplug_status_changed, "%s/%s", dev->nodename, "hotplug-status"); - if (err) { - /* Switch now, since we can't do a watch. */ - xenbus_switch_state(dev, XenbusStateConnected); - } else { + if (!err) be->have_hotplug_status_watch = 1; - } netif_wake_queue(be->vif->dev); } diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig index 9d2009a9004d..78cc76053328 100644 --- a/drivers/of/Kconfig +++ b/drivers/of/Kconfig @@ -74,10 +74,4 @@ config OF_MTD depends on MTD def_bool y -config OF_RESERVED_MEM - depends on OF_FLATTREE && (DMA_CMA || (HAVE_GENERIC_DMA_COHERENT && HAVE_MEMBLOCK)) - def_bool y - help - Initialization code for DMA reserved memory - endmenu # OF diff --git a/drivers/of/Makefile b/drivers/of/Makefile index ed9660adad77..efd05102c405 100644 --- a/drivers/of/Makefile +++ b/drivers/of/Makefile @@ -9,4 +9,3 @@ obj-$(CONFIG_OF_MDIO) += of_mdio.o obj-$(CONFIG_OF_PCI) += of_pci.o obj-$(CONFIG_OF_PCI_IRQ) += of_pci_irq.o obj-$(CONFIG_OF_MTD) += of_mtd.o -obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o diff --git a/drivers/of/base.c b/drivers/of/base.c index 865d3f66c86b..7d4c70f859e3 100644 --- a/drivers/of/base.c +++ b/drivers/of/base.c @@ -303,10 +303,8 @@ struct device_node *of_get_cpu_node(int cpu, unsigned int *thread) struct device_node *cpun, *cpus; cpus = of_find_node_by_path("/cpus"); - if (!cpus) { - pr_warn("Missing cpus node, bailing out\n"); + if (!cpus) return NULL; - } for_each_child_of_node(cpus, cpun) { if (of_node_cmp(cpun->type, "cpu")) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 229dd9d69e18..a4fa9ad31b8f 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -18,7 +18,6 @@ #include #include #include -#include #include /* for COMMAND_LINE_SIZE */ #ifdef CONFIG_PPC @@ -803,14 +802,3 @@ void __init unflatten_device_tree(void) } #endif /* CONFIG_OF_EARLY_FLATTREE */ - -/* Feed entire flattened device tree into the random pool */ -static int __init add_fdt_randomness(void) -{ - if (initial_boot_params) - add_device_randomness(initial_boot_params, - be32_to_cpu(initial_boot_params->totalsize)); - - return 0; -} -core_initcall(add_fdt_randomness); diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c deleted file mode 100644 index 0fe40c7d6904..000000000000 --- a/drivers/of/of_reserved_mem.c +++ /dev/null @@ -1,173 +0,0 @@ -/* - * Device tree based initialization code for reserved memory. - * - * Copyright (c) 2013 Samsung Electronics Co., Ltd. - * http://www.samsung.com - * Author: Marek Szyprowski - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation; either version 2 of the - * License or (at your optional) any later version of the license. - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#define MAX_RESERVED_REGIONS 16 -struct reserved_mem { - phys_addr_t base; - unsigned long size; - struct cma *cma; - char name[32]; -}; -static struct reserved_mem reserved_mem[MAX_RESERVED_REGIONS]; -static int reserved_mem_count; - -static int __init fdt_scan_reserved_mem(unsigned long node, const char *uname, - int depth, void *data) -{ - struct reserved_mem *rmem = &reserved_mem[reserved_mem_count]; - phys_addr_t base, size; - int is_cma, is_reserved; - unsigned long len; - const char *status; - __be32 *prop; - - is_cma = IS_ENABLED(CONFIG_DMA_CMA) && - of_flat_dt_is_compatible(node, "linux,contiguous-memory-region"); - is_reserved = of_flat_dt_is_compatible(node, "reserved-memory-region"); - - if (!is_reserved && !is_cma) { - /* ignore node and scan next one */ - return 0; - } - - status = of_get_flat_dt_prop(node, "status", &len); - if (status && strcmp(status, "okay") != 0) { - /* ignore disabled node nad scan next one */ - return 0; - } - - prop = of_get_flat_dt_prop(node, "reg", &len); - if (!prop || (len < (dt_root_size_cells + dt_root_addr_cells) * - sizeof(__be32))) { - pr_err("Reserved mem: node %s, incorrect \"reg\" property\n", - uname); - /* ignore node and scan next one */ - return 0; - } - base = dt_mem_next_cell(dt_root_addr_cells, &prop); - size = dt_mem_next_cell(dt_root_size_cells, &prop); - - if (!size) { - /* ignore node and scan next one */ - return 0; - } - - pr_info("Reserved mem: found %s, memory base %lx, size %ld MiB\n", - uname, (unsigned long)base, (unsigned long)size / SZ_1M); - - if (reserved_mem_count == ARRAY_SIZE(reserved_mem)) - return -ENOSPC; - - rmem->base = base; - rmem->size = size; - strlcpy(rmem->name, uname, sizeof(rmem->name)); - - if (is_cma) { - struct cma *cma; - if (dma_contiguous_reserve_area(size, base, 0, &cma) == 0) { - rmem->cma = cma; - reserved_mem_count++; - if (of_get_flat_dt_prop(node, - "linux,default-contiguous-region", - NULL)) - dma_contiguous_set_default(cma); - } - } else if (is_reserved) { - if (memblock_remove(base, size) == 0) - reserved_mem_count++; - else - pr_err("Failed to reserve memory for %s\n", uname); - } - - return 0; -} - -static struct reserved_mem *get_dma_memory_region(struct device *dev) -{ - struct device_node *node; - const char *name; - int i; - - node = of_parse_phandle(dev->of_node, "memory-region", 0); - if (!node) - return NULL; - - name = kbasename(node->full_name); - for (i = 0; i < reserved_mem_count; i++) - if (strcmp(name, reserved_mem[i].name) == 0) - return &reserved_mem[i]; - return NULL; -} - -/** - * of_reserved_mem_device_init() - assign reserved memory region to given device - * - * This function assign memory region pointed by "memory-region" device tree - * property to the given device. - */ -void of_reserved_mem_device_init(struct device *dev) -{ - struct reserved_mem *region = get_dma_memory_region(dev); - if (!region) - return; - - if (region->cma) { - dev_set_cma_area(dev, region->cma); - pr_info("Assigned CMA %s to %s device\n", region->name, - dev_name(dev)); - } else { - if (dma_declare_coherent_memory(dev, region->base, region->base, - region->size, DMA_MEMORY_MAP | DMA_MEMORY_EXCLUSIVE) != 0) - pr_info("Declared reserved memory %s to %s device\n", - region->name, dev_name(dev)); - } -} - -/** - * of_reserved_mem_device_release() - release reserved memory device structures - * - * This function releases structures allocated for memory region handling for - * the given device. - */ -void of_reserved_mem_device_release(struct device *dev) -{ - struct reserved_mem *region = get_dma_memory_region(dev); - if (!region && !region->cma) - dma_release_declared_memory(dev); -} - -/** - * early_init_dt_scan_reserved_mem() - create reserved memory regions - * - * This function grabs memory from early allocator for device exclusive use - * defined in device tree structures. It should be called by arch specific code - * once the early allocator (memblock) has been activated and all other - * subsystems have already allocated/reserved memory. - */ -void __init early_init_dt_scan_reserved_mem(void) -{ - of_scan_flat_dt_by_path("/memory/reserved-memory", - fdt_scan_reserved_mem, NULL); -} diff --git a/drivers/of/platform.c b/drivers/of/platform.c index 9b439ac63d8e..f6dcde220821 100644 --- a/drivers/of/platform.c +++ b/drivers/of/platform.c @@ -21,7 +21,6 @@ #include #include #include -#include #include const struct of_device_id of_default_bus_match_table[] = { @@ -219,8 +218,6 @@ static struct platform_device *of_platform_device_create_pdata( dev->dev.bus = &platform_bus_type; dev->dev.platform_data = platform_data; - of_reserved_mem_device_init(&dev->dev); - /* We do not fill the DMA ops for platform devices by default. * This is currently the responsibility of the platform code * to do such, possibly using a device notifier @@ -228,7 +225,6 @@ static struct platform_device *of_platform_device_create_pdata( if (of_device_add(dev) != 0) { platform_device_put(dev); - of_reserved_mem_device_release(&dev->dev); return NULL; } diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c index 0b7d23b4ad95..be12fbfcae10 100644 --- a/drivers/pci/hotplug/acpiphp_glue.c +++ b/drivers/pci/hotplug/acpiphp_glue.c @@ -994,14 +994,16 @@ void acpiphp_enumerate_slots(struct pci_bus *bus) /* * This bridge should have been registered as a hotplug function - * under its parent, so the context has to be there. If not, we - * are in deep goo. + * under its parent, so the context should be there, unless the + * parent is going to be handled by pciehp, in which case this + * bridge is not interesting to us either. */ mutex_lock(&acpiphp_context_lock); context = acpiphp_get_context(handle); - if (WARN_ON(!context)) { + if (!context) { mutex_unlock(&acpiphp_context_lock); put_device(&bus->dev); + pci_dev_put(bridge->pci_dev); kfree(bridge); return; } diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index e8ccf6c0f08a..bdd64b1b4817 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1155,8 +1155,14 @@ static void pci_enable_bridge(struct pci_dev *dev) pci_enable_bridge(dev->bus->self); - if (pci_is_enabled(dev)) + if (pci_is_enabled(dev)) { + if (!dev->is_busmaster) { + dev_warn(&dev->dev, "driver skip pci_set_master, fix it!\n"); + pci_set_master(dev); + } return; + } + retval = pci_enable_device(dev); if (retval) dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n", diff --git a/drivers/pinctrl/pinconf.c b/drivers/pinctrl/pinconf.c index a138965c01cb..b8fcc38c0d11 100644 --- a/drivers/pinctrl/pinconf.c +++ b/drivers/pinctrl/pinconf.c @@ -490,7 +490,7 @@ exit: * are values that should match the pinctrl-maps * reflects the new config and is driver dependant */ -static int pinconf_dbg_config_write(struct file *file, +static ssize_t pinconf_dbg_config_write(struct file *file, const char __user *user_buf, size_t count, loff_t *ppos) { struct pinctrl_maps *maps_node; @@ -508,7 +508,7 @@ static int pinconf_dbg_config_write(struct file *file, int i; /* Get userspace string and assure termination */ - buf_size = min(count, (size_t)(sizeof(buf)-1)); + buf_size = min(count, sizeof(buf) - 1); if (copy_from_user(buf, user_buf, buf_size)) return -EFAULT; buf[buf_size] = 0; diff --git a/drivers/pinctrl/pinctrl-exynos.c b/drivers/pinctrl/pinctrl-exynos.c index 2689f8d01a1e..155b1b3a0e7a 100644 --- a/drivers/pinctrl/pinctrl-exynos.c +++ b/drivers/pinctrl/pinctrl-exynos.c @@ -663,18 +663,18 @@ static void exynos_pinctrl_resume(struct samsung_pinctrl_drv_data *drvdata) /* pin banks of s5pv210 pin-controller */ static struct samsung_pin_bank s5pv210_pin_bank[] = { EXYNOS_PIN_BANK_EINTG(8, 0x000, "gpa0", 0x00), - EXYNOS_PIN_BANK_EINTG(6, 0x020, "gpa1", 0x04), + EXYNOS_PIN_BANK_EINTG(4, 0x020, "gpa1", 0x04), EXYNOS_PIN_BANK_EINTG(8, 0x040, "gpb", 0x08), EXYNOS_PIN_BANK_EINTG(5, 0x060, "gpc0", 0x0c), EXYNOS_PIN_BANK_EINTG(5, 0x080, "gpc1", 0x10), EXYNOS_PIN_BANK_EINTG(4, 0x0a0, "gpd0", 0x14), - EXYNOS_PIN_BANK_EINTG(4, 0x0c0, "gpd1", 0x18), - EXYNOS_PIN_BANK_EINTG(5, 0x0e0, "gpe0", 0x1c), - EXYNOS_PIN_BANK_EINTG(8, 0x100, "gpe1", 0x20), - EXYNOS_PIN_BANK_EINTG(6, 0x120, "gpf0", 0x24), + EXYNOS_PIN_BANK_EINTG(6, 0x0c0, "gpd1", 0x18), + EXYNOS_PIN_BANK_EINTG(8, 0x0e0, "gpe0", 0x1c), + EXYNOS_PIN_BANK_EINTG(5, 0x100, "gpe1", 0x20), + EXYNOS_PIN_BANK_EINTG(8, 0x120, "gpf0", 0x24), EXYNOS_PIN_BANK_EINTG(8, 0x140, "gpf1", 0x28), EXYNOS_PIN_BANK_EINTG(8, 0x160, "gpf2", 0x2c), - EXYNOS_PIN_BANK_EINTG(8, 0x180, "gpf3", 0x30), + EXYNOS_PIN_BANK_EINTG(6, 0x180, "gpf3", 0x30), EXYNOS_PIN_BANK_EINTG(7, 0x1a0, "gpg0", 0x34), EXYNOS_PIN_BANK_EINTG(7, 0x1c0, "gpg1", 0x38), EXYNOS_PIN_BANK_EINTG(7, 0x1e0, "gpg2", 0x3c), diff --git a/drivers/pinctrl/pinctrl-palmas.c b/drivers/pinctrl/pinctrl-palmas.c index 82638fac3cfa..30c4d356cb33 100644 --- a/drivers/pinctrl/pinctrl-palmas.c +++ b/drivers/pinctrl/pinctrl-palmas.c @@ -891,9 +891,10 @@ static int palmas_pinconf_set(struct pinctrl_dev *pctldev, param = pinconf_to_config_param(configs[i]); param_val = pinconf_to_config_argument(configs[i]); + if (param == PIN_CONFIG_BIAS_PULL_PIN_DEFAULT) + continue; + switch (param) { - case PIN_CONFIG_BIAS_PULL_PIN_DEFAULT: - return 0; case PIN_CONFIG_BIAS_DISABLE: case PIN_CONFIG_BIAS_PULL_UP: case PIN_CONFIG_BIAS_PULL_DOWN: diff --git a/drivers/pinctrl/pinctrl-tegra114.c b/drivers/pinctrl/pinctrl-tegra114.c index 622c4854977e..93c9e3899d5e 100644 --- a/drivers/pinctrl/pinctrl-tegra114.c +++ b/drivers/pinctrl/pinctrl-tegra114.c @@ -3,7 +3,7 @@ * * Copyright (c) 2012-2013, NVIDIA CORPORATION. All rights reserved. * - * Arthur: Pritesh Raithatha + * Author: Pritesh Raithatha * * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, @@ -2763,7 +2763,6 @@ static struct platform_driver tegra114_pinctrl_driver = { }; module_platform_driver(tegra114_pinctrl_driver); -MODULE_ALIAS("platform:tegra114-pinctrl"); MODULE_AUTHOR("Pritesh Raithatha "); -MODULE_DESCRIPTION("NVIDIA Tegra114 pincontrol driver"); +MODULE_DESCRIPTION("NVIDIA Tegra114 pinctrl driver"); MODULE_LICENSE("GPL v2"); diff --git a/drivers/platform/x86/intel_scu_ipc.c b/drivers/platform/x86/intel_scu_ipc.c index 9215ed72bece..d654f831410d 100644 --- a/drivers/platform/x86/intel_scu_ipc.c +++ b/drivers/platform/x86/intel_scu_ipc.c @@ -25,7 +25,7 @@ #include #include #include -#include +#include #include /* IPC defines the following message types */ @@ -579,7 +579,7 @@ static struct pci_driver ipc_driver = { static int __init intel_scu_ipc_init(void) { - platform = mrst_identify_cpu(); + platform = intel_mid_identify_cpu(); if (platform == 0) return -ENODEV; return pci_register_driver(&ipc_driver); diff --git a/drivers/regulator/da9063-regulator.c b/drivers/regulator/da9063-regulator.c index 1a7816390773..b9f2653e4ef9 100644 --- a/drivers/regulator/da9063-regulator.c +++ b/drivers/regulator/da9063-regulator.c @@ -709,7 +709,7 @@ static struct da9063_regulators_pdata *da9063_parse_regulators_dt( struct of_regulator_match **da9063_reg_matches) { da9063_reg_matches = NULL; - return PTR_ERR(-ENODEV); + return ERR_PTR(-ENODEV); } #endif diff --git a/drivers/regulator/palmas-regulator.c b/drivers/regulator/palmas-regulator.c index 488dfe7ce9a6..7e2b165972e6 100644 --- a/drivers/regulator/palmas-regulator.c +++ b/drivers/regulator/palmas-regulator.c @@ -201,13 +201,7 @@ static unsigned int palmas_smps_ramp_delay[4] = {0, 10000, 5000, 2500}; #define SMPS_CTRL_MODE_ECO 0x02 #define SMPS_CTRL_MODE_PWM 0x03 -/* These values are derived from the data sheet. And are the number of steps - * where there is a voltage change, the ranges at beginning and end of register - * max/min values where there are no change are ommitted. - * - * So they are basically (maxV-minV)/stepV - */ -#define PALMAS_SMPS_NUM_VOLTAGES 117 +#define PALMAS_SMPS_NUM_VOLTAGES 122 #define PALMAS_SMPS10_NUM_VOLTAGES 2 #define PALMAS_LDO_NUM_VOLTAGES 50 @@ -979,6 +973,7 @@ static int palmas_regulators_probe(struct platform_device *pdev) pmic->desc[id].min_uV = 900000; pmic->desc[id].uV_step = 50000; pmic->desc[id].linear_min_sel = 1; + pmic->desc[id].enable_time = 500; pmic->desc[id].vsel_reg = PALMAS_BASE_TO_REG(PALMAS_LDO_BASE, palmas_regs_info[id].vsel_addr); @@ -997,6 +992,11 @@ static int palmas_regulators_probe(struct platform_device *pdev) pmic->desc[id].min_uV = 450000; pmic->desc[id].uV_step = 25000; } + + /* LOD6 in vibrator mode will have enable time 2000us */ + if (pdata && pdata->ldo6_vibrator && + (id == PALMAS_REG_LDO6)) + pmic->desc[id].enable_time = 2000; } else { pmic->desc[id].n_voltages = 1; pmic->desc[id].ops = &palmas_ops_extreg; diff --git a/drivers/regulator/ti-abb-regulator.c b/drivers/regulator/ti-abb-regulator.c index d8e3e1262bc2..20c271d49dcb 100644 --- a/drivers/regulator/ti-abb-regulator.c +++ b/drivers/regulator/ti-abb-regulator.c @@ -279,8 +279,12 @@ static int ti_abb_set_opp(struct regulator_dev *rdev, struct ti_abb *abb, ti_abb_rmw(regs->opp_sel_mask, info->opp_sel, regs->control_reg, abb->base); - /* program LDO VBB vset override if needed */ - if (abb->ldo_base) + /* + * program LDO VBB vset override if needed for !bypass mode + * XXX: Do not switch sequence - for !bypass, LDO override reset *must* + * be performed *before* switch to bias mode else VBB glitches. + */ + if (abb->ldo_base && info->opp_sel != TI_ABB_NOMINAL_OPP) ti_abb_program_ldovbb(dev, abb, info); /* Initiate ABB ldo change */ @@ -295,6 +299,14 @@ static int ti_abb_set_opp(struct regulator_dev *rdev, struct ti_abb *abb, if (ret) goto out; + /* + * Reset LDO VBB vset override bypass mode + * XXX: Do not switch sequence - for bypass, LDO override reset *must* + * be performed *after* switch to bypass else VBB glitches. + */ + if (abb->ldo_base && info->opp_sel == TI_ABB_NOMINAL_OPP) + ti_abb_program_ldovbb(dev, abb, info); + out: return ret; } diff --git a/drivers/regulator/wm831x-ldo.c b/drivers/regulator/wm831x-ldo.c index 1432b26ef2e9..2205fbc2c37b 100644 --- a/drivers/regulator/wm831x-ldo.c +++ b/drivers/regulator/wm831x-ldo.c @@ -63,7 +63,7 @@ static irqreturn_t wm831x_ldo_uv_irq(int irq, void *data) */ static const struct regulator_linear_range wm831x_gp_ldo_ranges[] = { - { .min_uV = 900000, .max_uV = 1650000, .min_sel = 0, .max_sel = 14, + { .min_uV = 900000, .max_uV = 1600000, .min_sel = 0, .max_sel = 14, .uV_step = 50000 }, { .min_uV = 1700000, .max_uV = 3300000, .min_sel = 15, .max_sel = 31, .uV_step = 100000 }, @@ -332,7 +332,7 @@ static struct platform_driver wm831x_gp_ldo_driver = { */ static const struct regulator_linear_range wm831x_aldo_ranges[] = { - { .min_uV = 1000000, .max_uV = 1650000, .min_sel = 0, .max_sel = 12, + { .min_uV = 1000000, .max_uV = 1600000, .min_sel = 0, .max_sel = 12, .uV_step = 50000 }, { .min_uV = 1700000, .max_uV = 3500000, .min_sel = 13, .max_sel = 31, .uV_step = 100000 }, diff --git a/drivers/regulator/wm8350-regulator.c b/drivers/regulator/wm8350-regulator.c index 835b5f0f344e..61ca9292a429 100644 --- a/drivers/regulator/wm8350-regulator.c +++ b/drivers/regulator/wm8350-regulator.c @@ -543,7 +543,7 @@ static int wm8350_dcdc_set_suspend_mode(struct regulator_dev *rdev, } static const struct regulator_linear_range wm8350_ldo_ranges[] = { - { .min_uV = 900000, .max_uV = 1750000, .min_sel = 0, .max_sel = 15, + { .min_uV = 900000, .max_uV = 1650000, .min_sel = 0, .max_sel = 15, .uV_step = 50000 }, { .min_uV = 1800000, .max_uV = 3300000, .min_sel = 16, .max_sel = 31, .uV_step = 100000 }, diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c index 72c5cdbe0791..544be722937c 100644 --- a/drivers/rtc/interface.c +++ b/drivers/rtc/interface.c @@ -72,6 +72,7 @@ int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm) } else err = -EINVAL; + pm_stay_awake(rtc->dev.parent); mutex_unlock(&rtc->ops_lock); /* A timer might have just expired */ schedule_work(&rtc->irqwork); @@ -113,6 +114,7 @@ int rtc_set_mmss(struct rtc_device *rtc, unsigned long secs) err = -EINVAL; } + pm_stay_awake(rtc->dev.parent); mutex_unlock(&rtc->ops_lock); /* A timer might have just expired */ schedule_work(&rtc->irqwork); @@ -771,9 +773,10 @@ static int rtc_timer_enqueue(struct rtc_device *rtc, struct rtc_timer *timer) alarm.time = rtc_ktime_to_tm(timer->node.expires); alarm.enabled = 1; err = __rtc_set_alarm(rtc, &alarm); - if (err == -ETIME) + if (err == -ETIME) { + pm_stay_awake(rtc->dev.parent); schedule_work(&rtc->irqwork); - else if (err) { + } else if (err) { timerqueue_del(&rtc->timerqueue, &timer->node); timer->enabled = 0; return err; @@ -818,8 +821,10 @@ static void rtc_timer_remove(struct rtc_device *rtc, struct rtc_timer *timer) alarm.time = rtc_ktime_to_tm(next->expires); alarm.enabled = 1; err = __rtc_set_alarm(rtc, &alarm); - if (err == -ETIME) + if (err == -ETIME) { + pm_stay_awake(rtc->dev.parent); schedule_work(&rtc->irqwork); + } } } @@ -845,7 +850,6 @@ void rtc_timer_do_work(struct work_struct *work) mutex_lock(&rtc->ops_lock); again: - pm_relax(rtc->dev.parent); __rtc_read_time(rtc, &tm); now = rtc_tm_to_ktime(tm); while ((next = timerqueue_getnext(&rtc->timerqueue))) { @@ -880,6 +884,7 @@ again: } else rtc_alarm_disable(rtc); + pm_relax(rtc->dev.parent); mutex_unlock(&rtc->ops_lock); } diff --git a/drivers/rtc/rtc-mrst.c b/drivers/rtc/rtc-mrst.c index 578baf9d9725..315209d9b407 100644 --- a/drivers/rtc/rtc-mrst.c +++ b/drivers/rtc/rtc-mrst.c @@ -38,8 +38,8 @@ #include #include -#include -#include +#include +#include struct mrst_rtc { struct rtc_device *rtc; diff --git a/drivers/rtc/rtc-pl031.c b/drivers/rtc/rtc-pl031.c index 0f0609b1aa2c..e3b25712b659 100644 --- a/drivers/rtc/rtc-pl031.c +++ b/drivers/rtc/rtc-pl031.c @@ -371,6 +371,7 @@ static int pl031_probe(struct amba_device *adev, const struct amba_id *id) } } + device_init_wakeup(&adev->dev, 1); ldata->rtc = rtc_device_register("pl031", &adev->dev, ops, THIS_MODULE); if (IS_ERR(ldata->rtc)) { @@ -384,8 +385,6 @@ static int pl031_probe(struct amba_device *adev, const struct amba_id *id) goto out_no_irq; } - device_init_wakeup(&adev->dev, 1); - return 0; out_no_irq: diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c index 8cd34bf644b3..77df9cb00688 100644 --- a/drivers/s390/char/sclp_cmd.c +++ b/drivers/s390/char/sclp_cmd.c @@ -145,9 +145,11 @@ bool __init sclp_has_linemode(void) if (sccb->header.response_code != 0x20) return 0; - if (sccb->sclp_send_mask & (EVTYP_MSG_MASK | EVTYP_PMSGCMD_MASK)) - return 1; - return 0; + if (!(sccb->sclp_send_mask & (EVTYP_OPCMD_MASK | EVTYP_PMSGCMD_MASK))) + return 0; + if (!(sccb->sclp_receive_mask & (EVTYP_MSG_MASK | EVTYP_PMSGCMD_MASK))) + return 0; + return 1; } bool __init sclp_has_vt220(void) diff --git a/drivers/s390/char/tty3270.c b/drivers/s390/char/tty3270.c index a0f47c83fd62..3f4ca4e09a4c 100644 --- a/drivers/s390/char/tty3270.c +++ b/drivers/s390/char/tty3270.c @@ -810,7 +810,7 @@ static void tty3270_resize_work(struct work_struct *work) struct winsize ws; screen = tty3270_alloc_screen(tp->n_rows, tp->n_cols); - if (!screen) + if (IS_ERR(screen)) return; /* Switch to new output size */ spin_lock_bh(&tp->view.lock); diff --git a/drivers/spi/spi-atmel.c b/drivers/spi/spi-atmel.c index fd7cc566095a..d4ac60b4a56e 100644 --- a/drivers/spi/spi-atmel.c +++ b/drivers/spi/spi-atmel.c @@ -1583,7 +1583,7 @@ static int atmel_spi_probe(struct platform_device *pdev) /* Initialize the hardware */ ret = clk_prepare_enable(clk); if (ret) - goto out_unmap_regs; + goto out_free_irq; spi_writel(as, CR, SPI_BIT(SWRST)); spi_writel(as, CR, SPI_BIT(SWRST)); /* AT91SAM9263 Rev B workaround */ if (as->caps.has_wdrbt) { @@ -1614,6 +1614,7 @@ out_free_dma: spi_writel(as, CR, SPI_BIT(SWRST)); spi_writel(as, CR, SPI_BIT(SWRST)); /* AT91SAM9263 Rev B workaround */ clk_disable_unprepare(clk); +out_free_irq: free_irq(irq, master); out_unmap_regs: iounmap(as->regs); diff --git a/drivers/spi/spi-clps711x.c b/drivers/spi/spi-clps711x.c index 5655acf55bfe..6416798828e7 100644 --- a/drivers/spi/spi-clps711x.c +++ b/drivers/spi/spi-clps711x.c @@ -226,7 +226,6 @@ static int spi_clps711x_probe(struct platform_device *pdev) dev_name(&pdev->dev), hw); if (ret) { dev_err(&pdev->dev, "Can't request IRQ\n"); - clk_put(hw->spi_clk); goto clk_out; } @@ -247,7 +246,6 @@ err_out: gpio_free(hw->chipselect[i]); spi_master_put(master); - kfree(master); return ret; } @@ -263,7 +261,6 @@ static int spi_clps711x_remove(struct platform_device *pdev) gpio_free(hw->chipselect[i]); spi_unregister_master(master); - kfree(master); return 0; } diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c index 6cd07d13ecab..4e44575bd87a 100644 --- a/drivers/spi/spi-fsl-dspi.c +++ b/drivers/spi/spi-fsl-dspi.c @@ -476,15 +476,9 @@ static int dspi_probe(struct platform_device *pdev) master->bus_num = bus_num; res = platform_get_resource(pdev, IORESOURCE_MEM, 0); - if (!res) { - dev_err(&pdev->dev, "can't get platform resource\n"); - ret = -EINVAL; - goto out_master_put; - } - dspi->base = devm_ioremap_resource(&pdev->dev, res); - if (!dspi->base) { - ret = -EINVAL; + if (IS_ERR(dspi->base)) { + ret = PTR_ERR(dspi->base); goto out_master_put; } diff --git a/drivers/spi/spi-mpc512x-psc.c b/drivers/spi/spi-mpc512x-psc.c index dbc5e999a1f5..6adf4e35816d 100644 --- a/drivers/spi/spi-mpc512x-psc.c +++ b/drivers/spi/spi-mpc512x-psc.c @@ -522,8 +522,10 @@ static int mpc512x_psc_spi_do_probe(struct device *dev, u32 regaddr, psc_num = master->bus_num; snprintf(clk_name, sizeof(clk_name), "psc%d_mclk", psc_num); clk = devm_clk_get(dev, clk_name); - if (IS_ERR(clk)) + if (IS_ERR(clk)) { + ret = PTR_ERR(clk); goto free_irq; + } ret = clk_prepare_enable(clk); if (ret) goto free_irq; diff --git a/drivers/spi/spi-pxa2xx.c b/drivers/spi/spi-pxa2xx.c index 2eb06ee0b326..c1a50674c1e3 100644 --- a/drivers/spi/spi-pxa2xx.c +++ b/drivers/spi/spi-pxa2xx.c @@ -546,8 +546,17 @@ static irqreturn_t ssp_int(int irq, void *dev_id) if (pm_runtime_suspended(&drv_data->pdev->dev)) return IRQ_NONE; - sccr1_reg = read_SSCR1(reg); + /* + * If the device is not yet in RPM suspended state and we get an + * interrupt that is meant for another device, check if status bits + * are all set to one. That means that the device is already + * powered off. + */ status = read_SSSR(reg); + if (status == ~0) + return IRQ_NONE; + + sccr1_reg = read_SSCR1(reg); /* Ignore possible writes if we don't need to write */ if (!(sccr1_reg & SSCR1_TIE)) diff --git a/drivers/spi/spi-s3c64xx.c b/drivers/spi/spi-s3c64xx.c index 512b8893893b..a80376dc3a10 100644 --- a/drivers/spi/spi-s3c64xx.c +++ b/drivers/spi/spi-s3c64xx.c @@ -1428,6 +1428,8 @@ static int s3c64xx_spi_probe(struct platform_device *pdev) S3C64XX_SPI_INT_TX_OVERRUN_EN | S3C64XX_SPI_INT_TX_UNDERRUN_EN, sdd->regs + S3C64XX_SPI_INT_EN); + pm_runtime_enable(&pdev->dev); + if (spi_register_master(master)) { dev_err(&pdev->dev, "cannot register SPI master\n"); ret = -EBUSY; @@ -1440,8 +1442,6 @@ static int s3c64xx_spi_probe(struct platform_device *pdev) mem_res, sdd->rx_dma.dmach, sdd->tx_dma.dmach); - pm_runtime_enable(&pdev->dev); - return 0; err3: diff --git a/drivers/spi/spi-sh-hspi.c b/drivers/spi/spi-sh-hspi.c index 0b68cb592fa4..e488a90a98b8 100644 --- a/drivers/spi/spi-sh-hspi.c +++ b/drivers/spi/spi-sh-hspi.c @@ -296,6 +296,8 @@ static int hspi_probe(struct platform_device *pdev) goto error1; } + pm_runtime_enable(&pdev->dev); + master->num_chipselect = 1; master->bus_num = pdev->id; master->setup = hspi_setup; @@ -309,8 +311,6 @@ static int hspi_probe(struct platform_device *pdev) goto error1; } - pm_runtime_enable(&pdev->dev); - return 0; error1: diff --git a/drivers/staging/comedi/drivers/ni_65xx.c b/drivers/staging/comedi/drivers/ni_65xx.c index 3ba4c5712dff..853f62b2b1a9 100644 --- a/drivers/staging/comedi/drivers/ni_65xx.c +++ b/drivers/staging/comedi/drivers/ni_65xx.c @@ -369,28 +369,23 @@ static int ni_65xx_dio_insn_bits(struct comedi_device *dev, { const struct ni_65xx_board *board = comedi_board(dev); struct ni_65xx_private *devpriv = dev->private; - unsigned base_bitfield_channel; - const unsigned max_ports_per_bitfield = 5; + int base_bitfield_channel; unsigned read_bits = 0; - unsigned j; + int last_port_offset = ni_65xx_port_by_channel(s->n_chan - 1); + int port_offset; base_bitfield_channel = CR_CHAN(insn->chanspec); - for (j = 0; j < max_ports_per_bitfield; ++j) { - const unsigned port_offset = - ni_65xx_port_by_channel(base_bitfield_channel) + j; - const unsigned port = - sprivate(s)->base_port + port_offset; - unsigned base_port_channel; + for (port_offset = ni_65xx_port_by_channel(base_bitfield_channel); + port_offset <= last_port_offset; port_offset++) { + unsigned port = sprivate(s)->base_port + port_offset; + int base_port_channel = port_offset * ni_65xx_channels_per_port; unsigned port_mask, port_data, port_read_bits; - int bitshift; - if (port >= ni_65xx_total_num_ports(board)) + int bitshift = base_port_channel - base_bitfield_channel; + + if (bitshift >= 32) break; - base_port_channel = port_offset * ni_65xx_channels_per_port; port_mask = data[0]; port_data = data[1]; - bitshift = base_port_channel - base_bitfield_channel; - if (bitshift >= 32 || bitshift <= -32) - break; if (bitshift > 0) { port_mask >>= bitshift; port_data >>= bitshift; diff --git a/drivers/staging/imx-drm/imx-drm-core.c b/drivers/staging/imx-drm/imx-drm-core.c index 47c5888461ff..a2e52a0c53c9 100644 --- a/drivers/staging/imx-drm/imx-drm-core.c +++ b/drivers/staging/imx-drm/imx-drm-core.c @@ -41,7 +41,6 @@ struct imx_drm_device { struct list_head encoder_list; struct list_head connector_list; struct mutex mutex; - int references; int pipes; struct drm_fbdev_cma *fbhelper; }; @@ -241,8 +240,6 @@ struct drm_device *imx_drm_device_get(void) } } - imxdrm->references++; - return imxdrm->drm; unwind_crtc: @@ -280,8 +277,6 @@ void imx_drm_device_put(void) list_for_each_entry(enc, &imxdrm->encoder_list, list) module_put(enc->owner); - imxdrm->references--; - mutex_unlock(&imxdrm->mutex); } EXPORT_SYMBOL_GPL(imx_drm_device_put); @@ -485,7 +480,7 @@ int imx_drm_add_crtc(struct drm_crtc *crtc, mutex_lock(&imxdrm->mutex); - if (imxdrm->references) { + if (imxdrm->drm->open_count) { ret = -EBUSY; goto err_busy; } @@ -564,7 +559,7 @@ int imx_drm_add_encoder(struct drm_encoder *encoder, mutex_lock(&imxdrm->mutex); - if (imxdrm->references) { + if (imxdrm->drm->open_count) { ret = -EBUSY; goto err_busy; } @@ -709,7 +704,7 @@ int imx_drm_add_connector(struct drm_connector *connector, mutex_lock(&imxdrm->mutex); - if (imxdrm->references) { + if (imxdrm->drm->open_count) { ret = -EBUSY; goto err_busy; } diff --git a/drivers/staging/lustre/lustre/obdecho/echo_client.c b/drivers/staging/lustre/lustre/obdecho/echo_client.c index 2644edf438c1..c8b43442dc74 100644 --- a/drivers/staging/lustre/lustre/obdecho/echo_client.c +++ b/drivers/staging/lustre/lustre/obdecho/echo_client.c @@ -1387,7 +1387,7 @@ echo_copyout_lsm (struct lov_stripe_md *lsm, void *_ulsm, int ulsm_nob) if (nob > ulsm_nob) return (-EINVAL); - if (copy_to_user (ulsm, lsm, sizeof(ulsm))) + if (copy_to_user (ulsm, lsm, sizeof(*ulsm))) return (-EFAULT); for (i = 0; i < lsm->lsm_stripe_count; i++) { diff --git a/drivers/staging/octeon-usb/cvmx-usb.c b/drivers/staging/octeon-usb/cvmx-usb.c index d7b3c82b5ead..45dfe94199ae 100644 --- a/drivers/staging/octeon-usb/cvmx-usb.c +++ b/drivers/staging/octeon-usb/cvmx-usb.c @@ -604,7 +604,7 @@ int cvmx_usb_initialize(struct cvmx_usb_state *state, int usb_port_number, } } - memset(usb, 0, sizeof(usb)); + memset(usb, 0, sizeof(*usb)); usb->init_flags = flags; /* Initialize the USB state structure */ diff --git a/drivers/staging/rtl8188eu/core/rtw_mp.c b/drivers/staging/rtl8188eu/core/rtw_mp.c index c7ff2e4d1f23..9832dcbbd07f 100644 --- a/drivers/staging/rtl8188eu/core/rtw_mp.c +++ b/drivers/staging/rtl8188eu/core/rtw_mp.c @@ -907,7 +907,7 @@ u32 mp_query_psd(struct adapter *pAdapter, u8 *data) sscanf(data, "pts =%d, start =%d, stop =%d", &psd_pts, &psd_start, &psd_stop); } - _rtw_memset(data, '\0', sizeof(data)); + _rtw_memset(data, '\0', sizeof(*data)); i = psd_start; while (i < psd_stop) { diff --git a/drivers/staging/rtl8188eu/hal/rtl8188e_dm.c b/drivers/staging/rtl8188eu/hal/rtl8188e_dm.c index 9c2e7a20c09e..ec0028d4e61a 100644 --- a/drivers/staging/rtl8188eu/hal/rtl8188e_dm.c +++ b/drivers/staging/rtl8188eu/hal/rtl8188e_dm.c @@ -57,7 +57,7 @@ static void Init_ODM_ComInfo_88E(struct adapter *Adapter) u8 cut_ver, fab_ver; /* Init Value */ - _rtw_memset(dm_odm, 0, sizeof(dm_odm)); + _rtw_memset(dm_odm, 0, sizeof(*dm_odm)); dm_odm->Adapter = Adapter; diff --git a/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c b/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c index cd4100fb3645..95953ebc0279 100644 --- a/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c +++ b/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c @@ -6973,7 +6973,7 @@ static int rtw_mp_ctx(struct net_device *dev, stop = strncmp(extra, "stop", 4); sscanf(extra, "count =%d, pkt", &count); - _rtw_memset(extra, '\0', sizeof(extra)); + _rtw_memset(extra, '\0', sizeof(*extra)); if (stop == 0) { bStartTest = 0; /* To set Stop */ diff --git a/drivers/staging/rtl8188eu/os_dep/usb_intf.c b/drivers/staging/rtl8188eu/os_dep/usb_intf.c index d3078d200e50..9ca3180ebaa0 100644 --- a/drivers/staging/rtl8188eu/os_dep/usb_intf.c +++ b/drivers/staging/rtl8188eu/os_dep/usb_intf.c @@ -54,6 +54,7 @@ static struct usb_device_id rtw_usb_id_tbl[] = { /*=== Customer ID ===*/ /****** 8188EUS ********/ {USB_DEVICE(0x8179, 0x07B8)}, /* Abocom - Abocom */ + {USB_DEVICE(0x2001, 0x330F)}, /* DLink DWA-125 REV D1 */ {} /* Terminating entry */ }; diff --git a/drivers/staging/rtl8192u/r819xU_cmdpkt.c b/drivers/staging/rtl8192u/r819xU_cmdpkt.c index 5bc361b16d4c..56144014b7c9 100644 --- a/drivers/staging/rtl8192u/r819xU_cmdpkt.c +++ b/drivers/staging/rtl8192u/r819xU_cmdpkt.c @@ -37,6 +37,8 @@ rt_status SendTxCommandPacket(struct net_device *dev, void *pData, u32 DataLen) /* Get TCB and local buffer from common pool. (It is shared by CmdQ, MgntQ, and USB coalesce DataQ) */ skb = dev_alloc_skb(USB_HWDESC_HEADER_LEN + DataLen + 4); + if (!skb) + return RT_STATUS_FAILURE; memcpy((unsigned char *)(skb->cb), &dev, sizeof(dev)); tcb_desc = (cb_desc *)(skb->cb + MAX_DEV_ADDR_SIZE); tcb_desc->queue_index = TXCMD_QUEUE; diff --git a/drivers/staging/vt6656/iwctl.c b/drivers/staging/vt6656/iwctl.c index d0cf7d8a20e5..8872e0f84f40 100644 --- a/drivers/staging/vt6656/iwctl.c +++ b/drivers/staging/vt6656/iwctl.c @@ -1634,6 +1634,9 @@ int iwctl_siwencodeext(struct net_device *dev, struct iw_request_info *info, if (pMgmt == NULL) return -EFAULT; + if (!(pDevice->flags & DEVICE_FLAGS_OPENED)) + return -ENODEV; + buf = kzalloc(sizeof(struct viawget_wpa_param), GFP_KERNEL); if (buf == NULL) return -ENOMEM; diff --git a/drivers/staging/vt6656/main_usb.c b/drivers/staging/vt6656/main_usb.c index 536971786ae8..6f9d28182445 100644 --- a/drivers/staging/vt6656/main_usb.c +++ b/drivers/staging/vt6656/main_usb.c @@ -1098,6 +1098,8 @@ static int device_close(struct net_device *dev) memset(pMgmt->abyCurrBSSID, 0, 6); pMgmt->eCurrState = WMAC_STATE_IDLE; + pDevice->flags &= ~DEVICE_FLAGS_OPENED; + device_free_tx_bufs(pDevice); device_free_rx_bufs(pDevice); device_free_int_bufs(pDevice); @@ -1109,7 +1111,6 @@ static int device_close(struct net_device *dev) usb_free_urb(pDevice->pInterruptURB); BSSvClearNodeDBTable(pDevice, 0); - pDevice->flags &=(~DEVICE_FLAGS_OPENED); DBG_PRT(MSG_LEVEL_DEBUG, KERN_INFO "device_close2 \n"); diff --git a/drivers/staging/vt6656/rxtx.c b/drivers/staging/vt6656/rxtx.c index fb743a8811bb..14f3e852215d 100644 --- a/drivers/staging/vt6656/rxtx.c +++ b/drivers/staging/vt6656/rxtx.c @@ -148,6 +148,8 @@ static void *s_vGetFreeContext(struct vnt_private *pDevice) DBG_PRT(MSG_LEVEL_DEBUG, KERN_INFO"GetFreeContext()\n"); for (ii = 0; ii < pDevice->cbTD; ii++) { + if (!pDevice->apTD[ii]) + return NULL; pContext = pDevice->apTD[ii]; if (pContext->bBoolInUse == false) { pContext->bBoolInUse = true; diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index 35b61f7d6c63..38e44b9abf0f 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -753,7 +753,8 @@ static void iscsit_unmap_iovec(struct iscsi_cmd *cmd) static void iscsit_ack_from_expstatsn(struct iscsi_conn *conn, u32 exp_statsn) { - struct iscsi_cmd *cmd; + LIST_HEAD(ack_list); + struct iscsi_cmd *cmd, *cmd_p; conn->exp_statsn = exp_statsn; @@ -761,19 +762,23 @@ static void iscsit_ack_from_expstatsn(struct iscsi_conn *conn, u32 exp_statsn) return; spin_lock_bh(&conn->cmd_lock); - list_for_each_entry(cmd, &conn->conn_cmd_list, i_conn_node) { + list_for_each_entry_safe(cmd, cmd_p, &conn->conn_cmd_list, i_conn_node) { spin_lock(&cmd->istate_lock); if ((cmd->i_state == ISTATE_SENT_STATUS) && iscsi_sna_lt(cmd->stat_sn, exp_statsn)) { cmd->i_state = ISTATE_REMOVE; spin_unlock(&cmd->istate_lock); - iscsit_add_cmd_to_immediate_queue(cmd, conn, - cmd->i_state); + list_move_tail(&cmd->i_conn_node, &ack_list); continue; } spin_unlock(&cmd->istate_lock); } spin_unlock_bh(&conn->cmd_lock); + + list_for_each_entry_safe(cmd, cmd_p, &ack_list, i_conn_node) { + list_del(&cmd->i_conn_node); + iscsit_free_cmd(cmd, false); + } } static int iscsit_allocate_iovecs(struct iscsi_cmd *cmd) diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c index 14d1aed5af1d..ef6d836a4d09 100644 --- a/drivers/target/iscsi/iscsi_target_nego.c +++ b/drivers/target/iscsi/iscsi_target_nego.c @@ -1192,7 +1192,7 @@ get_target: */ alloc_tags: tag_num = max_t(u32, ISCSIT_MIN_TAGS, queue_depth); - tag_num += ISCSIT_EXTRA_TAGS; + tag_num += (tag_num / 2) + ISCSIT_EXTRA_TAGS; tag_size = sizeof(struct iscsi_cmd) + conn->conn_transport->priv_size; ret = transport_alloc_session_tags(sess->se_sess, tag_num, tag_size); diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c index f2de28e178fd..b0cac0c342e1 100644 --- a/drivers/target/iscsi/iscsi_target_util.c +++ b/drivers/target/iscsi/iscsi_target_util.c @@ -736,7 +736,7 @@ void iscsit_free_cmd(struct iscsi_cmd *cmd, bool shutdown) * Fallthrough */ case ISCSI_OP_SCSI_TMFUNC: - rc = transport_generic_free_cmd(&cmd->se_cmd, 1); + rc = transport_generic_free_cmd(&cmd->se_cmd, shutdown); if (!rc && shutdown && se_cmd && se_cmd->se_sess) { __iscsit_free_cmd(cmd, true, shutdown); target_put_sess_cmd(se_cmd->se_sess, se_cmd); @@ -752,7 +752,7 @@ void iscsit_free_cmd(struct iscsi_cmd *cmd, bool shutdown) se_cmd = &cmd->se_cmd; __iscsit_free_cmd(cmd, true, shutdown); - rc = transport_generic_free_cmd(&cmd->se_cmd, 1); + rc = transport_generic_free_cmd(&cmd->se_cmd, shutdown); if (!rc && shutdown && se_cmd->se_sess) { __iscsit_free_cmd(cmd, true, shutdown); target_put_sess_cmd(se_cmd->se_sess, se_cmd); diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index 6c17295e8d7c..4714c6f8da4b 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -349,7 +349,16 @@ static sense_reason_t compare_and_write_post(struct se_cmd *cmd) { struct se_device *dev = cmd->se_dev; - cmd->se_cmd_flags |= SCF_COMPARE_AND_WRITE_POST; + /* + * Only set SCF_COMPARE_AND_WRITE_POST to force a response fall-through + * within target_complete_ok_work() if the command was successfully + * sent to the backend driver. + */ + spin_lock_irq(&cmd->t_state_lock); + if ((cmd->transport_state & CMD_T_SENT) && !cmd->scsi_status) + cmd->se_cmd_flags |= SCF_COMPARE_AND_WRITE_POST; + spin_unlock_irq(&cmd->t_state_lock); + /* * Unlock ->caw_sem originally obtained during sbc_compare_and_write() * before the original READ I/O submission. @@ -363,7 +372,7 @@ static sense_reason_t compare_and_write_callback(struct se_cmd *cmd) { struct se_device *dev = cmd->se_dev; struct scatterlist *write_sg = NULL, *sg; - unsigned char *buf, *addr; + unsigned char *buf = NULL, *addr; struct sg_mapping_iter m; unsigned int offset = 0, len; unsigned int nlbas = cmd->t_task_nolb; @@ -378,6 +387,15 @@ static sense_reason_t compare_and_write_callback(struct se_cmd *cmd) */ if (!cmd->t_data_sg || !cmd->t_bidi_data_sg) return TCM_NO_SENSE; + /* + * Immediately exit + release dev->caw_sem if command has already + * been failed with a non-zero SCSI status. + */ + if (cmd->scsi_status) { + pr_err("compare_and_write_callback: non zero scsi_status:" + " 0x%02x\n", cmd->scsi_status); + goto out; + } buf = kzalloc(cmd->data_length, GFP_KERNEL); if (!buf) { @@ -508,6 +526,12 @@ sbc_compare_and_write(struct se_cmd *cmd) cmd->transport_complete_callback = NULL; return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; } + /* + * Reset cmd->data_length to individual block_size in order to not + * confuse backend drivers that depend on this value matching the + * size of the I/O being submitted. + */ + cmd->data_length = cmd->t_task_nolb * dev->dev_attrib.block_size; ret = cmd->execute_rw(cmd, cmd->t_bidi_data_sg, cmd->t_bidi_data_nents, DMA_FROM_DEVICE); diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 84747cc1aac0..81e945eefbbd 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -236,17 +236,24 @@ int transport_alloc_session_tags(struct se_session *se_sess, { int rc; - se_sess->sess_cmd_map = kzalloc(tag_num * tag_size, GFP_KERNEL); + se_sess->sess_cmd_map = kzalloc(tag_num * tag_size, + GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT); if (!se_sess->sess_cmd_map) { - pr_err("Unable to allocate se_sess->sess_cmd_map\n"); - return -ENOMEM; + se_sess->sess_cmd_map = vzalloc(tag_num * tag_size); + if (!se_sess->sess_cmd_map) { + pr_err("Unable to allocate se_sess->sess_cmd_map\n"); + return -ENOMEM; + } } rc = percpu_ida_init(&se_sess->sess_tag_pool, tag_num); if (rc < 0) { pr_err("Unable to init se_sess->sess_tag_pool," " tag_num: %u\n", tag_num); - kfree(se_sess->sess_cmd_map); + if (is_vmalloc_addr(se_sess->sess_cmd_map)) + vfree(se_sess->sess_cmd_map); + else + kfree(se_sess->sess_cmd_map); se_sess->sess_cmd_map = NULL; return -ENOMEM; } @@ -412,7 +419,10 @@ void transport_free_session(struct se_session *se_sess) { if (se_sess->sess_cmd_map) { percpu_ida_destroy(&se_sess->sess_tag_pool); - kfree(se_sess->sess_cmd_map); + if (is_vmalloc_addr(se_sess->sess_cmd_map)) + vfree(se_sess->sess_cmd_map); + else + kfree(se_sess->sess_cmd_map); } kmem_cache_free(se_sess_cache, se_sess); } diff --git a/drivers/target/target_core_xcopy.c b/drivers/target/target_core_xcopy.c index 4d22e7d2adca..3da4fd10b9f8 100644 --- a/drivers/target/target_core_xcopy.c +++ b/drivers/target/target_core_xcopy.c @@ -298,8 +298,8 @@ static int target_xcopy_parse_segdesc_02(struct se_cmd *se_cmd, struct xcopy_op (unsigned long long)xop->dst_lba); if (dc != 0) { - xop->dbl = (desc[29] << 16) & 0xff; - xop->dbl |= (desc[30] << 8) & 0xff; + xop->dbl = (desc[29] & 0xff) << 16; + xop->dbl |= (desc[30] & 0xff) << 8; xop->dbl |= desc[31] & 0xff; pr_debug("XCOPY seg desc 0x02: DC=1 w/ dbl: %u\n", xop->dbl); diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c index e61c36cbb866..c193af6a628f 100644 --- a/drivers/tty/hvc/hvc_xen.c +++ b/drivers/tty/hvc/hvc_xen.c @@ -636,6 +636,7 @@ struct console xenboot_console = { .name = "xenboot", .write = xenboot_write_console, .flags = CON_PRINTBUFFER | CON_BOOT | CON_ANYTIME, + .index = -1, }; #endif /* CONFIG_EARLY_PRINTK */ diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index c9a9ddd1d0bc..7a744b69c3d1 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1758,8 +1758,7 @@ static void n_tty_set_termios(struct tty_struct *tty, struct ktermios *old) canon_change = (old->c_lflag ^ tty->termios.c_lflag) & ICANON; if (canon_change) { bitmap_zero(ldata->read_flags, N_TTY_BUF_SIZE); - ldata->line_start = 0; - ldata->canon_head = ldata->read_tail; + ldata->line_start = ldata->canon_head = ldata->read_tail; ldata->erasing = 0; ldata->lnext = 0; } @@ -2184,28 +2183,34 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, if (!input_available_p(tty, 0)) { if (test_bit(TTY_OTHER_CLOSED, &tty->flags)) { - retval = -EIO; - break; - } - if (tty_hung_up_p(file)) - break; - if (!timeout) - break; - if (file->f_flags & O_NONBLOCK) { - retval = -EAGAIN; - break; - } - if (signal_pending(current)) { - retval = -ERESTARTSYS; - break; - } - n_tty_set_room(tty); - up_read(&tty->termios_rwsem); + up_read(&tty->termios_rwsem); + tty_flush_to_ldisc(tty); + down_read(&tty->termios_rwsem); + if (!input_available_p(tty, 0)) { + retval = -EIO; + break; + } + } else { + if (tty_hung_up_p(file)) + break; + if (!timeout) + break; + if (file->f_flags & O_NONBLOCK) { + retval = -EAGAIN; + break; + } + if (signal_pending(current)) { + retval = -ERESTARTSYS; + break; + } + n_tty_set_room(tty); + up_read(&tty->termios_rwsem); - timeout = schedule_timeout(timeout); + timeout = schedule_timeout(timeout); - down_read(&tty->termios_rwsem); - continue; + down_read(&tty->termios_rwsem); + continue; + } } __set_current_state(TASK_RUNNING); diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index a0ebbc9ce5cd..042aa077b5b3 100644 --- a/drivers/tty/serial/imx.c +++ b/drivers/tty/serial/imx.c @@ -1912,9 +1912,6 @@ static int serial_imx_probe_dt(struct imx_port *sport, sport->devdata = of_id->data; - if (of_device_is_stdout_path(np)) - add_preferred_console(imx_reg.cons->name, sport->port.line, 0); - return 0; } #else diff --git a/drivers/tty/serial/pch_uart.c b/drivers/tty/serial/pch_uart.c index 52379e56a31e..44077c0b7670 100644 --- a/drivers/tty/serial/pch_uart.c +++ b/drivers/tty/serial/pch_uart.c @@ -667,30 +667,21 @@ static int pop_tx_x(struct eg20t_port *priv, unsigned char *buf) static int dma_push_rx(struct eg20t_port *priv, int size) { - struct tty_struct *tty; int room; struct uart_port *port = &priv->port; struct tty_port *tport = &port->state->port; - port = &priv->port; - tty = tty_port_tty_get(tport); - if (!tty) { - dev_dbg(priv->port.dev, "%s:tty is busy now", __func__); - return 0; - } - room = tty_buffer_request_room(tport, size); if (room < size) dev_warn(port->dev, "Rx overrun: dropping %u bytes\n", size - room); if (!room) - return room; + return 0; tty_insert_flip_string(tport, sg_virt(&priv->sg_rx), size); port->icount.rx += room; - tty_kref_put(tty); return room; } @@ -1098,6 +1089,8 @@ static void pch_uart_err_ir(struct eg20t_port *priv, unsigned int lsr) if (tty == NULL) { for (i = 0; error_msg[i] != NULL; i++) dev_err(&priv->pdev->dev, error_msg[i]); + } else { + tty_kref_put(tty); } } diff --git a/drivers/tty/serial/serial-tegra.c b/drivers/tty/serial/serial-tegra.c index d0d972f7e43e..0489a2bdcdf9 100644 --- a/drivers/tty/serial/serial-tegra.c +++ b/drivers/tty/serial/serial-tegra.c @@ -732,7 +732,7 @@ static irqreturn_t tegra_uart_isr(int irq, void *data) static void tegra_uart_stop_rx(struct uart_port *u) { struct tegra_uart_port *tup = to_tegra_uport(u); - struct tty_struct *tty = tty_port_tty_get(&tup->uport.state->port); + struct tty_struct *tty; struct tty_port *port = &u->state->port; struct dma_tx_state state; unsigned long ier; @@ -744,6 +744,8 @@ static void tegra_uart_stop_rx(struct uart_port *u) if (!tup->rx_in_progress) return; + tty = tty_port_tty_get(&tup->uport.state->port); + tegra_uart_wait_sym_time(tup, 1); /* wait a character interval */ ier = tup->ier_shadow; diff --git a/drivers/tty/serial/vt8500_serial.c b/drivers/tty/serial/vt8500_serial.c index 93b697a0de65..15ad6fcda88b 100644 --- a/drivers/tty/serial/vt8500_serial.c +++ b/drivers/tty/serial/vt8500_serial.c @@ -561,12 +561,13 @@ static int vt8500_serial_probe(struct platform_device *pdev) if (!mmres || !irqres) return -ENODEV; - if (np) + if (np) { port = of_alias_get_id(np, "serial"); if (port >= VT8500_MAX_PORTS) port = -1; - else + } else { port = -1; + } if (port < 0) { /* calculate the port id */ diff --git a/drivers/tty/tty_ioctl.c b/drivers/tty/tty_ioctl.c index 03ba081c5772..6fd60fece6b4 100644 --- a/drivers/tty/tty_ioctl.c +++ b/drivers/tty/tty_ioctl.c @@ -1201,6 +1201,9 @@ int n_tty_ioctl_helper(struct tty_struct *tty, struct file *file, } return 0; case TCFLSH: + retval = tty_check_change(tty); + if (retval) + return retval; return __tty_perform_flush(tty, arg); default: /* Try the mode commands */ diff --git a/drivers/usb/chipidea/Kconfig b/drivers/usb/chipidea/Kconfig index 4a851e15e58c..77b47d82c9a6 100644 --- a/drivers/usb/chipidea/Kconfig +++ b/drivers/usb/chipidea/Kconfig @@ -1,6 +1,6 @@ config USB_CHIPIDEA tristate "ChipIdea Highspeed Dual Role Controller" - depends on (USB_EHCI_HCD && USB_GADGET) || (USB_EHCI_HCD && !USB_GADGET) || (!USB_EHCI_HCD && USB_GADGET) + depends on ((USB_EHCI_HCD && USB_GADGET) || (USB_EHCI_HCD && !USB_GADGET) || (!USB_EHCI_HCD && USB_GADGET)) && HAS_DMA help Say Y here if your system has a dual role high speed USB controller based on ChipIdea silicon IP. Currently, only the diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c index 74d998d9b45b..be822a2c1776 100644 --- a/drivers/usb/chipidea/ci_hdrc_imx.c +++ b/drivers/usb/chipidea/ci_hdrc_imx.c @@ -131,7 +131,7 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev) if (ret) { dev_err(&pdev->dev, "usbmisc init failed, ret=%d\n", ret); - goto err_clk; + goto err_phy; } } @@ -143,7 +143,7 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev) dev_err(&pdev->dev, "Can't register ci_hdrc platform device, err=%d\n", ret); - goto err_clk; + goto err_phy; } if (data->usbmisc_data) { @@ -164,6 +164,9 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev) disable_device: ci_hdrc_remove_device(data->ci_pdev); +err_phy: + if (data->phy) + usb_phy_shutdown(data->phy); err_clk: clk_disable_unprepare(data->clk); return ret; diff --git a/drivers/usb/chipidea/ci_hdrc_pci.c b/drivers/usb/chipidea/ci_hdrc_pci.c index 042320a6c6c7..d514332ac081 100644 --- a/drivers/usb/chipidea/ci_hdrc_pci.c +++ b/drivers/usb/chipidea/ci_hdrc_pci.c @@ -129,7 +129,12 @@ static DEFINE_PCI_DEVICE_TABLE(ci_hdrc_pci_id_table) = { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x0829), .driver_data = (kernel_ulong_t)&penwell_pci_platdata, }, - { 0, 0, 0, 0, 0, 0, 0 /* end: all zeroes */ } + { + /* Intel Clovertrail */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0xe006), + .driver_data = (kernel_ulong_t)&penwell_pci_platdata, + }, + { 0 } /* end: all zeroes */ }; MODULE_DEVICE_TABLE(pci, ci_hdrc_pci_id_table); diff --git a/drivers/usb/chipidea/core.c b/drivers/usb/chipidea/core.c index 94626409559a..23763dcec069 100644 --- a/drivers/usb/chipidea/core.c +++ b/drivers/usb/chipidea/core.c @@ -605,6 +605,7 @@ static int ci_hdrc_remove(struct platform_device *pdev) dbg_remove_files(ci); free_irq(ci->irq, ci); ci_role_destroy(ci); + kfree(ci->hw_bank.regmap); return 0; } diff --git a/drivers/usb/chipidea/host.c b/drivers/usb/chipidea/host.c index 6f96795dd20c..64d7a6d9a1ad 100644 --- a/drivers/usb/chipidea/host.c +++ b/drivers/usb/chipidea/host.c @@ -100,8 +100,10 @@ static void host_stop(struct ci_hdrc *ci) { struct usb_hcd *hcd = ci->hcd; - usb_remove_hcd(hcd); - usb_put_hcd(hcd); + if (hcd) { + usb_remove_hcd(hcd); + usb_put_hcd(hcd); + } if (ci->platdata->reg_vbus) regulator_disable(ci->platdata->reg_vbus); } diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c index 6b4c2f2eb946..9333083dd111 100644 --- a/drivers/usb/chipidea/udc.c +++ b/drivers/usb/chipidea/udc.c @@ -1600,6 +1600,8 @@ static void destroy_eps(struct ci_hdrc *ci) for (i = 0; i < ci->hw_ep_max; i++) { struct ci_hw_ep *hwep = &ci->ci_hw_ep[i]; + if (hwep->pending_td) + free_pending_td(hwep); dma_pool_free(ci->qh_pool, hwep->qh.ptr, hwep->qh.dma); } } @@ -1667,13 +1669,13 @@ static int ci_udc_stop(struct usb_gadget *gadget, if (ci->platdata->notify_event) ci->platdata->notify_event(ci, CI_HDRC_CONTROLLER_STOPPED_EVENT); - ci->driver = NULL; spin_unlock_irqrestore(&ci->lock, flags); _gadget_stop_activity(&ci->gadget); spin_lock_irqsave(&ci->lock, flags); pm_runtime_put(&ci->gadget.dev); } + ci->driver = NULL; spin_unlock_irqrestore(&ci->lock, flags); return 0; diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c index 737e3c19967b..71dc5d768fa5 100644 --- a/drivers/usb/core/devio.c +++ b/drivers/usb/core/devio.c @@ -742,6 +742,22 @@ static int check_ctrlrecip(struct dev_state *ps, unsigned int requesttype, if ((index & ~USB_DIR_IN) == 0) return 0; ret = findintfep(ps->dev, index); + if (ret < 0) { + /* + * Some not fully compliant Win apps seem to get + * index wrong and have the endpoint number here + * rather than the endpoint address (with the + * correct direction). Win does let this through, + * so we'll not reject it here but leave it to + * the device to not break KVM. But we warn. + */ + ret = findintfep(ps->dev, index ^ 0x80); + if (ret >= 0) + dev_info(&ps->dev->dev, + "%s: process %i (%s) requesting ep %02x but needs %02x\n", + __func__, task_pid_nr(current), + current->comm, index, index ^ 0x80); + } if (ret >= 0) ret = checkintf(ps, ret); break; diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index dde4c83516a1..e6b682c6c236 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -3426,6 +3426,9 @@ static int usb_req_set_sel(struct usb_device *udev, enum usb3_link_state state) unsigned long long u2_pel; int ret; + if (udev->state != USB_STATE_CONFIGURED) + return 0; + /* Convert SEL and PEL stored in ns to us */ u1_sel = DIV_ROUND_UP(udev->u1_params.sel, 1000); u1_pel = DIV_ROUND_UP(udev->u1_params.pel, 1000); diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c index 5b44cd47da5b..01fe36273f3b 100644 --- a/drivers/usb/core/quirks.c +++ b/drivers/usb/core/quirks.c @@ -97,6 +97,9 @@ static const struct usb_device_id usb_quirk_list[] = { /* Alcor Micro Corp. Hub */ { USB_DEVICE(0x058f, 0x9254), .driver_info = USB_QUIRK_RESET_RESUME }, + /* MicroTouch Systems touchscreen */ + { USB_DEVICE(0x0596, 0x051e), .driver_info = USB_QUIRK_RESET_RESUME }, + /* appletouch */ { USB_DEVICE(0x05ac, 0x021a), .driver_info = USB_QUIRK_RESET_RESUME }, @@ -130,6 +133,9 @@ static const struct usb_device_id usb_quirk_list[] = { /* Broadcom BCM92035DGROM BT dongle */ { USB_DEVICE(0x0a5c, 0x2021), .driver_info = USB_QUIRK_RESET_RESUME }, + /* MAYA44USB sound device */ + { USB_DEVICE(0x0a92, 0x0091), .driver_info = USB_QUIRK_RESET_RESUME }, + /* Action Semiconductor flash disk */ { USB_DEVICE(0x10d6, 0x2200), .driver_info = USB_QUIRK_STRING_FETCH_255 }, diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c index 997ebe420bc9..2e252aae51ca 100644 --- a/drivers/usb/dwc3/dwc3-pci.c +++ b/drivers/usb/dwc3/dwc3-pci.c @@ -29,6 +29,7 @@ #define PCI_VENDOR_ID_SYNOPSYS 0x16c3 #define PCI_DEVICE_ID_SYNOPSYS_HAPSUSB3 0xabcd #define PCI_DEVICE_ID_INTEL_BYT 0x0f37 +#define PCI_DEVICE_ID_INTEL_MRFLD 0x119e struct dwc3_pci { struct device *dev; @@ -189,6 +190,7 @@ static DEFINE_PCI_DEVICE_TABLE(dwc3_pci_id_table) = { PCI_DEVICE_ID_SYNOPSYS_HAPSUSB3), }, { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_BYT), }, + { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MRFLD), }, { } /* Terminating Entry */ }; MODULE_DEVICE_TABLE(pci, dwc3_pci_id_table); diff --git a/drivers/usb/gadget/f_fs.c b/drivers/usb/gadget/f_fs.c index 1a66c5baa0d1..44cf775a8627 100644 --- a/drivers/usb/gadget/f_fs.c +++ b/drivers/usb/gadget/f_fs.c @@ -1034,37 +1034,19 @@ struct ffs_sb_fill_data { struct ffs_file_perms perms; umode_t root_mode; const char *dev_name; - union { - /* set by ffs_fs_mount(), read by ffs_sb_fill() */ - void *private_data; - /* set by ffs_sb_fill(), read by ffs_fs_mount */ - struct ffs_data *ffs_data; - }; + struct ffs_data *ffs_data; }; static int ffs_sb_fill(struct super_block *sb, void *_data, int silent) { struct ffs_sb_fill_data *data = _data; struct inode *inode; - struct ffs_data *ffs; + struct ffs_data *ffs = data->ffs_data; ENTER(); - /* Initialise data */ - ffs = ffs_data_new(); - if (unlikely(!ffs)) - goto Enomem; - ffs->sb = sb; - ffs->dev_name = kstrdup(data->dev_name, GFP_KERNEL); - if (unlikely(!ffs->dev_name)) - goto Enomem; - ffs->file_perms = data->perms; - ffs->private_data = data->private_data; - - /* used by the caller of this function */ - data->ffs_data = ffs; - + data->ffs_data = NULL; sb->s_fs_info = ffs; sb->s_blocksize = PAGE_CACHE_SIZE; sb->s_blocksize_bits = PAGE_CACHE_SHIFT; @@ -1080,17 +1062,14 @@ static int ffs_sb_fill(struct super_block *sb, void *_data, int silent) &data->perms); sb->s_root = d_make_root(inode); if (unlikely(!sb->s_root)) - goto Enomem; + return -ENOMEM; /* EP0 file */ if (unlikely(!ffs_sb_create_file(sb, "ep0", ffs, &ffs_ep0_operations, NULL))) - goto Enomem; + return -ENOMEM; return 0; - -Enomem: - return -ENOMEM; } static int ffs_fs_parse_opts(struct ffs_sb_fill_data *data, char *opts) @@ -1193,6 +1172,7 @@ ffs_fs_mount(struct file_system_type *t, int flags, struct dentry *rv; int ret; void *ffs_dev; + struct ffs_data *ffs; ENTER(); @@ -1200,18 +1180,30 @@ ffs_fs_mount(struct file_system_type *t, int flags, if (unlikely(ret < 0)) return ERR_PTR(ret); + ffs = ffs_data_new(); + if (unlikely(!ffs)) + return ERR_PTR(-ENOMEM); + ffs->file_perms = data.perms; + + ffs->dev_name = kstrdup(dev_name, GFP_KERNEL); + if (unlikely(!ffs->dev_name)) { + ffs_data_put(ffs); + return ERR_PTR(-ENOMEM); + } + ffs_dev = functionfs_acquire_dev_callback(dev_name); - if (IS_ERR(ffs_dev)) - return ffs_dev; + if (IS_ERR(ffs_dev)) { + ffs_data_put(ffs); + return ERR_CAST(ffs_dev); + } + ffs->private_data = ffs_dev; + data.ffs_data = ffs; - data.dev_name = dev_name; - data.private_data = ffs_dev; rv = mount_nodev(t, flags, &data, ffs_sb_fill); - - /* data.ffs_data is set by ffs_sb_fill */ - if (IS_ERR(rv)) + if (IS_ERR(rv) && data.ffs_data) { functionfs_release_dev_callback(data.ffs_data); - + ffs_data_put(data.ffs_data); + } return rv; } @@ -2264,6 +2256,8 @@ static int ffs_func_bind(struct usb_configuration *c, data->raw_descs + ret, (sizeof data->raw_descs) - ret, __ffs_func_bind_do_descs, func); + if (unlikely(ret < 0)) + goto error; } /* diff --git a/drivers/usb/gadget/pxa25x_udc.c b/drivers/usb/gadget/pxa25x_udc.c index cc9207473dbc..0ac6064aa3b8 100644 --- a/drivers/usb/gadget/pxa25x_udc.c +++ b/drivers/usb/gadget/pxa25x_udc.c @@ -2054,7 +2054,7 @@ static struct pxa25x_udc memory = { /* * probe - binds to the platform device */ -static int __init pxa25x_udc_probe(struct platform_device *pdev) +static int pxa25x_udc_probe(struct platform_device *pdev) { struct pxa25x_udc *dev = &memory; int retval, irq; @@ -2203,7 +2203,7 @@ static void pxa25x_udc_shutdown(struct platform_device *_dev) pullup_off(); } -static int __exit pxa25x_udc_remove(struct platform_device *pdev) +static int pxa25x_udc_remove(struct platform_device *pdev) { struct pxa25x_udc *dev = platform_get_drvdata(pdev); @@ -2294,7 +2294,8 @@ static int pxa25x_udc_resume(struct platform_device *dev) static struct platform_driver udc_driver = { .shutdown = pxa25x_udc_shutdown, - .remove = __exit_p(pxa25x_udc_remove), + .probe = pxa25x_udc_probe, + .remove = pxa25x_udc_remove, .suspend = pxa25x_udc_suspend, .resume = pxa25x_udc_resume, .driver = { @@ -2303,7 +2304,7 @@ static struct platform_driver udc_driver = { }, }; -module_platform_driver_probe(udc_driver, pxa25x_udc_probe); +module_platform_driver(udc_driver); MODULE_DESCRIPTION(DRIVER_DESC); MODULE_AUTHOR("Frank Becker, Robert Schwebel, David Brownell"); diff --git a/drivers/usb/gadget/s3c-hsotg.c b/drivers/usb/gadget/s3c-hsotg.c index 6bddf1aa2347..a8a99e4748d5 100644 --- a/drivers/usb/gadget/s3c-hsotg.c +++ b/drivers/usb/gadget/s3c-hsotg.c @@ -543,7 +543,7 @@ static int s3c_hsotg_write_fifo(struct s3c_hsotg *hsotg, * FIFO, requests of >512 cause the endpoint to get stuck with a * fragment of the end of the transfer in it. */ - if (can_write > 512) + if (can_write > 512 && !periodic) can_write = 512; /* diff --git a/drivers/usb/host/ehci-fsl.c b/drivers/usb/host/ehci-fsl.c index 4449f565d6c6..f2407b2e8a99 100644 --- a/drivers/usb/host/ehci-fsl.c +++ b/drivers/usb/host/ehci-fsl.c @@ -130,7 +130,7 @@ static int usb_hcd_fsl_probe(const struct hc_driver *driver, } /* Enable USB controller, 83xx or 8536 */ - if (pdata->have_sysif_regs) + if (pdata->have_sysif_regs && pdata->controller_ver < FSL_USB_VER_1_6) setbits32(hcd->regs + FSL_SOC_USB_CTRL, 0x4); /* Don't need to set host mode here. It will be done by tdi_reset() */ @@ -232,15 +232,9 @@ static int ehci_fsl_setup_phy(struct usb_hcd *hcd, case FSL_USB2_PHY_ULPI: if (pdata->have_sysif_regs && pdata->controller_ver) { /* controller version 1.6 or above */ + clrbits32(non_ehci + FSL_SOC_USB_CTRL, UTMI_PHY_EN); setbits32(non_ehci + FSL_SOC_USB_CTRL, - ULPI_PHY_CLK_SEL); - /* - * Due to controller issue of PHY_CLK_VALID in ULPI - * mode, we set USB_CTRL_USB_EN before checking - * PHY_CLK_VALID, otherwise PHY_CLK_VALID doesn't work. - */ - clrsetbits_be32(non_ehci + FSL_SOC_USB_CTRL, - UTMI_PHY_EN, USB_CTRL_USB_EN); + ULPI_PHY_CLK_SEL | USB_CTRL_USB_EN); } portsc |= PORT_PTS_ULPI; break; @@ -270,8 +264,9 @@ static int ehci_fsl_setup_phy(struct usb_hcd *hcd, if (pdata->have_sysif_regs && pdata->controller_ver && (phy_mode == FSL_USB2_PHY_ULPI)) { /* check PHY_CLK_VALID to get phy clk valid */ - if (!spin_event_timeout(in_be32(non_ehci + FSL_SOC_USB_CTRL) & - PHY_CLK_VALID, FSL_USB_PHY_CLK_TIMEOUT, 0)) { + if (!(spin_event_timeout(in_be32(non_ehci + FSL_SOC_USB_CTRL) & + PHY_CLK_VALID, FSL_USB_PHY_CLK_TIMEOUT, 0) || + in_be32(non_ehci + FSL_SOC_USB_PRICTRL))) { printk(KERN_WARNING "fsl-ehci: USB PHY clock invalid\n"); return -EINVAL; } diff --git a/drivers/usb/host/ehci-pci.c b/drivers/usb/host/ehci-pci.c index 6bd299e61f58..854c2ec7b699 100644 --- a/drivers/usb/host/ehci-pci.c +++ b/drivers/usb/host/ehci-pci.c @@ -361,7 +361,7 @@ static struct pci_driver ehci_pci_driver = { .remove = usb_hcd_pci_remove, .shutdown = usb_hcd_pci_shutdown, -#ifdef CONFIG_PM_SLEEP +#ifdef CONFIG_PM .driver = { .pm = &usb_hcd_pci_pm_ops }, diff --git a/drivers/usb/host/imx21-hcd.c b/drivers/usb/host/imx21-hcd.c index 60a5de505ca1..adb01d950a16 100644 --- a/drivers/usb/host/imx21-hcd.c +++ b/drivers/usb/host/imx21-hcd.c @@ -824,13 +824,13 @@ static int imx21_hc_urb_enqueue_isoc(struct usb_hcd *hcd, i = DIV_ROUND_UP(wrap_frame( cur_frame - urb->start_frame), urb->interval); - if (urb->transfer_flags & URB_ISO_ASAP) { + + /* Treat underruns as if URB_ISO_ASAP was set */ + if ((urb->transfer_flags & URB_ISO_ASAP) || + i >= urb->number_of_packets) { urb->start_frame = wrap_frame(urb->start_frame + i * urb->interval); i = 0; - } else if (i >= urb->number_of_packets) { - ret = -EXDEV; - goto alloc_dmem_failed; } } } diff --git a/drivers/usb/host/ohci-hcd.c b/drivers/usb/host/ohci-hcd.c index 8f6b695af6a4..604cad1bcf9c 100644 --- a/drivers/usb/host/ohci-hcd.c +++ b/drivers/usb/host/ohci-hcd.c @@ -216,31 +216,26 @@ static int ohci_urb_enqueue ( frame &= ~(ed->interval - 1); frame |= ed->branch; urb->start_frame = frame; + ed->last_iso = frame + ed->interval * (size - 1); } } else if (ed->type == PIPE_ISOCHRONOUS) { u16 next = ohci_frame_no(ohci) + 1; u16 frame = ed->last_iso + ed->interval; + u16 length = ed->interval * (size - 1); /* Behind the scheduling threshold? */ if (unlikely(tick_before(frame, next))) { - /* USB_ISO_ASAP: Round up to the first available slot */ + /* URB_ISO_ASAP: Round up to the first available slot */ if (urb->transfer_flags & URB_ISO_ASAP) { frame += (next - frame + ed->interval - 1) & -ed->interval; /* - * Not ASAP: Use the next slot in the stream. If - * the entire URB falls before the threshold, fail. + * Not ASAP: Use the next slot in the stream, + * no matter what. */ } else { - if (tick_before(frame + ed->interval * - (urb->number_of_packets - 1), next)) { - retval = -EXDEV; - usb_hcd_unlink_urb_from_ep(hcd, urb); - goto fail; - } - /* * Some OHCI hardware doesn't handle late TDs * correctly. After retiring them it proceeds @@ -251,9 +246,16 @@ static int ohci_urb_enqueue ( urb_priv->td_cnt = DIV_ROUND_UP( (u16) (next - frame), ed->interval); + if (urb_priv->td_cnt >= urb_priv->length) { + ++urb_priv->td_cnt; /* Mark it */ + ohci_dbg(ohci, "iso underrun %p (%u+%u < %u)\n", + urb, frame, length, + next); + } } } urb->start_frame = frame; + ed->last_iso = frame + length; } /* fill the TDs and link them to the ed; and diff --git a/drivers/usb/host/ohci-q.c b/drivers/usb/host/ohci-q.c index df4a6707322d..e7f577e63624 100644 --- a/drivers/usb/host/ohci-q.c +++ b/drivers/usb/host/ohci-q.c @@ -41,9 +41,13 @@ finish_urb(struct ohci_hcd *ohci, struct urb *urb, int status) __releases(ohci->lock) __acquires(ohci->lock) { - struct device *dev = ohci_to_hcd(ohci)->self.controller; + struct device *dev = ohci_to_hcd(ohci)->self.controller; + struct usb_host_endpoint *ep = urb->ep; + struct urb_priv *urb_priv; + // ASSERT (urb->hcpriv != 0); + restart: urb_free_priv (ohci, urb->hcpriv); urb->hcpriv = NULL; if (likely(status == -EINPROGRESS)) @@ -80,6 +84,21 @@ __acquires(ohci->lock) ohci->hc_control &= ~(OHCI_CTRL_PLE|OHCI_CTRL_IE); ohci_writel (ohci, ohci->hc_control, &ohci->regs->control); } + + /* + * An isochronous URB that is sumitted too late won't have any TDs + * (marked by the fact that the td_cnt value is larger than the + * actual number of TDs). If the next URB on this endpoint is like + * that, give it back now. + */ + if (!list_empty(&ep->urb_list)) { + urb = list_first_entry(&ep->urb_list, struct urb, urb_list); + urb_priv = urb->hcpriv; + if (urb_priv->td_cnt > urb_priv->length) { + status = 0; + goto restart; + } + } } @@ -546,7 +565,6 @@ td_fill (struct ohci_hcd *ohci, u32 info, td->hwCBP = cpu_to_hc32 (ohci, data & 0xFFFFF000); *ohci_hwPSWp(ohci, td, 0) = cpu_to_hc16 (ohci, (data & 0x0FFF) | 0xE000); - td->ed->last_iso = info & 0xffff; } else { td->hwCBP = cpu_to_hc32 (ohci, data); } @@ -996,7 +1014,7 @@ rescan_this: urb_priv->td_cnt++; /* if URB is done, clean up */ - if (urb_priv->td_cnt == urb_priv->length) { + if (urb_priv->td_cnt >= urb_priv->length) { modified = completed = 1; finish_urb(ohci, urb, 0); } @@ -1086,7 +1104,7 @@ static void takeback_td(struct ohci_hcd *ohci, struct td *td) urb_priv->td_cnt++; /* If all this urb's TDs are done, call complete() */ - if (urb_priv->td_cnt == urb_priv->length) + if (urb_priv->td_cnt >= urb_priv->length) finish_urb(ohci, urb, status); /* clean schedule: unlink EDs that are no longer busy */ diff --git a/drivers/usb/host/pci-quirks.c b/drivers/usb/host/pci-quirks.c index 2c76ef1320ea..08ef2829a7e2 100644 --- a/drivers/usb/host/pci-quirks.c +++ b/drivers/usb/host/pci-quirks.c @@ -799,7 +799,7 @@ void usb_enable_intel_xhci_ports(struct pci_dev *xhci_pdev) * switchable ports. */ pci_write_config_dword(xhci_pdev, USB_INTEL_USB3_PSSEN, - cpu_to_le32(ports_available)); + ports_available); pci_read_config_dword(xhci_pdev, USB_INTEL_USB3_PSSEN, &ports_available); @@ -821,7 +821,7 @@ void usb_enable_intel_xhci_ports(struct pci_dev *xhci_pdev) * host. */ pci_write_config_dword(xhci_pdev, USB_INTEL_XUSB2PR, - cpu_to_le32(ports_available)); + ports_available); pci_read_config_dword(xhci_pdev, USB_INTEL_XUSB2PR, &ports_available); diff --git a/drivers/usb/host/uhci-pci.c b/drivers/usb/host/uhci-pci.c index c300bd2f7d1c..0f228c46eeda 100644 --- a/drivers/usb/host/uhci-pci.c +++ b/drivers/usb/host/uhci-pci.c @@ -293,7 +293,7 @@ static struct pci_driver uhci_pci_driver = { .remove = usb_hcd_pci_remove, .shutdown = uhci_shutdown, -#ifdef CONFIG_PM_SLEEP +#ifdef CONFIG_PM .driver = { .pm = &usb_hcd_pci_pm_ops }, diff --git a/drivers/usb/host/uhci-q.c b/drivers/usb/host/uhci-q.c index 041c6ddb695c..da6f56d996ce 100644 --- a/drivers/usb/host/uhci-q.c +++ b/drivers/usb/host/uhci-q.c @@ -1303,7 +1303,7 @@ static int uhci_submit_isochronous(struct uhci_hcd *uhci, struct urb *urb, } /* Fell behind? */ - if (uhci_frame_before_eq(frame, next)) { + if (!uhci_frame_before_eq(next, frame)) { /* USB_ISO_ASAP: Round up to the first available slot */ if (urb->transfer_flags & URB_ISO_ASAP) @@ -1311,13 +1311,17 @@ static int uhci_submit_isochronous(struct uhci_hcd *uhci, struct urb *urb, -qh->period; /* - * Not ASAP: Use the next slot in the stream. If - * the entire URB falls before the threshold, fail. + * Not ASAP: Use the next slot in the stream, + * no matter what. */ else if (!uhci_frame_before_eq(next, frame + (urb->number_of_packets - 1) * qh->period)) - return -EXDEV; + dev_dbg(uhci_dev(uhci), "iso underrun %p (%u+%u < %u)\n", + urb, frame, + (urb->number_of_packets - 1) * + qh->period, + next); } } diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c index fae697ed0b70..e8b4c56dcf62 100644 --- a/drivers/usb/host/xhci-hub.c +++ b/drivers/usb/host/xhci-hub.c @@ -287,7 +287,7 @@ static int xhci_stop_device(struct xhci_hcd *xhci, int slot_id, int suspend) if (virt_dev->eps[i].ring && virt_dev->eps[i].ring->dequeue) xhci_queue_stop_endpoint(xhci, slot_id, i, suspend); } - cmd->command_trb = xhci->cmd_ring->enqueue; + cmd->command_trb = xhci_find_next_enqueue(xhci->cmd_ring); list_add_tail(&cmd->cmd_list, &virt_dev->cmd_list); xhci_queue_stop_endpoint(xhci, slot_id, 0, suspend); xhci_ring_cmd_db(xhci); @@ -552,11 +552,15 @@ void xhci_del_comp_mod_timer(struct xhci_hcd *xhci, u32 status, u16 wIndex) * - Mark a port as being done with device resume, * and ring the endpoint doorbells. * - Stop the Synopsys redriver Compliance Mode polling. + * - Drop and reacquire the xHCI lock, in order to wait for port resume. */ static u32 xhci_get_port_status(struct usb_hcd *hcd, struct xhci_bus_state *bus_state, __le32 __iomem **port_array, - u16 wIndex, u32 raw_port_status) + u16 wIndex, u32 raw_port_status, + unsigned long flags) + __releases(&xhci->lock) + __acquires(&xhci->lock) { struct xhci_hcd *xhci = hcd_to_xhci(hcd); u32 status = 0; @@ -591,21 +595,42 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd, return 0xffffffff; if (time_after_eq(jiffies, bus_state->resume_done[wIndex])) { + int time_left; + xhci_dbg(xhci, "Resume USB2 port %d\n", wIndex + 1); bus_state->resume_done[wIndex] = 0; clear_bit(wIndex, &bus_state->resuming_ports); + + set_bit(wIndex, &bus_state->rexit_ports); xhci_set_link_state(xhci, port_array, wIndex, XDEV_U0); - xhci_dbg(xhci, "set port %d resume\n", - wIndex + 1); - slot_id = xhci_find_slot_id_by_port(hcd, xhci, - wIndex + 1); - if (!slot_id) { - xhci_dbg(xhci, "slot_id is zero\n"); - return 0xffffffff; + + spin_unlock_irqrestore(&xhci->lock, flags); + time_left = wait_for_completion_timeout( + &bus_state->rexit_done[wIndex], + msecs_to_jiffies( + XHCI_MAX_REXIT_TIMEOUT)); + spin_lock_irqsave(&xhci->lock, flags); + + if (time_left) { + slot_id = xhci_find_slot_id_by_port(hcd, + xhci, wIndex + 1); + if (!slot_id) { + xhci_dbg(xhci, "slot_id is zero\n"); + return 0xffffffff; + } + xhci_ring_device(xhci, slot_id); + } else { + int port_status = xhci_readl(xhci, + port_array[wIndex]); + xhci_warn(xhci, "Port resume took longer than %i msec, port status = 0x%x\n", + XHCI_MAX_REXIT_TIMEOUT, + port_status); + status |= USB_PORT_STAT_SUSPEND; + clear_bit(wIndex, &bus_state->rexit_ports); } - xhci_ring_device(xhci, slot_id); + bus_state->port_c_suspend |= 1 << wIndex; bus_state->suspended_ports &= ~(1 << wIndex); } else { @@ -728,7 +753,7 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue, break; } status = xhci_get_port_status(hcd, bus_state, port_array, - wIndex, temp); + wIndex, temp, flags); if (status == 0xffffffff) goto error; @@ -1132,18 +1157,6 @@ int xhci_bus_suspend(struct usb_hcd *hcd) t1 = xhci_port_state_to_neutral(t1); if (t1 != t2) xhci_writel(xhci, t2, port_array[port_index]); - - if (hcd->speed != HCD_USB3) { - /* enable remote wake up for USB 2.0 */ - __le32 __iomem *addr; - u32 tmp; - - /* Get the port power control register address. */ - addr = port_array[port_index] + PORTPMSC; - tmp = xhci_readl(xhci, addr); - tmp |= PORT_RWE; - xhci_writel(xhci, tmp, addr); - } } hcd->state = HC_STATE_SUSPENDED; bus_state->next_statechange = jiffies + msecs_to_jiffies(10); @@ -1222,20 +1235,6 @@ int xhci_bus_resume(struct usb_hcd *hcd) xhci_ring_device(xhci, slot_id); } else xhci_writel(xhci, temp, port_array[port_index]); - - if (hcd->speed != HCD_USB3) { - /* disable remote wake up for USB 2.0 */ - __le32 __iomem *addr; - u32 tmp; - - /* Add one to the port status register address to get - * the port power control register address. - */ - addr = port_array[port_index] + PORTPMSC; - tmp = xhci_readl(xhci, addr); - tmp &= ~PORT_RWE; - xhci_writel(xhci, tmp, addr); - } } (void) xhci_readl(xhci, &xhci->op_regs->command); diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c index 53b972c2a09f..83bcd13622c3 100644 --- a/drivers/usb/host/xhci-mem.c +++ b/drivers/usb/host/xhci-mem.c @@ -2428,6 +2428,8 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags) for (i = 0; i < USB_MAXCHILDREN; ++i) { xhci->bus_state[0].resume_done[i] = 0; xhci->bus_state[1].resume_done[i] = 0; + /* Only the USB 2.0 completions will ever be used. */ + init_completion(&xhci->bus_state[1].rexit_done[i]); } if (scratchpad_alloc(xhci, flags)) diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index c2d495057eb5..b8dffd59eb25 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -35,6 +35,9 @@ #define PCI_VENDOR_ID_ETRON 0x1b6f #define PCI_DEVICE_ID_ASROCK_P67 0x7023 +#define PCI_DEVICE_ID_INTEL_LYNXPOINT_XHCI 0x8c31 +#define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI 0x9c31 + static const char hcd_name[] = "xhci_hcd"; /* called after powerup, by probe or system-pm "wakeup" */ @@ -69,6 +72,14 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) "QUIRK: Fresco Logic xHC needs configure" " endpoint cmd after reset endpoint"); } + if (pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_PDK && + pdev->revision == 0x4) { + xhci->quirks |= XHCI_SLOW_SUSPEND; + xhci_dbg_trace(xhci, trace_xhci_dbg_quirks, + "QUIRK: Fresco Logic xHC revision %u" + "must be suspended extra slowly", + pdev->revision); + } /* Fresco Logic confirms: all revisions of this chip do not * support MSI, even though some of them claim to in their PCI * capabilities. @@ -110,6 +121,15 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) xhci->quirks |= XHCI_SPURIOUS_REBOOT; xhci->quirks |= XHCI_AVOID_BEI; } + if (pdev->vendor == PCI_VENDOR_ID_INTEL && + (pdev->device == PCI_DEVICE_ID_INTEL_LYNXPOINT_XHCI || + pdev->device == PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI)) { + /* Workaround for occasional spurious wakeups from S5 (or + * any other sleep) on Haswell machines with LPT and LPT-LP + * with the new Intel BIOS + */ + xhci->quirks |= XHCI_SPURIOUS_WAKEUP; + } if (pdev->vendor == PCI_VENDOR_ID_ETRON && pdev->device == PCI_DEVICE_ID_ASROCK_P67) { xhci->quirks |= XHCI_RESET_ON_RESUME; @@ -217,6 +237,11 @@ static void xhci_pci_remove(struct pci_dev *dev) usb_put_hcd(xhci->shared_hcd); } usb_hcd_pci_remove(dev); + + /* Workaround for spurious wakeups at shutdown with HSW */ + if (xhci->quirks & XHCI_SPURIOUS_WAKEUP) + pci_set_power_state(dev, PCI_D3hot); + kfree(xhci); } @@ -351,7 +376,7 @@ static struct pci_driver xhci_pci_driver = { /* suspend and resume implemented later */ .shutdown = usb_hcd_pci_shutdown, -#ifdef CONFIG_PM_SLEEP +#ifdef CONFIG_PM .driver = { .pm = &usb_hcd_pci_pm_ops }, diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 411da1fc7ae8..6bfbd80ec2b9 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -123,6 +123,16 @@ static int enqueue_is_link_trb(struct xhci_ring *ring) return TRB_TYPE_LINK_LE32(link->control); } +union xhci_trb *xhci_find_next_enqueue(struct xhci_ring *ring) +{ + /* Enqueue pointer can be left pointing to the link TRB, + * we must handle that + */ + if (TRB_TYPE_LINK_LE32(ring->enqueue->link.control)) + return ring->enq_seg->next->trbs; + return ring->enqueue; +} + /* Updates trb to point to the next TRB in the ring, and updates seg if the next * TRB is in a new segment. This does not skip over link TRBs, and it does not * effect the ring dequeue or enqueue pointers. @@ -859,8 +869,12 @@ remove_finished_td: /* Otherwise ring the doorbell(s) to restart queued transfers */ ring_doorbell_for_active_rings(xhci, slot_id, ep_index); } - ep->stopped_td = NULL; - ep->stopped_trb = NULL; + + /* Clear stopped_td and stopped_trb if endpoint is not halted */ + if (!(ep->ep_state & EP_HALTED)) { + ep->stopped_td = NULL; + ep->stopped_trb = NULL; + } /* * Drop the lock and complete the URBs in the cancelled TD list. @@ -1414,6 +1428,12 @@ static void handle_cmd_completion(struct xhci_hcd *xhci, inc_deq(xhci, xhci->cmd_ring); return; } + /* There is no command to handle if we get a stop event when the + * command ring is empty, event->cmd_trb points to the next + * unset command + */ + if (xhci->cmd_ring->dequeue == xhci->cmd_ring->enqueue) + return; } switch (le32_to_cpu(xhci->cmd_ring->dequeue->generic.field[3]) @@ -1743,6 +1763,19 @@ static void handle_port_status(struct xhci_hcd *xhci, } } + /* + * Check to see if xhci-hub.c is waiting on RExit to U0 transition (or + * RExit to a disconnect state). If so, let the the driver know it's + * out of the RExit state. + */ + if (!DEV_SUPERSPEED(temp) && + test_and_clear_bit(faked_port_index, + &bus_state->rexit_ports)) { + complete(&bus_state->rexit_done[faked_port_index]); + bogus_port_status = true; + goto cleanup; + } + if (hcd->speed != HCD_USB3) xhci_test_and_clear_bit(xhci, port_array, faked_port_index, PORT_PLC); diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 49b6edb84a79..6e0d886bcce5 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -730,6 +730,9 @@ void xhci_shutdown(struct usb_hcd *hcd) spin_lock_irq(&xhci->lock); xhci_halt(xhci); + /* Workaround for spurious wakeups at shutdown with HSW */ + if (xhci->quirks & XHCI_SPURIOUS_WAKEUP) + xhci_reset(xhci); spin_unlock_irq(&xhci->lock); xhci_cleanup_msix(xhci); @@ -737,6 +740,10 @@ void xhci_shutdown(struct usb_hcd *hcd) xhci_dbg_trace(xhci, trace_xhci_dbg_init, "xhci_shutdown completed - status = %x", xhci_readl(xhci, &xhci->op_regs->status)); + + /* Yet another workaround for spurious wakeups at shutdown with HSW */ + if (xhci->quirks & XHCI_SPURIOUS_WAKEUP) + pci_set_power_state(to_pci_dev(hcd->self.controller), PCI_D3hot); } #ifdef CONFIG_PM @@ -839,6 +846,7 @@ static void xhci_clear_command_ring(struct xhci_hcd *xhci) int xhci_suspend(struct xhci_hcd *xhci) { int rc = 0; + unsigned int delay = XHCI_MAX_HALT_USEC; struct usb_hcd *hcd = xhci_to_hcd(xhci); u32 command; @@ -861,8 +869,12 @@ int xhci_suspend(struct xhci_hcd *xhci) command = xhci_readl(xhci, &xhci->op_regs->command); command &= ~CMD_RUN; xhci_writel(xhci, command, &xhci->op_regs->command); + + /* Some chips from Fresco Logic need an extraordinary delay */ + delay *= (xhci->quirks & XHCI_SLOW_SUSPEND) ? 10 : 1; + if (xhci_handshake(xhci, &xhci->op_regs->status, - STS_HALT, STS_HALT, XHCI_MAX_HALT_USEC)) { + STS_HALT, STS_HALT, delay)) { xhci_warn(xhci, "WARN: xHC CMD_RUN timeout\n"); spin_unlock_irq(&xhci->lock); return -ETIMEDOUT; @@ -2598,15 +2610,7 @@ static int xhci_configure_endpoint(struct xhci_hcd *xhci, if (command) { cmd_completion = command->completion; cmd_status = &command->status; - command->command_trb = xhci->cmd_ring->enqueue; - - /* Enqueue pointer can be left pointing to the link TRB, - * we must handle that - */ - if (TRB_TYPE_LINK_LE32(command->command_trb->link.control)) - command->command_trb = - xhci->cmd_ring->enq_seg->next->trbs; - + command->command_trb = xhci_find_next_enqueue(xhci->cmd_ring); list_add_tail(&command->cmd_list, &virt_dev->cmd_list); } else { cmd_completion = &virt_dev->cmd_completion; @@ -2614,7 +2618,7 @@ static int xhci_configure_endpoint(struct xhci_hcd *xhci, } init_completion(cmd_completion); - cmd_trb = xhci->cmd_ring->dequeue; + cmd_trb = xhci_find_next_enqueue(xhci->cmd_ring); if (!ctx_change) ret = xhci_queue_configure_endpoint(xhci, in_ctx->dma, udev->slot_id, must_succeed); @@ -3439,14 +3443,7 @@ int xhci_discover_or_reset_device(struct usb_hcd *hcd, struct usb_device *udev) /* Attempt to submit the Reset Device command to the command ring */ spin_lock_irqsave(&xhci->lock, flags); - reset_device_cmd->command_trb = xhci->cmd_ring->enqueue; - - /* Enqueue pointer can be left pointing to the link TRB, - * we must handle that - */ - if (TRB_TYPE_LINK_LE32(reset_device_cmd->command_trb->link.control)) - reset_device_cmd->command_trb = - xhci->cmd_ring->enq_seg->next->trbs; + reset_device_cmd->command_trb = xhci_find_next_enqueue(xhci->cmd_ring); list_add_tail(&reset_device_cmd->cmd_list, &virt_dev->cmd_list); ret = xhci_queue_reset_device(xhci, slot_id); @@ -3650,7 +3647,7 @@ int xhci_alloc_dev(struct usb_hcd *hcd, struct usb_device *udev) union xhci_trb *cmd_trb; spin_lock_irqsave(&xhci->lock, flags); - cmd_trb = xhci->cmd_ring->dequeue; + cmd_trb = xhci_find_next_enqueue(xhci->cmd_ring); ret = xhci_queue_slot_control(xhci, TRB_ENABLE_SLOT, 0); if (ret) { spin_unlock_irqrestore(&xhci->lock, flags); @@ -3785,7 +3782,7 @@ int xhci_address_device(struct usb_hcd *hcd, struct usb_device *udev) slot_ctx->dev_info >> 27); spin_lock_irqsave(&xhci->lock, flags); - cmd_trb = xhci->cmd_ring->dequeue; + cmd_trb = xhci_find_next_enqueue(xhci->cmd_ring); ret = xhci_queue_address_device(xhci, virt_dev->in_ctx->dma, udev->slot_id); if (ret) { diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 46aa14894148..941d5f59e4dc 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1412,8 +1412,18 @@ struct xhci_bus_state { unsigned long resume_done[USB_MAXCHILDREN]; /* which ports have started to resume */ unsigned long resuming_ports; + /* Which ports are waiting on RExit to U0 transition. */ + unsigned long rexit_ports; + struct completion rexit_done[USB_MAXCHILDREN]; }; + +/* + * It can take up to 20 ms to transition from RExit to U0 on the + * Intel Lynx Point LP xHCI host. + */ +#define XHCI_MAX_REXIT_TIMEOUT (20 * 1000) + static inline unsigned int hcd_index(struct usb_hcd *hcd) { if (hcd->speed == HCD_USB3) @@ -1538,6 +1548,8 @@ struct xhci_hcd { #define XHCI_COMP_MODE_QUIRK (1 << 14) #define XHCI_AVOID_BEI (1 << 15) #define XHCI_PLAT (1 << 16) +#define XHCI_SLOW_SUSPEND (1 << 17) +#define XHCI_SPURIOUS_WAKEUP (1 << 18) unsigned int num_active_eps; unsigned int limit_active_eps; /* There are two roothubs to keep track of bus suspend info for */ @@ -1840,6 +1852,7 @@ int xhci_cancel_cmd(struct xhci_hcd *xhci, struct xhci_command *command, union xhci_trb *cmd_trb); void xhci_ring_ep_doorbell(struct xhci_hcd *xhci, unsigned int slot_id, unsigned int ep_index, unsigned int stream_id); +union xhci_trb *xhci_find_next_enqueue(struct xhci_ring *ring); /* xHCI roothub code */ void xhci_set_link_state(struct xhci_hcd *xhci, __le32 __iomem **port_array, diff --git a/drivers/usb/misc/Kconfig b/drivers/usb/misc/Kconfig index e2b21c1d9c40..ba5f70f92888 100644 --- a/drivers/usb/misc/Kconfig +++ b/drivers/usb/misc/Kconfig @@ -246,6 +246,6 @@ config USB_EZUSB_FX2 config USB_HSIC_USB3503 tristate "USB3503 HSIC to USB20 Driver" depends on I2C - select REGMAP + select REGMAP_I2C help This option enables support for SMSC USB3503 HSIC to USB 2.0 Driver. diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c index 18e877ffe7b7..cd70cc886171 100644 --- a/drivers/usb/musb/musb_core.c +++ b/drivers/usb/musb/musb_core.c @@ -921,6 +921,52 @@ static void musb_generic_disable(struct musb *musb) } +/* + * Program the HDRC to start (enable interrupts, dma, etc.). + */ +void musb_start(struct musb *musb) +{ + void __iomem *regs = musb->mregs; + u8 devctl = musb_readb(regs, MUSB_DEVCTL); + + dev_dbg(musb->controller, "<== devctl %02x\n", devctl); + + /* Set INT enable registers, enable interrupts */ + musb->intrtxe = musb->epmask; + musb_writew(regs, MUSB_INTRTXE, musb->intrtxe); + musb->intrrxe = musb->epmask & 0xfffe; + musb_writew(regs, MUSB_INTRRXE, musb->intrrxe); + musb_writeb(regs, MUSB_INTRUSBE, 0xf7); + + musb_writeb(regs, MUSB_TESTMODE, 0); + + /* put into basic highspeed mode and start session */ + musb_writeb(regs, MUSB_POWER, MUSB_POWER_ISOUPDATE + | MUSB_POWER_HSENAB + /* ENSUSPEND wedges tusb */ + /* | MUSB_POWER_ENSUSPEND */ + ); + + musb->is_active = 0; + devctl = musb_readb(regs, MUSB_DEVCTL); + devctl &= ~MUSB_DEVCTL_SESSION; + + /* session started after: + * (a) ID-grounded irq, host mode; + * (b) vbus present/connect IRQ, peripheral mode; + * (c) peripheral initiates, using SRP + */ + if (musb->port_mode != MUSB_PORT_MODE_HOST && + (devctl & MUSB_DEVCTL_VBUS) == MUSB_DEVCTL_VBUS) { + musb->is_active = 1; + } else { + devctl |= MUSB_DEVCTL_SESSION; + } + + musb_platform_enable(musb); + musb_writeb(regs, MUSB_DEVCTL, devctl); +} + /* * Make the HDRC stop (disable interrupts, etc.); * reversible by musb_start diff --git a/drivers/usb/musb/musb_core.h b/drivers/usb/musb/musb_core.h index 65f3917b4fc5..1c5bf75ee8ff 100644 --- a/drivers/usb/musb/musb_core.h +++ b/drivers/usb/musb/musb_core.h @@ -503,6 +503,7 @@ static inline void musb_configure_ep0(struct musb *musb) extern const char musb_driver_name[]; extern void musb_stop(struct musb *musb); +extern void musb_start(struct musb *musb); extern void musb_write_fifo(struct musb_hw_ep *ep, u16 len, const u8 *src); extern void musb_read_fifo(struct musb_hw_ep *ep, u16 len, u8 *dst); diff --git a/drivers/usb/musb/musb_dsps.c b/drivers/usb/musb/musb_dsps.c index 4047cbb91bac..bd4138d80a48 100644 --- a/drivers/usb/musb/musb_dsps.c +++ b/drivers/usb/musb/musb_dsps.c @@ -535,6 +535,9 @@ static int dsps_probe(struct platform_device *pdev) struct dsps_glue *glue; int ret; + if (!strcmp(pdev->name, "musb-hdrc")) + return -ENODEV; + match = of_match_node(musb_dsps_of_match, pdev->dev.of_node); if (!match) { dev_err(&pdev->dev, "fail to get matching of_match struct\n"); diff --git a/drivers/usb/musb/musb_gadget.c b/drivers/usb/musb/musb_gadget.c index 9a08679d204d..3671898a4535 100644 --- a/drivers/usb/musb/musb_gadget.c +++ b/drivers/usb/musb/musb_gadget.c @@ -1790,6 +1790,10 @@ int musb_gadget_setup(struct musb *musb) musb->g.max_speed = USB_SPEED_HIGH; musb->g.speed = USB_SPEED_UNKNOWN; + MUSB_DEV_MODE(musb); + musb->xceiv->otg->default_a = 0; + musb->xceiv->state = OTG_STATE_B_IDLE; + /* this "gadget" abstracts/virtualizes the controller */ musb->g.name = musb_driver_name; musb->g.is_otg = 1; @@ -1855,6 +1859,8 @@ static int musb_gadget_start(struct usb_gadget *g, musb->xceiv->state = OTG_STATE_B_IDLE; spin_unlock_irqrestore(&musb->lock, flags); + musb_start(musb); + /* REVISIT: funcall to other code, which also * handles power budgeting ... this way also * ensures HdrcStart is indirectly called. diff --git a/drivers/usb/musb/musb_virthub.c b/drivers/usb/musb/musb_virthub.c index a523950c2b32..d1d6b83aabca 100644 --- a/drivers/usb/musb/musb_virthub.c +++ b/drivers/usb/musb/musb_virthub.c @@ -44,52 +44,6 @@ #include "musb_core.h" -/* -* Program the HDRC to start (enable interrupts, dma, etc.). -*/ -static void musb_start(struct musb *musb) -{ - void __iomem *regs = musb->mregs; - u8 devctl = musb_readb(regs, MUSB_DEVCTL); - - dev_dbg(musb->controller, "<== devctl %02x\n", devctl); - - /* Set INT enable registers, enable interrupts */ - musb->intrtxe = musb->epmask; - musb_writew(regs, MUSB_INTRTXE, musb->intrtxe); - musb->intrrxe = musb->epmask & 0xfffe; - musb_writew(regs, MUSB_INTRRXE, musb->intrrxe); - musb_writeb(regs, MUSB_INTRUSBE, 0xf7); - - musb_writeb(regs, MUSB_TESTMODE, 0); - - /* put into basic highspeed mode and start session */ - musb_writeb(regs, MUSB_POWER, MUSB_POWER_ISOUPDATE - | MUSB_POWER_HSENAB - /* ENSUSPEND wedges tusb */ - /* | MUSB_POWER_ENSUSPEND */ - ); - - musb->is_active = 0; - devctl = musb_readb(regs, MUSB_DEVCTL); - devctl &= ~MUSB_DEVCTL_SESSION; - - /* session started after: - * (a) ID-grounded irq, host mode; - * (b) vbus present/connect IRQ, peripheral mode; - * (c) peripheral initiates, using SRP - */ - if (musb->port_mode != MUSB_PORT_MODE_HOST && - (devctl & MUSB_DEVCTL_VBUS) == MUSB_DEVCTL_VBUS) { - musb->is_active = 1; - } else { - devctl |= MUSB_DEVCTL_SESSION; - } - - musb_platform_enable(musb); - musb_writeb(regs, MUSB_DEVCTL, devctl); -} - static void musb_port_suspend(struct musb *musb, bool do_suspend) { struct usb_otg *otg = musb->xceiv->otg; diff --git a/drivers/usb/phy/phy-gpio-vbus-usb.c b/drivers/usb/phy/phy-gpio-vbus-usb.c index b2f29c9aebbf..02799a5efcd4 100644 --- a/drivers/usb/phy/phy-gpio-vbus-usb.c +++ b/drivers/usb/phy/phy-gpio-vbus-usb.c @@ -241,7 +241,7 @@ static int gpio_vbus_set_suspend(struct usb_phy *phy, int suspend) /* platform driver interface */ -static int __init gpio_vbus_probe(struct platform_device *pdev) +static int gpio_vbus_probe(struct platform_device *pdev) { struct gpio_vbus_mach_info *pdata = dev_get_platdata(&pdev->dev); struct gpio_vbus_data *gpio_vbus; @@ -349,7 +349,7 @@ err_gpio: return err; } -static int __exit gpio_vbus_remove(struct platform_device *pdev) +static int gpio_vbus_remove(struct platform_device *pdev) { struct gpio_vbus_data *gpio_vbus = platform_get_drvdata(pdev); struct gpio_vbus_mach_info *pdata = dev_get_platdata(&pdev->dev); @@ -398,8 +398,6 @@ static const struct dev_pm_ops gpio_vbus_dev_pm_ops = { }; #endif -/* NOTE: the gpio-vbus device may *NOT* be hotplugged */ - MODULE_ALIAS("platform:gpio-vbus"); static struct platform_driver gpio_vbus_driver = { @@ -410,10 +408,11 @@ static struct platform_driver gpio_vbus_driver = { .pm = &gpio_vbus_dev_pm_ops, #endif }, - .remove = __exit_p(gpio_vbus_remove), + .probe = gpio_vbus_probe, + .remove = gpio_vbus_remove, }; -module_platform_driver_probe(gpio_vbus_driver, gpio_vbus_probe); +module_platform_driver(gpio_vbus_driver); MODULE_DESCRIPTION("simple GPIO controlled OTG transceiver driver"); MODULE_AUTHOR("Philipp Zabel"); diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 1cf6f125f5f0..acaee066b99a 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -81,6 +81,7 @@ static void option_instat_callback(struct urb *urb); #define HUAWEI_VENDOR_ID 0x12D1 #define HUAWEI_PRODUCT_E173 0x140C +#define HUAWEI_PRODUCT_E1750 0x1406 #define HUAWEI_PRODUCT_K4505 0x1464 #define HUAWEI_PRODUCT_K3765 0x1465 #define HUAWEI_PRODUCT_K4605 0x14C6 @@ -450,6 +451,10 @@ static void option_instat_callback(struct urb *urb); #define CHANGHONG_VENDOR_ID 0x2077 #define CHANGHONG_PRODUCT_CH690 0x7001 +/* Inovia */ +#define INOVIA_VENDOR_ID 0x20a6 +#define INOVIA_SEW858 0x1105 + /* some devices interfaces need special handling due to a number of reasons */ enum option_blacklist_reason { OPTION_BLACKLIST_NONE = 0, @@ -567,6 +572,8 @@ static const struct usb_device_id option_ids[] = { { USB_DEVICE_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0x1c23, USB_CLASS_COMM, 0x02, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, HUAWEI_PRODUCT_E173, 0xff, 0xff, 0xff), .driver_info = (kernel_ulong_t) &net_intf1_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, HUAWEI_PRODUCT_E1750, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t) &net_intf2_blacklist }, { USB_DEVICE_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0x1441, USB_CLASS_COMM, 0x02, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0x1442, USB_CLASS_COMM, 0x02, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, HUAWEI_PRODUCT_K4505, 0xff, 0xff, 0xff), @@ -686,6 +693,222 @@ static const struct usb_device_id option_ids[] = { { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x02, 0x7A) }, { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x02, 0x7B) }, { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x02, 0x7C) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x01) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x02) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x03) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x04) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x05) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x06) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x0A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x0B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x0D) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x0E) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x0F) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x10) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x12) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x13) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x14) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x15) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x17) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x18) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x19) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x1A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x1B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x1C) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x31) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x32) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x33) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x34) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x35) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x36) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x3A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x3B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x3D) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x3E) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x3F) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x48) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x49) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x4A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x4B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x4C) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x61) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x62) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x63) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x64) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x65) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x66) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x6A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x6B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x6D) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x6E) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x6F) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x78) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x79) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x7A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x7B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x03, 0x7C) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x01) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x02) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x03) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x04) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x05) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x06) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x0A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x0B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x0D) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x0E) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x0F) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x10) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x12) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x13) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x14) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x15) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x17) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x18) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x19) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x1A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x1B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x1C) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x31) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x32) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x33) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x34) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x35) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x36) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x3A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x3B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x3D) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x3E) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x3F) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x48) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x49) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x4A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x4B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x4C) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x61) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x62) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x63) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x64) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x65) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x66) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x6A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x6B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x6D) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x6E) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x6F) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x78) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x79) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x7A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x7B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x04, 0x7C) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x01) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x02) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x03) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x04) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x05) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x06) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x0A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x0B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x0D) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x0E) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x0F) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x10) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x12) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x13) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x14) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x15) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x17) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x18) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x19) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x1A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x1B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x1C) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x31) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x32) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x33) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x34) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x35) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x36) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x3A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x3B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x3D) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x3E) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x3F) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x48) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x49) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x4A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x4B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x4C) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x61) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x62) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x63) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x64) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x65) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x66) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x6A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x6B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x6D) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x6E) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x6F) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x78) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x79) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x7A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x7B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x05, 0x7C) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x01) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x02) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x03) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x04) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x05) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x06) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x0A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x0B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x0D) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x0E) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x0F) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x10) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x12) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x13) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x14) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x15) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x17) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x18) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x19) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x1A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x1B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x1C) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x31) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x32) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x33) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x34) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x35) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x36) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x3A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x3B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x3D) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x3E) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x3F) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x48) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x49) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x4A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x4B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x4C) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x61) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x62) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x63) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x64) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x65) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x66) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x6A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x6B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x6D) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x6E) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x6F) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x78) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x79) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x7A) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x7B) }, + { USB_VENDOR_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, 0xff, 0x06, 0x7C) }, { USB_DEVICE(NOVATELWIRELESS_VENDOR_ID, NOVATELWIRELESS_PRODUCT_V640) }, @@ -1254,7 +1477,9 @@ static const struct usb_device_id option_ids[] = { { USB_DEVICE(OLIVETTI_VENDOR_ID, OLIVETTI_PRODUCT_OLICARD100) }, { USB_DEVICE(OLIVETTI_VENDOR_ID, OLIVETTI_PRODUCT_OLICARD145) }, - { USB_DEVICE(OLIVETTI_VENDOR_ID, OLIVETTI_PRODUCT_OLICARD200) }, + { USB_DEVICE(OLIVETTI_VENDOR_ID, OLIVETTI_PRODUCT_OLICARD200), + .driver_info = (kernel_ulong_t)&net_intf6_blacklist + }, { USB_DEVICE(CELOT_VENDOR_ID, CELOT_PRODUCT_CT680M) }, /* CT-650 CDMA 450 1xEVDO modem */ { USB_DEVICE_AND_INTERFACE_INFO(SAMSUNG_VENDOR_ID, SAMSUNG_PRODUCT_GT_B3730, USB_CLASS_CDC_DATA, 0x00, 0x00) }, /* Samsung GT-B3730 LTE USB modem.*/ { USB_DEVICE(YUGA_VENDOR_ID, YUGA_PRODUCT_CEM600) }, @@ -1342,6 +1567,7 @@ static const struct usb_device_id option_ids[] = { { USB_DEVICE_AND_INTERFACE_INFO(0x2001, 0x7d03, 0xff, 0x00, 0x00) }, { USB_DEVICE_AND_INTERFACE_INFO(0x07d1, 0x3e01, 0xff, 0xff, 0xff) }, /* D-Link DWM-152/C1 */ { USB_DEVICE_AND_INTERFACE_INFO(0x07d1, 0x3e02, 0xff, 0xff, 0xff) }, /* D-Link DWM-156/C1 */ + { USB_DEVICE(INOVIA_VENDOR_ID, INOVIA_SEW858) }, { } /* Terminating entry */ }; MODULE_DEVICE_TABLE(usb, option_ids); diff --git a/drivers/usb/serial/ti_usb_3410_5052.c b/drivers/usb/serial/ti_usb_3410_5052.c index 760b78560f67..c9a35697ebe9 100644 --- a/drivers/usb/serial/ti_usb_3410_5052.c +++ b/drivers/usb/serial/ti_usb_3410_5052.c @@ -190,6 +190,7 @@ static struct usb_device_id ti_id_table_combined[] = { { USB_DEVICE(IBM_VENDOR_ID, IBM_454B_PRODUCT_ID) }, { USB_DEVICE(IBM_VENDOR_ID, IBM_454C_PRODUCT_ID) }, { USB_DEVICE(ABBOTT_VENDOR_ID, ABBOTT_PRODUCT_ID) }, + { USB_DEVICE(ABBOTT_VENDOR_ID, ABBOTT_STRIP_PORT_ID) }, { USB_DEVICE(TI_VENDOR_ID, FRI2_PRODUCT_ID) }, { } /* terminator */ }; diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c index 94d75edef77f..18509e6c21ab 100644 --- a/drivers/usb/storage/scsiglue.c +++ b/drivers/usb/storage/scsiglue.c @@ -211,8 +211,11 @@ static int slave_configure(struct scsi_device *sdev) /* * Many devices do not respond properly to READ_CAPACITY_16. * Tell the SCSI layer to try READ_CAPACITY_10 first. + * However some USB 3.0 drive enclosures return capacity + * modulo 2TB. Those must use READ_CAPACITY_16 */ - sdev->try_rc_10_first = 1; + if (!(us->fflags & US_FL_NEEDS_CAP16)) + sdev->try_rc_10_first = 1; /* assume SPC3 or latter devices support sense size > 18 */ if (sdev->scsi_level > SCSI_SPC_2) diff --git a/drivers/usb/storage/unusual_devs.h b/drivers/usb/storage/unusual_devs.h index c015f2c16729..de32cfa5bfa6 100644 --- a/drivers/usb/storage/unusual_devs.h +++ b/drivers/usb/storage/unusual_devs.h @@ -1925,6 +1925,13 @@ UNUSUAL_DEV( 0x1652, 0x6600, 0x0201, 0x0201, USB_SC_DEVICE, USB_PR_DEVICE, NULL, US_FL_IGNORE_RESIDUE ), +/* Reported by Oliver Neukum */ +UNUSUAL_DEV( 0x174c, 0x55aa, 0x0100, 0x0100, + "ASMedia", + "AS2105", + USB_SC_DEVICE, USB_PR_DEVICE, NULL, + US_FL_NEEDS_CAP16), + /* Reported by Jesse Feddema */ UNUSUAL_DEV( 0x177f, 0x0400, 0x0000, 0x0000, "Yarvik", diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index a9807dea3887..4fb7a8f83c8a 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -545,6 +545,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, long npage; int ret = 0, prot = 0; uint64_t mask; + struct vfio_dma *dma = NULL; + unsigned long pfn; end = map->iova + map->size; @@ -587,8 +589,6 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, } for (iova = map->iova; iova < end; iova += size, vaddr += size) { - struct vfio_dma *dma = NULL; - unsigned long pfn; long i; /* Pin a contiguous chunk of memory */ @@ -597,16 +597,15 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, if (npage <= 0) { WARN_ON(!npage); ret = (int)npage; - break; + goto out; } /* Verify pages are not already mapped */ for (i = 0; i < npage; i++) { if (iommu_iova_to_phys(iommu->domain, iova + (i << PAGE_SHIFT))) { - vfio_unpin_pages(pfn, npage, prot, true); ret = -EBUSY; - break; + goto out_unpin; } } @@ -616,8 +615,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, if (ret) { if (ret != -EBUSY || map_try_harder(iommu, iova, pfn, npage, prot)) { - vfio_unpin_pages(pfn, npage, prot, true); - break; + goto out_unpin; } } @@ -672,9 +670,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, dma = kzalloc(sizeof(*dma), GFP_KERNEL); if (!dma) { iommu_unmap(iommu->domain, iova, size); - vfio_unpin_pages(pfn, npage, prot, true); ret = -ENOMEM; - break; + goto out_unpin; } dma->size = size; @@ -685,16 +682,21 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, } } - if (ret) { - struct vfio_dma *tmp; - iova = map->iova; - size = map->size; - while ((tmp = vfio_find_dma(iommu, iova, size))) { - int r = vfio_remove_dma_overlap(iommu, iova, - &size, tmp); - if (WARN_ON(r || !size)) - break; - } + WARN_ON(ret); + mutex_unlock(&iommu->lock); + return ret; + +out_unpin: + vfio_unpin_pages(pfn, npage, prot, true); + +out: + iova = map->iova; + size = map->size; + while ((dma = vfio_find_dma(iommu, iova, size))) { + int r = vfio_remove_dma_overlap(iommu, iova, + &size, dma); + if (WARN_ON(r || !size)) + break; } mutex_unlock(&iommu->lock); diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 592b31698fc8..ce5221fa393a 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -728,7 +728,12 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq, } se_sess = tv_nexus->tvn_se_sess; - tag = percpu_ida_alloc(&se_sess->sess_tag_pool, GFP_KERNEL); + tag = percpu_ida_alloc(&se_sess->sess_tag_pool, GFP_ATOMIC); + if (tag < 0) { + pr_err("Unable to obtain tag for tcm_vhost_cmd\n"); + return ERR_PTR(-ENOMEM); + } + cmd = &((struct tcm_vhost_cmd *)se_sess->sess_cmd_map)[tag]; sg = cmd->tvc_sgl; pages = cmd->tvc_upages; diff --git a/drivers/video/mmp/hw/mmp_ctrl.c b/drivers/video/mmp/hw/mmp_ctrl.c index 75dca19bf214..6ac755270ab4 100644 --- a/drivers/video/mmp/hw/mmp_ctrl.c +++ b/drivers/video/mmp/hw/mmp_ctrl.c @@ -514,7 +514,7 @@ static int mmphw_probe(struct platform_device *pdev) if (IS_ERR(ctrl->clk)) { dev_err(ctrl->dev, "unable to get clk %s\n", mi->clk_name); ret = -ENOENT; - goto failed_get_clk; + goto failed; } clk_prepare_enable(ctrl->clk); @@ -551,21 +551,8 @@ failed_path_init: path_deinit(path_plat); } - if (ctrl->clk) { - devm_clk_put(ctrl->dev, ctrl->clk); - clk_disable_unprepare(ctrl->clk); - } -failed_get_clk: - devm_free_irq(ctrl->dev, ctrl->irq, ctrl); + clk_disable_unprepare(ctrl->clk); failed: - if (ctrl) { - if (ctrl->reg_base) - devm_iounmap(ctrl->dev, ctrl->reg_base); - devm_release_mem_region(ctrl->dev, res->start, - resource_size(res)); - devm_kfree(ctrl->dev, ctrl); - } - dev_err(&pdev->dev, "device init failed\n"); return ret; diff --git a/drivers/video/mxsfb.c b/drivers/video/mxsfb.c index d250ed0f806d..27197a8048c0 100644 --- a/drivers/video/mxsfb.c +++ b/drivers/video/mxsfb.c @@ -620,6 +620,7 @@ static int mxsfb_restore_mode(struct mxsfb_info *host) break; case 3: bits_per_pixel = 32; + break; case 1: default: return -EINVAL; diff --git a/drivers/video/neofb.c b/drivers/video/neofb.c index 7ef079c146e7..c172a5281f9e 100644 --- a/drivers/video/neofb.c +++ b/drivers/video/neofb.c @@ -2075,6 +2075,7 @@ static int neofb_probe(struct pci_dev *dev, const struct pci_device_id *id) if (!fb_find_mode(&info->var, info, mode_option, NULL, 0, info->monspecs.modedb, 16)) { printk(KERN_ERR "neofb: Unable to find usable video mode.\n"); + err = -EINVAL; goto err_map_video; } @@ -2097,7 +2098,8 @@ static int neofb_probe(struct pci_dev *dev, const struct pci_device_id *id) info->fix.smem_len >> 10, info->var.xres, info->var.yres, h_sync / 1000, h_sync % 1000, v_sync); - if (fb_alloc_cmap(&info->cmap, 256, 0) < 0) + err = fb_alloc_cmap(&info->cmap, 256, 0); + if (err < 0) goto err_map_video; err = register_framebuffer(info); diff --git a/drivers/video/of_display_timing.c b/drivers/video/of_display_timing.c index 171821ddd78d..ba5b40f581f6 100644 --- a/drivers/video/of_display_timing.c +++ b/drivers/video/of_display_timing.c @@ -120,7 +120,7 @@ int of_get_display_timing(struct device_node *np, const char *name, return -EINVAL; } - timing_np = of_find_node_by_name(np, name); + timing_np = of_get_child_by_name(np, name); if (!timing_np) { pr_err("%s: could not find node '%s'\n", of_node_full_name(np), name); @@ -143,11 +143,11 @@ struct display_timings *of_get_display_timings(struct device_node *np) struct display_timings *disp; if (!np) { - pr_err("%s: no devicenode given\n", of_node_full_name(np)); + pr_err("%s: no device node given\n", of_node_full_name(np)); return NULL; } - timings_np = of_find_node_by_name(np, "display-timings"); + timings_np = of_get_child_by_name(np, "display-timings"); if (!timings_np) { pr_err("%s: could not find display-timings node\n", of_node_full_name(np)); diff --git a/drivers/video/omap2/displays-new/Kconfig b/drivers/video/omap2/displays-new/Kconfig index 6c90885b0940..10b25e7cd878 100644 --- a/drivers/video/omap2/displays-new/Kconfig +++ b/drivers/video/omap2/displays-new/Kconfig @@ -35,6 +35,7 @@ config DISPLAY_PANEL_DPI config DISPLAY_PANEL_DSI_CM tristate "Generic DSI Command Mode Panel" + depends on BACKLIGHT_CLASS_DEVICE help Driver for generic DSI command mode panels. diff --git a/drivers/video/omap2/displays-new/connector-analog-tv.c b/drivers/video/omap2/displays-new/connector-analog-tv.c index 1b60698f141e..ccd9073f706f 100644 --- a/drivers/video/omap2/displays-new/connector-analog-tv.c +++ b/drivers/video/omap2/displays-new/connector-analog-tv.c @@ -191,7 +191,7 @@ static int tvc_probe_pdata(struct platform_device *pdev) in = omap_dss_find_output(pdata->source); if (in == NULL) { dev_err(&pdev->dev, "Failed to find video source\n"); - return -ENODEV; + return -EPROBE_DEFER; } ddata->in = in; diff --git a/drivers/video/omap2/displays-new/connector-dvi.c b/drivers/video/omap2/displays-new/connector-dvi.c index bc5f8ceda371..63d88ee6dfe4 100644 --- a/drivers/video/omap2/displays-new/connector-dvi.c +++ b/drivers/video/omap2/displays-new/connector-dvi.c @@ -263,7 +263,7 @@ static int dvic_probe_pdata(struct platform_device *pdev) in = omap_dss_find_output(pdata->source); if (in == NULL) { dev_err(&pdev->dev, "Failed to find video source\n"); - return -ENODEV; + return -EPROBE_DEFER; } ddata->in = in; diff --git a/drivers/video/omap2/displays-new/connector-hdmi.c b/drivers/video/omap2/displays-new/connector-hdmi.c index c5826716d6ab..9abe2c039ae9 100644 --- a/drivers/video/omap2/displays-new/connector-hdmi.c +++ b/drivers/video/omap2/displays-new/connector-hdmi.c @@ -290,7 +290,7 @@ static int hdmic_probe_pdata(struct platform_device *pdev) in = omap_dss_find_output(pdata->source); if (in == NULL) { dev_err(&pdev->dev, "Failed to find video source\n"); - return -ENODEV; + return -EPROBE_DEFER; } ddata->in = in; diff --git a/drivers/video/omap2/dss/dispc.c b/drivers/video/omap2/dss/dispc.c index 02a7340111df..477975009eee 100644 --- a/drivers/video/omap2/dss/dispc.c +++ b/drivers/video/omap2/dss/dispc.c @@ -3691,6 +3691,7 @@ static int __init omap_dispchw_probe(struct platform_device *pdev) } pm_runtime_enable(&pdev->dev); + pm_runtime_irq_safe(&pdev->dev); r = dispc_runtime_get(); if (r) diff --git a/drivers/video/s3fb.c b/drivers/video/s3fb.c index 47ca86c5c6c0..d838ba829459 100644 --- a/drivers/video/s3fb.c +++ b/drivers/video/s3fb.c @@ -1336,14 +1336,7 @@ static int s3_pci_probe(struct pci_dev *dev, const struct pci_device_id *id) (info->var.bits_per_pixel * info->var.xres_virtual); if (info->var.yres_virtual < info->var.yres) { dev_err(info->device, "virtual vertical size smaller than real\n"); - goto err_find_mode; - } - - /* maximize virtual vertical size for fast scrolling */ - info->var.yres_virtual = info->fix.smem_len * 8 / - (info->var.bits_per_pixel * info->var.xres_virtual); - if (info->var.yres_virtual < info->var.yres) { - dev_err(info->device, "virtual vertical size smaller than real\n"); + rc = -EINVAL; goto err_find_mode; } diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c index c7c64f18773d..fa932c2f7d97 100644 --- a/drivers/w1/w1.c +++ b/drivers/w1/w1.c @@ -613,6 +613,9 @@ static int w1_bus_notify(struct notifier_block *nb, unsigned long action, sl = dev_to_w1_slave(dev); fops = sl->family->fops; + if (!fops) + return 0; + switch (action) { case BUS_NOTIFY_ADD_DEVICE: /* if the family driver needs to initialize something... */ @@ -713,7 +716,10 @@ static int w1_attach_slave_device(struct w1_master *dev, struct w1_reg_num *rn) atomic_set(&sl->refcnt, 0); init_completion(&sl->released); + /* slave modules need to be loaded in a context with unlocked mutex */ + mutex_unlock(&dev->mutex); request_module("w1-family-0x%0x", rn->family); + mutex_lock(&dev->mutex); spin_lock(&w1_flock); f = w1_family_registered(rn->family); diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c index 5be5e3d14f79..19f3c3fc65f4 100644 --- a/drivers/watchdog/hpwdt.c +++ b/drivers/watchdog/hpwdt.c @@ -802,6 +802,12 @@ static int hpwdt_init_one(struct pci_dev *dev, return -ENODEV; } + /* + * Ignore all auxilary iLO devices with the following PCI ID + */ + if (dev->subsystem_device == 0x1979) + return -ENODEV; + if (pci_enable_device(dev)) { dev_warn(&dev->dev, "Not possible to enable PCI Device: 0x%x:0x%x.\n", diff --git a/drivers/watchdog/intel_scu_watchdog.c b/drivers/watchdog/intel_scu_watchdog.c index 9dda2d08af91..8ced25613956 100644 --- a/drivers/watchdog/intel_scu_watchdog.c +++ b/drivers/watchdog/intel_scu_watchdog.c @@ -48,7 +48,7 @@ #include #include #include -#include +#include #include "intel_scu_watchdog.h" @@ -445,7 +445,7 @@ static int __init intel_scu_watchdog_init(void) * * If it isn't an intel MID device then it doesn't have this watchdog */ - if (!mrst_identify_cpu()) + if (!intel_mid_identify_cpu()) return -ENODEV; /* Check boot parameters to verify that their initial values */ diff --git a/drivers/watchdog/kempld_wdt.c b/drivers/watchdog/kempld_wdt.c index 491419e0772a..5c3d4df63e68 100644 --- a/drivers/watchdog/kempld_wdt.c +++ b/drivers/watchdog/kempld_wdt.c @@ -35,7 +35,7 @@ #define KEMPLD_WDT_STAGE_TIMEOUT(x) (0x1b + (x) * 4) #define KEMPLD_WDT_STAGE_CFG(x) (0x18 + (x)) #define STAGE_CFG_GET_PRESCALER(x) (((x) & 0x30) >> 4) -#define STAGE_CFG_SET_PRESCALER(x) (((x) & 0x30) << 4) +#define STAGE_CFG_SET_PRESCALER(x) (((x) & 0x3) << 4) #define STAGE_CFG_PRESCALER_MASK 0x30 #define STAGE_CFG_ACTION_MASK 0x7 #define STAGE_CFG_ASSERT (1 << 3) diff --git a/drivers/watchdog/sunxi_wdt.c b/drivers/watchdog/sunxi_wdt.c index 1f94b42764aa..f6caa77151c7 100644 --- a/drivers/watchdog/sunxi_wdt.c +++ b/drivers/watchdog/sunxi_wdt.c @@ -146,7 +146,7 @@ static const struct watchdog_ops sunxi_wdt_ops = { .set_timeout = sunxi_wdt_set_timeout, }; -static int __init sunxi_wdt_probe(struct platform_device *pdev) +static int sunxi_wdt_probe(struct platform_device *pdev) { struct sunxi_wdt_dev *sunxi_wdt; struct resource *res; @@ -187,7 +187,7 @@ static int __init sunxi_wdt_probe(struct platform_device *pdev) return 0; } -static int __exit sunxi_wdt_remove(struct platform_device *pdev) +static int sunxi_wdt_remove(struct platform_device *pdev) { struct sunxi_wdt_dev *sunxi_wdt = platform_get_drvdata(pdev); diff --git a/drivers/watchdog/ts72xx_wdt.c b/drivers/watchdog/ts72xx_wdt.c index 42913f131dc2..c9b0c627fe7e 100644 --- a/drivers/watchdog/ts72xx_wdt.c +++ b/drivers/watchdog/ts72xx_wdt.c @@ -310,7 +310,8 @@ static long ts72xx_wdt_ioctl(struct file *file, unsigned int cmd, case WDIOC_GETSTATUS: case WDIOC_GETBOOTSTATUS: - return put_user(0, p); + error = put_user(0, p); + break; case WDIOC_KEEPALIVE: ts72xx_wdt_kick(wdt); diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index a50c6e3a7cc4..b232908a6192 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -398,8 +398,6 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) if (nr_pages > ARRAY_SIZE(frame_list)) nr_pages = ARRAY_SIZE(frame_list); - scratch_page = get_balloon_scratch_page(); - for (i = 0; i < nr_pages; i++) { page = alloc_page(gfp); if (page == NULL) { @@ -413,6 +411,12 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) scrub_page(page); + /* + * Ballooned out frames are effectively replaced with + * a scratch frame. Ensure direct mappings and the + * p2m are consistent. + */ + scratch_page = get_balloon_scratch_page(); #ifdef CONFIG_XEN_HAVE_PVMMU if (xen_pv_domain() && !PageHighMem(page)) { ret = HYPERVISOR_update_va_mapping( @@ -422,24 +426,19 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) BUG_ON(ret); } #endif - } - - /* Ensure that ballooned highmem pages don't have kmaps. */ - kmap_flush_unused(); - flush_tlb_all(); - - /* No more mappings: invalidate P2M and add to balloon. */ - for (i = 0; i < nr_pages; i++) { - pfn = mfn_to_pfn(frame_list[i]); if (!xen_feature(XENFEAT_auto_translated_physmap)) { unsigned long p; p = page_to_pfn(scratch_page); __set_phys_to_machine(pfn, pfn_to_mfn(p)); } + put_balloon_scratch_page(); + balloon_append(pfn_to_page(pfn)); } - put_balloon_scratch_page(); + /* Ensure that ballooned highmem pages don't have kmaps. */ + kmap_flush_unused(); + flush_tlb_all(); set_xen_guest_handle(reservation.extent_start, frame_list); reservation.nr_extents = nr_pages; diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 646337dc5201..529300327f45 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -600,9 +600,6 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags) /* lock down the parent dentry so we can peer at it */ parent = dget_parent(dentry); - if (!parent->d_inode) - goto out_bad; - dir = AFS_FS_I(parent->d_inode); /* validate the parent directory */ diff --git a/fs/aio.c b/fs/aio.c index 6b868f0e0c4c..067e3d340c35 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -167,10 +167,25 @@ static int __init aio_setup(void) } __initcall(aio_setup); +static void put_aio_ring_file(struct kioctx *ctx) +{ + struct file *aio_ring_file = ctx->aio_ring_file; + if (aio_ring_file) { + truncate_setsize(aio_ring_file->f_inode, 0); + + /* Prevent further access to the kioctx from migratepages */ + spin_lock(&aio_ring_file->f_inode->i_mapping->private_lock); + aio_ring_file->f_inode->i_mapping->private_data = NULL; + ctx->aio_ring_file = NULL; + spin_unlock(&aio_ring_file->f_inode->i_mapping->private_lock); + + fput(aio_ring_file); + } +} + static void aio_free_ring(struct kioctx *ctx) { int i; - struct file *aio_ring_file = ctx->aio_ring_file; for (i = 0; i < ctx->nr_pages; i++) { pr_debug("pid(%d) [%d] page->count=%d\n", current->pid, i, @@ -178,14 +193,10 @@ static void aio_free_ring(struct kioctx *ctx) put_page(ctx->ring_pages[i]); } + put_aio_ring_file(ctx); + if (ctx->ring_pages && ctx->ring_pages != ctx->internal_pages) kfree(ctx->ring_pages); - - if (aio_ring_file) { - truncate_setsize(aio_ring_file->f_inode, 0); - fput(aio_ring_file); - ctx->aio_ring_file = NULL; - } } static int aio_ring_mmap(struct file *file, struct vm_area_struct *vma) @@ -207,9 +218,8 @@ static int aio_set_page_dirty(struct page *page) static int aio_migratepage(struct address_space *mapping, struct page *new, struct page *old, enum migrate_mode mode) { - struct kioctx *ctx = mapping->private_data; + struct kioctx *ctx; unsigned long flags; - unsigned idx = old->index; int rc; /* Writeback must be complete */ @@ -224,10 +234,23 @@ static int aio_migratepage(struct address_space *mapping, struct page *new, get_page(new); - spin_lock_irqsave(&ctx->completion_lock, flags); - migrate_page_copy(new, old); - ctx->ring_pages[idx] = new; - spin_unlock_irqrestore(&ctx->completion_lock, flags); + /* We can potentially race against kioctx teardown here. Use the + * address_space's private data lock to protect the mapping's + * private_data. + */ + spin_lock(&mapping->private_lock); + ctx = mapping->private_data; + if (ctx) { + pgoff_t idx; + spin_lock_irqsave(&ctx->completion_lock, flags); + migrate_page_copy(new, old); + idx = old->index; + if (idx < (pgoff_t)ctx->nr_pages) + ctx->ring_pages[idx] = new; + spin_unlock_irqrestore(&ctx->completion_lock, flags); + } else + rc = -EBUSY; + spin_unlock(&mapping->private_lock); return rc; } @@ -617,8 +640,7 @@ out_freepcpu: out_freeref: free_percpu(ctx->users.pcpu_count); out_freectx: - if (ctx->aio_ring_file) - fput(ctx->aio_ring_file); + put_aio_ring_file(ctx); kmem_cache_free(kioctx_cachep, ctx); pr_debug("error allocating ioctx %d\n", err); return ERR_PTR(err); diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 100edcc5e312..4c94a79991bb 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1413,7 +1413,7 @@ static void fill_siginfo_note(struct memelfnote *note, user_siginfo_t *csigdata, * long file_ofs * followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL... */ -static void fill_files_note(struct memelfnote *note) +static int fill_files_note(struct memelfnote *note) { struct vm_area_struct *vma; unsigned count, size, names_ofs, remaining, n; @@ -1428,11 +1428,11 @@ static void fill_files_note(struct memelfnote *note) names_ofs = (2 + 3 * count) * sizeof(data[0]); alloc: if (size >= MAX_FILE_NOTE_SIZE) /* paranoia check */ - goto err; + return -EINVAL; size = round_up(size, PAGE_SIZE); data = vmalloc(size); if (!data) - goto err; + return -ENOMEM; start_end_ofs = data + 2; name_base = name_curpos = ((char *)data) + names_ofs; @@ -1485,7 +1485,7 @@ static void fill_files_note(struct memelfnote *note) size = name_curpos - (char *)data; fill_note(note, "CORE", NT_FILE, size, data); - err: ; + return 0; } #ifdef CORE_DUMP_USE_REGSET @@ -1686,8 +1686,8 @@ static int fill_note_info(struct elfhdr *elf, int phdrs, fill_auxv_note(&info->auxv, current->mm); info->size += notesize(&info->auxv); - fill_files_note(&info->files); - info->size += notesize(&info->files); + if (fill_files_note(&info->files) == 0) + info->size += notesize(&info->files); return 1; } @@ -1719,7 +1719,8 @@ static int write_note_info(struct elf_note_info *info, return 0; if (first && !writenote(&info->auxv, file, foffset)) return 0; - if (first && !writenote(&info->files, file, foffset)) + if (first && info->files.data && + !writenote(&info->files, file, foffset)) return 0; for (i = 1; i < info->thread_notes; ++i) @@ -1806,6 +1807,7 @@ static int elf_dump_thread_status(long signr, struct elf_thread_status *t) struct elf_note_info { struct memelfnote *notes; + struct memelfnote *notes_files; struct elf_prstatus *prstatus; /* NT_PRSTATUS */ struct elf_prpsinfo *psinfo; /* NT_PRPSINFO */ struct list_head thread_list; @@ -1896,9 +1898,12 @@ static int fill_note_info(struct elfhdr *elf, int phdrs, fill_siginfo_note(info->notes + 2, &info->csigdata, siginfo); fill_auxv_note(info->notes + 3, current->mm); - fill_files_note(info->notes + 4); + info->numnote = 4; - info->numnote = 5; + if (fill_files_note(info->notes + info->numnote) == 0) { + info->notes_files = info->notes + info->numnote; + info->numnote++; + } /* Try to dump the FPU. */ info->prstatus->pr_fpvalid = elf_core_copy_task_fpregs(current, regs, @@ -1960,8 +1965,9 @@ static void free_note_info(struct elf_note_info *info) kfree(list_entry(tmp, struct elf_thread_status, list)); } - /* Free data allocated by fill_files_note(): */ - vfree(info->notes[4].data); + /* Free data possibly allocated by fill_files_note(): */ + if (info->notes_files) + vfree(info->notes_files->data); kfree(info->prstatus); kfree(info->psinfo); @@ -2044,7 +2050,7 @@ static int elf_core_dump(struct coredump_params *cprm) struct vm_area_struct *vma, *gate_vma; struct elfhdr *elf = NULL; loff_t offset = 0, dataoff, foffset; - struct elf_note_info info; + struct elf_note_info info = { }; struct elf_phdr *phdr4note = NULL; struct elf_shdr *shdr4extnum = NULL; Elf_Half e_phnum; diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c index 58b7d14b08ee..08cc08f037a6 100644 --- a/fs/btrfs/async-thread.c +++ b/fs/btrfs/async-thread.c @@ -107,7 +107,8 @@ static void check_idle_worker(struct btrfs_worker_thread *worker) worker->idle = 1; /* the list may be empty if the worker is just starting */ - if (!list_empty(&worker->worker_list)) { + if (!list_empty(&worker->worker_list) && + !worker->workers->stopping) { list_move(&worker->worker_list, &worker->workers->idle_list); } @@ -127,7 +128,8 @@ static void check_busy_worker(struct btrfs_worker_thread *worker) spin_lock_irqsave(&worker->workers->lock, flags); worker->idle = 0; - if (!list_empty(&worker->worker_list)) { + if (!list_empty(&worker->worker_list) && + !worker->workers->stopping) { list_move_tail(&worker->worker_list, &worker->workers->worker_list); } @@ -412,6 +414,7 @@ void btrfs_stop_workers(struct btrfs_workers *workers) int can_stop; spin_lock_irq(&workers->lock); + workers->stopping = 1; list_splice_init(&workers->idle_list, &workers->worker_list); while (!list_empty(&workers->worker_list)) { cur = workers->worker_list.next; @@ -455,6 +458,7 @@ void btrfs_init_workers(struct btrfs_workers *workers, char *name, int max, workers->ordered = 0; workers->atomic_start_pending = 0; workers->atomic_worker_start = async_helper; + workers->stopping = 0; } /* @@ -480,15 +484,19 @@ static int __btrfs_start_workers(struct btrfs_workers *workers) atomic_set(&worker->num_pending, 0); atomic_set(&worker->refs, 1); worker->workers = workers; - worker->task = kthread_run(worker_loop, worker, - "btrfs-%s-%d", workers->name, - workers->num_workers + 1); + worker->task = kthread_create(worker_loop, worker, + "btrfs-%s-%d", workers->name, + workers->num_workers + 1); if (IS_ERR(worker->task)) { ret = PTR_ERR(worker->task); - kfree(worker); goto fail; } + spin_lock_irq(&workers->lock); + if (workers->stopping) { + spin_unlock_irq(&workers->lock); + goto fail_kthread; + } list_add_tail(&worker->worker_list, &workers->idle_list); worker->idle = 1; workers->num_workers++; @@ -496,8 +504,13 @@ static int __btrfs_start_workers(struct btrfs_workers *workers) WARN_ON(workers->num_workers_starting < 0); spin_unlock_irq(&workers->lock); + wake_up_process(worker->task); return 0; + +fail_kthread: + kthread_stop(worker->task); fail: + kfree(worker); spin_lock_irq(&workers->lock); workers->num_workers_starting--; spin_unlock_irq(&workers->lock); diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h index 063698b90ce2..1f26792683ed 100644 --- a/fs/btrfs/async-thread.h +++ b/fs/btrfs/async-thread.h @@ -107,6 +107,8 @@ struct btrfs_workers { /* extra name for this worker, used for current->name */ char *name; + + int stopping; }; void btrfs_queue_worker(struct btrfs_workers *workers, struct btrfs_work *work); diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 70681686e8dc..9efb94e95858 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -535,10 +535,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, list_add(&tgt_device->dev_alloc_list, &fs_info->fs_devices->alloc_list); btrfs_rm_dev_replace_srcdev(fs_info, src_device); - if (src_device->bdev) { - /* zero out the old super */ - btrfs_scratch_superblock(src_device); - } + /* * this is again a consistent state where no dev_replace procedure * is running, the target device is part of the filesystem, the diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 4ae17ed13b32..62176ad89846 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1561,8 +1561,9 @@ int btrfs_insert_fs_root(struct btrfs_fs_info *fs_info, return ret; } -struct btrfs_root *btrfs_read_fs_root_no_name(struct btrfs_fs_info *fs_info, - struct btrfs_key *location) +struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, + struct btrfs_key *location, + bool check_ref) { struct btrfs_root *root; int ret; @@ -1586,7 +1587,7 @@ struct btrfs_root *btrfs_read_fs_root_no_name(struct btrfs_fs_info *fs_info, again: root = btrfs_lookup_fs_root(fs_info, location->objectid); if (root) { - if (btrfs_root_refs(&root->root_item) == 0) + if (check_ref && btrfs_root_refs(&root->root_item) == 0) return ERR_PTR(-ENOENT); return root; } @@ -1595,7 +1596,7 @@ again: if (IS_ERR(root)) return root; - if (btrfs_root_refs(&root->root_item) == 0) { + if (check_ref && btrfs_root_refs(&root->root_item) == 0) { ret = -ENOENT; goto fail; } diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index b71acd6e1e5b..5ce2a7da8b11 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -68,8 +68,17 @@ struct btrfs_root *btrfs_read_fs_root(struct btrfs_root *tree_root, int btrfs_init_fs_root(struct btrfs_root *root); int btrfs_insert_fs_root(struct btrfs_fs_info *fs_info, struct btrfs_root *root); -struct btrfs_root *btrfs_read_fs_root_no_name(struct btrfs_fs_info *fs_info, - struct btrfs_key *location); + +struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, + struct btrfs_key *key, + bool check_ref); +static inline struct btrfs_root * +btrfs_read_fs_root_no_name(struct btrfs_fs_info *fs_info, + struct btrfs_key *location) +{ + return btrfs_get_fs_root(fs_info, location, true); +} + int btrfs_cleanup_fs_roots(struct btrfs_fs_info *fs_info); void btrfs_btree_balance_dirty(struct btrfs_root *root); void btrfs_btree_balance_dirty_nodelay(struct btrfs_root *root); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index c09a40db53db..51731b76900d 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -145,8 +145,16 @@ int __init extent_io_init(void) offsetof(struct btrfs_io_bio, bio)); if (!btrfs_bioset) goto free_buffer_cache; + + if (bioset_integrity_create(btrfs_bioset, BIO_POOL_SIZE)) + goto free_bioset; + return 0; +free_bioset: + bioset_free(btrfs_bioset); + btrfs_bioset = NULL; + free_buffer_cache: kmem_cache_destroy(extent_buffer_cache); extent_buffer_cache = NULL; @@ -1482,10 +1490,8 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree, cur_start = state->end + 1; node = rb_next(node); total_bytes += state->end - state->start + 1; - if (total_bytes >= max_bytes) { - *end = *start + max_bytes - 1; + if (total_bytes >= max_bytes) break; - } if (!node) break; } @@ -1614,7 +1620,7 @@ again: *start = delalloc_start; *end = delalloc_end; free_extent_state(cached_state); - return found; + return 0; } /* @@ -1627,10 +1633,9 @@ again: /* * make sure to limit the number of pages we try to lock down - * if we're looping. */ - if (delalloc_end + 1 - delalloc_start > max_bytes && loops) - delalloc_end = delalloc_start + PAGE_CACHE_SIZE - 1; + if (delalloc_end + 1 - delalloc_start > max_bytes) + delalloc_end = delalloc_start + max_bytes - 1; /* step two, lock all the pages after the page that has start */ ret = lock_delalloc_pages(inode, locked_page, @@ -1641,8 +1646,7 @@ again: */ free_extent_state(cached_state); if (!loops) { - unsigned long offset = (*start) & (PAGE_CACHE_SIZE - 1); - max_bytes = PAGE_CACHE_SIZE - offset; + max_bytes = PAGE_CACHE_SIZE; loops = 1; goto again; } else { diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 22ebc13b6c99..51e3afa78354 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6437,6 +6437,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len, if (btrfs_extent_readonly(root, disk_bytenr)) goto out; + btrfs_release_path(path); /* * look for other files referencing this extent, if we @@ -7986,7 +7987,7 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry, /* check for collisions, even if the name isn't there */ - ret = btrfs_check_dir_item_collision(root, new_dir->i_ino, + ret = btrfs_check_dir_item_collision(dest, new_dir->i_ino, new_dentry->d_name.name, new_dentry->d_name.len); diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index a5a26320503f..4a355726151e 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -588,7 +588,7 @@ static struct btrfs_root *read_fs_root(struct btrfs_fs_info *fs_info, else key.offset = (u64)-1; - return btrfs_read_fs_root_no_name(fs_info, &key); + return btrfs_get_fs_root(fs_info, &key, false); } #ifdef BTRFS_COMPAT_EXTENT_TREE_V0 diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c index 0b1f4ef8db98..ec71ea44d2b4 100644 --- a/fs/btrfs/root-tree.c +++ b/fs/btrfs/root-tree.c @@ -299,11 +299,6 @@ int btrfs_find_orphan_roots(struct btrfs_root *tree_root) continue; } - if (btrfs_root_refs(&root->root_item) == 0) { - btrfs_add_dead_root(root); - continue; - } - err = btrfs_init_fs_root(root); if (err) { btrfs_free_fs_root(root); @@ -318,6 +313,9 @@ int btrfs_find_orphan_roots(struct btrfs_root *tree_root) btrfs_free_fs_root(root); break; } + + if (btrfs_root_refs(&root->root_item) == 0) + btrfs_add_dead_root(root); } btrfs_free_path(path); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index e7a95356df83..8c81bdc1ef9b 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1838,11 +1838,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, assert_qgroups_uptodate(trans); update_super_roots(root); - if (!root->fs_info->log_root_recovering) { - btrfs_set_super_log_root(root->fs_info->super_copy, 0); - btrfs_set_super_log_root_level(root->fs_info->super_copy, 0); - } - + btrfs_set_super_log_root(root->fs_info->super_copy, 0); + btrfs_set_super_log_root_level(root->fs_info->super_copy, 0); memcpy(root->fs_info->super_for_commit, root->fs_info->super_copy, sizeof(*root->fs_info->super_copy)); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index a10645830223..043b215769c2 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1716,6 +1716,7 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info, struct btrfs_device *srcdev) { WARN_ON(!mutex_is_locked(&fs_info->fs_devices->device_list_mutex)); + list_del_rcu(&srcdev->dev_list); list_del_rcu(&srcdev->dev_alloc_list); fs_info->fs_devices->num_devices--; @@ -1725,9 +1726,13 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info, } if (srcdev->can_discard) fs_info->fs_devices->num_can_discard--; - if (srcdev->bdev) + if (srcdev->bdev) { fs_info->fs_devices->open_devices--; + /* zero out the old super */ + btrfs_scratch_superblock(srcdev); + } + call_rcu(&srcdev->rcu, free_device); } diff --git a/fs/buffer.c b/fs/buffer.c index 4d7433534f5c..6024877335ca 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -1005,9 +1005,19 @@ grow_dev_page(struct block_device *bdev, sector_t block, struct buffer_head *bh; sector_t end_block; int ret = 0; /* Will call free_more_memory() */ + gfp_t gfp_mask; - page = find_or_create_page(inode->i_mapping, index, - (mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS)|__GFP_MOVABLE); + gfp_mask = mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS; + gfp_mask |= __GFP_MOVABLE; + /* + * XXX: __getblk_slow() can not really deal with failure and + * will endlessly loop on improvised global reclaim. Prefer + * looping in the allocator rather than here, at least that + * code knows what it's doing. + */ + gfp_mask |= __GFP_NOFAIL; + + page = find_or_create_page(inode->i_mapping, index, gfp_mask); if (!page) return ret; diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c index a16b4e58bcc6..77fc5e181077 100644 --- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -120,14 +120,16 @@ cifs_read_super(struct super_block *sb) { struct inode *inode; struct cifs_sb_info *cifs_sb; + struct cifs_tcon *tcon; int rc = 0; cifs_sb = CIFS_SB(sb); + tcon = cifs_sb_master_tcon(cifs_sb); if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_POSIXACL) sb->s_flags |= MS_POSIXACL; - if (cifs_sb_master_tcon(cifs_sb)->ses->capabilities & CAP_LARGE_FILES) + if (tcon->ses->capabilities & tcon->ses->server->vals->cap_large_files) sb->s_maxbytes = MAX_LFS_FILESIZE; else sb->s_maxbytes = MAX_NON_LFS; @@ -147,7 +149,7 @@ cifs_read_super(struct super_block *sb) goto out_no_root; } - if (cifs_sb_master_tcon(cifs_sb)->nocase) + if (tcon->nocase) sb->s_d_op = &cifs_ci_dentry_ops; else sb->s_d_op = &cifs_dentry_ops; diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index ea723a5e8226..6d0b07217ac9 100644 --- a/fs/cifs/cifsfs.h +++ b/fs/cifs/cifsfs.h @@ -132,5 +132,5 @@ extern long cifs_ioctl(struct file *filep, unsigned int cmd, unsigned long arg); extern const struct export_operations cifs_export_ops; #endif /* CONFIG_CIFS_NFSD_EXPORT */ -#define CIFS_VERSION "2.01" +#define CIFS_VERSION "2.02" #endif /* _CIFSFS_H */ diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index cfa14c80ef3b..52b6f6c26bfc 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -547,9 +547,6 @@ struct TCP_Server_Info { unsigned int max_rw; /* maxRw specifies the maximum */ /* message size the server can send or receive for */ /* SMB_COM_WRITE_RAW or SMB_COM_READ_RAW. */ - unsigned int max_vcs; /* maximum number of smb sessions, at least - those that can be specified uniquely with - vcnumbers */ unsigned int capabilities; /* selective disabling of caps by smb sess */ int timeAdj; /* Adjust for difference in server time zone in sec */ __u64 CurrentMid; /* multiplex id - rotating counter */ @@ -715,7 +712,6 @@ struct cifs_ses { enum statusEnum status; unsigned overrideSecFlg; /* if non-zero override global sec flags */ __u16 ipc_tid; /* special tid for connection to IPC share */ - __u16 vcnum; char *serverOS; /* name of operating system underlying server */ char *serverNOS; /* name of network operating system of server */ char *serverDomain; /* security realm of server */ @@ -1272,6 +1268,7 @@ struct dfs_info3_param { #define CIFS_FATTR_DELETE_PENDING 0x2 #define CIFS_FATTR_NEED_REVAL 0x4 #define CIFS_FATTR_INO_COLLISION 0x8 +#define CIFS_FATTR_UNKNOWN_NLINK 0x10 struct cifs_fattr { u32 cf_flags; diff --git a/fs/cifs/cifspdu.h b/fs/cifs/cifspdu.h index 948676db8e2e..08f9dfb1a894 100644 --- a/fs/cifs/cifspdu.h +++ b/fs/cifs/cifspdu.h @@ -1491,15 +1491,30 @@ struct file_notify_information { __u8 FileName[0]; } __attribute__((packed)); -struct reparse_data { - __u32 ReparseTag; - __u16 ReparseDataLength; +/* For IO_REPARSE_TAG_SYMLINK */ +struct reparse_symlink_data { + __le32 ReparseTag; + __le16 ReparseDataLength; __u16 Reserved; - __u16 SubstituteNameOffset; - __u16 SubstituteNameLength; - __u16 PrintNameOffset; - __u16 PrintNameLength; - __u32 Flags; + __le16 SubstituteNameOffset; + __le16 SubstituteNameLength; + __le16 PrintNameOffset; + __le16 PrintNameLength; + __le32 Flags; + char PathBuffer[0]; +} __attribute__((packed)); + +/* For IO_REPARSE_TAG_NFS */ +#define NFS_SPECFILE_LNK 0x00000000014B4E4C +#define NFS_SPECFILE_CHR 0x0000000000524843 +#define NFS_SPECFILE_BLK 0x00000000004B4C42 +#define NFS_SPECFILE_FIFO 0x000000004F464946 +#define NFS_SPECFILE_SOCK 0x000000004B434F53 +struct reparse_posix_data { + __le32 ReparseTag; + __le16 ReparseDataLength; + __u16 Reserved; + __le64 InodeType; /* LNK, FIFO, CHR etc. */ char PathBuffer[0]; } __attribute__((packed)); @@ -2652,26 +2667,7 @@ typedef struct file_xattr_info { } __attribute__((packed)) FILE_XATTR_INFO; /* extended attribute info level 0x205 */ - -/* flags for chattr command */ -#define EXT_SECURE_DELETE 0x00000001 /* EXT3_SECRM_FL */ -#define EXT_ENABLE_UNDELETE 0x00000002 /* EXT3_UNRM_FL */ -/* Reserved for compress file 0x4 */ -#define EXT_SYNCHRONOUS 0x00000008 /* EXT3_SYNC_FL */ -#define EXT_IMMUTABLE_FL 0x00000010 /* EXT3_IMMUTABLE_FL */ -#define EXT_OPEN_APPEND_ONLY 0x00000020 /* EXT3_APPEND_FL */ -#define EXT_DO_NOT_BACKUP 0x00000040 /* EXT3_NODUMP_FL */ -#define EXT_NO_UPDATE_ATIME 0x00000080 /* EXT3_NOATIME_FL */ -/* 0x100 through 0x800 reserved for compression flags and are GET-ONLY */ -#define EXT_HASH_TREE_INDEXED_DIR 0x00001000 /* GET-ONLY EXT3_INDEX_FL */ -/* 0x2000 reserved for IMAGIC_FL */ -#define EXT_JOURNAL_THIS_FILE 0x00004000 /* GET-ONLY EXT3_JOURNAL_DATA_FL */ -/* 0x8000 reserved for EXT3_NOTAIL_FL */ -#define EXT_SYNCHRONOUS_DIR 0x00010000 /* EXT3_DIRSYNC_FL */ -#define EXT_TOPDIR 0x00020000 /* EXT3_TOPDIR_FL */ - -#define EXT_SET_MASK 0x000300FF -#define EXT_GET_MASK 0x0003DFFF +/* flags for lsattr and chflags commands removed arein uapi/linux/fs.h */ typedef struct file_chattr_info { __le64 mask; /* list of all possible attribute bits */ diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c index a3d74fea1623..ccd31ab815d4 100644 --- a/fs/cifs/cifssmb.c +++ b/fs/cifs/cifssmb.c @@ -463,7 +463,6 @@ decode_lanman_negprot_rsp(struct TCP_Server_Info *server, NEGOTIATE_RSP *pSMBr) cifs_max_pending); set_credits(server, server->maxReq); server->maxBuf = le16_to_cpu(rsp->MaxBufSize); - server->max_vcs = le16_to_cpu(rsp->MaxNumberVcs); /* even though we do not use raw we might as well set this accurately, in case we ever find a need for it */ if ((le16_to_cpu(rsp->RawMode) & RAW_ENABLE) == RAW_ENABLE) { @@ -3089,7 +3088,8 @@ CIFSSMBQuerySymLink(const unsigned int xid, struct cifs_tcon *tcon, bool is_unicode; unsigned int sub_len; char *sub_start; - struct reparse_data *reparse_buf; + struct reparse_symlink_data *reparse_buf; + struct reparse_posix_data *posix_buf; __u32 data_offset, data_count; char *end_of_smb; @@ -3138,20 +3138,47 @@ CIFSSMBQuerySymLink(const unsigned int xid, struct cifs_tcon *tcon, goto qreparse_out; } end_of_smb = 2 + get_bcc(&pSMBr->hdr) + (char *)&pSMBr->ByteCount; - reparse_buf = (struct reparse_data *) + reparse_buf = (struct reparse_symlink_data *) ((char *)&pSMBr->hdr.Protocol + data_offset); if ((char *)reparse_buf >= end_of_smb) { rc = -EIO; goto qreparse_out; } - if ((reparse_buf->PathBuffer + reparse_buf->PrintNameOffset + - reparse_buf->PrintNameLength) > end_of_smb) { + if (reparse_buf->ReparseTag == cpu_to_le32(IO_REPARSE_TAG_NFS)) { + cifs_dbg(FYI, "NFS style reparse tag\n"); + posix_buf = (struct reparse_posix_data *)reparse_buf; + + if (posix_buf->InodeType != cpu_to_le64(NFS_SPECFILE_LNK)) { + cifs_dbg(FYI, "unsupported file type 0x%llx\n", + le64_to_cpu(posix_buf->InodeType)); + rc = -EOPNOTSUPP; + goto qreparse_out; + } + is_unicode = true; + sub_len = le16_to_cpu(reparse_buf->ReparseDataLength); + if (posix_buf->PathBuffer + sub_len > end_of_smb) { + cifs_dbg(FYI, "reparse buf beyond SMB\n"); + rc = -EIO; + goto qreparse_out; + } + *symlinkinfo = cifs_strndup_from_utf16(posix_buf->PathBuffer, + sub_len, is_unicode, nls_codepage); + goto qreparse_out; + } else if (reparse_buf->ReparseTag != + cpu_to_le32(IO_REPARSE_TAG_SYMLINK)) { + rc = -EOPNOTSUPP; + goto qreparse_out; + } + + /* Reparse tag is NTFS symlink */ + sub_start = le16_to_cpu(reparse_buf->SubstituteNameOffset) + + reparse_buf->PathBuffer; + sub_len = le16_to_cpu(reparse_buf->SubstituteNameLength); + if (sub_start + sub_len > end_of_smb) { cifs_dbg(FYI, "reparse buf beyond SMB\n"); rc = -EIO; goto qreparse_out; } - sub_start = reparse_buf->SubstituteNameOffset + reparse_buf->PathBuffer; - sub_len = reparse_buf->SubstituteNameLength; if (pSMBr->hdr.Flags2 & SMBFLG2_UNICODE) is_unicode = true; else diff --git a/fs/cifs/file.c b/fs/cifs/file.c index eb955b525e55..7ddddf2e2504 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -3254,6 +3254,9 @@ static int cifs_readpages(struct file *file, struct address_space *mapping, /* * Reads as many pages as possible from fscache. Returns -ENOBUFS * immediately if the cookie is negative + * + * After this point, every page in the list might have PG_fscache set, + * so we will need to clean that up off of every page we don't use. */ rc = cifs_readpages_from_fscache(mapping->host, mapping, page_list, &num_pages); @@ -3376,6 +3379,11 @@ static int cifs_readpages(struct file *file, struct address_space *mapping, kref_put(&rdata->refcount, cifs_readdata_release); } + /* Any pages that have been shown to fscache but didn't get added to + * the pagecache must be uncached before they get returned to the + * allocator. + */ + cifs_fscache_readpages_cancel(mapping->host, page_list); return rc; } diff --git a/fs/cifs/fscache.c b/fs/cifs/fscache.c index 2f4bc5a58054..b3258f35e88a 100644 --- a/fs/cifs/fscache.c +++ b/fs/cifs/fscache.c @@ -223,6 +223,13 @@ void __cifs_readpage_to_fscache(struct inode *inode, struct page *page) fscache_uncache_page(CIFS_I(inode)->fscache, page); } +void __cifs_fscache_readpages_cancel(struct inode *inode, struct list_head *pages) +{ + cifs_dbg(FYI, "%s: (fsc: %p, i: %p)\n", + __func__, CIFS_I(inode)->fscache, inode); + fscache_readpages_cancel(CIFS_I(inode)->fscache, pages); +} + void __cifs_fscache_invalidate_page(struct page *page, struct inode *inode) { struct cifsInodeInfo *cifsi = CIFS_I(inode); diff --git a/fs/cifs/fscache.h b/fs/cifs/fscache.h index 63539323e0b9..24794b6cd8ec 100644 --- a/fs/cifs/fscache.h +++ b/fs/cifs/fscache.h @@ -54,6 +54,7 @@ extern int __cifs_readpages_from_fscache(struct inode *, struct address_space *, struct list_head *, unsigned *); +extern void __cifs_fscache_readpages_cancel(struct inode *, struct list_head *); extern void __cifs_readpage_to_fscache(struct inode *, struct page *); @@ -91,6 +92,13 @@ static inline void cifs_readpage_to_fscache(struct inode *inode, __cifs_readpage_to_fscache(inode, page); } +static inline void cifs_fscache_readpages_cancel(struct inode *inode, + struct list_head *pages) +{ + if (CIFS_I(inode)->fscache) + return __cifs_fscache_readpages_cancel(inode, pages); +} + #else /* CONFIG_CIFS_FSCACHE */ static inline int cifs_fscache_register(void) { return 0; } static inline void cifs_fscache_unregister(void) {} @@ -131,6 +139,11 @@ static inline int cifs_readpages_from_fscache(struct inode *inode, static inline void cifs_readpage_to_fscache(struct inode *inode, struct page *page) {} +static inline void cifs_fscache_readpages_cancel(struct inode *inode, + struct list_head *pages) +{ +} + #endif /* CONFIG_CIFS_FSCACHE */ #endif /* _CIFS_FSCACHE_H */ diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c index f9ff9c173f78..867b7cdc794a 100644 --- a/fs/cifs/inode.c +++ b/fs/cifs/inode.c @@ -120,6 +120,33 @@ cifs_revalidate_cache(struct inode *inode, struct cifs_fattr *fattr) cifs_i->invalid_mapping = true; } +/* + * copy nlink to the inode, unless it wasn't provided. Provide + * sane values if we don't have an existing one and none was provided + */ +static void +cifs_nlink_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr) +{ + /* + * if we're in a situation where we can't trust what we + * got from the server (readdir, some non-unix cases) + * fake reasonable values + */ + if (fattr->cf_flags & CIFS_FATTR_UNKNOWN_NLINK) { + /* only provide fake values on a new inode */ + if (inode->i_state & I_NEW) { + if (fattr->cf_cifsattrs & ATTR_DIRECTORY) + set_nlink(inode, 2); + else + set_nlink(inode, 1); + } + return; + } + + /* we trust the server, so update it */ + set_nlink(inode, fattr->cf_nlink); +} + /* populate an inode with info from a cifs_fattr struct */ void cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr) @@ -134,7 +161,7 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr) inode->i_mtime = fattr->cf_mtime; inode->i_ctime = fattr->cf_ctime; inode->i_rdev = fattr->cf_rdev; - set_nlink(inode, fattr->cf_nlink); + cifs_nlink_fattr_to_inode(inode, fattr); inode->i_uid = fattr->cf_uid; inode->i_gid = fattr->cf_gid; @@ -541,6 +568,7 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info, fattr->cf_bytes = le64_to_cpu(info->AllocationSize); fattr->cf_createtime = le64_to_cpu(info->CreationTime); + fattr->cf_nlink = le32_to_cpu(info->NumberOfLinks); if (fattr->cf_cifsattrs & ATTR_DIRECTORY) { fattr->cf_mode = S_IFDIR | cifs_sb->mnt_dir_mode; fattr->cf_dtype = DT_DIR; @@ -548,7 +576,8 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info, * Server can return wrong NumberOfLinks value for directories * when Unix extensions are disabled - fake it. */ - fattr->cf_nlink = 2; + if (!tcon->unix_ext) + fattr->cf_flags |= CIFS_FATTR_UNKNOWN_NLINK; } else if (fattr->cf_cifsattrs & ATTR_REPARSE) { fattr->cf_mode = S_IFLNK; fattr->cf_dtype = DT_LNK; @@ -561,11 +590,15 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info, if (fattr->cf_cifsattrs & ATTR_READONLY) fattr->cf_mode &= ~(S_IWUGO); - fattr->cf_nlink = le32_to_cpu(info->NumberOfLinks); - if (fattr->cf_nlink < 1) { - cifs_dbg(1, "replacing bogus file nlink value %u\n", + /* + * Don't accept zero nlink from non-unix servers unless + * delete is pending. Instead mark it as unknown. + */ + if ((fattr->cf_nlink < 1) && !tcon->unix_ext && + !info->DeletePending) { + cifs_dbg(1, "bogus file nlink value %u\n", fattr->cf_nlink); - fattr->cf_nlink = 1; + fattr->cf_flags |= CIFS_FATTR_UNKNOWN_NLINK; } } diff --git a/fs/cifs/netmisc.c b/fs/cifs/netmisc.c index af847e1cf1c1..651a5279607b 100644 --- a/fs/cifs/netmisc.c +++ b/fs/cifs/netmisc.c @@ -780,7 +780,9 @@ static const struct { ERRDOS, ERRnoaccess, 0xc0000290}, { ERRDOS, ERRbadfunc, 0xc000029c}, { ERRDOS, ERRsymlink, NT_STATUS_STOPPED_ON_SYMLINK}, { - ERRDOS, ERRinvlevel, 0x007c0001}, }; + ERRDOS, ERRinvlevel, 0x007c0001}, { + 0, 0, 0 } +}; /***************************************************************************** Print an error message from the status code diff --git a/fs/cifs/readdir.c b/fs/cifs/readdir.c index 42ef03be089f..53a75f3d0179 100644 --- a/fs/cifs/readdir.c +++ b/fs/cifs/readdir.c @@ -180,6 +180,9 @@ cifs_fill_common_info(struct cifs_fattr *fattr, struct cifs_sb_info *cifs_sb) fattr->cf_dtype = DT_REG; } + /* non-unix readdir doesn't provide nlink */ + fattr->cf_flags |= CIFS_FATTR_UNKNOWN_NLINK; + if (fattr->cf_cifsattrs & ATTR_READONLY) fattr->cf_mode &= ~S_IWUGO; diff --git a/fs/cifs/sess.c b/fs/cifs/sess.c index 5f99b7f19e78..e87387dbf39f 100644 --- a/fs/cifs/sess.c +++ b/fs/cifs/sess.c @@ -32,88 +32,6 @@ #include #include "cifs_spnego.h" -/* - * Checks if this is the first smb session to be reconnected after - * the socket has been reestablished (so we know whether to use vc 0). - * Called while holding the cifs_tcp_ses_lock, so do not block - */ -static bool is_first_ses_reconnect(struct cifs_ses *ses) -{ - struct list_head *tmp; - struct cifs_ses *tmp_ses; - - list_for_each(tmp, &ses->server->smb_ses_list) { - tmp_ses = list_entry(tmp, struct cifs_ses, - smb_ses_list); - if (tmp_ses->need_reconnect == false) - return false; - } - /* could not find a session that was already connected, - this must be the first one we are reconnecting */ - return true; -} - -/* - * vc number 0 is treated specially by some servers, and should be the - * first one we request. After that we can use vcnumbers up to maxvcs, - * one for each smb session (some Windows versions set maxvcs incorrectly - * so maxvc=1 can be ignored). If we have too many vcs, we can reuse - * any vc but zero (some servers reset the connection on vcnum zero) - * - */ -static __le16 get_next_vcnum(struct cifs_ses *ses) -{ - __u16 vcnum = 0; - struct list_head *tmp; - struct cifs_ses *tmp_ses; - __u16 max_vcs = ses->server->max_vcs; - __u16 i; - int free_vc_found = 0; - - /* Quoting the MS-SMB specification: "Windows-based SMB servers set this - field to one but do not enforce this limit, which allows an SMB client - to establish more virtual circuits than allowed by this value ... but - other server implementations can enforce this limit." */ - if (max_vcs < 2) - max_vcs = 0xFFFF; - - spin_lock(&cifs_tcp_ses_lock); - if ((ses->need_reconnect) && is_first_ses_reconnect(ses)) - goto get_vc_num_exit; /* vcnum will be zero */ - for (i = ses->server->srv_count - 1; i < max_vcs; i++) { - if (i == 0) /* this is the only connection, use vc 0 */ - break; - - free_vc_found = 1; - - list_for_each(tmp, &ses->server->smb_ses_list) { - tmp_ses = list_entry(tmp, struct cifs_ses, - smb_ses_list); - if (tmp_ses->vcnum == i) { - free_vc_found = 0; - break; /* found duplicate, try next vcnum */ - } - } - if (free_vc_found) - break; /* we found a vcnumber that will work - use it */ - } - - if (i == 0) - vcnum = 0; /* for most common case, ie if one smb session, use - vc zero. Also for case when no free vcnum, zero - is safest to send (some clients only send zero) */ - else if (free_vc_found == 0) - vcnum = 1; /* we can not reuse vc=0 safely, since some servers - reset all uids on that, but 1 is ok. */ - else - vcnum = i; - ses->vcnum = vcnum; -get_vc_num_exit: - spin_unlock(&cifs_tcp_ses_lock); - - return cpu_to_le16(vcnum); -} - static __u32 cifs_ssetup_hdr(struct cifs_ses *ses, SESSION_SETUP_ANDX *pSMB) { __u32 capabilities = 0; @@ -128,7 +46,7 @@ static __u32 cifs_ssetup_hdr(struct cifs_ses *ses, SESSION_SETUP_ANDX *pSMB) CIFSMaxBufSize + MAX_CIFS_HDR_SIZE - 4, USHRT_MAX)); pSMB->req.MaxMpxCount = cpu_to_le16(ses->server->maxReq); - pSMB->req.VcNumber = get_next_vcnum(ses); + pSMB->req.VcNumber = __constant_cpu_to_le16(1); /* Now no need to set SMBFLG_CASELESS or obsolete CANONICAL PATH */ @@ -582,9 +500,9 @@ select_sectype(struct TCP_Server_Info *server, enum securityEnum requested) return NTLMv2; if (global_secflags & CIFSSEC_MAY_NTLM) return NTLM; - /* Fallthrough */ default: - return Unspecified; + /* Fallthrough to attempt LANMAN authentication next */ + break; } case CIFS_NEGFLAVOR_LANMAN: switch (requested) { diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index eba0efde66d7..edccb5252462 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -687,6 +687,10 @@ SMB2_logoff(const unsigned int xid, struct cifs_ses *ses) else return -EIO; + /* no need to send SMB logoff if uid already closed due to reconnect */ + if (ses->need_reconnect) + goto smb2_session_already_dead; + rc = small_smb2_init(SMB2_LOGOFF, NULL, (void **) &req); if (rc) return rc; @@ -701,6 +705,8 @@ SMB2_logoff(const unsigned int xid, struct cifs_ses *ses) * No tcon so can't do * cifs_stats_inc(&tcon->stats.smb2_stats.smb2_com_fail[SMB2...]); */ + +smb2_session_already_dead: return rc; } diff --git a/fs/cifs/smbfsctl.h b/fs/cifs/smbfsctl.h index d952ee48f4dc..a4b2391fe66e 100644 --- a/fs/cifs/smbfsctl.h +++ b/fs/cifs/smbfsctl.h @@ -97,9 +97,23 @@ #define FSCTL_QUERY_NETWORK_INTERFACE_INFO 0x001401FC /* BB add struct */ #define FSCTL_SRV_READ_HASH 0x001441BB /* BB add struct */ +/* See FSCC 2.1.2.5 */ #define IO_REPARSE_TAG_MOUNT_POINT 0xA0000003 #define IO_REPARSE_TAG_HSM 0xC0000004 #define IO_REPARSE_TAG_SIS 0x80000007 +#define IO_REPARSE_TAG_HSM2 0x80000006 +#define IO_REPARSE_TAG_DRIVER_EXTENDER 0x80000005 +/* Used by the DFS filter. See MS-DFSC */ +#define IO_REPARSE_TAG_DFS 0x8000000A +/* Used by the DFS filter See MS-DFSC */ +#define IO_REPARSE_TAG_DFSR 0x80000012 +#define IO_REPARSE_TAG_FILTER_MANAGER 0x8000000B +/* See section MS-FSCC 2.1.2.4 */ +#define IO_REPARSE_TAG_SYMLINK 0xA000000C +#define IO_REPARSE_TAG_DEDUP 0x80000013 +#define IO_REPARSE_APPXSTREAM 0xC0000014 +/* NFS symlinks, Win 8/SMB3 and later */ +#define IO_REPARSE_TAG_NFS 0x80000014 /* fsctl flags */ /* If Flags is set to this value, the request is an FSCTL not ioctl request */ diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c index 6fdcb1b4a106..800b938e4061 100644 --- a/fs/cifs/transport.c +++ b/fs/cifs/transport.c @@ -410,8 +410,13 @@ static int wait_for_free_request(struct TCP_Server_Info *server, const int timeout, const int optype) { - return wait_for_free_credits(server, timeout, - server->ops->get_credits_field(server, optype)); + int *val; + + val = server->ops->get_credits_field(server, optype); + /* Since an echo is already inflight, no need to wait to send another */ + if (*val <= 0 && optype == CIFS_ECHO_OP) + return -EAGAIN; + return wait_for_free_credits(server, timeout, val); } static int allocate_mid(struct cifs_ses *ses, struct smb_hdr *in_buf, diff --git a/fs/exec.c b/fs/exec.c index 8875dd10ae7a..2ea437e5acf4 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1547,6 +1547,7 @@ static int do_execve_common(const char *filename, current->fs->in_exec = 0; current->in_execve = 0; acct_update_integrals(current); + task_numa_free(current); free_bprm(bprm); if (displaced) put_files_struct(displaced); diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c index 1194b1f0f839..f8cde46de9cd 100644 --- a/fs/ext3/namei.c +++ b/fs/ext3/namei.c @@ -1783,7 +1783,7 @@ retry: d_tmpfile(dentry, inode); err = ext3_orphan_add(handle, inode); if (err) - goto err_drop_inode; + goto err_unlock_inode; mark_inode_dirty(inode); unlock_new_inode(inode); } @@ -1791,10 +1791,9 @@ retry: if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries)) goto retry; return err; -err_drop_inode: +err_unlock_inode: ext3_journal_stop(handle); unlock_new_inode(inode); - iput(inode); return err; } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 0d424d7ac02b..e274e9c1171f 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2563,7 +2563,7 @@ retry: break; } blk_finish_plug(&plug); - if (!ret && !cycled) { + if (!ret && !cycled && wbc->nr_to_write > 0) { cycled = 1; mpd.last_page = writeback_index - 1; mpd.first_page = 0; diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 1bec5a5c1e45..5a0408d7b114 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2319,7 +2319,7 @@ retry: d_tmpfile(dentry, inode); err = ext4_orphan_add(handle, inode); if (err) - goto err_drop_inode; + goto err_unlock_inode; mark_inode_dirty(inode); unlock_new_inode(inode); } @@ -2328,10 +2328,9 @@ retry: if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries)) goto retry; return err; -err_drop_inode: +err_unlock_inode: ext4_journal_stop(handle); unlock_new_inode(inode); - iput(inode); return err; } diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index c081e34f717f..03e9bebba198 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1350,6 +1350,8 @@ retry: s_min_extra_isize) { tried_min_extra_isize++; new_extra_isize = s_min_extra_isize; + kfree(is); is = NULL; + kfree(bs); bs = NULL; goto retry; } error = -1; diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 62b43b577bfc..b7989f2ab4c4 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -182,6 +182,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) struct inode *inode; struct dentry *parent; struct fuse_conn *fc; + struct fuse_inode *fi; int ret; inode = ACCESS_ONCE(entry->d_inode); @@ -228,7 +229,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) if (!err && !outarg.nodeid) err = -ENOENT; if (!err) { - struct fuse_inode *fi = get_fuse_inode(inode); + fi = get_fuse_inode(inode); if (outarg.nodeid != get_node_id(inode)) { fuse_queue_forget(fc, forget, outarg.nodeid, 1); goto invalid; @@ -246,8 +247,11 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) attr_version); fuse_change_entry_timeout(entry, &outarg); } else if (inode) { - fc = get_fuse_conn(inode); - if (fc->readdirplus_auto) { + fi = get_fuse_inode(inode); + if (flags & LOOKUP_RCU) { + if (test_bit(FUSE_I_INIT_RDPLUS, &fi->state)) + return -ECHILD; + } else if (test_and_clear_bit(FUSE_I_INIT_RDPLUS, &fi->state)) { parent = dget_parent(entry); fuse_advise_use_readdirplus(parent->d_inode); dput(parent); @@ -259,7 +263,8 @@ out: invalid: ret = 0; - if (check_submounts_and_drop(entry) != 0) + + if (!(flags & LOOKUP_RCU) && check_submounts_and_drop(entry) != 0) ret = 1; goto out; } @@ -1063,6 +1068,8 @@ static int fuse_access(struct inode *inode, int mask) struct fuse_access_in inarg; int err; + BUG_ON(mask & MAY_NOT_BLOCK); + if (fc->no_access) return 0; @@ -1150,9 +1157,6 @@ static int fuse_permission(struct inode *inode, int mask) noticed immediately, only after the attribute timeout has expired */ } else if (mask & (MAY_ACCESS | MAY_CHDIR)) { - if (mask & MAY_NOT_BLOCK) - return -ECHILD; - err = fuse_access(inode, mask); } else if ((mask & MAY_EXEC) && S_ISREG(inode->i_mode)) { if (!(inode->i_mode & S_IXUGO)) { @@ -1291,6 +1295,8 @@ static int fuse_direntplus_link(struct file *file, } found: + if (fc->readdirplus_auto) + set_bit(FUSE_I_INIT_RDPLUS, &get_fuse_inode(inode)->state); fuse_change_entry_timeout(dentry, o); err = 0; diff --git a/fs/fuse/file.c b/fs/fuse/file.c index d409deafc67b..4598345ab87d 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2467,6 +2467,7 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, { struct fuse_file *ff = file->private_data; struct inode *inode = file->f_inode; + struct fuse_inode *fi = get_fuse_inode(inode); struct fuse_conn *fc = ff->fc; struct fuse_req *req; struct fuse_fallocate_in inarg = { @@ -2484,10 +2485,20 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, if (lock_inode) { mutex_lock(&inode->i_mutex); - if (mode & FALLOC_FL_PUNCH_HOLE) - fuse_set_nowrite(inode); + if (mode & FALLOC_FL_PUNCH_HOLE) { + loff_t endbyte = offset + length - 1; + err = filemap_write_and_wait_range(inode->i_mapping, + offset, endbyte); + if (err) + goto out; + + fuse_sync_writes(inode); + } } + if (!(mode & FALLOC_FL_KEEP_SIZE)) + set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); + req = fuse_get_req_nopages(fc); if (IS_ERR(req)) { err = PTR_ERR(req); @@ -2520,11 +2531,11 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, fuse_invalidate_attr(inode); out: - if (lock_inode) { - if (mode & FALLOC_FL_PUNCH_HOLE) - fuse_release_nowrite(inode); + if (!(mode & FALLOC_FL_KEEP_SIZE)) + clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); + + if (lock_inode) mutex_unlock(&inode->i_mutex); - } return err; } diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 5ced199b50bb..5b9e6f3b6aef 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -115,6 +115,8 @@ struct fuse_inode { enum { /** Advise readdirplus */ FUSE_I_ADVISE_RDPLUS, + /** Initialized with readdirplus */ + FUSE_I_INIT_RDPLUS, /** An operation changing file size is in progress */ FUSE_I_SIZE_UNSTABLE, }; diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 854a8f05a610..02b0df769e2d 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1458,7 +1458,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry, trace_nfs_atomic_open_enter(dir, ctx, open_flags); nfs_block_sillyrename(dentry->d_parent); - inode = NFS_PROTO(dir)->open_context(dir, ctx, open_flags, &attr); + inode = NFS_PROTO(dir)->open_context(dir, ctx, open_flags, &attr, opened); nfs_unblock_sillyrename(dentry->d_parent); if (IS_ERR(inode)) { err = PTR_ERR(inode); diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index e5b804dd944c..77efaf15ec90 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -19,6 +19,7 @@ nfs4_file_open(struct inode *inode, struct file *filp) struct inode *dir; unsigned openflags = filp->f_flags; struct iattr attr; + int opened = 0; int err; /* @@ -55,7 +56,7 @@ nfs4_file_open(struct inode *inode, struct file *filp) nfs_wb_all(inode); } - inode = NFS_PROTO(dir)->open_context(dir, ctx, openflags, &attr); + inode = NFS_PROTO(dir)->open_context(dir, ctx, openflags, &attr, &opened); if (IS_ERR(inode)) { err = PTR_ERR(inode); switch (err) { diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c index 95604f64cab8..c7c295e556ed 100644 --- a/fs/nfs/nfs4filelayoutdev.c +++ b/fs/nfs/nfs4filelayoutdev.c @@ -185,6 +185,7 @@ nfs4_ds_connect(struct nfs_server *mds_srv, struct nfs4_pnfs_ds *ds) if (status) goto out_put; + smp_wmb(); ds->ds_clp = clp; dprintk("%s [new] addr: %s\n", __func__, ds->ds_remotestr); out: @@ -801,34 +802,35 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx) struct nfs4_file_layout_dsaddr *dsaddr = FILELAYOUT_LSEG(lseg)->dsaddr; struct nfs4_pnfs_ds *ds = dsaddr->ds_list[ds_idx]; struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg); - - if (filelayout_test_devid_unavailable(devid)) - return NULL; + struct nfs4_pnfs_ds *ret = ds; if (ds == NULL) { printk(KERN_ERR "NFS: %s: No data server for offset index %d\n", __func__, ds_idx); filelayout_mark_devid_invalid(devid); - return NULL; + goto out; } + smp_rmb(); if (ds->ds_clp) - return ds; + goto out_test_devid; if (test_and_set_bit(NFS4DS_CONNECTING, &ds->ds_state) == 0) { struct nfs_server *s = NFS_SERVER(lseg->pls_layout->plh_inode); int err; err = nfs4_ds_connect(s, ds); - if (err) { + if (err) nfs4_mark_deviceid_unavailable(devid); - ds = NULL; - } nfs4_clear_ds_conn_bit(ds); } else { /* Either ds is connected, or ds is NULL */ nfs4_wait_ds_connect(ds); } - return ds; +out_test_devid: + if (filelayout_test_devid_unavailable(devid)) + ret = NULL; +out: + return ret; } module_param(dataserver_retrans, uint, 0644); diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 989bb9d3074d..d53d6785cba2 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -912,6 +912,7 @@ struct nfs4_opendata { struct iattr attrs; unsigned long timestamp; unsigned int rpc_done : 1; + unsigned int file_created : 1; unsigned int is_recover : 1; int rpc_status; int cancelled; @@ -1946,8 +1947,13 @@ static int _nfs4_proc_open(struct nfs4_opendata *data) nfs_fattr_map_and_free_names(server, &data->f_attr); - if (o_arg->open_flags & O_CREAT) + if (o_arg->open_flags & O_CREAT) { update_changeattr(dir, &o_res->cinfo); + if (o_arg->open_flags & O_EXCL) + data->file_created = 1; + else if (o_res->cinfo.before != o_res->cinfo.after) + data->file_created = 1; + } if ((o_res->rflags & NFS4_OPEN_RESULT_LOCKTYPE_POSIX) == 0) server->caps &= ~NFS_CAP_POSIX_LOCK; if(o_res->rflags & NFS4_OPEN_RESULT_CONFIRM) { @@ -2191,7 +2197,8 @@ static int _nfs4_do_open(struct inode *dir, struct nfs_open_context *ctx, int flags, struct iattr *sattr, - struct nfs4_label *label) + struct nfs4_label *label, + int *opened) { struct nfs4_state_owner *sp; struct nfs4_state *state = NULL; @@ -2261,6 +2268,8 @@ static int _nfs4_do_open(struct inode *dir, nfs_setsecurity(state->inode, opendata->o_res.f_attr, olabel); } } + if (opendata->file_created) + *opened |= FILE_CREATED; if (pnfs_use_threshold(ctx_th, opendata->f_attr.mdsthreshold, server)) *ctx_th = opendata->f_attr.mdsthreshold; @@ -2289,7 +2298,8 @@ static struct nfs4_state *nfs4_do_open(struct inode *dir, struct nfs_open_context *ctx, int flags, struct iattr *sattr, - struct nfs4_label *label) + struct nfs4_label *label, + int *opened) { struct nfs_server *server = NFS_SERVER(dir); struct nfs4_exception exception = { }; @@ -2297,7 +2307,7 @@ static struct nfs4_state *nfs4_do_open(struct inode *dir, int status; do { - status = _nfs4_do_open(dir, ctx, flags, sattr, label); + status = _nfs4_do_open(dir, ctx, flags, sattr, label, opened); res = ctx->state; trace_nfs4_open_file(ctx, flags, status); if (status == 0) @@ -2659,7 +2669,8 @@ out: } static struct inode * -nfs4_atomic_open(struct inode *dir, struct nfs_open_context *ctx, int open_flags, struct iattr *attr) +nfs4_atomic_open(struct inode *dir, struct nfs_open_context *ctx, + int open_flags, struct iattr *attr, int *opened) { struct nfs4_state *state; struct nfs4_label l = {0, 0, 0, NULL}, *label = NULL; @@ -2667,7 +2678,7 @@ nfs4_atomic_open(struct inode *dir, struct nfs_open_context *ctx, int open_flags label = nfs4_label_init_security(dir, ctx->dentry, attr, &l); /* Protect against concurrent sillydeletes */ - state = nfs4_do_open(dir, ctx, open_flags, attr, label); + state = nfs4_do_open(dir, ctx, open_flags, attr, label, opened); nfs4_label_release_security(label); @@ -3332,6 +3343,7 @@ nfs4_proc_create(struct inode *dir, struct dentry *dentry, struct iattr *sattr, struct nfs4_label l, *ilabel = NULL; struct nfs_open_context *ctx; struct nfs4_state *state; + int opened = 0; int status = 0; ctx = alloc_nfs_open_context(dentry, FMODE_READ); @@ -3341,7 +3353,7 @@ nfs4_proc_create(struct inode *dir, struct dentry *dentry, struct iattr *sattr, ilabel = nfs4_label_init_security(dir, dentry, sattr, &l); sattr->ia_mode &= ~current_umask(); - state = nfs4_do_open(dir, ctx, flags, sattr, ilabel); + state = nfs4_do_open(dir, ctx, flags, sattr, ilabel, &opened); if (IS_ERR(state)) { status = PTR_ERR(state); goto out; @@ -7564,8 +7576,10 @@ nfs41_find_root_sec(struct nfs_server *server, struct nfs_fh *fhandle, { int err; struct page *page; - rpc_authflavor_t flavor; + rpc_authflavor_t flavor = RPC_AUTH_MAXFLAVOR; struct nfs4_secinfo_flavors *flavors; + struct nfs4_secinfo4 *secinfo; + int i; page = alloc_page(GFP_KERNEL); if (!page) { @@ -7587,9 +7601,31 @@ nfs41_find_root_sec(struct nfs_server *server, struct nfs_fh *fhandle, if (err) goto out_freepage; - flavor = nfs_find_best_sec(flavors); - if (err == 0) - err = nfs4_lookup_root_sec(server, fhandle, info, flavor); + for (i = 0; i < flavors->num_flavors; i++) { + secinfo = &flavors->flavors[i]; + + switch (secinfo->flavor) { + case RPC_AUTH_NULL: + case RPC_AUTH_UNIX: + case RPC_AUTH_GSS: + flavor = rpcauth_get_pseudoflavor(secinfo->flavor, + &secinfo->flavor_info); + break; + default: + flavor = RPC_AUTH_MAXFLAVOR; + break; + } + + if (flavor != RPC_AUTH_MAXFLAVOR) { + err = nfs4_lookup_root_sec(server, fhandle, + info, flavor); + if (!err) + break; + } + } + + if (flavor == RPC_AUTH_MAXFLAVOR) + err = -EPERM; out_freepage: put_page(page); diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 0ba679866e50..da276640f776 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -94,6 +94,7 @@ void nilfs_forget_buffer(struct buffer_head *bh) clear_buffer_nilfs_volatile(bh); clear_buffer_nilfs_checked(bh); clear_buffer_nilfs_redirected(bh); + clear_buffer_async_write(bh); clear_buffer_dirty(bh); if (nilfs_page_buffers_clean(page)) __nilfs_clear_page_dirty(page); @@ -429,6 +430,7 @@ void nilfs_clear_dirty_page(struct page *page, bool silent) "discard block %llu, size %zu", (u64)bh->b_blocknr, bh->b_size); } + clear_buffer_async_write(bh); clear_buffer_dirty(bh); clear_buffer_nilfs_volatile(bh); clear_buffer_nilfs_checked(bh); diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index bd88a7461063..9f6b486b6c01 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -665,7 +665,7 @@ static size_t nilfs_lookup_dirty_data_buffers(struct inode *inode, bh = head = page_buffers(page); do { - if (!buffer_dirty(bh)) + if (!buffer_dirty(bh) || buffer_async_write(bh)) continue; get_bh(bh); list_add_tail(&bh->b_assoc_buffers, listp); @@ -699,7 +699,8 @@ static void nilfs_lookup_dirty_node_buffers(struct inode *inode, for (i = 0; i < pagevec_count(&pvec); i++) { bh = head = page_buffers(pvec.pages[i]); do { - if (buffer_dirty(bh)) { + if (buffer_dirty(bh) && + !buffer_async_write(bh)) { get_bh(bh); list_add_tail(&bh->b_assoc_buffers, listp); @@ -1579,6 +1580,7 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci) list_for_each_entry(bh, &segbuf->sb_segsum_buffers, b_assoc_buffers) { + set_buffer_async_write(bh); if (bh->b_page != bd_page) { if (bd_page) { lock_page(bd_page); @@ -1592,6 +1594,7 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci) list_for_each_entry(bh, &segbuf->sb_payload_buffers, b_assoc_buffers) { + set_buffer_async_write(bh); if (bh == segbuf->sb_super_root) { if (bh->b_page != bd_page) { lock_page(bd_page); @@ -1677,6 +1680,7 @@ static void nilfs_abort_logs(struct list_head *logs, int err) list_for_each_entry(segbuf, logs, sb_list) { list_for_each_entry(bh, &segbuf->sb_segsum_buffers, b_assoc_buffers) { + clear_buffer_async_write(bh); if (bh->b_page != bd_page) { if (bd_page) end_page_writeback(bd_page); @@ -1686,6 +1690,7 @@ static void nilfs_abort_logs(struct list_head *logs, int err) list_for_each_entry(bh, &segbuf->sb_payload_buffers, b_assoc_buffers) { + clear_buffer_async_write(bh); if (bh == segbuf->sb_super_root) { if (bh->b_page != bd_page) { end_page_writeback(bd_page); @@ -1755,6 +1760,7 @@ static void nilfs_segctor_complete_write(struct nilfs_sc_info *sci) b_assoc_buffers) { set_buffer_uptodate(bh); clear_buffer_dirty(bh); + clear_buffer_async_write(bh); if (bh->b_page != bd_page) { if (bd_page) end_page_writeback(bd_page); @@ -1776,6 +1782,7 @@ static void nilfs_segctor_complete_write(struct nilfs_sc_info *sci) b_assoc_buffers) { set_buffer_uptodate(bh); clear_buffer_dirty(bh); + clear_buffer_async_write(bh); clear_buffer_delay(bh); clear_buffer_nilfs_volatile(bh); clear_buffer_nilfs_redirected(bh); diff --git a/fs/ocfs2/dcache.c b/fs/ocfs2/dcache.c index ef999729e274..0d3a97d2d5f6 100644 --- a/fs/ocfs2/dcache.c +++ b/fs/ocfs2/dcache.c @@ -70,9 +70,10 @@ static int ocfs2_dentry_revalidate(struct dentry *dentry, unsigned int flags) */ if (inode == NULL) { unsigned long gen = (unsigned long) dentry->d_fsdata; - unsigned long pgen = - OCFS2_I(dentry->d_parent->d_inode)->ip_dir_lock_gen; - + unsigned long pgen; + spin_lock(&dentry->d_lock); + pgen = OCFS2_I(dentry->d_parent->d_inode)->ip_dir_lock_gen; + spin_unlock(&dentry->d_lock); trace_ocfs2_dentry_revalidate_negative(dentry->d_name.len, dentry->d_name.name, pgen, gen); diff --git a/fs/proc/array.c b/fs/proc/array.c index cbd0f1b324b9..1bd2077187fd 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -183,6 +183,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, seq_printf(m, "State:\t%s\n" "Tgid:\t%d\n" + "Ngid:\t%d\n" "Pid:\t%d\n" "PPid:\t%d\n" "TracerPid:\t%d\n" @@ -190,6 +191,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, "Gid:\t%d\t%d\t%d\t%d\n", get_task_state(p), task_tgid_nr_ns(p, ns), + task_numa_group_id(p), pid_nr_ns(pid, ns), ppid, tpid, from_kuid_munged(user_ns, cred->uid), diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 9f8ef9b7674d..8eaa1ba793fc 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -288,10 +288,14 @@ static int proc_reg_mmap(struct file *file, struct vm_area_struct *vma) static unsigned long proc_reg_get_unmapped_area(struct file *file, unsigned long orig_addr, unsigned long len, unsigned long pgoff, unsigned long flags) { struct proc_dir_entry *pde = PDE(file_inode(file)); - int rv = -EIO; - unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); + unsigned long rv = -EIO; + unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long) = NULL; if (use_pde(pde)) { - get_unmapped_area = pde->proc_fops->get_unmapped_area; +#ifdef CONFIG_MMU + get_unmapped_area = current->mm->get_unmapped_area; +#endif + if (pde->proc_fops->get_unmapped_area) + get_unmapped_area = pde->proc_fops->get_unmapped_area; if (get_unmapped_area) rv = get_unmapped_area(file, orig_addr, len, pgoff, flags); unuse_pde(pde); diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 7366e9d63cee..390bdab01c3c 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -941,6 +941,8 @@ static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm, frame = pte_pfn(pte); flags = PM_PRESENT; page = vm_normal_page(vma, addr, pte); + if (pte_soft_dirty(pte)) + flags2 |= __PM_SOFT_DIRTY; } else if (is_swap_pte(pte)) { swp_entry_t entry; if (pte_swp_soft_dirty(pte)) @@ -960,7 +962,7 @@ static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm, if (page && !PageAnon(page)) flags |= PM_FILE; - if ((vma->vm_flags & VM_SOFTDIRTY) || pte_soft_dirty(pte)) + if ((vma->vm_flags & VM_SOFTDIRTY)) flags2 |= __PM_SOFT_DIRTY; *pme = make_pme(PM_PFRAME(frame) | PM_STATUS2(pm->v2, flags2) | flags); diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c index 73feacc49b2e..fd777032c2ba 100644 --- a/fs/reiserfs/journal.c +++ b/fs/reiserfs/journal.c @@ -1163,21 +1163,6 @@ static struct reiserfs_journal_list *find_newer_jl_for_cn(struct return NULL; } -static int newer_jl_done(struct reiserfs_journal_cnode *cn) -{ - struct super_block *sb = cn->sb; - b_blocknr_t blocknr = cn->blocknr; - - cn = cn->hprev; - while (cn) { - if (cn->sb == sb && cn->blocknr == blocknr && cn->jlist && - atomic_read(&cn->jlist->j_commit_left) != 0) - return 0; - cn = cn->hprev; - } - return 1; -} - static void remove_journal_hash(struct super_block *, struct reiserfs_journal_cnode **, struct reiserfs_journal_list *, unsigned long, @@ -1353,7 +1338,6 @@ static int flush_journal_list(struct super_block *s, reiserfs_warning(s, "clm-2048", "called with wcount %d", atomic_read(&journal->j_wcount)); } - BUG_ON(jl->j_trans_id == 0); /* if flushall == 0, the lock is already held */ if (flushall) { @@ -1593,31 +1577,6 @@ static int flush_journal_list(struct super_block *s, return err; } -static int test_transaction(struct super_block *s, - struct reiserfs_journal_list *jl) -{ - struct reiserfs_journal_cnode *cn; - - if (jl->j_len == 0 || atomic_read(&jl->j_nonzerolen) == 0) - return 1; - - cn = jl->j_realblock; - while (cn) { - /* if the blocknr == 0, this has been cleared from the hash, - ** skip it - */ - if (cn->blocknr == 0) { - goto next; - } - if (cn->bh && !newer_jl_done(cn)) - return 0; - next: - cn = cn->next; - cond_resched(); - } - return 0; -} - static int write_one_transaction(struct super_block *s, struct reiserfs_journal_list *jl, struct buffer_chunk *chunk) @@ -1805,6 +1764,8 @@ static int flush_used_journal_lists(struct super_block *s, break; tjl = JOURNAL_LIST_ENTRY(tjl->j_list.next); } + get_journal_list(jl); + get_journal_list(flush_jl); /* try to find a group of blocks we can flush across all the ** transactions, but only bother if we've actually spanned ** across multiple lists @@ -1813,6 +1774,8 @@ static int flush_used_journal_lists(struct super_block *s, ret = kupdate_transactions(s, jl, &tjl, &trans_id, len, i); } flush_journal_list(s, flush_jl, 1); + put_journal_list(s, flush_jl); + put_journal_list(s, jl); return 0; } @@ -3868,27 +3831,6 @@ int reiserfs_prepare_for_journal(struct super_block *sb, return 1; } -static void flush_old_journal_lists(struct super_block *s) -{ - struct reiserfs_journal *journal = SB_JOURNAL(s); - struct reiserfs_journal_list *jl; - struct list_head *entry; - time_t now = get_seconds(); - - while (!list_empty(&journal->j_journal_list)) { - entry = journal->j_journal_list.next; - jl = JOURNAL_LIST_ENTRY(entry); - /* this check should always be run, to send old lists to disk */ - if (jl->j_timestamp < (now - (JOURNAL_MAX_TRANS_AGE * 4)) && - atomic_read(&jl->j_commit_left) == 0 && - test_transaction(s, jl)) { - flush_used_journal_lists(s, jl); - } else { - break; - } - } -} - /* ** long and ugly. If flush, will not return until all commit ** blocks and all real buffers in the trans are on disk. @@ -4232,7 +4174,6 @@ static int do_journal_end(struct reiserfs_transaction_handle *th, } } } - flush_old_journal_lists(sb); journal->j_current_jl->j_list_bitmap = get_list_bitmap(sb, journal->j_current_jl); diff --git a/fs/statfs.c b/fs/statfs.c index c219e733f553..083dc0ac9140 100644 --- a/fs/statfs.c +++ b/fs/statfs.c @@ -94,7 +94,7 @@ retry: int fd_statfs(int fd, struct kstatfs *st) { - struct fd f = fdget(fd); + struct fd f = fdget_raw(fd); int error = -EBADF; if (f.file) { error = vfs_statfs(&f.file->f_path, st); diff --git a/fs/super.c b/fs/super.c index 3a96c9783a8b..0225c20f8770 100644 --- a/fs/super.c +++ b/fs/super.c @@ -264,6 +264,8 @@ out_free_sb: */ static inline void destroy_super(struct super_block *s) { + list_lru_destroy(&s->s_dentry_lru); + list_lru_destroy(&s->s_inode_lru); #ifdef CONFIG_SMP free_percpu(s->s_files); #endif @@ -323,8 +325,6 @@ void deactivate_locked_super(struct super_block *s) /* caches are now gone, we can safely kill the shrinker now */ unregister_shrinker(&s->s_shrink); - list_lru_destroy(&s->s_dentry_lru); - list_lru_destroy(&s->s_inode_lru); put_filesystem(fs); put_super(s); diff --git a/fs/sysv/super.c b/fs/sysv/super.c index d0c6a007ce83..eda10959714f 100644 --- a/fs/sysv/super.c +++ b/fs/sysv/super.c @@ -487,6 +487,7 @@ static int v7_fill_super(struct super_block *sb, void *data, int silent) sbi->s_sb = sb; sbi->s_block_base = 0; sbi->s_type = FSTYPE_V7; + mutex_init(&sbi->s_lock); sb->s_fs_info = sbi; sb_set_blocksize(sb, 512); diff --git a/fs/udf/ialloc.c b/fs/udf/ialloc.c index 7e5aae4bf46f..6eaf5edf1ea1 100644 --- a/fs/udf/ialloc.c +++ b/fs/udf/ialloc.c @@ -30,18 +30,17 @@ void udf_free_inode(struct inode *inode) { struct super_block *sb = inode->i_sb; struct udf_sb_info *sbi = UDF_SB(sb); + struct logicalVolIntegrityDescImpUse *lvidiu = udf_sb_lvidiu(sb); - mutex_lock(&sbi->s_alloc_mutex); - if (sbi->s_lvid_bh) { - struct logicalVolIntegrityDescImpUse *lvidiu = - udf_sb_lvidiu(sbi); + if (lvidiu) { + mutex_lock(&sbi->s_alloc_mutex); if (S_ISDIR(inode->i_mode)) le32_add_cpu(&lvidiu->numDirs, -1); else le32_add_cpu(&lvidiu->numFiles, -1); udf_updated_lvid(sb); + mutex_unlock(&sbi->s_alloc_mutex); } - mutex_unlock(&sbi->s_alloc_mutex); udf_free_blocks(sb, NULL, &UDF_I(inode)->i_location, 0, 1); } @@ -55,6 +54,7 @@ struct inode *udf_new_inode(struct inode *dir, umode_t mode, int *err) uint32_t start = UDF_I(dir)->i_location.logicalBlockNum; struct udf_inode_info *iinfo; struct udf_inode_info *dinfo = UDF_I(dir); + struct logicalVolIntegrityDescImpUse *lvidiu; inode = new_inode(sb); @@ -92,12 +92,10 @@ struct inode *udf_new_inode(struct inode *dir, umode_t mode, int *err) return NULL; } - if (sbi->s_lvid_bh) { - struct logicalVolIntegrityDescImpUse *lvidiu; - + lvidiu = udf_sb_lvidiu(sb); + if (lvidiu) { iinfo->i_unique = lvid_get_unique_id(sb); mutex_lock(&sbi->s_alloc_mutex); - lvidiu = udf_sb_lvidiu(sbi); if (S_ISDIR(mode)) le32_add_cpu(&lvidiu->numDirs, 1); else diff --git a/fs/udf/super.c b/fs/udf/super.c index 839a2bad7f45..91219385691d 100644 --- a/fs/udf/super.c +++ b/fs/udf/super.c @@ -94,13 +94,25 @@ static unsigned int udf_count_free(struct super_block *); static int udf_statfs(struct dentry *, struct kstatfs *); static int udf_show_options(struct seq_file *, struct dentry *); -struct logicalVolIntegrityDescImpUse *udf_sb_lvidiu(struct udf_sb_info *sbi) +struct logicalVolIntegrityDescImpUse *udf_sb_lvidiu(struct super_block *sb) { - struct logicalVolIntegrityDesc *lvid = - (struct logicalVolIntegrityDesc *)sbi->s_lvid_bh->b_data; - __u32 number_of_partitions = le32_to_cpu(lvid->numOfPartitions); - __u32 offset = number_of_partitions * 2 * - sizeof(uint32_t)/sizeof(uint8_t); + struct logicalVolIntegrityDesc *lvid; + unsigned int partnum; + unsigned int offset; + + if (!UDF_SB(sb)->s_lvid_bh) + return NULL; + lvid = (struct logicalVolIntegrityDesc *)UDF_SB(sb)->s_lvid_bh->b_data; + partnum = le32_to_cpu(lvid->numOfPartitions); + if ((sb->s_blocksize - sizeof(struct logicalVolIntegrityDescImpUse) - + offsetof(struct logicalVolIntegrityDesc, impUse)) / + (2 * sizeof(uint32_t)) < partnum) { + udf_err(sb, "Logical volume integrity descriptor corrupted " + "(numOfPartitions = %u)!\n", partnum); + return NULL; + } + /* The offset is to skip freeSpaceTable and sizeTable arrays */ + offset = partnum * 2 * sizeof(uint32_t); return (struct logicalVolIntegrityDescImpUse *)&(lvid->impUse[offset]); } @@ -629,9 +641,10 @@ static int udf_remount_fs(struct super_block *sb, int *flags, char *options) struct udf_options uopt; struct udf_sb_info *sbi = UDF_SB(sb); int error = 0; + struct logicalVolIntegrityDescImpUse *lvidiu = udf_sb_lvidiu(sb); - if (sbi->s_lvid_bh) { - int write_rev = le16_to_cpu(udf_sb_lvidiu(sbi)->minUDFWriteRev); + if (lvidiu) { + int write_rev = le16_to_cpu(lvidiu->minUDFWriteRev); if (write_rev > UDF_MAX_WRITE_VERSION && !(*flags & MS_RDONLY)) return -EACCES; } @@ -1905,11 +1918,12 @@ static void udf_open_lvid(struct super_block *sb) if (!bh) return; - - mutex_lock(&sbi->s_alloc_mutex); lvid = (struct logicalVolIntegrityDesc *)bh->b_data; - lvidiu = udf_sb_lvidiu(sbi); + lvidiu = udf_sb_lvidiu(sb); + if (!lvidiu) + return; + mutex_lock(&sbi->s_alloc_mutex); lvidiu->impIdent.identSuffix[0] = UDF_OS_CLASS_UNIX; lvidiu->impIdent.identSuffix[1] = UDF_OS_ID_LINUX; udf_time_to_disk_stamp(&lvid->recordingDateAndTime, @@ -1937,10 +1951,12 @@ static void udf_close_lvid(struct super_block *sb) if (!bh) return; + lvid = (struct logicalVolIntegrityDesc *)bh->b_data; + lvidiu = udf_sb_lvidiu(sb); + if (!lvidiu) + return; mutex_lock(&sbi->s_alloc_mutex); - lvid = (struct logicalVolIntegrityDesc *)bh->b_data; - lvidiu = udf_sb_lvidiu(sbi); lvidiu->impIdent.identSuffix[0] = UDF_OS_CLASS_UNIX; lvidiu->impIdent.identSuffix[1] = UDF_OS_ID_LINUX; udf_time_to_disk_stamp(&lvid->recordingDateAndTime, CURRENT_TIME); @@ -2093,15 +2109,19 @@ static int udf_fill_super(struct super_block *sb, void *options, int silent) if (sbi->s_lvid_bh) { struct logicalVolIntegrityDescImpUse *lvidiu = - udf_sb_lvidiu(sbi); - uint16_t minUDFReadRev = le16_to_cpu(lvidiu->minUDFReadRev); - uint16_t minUDFWriteRev = le16_to_cpu(lvidiu->minUDFWriteRev); - /* uint16_t maxUDFWriteRev = - le16_to_cpu(lvidiu->maxUDFWriteRev); */ + udf_sb_lvidiu(sb); + uint16_t minUDFReadRev; + uint16_t minUDFWriteRev; + if (!lvidiu) { + ret = -EINVAL; + goto error_out; + } + minUDFReadRev = le16_to_cpu(lvidiu->minUDFReadRev); + minUDFWriteRev = le16_to_cpu(lvidiu->minUDFWriteRev); if (minUDFReadRev > UDF_MAX_READ_VERSION) { udf_err(sb, "minUDFReadRev=%x (max is %x)\n", - le16_to_cpu(lvidiu->minUDFReadRev), + minUDFReadRev, UDF_MAX_READ_VERSION); ret = -EINVAL; goto error_out; @@ -2265,11 +2285,7 @@ static int udf_statfs(struct dentry *dentry, struct kstatfs *buf) struct logicalVolIntegrityDescImpUse *lvidiu; u64 id = huge_encode_dev(sb->s_bdev->bd_dev); - if (sbi->s_lvid_bh != NULL) - lvidiu = udf_sb_lvidiu(sbi); - else - lvidiu = NULL; - + lvidiu = udf_sb_lvidiu(sb); buf->f_type = UDF_SUPER_MAGIC; buf->f_bsize = sb->s_blocksize; buf->f_blocks = sbi->s_partmaps[sbi->s_partition].s_partition_len; diff --git a/fs/udf/udf_sb.h b/fs/udf/udf_sb.h index ed401e94aa8c..1f32c7bd9f57 100644 --- a/fs/udf/udf_sb.h +++ b/fs/udf/udf_sb.h @@ -162,7 +162,7 @@ static inline struct udf_sb_info *UDF_SB(struct super_block *sb) return sb->s_fs_info; } -struct logicalVolIntegrityDescImpUse *udf_sb_lvidiu(struct udf_sb_info *sbi); +struct logicalVolIntegrityDescImpUse *udf_sb_lvidiu(struct super_block *sb); int udf_compute_nr_groups(struct super_block *sb, u32 partition); diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 88c5ea75ebf6..f1d85cfc0a54 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -628,6 +628,7 @@ xfs_buf_item_unlock( else if (aborted) { ASSERT(XFS_FORCED_SHUTDOWN(lip->li_mountp)); if (lip->li_flags & XFS_LI_IN_AIL) { + spin_lock(&lip->li_ailp->xa_lock); xfs_trans_ail_delete(lip->li_ailp, lip, SHUTDOWN_LOG_IO_ERROR); } diff --git a/fs/xfs/xfs_da_btree.c b/fs/xfs/xfs_da_btree.c index 069537c845e5..20bf8e8002d6 100644 --- a/fs/xfs/xfs_da_btree.c +++ b/fs/xfs/xfs_da_btree.c @@ -1224,6 +1224,7 @@ xfs_da3_node_toosmall( /* start with smaller blk num */ forward = nodehdr.forw < nodehdr.back; for (i = 0; i < 2; forward = !forward, i++) { + struct xfs_da3_icnode_hdr thdr; if (forward) blkno = nodehdr.forw; else @@ -1236,10 +1237,10 @@ xfs_da3_node_toosmall( return(error); node = bp->b_addr; - xfs_da3_node_hdr_from_disk(&nodehdr, node); + xfs_da3_node_hdr_from_disk(&thdr, node); xfs_trans_brelse(state->args->trans, bp); - if (count - nodehdr.count >= 0) + if (count - thdr.count >= 0) break; /* fits with at least 25% to spare */ } if (i >= 2) { diff --git a/fs/xfs/xfs_dir2_block.c b/fs/xfs/xfs_dir2_block.c index 0957aa98b6c0..12dad188939d 100644 --- a/fs/xfs/xfs_dir2_block.c +++ b/fs/xfs/xfs_dir2_block.c @@ -1158,7 +1158,7 @@ xfs_dir2_sf_to_block( /* * Create entry for . */ - dep = xfs_dir3_data_dot_entry_p(hdr); + dep = xfs_dir3_data_dot_entry_p(mp, hdr); dep->inumber = cpu_to_be64(dp->i_ino); dep->namelen = 1; dep->name[0] = '.'; @@ -1172,7 +1172,7 @@ xfs_dir2_sf_to_block( /* * Create entry for .. */ - dep = xfs_dir3_data_dotdot_entry_p(hdr); + dep = xfs_dir3_data_dotdot_entry_p(mp, hdr); dep->inumber = cpu_to_be64(xfs_dir2_sf_get_parent_ino(sfp)); dep->namelen = 2; dep->name[0] = dep->name[1] = '.'; @@ -1183,7 +1183,7 @@ xfs_dir2_sf_to_block( blp[1].hashval = cpu_to_be32(xfs_dir_hash_dotdot); blp[1].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(mp, (char *)dep - (char *)hdr)); - offset = xfs_dir3_data_first_offset(hdr); + offset = xfs_dir3_data_first_offset(mp); /* * Loop over existing entries, stuff them in. */ diff --git a/fs/xfs/xfs_dir2_format.h b/fs/xfs/xfs_dir2_format.h index a0961a61ac1a..9cf67381adf6 100644 --- a/fs/xfs/xfs_dir2_format.h +++ b/fs/xfs/xfs_dir2_format.h @@ -497,69 +497,58 @@ xfs_dir3_data_unused_p(struct xfs_dir2_data_hdr *hdr) /* * Offsets of . and .. in data space (always block 0) * - * The macros are used for shortform directories as they have no headers to read - * the magic number out of. Shortform directories need to know the size of the - * data block header because the sfe embeds the block offset of the entry into - * it so that it doesn't change when format conversion occurs. Bad Things Happen - * if we don't follow this rule. - * * XXX: there is scope for significant optimisation of the logic here. Right * now we are checking for "dir3 format" over and over again. Ideally we should * only do it once for each operation. */ -#define XFS_DIR3_DATA_DOT_OFFSET(mp) \ - xfs_dir3_data_hdr_size(xfs_sb_version_hascrc(&(mp)->m_sb)) -#define XFS_DIR3_DATA_DOTDOT_OFFSET(mp) \ - (XFS_DIR3_DATA_DOT_OFFSET(mp) + xfs_dir3_data_entsize(mp, 1)) -#define XFS_DIR3_DATA_FIRST_OFFSET(mp) \ - (XFS_DIR3_DATA_DOTDOT_OFFSET(mp) + xfs_dir3_data_entsize(mp, 2)) - static inline xfs_dir2_data_aoff_t -xfs_dir3_data_dot_offset(struct xfs_dir2_data_hdr *hdr) +xfs_dir3_data_dot_offset(struct xfs_mount *mp) { - return xfs_dir3_data_entry_offset(hdr); + return xfs_dir3_data_hdr_size(xfs_sb_version_hascrc(&mp->m_sb)); } static inline xfs_dir2_data_aoff_t -xfs_dir3_data_dotdot_offset(struct xfs_dir2_data_hdr *hdr) +xfs_dir3_data_dotdot_offset(struct xfs_mount *mp) { - bool dir3 = hdr->magic == cpu_to_be32(XFS_DIR3_DATA_MAGIC) || - hdr->magic == cpu_to_be32(XFS_DIR3_BLOCK_MAGIC); - return xfs_dir3_data_dot_offset(hdr) + - __xfs_dir3_data_entsize(dir3, 1); + return xfs_dir3_data_dot_offset(mp) + + xfs_dir3_data_entsize(mp, 1); } static inline xfs_dir2_data_aoff_t -xfs_dir3_data_first_offset(struct xfs_dir2_data_hdr *hdr) +xfs_dir3_data_first_offset(struct xfs_mount *mp) { - bool dir3 = hdr->magic == cpu_to_be32(XFS_DIR3_DATA_MAGIC) || - hdr->magic == cpu_to_be32(XFS_DIR3_BLOCK_MAGIC); - return xfs_dir3_data_dotdot_offset(hdr) + - __xfs_dir3_data_entsize(dir3, 2); + return xfs_dir3_data_dotdot_offset(mp) + + xfs_dir3_data_entsize(mp, 2); } /* * location of . and .. in data space (always block 0) */ static inline struct xfs_dir2_data_entry * -xfs_dir3_data_dot_entry_p(struct xfs_dir2_data_hdr *hdr) +xfs_dir3_data_dot_entry_p( + struct xfs_mount *mp, + struct xfs_dir2_data_hdr *hdr) { return (struct xfs_dir2_data_entry *) - ((char *)hdr + xfs_dir3_data_dot_offset(hdr)); + ((char *)hdr + xfs_dir3_data_dot_offset(mp)); } static inline struct xfs_dir2_data_entry * -xfs_dir3_data_dotdot_entry_p(struct xfs_dir2_data_hdr *hdr) +xfs_dir3_data_dotdot_entry_p( + struct xfs_mount *mp, + struct xfs_dir2_data_hdr *hdr) { return (struct xfs_dir2_data_entry *) - ((char *)hdr + xfs_dir3_data_dotdot_offset(hdr)); + ((char *)hdr + xfs_dir3_data_dotdot_offset(mp)); } static inline struct xfs_dir2_data_entry * -xfs_dir3_data_first_entry_p(struct xfs_dir2_data_hdr *hdr) +xfs_dir3_data_first_entry_p( + struct xfs_mount *mp, + struct xfs_dir2_data_hdr *hdr) { return (struct xfs_dir2_data_entry *) - ((char *)hdr + xfs_dir3_data_first_offset(hdr)); + ((char *)hdr + xfs_dir3_data_first_offset(mp)); } /* diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c index 8993ec17452c..8f84153e98a8 100644 --- a/fs/xfs/xfs_dir2_readdir.c +++ b/fs/xfs/xfs_dir2_readdir.c @@ -119,9 +119,9 @@ xfs_dir2_sf_getdents( * mp->m_dirdatablk. */ dot_offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk, - XFS_DIR3_DATA_DOT_OFFSET(mp)); + xfs_dir3_data_dot_offset(mp)); dotdot_offset = xfs_dir2_db_off_to_dataptr(mp, mp->m_dirdatablk, - XFS_DIR3_DATA_DOTDOT_OFFSET(mp)); + xfs_dir3_data_dotdot_offset(mp)); /* * Put . entry unless we're starting past it. diff --git a/fs/xfs/xfs_dir2_sf.c b/fs/xfs/xfs_dir2_sf.c index bb6e2848f473..3ef6d402084c 100644 --- a/fs/xfs/xfs_dir2_sf.c +++ b/fs/xfs/xfs_dir2_sf.c @@ -557,7 +557,7 @@ xfs_dir2_sf_addname_hard( * to insert the new entry. * If it's going to end up at the end then oldsfep will point there. */ - for (offset = XFS_DIR3_DATA_FIRST_OFFSET(mp), + for (offset = xfs_dir3_data_first_offset(mp), oldsfep = xfs_dir2_sf_firstentry(oldsfp), add_datasize = xfs_dir3_data_entsize(mp, args->namelen), eof = (char *)oldsfep == &buf[old_isize]; @@ -640,7 +640,7 @@ xfs_dir2_sf_addname_pick( sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data; size = xfs_dir3_data_entsize(mp, args->namelen); - offset = XFS_DIR3_DATA_FIRST_OFFSET(mp); + offset = xfs_dir3_data_first_offset(mp); sfep = xfs_dir2_sf_firstentry(sfp); holefit = 0; /* @@ -713,7 +713,7 @@ xfs_dir2_sf_check( mp = dp->i_mount; sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data; - offset = XFS_DIR3_DATA_FIRST_OFFSET(mp); + offset = xfs_dir3_data_first_offset(mp); ino = xfs_dir2_sf_get_parent_ino(sfp); i8count = ino > XFS_DIR2_MAX_SHORT_INUM; diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index 71520e6e5d65..1ee776d477c3 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -64,7 +64,8 @@ int xfs_dqerror_mod = 33; struct kmem_zone *xfs_qm_dqtrxzone; static struct kmem_zone *xfs_qm_dqzone; -static struct lock_class_key xfs_dquot_other_class; +static struct lock_class_key xfs_dquot_group_class; +static struct lock_class_key xfs_dquot_project_class; /* * This is called to free all the memory associated with a dquot @@ -703,8 +704,20 @@ xfs_qm_dqread( * Make sure group quotas have a different lock class than user * quotas. */ - if (!(type & XFS_DQ_USER)) - lockdep_set_class(&dqp->q_qlock, &xfs_dquot_other_class); + switch (type) { + case XFS_DQ_USER: + /* uses the default lock class */ + break; + case XFS_DQ_GROUP: + lockdep_set_class(&dqp->q_qlock, &xfs_dquot_group_class); + break; + case XFS_DQ_PROJ: + lockdep_set_class(&dqp->q_qlock, &xfs_dquot_project_class); + break; + default: + ASSERT(0); + break; + } XFS_STATS_INC(xs_qm_dquot); diff --git a/fs/xfs/xfs_fs.h b/fs/xfs/xfs_fs.h index 1edb5cc3e5f4..18272c766a50 100644 --- a/fs/xfs/xfs_fs.h +++ b/fs/xfs/xfs_fs.h @@ -515,7 +515,7 @@ typedef struct xfs_swapext /* XFS_IOC_GETBIOSIZE ---- deprecated 47 */ #define XFS_IOC_GETBMAPX _IOWR('X', 56, struct getbmap) #define XFS_IOC_ZERO_RANGE _IOW ('X', 57, struct xfs_flock64) -#define XFS_IOC_FREE_EOFBLOCKS _IOR ('X', 58, struct xfs_eofblocks) +#define XFS_IOC_FREE_EOFBLOCKS _IOR ('X', 58, struct xfs_fs_eofblocks) /* * ioctl commands that replace IRIX syssgi()'s diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 193206ba4358..474807a401c8 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -119,11 +119,6 @@ xfs_inode_free( ip->i_itemp = NULL; } - /* asserts to verify all state is correct here */ - ASSERT(atomic_read(&ip->i_pincount) == 0); - ASSERT(!spin_is_locked(&ip->i_flags_lock)); - ASSERT(!xfs_isiflocked(ip)); - /* * Because we use RCU freeing we need to ensure the inode always * appears to be reclaimed with an invalid inode number when in the @@ -135,6 +130,10 @@ xfs_inode_free( ip->i_ino = 0; spin_unlock(&ip->i_flags_lock); + /* asserts to verify all state is correct here */ + ASSERT(atomic_read(&ip->i_pincount) == 0); + ASSERT(!xfs_isiflocked(ip)); + call_rcu(&VFS_I(ip)->i_rcu, xfs_inode_free_callback); } diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index dabda9521b4b..39797490a1f1 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -1585,6 +1585,7 @@ xlog_recover_add_to_trans( "bad number of regions (%d) in inode log format", in_f->ilf_size); ASSERT(0); + kmem_free(ptr); return XFS_ERROR(EIO); } @@ -1970,6 +1971,13 @@ xlog_recover_do_inode_buffer( * magic number. If we don't recognise the magic number in the buffer, then * return a LSN of -1 so that the caller knows it was an unrecognised block and * so can recover the buffer. + * + * Note: we cannot rely solely on magic number matches to determine that the + * buffer has a valid LSN - we also need to verify that it belongs to this + * filesystem, so we need to extract the object's LSN and compare it to that + * which we read from the superblock. If the UUIDs don't match, then we've got a + * stale metadata block from an old filesystem instance that we need to recover + * over the top of. */ static xfs_lsn_t xlog_recover_get_buf_lsn( @@ -1980,6 +1988,8 @@ xlog_recover_get_buf_lsn( __uint16_t magic16; __uint16_t magicda; void *blk = bp->b_addr; + uuid_t *uuid; + xfs_lsn_t lsn = -1; /* v4 filesystems always recover immediately */ if (!xfs_sb_version_hascrc(&mp->m_sb)) @@ -1992,43 +2002,79 @@ xlog_recover_get_buf_lsn( case XFS_ABTB_MAGIC: case XFS_ABTC_MAGIC: case XFS_IBT_CRC_MAGIC: - case XFS_IBT_MAGIC: - return be64_to_cpu( - ((struct xfs_btree_block *)blk)->bb_u.s.bb_lsn); + case XFS_IBT_MAGIC: { + struct xfs_btree_block *btb = blk; + + lsn = be64_to_cpu(btb->bb_u.s.bb_lsn); + uuid = &btb->bb_u.s.bb_uuid; + break; + } case XFS_BMAP_CRC_MAGIC: - case XFS_BMAP_MAGIC: - return be64_to_cpu( - ((struct xfs_btree_block *)blk)->bb_u.l.bb_lsn); + case XFS_BMAP_MAGIC: { + struct xfs_btree_block *btb = blk; + + lsn = be64_to_cpu(btb->bb_u.l.bb_lsn); + uuid = &btb->bb_u.l.bb_uuid; + break; + } case XFS_AGF_MAGIC: - return be64_to_cpu(((struct xfs_agf *)blk)->agf_lsn); + lsn = be64_to_cpu(((struct xfs_agf *)blk)->agf_lsn); + uuid = &((struct xfs_agf *)blk)->agf_uuid; + break; case XFS_AGFL_MAGIC: - return be64_to_cpu(((struct xfs_agfl *)blk)->agfl_lsn); + lsn = be64_to_cpu(((struct xfs_agfl *)blk)->agfl_lsn); + uuid = &((struct xfs_agfl *)blk)->agfl_uuid; + break; case XFS_AGI_MAGIC: - return be64_to_cpu(((struct xfs_agi *)blk)->agi_lsn); + lsn = be64_to_cpu(((struct xfs_agi *)blk)->agi_lsn); + uuid = &((struct xfs_agi *)blk)->agi_uuid; + break; case XFS_SYMLINK_MAGIC: - return be64_to_cpu(((struct xfs_dsymlink_hdr *)blk)->sl_lsn); + lsn = be64_to_cpu(((struct xfs_dsymlink_hdr *)blk)->sl_lsn); + uuid = &((struct xfs_dsymlink_hdr *)blk)->sl_uuid; + break; case XFS_DIR3_BLOCK_MAGIC: case XFS_DIR3_DATA_MAGIC: case XFS_DIR3_FREE_MAGIC: - return be64_to_cpu(((struct xfs_dir3_blk_hdr *)blk)->lsn); + lsn = be64_to_cpu(((struct xfs_dir3_blk_hdr *)blk)->lsn); + uuid = &((struct xfs_dir3_blk_hdr *)blk)->uuid; + break; case XFS_ATTR3_RMT_MAGIC: - return be64_to_cpu(((struct xfs_attr3_rmt_hdr *)blk)->rm_lsn); + lsn = be64_to_cpu(((struct xfs_attr3_rmt_hdr *)blk)->rm_lsn); + uuid = &((struct xfs_attr3_rmt_hdr *)blk)->rm_uuid; + break; case XFS_SB_MAGIC: - return be64_to_cpu(((struct xfs_dsb *)blk)->sb_lsn); + lsn = be64_to_cpu(((struct xfs_dsb *)blk)->sb_lsn); + uuid = &((struct xfs_dsb *)blk)->sb_uuid; + break; default: break; } + if (lsn != (xfs_lsn_t)-1) { + if (!uuid_equal(&mp->m_sb.sb_uuid, uuid)) + goto recover_immediately; + return lsn; + } + magicda = be16_to_cpu(((struct xfs_da_blkinfo *)blk)->magic); switch (magicda) { case XFS_DIR3_LEAF1_MAGIC: case XFS_DIR3_LEAFN_MAGIC: case XFS_DA3_NODE_MAGIC: - return be64_to_cpu(((struct xfs_da3_blkinfo *)blk)->lsn); + lsn = be64_to_cpu(((struct xfs_da3_blkinfo *)blk)->lsn); + uuid = &((struct xfs_da3_blkinfo *)blk)->uuid; + break; default: break; } + if (lsn != (xfs_lsn_t)-1) { + if (!uuid_equal(&mp->m_sb.sb_uuid, uuid)) + goto recover_immediately; + return lsn; + } + /* * We do individual object checks on dquot and inode buffers as they * have their own individual LSN records. Also, we could have a stale diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index 02e113bb8b7d..d9019821aa60 100644 --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -311,7 +311,6 @@ struct acpi_device { unsigned int physical_node_count; struct list_head physical_node_list; struct mutex physical_node_lock; - struct list_head power_dependent; void (*remove)(struct acpi_device *); }; @@ -456,8 +455,6 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, acpi_status acpi_remove_pm_notifier(struct acpi_device *adev, acpi_notify_handler handler); int acpi_pm_device_sleep_state(struct device *, int *, int); -void acpi_dev_pm_add_dependent(acpi_handle handle, struct device *depdev); -void acpi_dev_pm_remove_dependent(acpi_handle handle, struct device *depdev); #else static inline acpi_status acpi_add_pm_notifier(struct acpi_device *adev, acpi_notify_handler handler, @@ -478,10 +475,6 @@ static inline int acpi_pm_device_sleep_state(struct device *d, int *p, int m) return (m >= ACPI_STATE_D0 && m <= ACPI_STATE_D3_COLD) ? m : ACPI_STATE_D0; } -static inline void acpi_dev_pm_add_dependent(acpi_handle handle, - struct device *depdev) {} -static inline void acpi_dev_pm_remove_dependent(acpi_handle handle, - struct device *depdev) {} #endif #ifdef CONFIG_PM_RUNTIME diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h index d06079c774a0..99b490b4d05a 100644 --- a/include/asm-generic/hugetlb.h +++ b/include/asm-generic/hugetlb.h @@ -6,12 +6,12 @@ static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot) return mk_pte(page, pgprot); } -static inline int huge_pte_write(pte_t pte) +static inline unsigned long huge_pte_write(pte_t pte) { return pte_write(pte); } -static inline int huge_pte_dirty(pte_t pte) +static inline unsigned long huge_pte_dirty(pte_t pte) { return pte_dirty(pte); } diff --git a/include/asm-generic/preempt.h b/include/asm-generic/preempt.h new file mode 100644 index 000000000000..ddf2b420ac8f --- /dev/null +++ b/include/asm-generic/preempt.h @@ -0,0 +1,105 @@ +#ifndef __ASM_PREEMPT_H +#define __ASM_PREEMPT_H + +#include + +/* + * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users + * that think a non-zero value indicates we cannot preempt. + */ +static __always_inline int preempt_count(void) +{ + return current_thread_info()->preempt_count & ~PREEMPT_NEED_RESCHED; +} + +static __always_inline int *preempt_count_ptr(void) +{ + return ¤t_thread_info()->preempt_count; +} + +/* + * We now loose PREEMPT_NEED_RESCHED and cause an extra reschedule; however the + * alternative is loosing a reschedule. Better schedule too often -- also this + * should be a very rare operation. + */ +static __always_inline void preempt_count_set(int pc) +{ + *preempt_count_ptr() = pc; +} + +/* + * must be macros to avoid header recursion hell + */ +#define task_preempt_count(p) \ + (task_thread_info(p)->preempt_count & ~PREEMPT_NEED_RESCHED) + +#define init_task_preempt_count(p) do { \ + task_thread_info(p)->preempt_count = PREEMPT_DISABLED; \ +} while (0) + +#define init_idle_preempt_count(p, cpu) do { \ + task_thread_info(p)->preempt_count = PREEMPT_ENABLED; \ +} while (0) + +/* + * We fold the NEED_RESCHED bit into the preempt count such that + * preempt_enable() can decrement and test for needing to reschedule with a + * single instruction. + * + * We invert the actual bit, so that when the decrement hits 0 we know we both + * need to resched (the bit is cleared) and can resched (no preempt count). + */ + +static __always_inline void set_preempt_need_resched(void) +{ + *preempt_count_ptr() &= ~PREEMPT_NEED_RESCHED; +} + +static __always_inline void clear_preempt_need_resched(void) +{ + *preempt_count_ptr() |= PREEMPT_NEED_RESCHED; +} + +static __always_inline bool test_preempt_need_resched(void) +{ + return !(*preempt_count_ptr() & PREEMPT_NEED_RESCHED); +} + +/* + * The various preempt_count add/sub methods + */ + +static __always_inline void __preempt_count_add(int val) +{ + *preempt_count_ptr() += val; +} + +static __always_inline void __preempt_count_sub(int val) +{ + *preempt_count_ptr() -= val; +} + +static __always_inline bool __preempt_count_dec_and_test(void) +{ + return !--*preempt_count_ptr(); +} + +/* + * Returns true when we need to resched and can (barring IRQ state). + */ +static __always_inline bool should_resched(void) +{ + return unlikely(!*preempt_count_ptr()); +} + +#ifdef CONFIG_PREEMPT +extern asmlinkage void preempt_schedule(void); +#define __preempt_schedule() preempt_schedule() + +#ifdef CONFIG_CONTEXT_TRACKING +extern asmlinkage void preempt_schedule_context(void); +#define __preempt_schedule_context() preempt_schedule_context() +#endif +#endif /* CONFIG_PREEMPT */ + +#endif /* __ASM_PREEMPT_H */ diff --git a/include/asm-generic/vtime.h b/include/asm-generic/vtime.h index e69de29bb2d1..b1a49677fe25 100644 --- a/include/asm-generic/vtime.h +++ b/include/asm-generic/vtime.h @@ -0,0 +1 @@ +/* no content, but patch(1) dislikes empty files */ diff --git a/include/clocksource/arm_arch_timer.h b/include/clocksource/arm_arch_timer.h index 93b7f96f9c59..6d26b40cbf5d 100644 --- a/include/clocksource/arm_arch_timer.h +++ b/include/clocksource/arm_arch_timer.h @@ -33,6 +33,16 @@ enum arch_timer_reg { #define ARCH_TIMER_MEM_PHYS_ACCESS 2 #define ARCH_TIMER_MEM_VIRT_ACCESS 3 +#define ARCH_TIMER_USR_PCT_ACCESS_EN (1 << 0) /* physical counter */ +#define ARCH_TIMER_USR_VCT_ACCESS_EN (1 << 1) /* virtual counter */ +#define ARCH_TIMER_VIRT_EVT_EN (1 << 2) +#define ARCH_TIMER_EVT_TRIGGER_SHIFT (4) +#define ARCH_TIMER_EVT_TRIGGER_MASK (0xF << ARCH_TIMER_EVT_TRIGGER_SHIFT) +#define ARCH_TIMER_USR_VT_ACCESS_EN (1 << 8) /* virtual timer registers */ +#define ARCH_TIMER_USR_PT_ACCESS_EN (1 << 9) /* physical timer registers */ + +#define ARCH_TIMER_EVT_STREAM_FREQ 10000 /* 100us */ + #ifdef CONFIG_ARM_ARCH_TIMER extern u32 arch_timer_get_rate(void); diff --git a/include/dt-bindings/pinctrl/omap.h b/include/dt-bindings/pinctrl/omap.h index edbd250809cb..bed35e36fd27 100644 --- a/include/dt-bindings/pinctrl/omap.h +++ b/include/dt-bindings/pinctrl/omap.h @@ -23,7 +23,7 @@ #define PULL_UP (1 << 4) #define ALTELECTRICALSEL (1 << 5) -/* 34xx specific mux bit defines */ +/* omap3/4/5 specific mux bit defines */ #define INPUT_EN (1 << 8) #define OFF_EN (1 << 9) #define OFFOUT_EN (1 << 10) @@ -31,8 +31,6 @@ #define OFF_PULL_EN (1 << 12) #define OFF_PULL_UP (1 << 13) #define WAKEUP_EN (1 << 14) - -/* 44xx specific mux bit defines */ #define WAKEUP_EVENT (1 << 15) /* Active pin states */ diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h index f7f1d7169b11..089743ade734 100644 --- a/include/linux/balloon_compaction.h +++ b/include/linux/balloon_compaction.h @@ -158,6 +158,26 @@ static inline bool balloon_page_movable(struct page *page) return false; } +/* + * isolated_balloon_page - identify an isolated balloon page on private + * compaction/migration page lists. + * + * After a compaction thread isolates a balloon page for migration, it raises + * the page refcount to prevent concurrent compaction threads from re-isolating + * the same page. For that reason putback_movable_pages(), or other routines + * that need to identify isolated balloon pages on private pagelists, cannot + * rely on balloon_page_movable() to accomplish the task. + */ +static inline bool isolated_balloon_page(struct page *page) +{ + /* Already isolated balloon pages, by default, have a raised refcount */ + if (page_flags_cleared(page) && !page_mapped(page) && + page_count(page) >= 2) + return __is_movable_balloon_page(page); + + return false; +} + /* * balloon_page_insert - insert a page into the balloon's page list and make * the page->mapping assignment accordingly. @@ -243,6 +263,11 @@ static inline bool balloon_page_movable(struct page *page) return false; } +static inline bool isolated_balloon_page(struct page *page) +{ + return false; +} + static inline bool balloon_page_isolate(struct page *page) { return false; diff --git a/include/linux/bcma/bcma_driver_pci.h b/include/linux/bcma/bcma_driver_pci.h index d66033f418c9..0333e605ea0d 100644 --- a/include/linux/bcma/bcma_driver_pci.h +++ b/include/linux/bcma/bcma_driver_pci.h @@ -242,6 +242,7 @@ extern int bcma_core_pci_irq_ctl(struct bcma_drv_pci *pc, struct bcma_device *core, bool enable); extern void bcma_core_pci_up(struct bcma_bus *bus); extern void bcma_core_pci_down(struct bcma_bus *bus); +extern void bcma_core_pci_power_save(struct bcma_bus *bus, bool up); extern int bcma_core_pci_pcibios_map_irq(const struct pci_dev *dev); extern int bcma_core_pci_plat_dev_init(struct pci_dev *dev); diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h index 0857922e8ad0..493aa021c7a9 100644 --- a/include/linux/clockchips.h +++ b/include/linux/clockchips.h @@ -60,6 +60,7 @@ enum clock_event_mode { * Core shall set the interrupt affinity dynamically in broadcast mode */ #define CLOCK_EVT_FEAT_DYNIRQ 0x000020 +#define CLOCK_EVT_FEAT_PERCPU 0x000040 /** * struct clock_event_device - clock event device descriptor diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index dbbf8aa7731b..67301a405712 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -292,6 +292,8 @@ extern void clocksource_resume(void); extern struct clocksource * __init __weak clocksource_default_clock(void); extern void clocksource_mark_unstable(struct clocksource *cs); +extern u64 +clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask); extern void clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 minsec); diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h index 842de225055f..ded429966c1f 100644 --- a/include/linux/compiler-gcc4.h +++ b/include/linux/compiler-gcc4.h @@ -65,6 +65,21 @@ #define __visible __attribute__((externally_visible)) #endif +/* + * GCC 'asm goto' miscompiles certain code sequences: + * + * http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 + * + * Work it around via a compiler barrier quirk suggested by Jakub Jelinek. + * Fixed in GCC 4.8.2 and later versions. + * + * (asm goto is automatically volatile - the naming reflects this.) + */ +#if GCC_VERSION <= 40801 +# define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0) +#else +# define asm_volatile_goto(x...) do { asm goto(x); } while (0) +#endif #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP #if GCC_VERSION >= 40400 diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index 653073de09e3..ed419c62dde1 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -406,13 +406,14 @@ int dm_noflush_suspending(struct dm_target *ti); union map_info *dm_get_mapinfo(struct bio *bio); union map_info *dm_get_rq_mapinfo(struct request *rq); +struct queue_limits *dm_get_queue_limits(struct mapped_device *md); + /* * Geometry functions. */ int dm_get_geometry(struct mapped_device *md, struct hd_geometry *geo); int dm_set_geometry(struct mapped_device *md, struct hd_geometry *geo); - /*----------------------------------------------------------------- * Functions for manipulating device-mapper tables. *---------------------------------------------------------------*/ diff --git a/include/linux/efi.h b/include/linux/efi.h index 5f8f176154f7..bc5687d0f315 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -39,6 +39,8 @@ typedef unsigned long efi_status_t; typedef u8 efi_bool_t; typedef u16 efi_char16_t; /* UNICODE character */ +typedef u64 efi_physical_addr_t; +typedef void *efi_handle_t; typedef struct { @@ -96,6 +98,7 @@ typedef struct { #define EFI_MEMORY_DESCRIPTOR_VERSION 1 #define EFI_PAGE_SHIFT 12 +#define EFI_PAGE_SIZE (1UL << EFI_PAGE_SHIFT) typedef struct { u32 type; @@ -157,11 +160,13 @@ typedef struct { efi_table_hdr_t hdr; void *raise_tpl; void *restore_tpl; - void *allocate_pages; - void *free_pages; - void *get_memory_map; - void *allocate_pool; - void *free_pool; + efi_status_t (*allocate_pages)(int, int, unsigned long, + efi_physical_addr_t *); + efi_status_t (*free_pages)(efi_physical_addr_t, unsigned long); + efi_status_t (*get_memory_map)(unsigned long *, void *, unsigned long *, + unsigned long *, u32 *); + efi_status_t (*allocate_pool)(int, unsigned long, void **); + efi_status_t (*free_pool)(void *); void *create_event; void *set_timer; void *wait_for_event; @@ -171,7 +176,7 @@ typedef struct { void *install_protocol_interface; void *reinstall_protocol_interface; void *uninstall_protocol_interface; - void *handle_protocol; + efi_status_t (*handle_protocol)(efi_handle_t, efi_guid_t *, void **); void *__reserved; void *register_protocol_notify; void *locate_handle; @@ -181,7 +186,7 @@ typedef struct { void *start_image; void *exit; void *unload_image; - void *exit_boot_services; + efi_status_t (*exit_boot_services)(efi_handle_t, unsigned long); void *get_next_monotonic_count; void *stall; void *set_watchdog_timer; @@ -404,6 +409,12 @@ typedef struct { unsigned long table; } efi_config_table_t; +typedef struct { + efi_guid_t guid; + const char *name; + unsigned long *ptr; +} efi_config_table_type_t; + #define EFI_SYSTEM_TABLE_SIGNATURE ((u64)0x5453595320494249ULL) #define EFI_2_30_SYSTEM_TABLE_REVISION ((2 << 16) | (30)) @@ -488,10 +499,6 @@ typedef struct { unsigned long unload; } efi_loaded_image_t; -typedef struct { - u64 revision; - void *open_volume; -} efi_file_io_interface_t; typedef struct { u64 size; @@ -504,20 +511,30 @@ typedef struct { efi_char16_t filename[1]; } efi_file_info_t; -typedef struct { +typedef struct _efi_file_handle { u64 revision; - void *open; - void *close; + efi_status_t (*open)(struct _efi_file_handle *, + struct _efi_file_handle **, + efi_char16_t *, u64, u64); + efi_status_t (*close)(struct _efi_file_handle *); void *delete; - void *read; + efi_status_t (*read)(struct _efi_file_handle *, unsigned long *, + void *); void *write; void *get_position; void *set_position; - void *get_info; + efi_status_t (*get_info)(struct _efi_file_handle *, efi_guid_t *, + unsigned long *, void *); void *set_info; void *flush; } efi_file_handle_t; +typedef struct _efi_file_io_interface { + u64 revision; + int (*open_volume)(struct _efi_file_io_interface *, + efi_file_handle_t **); +} efi_file_io_interface_t; + #define EFI_FILE_MODE_READ 0x0000000000000001 #define EFI_FILE_MODE_WRITE 0x0000000000000002 #define EFI_FILE_MODE_CREATE 0x8000000000000000 @@ -552,6 +569,7 @@ extern struct efi { efi_get_next_high_mono_count_t *get_next_high_mono_count; efi_reset_system_t *reset_system; efi_set_virtual_address_map_t *set_virtual_address_map; + struct efi_memory_map *memmap; } efi; static inline int @@ -587,6 +605,7 @@ static inline efi_status_t efi_query_variable_store(u32 attributes, unsigned lon } #endif extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr); +extern int efi_config_init(efi_config_table_type_t *arch_tables); extern u64 efi_get_iobase (void); extern u32 efi_mem_type (unsigned long phys_addr); extern u64 efi_mem_attributes (unsigned long phys_addr); @@ -784,6 +803,13 @@ struct efivar_entry { struct kobject kobj; }; + +struct efi_simple_text_output_protocol { + void *reset; + efi_status_t (*output_string)(void *, void *); + void *test_string; +}; + extern struct list_head efivar_sysfs_list; static inline void diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h index 1e041063b226..d9cf963ac832 100644 --- a/include/linux/hardirq.h +++ b/include/linux/hardirq.h @@ -33,7 +33,7 @@ extern void rcu_nmi_exit(void); #define __irq_enter() \ do { \ account_irq_enter_time(current); \ - add_preempt_count(HARDIRQ_OFFSET); \ + preempt_count_add(HARDIRQ_OFFSET); \ trace_hardirq_enter(); \ } while (0) @@ -49,7 +49,7 @@ extern void irq_enter(void); do { \ trace_hardirq_exit(); \ account_irq_exit_time(current); \ - sub_preempt_count(HARDIRQ_OFFSET); \ + preempt_count_sub(HARDIRQ_OFFSET); \ } while (0) /* @@ -62,7 +62,7 @@ extern void irq_exit(void); lockdep_off(); \ ftrace_nmi_enter(); \ BUG_ON(in_nmi()); \ - add_preempt_count(NMI_OFFSET + HARDIRQ_OFFSET); \ + preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \ rcu_nmi_enter(); \ trace_hardirq_enter(); \ } while (0) @@ -72,7 +72,7 @@ extern void irq_exit(void); trace_hardirq_exit(); \ rcu_nmi_exit(); \ BUG_ON(!in_nmi()); \ - sub_preempt_count(NMI_OFFSET + HARDIRQ_OFFSET); \ + preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \ ftrace_nmi_exit(); \ lockdep_on(); \ } while (0) diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index a3b8b2e2d244..d98503bde7e9 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -30,10 +30,13 @@ /* * Framework version for util services. */ +#define UTIL_FW_MINOR 0 + +#define UTIL_WS2K8_FW_MAJOR 1 +#define UTIL_WS2K8_FW_VERSION (UTIL_WS2K8_FW_MAJOR << 16 | UTIL_FW_MINOR) #define UTIL_FW_MAJOR 3 -#define UTIL_FW_MINOR 0 -#define UTIL_FW_MAJOR_MINOR (UTIL_FW_MAJOR << 16 | UTIL_FW_MINOR) +#define UTIL_FW_VERSION (UTIL_FW_MAJOR << 16 | UTIL_FW_MINOR) /* diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 78e2ada50cd5..d380c5e68008 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -55,7 +55,7 @@ #define DMAR_IQT_REG 0x88 /* Invalidation queue tail register */ #define DMAR_IQ_SHIFT 4 /* Invalidation queue head/tail shift */ #define DMAR_IQA_REG 0x90 /* Invalidation queue addr register */ -#define DMAR_ICS_REG 0x98 /* Invalidation complete status register */ +#define DMAR_ICS_REG 0x9c /* Invalidation complete status register */ #define DMAR_IRTA_REG 0xb8 /* Interrupt remapping table addr register */ #define OFFSET_STRIDE (9) diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 5e865b554940..c9e831dc80bc 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -19,6 +19,7 @@ #include #include +#include /* * These correspond to the IORESOURCE_IRQ_* defines in @@ -374,6 +375,16 @@ struct softirq_action asmlinkage void do_softirq(void); asmlinkage void __do_softirq(void); + +#ifdef __ARCH_HAS_DO_SOFTIRQ +void do_softirq_own_stack(void); +#else +static inline void do_softirq_own_stack(void) +{ + __do_softirq(); +} +#endif + extern void open_softirq(int nr, void (*action)(struct softirq_action *)); extern void softirq_init(void); extern void __raise_softirq_irqoff(unsigned int nr); diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 482ad2d84a32..672ddc4de4af 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -439,6 +439,17 @@ static inline char *hex_byte_pack(char *buf, u8 byte) return buf; } +extern const char hex_asc_upper[]; +#define hex_asc_upper_lo(x) hex_asc_upper[((x) & 0x0f)] +#define hex_asc_upper_hi(x) hex_asc_upper[((x) & 0xf0) >> 4] + +static inline char *hex_byte_pack_upper(char *buf, u8 byte) +{ + *buf++ = hex_asc_upper_hi(byte); + *buf++ = hex_asc_upper_lo(byte); + return buf; +} + static inline char * __deprecated pack_hex_byte(char *buf, u8 byte) { return hex_byte_pack(buf, byte); diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index ecc82b37c4cc..b3e7a667e03c 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -137,47 +137,24 @@ extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, extern void mem_cgroup_replace_page_cache(struct page *oldpage, struct page *newpage); -/** - * mem_cgroup_toggle_oom - toggle the memcg OOM killer for the current task - * @new: true to enable, false to disable - * - * Toggle whether a failed memcg charge should invoke the OOM killer - * or just return -ENOMEM. Returns the previous toggle state. - * - * NOTE: Any path that enables the OOM killer before charging must - * call mem_cgroup_oom_synchronize() afterward to finalize the - * OOM handling and clean up. - */ -static inline bool mem_cgroup_toggle_oom(bool new) +static inline void mem_cgroup_oom_enable(void) { - bool old; - - old = current->memcg_oom.may_oom; - current->memcg_oom.may_oom = new; - - return old; + WARN_ON(current->memcg_oom.may_oom); + current->memcg_oom.may_oom = 1; } -static inline void mem_cgroup_enable_oom(void) +static inline void mem_cgroup_oom_disable(void) { - bool old = mem_cgroup_toggle_oom(true); - - WARN_ON(old == true); -} - -static inline void mem_cgroup_disable_oom(void) -{ - bool old = mem_cgroup_toggle_oom(false); - - WARN_ON(old == false); + WARN_ON(!current->memcg_oom.may_oom); + current->memcg_oom.may_oom = 0; } static inline bool task_in_memcg_oom(struct task_struct *p) { - return p->memcg_oom.in_memcg_oom; + return p->memcg_oom.memcg; } -bool mem_cgroup_oom_synchronize(void); +bool mem_cgroup_oom_synchronize(bool wait); #ifdef CONFIG_MEMCG_SWAP extern int do_swap_account; @@ -402,16 +379,11 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page, { } -static inline bool mem_cgroup_toggle_oom(bool new) -{ - return false; -} - -static inline void mem_cgroup_enable_oom(void) +static inline void mem_cgroup_oom_enable(void) { } -static inline void mem_cgroup_disable_oom(void) +static inline void mem_cgroup_oom_disable(void) { } @@ -420,7 +392,7 @@ static inline bool task_in_memcg_oom(struct task_struct *p) return false; } -static inline bool mem_cgroup_oom_synchronize(void) +static inline bool mem_cgroup_oom_synchronize(bool wait) { return false; } diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index da6716b9e3fe..ea4d2495c646 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -136,6 +136,7 @@ struct mempolicy *mpol_shared_policy_lookup(struct shared_policy *sp, struct mempolicy *get_vma_policy(struct task_struct *tsk, struct vm_area_struct *vma, unsigned long addr); +bool vma_policy_mof(struct task_struct *task, struct vm_area_struct *vma); extern void numa_default_policy(void); extern void numa_policy_init(void); diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 8d3c57fdf221..f5096b58b20d 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -90,11 +90,12 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping, #endif /* CONFIG_MIGRATION */ #ifdef CONFIG_NUMA_BALANCING -extern int migrate_misplaced_page(struct page *page, int node); -extern int migrate_misplaced_page(struct page *page, int node); +extern int migrate_misplaced_page(struct page *page, + struct vm_area_struct *vma, int node); extern bool migrate_ratelimited(int node); #else -static inline int migrate_misplaced_page(struct page *page, int node) +static inline int migrate_misplaced_page(struct page *page, + struct vm_area_struct *vma, int node) { return -EAGAIN; /* can't migrate now */ } diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h index 09c2300ddb37..cb358355ef43 100644 --- a/include/linux/miscdevice.h +++ b/include/linux/miscdevice.h @@ -45,6 +45,7 @@ #define MAPPER_CTRL_MINOR 236 #define LOOP_CTRL_MINOR 237 #define VHOST_NET_MINOR 238 +#define UHID_MINOR 239 #define MISC_DYNAMIC_MINOR 255 struct device; diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 68029b30c3dc..5eb4e31af22b 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -181,7 +181,7 @@ enum { MLX5_DEV_CAP_FLAG_TLP_HINTS = 1LL << 39, MLX5_DEV_CAP_FLAG_SIG_HAND_OVER = 1LL << 40, MLX5_DEV_CAP_FLAG_DCT = 1LL << 41, - MLX5_DEV_CAP_FLAG_CMDIF_CSUM = 1LL << 46, + MLX5_DEV_CAP_FLAG_CMDIF_CSUM = 3LL << 46, }; enum { @@ -417,7 +417,7 @@ struct mlx5_init_seg { struct health_buffer health; __be32 rsvd2[884]; __be32 health_counter; - __be32 rsvd3[1023]; + __be32 rsvd3[1019]; __be64 ieee1588_clk; __be32 ieee1588_clk_type; __be32 clr_intx; diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 8888381fc150..6b8c496572c8 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -82,7 +82,7 @@ enum { }; enum { - MLX5_MAX_EQ_NAME = 20 + MLX5_MAX_EQ_NAME = 32 }; enum { @@ -747,8 +747,7 @@ static inline u32 mlx5_idx_to_mkey(u32 mkey_idx) enum { MLX5_PROF_MASK_QP_SIZE = (u64)1 << 0, - MLX5_PROF_MASK_CMDIF_CSUM = (u64)1 << 1, - MLX5_PROF_MASK_MR_CACHE = (u64)1 << 2, + MLX5_PROF_MASK_MR_CACHE = (u64)1 << 1, }; enum { @@ -758,7 +757,6 @@ enum { struct mlx5_profile { u64 mask; u32 log_max_qp; - int cmdif_csum; struct { int size; int limit; diff --git a/include/linux/mm.h b/include/linux/mm.h index 8b6e55ee8855..81443d557a2e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -581,11 +581,11 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) * sets it, so none of the operations on it need to be atomic. */ -/* Page flags: | [SECTION] | [NODE] | ZONE | [LAST_NID] | ... | FLAGS | */ +/* Page flags: | [SECTION] | [NODE] | ZONE | [LAST_CPUPID] | ... | FLAGS | */ #define SECTIONS_PGOFF ((sizeof(unsigned long)*8) - SECTIONS_WIDTH) #define NODES_PGOFF (SECTIONS_PGOFF - NODES_WIDTH) #define ZONES_PGOFF (NODES_PGOFF - ZONES_WIDTH) -#define LAST_NID_PGOFF (ZONES_PGOFF - LAST_NID_WIDTH) +#define LAST_CPUPID_PGOFF (ZONES_PGOFF - LAST_CPUPID_WIDTH) /* * Define the bit shifts to access each section. For non-existent @@ -595,7 +595,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) #define SECTIONS_PGSHIFT (SECTIONS_PGOFF * (SECTIONS_WIDTH != 0)) #define NODES_PGSHIFT (NODES_PGOFF * (NODES_WIDTH != 0)) #define ZONES_PGSHIFT (ZONES_PGOFF * (ZONES_WIDTH != 0)) -#define LAST_NID_PGSHIFT (LAST_NID_PGOFF * (LAST_NID_WIDTH != 0)) +#define LAST_CPUPID_PGSHIFT (LAST_CPUPID_PGOFF * (LAST_CPUPID_WIDTH != 0)) /* NODE:ZONE or SECTION:ZONE is used to ID a zone for the buddy allocator */ #ifdef NODE_NOT_IN_PAGE_FLAGS @@ -617,7 +617,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) #define ZONES_MASK ((1UL << ZONES_WIDTH) - 1) #define NODES_MASK ((1UL << NODES_WIDTH) - 1) #define SECTIONS_MASK ((1UL << SECTIONS_WIDTH) - 1) -#define LAST_NID_MASK ((1UL << LAST_NID_WIDTH) - 1) +#define LAST_CPUPID_MASK ((1UL << LAST_CPUPID_WIDTH) - 1) #define ZONEID_MASK ((1UL << ZONEID_SHIFT) - 1) static inline enum zone_type page_zonenum(const struct page *page) @@ -661,51 +661,117 @@ static inline int page_to_nid(const struct page *page) #endif #ifdef CONFIG_NUMA_BALANCING -#ifdef LAST_NID_NOT_IN_PAGE_FLAGS -static inline int page_nid_xchg_last(struct page *page, int nid) +static inline int cpu_pid_to_cpupid(int cpu, int pid) { - return xchg(&page->_last_nid, nid); + return ((cpu & LAST__CPU_MASK) << LAST__PID_SHIFT) | (pid & LAST__PID_MASK); } -static inline int page_nid_last(struct page *page) +static inline int cpupid_to_pid(int cpupid) { - return page->_last_nid; + return cpupid & LAST__PID_MASK; } -static inline void page_nid_reset_last(struct page *page) + +static inline int cpupid_to_cpu(int cpupid) { - page->_last_nid = -1; + return (cpupid >> LAST__PID_SHIFT) & LAST__CPU_MASK; } -#else -static inline int page_nid_last(struct page *page) + +static inline int cpupid_to_nid(int cpupid) { - return (page->flags >> LAST_NID_PGSHIFT) & LAST_NID_MASK; + return cpu_to_node(cpupid_to_cpu(cpupid)); } -extern int page_nid_xchg_last(struct page *page, int nid); +static inline bool cpupid_pid_unset(int cpupid) +{ + return cpupid_to_pid(cpupid) == (-1 & LAST__PID_MASK); +} -static inline void page_nid_reset_last(struct page *page) +static inline bool cpupid_cpu_unset(int cpupid) { - int nid = (1 << LAST_NID_SHIFT) - 1; + return cpupid_to_cpu(cpupid) == (-1 & LAST__CPU_MASK); +} - page->flags &= ~(LAST_NID_MASK << LAST_NID_PGSHIFT); - page->flags |= (nid & LAST_NID_MASK) << LAST_NID_PGSHIFT; +static inline bool __cpupid_match_pid(pid_t task_pid, int cpupid) +{ + return (task_pid & LAST__PID_MASK) == cpupid_to_pid(cpupid); +} + +#define cpupid_match_pid(task, cpupid) __cpupid_match_pid(task->pid, cpupid) +#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS +static inline int page_cpupid_xchg_last(struct page *page, int cpupid) +{ + return xchg(&page->_last_cpupid, cpupid); +} + +static inline int page_cpupid_last(struct page *page) +{ + return page->_last_cpupid; +} +static inline void page_cpupid_reset_last(struct page *page) +{ + page->_last_cpupid = -1; } -#endif /* LAST_NID_NOT_IN_PAGE_FLAGS */ #else -static inline int page_nid_xchg_last(struct page *page, int nid) +static inline int page_cpupid_last(struct page *page) { - return page_to_nid(page); + return (page->flags >> LAST_CPUPID_PGSHIFT) & LAST_CPUPID_MASK; } -static inline int page_nid_last(struct page *page) +extern int page_cpupid_xchg_last(struct page *page, int cpupid); + +static inline void page_cpupid_reset_last(struct page *page) { - return page_to_nid(page); + int cpupid = (1 << LAST_CPUPID_SHIFT) - 1; + + page->flags &= ~(LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT); + page->flags |= (cpupid & LAST_CPUPID_MASK) << LAST_CPUPID_PGSHIFT; +} +#endif /* LAST_CPUPID_NOT_IN_PAGE_FLAGS */ +#else /* !CONFIG_NUMA_BALANCING */ +static inline int page_cpupid_xchg_last(struct page *page, int cpupid) +{ + return page_to_nid(page); /* XXX */ } -static inline void page_nid_reset_last(struct page *page) +static inline int page_cpupid_last(struct page *page) { + return page_to_nid(page); /* XXX */ } -#endif + +static inline int cpupid_to_nid(int cpupid) +{ + return -1; +} + +static inline int cpupid_to_pid(int cpupid) +{ + return -1; +} + +static inline int cpupid_to_cpu(int cpupid) +{ + return -1; +} + +static inline int cpu_pid_to_cpupid(int nid, int pid) +{ + return -1; +} + +static inline bool cpupid_pid_unset(int cpupid) +{ + return 1; +} + +static inline void page_cpupid_reset_last(struct page *page) +{ +} + +static inline bool cpupid_match_pid(struct task_struct *task, int cpupid) +{ + return false; +} +#endif /* CONFIG_NUMA_BALANCING */ static inline struct zone *page_zone(const struct page *page) { diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index d9851eeb6e1d..a3198e5aaf4e 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -174,8 +174,8 @@ struct page { void *shadow; #endif -#ifdef LAST_NID_NOT_IN_PAGE_FLAGS - int _last_nid; +#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS + int _last_cpupid; #endif } /* @@ -420,28 +420,15 @@ struct mm_struct { */ unsigned long numa_next_scan; - /* numa_next_reset is when the PTE scanner period will be reset */ - unsigned long numa_next_reset; - /* Restart point for scanning and setting pte_numa */ unsigned long numa_scan_offset; /* numa_scan_seq prevents two threads setting pte_numa */ int numa_scan_seq; - - /* - * The first node a task was scheduled on. If a task runs on - * a different node than Make PTE Scan Go Now. - */ - int first_nid; #endif struct uprobes_state uprobes_state; }; -/* first nid will either be a valid NID or one of these values */ -#define NUMA_PTE_SCAN_INIT -1 -#define NUMA_PTE_SCAN_ACTIVE -2 - static inline void mm_init_cpumask(struct mm_struct *mm) { #ifdef CONFIG_CPUMASK_OFFSTACK diff --git a/include/linux/mutex.h b/include/linux/mutex.h index ccd4260834c5..bab49da8a0f0 100644 --- a/include/linux/mutex.h +++ b/include/linux/mutex.h @@ -15,8 +15,8 @@ #include #include #include - #include +#include /* * Simple, straightforward mutexes with strict semantics: @@ -175,8 +175,8 @@ extern void mutex_unlock(struct mutex *lock); extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); -#ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX -#define arch_mutex_cpu_relax() cpu_relax() +#ifndef arch_mutex_cpu_relax +# define arch_mutex_cpu_relax() cpu_relax() #endif #endif diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 01fd84b566f7..49f52c8f4422 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -1455,7 +1455,8 @@ struct nfs_rpc_ops { struct inode * (*open_context) (struct inode *dir, struct nfs_open_context *ctx, int open_flags, - struct iattr *iattr); + struct iattr *iattr, + int *); int (*have_delegation)(struct inode *, fmode_t); int (*return_delegation)(struct inode *); struct nfs_client *(*alloc_client) (const struct nfs_client_initdata *); diff --git a/include/linux/of_irq.h b/include/linux/of_irq.h index 535cecf1e02f..fcd63baee5f2 100644 --- a/include/linux/of_irq.h +++ b/include/linux/of_irq.h @@ -1,8 +1,6 @@ #ifndef __OF_IRQ_H #define __OF_IRQ_H -#if defined(CONFIG_OF) -struct of_irq; #include #include #include @@ -10,14 +8,6 @@ struct of_irq; #include #include -/* - * irq_of_parse_and_map() is used by all OF enabled platforms; but SPARC - * implements it differently. However, the prototype is the same for all, - * so declare it here regardless of the CONFIG_OF_IRQ setting. - */ -extern unsigned int irq_of_parse_and_map(struct device_node *node, int index); - -#if defined(CONFIG_OF_IRQ) /** * of_irq - container for device_node/irq_specifier pair for an irq controller * @controller: pointer to interrupt controller device tree node @@ -71,11 +61,17 @@ extern int of_irq_to_resource(struct device_node *dev, int index, extern int of_irq_count(struct device_node *dev); extern int of_irq_to_resource_table(struct device_node *dev, struct resource *res, int nr_irqs); -extern struct device_node *of_irq_find_parent(struct device_node *child); extern void of_irq_init(const struct of_device_id *matches); -#endif /* CONFIG_OF_IRQ */ +#if defined(CONFIG_OF) +/* + * irq_of_parse_and_map() is used by all OF enabled platforms; but SPARC + * implements it differently. However, the prototype is the same for all, + * so declare it here regardless of the CONFIG_OF_IRQ setting. + */ +extern unsigned int irq_of_parse_and_map(struct device_node *node, int index); +extern struct device_node *of_irq_find_parent(struct device_node *child); #else /* !CONFIG_OF */ static inline unsigned int irq_of_parse_and_map(struct device_node *dev, diff --git a/include/linux/of_reserved_mem.h b/include/linux/of_reserved_mem.h deleted file mode 100644 index c84128255814..000000000000 --- a/include/linux/of_reserved_mem.h +++ /dev/null @@ -1,14 +0,0 @@ -#ifndef __OF_RESERVED_MEM_H -#define __OF_RESERVED_MEM_H - -#ifdef CONFIG_OF_RESERVED_MEM -void of_reserved_mem_device_init(struct device *dev); -void of_reserved_mem_device_release(struct device *dev); -void early_init_dt_scan_reserved_mem(void); -#else -static inline void of_reserved_mem_device_init(struct device *dev) { } -static inline void of_reserved_mem_device_release(struct device *dev) { } -static inline void early_init_dt_scan_reserved_mem(void) { } -#endif - -#endif /* __OF_RESERVED_MEM_H */ diff --git a/include/linux/page-flags-layout.h b/include/linux/page-flags-layout.h index 93506a114034..da523661500a 100644 --- a/include/linux/page-flags-layout.h +++ b/include/linux/page-flags-layout.h @@ -38,10 +38,10 @@ * The last is when there is insufficient space in page->flags and a separate * lookup is necessary. * - * No sparsemem or sparsemem vmemmap: | NODE | ZONE | ... | FLAGS | - * " plus space for last_nid: | NODE | ZONE | LAST_NID ... | FLAGS | - * classic sparse with space for node:| SECTION | NODE | ZONE | ... | FLAGS | - * " plus space for last_nid: | SECTION | NODE | ZONE | LAST_NID ... | FLAGS | + * No sparsemem or sparsemem vmemmap: | NODE | ZONE | ... | FLAGS | + * " plus space for last_cpupid: | NODE | ZONE | LAST_CPUPID ... | FLAGS | + * classic sparse with space for node:| SECTION | NODE | ZONE | ... | FLAGS | + * " plus space for last_cpupid: | SECTION | NODE | ZONE | LAST_CPUPID ... | FLAGS | * classic sparse no space for node: | SECTION | ZONE | ... | FLAGS | */ #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) @@ -62,15 +62,21 @@ #endif #ifdef CONFIG_NUMA_BALANCING -#define LAST_NID_SHIFT NODES_SHIFT +#define LAST__PID_SHIFT 8 +#define LAST__PID_MASK ((1 << LAST__PID_SHIFT)-1) + +#define LAST__CPU_SHIFT NR_CPUS_BITS +#define LAST__CPU_MASK ((1 << LAST__CPU_SHIFT)-1) + +#define LAST_CPUPID_SHIFT (LAST__PID_SHIFT+LAST__CPU_SHIFT) #else -#define LAST_NID_SHIFT 0 +#define LAST_CPUPID_SHIFT 0 #endif -#if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_NID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS -#define LAST_NID_WIDTH LAST_NID_SHIFT +#if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS +#define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT #else -#define LAST_NID_WIDTH 0 +#define LAST_CPUPID_WIDTH 0 #endif /* @@ -81,8 +87,8 @@ #define NODE_NOT_IN_PAGE_FLAGS #endif -#if defined(CONFIG_NUMA_BALANCING) && LAST_NID_WIDTH == 0 -#define LAST_NID_NOT_IN_PAGE_FLAGS +#if defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0 +#define LAST_CPUPID_NOT_IN_PAGE_FLAGS #endif #endif /* _LINUX_PAGE_FLAGS_LAYOUT */ diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 866e85c5eb94..2e069d1288df 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -294,9 +294,31 @@ struct ring_buffer; */ struct perf_event { #ifdef CONFIG_PERF_EVENTS - struct list_head group_entry; + /* + * entry onto perf_event_context::event_list; + * modifications require ctx->lock + * RCU safe iterations. + */ struct list_head event_entry; + + /* + * XXX: group_entry and sibling_list should be mutually exclusive; + * either you're a sibling on a group, or you're the group leader. + * Rework the code to always use the same list element. + * + * Locked for modification by both ctx->mutex and ctx->lock; holding + * either sufficies for read. + */ + struct list_head group_entry; struct list_head sibling_list; + + /* + * We need storage to track the entries in perf_pmu_migrate_context; we + * cannot use the event_entry because of RCU and we want to keep the + * group in tact which avoids us using the other two entries. + */ + struct list_head migrate_entry; + struct hlist_node hlist_entry; int nr_siblings; int group_flags; @@ -562,6 +584,10 @@ struct perf_sample_data { struct perf_regs_user regs_user; u64 stack_user_size; u64 weight; + /* + * Transaction flags for abort events: + */ + u64 txn; }; static inline void perf_sample_data_init(struct perf_sample_data *data, @@ -577,6 +603,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data, data->stack_user_size = 0; data->weight = 0; data->data_src.val = 0; + data->txn = 0; } extern void perf_output_sample(struct perf_output_handle *handle, diff --git a/include/linux/preempt.h b/include/linux/preempt.h index f5d4723cdb3d..a3d9dc8c2c00 100644 --- a/include/linux/preempt.h +++ b/include/linux/preempt.h @@ -6,106 +6,95 @@ * preempt_count (used for kernel preemption, interrupt count, etc.) */ -#include #include #include -#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER) - extern void add_preempt_count(int val); - extern void sub_preempt_count(int val); -#else -# define add_preempt_count(val) do { preempt_count() += (val); } while (0) -# define sub_preempt_count(val) do { preempt_count() -= (val); } while (0) -#endif - -#define inc_preempt_count() add_preempt_count(1) -#define dec_preempt_count() sub_preempt_count(1) - -#define preempt_count() (current_thread_info()->preempt_count) - -#ifdef CONFIG_PREEMPT - -asmlinkage void preempt_schedule(void); - -#define preempt_check_resched() \ -do { \ - if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \ - preempt_schedule(); \ -} while (0) - -#ifdef CONFIG_CONTEXT_TRACKING +/* + * We use the MSB mostly because its available; see for + * the other bits -- can't include that header due to inclusion hell. + */ +#define PREEMPT_NEED_RESCHED 0x80000000 -void preempt_schedule_context(void); +#include -#define preempt_check_resched_context() \ -do { \ - if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \ - preempt_schedule_context(); \ -} while (0) +#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER) +extern void preempt_count_add(int val); +extern void preempt_count_sub(int val); +#define preempt_count_dec_and_test() ({ preempt_count_sub(1); should_resched(); }) #else +#define preempt_count_add(val) __preempt_count_add(val) +#define preempt_count_sub(val) __preempt_count_sub(val) +#define preempt_count_dec_and_test() __preempt_count_dec_and_test() +#endif -#define preempt_check_resched_context() preempt_check_resched() - -#endif /* CONFIG_CONTEXT_TRACKING */ - -#else /* !CONFIG_PREEMPT */ - -#define preempt_check_resched() do { } while (0) -#define preempt_check_resched_context() do { } while (0) - -#endif /* CONFIG_PREEMPT */ +#define __preempt_count_inc() __preempt_count_add(1) +#define __preempt_count_dec() __preempt_count_sub(1) +#define preempt_count_inc() preempt_count_add(1) +#define preempt_count_dec() preempt_count_sub(1) #ifdef CONFIG_PREEMPT_COUNT #define preempt_disable() \ do { \ - inc_preempt_count(); \ + preempt_count_inc(); \ barrier(); \ } while (0) #define sched_preempt_enable_no_resched() \ do { \ barrier(); \ - dec_preempt_count(); \ + preempt_count_dec(); \ } while (0) -#define preempt_enable_no_resched() sched_preempt_enable_no_resched() +#define preempt_enable_no_resched() sched_preempt_enable_no_resched() +#ifdef CONFIG_PREEMPT #define preempt_enable() \ do { \ - preempt_enable_no_resched(); \ barrier(); \ - preempt_check_resched(); \ + if (unlikely(preempt_count_dec_and_test())) \ + __preempt_schedule(); \ +} while (0) + +#define preempt_check_resched() \ +do { \ + if (should_resched()) \ + __preempt_schedule(); \ } while (0) -/* For debugging and tracer internals only! */ -#define add_preempt_count_notrace(val) \ - do { preempt_count() += (val); } while (0) -#define sub_preempt_count_notrace(val) \ - do { preempt_count() -= (val); } while (0) -#define inc_preempt_count_notrace() add_preempt_count_notrace(1) -#define dec_preempt_count_notrace() sub_preempt_count_notrace(1) +#else +#define preempt_enable() preempt_enable_no_resched() +#define preempt_check_resched() do { } while (0) +#endif #define preempt_disable_notrace() \ do { \ - inc_preempt_count_notrace(); \ + __preempt_count_inc(); \ barrier(); \ } while (0) #define preempt_enable_no_resched_notrace() \ do { \ barrier(); \ - dec_preempt_count_notrace(); \ + __preempt_count_dec(); \ } while (0) -/* preempt_check_resched is OK to trace */ +#ifdef CONFIG_PREEMPT + +#ifndef CONFIG_CONTEXT_TRACKING +#define __preempt_schedule_context() __preempt_schedule() +#endif + #define preempt_enable_notrace() \ do { \ - preempt_enable_no_resched_notrace(); \ barrier(); \ - preempt_check_resched_context(); \ + if (unlikely(__preempt_count_dec_and_test())) \ + __preempt_schedule_context(); \ } while (0) +#else +#define preempt_enable_notrace() preempt_enable_no_resched_notrace() +#endif #else /* !CONFIG_PREEMPT_COUNT */ @@ -115,10 +104,11 @@ do { \ * that can cause faults and scheduling migrate into our preempt-protected * region. */ -#define preempt_disable() barrier() +#define preempt_disable() barrier() #define sched_preempt_enable_no_resched() barrier() -#define preempt_enable_no_resched() barrier() -#define preempt_enable() barrier() +#define preempt_enable_no_resched() barrier() +#define preempt_enable() barrier() +#define preempt_check_resched() do { } while (0) #define preempt_disable_notrace() barrier() #define preempt_enable_no_resched_notrace() barrier() diff --git a/include/linux/random.h b/include/linux/random.h index 3b9377d6b7a5..6312dd9ba449 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -17,6 +17,7 @@ extern void add_interrupt_randomness(int irq, int irq_flags); extern void get_random_bytes(void *buf, int nbytes); extern void get_random_bytes_arch(void *buf, int nbytes); void generate_random_uuid(unsigned char uuid_out[16]); +extern int random_int_secret_init(void); #ifndef MODULE extern const struct file_operations random_fops, urandom_fops; diff --git a/include/linux/rculist.h b/include/linux/rculist.h index 4106721c4e5e..45a0a9e81478 100644 --- a/include/linux/rculist.h +++ b/include/linux/rculist.h @@ -18,6 +18,21 @@ * be used anywhere you would want to use a list_empty_rcu(). */ +/* + * INIT_LIST_HEAD_RCU - Initialize a list_head visible to RCU readers + * @list: list to be initialized + * + * You should instead use INIT_LIST_HEAD() for normal initialization and + * cleanup tasks, when readers have no access to the list being initialized. + * However, if the list being initialized is visible to readers, you + * need to keep the compiler from being too mischievous. + */ +static inline void INIT_LIST_HEAD_RCU(struct list_head *list) +{ + ACCESS_ONCE(list->next) = list; + ACCESS_ONCE(list->prev) = list; +} + /* * return the ->next pointer of a list_head in an rcu safe * way, we must not access it directly @@ -191,9 +206,13 @@ static inline void list_splice_init_rcu(struct list_head *list, if (list_empty(list)) return; - /* "first" and "last" tracking list, so initialize it. */ + /* + * "first" and "last" tracking list, so initialize it. RCU readers + * have access to this list, so we must use INIT_LIST_HEAD_RCU() + * instead of INIT_LIST_HEAD(). + */ - INIT_LIST_HEAD(list); + INIT_LIST_HEAD_RCU(list); /* * At this point, the list body still points to the source list. diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index f1f1bc39346b..39cbb889e20d 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -261,6 +261,10 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev, rcu_irq_exit(); \ } while (0) +#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) +extern bool __rcu_is_watching(void); +#endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */ + /* * Infrastructure to implement the synchronize_() primitives in * TREE_RCU and rcu_barrier_() primitives in TINY_RCU. @@ -297,10 +301,6 @@ static inline void destroy_rcu_head_on_stack(struct rcu_head *head) } #endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */ -#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_SMP) -extern int rcu_is_cpu_idle(void); -#endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_SMP) */ - #if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PROVE_RCU) bool rcu_lockdep_current_cpu_online(void); #else /* #if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PROVE_RCU) */ @@ -351,7 +351,7 @@ static inline int rcu_read_lock_held(void) { if (!debug_lockdep_rcu_enabled()) return 1; - if (rcu_is_cpu_idle()) + if (!rcu_is_watching()) return 0; if (!rcu_lockdep_current_cpu_online()) return 0; @@ -402,7 +402,7 @@ static inline int rcu_read_lock_sched_held(void) if (!debug_lockdep_rcu_enabled()) return 1; - if (rcu_is_cpu_idle()) + if (!rcu_is_watching()) return 0; if (!rcu_lockdep_current_cpu_online()) return 0; @@ -771,7 +771,7 @@ static inline void rcu_read_lock(void) __rcu_read_lock(); __acquire(RCU); rcu_lock_acquire(&rcu_lock_map); - rcu_lockdep_assert(!rcu_is_cpu_idle(), + rcu_lockdep_assert(rcu_is_watching(), "rcu_read_lock() used illegally while idle"); } @@ -792,7 +792,7 @@ static inline void rcu_read_lock(void) */ static inline void rcu_read_unlock(void) { - rcu_lockdep_assert(!rcu_is_cpu_idle(), + rcu_lockdep_assert(rcu_is_watching(), "rcu_read_unlock() used illegally while idle"); rcu_lock_release(&rcu_lock_map); __release(RCU); @@ -821,7 +821,7 @@ static inline void rcu_read_lock_bh(void) local_bh_disable(); __acquire(RCU_BH); rcu_lock_acquire(&rcu_bh_lock_map); - rcu_lockdep_assert(!rcu_is_cpu_idle(), + rcu_lockdep_assert(rcu_is_watching(), "rcu_read_lock_bh() used illegally while idle"); } @@ -832,7 +832,7 @@ static inline void rcu_read_lock_bh(void) */ static inline void rcu_read_unlock_bh(void) { - rcu_lockdep_assert(!rcu_is_cpu_idle(), + rcu_lockdep_assert(rcu_is_watching(), "rcu_read_unlock_bh() used illegally while idle"); rcu_lock_release(&rcu_bh_lock_map); __release(RCU_BH); @@ -857,7 +857,7 @@ static inline void rcu_read_lock_sched(void) preempt_disable(); __acquire(RCU_SCHED); rcu_lock_acquire(&rcu_sched_lock_map); - rcu_lockdep_assert(!rcu_is_cpu_idle(), + rcu_lockdep_assert(rcu_is_watching(), "rcu_read_lock_sched() used illegally while idle"); } @@ -875,7 +875,7 @@ static inline notrace void rcu_read_lock_sched_notrace(void) */ static inline void rcu_read_unlock_sched(void) { - rcu_lockdep_assert(!rcu_is_cpu_idle(), + rcu_lockdep_assert(rcu_is_watching(), "rcu_read_unlock_sched() used illegally while idle"); rcu_lock_release(&rcu_sched_lock_map); __release(RCU_SCHED); diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index e31005ee339e..09ebcbe9fd78 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -132,4 +132,21 @@ static inline void rcu_scheduler_starting(void) } #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */ +#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) + +static inline bool rcu_is_watching(void) +{ + return __rcu_is_watching(); +} + +#else /* defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */ + +static inline bool rcu_is_watching(void) +{ + return true; +} + + +#endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */ + #endif /* __LINUX_RCUTINY_H */ diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 226169d1bd2b..4b9c81548742 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -90,4 +90,6 @@ extern void exit_rcu(void); extern void rcu_scheduler_starting(void); extern int rcu_scheduler_active __read_mostly; +extern bool rcu_is_watching(void); + #endif /* __LINUX_RCUTREE_H */ diff --git a/include/linux/regulator/driver.h b/include/linux/regulator/driver.h index 67e13aa5a478..9bdad43ad228 100644 --- a/include/linux/regulator/driver.h +++ b/include/linux/regulator/driver.h @@ -40,6 +40,8 @@ enum regulator_status { }; /** + * struct regulator_linear_range - specify linear voltage ranges + * * Specify a range of voltages for regulator_map_linar_range() and * regulator_list_linear_range(). * diff --git a/include/linux/sched.h b/include/linux/sched.h index 6682da36b293..045b0d227846 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -22,6 +22,7 @@ struct sched_param { #include #include #include +#include #include #include @@ -427,6 +428,14 @@ struct task_cputime { .sum_exec_runtime = 0, \ } +#define PREEMPT_ENABLED (PREEMPT_NEED_RESCHED) + +#ifdef CONFIG_PREEMPT_COUNT +#define PREEMPT_DISABLED (1 + PREEMPT_ENABLED) +#else +#define PREEMPT_DISABLED PREEMPT_ENABLED +#endif + /* * Disable preemption until the scheduler is running. * Reset by start_kernel()->sched_init()->init_idle(). @@ -434,7 +443,7 @@ struct task_cputime { * We include PREEMPT_ACTIVE to avoid cond_resched() from working * before the scheduler is active -- see should_resched(). */ -#define INIT_PREEMPT_COUNT (1 + PREEMPT_ACTIVE) +#define INIT_PREEMPT_COUNT (PREEMPT_DISABLED + PREEMPT_ACTIVE) /** * struct thread_group_cputimer - thread group interval timer counts @@ -768,6 +777,7 @@ enum cpu_idle_type { #define SD_ASYM_PACKING 0x0800 /* Place busy groups earlier in the domain */ #define SD_PREFER_SIBLING 0x1000 /* Prefer to place tasks in a sibling domain */ #define SD_OVERLAP 0x2000 /* sched_domains of this level overlap */ +#define SD_NUMA 0x4000 /* cross-node balancing */ extern int __weak arch_sd_sibiling_asym_packing(void); @@ -811,6 +821,10 @@ struct sched_domain { u64 last_update; + /* idle_balance() stats */ + u64 max_newidle_lb_cost; + unsigned long next_decay_max_lb_cost; + #ifdef CONFIG_SCHEDSTATS /* load_balance() stats */ unsigned int lb_count[CPU_MAX_IDLE_TYPES]; @@ -1029,6 +1043,8 @@ struct task_struct { struct task_struct *last_wakee; unsigned long wakee_flips; unsigned long wakee_flip_decay_ts; + + int wake_cpu; #endif int on_rq; @@ -1324,10 +1340,41 @@ struct task_struct { #endif #ifdef CONFIG_NUMA_BALANCING int numa_scan_seq; - int numa_migrate_seq; unsigned int numa_scan_period; + unsigned int numa_scan_period_max; + int numa_preferred_nid; + int numa_migrate_deferred; + unsigned long numa_migrate_retry; u64 node_stamp; /* migration stamp */ struct callback_head numa_work; + + struct list_head numa_entry; + struct numa_group *numa_group; + + /* + * Exponential decaying average of faults on a per-node basis. + * Scheduling placement decisions are made based on the these counts. + * The values remain static for the duration of a PTE scan + */ + unsigned long *numa_faults; + unsigned long total_numa_faults; + + /* + * numa_faults_buffer records faults per node during the current + * scan window. When the scan completes, the counts in numa_faults + * decay and these values are copied. + */ + unsigned long *numa_faults_buffer; + + /* + * numa_faults_locality tracks if faults recorded during the last + * scan window were remote/local. The task scan period is adapted + * based on the locality of the faults with different weights + * depending on whether they were shared or private faults + */ + unsigned long numa_faults_locality[2]; + + unsigned long numa_pages_migrated; #endif /* CONFIG_NUMA_BALANCING */ struct rcu_head rcu; @@ -1394,11 +1441,10 @@ struct task_struct { } memcg_batch; unsigned int memcg_kmem_skip_account; struct memcg_oom_info { + struct mem_cgroup *memcg; + gfp_t gfp_mask; + int order; unsigned int may_oom:1; - unsigned int in_memcg_oom:1; - unsigned int oom_locked:1; - int wakeups; - struct mem_cgroup *wait_on_memcg; } memcg_oom; #endif #ifdef CONFIG_UPROBES @@ -1413,16 +1459,33 @@ struct task_struct { /* Future-safe accessor for struct task_struct's cpus_allowed. */ #define tsk_cpus_allowed(tsk) (&(tsk)->cpus_allowed) +#define TNF_MIGRATED 0x01 +#define TNF_NO_GROUP 0x02 +#define TNF_SHARED 0x04 +#define TNF_FAULT_LOCAL 0x08 + #ifdef CONFIG_NUMA_BALANCING -extern void task_numa_fault(int node, int pages, bool migrated); +extern void task_numa_fault(int last_node, int node, int pages, int flags); +extern pid_t task_numa_group_id(struct task_struct *p); extern void set_numabalancing_state(bool enabled); +extern void task_numa_free(struct task_struct *p); + +extern unsigned int sysctl_numa_balancing_migrate_deferred; #else -static inline void task_numa_fault(int node, int pages, bool migrated) +static inline void task_numa_fault(int last_node, int node, int pages, + int flags) { } +static inline pid_t task_numa_group_id(struct task_struct *p) +{ + return 0; +} static inline void set_numabalancing_state(bool enabled) { } +static inline void task_numa_free(struct task_struct *p) +{ +} #endif static inline struct pid *task_pid(struct task_struct *task) @@ -1975,7 +2038,7 @@ extern void wake_up_new_task(struct task_struct *tsk); #else static inline void kick_process(struct task_struct *tsk) { } #endif -extern void sched_fork(struct task_struct *p); +extern void sched_fork(unsigned long clone_flags, struct task_struct *p); extern void sched_dead(struct task_struct *p); extern void proc_caches_init(void); @@ -2402,11 +2465,6 @@ static inline int signal_pending_state(long state, struct task_struct *p) return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p); } -static inline int need_resched(void) -{ - return unlikely(test_thread_flag(TIF_NEED_RESCHED)); -} - /* * cond_resched() and cond_resched_lock(): latency reduction via * explicit rescheduling in places that are safe. The return @@ -2475,36 +2533,105 @@ static inline int tsk_is_polling(struct task_struct *p) { return task_thread_info(p)->status & TS_POLLING; } -static inline void current_set_polling(void) +static inline void __current_set_polling(void) { current_thread_info()->status |= TS_POLLING; } -static inline void current_clr_polling(void) +static inline bool __must_check current_set_polling_and_test(void) +{ + __current_set_polling(); + + /* + * Polling state must be visible before we test NEED_RESCHED, + * paired by resched_task() + */ + smp_mb(); + + return unlikely(tif_need_resched()); +} + +static inline void __current_clr_polling(void) { current_thread_info()->status &= ~TS_POLLING; - smp_mb__after_clear_bit(); +} + +static inline bool __must_check current_clr_polling_and_test(void) +{ + __current_clr_polling(); + + /* + * Polling state must be visible before we test NEED_RESCHED, + * paired by resched_task() + */ + smp_mb(); + + return unlikely(tif_need_resched()); } #elif defined(TIF_POLLING_NRFLAG) static inline int tsk_is_polling(struct task_struct *p) { return test_tsk_thread_flag(p, TIF_POLLING_NRFLAG); } -static inline void current_set_polling(void) + +static inline void __current_set_polling(void) { set_thread_flag(TIF_POLLING_NRFLAG); } -static inline void current_clr_polling(void) +static inline bool __must_check current_set_polling_and_test(void) +{ + __current_set_polling(); + + /* + * Polling state must be visible before we test NEED_RESCHED, + * paired by resched_task() + * + * XXX: assumes set/clear bit are identical barrier wise. + */ + smp_mb__after_clear_bit(); + + return unlikely(tif_need_resched()); +} + +static inline void __current_clr_polling(void) { clear_thread_flag(TIF_POLLING_NRFLAG); } + +static inline bool __must_check current_clr_polling_and_test(void) +{ + __current_clr_polling(); + + /* + * Polling state must be visible before we test NEED_RESCHED, + * paired by resched_task() + */ + smp_mb__after_clear_bit(); + + return unlikely(tif_need_resched()); +} + #else static inline int tsk_is_polling(struct task_struct *p) { return 0; } -static inline void current_set_polling(void) { } -static inline void current_clr_polling(void) { } +static inline void __current_set_polling(void) { } +static inline void __current_clr_polling(void) { } + +static inline bool __must_check current_set_polling_and_test(void) +{ + return unlikely(tif_need_resched()); +} +static inline bool __must_check current_clr_polling_and_test(void) +{ + return unlikely(tif_need_resched()); +} #endif +static __always_inline bool need_resched(void) +{ + return unlikely(tif_need_resched()); +} + /* * Thread group CPU time accounting. */ @@ -2546,6 +2673,11 @@ static inline unsigned int task_cpu(const struct task_struct *p) return task_thread_info(p)->cpu; } +static inline int task_node(const struct task_struct *p) +{ + return cpu_to_node(task_cpu(p)); +} + extern void set_task_cpu(struct task_struct *p, unsigned int cpu); #else diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index bf8086b2506e..41467f8ff8ec 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -2,8 +2,8 @@ #define _SCHED_SYSCTL_H #ifdef CONFIG_DETECT_HUNG_TASK +extern int sysctl_hung_task_check_count; extern unsigned int sysctl_hung_task_panic; -extern unsigned long sysctl_hung_task_check_count; extern unsigned long sysctl_hung_task_timeout_secs; extern unsigned long sysctl_hung_task_warnings; extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int write, @@ -47,7 +47,6 @@ extern enum sched_tunable_scaling sysctl_sched_tunable_scaling; extern unsigned int sysctl_numa_balancing_scan_delay; extern unsigned int sysctl_numa_balancing_scan_period_min; extern unsigned int sysctl_numa_balancing_scan_period_max; -extern unsigned int sysctl_numa_balancing_scan_period_reset; extern unsigned int sysctl_numa_balancing_scan_size; extern unsigned int sysctl_numa_balancing_settle_count; diff --git a/include/linux/sched_clock.h b/include/linux/sched_clock.h index fa7922c80a41..cddf0c2940b6 100644 --- a/include/linux/sched_clock.h +++ b/include/linux/sched_clock.h @@ -15,7 +15,7 @@ static inline void sched_clock_postinit(void) { } #endif extern void setup_sched_clock(u32 (*read)(void), int bits, unsigned long rate); - -extern unsigned long long (*sched_clock_func)(void); +extern void sched_clock_register(u64 (*read)(void), int bits, + unsigned long rate); #endif diff --git a/include/linux/sfi.h b/include/linux/sfi.h index fe817918b30e..d9b436f09925 100644 --- a/include/linux/sfi.h +++ b/include/linux/sfi.h @@ -59,6 +59,9 @@ #ifndef _LINUX_SFI_H #define _LINUX_SFI_H +#include +#include + /* Table signatures reserved by the SFI specification */ #define SFI_SIG_SYST "SYST" #define SFI_SIG_FREQ "FREQ" diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2ddb48d9312c..c2d89335f637 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -498,7 +498,7 @@ struct sk_buff { * headers if needed */ __u8 encapsulation:1; - /* 7/9 bit hole (depending on ndisc_nodetype presence) */ + /* 6/8 bit hole (depending on ndisc_nodetype presence) */ kmemcheck_bitfield_end(flags2); #if defined CONFIG_NET_DMA || defined CONFIG_NET_RX_BUSY_POLL diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h index 3b5e910d14ca..d2abbdb8c6aa 100644 --- a/include/linux/stop_machine.h +++ b/include/linux/stop_machine.h @@ -28,6 +28,7 @@ struct cpu_stop_work { }; int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg); +int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg); void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg, struct cpu_stop_work *work_buf); int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg); diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index e7e04736802f..fddbe2023a5d 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -104,8 +104,21 @@ static inline int test_ti_thread_flag(struct thread_info *ti, int flag) #define test_thread_flag(flag) \ test_ti_thread_flag(current_thread_info(), flag) -#define set_need_resched() set_thread_flag(TIF_NEED_RESCHED) -#define clear_need_resched() clear_thread_flag(TIF_NEED_RESCHED) +static inline __deprecated void set_need_resched(void) +{ + /* + * Use of this function in deprecated. + * + * As of this writing there are only a few users in the DRM tree left + * all of which are wrong and can be removed without causing too much + * grief. + * + * The DRM people are aware and are working on removing the last few + * instances. + */ +} + +#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED) #if defined TIF_RESTORE_SIGMASK && !defined HAVE_SET_RESTORE_SIGMASK /* diff --git a/include/linux/timex.h b/include/linux/timex.h index dd3edd7dfc94..9d3f1a5b6178 100644 --- a/include/linux/timex.h +++ b/include/linux/timex.h @@ -64,6 +64,20 @@ #include +#ifndef random_get_entropy +/* + * The random_get_entropy() function is used by the /dev/random driver + * in order to extract entropy via the relative unpredictability of + * when an interrupt takes places versus a high speed, fine-grained + * timing source or cycle counter. Since it will be occurred on every + * single interrupt, it must have a very low cost/overhead. + * + * By default we use get_cycles() for this purpose, but individual + * architectures may override this in their asm/timex.h header file. + */ +#define random_get_entropy() get_cycles() +#endif + /* * SHIFT_PLL is used as a dampening factor to define how much we * adjust the frequency correction for a given offset in PLL mode. diff --git a/include/linux/topology.h b/include/linux/topology.h index d3cf0d6e7712..12ae6ce997d6 100644 --- a/include/linux/topology.h +++ b/include/linux/topology.h @@ -106,6 +106,8 @@ int arch_update_cpu_topology(void); .last_balance = jiffies, \ .balance_interval = 1, \ .smt_gain = 1178, /* 15% */ \ + .max_newidle_lb_cost = 0, \ + .next_decay_max_lb_cost = jiffies, \ } #endif #endif /* CONFIG_SCHED_SMT */ @@ -135,6 +137,8 @@ int arch_update_cpu_topology(void); , \ .last_balance = jiffies, \ .balance_interval = 1, \ + .max_newidle_lb_cost = 0, \ + .next_decay_max_lb_cost = jiffies, \ } #endif #endif /* CONFIG_SCHED_MC */ @@ -166,6 +170,8 @@ int arch_update_cpu_topology(void); , \ .last_balance = jiffies, \ .balance_interval = 1, \ + .max_newidle_lb_cost = 0, \ + .next_decay_max_lb_cost = jiffies, \ } #endif diff --git a/include/linux/tty.h b/include/linux/tty.h index 64f864651d86..633cac77f9f9 100644 --- a/include/linux/tty.h +++ b/include/linux/tty.h @@ -672,31 +672,17 @@ static inline void tty_wait_until_sent_from_close(struct tty_struct *tty, #define wait_event_interruptible_tty(tty, wq, condition) \ ({ \ int __ret = 0; \ - if (!(condition)) { \ - __wait_event_interruptible_tty(tty, wq, condition, __ret); \ - } \ + if (!(condition)) \ + __ret = __wait_event_interruptible_tty(tty, wq, \ + condition); \ __ret; \ }) -#define __wait_event_interruptible_tty(tty, wq, condition, ret) \ -do { \ - DEFINE_WAIT(__wait); \ - \ - for (;;) { \ - prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \ - if (condition) \ - break; \ - if (!signal_pending(current)) { \ - tty_unlock(tty); \ +#define __wait_event_interruptible_tty(tty, wq, condition) \ + ___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0, \ + tty_unlock(tty); \ schedule(); \ - tty_lock(tty); \ - continue; \ - } \ - ret = -ERESTARTSYS; \ - break; \ - } \ - finish_wait(&wq, &__wait); \ -} while (0) + tty_lock(tty)) #ifdef CONFIG_PROC_FS extern void proc_tty_register_driver(struct tty_driver *); diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h index 5ca0951e1855..9d8cf056e661 100644 --- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -15,7 +15,7 @@ */ static inline void pagefault_disable(void) { - inc_preempt_count(); + preempt_count_inc(); /* * make sure to have issued the store before a pagefault * can hit. @@ -30,11 +30,7 @@ static inline void pagefault_enable(void) * the pagefault handler again. */ barrier(); - dec_preempt_count(); - /* - * make sure we do.. - */ - barrier(); + preempt_count_dec(); preempt_check_resched(); } diff --git a/include/linux/usb/usb_phy_gen_xceiv.h b/include/linux/usb/usb_phy_gen_xceiv.h index f9a7e7bc925b..11d85b9c1b08 100644 --- a/include/linux/usb/usb_phy_gen_xceiv.h +++ b/include/linux/usb/usb_phy_gen_xceiv.h @@ -12,7 +12,7 @@ struct usb_phy_gen_xceiv_platform_data { unsigned int needs_reset:1; }; -#if IS_ENABLED(CONFIG_NOP_USB_XCEIV) +#if defined(CONFIG_NOP_USB_XCEIV) || (defined(CONFIG_NOP_USB_XCEIV_MODULE) && defined(MODULE)) /* sometimes transceivers are accessed only through e.g. ULPI */ extern void usb_nop_xceiv_register(void); extern void usb_nop_xceiv_unregister(void); diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h index 9cb2fe8ca944..e303eef94dd5 100644 --- a/include/linux/usb/usbnet.h +++ b/include/linux/usb/usbnet.h @@ -42,6 +42,7 @@ struct usbnet { struct usb_host_endpoint *status; unsigned maxpacket; struct timer_list delay; + const char *padding_pkt; /* protocol/interface state */ struct net_device *net; diff --git a/include/linux/usb_usual.h b/include/linux/usb_usual.h index bf99cd01be20..630356866030 100644 --- a/include/linux/usb_usual.h +++ b/include/linux/usb_usual.h @@ -66,7 +66,9 @@ US_FLAG(INITIAL_READ10, 0x00100000) \ /* Initial READ(10) (and others) must be retried */ \ US_FLAG(WRITE_CACHE, 0x00200000) \ - /* Write Cache status is not available */ + /* Write Cache status is not available */ \ + US_FLAG(NEEDS_CAP16, 0x00400000) + /* cannot handle READ_CAPACITY_10 */ #define US_FLAG(name, value) US_FL_##name = value , enum { US_DO_ALL_FLAGS }; diff --git a/include/linux/vgaarb.h b/include/linux/vgaarb.h index 80cf8173a65b..2c02f3a8d2ba 100644 --- a/include/linux/vgaarb.h +++ b/include/linux/vgaarb.h @@ -65,15 +65,8 @@ struct pci_dev; * out of the arbitration process (and can be safe to take * interrupts at any time. */ -#if defined(CONFIG_VGA_ARB) extern void vga_set_legacy_decoding(struct pci_dev *pdev, unsigned int decodes); -#else -static inline void vga_set_legacy_decoding(struct pci_dev *pdev, - unsigned int decodes) -{ -} -#endif /** * vga_get - acquire & locks VGA resources diff --git a/include/linux/wait.h b/include/linux/wait.h index a67fc1635592..ec099b03e11b 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -1,7 +1,8 @@ #ifndef _LINUX_WAIT_H #define _LINUX_WAIT_H - - +/* + * Linux wait queue related types and methods + */ #include #include #include @@ -13,27 +14,27 @@ typedef int (*wait_queue_func_t)(wait_queue_t *wait, unsigned mode, int flags, v int default_wake_function(wait_queue_t *wait, unsigned mode, int flags, void *key); struct __wait_queue { - unsigned int flags; + unsigned int flags; #define WQ_FLAG_EXCLUSIVE 0x01 - void *private; - wait_queue_func_t func; - struct list_head task_list; + void *private; + wait_queue_func_t func; + struct list_head task_list; }; struct wait_bit_key { - void *flags; - int bit_nr; -#define WAIT_ATOMIC_T_BIT_NR -1 + void *flags; + int bit_nr; +#define WAIT_ATOMIC_T_BIT_NR -1 }; struct wait_bit_queue { - struct wait_bit_key key; - wait_queue_t wait; + struct wait_bit_key key; + wait_queue_t wait; }; struct __wait_queue_head { - spinlock_t lock; - struct list_head task_list; + spinlock_t lock; + struct list_head task_list; }; typedef struct __wait_queue_head wait_queue_head_t; @@ -84,17 +85,17 @@ extern void __init_waitqueue_head(wait_queue_head_t *q, const char *name, struct static inline void init_waitqueue_entry(wait_queue_t *q, struct task_struct *p) { - q->flags = 0; - q->private = p; - q->func = default_wake_function; + q->flags = 0; + q->private = p; + q->func = default_wake_function; } -static inline void init_waitqueue_func_entry(wait_queue_t *q, - wait_queue_func_t func) +static inline void +init_waitqueue_func_entry(wait_queue_t *q, wait_queue_func_t func) { - q->flags = 0; - q->private = NULL; - q->func = func; + q->flags = 0; + q->private = NULL; + q->func = func; } static inline int waitqueue_active(wait_queue_head_t *q) @@ -114,8 +115,8 @@ static inline void __add_wait_queue(wait_queue_head_t *head, wait_queue_t *new) /* * Used for wake-one threads: */ -static inline void __add_wait_queue_exclusive(wait_queue_head_t *q, - wait_queue_t *wait) +static inline void +__add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait) { wait->flags |= WQ_FLAG_EXCLUSIVE; __add_wait_queue(q, wait); @@ -127,23 +128,22 @@ static inline void __add_wait_queue_tail(wait_queue_head_t *head, list_add_tail(&new->task_list, &head->task_list); } -static inline void __add_wait_queue_tail_exclusive(wait_queue_head_t *q, - wait_queue_t *wait) +static inline void +__add_wait_queue_tail_exclusive(wait_queue_head_t *q, wait_queue_t *wait) { wait->flags |= WQ_FLAG_EXCLUSIVE; __add_wait_queue_tail(q, wait); } -static inline void __remove_wait_queue(wait_queue_head_t *head, - wait_queue_t *old) +static inline void +__remove_wait_queue(wait_queue_head_t *head, wait_queue_t *old) { list_del(&old->task_list); } void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr, void *key); void __wake_up_locked_key(wait_queue_head_t *q, unsigned int mode, void *key); -void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode, int nr, - void *key); +void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode, int nr, void *key); void __wake_up_locked(wait_queue_head_t *q, unsigned int mode, int nr); void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr); void __wake_up_bit(wait_queue_head_t *, void *, int); @@ -170,27 +170,64 @@ wait_queue_head_t *bit_waitqueue(void *, int); /* * Wakeup macros to be used to report events to the targets. */ -#define wake_up_poll(x, m) \ +#define wake_up_poll(x, m) \ __wake_up(x, TASK_NORMAL, 1, (void *) (m)) -#define wake_up_locked_poll(x, m) \ +#define wake_up_locked_poll(x, m) \ __wake_up_locked_key((x), TASK_NORMAL, (void *) (m)) -#define wake_up_interruptible_poll(x, m) \ +#define wake_up_interruptible_poll(x, m) \ __wake_up(x, TASK_INTERRUPTIBLE, 1, (void *) (m)) #define wake_up_interruptible_sync_poll(x, m) \ __wake_up_sync_key((x), TASK_INTERRUPTIBLE, 1, (void *) (m)) -#define __wait_event(wq, condition) \ -do { \ - DEFINE_WAIT(__wait); \ +#define ___wait_cond_timeout(condition) \ +({ \ + bool __cond = (condition); \ + if (__cond && !__ret) \ + __ret = 1; \ + __cond || !__ret; \ +}) + +#define ___wait_is_interruptible(state) \ + (!__builtin_constant_p(state) || \ + state == TASK_INTERRUPTIBLE || state == TASK_KILLABLE) \ + +#define ___wait_event(wq, condition, state, exclusive, ret, cmd) \ +({ \ + __label__ __out; \ + wait_queue_t __wait; \ + long __ret = ret; \ + \ + INIT_LIST_HEAD(&__wait.task_list); \ + if (exclusive) \ + __wait.flags = WQ_FLAG_EXCLUSIVE; \ + else \ + __wait.flags = 0; \ \ for (;;) { \ - prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \ + long __int = prepare_to_wait_event(&wq, &__wait, state);\ + \ if (condition) \ break; \ - schedule(); \ + \ + if (___wait_is_interruptible(state) && __int) { \ + __ret = __int; \ + if (exclusive) { \ + abort_exclusive_wait(&wq, &__wait, \ + state, NULL); \ + goto __out; \ + } \ + break; \ + } \ + \ + cmd; \ } \ finish_wait(&wq, &__wait); \ -} while (0) +__out: __ret; \ +}) + +#define __wait_event(wq, condition) \ + (void)___wait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, 0, \ + schedule()) /** * wait_event - sleep until a condition gets true @@ -204,29 +241,17 @@ do { \ * wake_up() has to be called after changing any variable that could * change the result of the wait condition. */ -#define wait_event(wq, condition) \ +#define wait_event(wq, condition) \ do { \ - if (condition) \ + if (condition) \ break; \ __wait_event(wq, condition); \ } while (0) -#define __wait_event_timeout(wq, condition, ret) \ -do { \ - DEFINE_WAIT(__wait); \ - \ - for (;;) { \ - prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \ - if (condition) \ - break; \ - ret = schedule_timeout(ret); \ - if (!ret) \ - break; \ - } \ - if (!ret && (condition)) \ - ret = 1; \ - finish_wait(&wq, &__wait); \ -} while (0) +#define __wait_event_timeout(wq, condition, timeout) \ + ___wait_event(wq, ___wait_cond_timeout(condition), \ + TASK_UNINTERRUPTIBLE, 0, timeout, \ + __ret = schedule_timeout(__ret)) /** * wait_event_timeout - sleep until a condition gets true or a timeout elapses @@ -248,28 +273,14 @@ do { \ #define wait_event_timeout(wq, condition, timeout) \ ({ \ long __ret = timeout; \ - if (!(condition)) \ - __wait_event_timeout(wq, condition, __ret); \ + if (!___wait_cond_timeout(condition)) \ + __ret = __wait_event_timeout(wq, condition, timeout); \ __ret; \ }) -#define __wait_event_interruptible(wq, condition, ret) \ -do { \ - DEFINE_WAIT(__wait); \ - \ - for (;;) { \ - prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \ - if (condition) \ - break; \ - if (!signal_pending(current)) { \ - schedule(); \ - continue; \ - } \ - ret = -ERESTARTSYS; \ - break; \ - } \ - finish_wait(&wq, &__wait); \ -} while (0) +#define __wait_event_interruptible(wq, condition) \ + ___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0, \ + schedule()) /** * wait_event_interruptible - sleep until a condition gets true @@ -290,31 +301,14 @@ do { \ ({ \ int __ret = 0; \ if (!(condition)) \ - __wait_event_interruptible(wq, condition, __ret); \ + __ret = __wait_event_interruptible(wq, condition); \ __ret; \ }) -#define __wait_event_interruptible_timeout(wq, condition, ret) \ -do { \ - DEFINE_WAIT(__wait); \ - \ - for (;;) { \ - prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \ - if (condition) \ - break; \ - if (!signal_pending(current)) { \ - ret = schedule_timeout(ret); \ - if (!ret) \ - break; \ - continue; \ - } \ - ret = -ERESTARTSYS; \ - break; \ - } \ - if (!ret && (condition)) \ - ret = 1; \ - finish_wait(&wq, &__wait); \ -} while (0) +#define __wait_event_interruptible_timeout(wq, condition, timeout) \ + ___wait_event(wq, ___wait_cond_timeout(condition), \ + TASK_INTERRUPTIBLE, 0, timeout, \ + __ret = schedule_timeout(__ret)) /** * wait_event_interruptible_timeout - sleep until a condition gets true or a timeout elapses @@ -337,15 +331,15 @@ do { \ #define wait_event_interruptible_timeout(wq, condition, timeout) \ ({ \ long __ret = timeout; \ - if (!(condition)) \ - __wait_event_interruptible_timeout(wq, condition, __ret); \ + if (!___wait_cond_timeout(condition)) \ + __ret = __wait_event_interruptible_timeout(wq, \ + condition, timeout); \ __ret; \ }) #define __wait_event_hrtimeout(wq, condition, timeout, state) \ ({ \ int __ret = 0; \ - DEFINE_WAIT(__wait); \ struct hrtimer_sleeper __t; \ \ hrtimer_init_on_stack(&__t.timer, CLOCK_MONOTONIC, \ @@ -356,25 +350,15 @@ do { \ current->timer_slack_ns, \ HRTIMER_MODE_REL); \ \ - for (;;) { \ - prepare_to_wait(&wq, &__wait, state); \ - if (condition) \ - break; \ - if (state == TASK_INTERRUPTIBLE && \ - signal_pending(current)) { \ - __ret = -ERESTARTSYS; \ - break; \ - } \ + __ret = ___wait_event(wq, condition, state, 0, 0, \ if (!__t.task) { \ __ret = -ETIME; \ break; \ } \ - schedule(); \ - } \ + schedule()); \ \ hrtimer_cancel(&__t.timer); \ destroy_hrtimer_on_stack(&__t.timer); \ - finish_wait(&wq, &__wait); \ __ret; \ }) @@ -428,33 +412,15 @@ do { \ __ret; \ }) -#define __wait_event_interruptible_exclusive(wq, condition, ret) \ -do { \ - DEFINE_WAIT(__wait); \ - \ - for (;;) { \ - prepare_to_wait_exclusive(&wq, &__wait, \ - TASK_INTERRUPTIBLE); \ - if (condition) { \ - finish_wait(&wq, &__wait); \ - break; \ - } \ - if (!signal_pending(current)) { \ - schedule(); \ - continue; \ - } \ - ret = -ERESTARTSYS; \ - abort_exclusive_wait(&wq, &__wait, \ - TASK_INTERRUPTIBLE, NULL); \ - break; \ - } \ -} while (0) +#define __wait_event_interruptible_exclusive(wq, condition) \ + ___wait_event(wq, condition, TASK_INTERRUPTIBLE, 1, 0, \ + schedule()) #define wait_event_interruptible_exclusive(wq, condition) \ ({ \ int __ret = 0; \ if (!(condition)) \ - __wait_event_interruptible_exclusive(wq, condition, __ret);\ + __ret = __wait_event_interruptible_exclusive(wq, condition);\ __ret; \ }) @@ -606,24 +572,8 @@ do { \ ? 0 : __wait_event_interruptible_locked(wq, condition, 1, 1)) - -#define __wait_event_killable(wq, condition, ret) \ -do { \ - DEFINE_WAIT(__wait); \ - \ - for (;;) { \ - prepare_to_wait(&wq, &__wait, TASK_KILLABLE); \ - if (condition) \ - break; \ - if (!fatal_signal_pending(current)) { \ - schedule(); \ - continue; \ - } \ - ret = -ERESTARTSYS; \ - break; \ - } \ - finish_wait(&wq, &__wait); \ -} while (0) +#define __wait_event_killable(wq, condition) \ + ___wait_event(wq, condition, TASK_KILLABLE, 0, 0, schedule()) /** * wait_event_killable - sleep until a condition gets true @@ -644,26 +594,17 @@ do { \ ({ \ int __ret = 0; \ if (!(condition)) \ - __wait_event_killable(wq, condition, __ret); \ + __ret = __wait_event_killable(wq, condition); \ __ret; \ }) #define __wait_event_lock_irq(wq, condition, lock, cmd) \ -do { \ - DEFINE_WAIT(__wait); \ - \ - for (;;) { \ - prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \ - if (condition) \ - break; \ - spin_unlock_irq(&lock); \ - cmd; \ - schedule(); \ - spin_lock_irq(&lock); \ - } \ - finish_wait(&wq, &__wait); \ -} while (0) + (void)___wait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, 0, \ + spin_unlock_irq(&lock); \ + cmd; \ + schedule(); \ + spin_lock_irq(&lock)) /** * wait_event_lock_irq_cmd - sleep until a condition gets true. The @@ -723,26 +664,12 @@ do { \ } while (0) -#define __wait_event_interruptible_lock_irq(wq, condition, \ - lock, ret, cmd) \ -do { \ - DEFINE_WAIT(__wait); \ - \ - for (;;) { \ - prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \ - if (condition) \ - break; \ - if (signal_pending(current)) { \ - ret = -ERESTARTSYS; \ - break; \ - } \ - spin_unlock_irq(&lock); \ - cmd; \ - schedule(); \ - spin_lock_irq(&lock); \ - } \ - finish_wait(&wq, &__wait); \ -} while (0) +#define __wait_event_interruptible_lock_irq(wq, condition, lock, cmd) \ + ___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0, \ + spin_unlock_irq(&lock); \ + cmd; \ + schedule(); \ + spin_lock_irq(&lock)) /** * wait_event_interruptible_lock_irq_cmd - sleep until a condition gets true. @@ -772,10 +699,9 @@ do { \ #define wait_event_interruptible_lock_irq_cmd(wq, condition, lock, cmd) \ ({ \ int __ret = 0; \ - \ if (!(condition)) \ - __wait_event_interruptible_lock_irq(wq, condition, \ - lock, __ret, cmd); \ + __ret = __wait_event_interruptible_lock_irq(wq, \ + condition, lock, cmd); \ __ret; \ }) @@ -804,39 +730,24 @@ do { \ #define wait_event_interruptible_lock_irq(wq, condition, lock) \ ({ \ int __ret = 0; \ - \ if (!(condition)) \ - __wait_event_interruptible_lock_irq(wq, condition, \ - lock, __ret, ); \ + __ret = __wait_event_interruptible_lock_irq(wq, \ + condition, lock,) \ __ret; \ }) #define __wait_event_interruptible_lock_irq_timeout(wq, condition, \ - lock, ret) \ -do { \ - DEFINE_WAIT(__wait); \ - \ - for (;;) { \ - prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \ - if (condition) \ - break; \ - if (signal_pending(current)) { \ - ret = -ERESTARTSYS; \ - break; \ - } \ - spin_unlock_irq(&lock); \ - ret = schedule_timeout(ret); \ - spin_lock_irq(&lock); \ - if (!ret) \ - break; \ - } \ - finish_wait(&wq, &__wait); \ -} while (0) + lock, timeout) \ + ___wait_event(wq, ___wait_cond_timeout(condition), \ + TASK_INTERRUPTIBLE, 0, ret, \ + spin_unlock_irq(&lock); \ + __ret = schedule_timeout(__ret); \ + spin_lock_irq(&lock)); /** - * wait_event_interruptible_lock_irq_timeout - sleep until a condition gets true or a timeout elapses. - * The condition is checked under the lock. This is expected - * to be called with the lock taken. + * wait_event_interruptible_lock_irq_timeout - sleep until a condition gets + * true or a timeout elapses. The condition is checked under + * the lock. This is expected to be called with the lock taken. * @wq: the waitqueue to wait on * @condition: a C expression for the event to wait for * @lock: a locked spinlock_t, which will be released before schedule() @@ -860,11 +771,10 @@ do { \ #define wait_event_interruptible_lock_irq_timeout(wq, condition, lock, \ timeout) \ ({ \ - int __ret = timeout; \ - \ - if (!(condition)) \ - __wait_event_interruptible_lock_irq_timeout( \ - wq, condition, lock, __ret); \ + long __ret = timeout; \ + if (!___wait_cond_timeout(condition)) \ + __ret = __wait_event_interruptible_lock_irq_timeout( \ + wq, condition, lock, timeout); \ __ret; \ }) @@ -875,20 +785,18 @@ do { \ * We plan to remove these interfaces. */ extern void sleep_on(wait_queue_head_t *q); -extern long sleep_on_timeout(wait_queue_head_t *q, - signed long timeout); +extern long sleep_on_timeout(wait_queue_head_t *q, signed long timeout); extern void interruptible_sleep_on(wait_queue_head_t *q); -extern long interruptible_sleep_on_timeout(wait_queue_head_t *q, - signed long timeout); +extern long interruptible_sleep_on_timeout(wait_queue_head_t *q, signed long timeout); /* * Waitqueues which are removed from the waitqueue_head at wakeup time */ void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state); void prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state); +long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state); void finish_wait(wait_queue_head_t *q, wait_queue_t *wait); -void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, - unsigned int mode, void *key); +void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, unsigned int mode, void *key); int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key); int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key); @@ -934,8 +842,8 @@ int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key); * One uses wait_on_bit() where one is waiting for the bit to clear, * but has no intention of setting it. */ -static inline int wait_on_bit(void *word, int bit, - int (*action)(void *), unsigned mode) +static inline int +wait_on_bit(void *word, int bit, int (*action)(void *), unsigned mode) { if (!test_bit(bit, word)) return 0; @@ -958,8 +866,8 @@ static inline int wait_on_bit(void *word, int bit, * One uses wait_on_bit_lock() where one is waiting for the bit to * clear with the intention of setting it, and when done, clearing it. */ -static inline int wait_on_bit_lock(void *word, int bit, - int (*action)(void *), unsigned mode) +static inline int +wait_on_bit_lock(void *word, int bit, int (*action)(void *), unsigned mode) { if (!test_and_set_bit(bit, word)) return 0; @@ -983,5 +891,5 @@ int wait_on_atomic_t(atomic_t *val, int (*action)(atomic_t *), unsigned mode) return 0; return out_of_line_wait_on_atomic_t(val, action, mode); } - -#endif + +#endif /* _LINUX_WAIT_H */ diff --git a/include/net/addrconf.h b/include/net/addrconf.h index fb314de2b61b..86505bfa5d2c 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -67,6 +67,10 @@ int ipv6_chk_addr(struct net *net, const struct in6_addr *addr, int ipv6_chk_home_addr(struct net *net, const struct in6_addr *addr); #endif +bool ipv6_chk_custom_prefix(const struct in6_addr *addr, + const unsigned int prefix_len, + struct net_device *dev); + int ipv6_chk_prefix(const struct in6_addr *addr, struct net_device *dev); struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net, diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h index aaeaf0938ec0..15f10841e2b5 100644 --- a/include/net/bluetooth/hci.h +++ b/include/net/bluetooth/hci.h @@ -104,6 +104,7 @@ enum { enum { HCI_SETUP, HCI_AUTO_OFF, + HCI_RFKILLED, HCI_MGMT, HCI_PAIRABLE, HCI_SERVICE_CACHE, diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h index f0d70f066f3d..9c4d37ec45a1 100644 --- a/include/net/ip_vs.h +++ b/include/net/ip_vs.h @@ -723,8 +723,6 @@ struct ip_vs_dest_dst { struct rcu_head rcu_head; }; -/* In grace period after removing */ -#define IP_VS_DEST_STATE_REMOVING 0x01 /* * The real server destination forwarding entry * with ip address, port number, and so on. @@ -742,7 +740,7 @@ struct ip_vs_dest { atomic_t refcnt; /* reference counter */ struct ip_vs_stats stats; /* statistics */ - unsigned long state; /* state flags */ + unsigned long idle_start; /* start time, jiffies */ /* connection counters and thresholds */ atomic_t activeconns; /* active connections */ @@ -756,14 +754,13 @@ struct ip_vs_dest { struct ip_vs_dest_dst __rcu *dest_dst; /* cached dst info */ /* for virtual service */ - struct ip_vs_service *svc; /* service it belongs to */ + struct ip_vs_service __rcu *svc; /* service it belongs to */ __u16 protocol; /* which protocol (TCP/UDP) */ __be16 vport; /* virtual port number */ union nf_inet_addr vaddr; /* virtual IP address */ __u32 vfwmark; /* firewall mark of service */ struct list_head t_list; /* in dest_trash */ - struct rcu_head rcu_head; unsigned int in_rs_table:1; /* we are in rs_table */ }; @@ -1649,7 +1646,7 @@ static inline void ip_vs_conn_drop_conntrack(struct ip_vs_conn *cp) /* CONFIG_IP_VS_NFCT */ #endif -static inline unsigned int +static inline int ip_vs_dest_conn_overhead(struct ip_vs_dest *dest) { /* diff --git a/include/net/mrp.h b/include/net/mrp.h index 4fbf02aa2ec1..0f7558b638ae 100644 --- a/include/net/mrp.h +++ b/include/net/mrp.h @@ -112,6 +112,7 @@ struct mrp_applicant { struct mrp_application *app; struct net_device *dev; struct timer_list join_timer; + struct timer_list periodic_timer; spinlock_t lock; struct sk_buff_head queue; diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 1313456a0994..9d22f08896c6 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -74,6 +74,7 @@ struct net { struct hlist_head *dev_index_head; unsigned int dev_base_seq; /* protected by rtnl_mutex */ int ifindex; + unsigned int dev_unreg_count; /* core fib_rules */ struct list_head rules_ops; diff --git a/include/net/netfilter/nf_conntrack_synproxy.h b/include/net/netfilter/nf_conntrack_synproxy.h index 806f54a290d6..f572f313d6f1 100644 --- a/include/net/netfilter/nf_conntrack_synproxy.h +++ b/include/net/netfilter/nf_conntrack_synproxy.h @@ -56,7 +56,7 @@ struct synproxy_options { struct tcphdr; struct xt_synproxy_info; -extern void synproxy_parse_options(const struct sk_buff *skb, unsigned int doff, +extern bool synproxy_parse_options(const struct sk_buff *skb, unsigned int doff, const struct tcphdr *th, struct synproxy_options *opts); extern unsigned int synproxy_options_size(const struct synproxy_options *opts); diff --git a/include/net/secure_seq.h b/include/net/secure_seq.h index 6ca975bebd37..c2e542b27a5a 100644 --- a/include/net/secure_seq.h +++ b/include/net/secure_seq.h @@ -3,7 +3,6 @@ #include -extern void net_secret_init(void); extern __u32 secure_ip_id(__be32 daddr); extern __u32 secure_ipv6_id(const __be32 daddr[4]); extern u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport); diff --git a/include/net/sock.h b/include/net/sock.h index 6ba2e7b0e2b1..1d37a8086bed 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -409,6 +409,11 @@ struct sock { void (*sk_destruct)(struct sock *sk); }; +#define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data))) + +#define rcu_dereference_sk_user_data(sk) rcu_dereference(__sk_user_data((sk))) +#define rcu_assign_sk_user_data(sk, ptr) rcu_assign_pointer(__sk_user_data((sk)), ptr) + /* * SK_CAN_REUSE and SK_NO_REUSE on a socket mean that the socket is OK * or not whether his port will be reused by someone else. SK_FORCE_REUSE diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h index ee2376cfaab3..aca382266411 100644 --- a/include/trace/events/rcu.h +++ b/include/trace/events/rcu.h @@ -39,15 +39,26 @@ TRACE_EVENT(rcu_utilization, #if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU) /* - * Tracepoint for grace-period events: starting and ending a grace - * period ("start" and "end", respectively), a CPU noting the start - * of a new grace period or the end of an old grace period ("cpustart" - * and "cpuend", respectively), a CPU passing through a quiescent - * state ("cpuqs"), a CPU coming online or going offline ("cpuonl" - * and "cpuofl", respectively), a CPU being kicked for being too - * long in dyntick-idle mode ("kick"), a CPU accelerating its new - * callbacks to RCU_NEXT_READY_TAIL ("AccReadyCB"), and a CPU - * accelerating its new callbacks to RCU_WAIT_TAIL ("AccWaitCB"). + * Tracepoint for grace-period events. Takes a string identifying the + * RCU flavor, the grace-period number, and a string identifying the + * grace-period-related event as follows: + * + * "AccReadyCB": CPU acclerates new callbacks to RCU_NEXT_READY_TAIL. + * "AccWaitCB": CPU accelerates new callbacks to RCU_WAIT_TAIL. + * "newreq": Request a new grace period. + * "start": Start a grace period. + * "cpustart": CPU first notices a grace-period start. + * "cpuqs": CPU passes through a quiescent state. + * "cpuonl": CPU comes online. + * "cpuofl": CPU goes offline. + * "reqwait": GP kthread sleeps waiting for grace-period request. + * "reqwaitsig": GP kthread awakened by signal from reqwait state. + * "fqswait": GP kthread waiting until time to force quiescent states. + * "fqsstart": GP kthread starts forcing quiescent states. + * "fqsend": GP kthread done forcing quiescent states. + * "fqswaitsig": GP kthread awakened by signal from fqswait state. + * "end": End a grace period. + * "cpuend": CPU first notices a grace-period end. */ TRACE_EVENT(rcu_grace_period, @@ -160,6 +171,46 @@ TRACE_EVENT(rcu_grace_period_init, __entry->grplo, __entry->grphi, __entry->qsmask) ); +/* + * Tracepoint for RCU no-CBs CPU callback handoffs. This event is intended + * to assist debugging of these handoffs. + * + * The first argument is the name of the RCU flavor, and the second is + * the number of the offloaded CPU are extracted. The third and final + * argument is a string as follows: + * + * "WakeEmpty": Wake rcuo kthread, first CB to empty list. + * "WakeOvf": Wake rcuo kthread, CB list is huge. + * "WakeNot": Don't wake rcuo kthread. + * "WakeNotPoll": Don't wake rcuo kthread because it is polling. + * "Poll": Start of new polling cycle for rcu_nocb_poll. + * "Sleep": Sleep waiting for CBs for !rcu_nocb_poll. + * "WokeEmpty": rcuo kthread woke to find empty list. + * "WokeNonEmpty": rcuo kthread woke to find non-empty list. + * "WaitQueue": Enqueue partially done, timed wait for it to complete. + * "WokeQueue": Partial enqueue now complete. + */ +TRACE_EVENT(rcu_nocb_wake, + + TP_PROTO(const char *rcuname, int cpu, const char *reason), + + TP_ARGS(rcuname, cpu, reason), + + TP_STRUCT__entry( + __field(const char *, rcuname) + __field(int, cpu) + __field(const char *, reason) + ), + + TP_fast_assign( + __entry->rcuname = rcuname; + __entry->cpu = cpu; + __entry->reason = reason; + ), + + TP_printk("%s %d %s", __entry->rcuname, __entry->cpu, __entry->reason) +); + /* * Tracepoint for tasks blocking within preemptible-RCU read-side * critical sections. Track the type of RCU (which one day might @@ -540,17 +591,17 @@ TRACE_EVENT(rcu_invoke_kfree_callback, TRACE_EVENT(rcu_batch_end, TP_PROTO(const char *rcuname, int callbacks_invoked, - bool cb, bool nr, bool iit, bool risk), + char cb, char nr, char iit, char risk), TP_ARGS(rcuname, callbacks_invoked, cb, nr, iit, risk), TP_STRUCT__entry( __field(const char *, rcuname) __field(int, callbacks_invoked) - __field(bool, cb) - __field(bool, nr) - __field(bool, iit) - __field(bool, risk) + __field(char, cb) + __field(char, nr) + __field(char, iit) + __field(char, risk) ), TP_fast_assign( @@ -656,6 +707,7 @@ TRACE_EVENT(rcu_barrier, #define trace_rcu_future_grace_period(rcuname, gpnum, completed, c, \ level, grplo, grphi, event) \ do { } while (0) +#define trace_rcu_nocb_wake(rcuname, cpu, reason) do { } while (0) #define trace_rcu_preempt_task(rcuname, pid, gpnum) do { } while (0) #define trace_rcu_unlock_preempted_task(rcuname, gpnum, pid) do { } while (0) #define trace_rcu_quiescent_state_report(rcuname, gpnum, mask, qsmask, level, \ diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 2e7d9947a10d..613381bcde40 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -100,7 +100,7 @@ static inline long __trace_sched_switch_state(struct task_struct *p) /* * For all intents and purposes a preempted task is a running task. */ - if (task_thread_info(p)->preempt_count & PREEMPT_ACTIVE) + if (task_preempt_count(p) & PREEMPT_ACTIVE) state = TASK_RUNNING | TASK_STATE_MAX; #endif diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h index fa8b3adf9ffb..46d41e8b0dcc 100644 --- a/include/uapi/drm/radeon_drm.h +++ b/include/uapi/drm/radeon_drm.h @@ -1007,4 +1007,6 @@ struct drm_radeon_info { #define SI_TILE_MODE_DEPTH_STENCIL_2D_4AA 3 #define SI_TILE_MODE_DEPTH_STENCIL_2D_8AA 2 +#define CIK_TILE_MODE_DEPTH_STENCIL_1D 5 + #endif diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 009a655a5d35..da48837d617d 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -136,8 +136,9 @@ enum perf_event_sample_format { PERF_SAMPLE_WEIGHT = 1U << 14, PERF_SAMPLE_DATA_SRC = 1U << 15, PERF_SAMPLE_IDENTIFIER = 1U << 16, + PERF_SAMPLE_TRANSACTION = 1U << 17, - PERF_SAMPLE_MAX = 1U << 17, /* non-ABI */ + PERF_SAMPLE_MAX = 1U << 18, /* non-ABI */ }; /* @@ -180,6 +181,28 @@ enum perf_sample_regs_abi { PERF_SAMPLE_REGS_ABI_64 = 2, }; +/* + * Values for the memory transaction event qualifier, mostly for + * abort events. Multiple bits can be set. + */ +enum { + PERF_TXN_ELISION = (1 << 0), /* From elision */ + PERF_TXN_TRANSACTION = (1 << 1), /* From transaction */ + PERF_TXN_SYNC = (1 << 2), /* Instruction is related */ + PERF_TXN_ASYNC = (1 << 3), /* Instruction not related */ + PERF_TXN_RETRY = (1 << 4), /* Retry possible */ + PERF_TXN_CONFLICT = (1 << 5), /* Conflict abort */ + PERF_TXN_CAPACITY_WRITE = (1 << 6), /* Capacity write abort */ + PERF_TXN_CAPACITY_READ = (1 << 7), /* Capacity read abort */ + + PERF_TXN_MAX = (1 << 8), /* non-ABI */ + + /* bits 32..63 are reserved for the abort code */ + + PERF_TXN_ABORT_MASK = (0xffffffffULL << 32), + PERF_TXN_ABORT_SHIFT = 32, +}; + /* * The format of the data returned by read() on a perf event fd, * as specified by attr.read_format: diff --git a/init/Kconfig b/init/Kconfig index 3ecd8a1178f1..841e79cb8bb3 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -354,7 +354,8 @@ config VIRT_CPU_ACCOUNTING_NATIVE config VIRT_CPU_ACCOUNTING_GEN bool "Full dynticks CPU time accounting" - depends on HAVE_CONTEXT_TRACKING && 64BIT + depends on HAVE_CONTEXT_TRACKING + depends on HAVE_VIRT_CPU_ACCOUNTING_GEN select VIRT_CPU_ACCOUNTING select CONTEXT_TRACKING help diff --git a/init/main.c b/init/main.c index af310afbef28..379090fadac9 100644 --- a/init/main.c +++ b/init/main.c @@ -76,6 +76,7 @@ #include #include #include +#include #include #include @@ -692,7 +693,7 @@ int __init_or_module do_one_initcall(initcall_t fn) if (preempt_count() != count) { sprintf(msgbuf, "preemption imbalance "); - preempt_count() = count; + preempt_count_set(count); } if (irqs_disabled()) { strlcat(msgbuf, "disabled interrupts ", sizeof(msgbuf)); @@ -780,6 +781,7 @@ static void __init do_basic_setup(void) do_ctors(); usermodehelper_enable(); do_initcalls(); + random_int_secret_init(); } static void __init do_pre_smp_initcalls(void) diff --git a/ipc/msg.c b/ipc/msg.c index 9e4310c546ae..558aa91186b6 100644 --- a/ipc/msg.c +++ b/ipc/msg.c @@ -695,6 +695,12 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext, if (ipcperms(ns, &msq->q_perm, S_IWUGO)) goto out_unlock0; + /* raced with RMID? */ + if (msq->q_perm.deleted) { + err = -EIDRM; + goto out_unlock0; + } + err = security_msg_queue_msgsnd(msq, msg, msgflg); if (err) goto out_unlock0; @@ -901,6 +907,13 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgfl goto out_unlock1; ipc_lock_object(&msq->q_perm); + + /* raced with RMID? */ + if (msq->q_perm.deleted) { + msg = ERR_PTR(-EIDRM); + goto out_unlock0; + } + msg = find_msg(msq, &msgtyp, mode); if (!IS_ERR(msg)) { /* diff --git a/ipc/sem.c b/ipc/sem.c index 19c8b980d1fe..db9d241af133 100644 --- a/ipc/sem.c +++ b/ipc/sem.c @@ -252,71 +252,113 @@ static void sem_rcu_free(struct rcu_head *head) ipc_rcu_free(head); } +/* + * Wait until all currently ongoing simple ops have completed. + * Caller must own sem_perm.lock. + * New simple ops cannot start, because simple ops first check + * that sem_perm.lock is free. + * that a) sem_perm.lock is free and b) complex_count is 0. + */ +static void sem_wait_array(struct sem_array *sma) +{ + int i; + struct sem *sem; + + if (sma->complex_count) { + /* The thread that increased sma->complex_count waited on + * all sem->lock locks. Thus we don't need to wait again. + */ + return; + } + + for (i = 0; i < sma->sem_nsems; i++) { + sem = sma->sem_base + i; + spin_unlock_wait(&sem->lock); + } +} + /* * If the request contains only one semaphore operation, and there are * no complex transactions pending, lock only the semaphore involved. * Otherwise, lock the entire semaphore array, since we either have * multiple semaphores in our own semops, or we need to look at * semaphores from other pending complex operations. - * - * Carefully guard against sma->complex_count changing between zero - * and non-zero while we are spinning for the lock. The value of - * sma->complex_count cannot change while we are holding the lock, - * so sem_unlock should be fine. - * - * The global lock path checks that all the local locks have been released, - * checking each local lock once. This means that the local lock paths - * cannot start their critical sections while the global lock is held. */ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops, int nsops) { - int locknum; - again: - if (nsops == 1 && !sma->complex_count) { - struct sem *sem = sma->sem_base + sops->sem_num; + struct sem *sem; - /* Lock just the semaphore we are interested in. */ - spin_lock(&sem->lock); + if (nsops != 1) { + /* Complex operation - acquire a full lock */ + ipc_lock_object(&sma->sem_perm); - /* - * If sma->complex_count was set while we were spinning, - * we may need to look at things we did not lock here. + /* And wait until all simple ops that are processed + * right now have dropped their locks. */ - if (unlikely(sma->complex_count)) { - spin_unlock(&sem->lock); - goto lock_array; - } + sem_wait_array(sma); + return -1; + } + + /* + * Only one semaphore affected - try to optimize locking. + * The rules are: + * - optimized locking is possible if no complex operation + * is either enqueued or processed right now. + * - The test for enqueued complex ops is simple: + * sma->complex_count != 0 + * - Testing for complex ops that are processed right now is + * a bit more difficult. Complex ops acquire the full lock + * and first wait that the running simple ops have completed. + * (see above) + * Thus: If we own a simple lock and the global lock is free + * and complex_count is now 0, then it will stay 0 and + * thus just locking sem->lock is sufficient. + */ + sem = sma->sem_base + sops->sem_num; + if (sma->complex_count == 0) { /* - * Another process is holding the global lock on the - * sem_array; we cannot enter our critical section, - * but have to wait for the global lock to be released. + * It appears that no complex operation is around. + * Acquire the per-semaphore lock. */ - if (unlikely(spin_is_locked(&sma->sem_perm.lock))) { - spin_unlock(&sem->lock); - spin_unlock_wait(&sma->sem_perm.lock); - goto again; + spin_lock(&sem->lock); + + /* Then check that the global lock is free */ + if (!spin_is_locked(&sma->sem_perm.lock)) { + /* spin_is_locked() is not a memory barrier */ + smp_mb(); + + /* Now repeat the test of complex_count: + * It can't change anymore until we drop sem->lock. + * Thus: if is now 0, then it will stay 0. + */ + if (sma->complex_count == 0) { + /* fast path successful! */ + return sops->sem_num; + } } + spin_unlock(&sem->lock); + } + + /* slow path: acquire the full lock */ + ipc_lock_object(&sma->sem_perm); - locknum = sops->sem_num; + if (sma->complex_count == 0) { + /* False alarm: + * There is no complex operation, thus we can switch + * back to the fast path. + */ + spin_lock(&sem->lock); + ipc_unlock_object(&sma->sem_perm); + return sops->sem_num; } else { - int i; - /* - * Lock the semaphore array, and wait for all of the - * individual semaphore locks to go away. The code - * above ensures no new single-lock holders will enter - * their critical section while the array lock is held. + /* Not a false alarm, thus complete the sequence for a + * full lock. */ - lock_array: - ipc_lock_object(&sma->sem_perm); - for (i = 0; i < sma->sem_nsems; i++) { - struct sem *sem = sma->sem_base + i; - spin_unlock_wait(&sem->lock); - } - locknum = -1; + sem_wait_array(sma); + return -1; } - return locknum; } static inline void sem_unlock(struct sem_array *sma, int locknum) @@ -875,6 +917,24 @@ again: return semop_completed; } +/** + * set_semotime(sma, sops) - set sem_otime + * @sma: semaphore array + * @sops: operations that modified the array, may be NULL + * + * sem_otime is replicated to avoid cache line trashing. + * This function sets one instance to the current time. + */ +static void set_semotime(struct sem_array *sma, struct sembuf *sops) +{ + if (sops == NULL) { + sma->sem_base[0].sem_otime = get_seconds(); + } else { + sma->sem_base[sops[0].sem_num].sem_otime = + get_seconds(); + } +} + /** * do_smart_update(sma, sops, nsops, otime, pt) - optimized update_queue * @sma: semaphore array @@ -925,17 +985,10 @@ static void do_smart_update(struct sem_array *sma, struct sembuf *sops, int nsop } } } - if (otime) { - if (sops == NULL) { - sma->sem_base[0].sem_otime = get_seconds(); - } else { - sma->sem_base[sops[0].sem_num].sem_otime = - get_seconds(); - } - } + if (otime) + set_semotime(sma, sops); } - /* The following counts are associated to each semaphore: * semncnt number of tasks waiting on semval being nonzero * semzcnt number of tasks waiting on semval being zero @@ -1229,6 +1282,12 @@ static int semctl_setval(struct ipc_namespace *ns, int semid, int semnum, sem_lock(sma, NULL, -1); + if (sma->sem_perm.deleted) { + sem_unlock(sma, -1); + rcu_read_unlock(); + return -EIDRM; + } + curr = &sma->sem_base[semnum]; ipc_assert_locked_object(&sma->sem_perm); @@ -1283,12 +1342,14 @@ static int semctl_main(struct ipc_namespace *ns, int semid, int semnum, int i; sem_lock(sma, NULL, -1); + if (sma->sem_perm.deleted) { + err = -EIDRM; + goto out_unlock; + } if(nsems > SEMMSL_FAST) { if (!ipc_rcu_getref(sma)) { - sem_unlock(sma, -1); - rcu_read_unlock(); err = -EIDRM; - goto out_free; + goto out_unlock; } sem_unlock(sma, -1); rcu_read_unlock(); @@ -1301,10 +1362,8 @@ static int semctl_main(struct ipc_namespace *ns, int semid, int semnum, rcu_read_lock(); sem_lock_and_putref(sma); if (sma->sem_perm.deleted) { - sem_unlock(sma, -1); - rcu_read_unlock(); err = -EIDRM; - goto out_free; + goto out_unlock; } } for (i = 0; i < sma->sem_nsems; i++) @@ -1322,8 +1381,8 @@ static int semctl_main(struct ipc_namespace *ns, int semid, int semnum, struct sem_undo *un; if (!ipc_rcu_getref(sma)) { - rcu_read_unlock(); - return -EIDRM; + err = -EIDRM; + goto out_rcu_wakeup; } rcu_read_unlock(); @@ -1351,10 +1410,8 @@ static int semctl_main(struct ipc_namespace *ns, int semid, int semnum, rcu_read_lock(); sem_lock_and_putref(sma); if (sma->sem_perm.deleted) { - sem_unlock(sma, -1); - rcu_read_unlock(); err = -EIDRM; - goto out_free; + goto out_unlock; } for (i = 0; i < nsems; i++) @@ -1378,6 +1435,10 @@ static int semctl_main(struct ipc_namespace *ns, int semid, int semnum, goto out_rcu_wakeup; sem_lock(sma, NULL, -1); + if (sma->sem_perm.deleted) { + err = -EIDRM; + goto out_unlock; + } curr = &sma->sem_base[semnum]; switch (cmd) { @@ -1783,6 +1844,10 @@ SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf __user *, tsops, if (error) goto out_rcu_wakeup; + error = -EIDRM; + locknum = sem_lock(sma, sops, nsops); + if (sma->sem_perm.deleted) + goto out_unlock_free; /* * semid identifiers are not unique - find_alloc_undo may have * allocated an undo structure, it was invalidated by an RMID @@ -1790,19 +1855,22 @@ SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf __user *, tsops, * This case can be detected checking un->semid. The existence of * "un" itself is guaranteed by rcu. */ - error = -EIDRM; - locknum = sem_lock(sma, sops, nsops); if (un && un->semid == -1) goto out_unlock_free; error = perform_atomic_semop(sma, sops, nsops, un, task_tgid_vnr(current)); - if (error <= 0) { - if (alter && error == 0) + if (error == 0) { + /* If the operation was successful, then do + * the required updates. + */ + if (alter) do_smart_update(sma, sops, nsops, 1, &tasks); - - goto out_unlock_free; + else + set_semotime(sma, sops); } + if (error <= 0) + goto out_unlock_free; /* We need to sleep on this operation, so we put the current * task into the pending queue and go to sleep. @@ -1999,6 +2067,12 @@ void exit_sem(struct task_struct *tsk) } sem_lock(sma, NULL, -1); + /* exit_sem raced with IPC_RMID, nothing to do */ + if (sma->sem_perm.deleted) { + sem_unlock(sma, -1); + rcu_read_unlock(); + continue; + } un = __lookup_undo(ulp, semid); if (un == NULL) { /* exit_sem raced with IPC_RMID+semget() that created @@ -2061,6 +2135,14 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it) struct sem_array *sma = it; time_t sem_otime; + /* + * The proc interface isn't aware of sem_lock(), it calls + * ipc_lock_object() directly (in sysvipc_find_ipc). + * In order to stay compatible with sem_lock(), we must wait until + * all simple semop() calls have left their critical regions. + */ + sem_wait_array(sma); + sem_otime = get_semotime(sma); return seq_printf(s, diff --git a/ipc/util.c b/ipc/util.c index fdb8ae740775..7684f41bce76 100644 --- a/ipc/util.c +++ b/ipc/util.c @@ -17,12 +17,27 @@ * Pavel Emelianov * * General sysv ipc locking scheme: - * when doing ipc id lookups, take the ids->rwsem - * rcu_read_lock() - * obtain the ipc object (kern_ipc_perm) - * perform security, capabilities, auditing and permission checks, etc. - * acquire the ipc lock (kern_ipc_perm.lock) throught ipc_lock_object() - * perform data updates (ie: SET, RMID, LOCK/UNLOCK commands) + * rcu_read_lock() + * obtain the ipc object (kern_ipc_perm) by looking up the id in an idr + * tree. + * - perform initial checks (capabilities, auditing and permission, + * etc). + * - perform read-only operations, such as STAT, INFO commands. + * acquire the ipc lock (kern_ipc_perm.lock) through + * ipc_lock_object() + * - perform data updates, such as SET, RMID commands and + * mechanism-specific operations (semop/semtimedop, + * msgsnd/msgrcv, shmat/shmdt). + * drop the ipc lock, through ipc_unlock_object(). + * rcu_read_unlock() + * + * The ids->rwsem must be taken when: + * - creating, removing and iterating the existing entries in ipc + * identifier sets. + * - iterating through files under /proc/sysvipc/ + * + * Note that sems have a special fast path that avoids kern_ipc_perm.lock - + * see sem_lock(). */ #include diff --git a/kernel/Makefile b/kernel/Makefile index 1ce47553fb02..f99d908b5550 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -6,9 +6,9 @@ obj-y = fork.o exec_domain.o panic.o \ cpu.o exit.o itimer.o time.o softirq.o resource.o \ sysctl.o sysctl_binary.o capability.o ptrace.o timer.o user.o \ signal.o sys.o kmod.o workqueue.o pid.o task_work.o \ - rcupdate.o extable.o params.o posix-timers.o \ + extable.o params.o posix-timers.o \ kthread.o wait.o sys_ni.o posix-cpu-timers.o mutex.o \ - hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \ + hrtimer.o rwsem.o nsproxy.o semaphore.o \ notifier.o ksysfs.o cred.o reboot.o \ async.o range.o groups.o lglock.o smpboot.o @@ -27,6 +27,7 @@ obj-y += power/ obj-y += printk/ obj-y += cpu/ obj-y += irq/ +obj-y += rcu/ obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o @@ -81,12 +82,6 @@ obj-$(CONFIG_KGDB) += debug/ obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o obj-$(CONFIG_SECCOMP) += seccomp.o -obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o -obj-$(CONFIG_TREE_RCU) += rcutree.o -obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o -obj-$(CONFIG_TREE_RCU_TRACE) += rcutree_trace.o -obj-$(CONFIG_TINY_RCU) += rcutiny.o -obj-$(CONFIG_TINY_PREEMPT_RCU) += rcutiny.o obj-$(CONFIG_RELAY) += relay.o obj-$(CONFIG_SYSCTL) += utsname_sysctl.o obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o diff --git a/kernel/bounds.c b/kernel/bounds.c index 0c9b862292b2..e8ca97b5c386 100644 --- a/kernel/bounds.c +++ b/kernel/bounds.c @@ -10,6 +10,7 @@ #include #include #include +#include void foo(void) { @@ -17,5 +18,8 @@ void foo(void) DEFINE(NR_PAGEFLAGS, __NR_PAGEFLAGS); DEFINE(MAX_NR_ZONES, __MAX_NR_ZONES); DEFINE(NR_PCG_FLAGS, __NR_PCG_FLAGS); +#ifdef CONFIG_SMP + DEFINE(NR_CPUS_BITS, ilog2(CONFIG_NR_CPUS)); +#endif /* End of constants */ } diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c index 247091bf0587..e5f3917aa05b 100644 --- a/kernel/context_tracking.c +++ b/kernel/context_tracking.c @@ -50,6 +50,15 @@ void context_tracking_user_enter(void) { unsigned long flags; + /* + * Repeat the user_enter() check here because some archs may be calling + * this from asm and if no CPU needs context tracking, they shouldn't + * go further. Repeat the check here until they support the static key + * check. + */ + if (!static_key_false(&context_tracking_enabled)) + return; + /* * Some contexts may involve an exception occuring in an irq, * leading to that nesting: @@ -111,7 +120,7 @@ void context_tracking_user_enter(void) * instead of preempt_schedule() to exit user context if needed before * calling the scheduler. */ -void __sched notrace preempt_schedule_context(void) +asmlinkage void __sched notrace preempt_schedule_context(void) { enum ctx_state prev_ctx; @@ -151,6 +160,9 @@ void context_tracking_user_exit(void) { unsigned long flags; + if (!static_key_false(&context_tracking_enabled)) + return; + if (in_interrupt()) return; diff --git a/kernel/cpu.c b/kernel/cpu.c index d7f07a2da5a6..63aa50d7ce1e 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -308,6 +308,23 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen) } smpboot_park_threads(cpu); + /* + * By now we've cleared cpu_active_mask, wait for all preempt-disabled + * and RCU users of this state to go away such that all new such users + * will observe it. + * + * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might + * not imply sync_sched(), so explicitly call both. + */ +#ifdef CONFIG_PREEMPT + synchronize_sched(); +#endif + synchronize_rcu(); + + /* + * So now all preempt/rcu users must observe !cpu_active(). + */ + err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu)); if (err) { /* CPU didn't die: tell everyone. Can't complain. */ diff --git a/kernel/cpu/idle.c b/kernel/cpu/idle.c index e695c0a0bcb5..988573a9a387 100644 --- a/kernel/cpu/idle.c +++ b/kernel/cpu/idle.c @@ -44,7 +44,7 @@ static inline int cpu_idle_poll(void) rcu_idle_enter(); trace_cpu_idle_rcuidle(0, smp_processor_id()); local_irq_enable(); - while (!need_resched()) + while (!tif_need_resched()) cpu_relax(); trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id()); rcu_idle_exit(); @@ -92,8 +92,7 @@ static void cpu_idle_loop(void) if (cpu_idle_force_poll || tick_check_broadcast_expired()) { cpu_idle_poll(); } else { - current_clr_polling(); - if (!need_resched()) { + if (!current_clr_polling_and_test()) { stop_critical_timings(); rcu_idle_enter(); arch_cpu_idle(); @@ -103,9 +102,16 @@ static void cpu_idle_loop(void) } else { local_irq_enable(); } - current_set_polling(); + __current_set_polling(); } arch_cpu_idle_exit(); + /* + * We need to test and propagate the TIF_NEED_RESCHED + * bit here because we might not have send the + * reschedule IPI to idle tasks. + */ + if (tif_need_resched()) + set_preempt_need_resched(); } tick_nohz_idle_exit(); schedule_preempt_disabled(); @@ -129,7 +135,7 @@ void cpu_startup_entry(enum cpuhp_state state) */ boot_init_stack_canary(); #endif - current_set_polling(); + __current_set_polling(); arch_cpu_idle_prepare(); cpu_idle_loop(); } diff --git a/kernel/events/core.c b/kernel/events/core.c index cb4238e85b38..5bd7fe43a7a2 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -193,7 +193,7 @@ int perf_proc_update_handler(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { - int ret = proc_dointvec(table, write, buffer, lenp, ppos); + int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); if (ret || !write) return ret; @@ -1201,6 +1201,9 @@ static void perf_event__header_size(struct perf_event *event) if (sample_type & PERF_SAMPLE_DATA_SRC) size += sizeof(data->data_src.val); + if (sample_type & PERF_SAMPLE_TRANSACTION) + size += sizeof(data->txn); + event->header_size = size; } @@ -4572,6 +4575,9 @@ void perf_output_sample(struct perf_output_handle *handle, if (sample_type & PERF_SAMPLE_DATA_SRC) perf_output_put(handle, data->data_src.val); + if (sample_type & PERF_SAMPLE_TRANSACTION) + perf_output_put(handle, data->txn); + if (!event->attr.watermark) { int wakeup_events = event->attr.wakeup_events; @@ -6767,6 +6773,10 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr, if (ret) return -EFAULT; + /* disabled for now */ + if (attr->mmap2) + return -EINVAL; + if (attr->__reserved_1) return -EINVAL; @@ -7234,15 +7244,15 @@ void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu) perf_remove_from_context(event); unaccount_event_cpu(event, src_cpu); put_ctx(src_ctx); - list_add(&event->event_entry, &events); + list_add(&event->migrate_entry, &events); } mutex_unlock(&src_ctx->mutex); synchronize_rcu(); mutex_lock(&dst_ctx->mutex); - list_for_each_entry_safe(event, tmp, &events, event_entry) { - list_del(&event->event_entry); + list_for_each_entry_safe(event, tmp, &events, migrate_entry) { + list_del(&event->migrate_entry); if (event->state >= PERF_EVENT_STATE_OFF) event->state = PERF_EVENT_STATE_INACTIVE; account_event_cpu(event, dst_cpu); diff --git a/kernel/fork.c b/kernel/fork.c index 086fe73ad6bd..c93be06dee87 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -816,9 +816,6 @@ struct mm_struct *dup_mm(struct task_struct *tsk) #ifdef CONFIG_TRANSPARENT_HUGEPAGE mm->pmd_huge_pte = NULL; -#endif -#ifdef CONFIG_NUMA_BALANCING - mm->first_nid = NUMA_PTE_SCAN_INIT; #endif if (!mm_init(mm, tsk)) goto fail_nomem; @@ -1313,7 +1310,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, #endif /* Perform scheduler related setup. Assign this task to a CPU. */ - sched_fork(p); + sched_fork(clone_flags, p); retval = perf_event_init_task(p); if (retval) diff --git a/kernel/hung_task.c b/kernel/hung_task.c index 3e97fb126e6b..042252383fd2 100644 --- a/kernel/hung_task.c +++ b/kernel/hung_task.c @@ -20,7 +20,7 @@ /* * The number of tasks checked: */ -unsigned long __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT; +int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT; /* * Limit number of tasks checked in a batch. diff --git a/kernel/kmod.c b/kernel/kmod.c index fb326365b694..b086006c59e7 100644 --- a/kernel/kmod.c +++ b/kernel/kmod.c @@ -571,6 +571,10 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) DECLARE_COMPLETION_ONSTACK(done); int retval = 0; + if (!sub_info->path) { + call_usermodehelper_freeinfo(sub_info); + return -EINVAL; + } helper_lock(); if (!khelper_wq || usermodehelper_disabled) { retval = -EBUSY; diff --git a/kernel/lockdep.c b/kernel/lockdep.c index e16c45b9ee77..4e8e14c34e42 100644 --- a/kernel/lockdep.c +++ b/kernel/lockdep.c @@ -4224,7 +4224,7 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s) printk("\n%srcu_scheduler_active = %d, debug_locks = %d\n", !rcu_lockdep_current_cpu_online() ? "RCU used illegally from offline CPU!\n" - : rcu_is_cpu_idle() + : !rcu_is_watching() ? "RCU used illegally from idle CPU!\n" : "", rcu_scheduler_active, debug_locks); @@ -4247,7 +4247,7 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s) * So complain bitterly if someone does call rcu_read_lock(), * rcu_read_lock_bh() and so on from extended quiescent states. */ - if (rcu_is_cpu_idle()) + if (!rcu_is_watching()) printk("RCU used illegally from extended quiescent state!\n"); lockdep_print_held_locks(curr); diff --git a/kernel/lockdep_proc.c b/kernel/lockdep_proc.c index b2c71c5873e4..09220656d888 100644 --- a/kernel/lockdep_proc.c +++ b/kernel/lockdep_proc.c @@ -421,6 +421,7 @@ static void seq_lock_time(struct seq_file *m, struct lock_time *lt) seq_time(m, lt->min); seq_time(m, lt->max); seq_time(m, lt->total); + seq_time(m, lt->nr ? do_div(lt->total, lt->nr) : 0); } static void seq_stats(struct seq_file *m, struct lock_stat_data *data) @@ -518,20 +519,20 @@ static void seq_stats(struct seq_file *m, struct lock_stat_data *data) } if (i) { seq_puts(m, "\n"); - seq_line(m, '.', 0, 40 + 1 + 10 * (14 + 1)); + seq_line(m, '.', 0, 40 + 1 + 12 * (14 + 1)); seq_puts(m, "\n"); } } static void seq_header(struct seq_file *m) { - seq_printf(m, "lock_stat version 0.3\n"); + seq_puts(m, "lock_stat version 0.4\n"); if (unlikely(!debug_locks)) seq_printf(m, "*WARNING* lock debugging disabled!! - possibly due to a lockdep warning\n"); - seq_line(m, '-', 0, 40 + 1 + 10 * (14 + 1)); - seq_printf(m, "%40s %14s %14s %14s %14s %14s %14s %14s %14s " + seq_line(m, '-', 0, 40 + 1 + 12 * (14 + 1)); + seq_printf(m, "%40s %14s %14s %14s %14s %14s %14s %14s %14s %14s %14s " "%14s %14s\n", "class name", "con-bounces", @@ -539,12 +540,14 @@ static void seq_header(struct seq_file *m) "waittime-min", "waittime-max", "waittime-total", + "waittime-avg", "acq-bounces", "acquisitions", "holdtime-min", "holdtime-max", - "holdtime-total"); - seq_line(m, '-', 0, 40 + 1 + 10 * (14 + 1)); + "holdtime-total", + "holdtime-avg"); + seq_line(m, '-', 0, 40 + 1 + 12 * (14 + 1)); seq_printf(m, "\n"); } diff --git a/kernel/mutex.c b/kernel/mutex.c index 6d647aedffea..d24105b1b794 100644 --- a/kernel/mutex.c +++ b/kernel/mutex.c @@ -410,7 +410,7 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, static __always_inline int __sched __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, struct lockdep_map *nest_lock, unsigned long ip, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx) { struct task_struct *task = current; struct mutex_waiter waiter; @@ -450,7 +450,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, struct task_struct *owner; struct mspin_node node; - if (!__builtin_constant_p(ww_ctx == NULL) && ww_ctx->acquired > 0) { + if (use_ww_ctx && ww_ctx->acquired > 0) { struct ww_mutex *ww; ww = container_of(lock, struct ww_mutex, base); @@ -480,7 +480,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, if ((atomic_read(&lock->count) == 1) && (atomic_cmpxchg(&lock->count, 1, 0) == 1)) { lock_acquired(&lock->dep_map, ip); - if (!__builtin_constant_p(ww_ctx == NULL)) { + if (use_ww_ctx) { struct ww_mutex *ww; ww = container_of(lock, struct ww_mutex, base); @@ -551,7 +551,7 @@ slowpath: goto err; } - if (!__builtin_constant_p(ww_ctx == NULL) && ww_ctx->acquired > 0) { + if (use_ww_ctx && ww_ctx->acquired > 0) { ret = __mutex_lock_check_stamp(lock, ww_ctx); if (ret) goto err; @@ -575,7 +575,7 @@ skip_wait: lock_acquired(&lock->dep_map, ip); mutex_set_owner(lock); - if (!__builtin_constant_p(ww_ctx == NULL)) { + if (use_ww_ctx) { struct ww_mutex *ww = container_of(lock, struct ww_mutex, base); struct mutex_waiter *cur; @@ -615,7 +615,7 @@ mutex_lock_nested(struct mutex *lock, unsigned int subclass) { might_sleep(); __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, - subclass, NULL, _RET_IP_, NULL); + subclass, NULL, _RET_IP_, NULL, 0); } EXPORT_SYMBOL_GPL(mutex_lock_nested); @@ -625,7 +625,7 @@ _mutex_lock_nest_lock(struct mutex *lock, struct lockdep_map *nest) { might_sleep(); __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, - 0, nest, _RET_IP_, NULL); + 0, nest, _RET_IP_, NULL, 0); } EXPORT_SYMBOL_GPL(_mutex_lock_nest_lock); @@ -635,7 +635,7 @@ mutex_lock_killable_nested(struct mutex *lock, unsigned int subclass) { might_sleep(); return __mutex_lock_common(lock, TASK_KILLABLE, - subclass, NULL, _RET_IP_, NULL); + subclass, NULL, _RET_IP_, NULL, 0); } EXPORT_SYMBOL_GPL(mutex_lock_killable_nested); @@ -644,7 +644,7 @@ mutex_lock_interruptible_nested(struct mutex *lock, unsigned int subclass) { might_sleep(); return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, - subclass, NULL, _RET_IP_, NULL); + subclass, NULL, _RET_IP_, NULL, 0); } EXPORT_SYMBOL_GPL(mutex_lock_interruptible_nested); @@ -682,7 +682,7 @@ __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx) might_sleep(); ret = __mutex_lock_common(&lock->base, TASK_UNINTERRUPTIBLE, - 0, &ctx->dep_map, _RET_IP_, ctx); + 0, &ctx->dep_map, _RET_IP_, ctx, 1); if (!ret && ctx->acquired > 1) return ww_mutex_deadlock_injection(lock, ctx); @@ -697,7 +697,7 @@ __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx) might_sleep(); ret = __mutex_lock_common(&lock->base, TASK_INTERRUPTIBLE, - 0, &ctx->dep_map, _RET_IP_, ctx); + 0, &ctx->dep_map, _RET_IP_, ctx, 1); if (!ret && ctx->acquired > 1) return ww_mutex_deadlock_injection(lock, ctx); @@ -809,28 +809,28 @@ __mutex_lock_slowpath(atomic_t *lock_count) struct mutex *lock = container_of(lock_count, struct mutex, count); __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0, - NULL, _RET_IP_, NULL); + NULL, _RET_IP_, NULL, 0); } static noinline int __sched __mutex_lock_killable_slowpath(struct mutex *lock) { return __mutex_lock_common(lock, TASK_KILLABLE, 0, - NULL, _RET_IP_, NULL); + NULL, _RET_IP_, NULL, 0); } static noinline int __sched __mutex_lock_interruptible_slowpath(struct mutex *lock) { return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, 0, - NULL, _RET_IP_, NULL); + NULL, _RET_IP_, NULL, 0); } static noinline int __sched __ww_mutex_lock_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx) { return __mutex_lock_common(&lock->base, TASK_UNINTERRUPTIBLE, 0, - NULL, _RET_IP_, ctx); + NULL, _RET_IP_, ctx, 1); } static noinline int __sched @@ -838,7 +838,7 @@ __ww_mutex_lock_interruptible_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx) { return __mutex_lock_common(&lock->base, TASK_INTERRUPTIBLE, 0, - NULL, _RET_IP_, ctx); + NULL, _RET_IP_, ctx, 1); } #endif diff --git a/kernel/params.c b/kernel/params.c index 81c4e78c8f4c..c00d5b502aa4 100644 --- a/kernel/params.c +++ b/kernel/params.c @@ -254,11 +254,11 @@ int parse_args(const char *doing, STANDARD_PARAM_DEF(byte, unsigned char, "%hhu", unsigned long, kstrtoul); -STANDARD_PARAM_DEF(short, short, "%hi", long, kstrtoul); +STANDARD_PARAM_DEF(short, short, "%hi", long, kstrtol); STANDARD_PARAM_DEF(ushort, unsigned short, "%hu", unsigned long, kstrtoul); -STANDARD_PARAM_DEF(int, int, "%i", long, kstrtoul); +STANDARD_PARAM_DEF(int, int, "%i", long, kstrtol); STANDARD_PARAM_DEF(uint, unsigned int, "%u", unsigned long, kstrtoul); -STANDARD_PARAM_DEF(long, long, "%li", long, kstrtoul); +STANDARD_PARAM_DEF(long, long, "%li", long, kstrtol); STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", unsigned long, kstrtoul); int param_set_charp(const char *val, const struct kernel_param *kp) diff --git a/kernel/pid.c b/kernel/pid.c index ebe5e80b10f8..9b9a26698144 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -273,6 +273,11 @@ void free_pid(struct pid *pid) */ wake_up_process(ns->child_reaper); break; + case PIDNS_HASH_ADDING: + /* Handle a fork failure of the first process */ + WARN_ON(ns->child_reaper); + ns->nr_hashed = 0; + /* fall through */ case 0: schedule_work(&ns->proc_work); break; diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index 358a146fd4da..98c3b34a4cff 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -743,7 +743,10 @@ int create_basic_memory_bitmaps(void) struct memory_bitmap *bm1, *bm2; int error = 0; - BUG_ON(forbidden_pages_map || free_pages_map); + if (forbidden_pages_map && free_pages_map) + return 0; + else + BUG_ON(forbidden_pages_map || free_pages_map); bm1 = kzalloc(sizeof(struct memory_bitmap), GFP_KERNEL); if (!bm1) diff --git a/kernel/power/user.c b/kernel/power/user.c index 72e8f4fd616d..957f06164ad1 100644 --- a/kernel/power/user.c +++ b/kernel/power/user.c @@ -39,6 +39,7 @@ static struct snapshot_data { char frozen; char ready; char platform_support; + bool free_bitmaps; } snapshot_state; atomic_t snapshot_device_available = ATOMIC_INIT(1); @@ -82,6 +83,10 @@ static int snapshot_open(struct inode *inode, struct file *filp) data->swap = -1; data->mode = O_WRONLY; error = pm_notifier_call_chain(PM_RESTORE_PREPARE); + if (!error) { + error = create_basic_memory_bitmaps(); + data->free_bitmaps = !error; + } if (error) pm_notifier_call_chain(PM_POST_RESTORE); } @@ -111,6 +116,8 @@ static int snapshot_release(struct inode *inode, struct file *filp) pm_restore_gfp_mask(); free_basic_memory_bitmaps(); thaw_processes(); + } else if (data->free_bitmaps) { + free_basic_memory_bitmaps(); } pm_notifier_call_chain(data->mode == O_RDONLY ? PM_POST_HIBERNATION : PM_POST_RESTORE); @@ -231,6 +238,7 @@ static long snapshot_ioctl(struct file *filp, unsigned int cmd, break; pm_restore_gfp_mask(); free_basic_memory_bitmaps(); + data->free_bitmaps = false; thaw_processes(); data->frozen = 0; break; diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile new file mode 100644 index 000000000000..01e9ec37a3e3 --- /dev/null +++ b/kernel/rcu/Makefile @@ -0,0 +1,6 @@ +obj-y += update.o srcu.o +obj-$(CONFIG_RCU_TORTURE_TEST) += torture.o +obj-$(CONFIG_TREE_RCU) += tree.o +obj-$(CONFIG_TREE_PREEMPT_RCU) += tree.o +obj-$(CONFIG_TREE_RCU_TRACE) += tree_trace.o +obj-$(CONFIG_TINY_RCU) += tiny.o diff --git a/kernel/rcu.h b/kernel/rcu/rcu.h similarity index 94% rename from kernel/rcu.h rename to kernel/rcu/rcu.h index 77131966c4ad..7859a0a3951e 100644 --- a/kernel/rcu.h +++ b/kernel/rcu/rcu.h @@ -122,4 +122,11 @@ int rcu_jiffies_till_stall_check(void); #endif /* #ifdef CONFIG_RCU_STALL_COMMON */ +/* + * Strings used in tracepoints need to be exported via the + * tracing system such that tools like perf and trace-cmd can + * translate the string address pointers to actual text. + */ +#define TPS(x) tracepoint_string(x) + #endif /* __LINUX_RCU_H */ diff --git a/kernel/srcu.c b/kernel/rcu/srcu.c similarity index 100% rename from kernel/srcu.c rename to kernel/rcu/srcu.c diff --git a/kernel/rcutiny.c b/kernel/rcu/tiny.c similarity index 90% rename from kernel/rcutiny.c rename to kernel/rcu/tiny.c index 9ed6075dc562..0c9a934cfec1 100644 --- a/kernel/rcutiny.c +++ b/kernel/rcu/tiny.c @@ -35,6 +35,7 @@ #include #include #include +#include #ifdef CONFIG_RCU_TRACE #include @@ -42,7 +43,7 @@ #include "rcu.h" -/* Forward declarations for rcutiny_plugin.h. */ +/* Forward declarations for tiny_plugin.h. */ struct rcu_ctrlblk; static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp); static void rcu_process_callbacks(struct softirq_action *unused); @@ -52,22 +53,23 @@ static void __call_rcu(struct rcu_head *head, static long long rcu_dynticks_nesting = DYNTICK_TASK_EXIT_IDLE; -#include "rcutiny_plugin.h" +#include "tiny_plugin.h" /* Common code for rcu_idle_enter() and rcu_irq_exit(), see kernel/rcutree.c. */ static void rcu_idle_enter_common(long long newval) { if (newval) { - RCU_TRACE(trace_rcu_dyntick("--=", + RCU_TRACE(trace_rcu_dyntick(TPS("--="), rcu_dynticks_nesting, newval)); rcu_dynticks_nesting = newval; return; } - RCU_TRACE(trace_rcu_dyntick("Start", rcu_dynticks_nesting, newval)); + RCU_TRACE(trace_rcu_dyntick(TPS("Start"), + rcu_dynticks_nesting, newval)); if (!is_idle_task(current)) { - struct task_struct *idle = idle_task(smp_processor_id()); + struct task_struct *idle __maybe_unused = idle_task(smp_processor_id()); - RCU_TRACE(trace_rcu_dyntick("Error on entry: not idle task", + RCU_TRACE(trace_rcu_dyntick(TPS("Entry error: not idle task"), rcu_dynticks_nesting, newval)); ftrace_dump(DUMP_ALL); WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s", @@ -120,15 +122,15 @@ EXPORT_SYMBOL_GPL(rcu_irq_exit); static void rcu_idle_exit_common(long long oldval) { if (oldval) { - RCU_TRACE(trace_rcu_dyntick("++=", + RCU_TRACE(trace_rcu_dyntick(TPS("++="), oldval, rcu_dynticks_nesting)); return; } - RCU_TRACE(trace_rcu_dyntick("End", oldval, rcu_dynticks_nesting)); + RCU_TRACE(trace_rcu_dyntick(TPS("End"), oldval, rcu_dynticks_nesting)); if (!is_idle_task(current)) { - struct task_struct *idle = idle_task(smp_processor_id()); + struct task_struct *idle __maybe_unused = idle_task(smp_processor_id()); - RCU_TRACE(trace_rcu_dyntick("Error on exit: not idle task", + RCU_TRACE(trace_rcu_dyntick(TPS("Exit error: not idle task"), oldval, rcu_dynticks_nesting)); ftrace_dump(DUMP_ALL); WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s", @@ -174,18 +176,18 @@ void rcu_irq_enter(void) } EXPORT_SYMBOL_GPL(rcu_irq_enter); -#ifdef CONFIG_DEBUG_LOCK_ALLOC +#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) /* * Test whether RCU thinks that the current CPU is idle. */ -int rcu_is_cpu_idle(void) +bool __rcu_is_watching(void) { - return !rcu_dynticks_nesting; + return rcu_dynticks_nesting; } -EXPORT_SYMBOL(rcu_is_cpu_idle); +EXPORT_SYMBOL(__rcu_is_watching); -#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */ +#endif /* defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */ /* * Test whether the current CPU was interrupted from idle. Nested @@ -273,7 +275,7 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp) if (&rcp->rcucblist == rcp->donetail) { RCU_TRACE(trace_rcu_batch_start(rcp->name, 0, 0, -1)); RCU_TRACE(trace_rcu_batch_end(rcp->name, 0, - ACCESS_ONCE(rcp->rcucblist), + !!ACCESS_ONCE(rcp->rcucblist), need_resched(), is_idle_task(current), false)); @@ -304,7 +306,8 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp) RCU_TRACE(cb_count++); } RCU_TRACE(rcu_trace_sub_qlen(rcp, cb_count)); - RCU_TRACE(trace_rcu_batch_end(rcp->name, cb_count, 0, need_resched(), + RCU_TRACE(trace_rcu_batch_end(rcp->name, + cb_count, 0, need_resched(), is_idle_task(current), false)); } diff --git a/kernel/rcutiny_plugin.h b/kernel/rcu/tiny_plugin.h similarity index 100% rename from kernel/rcutiny_plugin.h rename to kernel/rcu/tiny_plugin.h diff --git a/kernel/rcutorture.c b/kernel/rcu/torture.c similarity index 99% rename from kernel/rcutorture.c rename to kernel/rcu/torture.c index be63101c6175..3929cd451511 100644 --- a/kernel/rcutorture.c +++ b/kernel/rcu/torture.c @@ -52,6 +52,12 @@ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Paul E. McKenney and Josh Triplett "); +MODULE_ALIAS("rcutorture"); +#ifdef MODULE_PARAM_PREFIX +#undef MODULE_PARAM_PREFIX +#endif +#define MODULE_PARAM_PREFIX "rcutorture." + static int fqs_duration; module_param(fqs_duration, int, 0444); MODULE_PARM_DESC(fqs_duration, "Duration of fqs bursts (us), 0 to disable"); diff --git a/kernel/rcutree.c b/kernel/rcu/tree.c similarity index 94% rename from kernel/rcutree.c rename to kernel/rcu/tree.c index 32618b3fe4e6..4c06ddfea7cd 100644 --- a/kernel/rcutree.c +++ b/kernel/rcu/tree.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include #include @@ -56,17 +57,16 @@ #include #include -#include "rcutree.h" +#include "tree.h" #include #include "rcu.h" -/* - * Strings used in tracepoints need to be exported via the - * tracing system such that tools like perf and trace-cmd can - * translate the string address pointers to actual text. - */ -#define TPS(x) tracepoint_string(x) +MODULE_ALIAS("rcutree"); +#ifdef MODULE_PARAM_PREFIX +#undef MODULE_PARAM_PREFIX +#endif +#define MODULE_PARAM_PREFIX "rcutree." /* Data structures. */ @@ -222,7 +222,7 @@ void rcu_note_context_switch(int cpu) } EXPORT_SYMBOL_GPL(rcu_note_context_switch); -DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = { +static DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = { .dynticks_nesting = DYNTICK_TASK_EXIT_IDLE, .dynticks = ATOMIC_INIT(1), #ifdef CONFIG_NO_HZ_FULL_SYSIDLE @@ -371,7 +371,8 @@ static void rcu_eqs_enter_common(struct rcu_dynticks *rdtp, long long oldval, { trace_rcu_dyntick(TPS("Start"), oldval, rdtp->dynticks_nesting); if (!user && !is_idle_task(current)) { - struct task_struct *idle = idle_task(smp_processor_id()); + struct task_struct *idle __maybe_unused = + idle_task(smp_processor_id()); trace_rcu_dyntick(TPS("Error on entry: not idle task"), oldval, 0); ftrace_dump(DUMP_ORIG); @@ -407,7 +408,7 @@ static void rcu_eqs_enter(bool user) long long oldval; struct rcu_dynticks *rdtp; - rdtp = &__get_cpu_var(rcu_dynticks); + rdtp = this_cpu_ptr(&rcu_dynticks); oldval = rdtp->dynticks_nesting; WARN_ON_ONCE((oldval & DYNTICK_TASK_NEST_MASK) == 0); if ((oldval & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) @@ -435,7 +436,7 @@ void rcu_idle_enter(void) local_irq_save(flags); rcu_eqs_enter(false); - rcu_sysidle_enter(&__get_cpu_var(rcu_dynticks), 0); + rcu_sysidle_enter(this_cpu_ptr(&rcu_dynticks), 0); local_irq_restore(flags); } EXPORT_SYMBOL_GPL(rcu_idle_enter); @@ -478,7 +479,7 @@ void rcu_irq_exit(void) struct rcu_dynticks *rdtp; local_irq_save(flags); - rdtp = &__get_cpu_var(rcu_dynticks); + rdtp = this_cpu_ptr(&rcu_dynticks); oldval = rdtp->dynticks_nesting; rdtp->dynticks_nesting--; WARN_ON_ONCE(rdtp->dynticks_nesting < 0); @@ -508,7 +509,8 @@ static void rcu_eqs_exit_common(struct rcu_dynticks *rdtp, long long oldval, rcu_cleanup_after_idle(smp_processor_id()); trace_rcu_dyntick(TPS("End"), oldval, rdtp->dynticks_nesting); if (!user && !is_idle_task(current)) { - struct task_struct *idle = idle_task(smp_processor_id()); + struct task_struct *idle __maybe_unused = + idle_task(smp_processor_id()); trace_rcu_dyntick(TPS("Error on exit: not idle task"), oldval, rdtp->dynticks_nesting); @@ -528,7 +530,7 @@ static void rcu_eqs_exit(bool user) struct rcu_dynticks *rdtp; long long oldval; - rdtp = &__get_cpu_var(rcu_dynticks); + rdtp = this_cpu_ptr(&rcu_dynticks); oldval = rdtp->dynticks_nesting; WARN_ON_ONCE(oldval < 0); if (oldval & DYNTICK_TASK_NEST_MASK) @@ -555,7 +557,7 @@ void rcu_idle_exit(void) local_irq_save(flags); rcu_eqs_exit(false); - rcu_sysidle_exit(&__get_cpu_var(rcu_dynticks), 0); + rcu_sysidle_exit(this_cpu_ptr(&rcu_dynticks), 0); local_irq_restore(flags); } EXPORT_SYMBOL_GPL(rcu_idle_exit); @@ -599,7 +601,7 @@ void rcu_irq_enter(void) long long oldval; local_irq_save(flags); - rdtp = &__get_cpu_var(rcu_dynticks); + rdtp = this_cpu_ptr(&rcu_dynticks); oldval = rdtp->dynticks_nesting; rdtp->dynticks_nesting++; WARN_ON_ONCE(rdtp->dynticks_nesting == 0); @@ -620,7 +622,7 @@ void rcu_irq_enter(void) */ void rcu_nmi_enter(void) { - struct rcu_dynticks *rdtp = &__get_cpu_var(rcu_dynticks); + struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); if (rdtp->dynticks_nmi_nesting == 0 && (atomic_read(&rdtp->dynticks) & 0x1)) @@ -642,7 +644,7 @@ void rcu_nmi_enter(void) */ void rcu_nmi_exit(void) { - struct rcu_dynticks *rdtp = &__get_cpu_var(rcu_dynticks); + struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); if (rdtp->dynticks_nmi_nesting == 0 || --rdtp->dynticks_nmi_nesting != 0) @@ -655,21 +657,34 @@ void rcu_nmi_exit(void) } /** - * rcu_is_cpu_idle - see if RCU thinks that the current CPU is idle + * __rcu_is_watching - are RCU read-side critical sections safe? + * + * Return true if RCU is watching the running CPU, which means that + * this CPU can safely enter RCU read-side critical sections. Unlike + * rcu_is_watching(), the caller of __rcu_is_watching() must have at + * least disabled preemption. + */ +bool __rcu_is_watching(void) +{ + return atomic_read(this_cpu_ptr(&rcu_dynticks.dynticks)) & 0x1; +} + +/** + * rcu_is_watching - see if RCU thinks that the current CPU is idle * * If the current CPU is in its idle loop and is neither in an interrupt * or NMI handler, return true. */ -int rcu_is_cpu_idle(void) +bool rcu_is_watching(void) { int ret; preempt_disable(); - ret = (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0; + ret = __rcu_is_watching(); preempt_enable(); return ret; } -EXPORT_SYMBOL(rcu_is_cpu_idle); +EXPORT_SYMBOL_GPL(rcu_is_watching); #if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU) @@ -703,7 +718,7 @@ bool rcu_lockdep_current_cpu_online(void) if (in_nmi()) return 1; preempt_disable(); - rdp = &__get_cpu_var(rcu_sched_data); + rdp = this_cpu_ptr(&rcu_sched_data); rnp = rdp->mynode; ret = (rdp->grpmask & rnp->qsmaskinit) || !rcu_scheduler_fully_active; @@ -723,7 +738,7 @@ EXPORT_SYMBOL_GPL(rcu_lockdep_current_cpu_online); */ static int rcu_is_cpu_rrupt_from_idle(void) { - return __get_cpu_var(rcu_dynticks).dynticks_nesting <= 1; + return __this_cpu_read(rcu_dynticks.dynticks_nesting) <= 1; } /* @@ -802,8 +817,11 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp, static void record_gp_stall_check_time(struct rcu_state *rsp) { - rsp->gp_start = jiffies; - rsp->jiffies_stall = jiffies + rcu_jiffies_till_stall_check(); + unsigned long j = ACCESS_ONCE(jiffies); + + rsp->gp_start = j; + smp_wmb(); /* Record start time before stall time. */ + rsp->jiffies_stall = j + rcu_jiffies_till_stall_check(); } /* @@ -898,6 +916,12 @@ static void print_other_cpu_stall(struct rcu_state *rsp) force_quiescent_state(rsp); /* Kick them all. */ } +/* + * This function really isn't for public consumption, but RCU is special in + * that context switches can allow the state machine to make progress. + */ +extern void resched_cpu(int cpu); + static void print_cpu_stall(struct rcu_state *rsp) { int cpu; @@ -927,22 +951,60 @@ static void print_cpu_stall(struct rcu_state *rsp) 3 * rcu_jiffies_till_stall_check() + 3; raw_spin_unlock_irqrestore(&rnp->lock, flags); - set_need_resched(); /* kick ourselves to get things going. */ + /* + * Attempt to revive the RCU machinery by forcing a context switch. + * + * A context switch would normally allow the RCU state machine to make + * progress and it could be we're stuck in kernel space without context + * switches for an entirely unreasonable amount of time. + */ + resched_cpu(smp_processor_id()); } static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp) { + unsigned long completed; + unsigned long gpnum; + unsigned long gps; unsigned long j; unsigned long js; struct rcu_node *rnp; - if (rcu_cpu_stall_suppress) + if (rcu_cpu_stall_suppress || !rcu_gp_in_progress(rsp)) return; j = ACCESS_ONCE(jiffies); + + /* + * Lots of memory barriers to reject false positives. + * + * The idea is to pick up rsp->gpnum, then rsp->jiffies_stall, + * then rsp->gp_start, and finally rsp->completed. These values + * are updated in the opposite order with memory barriers (or + * equivalent) during grace-period initialization and cleanup. + * Now, a false positive can occur if we get an new value of + * rsp->gp_start and a old value of rsp->jiffies_stall. But given + * the memory barriers, the only way that this can happen is if one + * grace period ends and another starts between these two fetches. + * Detect this by comparing rsp->completed with the previous fetch + * from rsp->gpnum. + * + * Given this check, comparisons of jiffies, rsp->jiffies_stall, + * and rsp->gp_start suffice to forestall false positives. + */ + gpnum = ACCESS_ONCE(rsp->gpnum); + smp_rmb(); /* Pick up ->gpnum first... */ js = ACCESS_ONCE(rsp->jiffies_stall); + smp_rmb(); /* ...then ->jiffies_stall before the rest... */ + gps = ACCESS_ONCE(rsp->gp_start); + smp_rmb(); /* ...and finally ->gp_start before ->completed. */ + completed = ACCESS_ONCE(rsp->completed); + if (ULONG_CMP_GE(completed, gpnum) || + ULONG_CMP_LT(j, js) || + ULONG_CMP_GE(gps, js)) + return; /* No stall or GP completed since entering function. */ rnp = rdp->mynode; if (rcu_gp_in_progress(rsp) && - (ACCESS_ONCE(rnp->qsmask) & rdp->grpmask) && ULONG_CMP_GE(j, js)) { + (ACCESS_ONCE(rnp->qsmask) & rdp->grpmask)) { /* We haven't checked in, so go dump stack. */ print_cpu_stall(rsp); @@ -1297,7 +1359,7 @@ static void note_gp_changes(struct rcu_state *rsp, struct rcu_data *rdp) } /* - * Initialize a new grace period. + * Initialize a new grace period. Return 0 if no grace period required. */ static int rcu_gp_init(struct rcu_state *rsp) { @@ -1306,18 +1368,27 @@ static int rcu_gp_init(struct rcu_state *rsp) rcu_bind_gp_kthread(); raw_spin_lock_irq(&rnp->lock); + if (rsp->gp_flags == 0) { + /* Spurious wakeup, tell caller to go back to sleep. */ + raw_spin_unlock_irq(&rnp->lock); + return 0; + } rsp->gp_flags = 0; /* Clear all flags: New grace period. */ - if (rcu_gp_in_progress(rsp)) { - /* Grace period already in progress, don't start another. */ + if (WARN_ON_ONCE(rcu_gp_in_progress(rsp))) { + /* + * Grace period already in progress, don't start another. + * Not supposed to be able to happen. + */ raw_spin_unlock_irq(&rnp->lock); return 0; } /* Advance to a new grace period and initialize state. */ + record_gp_stall_check_time(rsp); + smp_wmb(); /* Record GP times before starting GP. */ rsp->gpnum++; trace_rcu_grace_period(rsp->name, rsp->gpnum, TPS("start")); - record_gp_stall_check_time(rsp); raw_spin_unlock_irq(&rnp->lock); /* Exclude any concurrent CPU-hotplug operations. */ @@ -1366,7 +1437,7 @@ static int rcu_gp_init(struct rcu_state *rsp) /* * Do one round of quiescent-state forcing. */ -int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in) +static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in) { int fqs_state = fqs_state_in; bool isidle = false; @@ -1451,8 +1522,12 @@ static void rcu_gp_cleanup(struct rcu_state *rsp) rsp->fqs_state = RCU_GP_IDLE; rdp = this_cpu_ptr(rsp->rda); rcu_advance_cbs(rsp, rnp, rdp); /* Reduce false positives below. */ - if (cpu_needs_another_gp(rsp, rdp)) - rsp->gp_flags = 1; + if (cpu_needs_another_gp(rsp, rdp)) { + rsp->gp_flags = RCU_GP_FLAG_INIT; + trace_rcu_grace_period(rsp->name, + ACCESS_ONCE(rsp->gpnum), + TPS("newreq")); + } raw_spin_unlock_irq(&rnp->lock); } @@ -1462,6 +1537,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp) static int __noreturn rcu_gp_kthread(void *arg) { int fqs_state; + int gf; unsigned long j; int ret; struct rcu_state *rsp = arg; @@ -1471,14 +1547,19 @@ static int __noreturn rcu_gp_kthread(void *arg) /* Handle grace-period start. */ for (;;) { + trace_rcu_grace_period(rsp->name, + ACCESS_ONCE(rsp->gpnum), + TPS("reqwait")); wait_event_interruptible(rsp->gp_wq, - rsp->gp_flags & + ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_INIT); - if ((rsp->gp_flags & RCU_GP_FLAG_INIT) && - rcu_gp_init(rsp)) + if (rcu_gp_init(rsp)) break; cond_resched(); flush_signals(current); + trace_rcu_grace_period(rsp->name, + ACCESS_ONCE(rsp->gpnum), + TPS("reqwaitsig")); } /* Handle quiescent-state forcing. */ @@ -1488,10 +1569,16 @@ static int __noreturn rcu_gp_kthread(void *arg) j = HZ; jiffies_till_first_fqs = HZ; } + ret = 0; for (;;) { - rsp->jiffies_force_qs = jiffies + j; + if (!ret) + rsp->jiffies_force_qs = jiffies + j; + trace_rcu_grace_period(rsp->name, + ACCESS_ONCE(rsp->gpnum), + TPS("fqswait")); ret = wait_event_interruptible_timeout(rsp->gp_wq, - (rsp->gp_flags & RCU_GP_FLAG_FQS) || + ((gf = ACCESS_ONCE(rsp->gp_flags)) & + RCU_GP_FLAG_FQS) || (!ACCESS_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers_cgp(rnp)), j); @@ -1500,13 +1587,23 @@ static int __noreturn rcu_gp_kthread(void *arg) !rcu_preempt_blocked_readers_cgp(rnp)) break; /* If time for quiescent-state forcing, do it. */ - if (ret == 0 || (rsp->gp_flags & RCU_GP_FLAG_FQS)) { + if (ULONG_CMP_GE(jiffies, rsp->jiffies_force_qs) || + (gf & RCU_GP_FLAG_FQS)) { + trace_rcu_grace_period(rsp->name, + ACCESS_ONCE(rsp->gpnum), + TPS("fqsstart")); fqs_state = rcu_gp_fqs(rsp, fqs_state); + trace_rcu_grace_period(rsp->name, + ACCESS_ONCE(rsp->gpnum), + TPS("fqsend")); cond_resched(); } else { /* Deal with stray signal. */ cond_resched(); flush_signals(current); + trace_rcu_grace_period(rsp->name, + ACCESS_ONCE(rsp->gpnum), + TPS("fqswaitsig")); } j = jiffies_till_next_fqs; if (j > HZ) { @@ -1554,6 +1651,8 @@ rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp, return; } rsp->gp_flags = RCU_GP_FLAG_INIT; + trace_rcu_grace_period(rsp->name, ACCESS_ONCE(rsp->gpnum), + TPS("newreq")); /* * We can't do wakeups while holding the rnp->lock, as that @@ -2255,7 +2354,7 @@ static void __call_rcu_core(struct rcu_state *rsp, struct rcu_data *rdp, * If called from an extended quiescent state, invoke the RCU * core in order to force a re-evaluation of RCU's idleness. */ - if (rcu_is_cpu_idle() && cpu_online(smp_processor_id())) + if (!rcu_is_watching() && cpu_online(smp_processor_id())) invoke_rcu_core(); /* If interrupts were disabled or CPU offline, don't invoke RCU core. */ @@ -2725,10 +2824,13 @@ static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy) for_each_rcu_flavor(rsp) { rdp = per_cpu_ptr(rsp->rda, cpu); - if (rdp->qlen != rdp->qlen_lazy) + if (!rdp->nxtlist) + continue; + hc = true; + if (rdp->qlen != rdp->qlen_lazy || !all_lazy) { al = false; - if (rdp->nxtlist) - hc = true; + break; + } } if (all_lazy) *all_lazy = al; @@ -3216,7 +3318,7 @@ static void __init rcu_init_one(struct rcu_state *rsp, /* * Compute the rcu_node tree geometry from kernel parameters. This cannot - * replace the definitions in rcutree.h because those are needed to size + * replace the definitions in tree.h because those are needed to size * the ->node array in the rcu_state structure. */ static void __init rcu_init_geometry(void) @@ -3295,8 +3397,8 @@ void __init rcu_init(void) rcu_bootup_announce(); rcu_init_geometry(); - rcu_init_one(&rcu_sched_state, &rcu_sched_data); rcu_init_one(&rcu_bh_state, &rcu_bh_data); + rcu_init_one(&rcu_sched_state, &rcu_sched_data); __rcu_init_preempt(); open_softirq(RCU_SOFTIRQ, rcu_process_callbacks); @@ -3311,4 +3413,4 @@ void __init rcu_init(void) rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu); } -#include "rcutree_plugin.h" +#include "tree_plugin.h" diff --git a/kernel/rcutree.h b/kernel/rcu/tree.h similarity index 99% rename from kernel/rcutree.h rename to kernel/rcu/tree.h index 5f97eab602cd..52be957c9fe2 100644 --- a/kernel/rcutree.h +++ b/kernel/rcu/tree.h @@ -104,6 +104,8 @@ struct rcu_dynticks { /* idle-period nonlazy_posted snapshot. */ unsigned long last_accelerate; /* Last jiffy CBs were accelerated. */ + unsigned long last_advance_all; + /* Last jiffy CBs were all advanced. */ int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */ #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */ }; diff --git a/kernel/rcutree_plugin.h b/kernel/rcu/tree_plugin.h similarity index 97% rename from kernel/rcutree_plugin.h rename to kernel/rcu/tree_plugin.h index 130c97b027f2..3822ac0c4b27 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -28,7 +28,7 @@ #include #include #include -#include "time/tick-internal.h" +#include "../time/tick-internal.h" #define RCU_KTHREAD_PRIO 1 @@ -96,10 +96,15 @@ static void __init rcu_bootup_announce_oddness(void) #endif /* #ifdef CONFIG_RCU_NOCB_CPU_ZERO */ #ifdef CONFIG_RCU_NOCB_CPU_ALL pr_info("\tOffload RCU callbacks from all CPUs\n"); - cpumask_setall(rcu_nocb_mask); + cpumask_copy(rcu_nocb_mask, cpu_possible_mask); #endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */ #endif /* #ifndef CONFIG_RCU_NOCB_CPU_NONE */ if (have_rcu_nocb_mask) { + if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) { + pr_info("\tNote: kernel parameter 'rcu_nocbs=' contains nonexistent CPUs.\n"); + cpumask_and(rcu_nocb_mask, cpu_possible_mask, + rcu_nocb_mask); + } cpulist_scnprintf(nocb_buf, sizeof(nocb_buf), rcu_nocb_mask); pr_info("\tOffload RCU callbacks from CPUs: %s.\n", nocb_buf); if (rcu_nocb_poll) @@ -660,7 +665,7 @@ static void rcu_preempt_check_callbacks(int cpu) static void rcu_preempt_do_callbacks(void) { - rcu_do_batch(&rcu_preempt_state, &__get_cpu_var(rcu_preempt_data)); + rcu_do_batch(&rcu_preempt_state, this_cpu_ptr(&rcu_preempt_data)); } #endif /* #ifdef CONFIG_RCU_BOOST */ @@ -1128,7 +1133,7 @@ void exit_rcu(void) #ifdef CONFIG_RCU_BOOST -#include "rtmutex_common.h" +#include "../rtmutex_common.h" #ifdef CONFIG_RCU_TRACE @@ -1332,7 +1337,7 @@ static void invoke_rcu_callbacks_kthread(void) */ static bool rcu_is_callbacks_kthread(void) { - return __get_cpu_var(rcu_cpu_kthread_task) == current; + return __this_cpu_read(rcu_cpu_kthread_task) == current; } #define RCU_BOOST_DELAY_JIFFIES DIV_ROUND_UP(CONFIG_RCU_BOOST_DELAY * HZ, 1000) @@ -1382,8 +1387,8 @@ static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp, static void rcu_kthread_do_work(void) { - rcu_do_batch(&rcu_sched_state, &__get_cpu_var(rcu_sched_data)); - rcu_do_batch(&rcu_bh_state, &__get_cpu_var(rcu_bh_data)); + rcu_do_batch(&rcu_sched_state, this_cpu_ptr(&rcu_sched_data)); + rcu_do_batch(&rcu_bh_state, this_cpu_ptr(&rcu_bh_data)); rcu_preempt_do_callbacks(); } @@ -1402,7 +1407,7 @@ static void rcu_cpu_kthread_park(unsigned int cpu) static int rcu_cpu_kthread_should_run(unsigned int cpu) { - return __get_cpu_var(rcu_cpu_has_work); + return __this_cpu_read(rcu_cpu_has_work); } /* @@ -1412,8 +1417,8 @@ static int rcu_cpu_kthread_should_run(unsigned int cpu) */ static void rcu_cpu_kthread(unsigned int cpu) { - unsigned int *statusp = &__get_cpu_var(rcu_cpu_kthread_status); - char work, *workp = &__get_cpu_var(rcu_cpu_has_work); + unsigned int *statusp = this_cpu_ptr(&rcu_cpu_kthread_status); + char work, *workp = this_cpu_ptr(&rcu_cpu_has_work); int spincnt; for (spincnt = 0; spincnt < 10; spincnt++) { @@ -1630,17 +1635,23 @@ module_param(rcu_idle_lazy_gp_delay, int, 0644); extern int tick_nohz_enabled; /* - * Try to advance callbacks for all flavors of RCU on the current CPU. - * Afterwards, if there are any callbacks ready for immediate invocation, - * return true. + * Try to advance callbacks for all flavors of RCU on the current CPU, but + * only if it has been awhile since the last time we did so. Afterwards, + * if there are any callbacks ready for immediate invocation, return true. */ static bool rcu_try_advance_all_cbs(void) { bool cbs_ready = false; struct rcu_data *rdp; + struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); struct rcu_node *rnp; struct rcu_state *rsp; + /* Exit early if we advanced recently. */ + if (jiffies == rdtp->last_advance_all) + return 0; + rdtp->last_advance_all = jiffies; + for_each_rcu_flavor(rsp) { rdp = this_cpu_ptr(rsp->rda); rnp = rdp->mynode; @@ -1739,6 +1750,8 @@ static void rcu_prepare_for_idle(int cpu) */ if (rdtp->all_lazy && rdtp->nonlazy_posted != rdtp->nonlazy_posted_snap) { + rdtp->all_lazy = false; + rdtp->nonlazy_posted_snap = rdtp->nonlazy_posted; invoke_rcu_core(); return; } @@ -1768,17 +1781,11 @@ static void rcu_prepare_for_idle(int cpu) */ static void rcu_cleanup_after_idle(int cpu) { - struct rcu_data *rdp; - struct rcu_state *rsp; if (rcu_is_nocb_cpu(cpu)) return; - rcu_try_advance_all_cbs(); - for_each_rcu_flavor(rsp) { - rdp = per_cpu_ptr(rsp->rda, cpu); - if (cpu_has_callbacks_ready_to_invoke(rdp)) - invoke_rcu_core(); - } + if (rcu_try_advance_all_cbs()) + invoke_rcu_core(); } /* @@ -2108,15 +2115,22 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp, /* If we are not being polled and there is a kthread, awaken it ... */ t = ACCESS_ONCE(rdp->nocb_kthread); - if (rcu_nocb_poll | !t) + if (rcu_nocb_poll || !t) { + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, + TPS("WakeNotPoll")); return; + } len = atomic_long_read(&rdp->nocb_q_count); if (old_rhpp == &rdp->nocb_head) { wake_up(&rdp->nocb_wq); /* ... only if queue was empty ... */ rdp->qlen_last_fqs_check = 0; + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WakeEmpty")); } else if (len > rdp->qlen_last_fqs_check + qhimark) { wake_up_process(t); /* ... or if many callbacks queued. */ rdp->qlen_last_fqs_check = LONG_MAX / 2; + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WakeOvf")); + } else { + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WakeNot")); } return; } @@ -2140,10 +2154,12 @@ static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp, if (__is_kfree_rcu_offset((unsigned long)rhp->func)) trace_rcu_kfree_callback(rdp->rsp->name, rhp, (unsigned long)rhp->func, - rdp->qlen_lazy, rdp->qlen); + -atomic_long_read(&rdp->nocb_q_count_lazy), + -atomic_long_read(&rdp->nocb_q_count)); else trace_rcu_callback(rdp->rsp->name, rhp, - rdp->qlen_lazy, rdp->qlen); + -atomic_long_read(&rdp->nocb_q_count_lazy), + -atomic_long_read(&rdp->nocb_q_count)); return 1; } @@ -2221,6 +2237,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp) static int rcu_nocb_kthread(void *arg) { int c, cl; + bool firsttime = 1; struct rcu_head *list; struct rcu_head *next; struct rcu_head **tail; @@ -2229,14 +2246,27 @@ static int rcu_nocb_kthread(void *arg) /* Each pass through this loop invokes one batch of callbacks */ for (;;) { /* If not polling, wait for next batch of callbacks. */ - if (!rcu_nocb_poll) + if (!rcu_nocb_poll) { + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, + TPS("Sleep")); wait_event_interruptible(rdp->nocb_wq, rdp->nocb_head); + } else if (firsttime) { + firsttime = 0; + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, + TPS("Poll")); + } list = ACCESS_ONCE(rdp->nocb_head); if (!list) { + if (!rcu_nocb_poll) + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, + TPS("WokeEmpty")); schedule_timeout_interruptible(1); flush_signals(current); continue; } + firsttime = 1; + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, + TPS("WokeNonEmpty")); /* * Extract queued callbacks, update counts, and wait @@ -2257,7 +2287,11 @@ static int rcu_nocb_kthread(void *arg) next = list->next; /* Wait for enqueuing to complete, if needed. */ while (next == NULL && &list->next != tail) { + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, + TPS("WaitQueue")); schedule_timeout_interruptible(1); + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, + TPS("WokeQueue")); next = list->next; } debug_rcu_head_unqueue(list); diff --git a/kernel/rcutree_trace.c b/kernel/rcu/tree_trace.c similarity index 99% rename from kernel/rcutree_trace.c rename to kernel/rcu/tree_trace.c index cf6c17412932..3596797b7e46 100644 --- a/kernel/rcutree_trace.c +++ b/kernel/rcu/tree_trace.c @@ -44,7 +44,7 @@ #include #define RCU_TREE_NONCORE -#include "rcutree.h" +#include "tree.h" static int r_open(struct inode *inode, struct file *file, const struct seq_operations *op) diff --git a/kernel/rcupdate.c b/kernel/rcu/update.c similarity index 97% rename from kernel/rcupdate.c rename to kernel/rcu/update.c index b02a339836b4..6cb3dff89e2b 100644 --- a/kernel/rcupdate.c +++ b/kernel/rcu/update.c @@ -53,6 +53,12 @@ #include "rcu.h" +MODULE_ALIAS("rcupdate"); +#ifdef MODULE_PARAM_PREFIX +#undef MODULE_PARAM_PREFIX +#endif +#define MODULE_PARAM_PREFIX "rcupdate." + module_param(rcu_expedited, int, 0); #ifdef CONFIG_PREEMPT_RCU @@ -148,7 +154,7 @@ int rcu_read_lock_bh_held(void) { if (!debug_lockdep_rcu_enabled()) return 1; - if (rcu_is_cpu_idle()) + if (!rcu_is_watching()) return 0; if (!rcu_lockdep_current_cpu_online()) return 0; @@ -298,7 +304,7 @@ EXPORT_SYMBOL_GPL(do_trace_rcu_torture_read); #endif int rcu_cpu_stall_suppress __read_mostly; /* 1 = suppress stall warnings. */ -int rcu_cpu_stall_timeout __read_mostly = CONFIG_RCU_CPU_STALL_TIMEOUT; +static int rcu_cpu_stall_timeout __read_mostly = CONFIG_RCU_CPU_STALL_TIMEOUT; module_param(rcu_cpu_stall_suppress, int, 0644); module_param(rcu_cpu_stall_timeout, int, 0644); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 5ac63c9a995a..c06b8d345fae 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -513,12 +513,11 @@ static inline void init_hrtick(void) * might also involve a cross-CPU call to trigger the scheduler on * the target CPU. */ -#ifdef CONFIG_SMP void resched_task(struct task_struct *p) { int cpu; - assert_raw_spin_locked(&task_rq(p)->lock); + lockdep_assert_held(&task_rq(p)->lock); if (test_tsk_need_resched(p)) return; @@ -526,8 +525,10 @@ void resched_task(struct task_struct *p) set_tsk_need_resched(p); cpu = task_cpu(p); - if (cpu == smp_processor_id()) + if (cpu == smp_processor_id()) { + set_preempt_need_resched(); return; + } /* NEED_RESCHED must be visible before we test polling */ smp_mb(); @@ -546,6 +547,7 @@ void resched_cpu(int cpu) raw_spin_unlock_irqrestore(&rq->lock, flags); } +#ifdef CONFIG_SMP #ifdef CONFIG_NO_HZ_COMMON /* * In the semi idle case, use the nearest busy cpu for migrating timers @@ -693,12 +695,6 @@ void sched_avg_update(struct rq *rq) } } -#else /* !CONFIG_SMP */ -void resched_task(struct task_struct *p) -{ - assert_raw_spin_locked(&task_rq(p)->lock); - set_tsk_need_resched(p); -} #endif /* CONFIG_SMP */ #if defined(CONFIG_RT_GROUP_SCHED) || (defined(CONFIG_FAIR_GROUP_SCHED) && \ @@ -767,14 +763,14 @@ static void set_load_weight(struct task_struct *p) static void enqueue_task(struct rq *rq, struct task_struct *p, int flags) { update_rq_clock(rq); - sched_info_queued(p); + sched_info_queued(rq, p); p->sched_class->enqueue_task(rq, p, flags); } static void dequeue_task(struct rq *rq, struct task_struct *p, int flags) { update_rq_clock(rq); - sched_info_dequeued(p); + sched_info_dequeued(rq, p); p->sched_class->dequeue_task(rq, p, flags); } @@ -987,7 +983,7 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu) * ttwu() will sort out the placement. */ WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING && - !(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE)); + !(task_preempt_count(p) & PREEMPT_ACTIVE)); #ifdef CONFIG_LOCKDEP /* @@ -1017,6 +1013,107 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu) __set_task_cpu(p, new_cpu); } +static void __migrate_swap_task(struct task_struct *p, int cpu) +{ + if (p->on_rq) { + struct rq *src_rq, *dst_rq; + + src_rq = task_rq(p); + dst_rq = cpu_rq(cpu); + + deactivate_task(src_rq, p, 0); + set_task_cpu(p, cpu); + activate_task(dst_rq, p, 0); + check_preempt_curr(dst_rq, p, 0); + } else { + /* + * Task isn't running anymore; make it appear like we migrated + * it before it went to sleep. This means on wakeup we make the + * previous cpu our targer instead of where it really is. + */ + p->wake_cpu = cpu; + } +} + +struct migration_swap_arg { + struct task_struct *src_task, *dst_task; + int src_cpu, dst_cpu; +}; + +static int migrate_swap_stop(void *data) +{ + struct migration_swap_arg *arg = data; + struct rq *src_rq, *dst_rq; + int ret = -EAGAIN; + + src_rq = cpu_rq(arg->src_cpu); + dst_rq = cpu_rq(arg->dst_cpu); + + double_raw_lock(&arg->src_task->pi_lock, + &arg->dst_task->pi_lock); + double_rq_lock(src_rq, dst_rq); + if (task_cpu(arg->dst_task) != arg->dst_cpu) + goto unlock; + + if (task_cpu(arg->src_task) != arg->src_cpu) + goto unlock; + + if (!cpumask_test_cpu(arg->dst_cpu, tsk_cpus_allowed(arg->src_task))) + goto unlock; + + if (!cpumask_test_cpu(arg->src_cpu, tsk_cpus_allowed(arg->dst_task))) + goto unlock; + + __migrate_swap_task(arg->src_task, arg->dst_cpu); + __migrate_swap_task(arg->dst_task, arg->src_cpu); + + ret = 0; + +unlock: + double_rq_unlock(src_rq, dst_rq); + raw_spin_unlock(&arg->dst_task->pi_lock); + raw_spin_unlock(&arg->src_task->pi_lock); + + return ret; +} + +/* + * Cross migrate two tasks + */ +int migrate_swap(struct task_struct *cur, struct task_struct *p) +{ + struct migration_swap_arg arg; + int ret = -EINVAL; + + arg = (struct migration_swap_arg){ + .src_task = cur, + .src_cpu = task_cpu(cur), + .dst_task = p, + .dst_cpu = task_cpu(p), + }; + + if (arg.src_cpu == arg.dst_cpu) + goto out; + + /* + * These three tests are all lockless; this is OK since all of them + * will be re-checked with proper locks held further down the line. + */ + if (!cpu_active(arg.src_cpu) || !cpu_active(arg.dst_cpu)) + goto out; + + if (!cpumask_test_cpu(arg.dst_cpu, tsk_cpus_allowed(arg.src_task))) + goto out; + + if (!cpumask_test_cpu(arg.src_cpu, tsk_cpus_allowed(arg.dst_task))) + goto out; + + ret = stop_two_cpus(arg.dst_cpu, arg.src_cpu, migrate_swap_stop, &arg); + +out: + return ret; +} + struct migration_arg { struct task_struct *task; int dest_cpu; @@ -1236,9 +1333,9 @@ out: * The caller (fork, wakeup) owns p->pi_lock, ->cpus_allowed is stable. */ static inline -int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags) +int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags) { - int cpu = p->sched_class->select_task_rq(p, sd_flags, wake_flags); + cpu = p->sched_class->select_task_rq(p, cpu, sd_flags, wake_flags); /* * In order not to call set_task_cpu() on a blocking task we need @@ -1330,12 +1427,13 @@ ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags) if (rq->idle_stamp) { u64 delta = rq_clock(rq) - rq->idle_stamp; - u64 max = 2*sysctl_sched_migration_cost; + u64 max = 2*rq->max_idle_balance_cost; + + update_avg(&rq->avg_idle, delta); - if (delta > max) + if (rq->avg_idle > max) rq->avg_idle = max; - else - update_avg(&rq->avg_idle, delta); + rq->idle_stamp = 0; } #endif @@ -1396,6 +1494,14 @@ static void sched_ttwu_pending(void) void scheduler_ipi(void) { + /* + * Fold TIF_NEED_RESCHED into the preempt_count; anybody setting + * TIF_NEED_RESCHED remotely (for the first time) will also send + * this IPI. + */ + if (tif_need_resched()) + set_preempt_need_resched(); + if (llist_empty(&this_rq()->wake_list) && !tick_nohz_full_cpu(smp_processor_id()) && !got_nohz_idle_kick()) @@ -1513,7 +1619,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) if (p->sched_class->task_waking) p->sched_class->task_waking(p); - cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags); + cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags); if (task_cpu(p) != cpu) { wake_flags |= WF_MIGRATED; set_task_cpu(p, cpu); @@ -1595,7 +1701,7 @@ int wake_up_state(struct task_struct *p, unsigned int state) * * __sched_fork() is basic setup used by init_idle() too: */ -static void __sched_fork(struct task_struct *p) +static void __sched_fork(unsigned long clone_flags, struct task_struct *p) { p->on_rq = 0; @@ -1619,16 +1725,24 @@ static void __sched_fork(struct task_struct *p) #ifdef CONFIG_NUMA_BALANCING if (p->mm && atomic_read(&p->mm->mm_users) == 1) { - p->mm->numa_next_scan = jiffies; - p->mm->numa_next_reset = jiffies; + p->mm->numa_next_scan = jiffies + msecs_to_jiffies(sysctl_numa_balancing_scan_delay); p->mm->numa_scan_seq = 0; } + if (clone_flags & CLONE_VM) + p->numa_preferred_nid = current->numa_preferred_nid; + else + p->numa_preferred_nid = -1; + p->node_stamp = 0ULL; p->numa_scan_seq = p->mm ? p->mm->numa_scan_seq : 0; - p->numa_migrate_seq = p->mm ? p->mm->numa_scan_seq - 1 : 0; p->numa_scan_period = sysctl_numa_balancing_scan_delay; p->numa_work.next = &p->numa_work; + p->numa_faults = NULL; + p->numa_faults_buffer = NULL; + + INIT_LIST_HEAD(&p->numa_entry); + p->numa_group = NULL; #endif /* CONFIG_NUMA_BALANCING */ } @@ -1654,12 +1768,12 @@ void set_numabalancing_state(bool enabled) /* * fork()/clone()-time setup: */ -void sched_fork(struct task_struct *p) +void sched_fork(unsigned long clone_flags, struct task_struct *p) { unsigned long flags; int cpu = get_cpu(); - __sched_fork(p); + __sched_fork(clone_flags, p); /* * We mark the process as running here. This guarantees that * nobody will actually run it, and a signal or other external @@ -1717,10 +1831,7 @@ void sched_fork(struct task_struct *p) #if defined(CONFIG_SMP) p->on_cpu = 0; #endif -#ifdef CONFIG_PREEMPT_COUNT - /* Want to start with kernel preemption disabled. */ - task_thread_info(p)->preempt_count = 1; -#endif + init_task_preempt_count(p); #ifdef CONFIG_SMP plist_node_init(&p->pushable_tasks, MAX_PRIO); #endif @@ -1747,7 +1858,7 @@ void wake_up_new_task(struct task_struct *p) * - cpus_allowed can change in the fork path * - any previously selected cpu might disappear through hotplug */ - set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0)); + set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0)); #endif /* Initialize new task's runnable average */ @@ -1838,7 +1949,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev, struct task_struct *next) { trace_sched_switch(prev, next); - sched_info_switch(prev, next); + sched_info_switch(rq, prev, next); perf_event_task_sched_out(prev, next); fire_sched_out_preempt_notifiers(prev, next); prepare_lock_switch(rq, next); @@ -1890,6 +2001,8 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev) if (mm) mmdrop(mm); if (unlikely(prev_state == TASK_DEAD)) { + task_numa_free(prev); + /* * Remove function-return probe instances associated with this * task and put them back on the free list. @@ -2073,7 +2186,7 @@ void sched_exec(void) int dest_cpu; raw_spin_lock_irqsave(&p->pi_lock, flags); - dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0); + dest_cpu = p->sched_class->select_task_rq(p, task_cpu(p), SD_BALANCE_EXEC, 0); if (dest_cpu == smp_processor_id()) goto unlock; @@ -2215,7 +2328,7 @@ notrace unsigned long get_parent_ip(unsigned long addr) #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \ defined(CONFIG_PREEMPT_TRACER)) -void __kprobes add_preempt_count(int val) +void __kprobes preempt_count_add(int val) { #ifdef CONFIG_DEBUG_PREEMPT /* @@ -2224,7 +2337,7 @@ void __kprobes add_preempt_count(int val) if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0))) return; #endif - preempt_count() += val; + __preempt_count_add(val); #ifdef CONFIG_DEBUG_PREEMPT /* * Spinlock count overflowing soon? @@ -2235,9 +2348,9 @@ void __kprobes add_preempt_count(int val) if (preempt_count() == val) trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1)); } -EXPORT_SYMBOL(add_preempt_count); +EXPORT_SYMBOL(preempt_count_add); -void __kprobes sub_preempt_count(int val) +void __kprobes preempt_count_sub(int val) { #ifdef CONFIG_DEBUG_PREEMPT /* @@ -2255,9 +2368,9 @@ void __kprobes sub_preempt_count(int val) if (preempt_count() == val) trace_preempt_on(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1)); - preempt_count() -= val; + __preempt_count_sub(val); } -EXPORT_SYMBOL(sub_preempt_count); +EXPORT_SYMBOL(preempt_count_sub); #endif @@ -2430,6 +2543,7 @@ need_resched: put_prev_task(rq, prev); next = pick_next_task(rq); clear_tsk_need_resched(prev); + clear_preempt_need_resched(); rq->skip_clock_update = 0; if (likely(prev != next)) { @@ -2520,9 +2634,9 @@ asmlinkage void __sched notrace preempt_schedule(void) return; do { - add_preempt_count_notrace(PREEMPT_ACTIVE); + __preempt_count_add(PREEMPT_ACTIVE); __schedule(); - sub_preempt_count_notrace(PREEMPT_ACTIVE); + __preempt_count_sub(PREEMPT_ACTIVE); /* * Check again in case we missed a preemption opportunity @@ -2541,20 +2655,19 @@ EXPORT_SYMBOL(preempt_schedule); */ asmlinkage void __sched preempt_schedule_irq(void) { - struct thread_info *ti = current_thread_info(); enum ctx_state prev_state; /* Catch callers which need to be fixed */ - BUG_ON(ti->preempt_count || !irqs_disabled()); + BUG_ON(preempt_count() || !irqs_disabled()); prev_state = exception_enter(); do { - add_preempt_count(PREEMPT_ACTIVE); + __preempt_count_add(PREEMPT_ACTIVE); local_irq_enable(); __schedule(); local_irq_disable(); - sub_preempt_count(PREEMPT_ACTIVE); + __preempt_count_sub(PREEMPT_ACTIVE); /* * Check again in case we missed a preemption opportunity @@ -3598,7 +3711,6 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask) struct task_struct *p; int retval; - get_online_cpus(); rcu_read_lock(); p = find_process_by_pid(pid); @@ -3661,7 +3773,6 @@ out_free_cpus_allowed: free_cpumask_var(cpus_allowed); out_put_task: put_task_struct(p); - put_online_cpus(); return retval; } @@ -3706,7 +3817,6 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask) unsigned long flags; int retval; - get_online_cpus(); rcu_read_lock(); retval = -ESRCH; @@ -3719,12 +3829,11 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask) goto out_unlock; raw_spin_lock_irqsave(&p->pi_lock, flags); - cpumask_and(mask, &p->cpus_allowed, cpu_online_mask); + cpumask_and(mask, &p->cpus_allowed, cpu_active_mask); raw_spin_unlock_irqrestore(&p->pi_lock, flags); out_unlock: rcu_read_unlock(); - put_online_cpus(); return retval; } @@ -3794,16 +3903,11 @@ SYSCALL_DEFINE0(sched_yield) return 0; } -static inline int should_resched(void) -{ - return need_resched() && !(preempt_count() & PREEMPT_ACTIVE); -} - static void __cond_resched(void) { - add_preempt_count(PREEMPT_ACTIVE); + __preempt_count_add(PREEMPT_ACTIVE); __schedule(); - sub_preempt_count(PREEMPT_ACTIVE); + __preempt_count_sub(PREEMPT_ACTIVE); } int __sched _cond_resched(void) @@ -4186,7 +4290,7 @@ void init_idle(struct task_struct *idle, int cpu) raw_spin_lock_irqsave(&rq->lock, flags); - __sched_fork(idle); + __sched_fork(0, idle); idle->state = TASK_RUNNING; idle->se.exec_start = sched_clock(); @@ -4212,7 +4316,7 @@ void init_idle(struct task_struct *idle, int cpu) raw_spin_unlock_irqrestore(&rq->lock, flags); /* Set the preempt count _outside_ the spinlocks! */ - task_thread_info(idle)->preempt_count = 0; + init_idle_preempt_count(idle, cpu); /* * The idle tasks have their own, simple scheduling class: @@ -4346,6 +4450,53 @@ fail: return ret; } +#ifdef CONFIG_NUMA_BALANCING +/* Migrate current task p to target_cpu */ +int migrate_task_to(struct task_struct *p, int target_cpu) +{ + struct migration_arg arg = { p, target_cpu }; + int curr_cpu = task_cpu(p); + + if (curr_cpu == target_cpu) + return 0; + + if (!cpumask_test_cpu(target_cpu, tsk_cpus_allowed(p))) + return -EINVAL; + + /* TODO: This is not properly updating schedstats */ + + return stop_one_cpu(curr_cpu, migration_cpu_stop, &arg); +} + +/* + * Requeue a task on a given node and accurately track the number of NUMA + * tasks on the runqueues + */ +void sched_setnuma(struct task_struct *p, int nid) +{ + struct rq *rq; + unsigned long flags; + bool on_rq, running; + + rq = task_rq_lock(p, &flags); + on_rq = p->on_rq; + running = task_current(rq, p); + + if (on_rq) + dequeue_task(rq, p, 0); + if (running) + p->sched_class->put_prev_task(rq, p); + + p->numa_preferred_nid = nid; + + if (running) + p->sched_class->set_curr_task(rq); + if (on_rq) + enqueue_task(rq, p, 0); + task_rq_unlock(rq, p, &flags); +} +#endif + /* * migration_cpu_stop - this will be executed by a highprio stopper thread * and performs thread migration by bumping thread off CPU then @@ -5119,6 +5270,7 @@ static void destroy_sched_domains(struct sched_domain *sd, int cpu) DEFINE_PER_CPU(struct sched_domain *, sd_llc); DEFINE_PER_CPU(int, sd_llc_size); DEFINE_PER_CPU(int, sd_llc_id); +DEFINE_PER_CPU(struct sched_domain *, sd_numa); static void update_top_cache_domain(int cpu) { @@ -5135,6 +5287,9 @@ static void update_top_cache_domain(int cpu) rcu_assign_pointer(per_cpu(sd_llc, cpu), sd); per_cpu(sd_llc_size, cpu) = size; per_cpu(sd_llc_id, cpu) = id; + + sd = lowest_flag_domain(cpu, SD_NUMA); + rcu_assign_pointer(per_cpu(sd_numa, cpu), sd); } /* @@ -5654,6 +5809,7 @@ sd_numa_init(struct sched_domain_topology_level *tl, int cpu) | 0*SD_SHARE_PKG_RESOURCES | 1*SD_SERIALIZE | 0*SD_PREFER_SIBLING + | 1*SD_NUMA | sd_local_flags(level) , .last_balance = jiffies, @@ -6335,14 +6491,17 @@ void __init sched_init_smp(void) sched_init_numa(); - get_online_cpus(); + /* + * There's no userspace yet to cause hotplug operations; hence all the + * cpu masks are stable and all blatant races in the below code cannot + * happen. + */ mutex_lock(&sched_domains_mutex); init_sched_domains(cpu_active_mask); cpumask_andnot(non_isolated_cpus, cpu_possible_mask, cpu_isolated_map); if (cpumask_empty(non_isolated_cpus)) cpumask_set_cpu(smp_processor_id(), non_isolated_cpus); mutex_unlock(&sched_domains_mutex); - put_online_cpus(); hotcpu_notifier(sched_domains_numa_masks_update, CPU_PRI_SCHED_ACTIVE); hotcpu_notifier(cpuset_cpu_active, CPU_PRI_CPUSET_ACTIVE); @@ -6505,6 +6664,7 @@ void __init sched_init(void) rq->online = 0; rq->idle_stamp = 0; rq->avg_idle = 2*sysctl_sched_migration_cost; + rq->max_idle_balance_cost = sysctl_sched_migration_cost; INIT_LIST_HEAD(&rq->cfs_tasks); diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 196559994f7c..e6ba5e31c7ca 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -15,6 +15,7 @@ #include #include #include +#include #include "sched.h" @@ -137,6 +138,9 @@ print_task(struct seq_file *m, struct rq *rq, struct task_struct *p) SEQ_printf(m, "%15Ld %15Ld %15Ld.%06ld %15Ld.%06ld %15Ld.%06ld", 0LL, 0LL, 0LL, 0L, 0LL, 0L, 0LL, 0L); #endif +#ifdef CONFIG_NUMA_BALANCING + SEQ_printf(m, " %d", cpu_to_node(task_cpu(p))); +#endif #ifdef CONFIG_CGROUP_SCHED SEQ_printf(m, " %s", task_group_path(task_group(p))); #endif @@ -159,7 +163,7 @@ static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu) read_lock_irqsave(&tasklist_lock, flags); do_each_thread(g, p) { - if (!p->on_rq || task_cpu(p) != rq_cpu) + if (task_cpu(p) != rq_cpu) continue; print_task(m, rq, p); @@ -345,7 +349,7 @@ static void sched_debug_header(struct seq_file *m) cpu_clk = local_clock(); local_irq_restore(flags); - SEQ_printf(m, "Sched Debug Version: v0.10, %s %.*s\n", + SEQ_printf(m, "Sched Debug Version: v0.11, %s %.*s\n", init_utsname()->release, (int)strcspn(init_utsname()->version, " "), init_utsname()->version); @@ -488,6 +492,56 @@ static int __init init_sched_debug_procfs(void) __initcall(init_sched_debug_procfs); +#define __P(F) \ + SEQ_printf(m, "%-45s:%21Ld\n", #F, (long long)F) +#define P(F) \ + SEQ_printf(m, "%-45s:%21Ld\n", #F, (long long)p->F) +#define __PN(F) \ + SEQ_printf(m, "%-45s:%14Ld.%06ld\n", #F, SPLIT_NS((long long)F)) +#define PN(F) \ + SEQ_printf(m, "%-45s:%14Ld.%06ld\n", #F, SPLIT_NS((long long)p->F)) + + +static void sched_show_numa(struct task_struct *p, struct seq_file *m) +{ +#ifdef CONFIG_NUMA_BALANCING + struct mempolicy *pol; + int node, i; + + if (p->mm) + P(mm->numa_scan_seq); + + task_lock(p); + pol = p->mempolicy; + if (pol && !(pol->flags & MPOL_F_MORON)) + pol = NULL; + mpol_get(pol); + task_unlock(p); + + SEQ_printf(m, "numa_migrations, %ld\n", xchg(&p->numa_pages_migrated, 0)); + + for_each_online_node(node) { + for (i = 0; i < 2; i++) { + unsigned long nr_faults = -1; + int cpu_current, home_node; + + if (p->numa_faults) + nr_faults = p->numa_faults[2*node + i]; + + cpu_current = !i ? (task_node(p) == node) : + (pol && node_isset(node, pol->v.nodes)); + + home_node = (p->numa_preferred_nid == node); + + SEQ_printf(m, "numa_faults, %d, %d, %d, %d, %ld\n", + i, node, cpu_current, home_node, nr_faults); + } + } + + mpol_put(pol); +#endif +} + void proc_sched_show_task(struct task_struct *p, struct seq_file *m) { unsigned long nr_switches; @@ -591,6 +645,8 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m) SEQ_printf(m, "%-45s:%21Ld\n", "clock-delta", (long long)(t1-t0)); } + + sched_show_numa(p, m); } void proc_sched_set_task(struct task_struct *p) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7c70201fbc61..813dd61a9b43 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -681,6 +681,8 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se) } #ifdef CONFIG_SMP +static unsigned long task_h_load(struct task_struct *p); + static inline void __update_task_entity_contrib(struct sched_entity *se); /* Give new task start runnable values to heavy its load in infant time */ @@ -818,11 +820,12 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se) #ifdef CONFIG_NUMA_BALANCING /* - * numa task sample period in ms + * Approximate time to scan a full NUMA task in ms. The task scan period is + * calculated based on the tasks virtual memory size and + * numa_balancing_scan_size. */ -unsigned int sysctl_numa_balancing_scan_period_min = 100; -unsigned int sysctl_numa_balancing_scan_period_max = 100*50; -unsigned int sysctl_numa_balancing_scan_period_reset = 100*600; +unsigned int sysctl_numa_balancing_scan_period_min = 1000; +unsigned int sysctl_numa_balancing_scan_period_max = 60000; /* Portion of address space to scan in MB */ unsigned int sysctl_numa_balancing_scan_size = 256; @@ -830,41 +833,810 @@ unsigned int sysctl_numa_balancing_scan_size = 256; /* Scan @scan_size MB every @scan_period after an initial @scan_delay in ms */ unsigned int sysctl_numa_balancing_scan_delay = 1000; -static void task_numa_placement(struct task_struct *p) +/* + * After skipping a page migration on a shared page, skip N more numa page + * migrations unconditionally. This reduces the number of NUMA migrations + * in shared memory workloads, and has the effect of pulling tasks towards + * where their memory lives, over pulling the memory towards the task. + */ +unsigned int sysctl_numa_balancing_migrate_deferred = 16; + +static unsigned int task_nr_scan_windows(struct task_struct *p) +{ + unsigned long rss = 0; + unsigned long nr_scan_pages; + + /* + * Calculations based on RSS as non-present and empty pages are skipped + * by the PTE scanner and NUMA hinting faults should be trapped based + * on resident pages + */ + nr_scan_pages = sysctl_numa_balancing_scan_size << (20 - PAGE_SHIFT); + rss = get_mm_rss(p->mm); + if (!rss) + rss = nr_scan_pages; + + rss = round_up(rss, nr_scan_pages); + return rss / nr_scan_pages; +} + +/* For sanitys sake, never scan more PTEs than MAX_SCAN_WINDOW MB/sec. */ +#define MAX_SCAN_WINDOW 2560 + +static unsigned int task_scan_min(struct task_struct *p) { - int seq; + unsigned int scan, floor; + unsigned int windows = 1; + + if (sysctl_numa_balancing_scan_size < MAX_SCAN_WINDOW) + windows = MAX_SCAN_WINDOW / sysctl_numa_balancing_scan_size; + floor = 1000 / windows; + + scan = sysctl_numa_balancing_scan_period_min / task_nr_scan_windows(p); + return max_t(unsigned int, floor, scan); +} + +static unsigned int task_scan_max(struct task_struct *p) +{ + unsigned int smin = task_scan_min(p); + unsigned int smax; + + /* Watch for min being lower than max due to floor calculations */ + smax = sysctl_numa_balancing_scan_period_max / task_nr_scan_windows(p); + return max(smin, smax); +} + +/* + * Once a preferred node is selected the scheduler balancer will prefer moving + * a task to that node for sysctl_numa_balancing_settle_count number of PTE + * scans. This will give the process the chance to accumulate more faults on + * the preferred node but still allow the scheduler to move the task again if + * the nodes CPUs are overloaded. + */ +unsigned int sysctl_numa_balancing_settle_count __read_mostly = 4; + +static void account_numa_enqueue(struct rq *rq, struct task_struct *p) +{ + rq->nr_numa_running += (p->numa_preferred_nid != -1); + rq->nr_preferred_running += (p->numa_preferred_nid == task_node(p)); +} + +static void account_numa_dequeue(struct rq *rq, struct task_struct *p) +{ + rq->nr_numa_running -= (p->numa_preferred_nid != -1); + rq->nr_preferred_running -= (p->numa_preferred_nid == task_node(p)); +} + +struct numa_group { + atomic_t refcount; + + spinlock_t lock; /* nr_tasks, tasks */ + int nr_tasks; + pid_t gid; + struct list_head task_list; + + struct rcu_head rcu; + unsigned long total_faults; + unsigned long faults[0]; +}; + +pid_t task_numa_group_id(struct task_struct *p) +{ + return p->numa_group ? p->numa_group->gid : 0; +} + +static inline int task_faults_idx(int nid, int priv) +{ + return 2 * nid + priv; +} + +static inline unsigned long task_faults(struct task_struct *p, int nid) +{ + if (!p->numa_faults) + return 0; + + return p->numa_faults[task_faults_idx(nid, 0)] + + p->numa_faults[task_faults_idx(nid, 1)]; +} + +static inline unsigned long group_faults(struct task_struct *p, int nid) +{ + if (!p->numa_group) + return 0; + + return p->numa_group->faults[2*nid] + p->numa_group->faults[2*nid+1]; +} + +/* + * These return the fraction of accesses done by a particular task, or + * task group, on a particular numa node. The group weight is given a + * larger multiplier, in order to group tasks together that are almost + * evenly spread out between numa nodes. + */ +static inline unsigned long task_weight(struct task_struct *p, int nid) +{ + unsigned long total_faults; + + if (!p->numa_faults) + return 0; + + total_faults = p->total_numa_faults; + + if (!total_faults) + return 0; + + return 1000 * task_faults(p, nid) / total_faults; +} + +static inline unsigned long group_weight(struct task_struct *p, int nid) +{ + if (!p->numa_group || !p->numa_group->total_faults) + return 0; + + return 1000 * group_faults(p, nid) / p->numa_group->total_faults; +} + +static unsigned long weighted_cpuload(const int cpu); +static unsigned long source_load(int cpu, int type); +static unsigned long target_load(int cpu, int type); +static unsigned long power_of(int cpu); +static long effective_load(struct task_group *tg, int cpu, long wl, long wg); + +/* Cached statistics for all CPUs within a node */ +struct numa_stats { + unsigned long nr_running; + unsigned long load; + + /* Total compute capacity of CPUs on a node */ + unsigned long power; + + /* Approximate capacity in terms of runnable tasks on a node */ + unsigned long capacity; + int has_capacity; +}; + +/* + * XXX borrowed from update_sg_lb_stats + */ +static void update_numa_stats(struct numa_stats *ns, int nid) +{ + int cpu; + + memset(ns, 0, sizeof(*ns)); + for_each_cpu(cpu, cpumask_of_node(nid)) { + struct rq *rq = cpu_rq(cpu); + + ns->nr_running += rq->nr_running; + ns->load += weighted_cpuload(cpu); + ns->power += power_of(cpu); + } + + ns->load = (ns->load * SCHED_POWER_SCALE) / ns->power; + ns->capacity = DIV_ROUND_CLOSEST(ns->power, SCHED_POWER_SCALE); + ns->has_capacity = (ns->nr_running < ns->capacity); +} + +struct task_numa_env { + struct task_struct *p; + + int src_cpu, src_nid; + int dst_cpu, dst_nid; + + struct numa_stats src_stats, dst_stats; + + int imbalance_pct, idx; + + struct task_struct *best_task; + long best_imp; + int best_cpu; +}; + +static void task_numa_assign(struct task_numa_env *env, + struct task_struct *p, long imp) +{ + if (env->best_task) + put_task_struct(env->best_task); + if (p) + get_task_struct(p); + + env->best_task = p; + env->best_imp = imp; + env->best_cpu = env->dst_cpu; +} + +/* + * This checks if the overall compute and NUMA accesses of the system would + * be improved if the source tasks was migrated to the target dst_cpu taking + * into account that it might be best if task running on the dst_cpu should + * be exchanged with the source task + */ +static void task_numa_compare(struct task_numa_env *env, + long taskimp, long groupimp) +{ + struct rq *src_rq = cpu_rq(env->src_cpu); + struct rq *dst_rq = cpu_rq(env->dst_cpu); + struct task_struct *cur; + long dst_load, src_load; + long load; + long imp = (groupimp > 0) ? groupimp : taskimp; + + rcu_read_lock(); + cur = ACCESS_ONCE(dst_rq->curr); + if (cur->pid == 0) /* idle */ + cur = NULL; + + /* + * "imp" is the fault differential for the source task between the + * source and destination node. Calculate the total differential for + * the source task and potential destination task. The more negative + * the value is, the more rmeote accesses that would be expected to + * be incurred if the tasks were swapped. + */ + if (cur) { + /* Skip this swap candidate if cannot move to the source cpu */ + if (!cpumask_test_cpu(env->src_cpu, tsk_cpus_allowed(cur))) + goto unlock; + + /* + * If dst and source tasks are in the same NUMA group, or not + * in any group then look only at task weights. + */ + if (cur->numa_group == env->p->numa_group) { + imp = taskimp + task_weight(cur, env->src_nid) - + task_weight(cur, env->dst_nid); + /* + * Add some hysteresis to prevent swapping the + * tasks within a group over tiny differences. + */ + if (cur->numa_group) + imp -= imp/16; + } else { + /* + * Compare the group weights. If a task is all by + * itself (not part of a group), use the task weight + * instead. + */ + if (env->p->numa_group) + imp = groupimp; + else + imp = taskimp; + + if (cur->numa_group) + imp += group_weight(cur, env->src_nid) - + group_weight(cur, env->dst_nid); + else + imp += task_weight(cur, env->src_nid) - + task_weight(cur, env->dst_nid); + } + } + + if (imp < env->best_imp) + goto unlock; + + if (!cur) { + /* Is there capacity at our destination? */ + if (env->src_stats.has_capacity && + !env->dst_stats.has_capacity) + goto unlock; + + goto balance; + } + + /* Balance doesn't matter much if we're running a task per cpu */ + if (src_rq->nr_running == 1 && dst_rq->nr_running == 1) + goto assign; + + /* + * In the overloaded case, try and keep the load balanced. + */ +balance: + dst_load = env->dst_stats.load; + src_load = env->src_stats.load; + + /* XXX missing power terms */ + load = task_h_load(env->p); + dst_load += load; + src_load -= load; + + if (cur) { + load = task_h_load(cur); + dst_load -= load; + src_load += load; + } + + /* make src_load the smaller */ + if (dst_load < src_load) + swap(dst_load, src_load); + + if (src_load * env->imbalance_pct < dst_load * 100) + goto unlock; + +assign: + task_numa_assign(env, cur, imp); +unlock: + rcu_read_unlock(); +} + +static void task_numa_find_cpu(struct task_numa_env *env, + long taskimp, long groupimp) +{ + int cpu; + + for_each_cpu(cpu, cpumask_of_node(env->dst_nid)) { + /* Skip this CPU if the source task cannot migrate */ + if (!cpumask_test_cpu(cpu, tsk_cpus_allowed(env->p))) + continue; + + env->dst_cpu = cpu; + task_numa_compare(env, taskimp, groupimp); + } +} + +static int task_numa_migrate(struct task_struct *p) +{ + struct task_numa_env env = { + .p = p, + + .src_cpu = task_cpu(p), + .src_nid = task_node(p), + + .imbalance_pct = 112, + + .best_task = NULL, + .best_imp = 0, + .best_cpu = -1 + }; + struct sched_domain *sd; + unsigned long taskweight, groupweight; + int nid, ret; + long taskimp, groupimp; + + /* + * Pick the lowest SD_NUMA domain, as that would have the smallest + * imbalance and would be the first to start moving tasks about. + * + * And we want to avoid any moving of tasks about, as that would create + * random movement of tasks -- counter the numa conditions we're trying + * to satisfy here. + */ + rcu_read_lock(); + sd = rcu_dereference(per_cpu(sd_numa, env.src_cpu)); + env.imbalance_pct = 100 + (sd->imbalance_pct - 100) / 2; + rcu_read_unlock(); + + taskweight = task_weight(p, env.src_nid); + groupweight = group_weight(p, env.src_nid); + update_numa_stats(&env.src_stats, env.src_nid); + env.dst_nid = p->numa_preferred_nid; + taskimp = task_weight(p, env.dst_nid) - taskweight; + groupimp = group_weight(p, env.dst_nid) - groupweight; + update_numa_stats(&env.dst_stats, env.dst_nid); + + /* If the preferred nid has capacity, try to use it. */ + if (env.dst_stats.has_capacity) + task_numa_find_cpu(&env, taskimp, groupimp); + + /* No space available on the preferred nid. Look elsewhere. */ + if (env.best_cpu == -1) { + for_each_online_node(nid) { + if (nid == env.src_nid || nid == p->numa_preferred_nid) + continue; + + /* Only consider nodes where both task and groups benefit */ + taskimp = task_weight(p, nid) - taskweight; + groupimp = group_weight(p, nid) - groupweight; + if (taskimp < 0 && groupimp < 0) + continue; + + env.dst_nid = nid; + update_numa_stats(&env.dst_stats, env.dst_nid); + task_numa_find_cpu(&env, taskimp, groupimp); + } + } + + /* No better CPU than the current one was found. */ + if (env.best_cpu == -1) + return -EAGAIN; + + sched_setnuma(p, env.dst_nid); + + /* + * Reset the scan period if the task is being rescheduled on an + * alternative node to recheck if the tasks is now properly placed. + */ + p->numa_scan_period = task_scan_min(p); + + if (env.best_task == NULL) { + int ret = migrate_task_to(p, env.best_cpu); + return ret; + } + + ret = migrate_swap(p, env.best_task); + put_task_struct(env.best_task); + return ret; +} + +/* Attempt to migrate a task to a CPU on the preferred node. */ +static void numa_migrate_preferred(struct task_struct *p) +{ + /* This task has no NUMA fault statistics yet */ + if (unlikely(p->numa_preferred_nid == -1 || !p->numa_faults)) + return; + + /* Periodically retry migrating the task to the preferred node */ + p->numa_migrate_retry = jiffies + HZ; + + /* Success if task is already running on preferred CPU */ + if (cpu_to_node(task_cpu(p)) == p->numa_preferred_nid) + return; + + /* Otherwise, try migrate to a CPU on the preferred node */ + task_numa_migrate(p); +} + +/* + * When adapting the scan rate, the period is divided into NUMA_PERIOD_SLOTS + * increments. The more local the fault statistics are, the higher the scan + * period will be for the next scan window. If local/remote ratio is below + * NUMA_PERIOD_THRESHOLD (where range of ratio is 1..NUMA_PERIOD_SLOTS) the + * scan period will decrease + */ +#define NUMA_PERIOD_SLOTS 10 +#define NUMA_PERIOD_THRESHOLD 3 + +/* + * Increase the scan period (slow down scanning) if the majority of + * our memory is already on our local node, or if the majority of + * the page accesses are shared with other processes. + * Otherwise, decrease the scan period. + */ +static void update_task_scan_period(struct task_struct *p, + unsigned long shared, unsigned long private) +{ + unsigned int period_slot; + int ratio; + int diff; + + unsigned long remote = p->numa_faults_locality[0]; + unsigned long local = p->numa_faults_locality[1]; + + /* + * If there were no record hinting faults then either the task is + * completely idle or all activity is areas that are not of interest + * to automatic numa balancing. Scan slower + */ + if (local + shared == 0) { + p->numa_scan_period = min(p->numa_scan_period_max, + p->numa_scan_period << 1); + + p->mm->numa_next_scan = jiffies + + msecs_to_jiffies(p->numa_scan_period); - if (!p->mm) /* for example, ksmd faulting in a user's mm */ return; + } + + /* + * Prepare to scale scan period relative to the current period. + * == NUMA_PERIOD_THRESHOLD scan period stays the same + * < NUMA_PERIOD_THRESHOLD scan period decreases (scan faster) + * >= NUMA_PERIOD_THRESHOLD scan period increases (scan slower) + */ + period_slot = DIV_ROUND_UP(p->numa_scan_period, NUMA_PERIOD_SLOTS); + ratio = (local * NUMA_PERIOD_SLOTS) / (local + remote); + if (ratio >= NUMA_PERIOD_THRESHOLD) { + int slot = ratio - NUMA_PERIOD_THRESHOLD; + if (!slot) + slot = 1; + diff = slot * period_slot; + } else { + diff = -(NUMA_PERIOD_THRESHOLD - ratio) * period_slot; + + /* + * Scale scan rate increases based on sharing. There is an + * inverse relationship between the degree of sharing and + * the adjustment made to the scanning period. Broadly + * speaking the intent is that there is little point + * scanning faster if shared accesses dominate as it may + * simply bounce migrations uselessly + */ + period_slot = DIV_ROUND_UP(diff, NUMA_PERIOD_SLOTS); + ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared)); + diff = (diff * ratio) / NUMA_PERIOD_SLOTS; + } + + p->numa_scan_period = clamp(p->numa_scan_period + diff, + task_scan_min(p), task_scan_max(p)); + memset(p->numa_faults_locality, 0, sizeof(p->numa_faults_locality)); +} + +static void task_numa_placement(struct task_struct *p) +{ + int seq, nid, max_nid = -1, max_group_nid = -1; + unsigned long max_faults = 0, max_group_faults = 0; + unsigned long fault_types[2] = { 0, 0 }; + spinlock_t *group_lock = NULL; + seq = ACCESS_ONCE(p->mm->numa_scan_seq); if (p->numa_scan_seq == seq) return; p->numa_scan_seq = seq; + p->numa_scan_period_max = task_scan_max(p); + + /* If the task is part of a group prevent parallel updates to group stats */ + if (p->numa_group) { + group_lock = &p->numa_group->lock; + spin_lock(group_lock); + } + + /* Find the node with the highest number of faults */ + for_each_online_node(nid) { + unsigned long faults = 0, group_faults = 0; + int priv, i; + + for (priv = 0; priv < 2; priv++) { + long diff; + + i = task_faults_idx(nid, priv); + diff = -p->numa_faults[i]; + + /* Decay existing window, copy faults since last scan */ + p->numa_faults[i] >>= 1; + p->numa_faults[i] += p->numa_faults_buffer[i]; + fault_types[priv] += p->numa_faults_buffer[i]; + p->numa_faults_buffer[i] = 0; + + faults += p->numa_faults[i]; + diff += p->numa_faults[i]; + p->total_numa_faults += diff; + if (p->numa_group) { + /* safe because we can only change our own group */ + p->numa_group->faults[i] += diff; + p->numa_group->total_faults += diff; + group_faults += p->numa_group->faults[i]; + } + } + + if (faults > max_faults) { + max_faults = faults; + max_nid = nid; + } + + if (group_faults > max_group_faults) { + max_group_faults = group_faults; + max_group_nid = nid; + } + } + + update_task_scan_period(p, fault_types[0], fault_types[1]); + + if (p->numa_group) { + /* + * If the preferred task and group nids are different, + * iterate over the nodes again to find the best place. + */ + if (max_nid != max_group_nid) { + unsigned long weight, max_weight = 0; + + for_each_online_node(nid) { + weight = task_weight(p, nid) + group_weight(p, nid); + if (weight > max_weight) { + max_weight = weight; + max_nid = nid; + } + } + } + + spin_unlock(group_lock); + } + + /* Preferred node as the node with the most faults */ + if (max_faults && max_nid != p->numa_preferred_nid) { + /* Update the preferred nid and migrate task if possible */ + sched_setnuma(p, max_nid); + numa_migrate_preferred(p); + } +} + +static inline int get_numa_group(struct numa_group *grp) +{ + return atomic_inc_not_zero(&grp->refcount); +} + +static inline void put_numa_group(struct numa_group *grp) +{ + if (atomic_dec_and_test(&grp->refcount)) + kfree_rcu(grp, rcu); +} + +static void task_numa_group(struct task_struct *p, int cpupid, int flags, + int *priv) +{ + struct numa_group *grp, *my_grp; + struct task_struct *tsk; + bool join = false; + int cpu = cpupid_to_cpu(cpupid); + int i; + + if (unlikely(!p->numa_group)) { + unsigned int size = sizeof(struct numa_group) + + 2*nr_node_ids*sizeof(unsigned long); + + grp = kzalloc(size, GFP_KERNEL | __GFP_NOWARN); + if (!grp) + return; + + atomic_set(&grp->refcount, 1); + spin_lock_init(&grp->lock); + INIT_LIST_HEAD(&grp->task_list); + grp->gid = p->pid; + + for (i = 0; i < 2*nr_node_ids; i++) + grp->faults[i] = p->numa_faults[i]; + + grp->total_faults = p->total_numa_faults; + + list_add(&p->numa_entry, &grp->task_list); + grp->nr_tasks++; + rcu_assign_pointer(p->numa_group, grp); + } + + rcu_read_lock(); + tsk = ACCESS_ONCE(cpu_rq(cpu)->curr); + + if (!cpupid_match_pid(tsk, cpupid)) + goto no_join; - /* FIXME: Scheduling placement policy hints go here */ + grp = rcu_dereference(tsk->numa_group); + if (!grp) + goto no_join; + + my_grp = p->numa_group; + if (grp == my_grp) + goto no_join; + + /* + * Only join the other group if its bigger; if we're the bigger group, + * the other task will join us. + */ + if (my_grp->nr_tasks > grp->nr_tasks) + goto no_join; + + /* + * Tie-break on the grp address. + */ + if (my_grp->nr_tasks == grp->nr_tasks && my_grp > grp) + goto no_join; + + /* Always join threads in the same process. */ + if (tsk->mm == current->mm) + join = true; + + /* Simple filter to avoid false positives due to PID collisions */ + if (flags & TNF_SHARED) + join = true; + + /* Update priv based on whether false sharing was detected */ + *priv = !join; + + if (join && !get_numa_group(grp)) + goto no_join; + + rcu_read_unlock(); + + if (!join) + return; + + double_lock(&my_grp->lock, &grp->lock); + + for (i = 0; i < 2*nr_node_ids; i++) { + my_grp->faults[i] -= p->numa_faults[i]; + grp->faults[i] += p->numa_faults[i]; + } + my_grp->total_faults -= p->total_numa_faults; + grp->total_faults += p->total_numa_faults; + + list_move(&p->numa_entry, &grp->task_list); + my_grp->nr_tasks--; + grp->nr_tasks++; + + spin_unlock(&my_grp->lock); + spin_unlock(&grp->lock); + + rcu_assign_pointer(p->numa_group, grp); + + put_numa_group(my_grp); + return; + +no_join: + rcu_read_unlock(); + return; +} + +void task_numa_free(struct task_struct *p) +{ + struct numa_group *grp = p->numa_group; + int i; + void *numa_faults = p->numa_faults; + + if (grp) { + spin_lock(&grp->lock); + for (i = 0; i < 2*nr_node_ids; i++) + grp->faults[i] -= p->numa_faults[i]; + grp->total_faults -= p->total_numa_faults; + + list_del(&p->numa_entry); + grp->nr_tasks--; + spin_unlock(&grp->lock); + rcu_assign_pointer(p->numa_group, NULL); + put_numa_group(grp); + } + + p->numa_faults = NULL; + p->numa_faults_buffer = NULL; + kfree(numa_faults); } /* * Got a PROT_NONE fault for a page on @node. */ -void task_numa_fault(int node, int pages, bool migrated) +void task_numa_fault(int last_cpupid, int node, int pages, int flags) { struct task_struct *p = current; + bool migrated = flags & TNF_MIGRATED; + int priv; if (!numabalancing_enabled) return; - /* FIXME: Allocate task-specific structure for placement policy here */ + /* for example, ksmd faulting in a user's mm */ + if (!p->mm) + return; + + /* Do not worry about placement if exiting */ + if (p->state == TASK_DEAD) + return; + + /* Allocate buffer to track faults on a per-node basis */ + if (unlikely(!p->numa_faults)) { + int size = sizeof(*p->numa_faults) * 2 * nr_node_ids; + + /* numa_faults and numa_faults_buffer share the allocation */ + p->numa_faults = kzalloc(size * 2, GFP_KERNEL|__GFP_NOWARN); + if (!p->numa_faults) + return; + + BUG_ON(p->numa_faults_buffer); + p->numa_faults_buffer = p->numa_faults + (2 * nr_node_ids); + p->total_numa_faults = 0; + memset(p->numa_faults_locality, 0, sizeof(p->numa_faults_locality)); + } /* - * If pages are properly placed (did not migrate) then scan slower. - * This is reset periodically in case of phase changes + * First accesses are treated as private, otherwise consider accesses + * to be private if the accessing pid has not changed */ - if (!migrated) - p->numa_scan_period = min(sysctl_numa_balancing_scan_period_max, - p->numa_scan_period + jiffies_to_msecs(10)); + if (unlikely(last_cpupid == (-1 & LAST_CPUPID_MASK))) { + priv = 1; + } else { + priv = cpupid_match_pid(p, last_cpupid); + if (!priv && !(flags & TNF_NO_GROUP)) + task_numa_group(p, last_cpupid, flags, &priv); + } task_numa_placement(p); + + /* + * Retry task to preferred node migration periodically, in case it + * case it previously failed, or the scheduler moved us. + */ + if (time_after(jiffies, p->numa_migrate_retry)) + numa_migrate_preferred(p); + + if (migrated) + p->numa_pages_migrated += pages; + + p->numa_faults_buffer[task_faults_idx(node, priv)] += pages; + p->numa_faults_locality[!!(flags & TNF_FAULT_LOCAL)] += pages; } static void reset_ptenuma_scan(struct task_struct *p) @@ -884,6 +1656,7 @@ void task_numa_work(struct callback_head *work) struct mm_struct *mm = p->mm; struct vm_area_struct *vma; unsigned long start, end; + unsigned long nr_pte_updates = 0; long pages; WARN_ON_ONCE(p != container_of(work, struct task_struct, numa_work)); @@ -900,35 +1673,9 @@ void task_numa_work(struct callback_head *work) if (p->flags & PF_EXITING) return; - /* - * We do not care about task placement until a task runs on a node - * other than the first one used by the address space. This is - * largely because migrations are driven by what CPU the task - * is running on. If it's never scheduled on another node, it'll - * not migrate so why bother trapping the fault. - */ - if (mm->first_nid == NUMA_PTE_SCAN_INIT) - mm->first_nid = numa_node_id(); - if (mm->first_nid != NUMA_PTE_SCAN_ACTIVE) { - /* Are we running on a new node yet? */ - if (numa_node_id() == mm->first_nid && - !sched_feat_numa(NUMA_FORCE)) - return; - - mm->first_nid = NUMA_PTE_SCAN_ACTIVE; - } - - /* - * Reset the scan period if enough time has gone by. Objective is that - * scanning will be reduced if pages are properly placed. As tasks - * can enter different phases this needs to be re-examined. Lacking - * proper tracking of reference behaviour, this blunt hammer is used. - */ - migrate = mm->numa_next_reset; - if (time_after(now, migrate)) { - p->numa_scan_period = sysctl_numa_balancing_scan_period_min; - next_scan = now + msecs_to_jiffies(sysctl_numa_balancing_scan_period_reset); - xchg(&mm->numa_next_reset, next_scan); + if (!mm->numa_next_scan) { + mm->numa_next_scan = now + + msecs_to_jiffies(sysctl_numa_balancing_scan_delay); } /* @@ -938,20 +1685,20 @@ void task_numa_work(struct callback_head *work) if (time_before(now, migrate)) return; - if (p->numa_scan_period == 0) - p->numa_scan_period = sysctl_numa_balancing_scan_period_min; + if (p->numa_scan_period == 0) { + p->numa_scan_period_max = task_scan_max(p); + p->numa_scan_period = task_scan_min(p); + } next_scan = now + msecs_to_jiffies(p->numa_scan_period); if (cmpxchg(&mm->numa_next_scan, migrate, next_scan) != migrate) return; /* - * Do not set pte_numa if the current running node is rate-limited. - * This loses statistics on the fault but if we are unwilling to - * migrate to this node, it is less likely we can do useful work + * Delay this task enough that another task of this mm will likely win + * the next time around. */ - if (migrate_ratelimited(numa_node_id())) - return; + p->node_stamp += 2 * TICK_NSEC; start = mm->numa_scan_offset; pages = sysctl_numa_balancing_scan_size; @@ -967,18 +1714,32 @@ void task_numa_work(struct callback_head *work) vma = mm->mmap; } for (; vma; vma = vma->vm_next) { - if (!vma_migratable(vma)) + if (!vma_migratable(vma) || !vma_policy_mof(p, vma)) continue; - /* Skip small VMAs. They are not likely to be of relevance */ - if (vma->vm_end - vma->vm_start < HPAGE_SIZE) + /* + * Shared library pages mapped by multiple processes are not + * migrated as it is expected they are cache replicated. Avoid + * hinting faults in read-only file-backed mappings or the vdso + * as migrating the pages will be of marginal benefit. + */ + if (!vma->vm_mm || + (vma->vm_file && (vma->vm_flags & (VM_READ|VM_WRITE)) == (VM_READ))) continue; do { start = max(start, vma->vm_start); end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE); end = min(end, vma->vm_end); - pages -= change_prot_numa(vma, start, end); + nr_pte_updates += change_prot_numa(vma, start, end); + + /* + * Scan sysctl_numa_balancing_scan_size but ensure that + * at least one PTE is updated so that unused virtual + * address space is quickly skipped. + */ + if (nr_pte_updates) + pages -= (end - start) >> PAGE_SHIFT; start = end; if (pages <= 0) @@ -988,10 +1749,10 @@ void task_numa_work(struct callback_head *work) out: /* - * It is possible to reach the end of the VMA list but the last few VMAs are - * not guaranteed to the vma_migratable. If they are not, we would find the - * !migratable VMA on the next scan but not reset the scanner to the start - * so check it now. + * It is possible to reach the end of the VMA list but the last few + * VMAs are not guaranteed to the vma_migratable. If they are not, we + * would find the !migratable VMA on the next scan but not reset the + * scanner to the start so check it now. */ if (vma) mm->numa_scan_offset = start; @@ -1025,8 +1786,8 @@ void task_tick_numa(struct rq *rq, struct task_struct *curr) if (now - curr->node_stamp > period) { if (!curr->node_stamp) - curr->numa_scan_period = sysctl_numa_balancing_scan_period_min; - curr->node_stamp = now; + curr->numa_scan_period = task_scan_min(curr); + curr->node_stamp += period; if (!time_before(jiffies, curr->mm->numa_next_scan)) { init_task_work(work, task_numa_work); /* TODO: move this into sched_fork() */ @@ -1038,6 +1799,14 @@ void task_tick_numa(struct rq *rq, struct task_struct *curr) static void task_tick_numa(struct rq *rq, struct task_struct *curr) { } + +static inline void account_numa_enqueue(struct rq *rq, struct task_struct *p) +{ +} + +static inline void account_numa_dequeue(struct rq *rq, struct task_struct *p) +{ +} #endif /* CONFIG_NUMA_BALANCING */ static void @@ -1047,8 +1816,12 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se) if (!parent_entity(se)) update_load_add(&rq_of(cfs_rq)->load, se->load.weight); #ifdef CONFIG_SMP - if (entity_is_task(se)) - list_add(&se->group_node, &rq_of(cfs_rq)->cfs_tasks); + if (entity_is_task(se)) { + struct rq *rq = rq_of(cfs_rq); + + account_numa_enqueue(rq, task_of(se)); + list_add(&se->group_node, &rq->cfs_tasks); + } #endif cfs_rq->nr_running++; } @@ -1059,8 +1832,10 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se) update_load_sub(&cfs_rq->load, se->load.weight); if (!parent_entity(se)) update_load_sub(&rq_of(cfs_rq)->load, se->load.weight); - if (entity_is_task(se)) + if (entity_is_task(se)) { + account_numa_dequeue(rq_of(cfs_rq), task_of(se)); list_del_init(&se->group_node); + } cfs_rq->nr_running--; } @@ -3113,7 +3888,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg) { struct sched_entity *se = tg->se[cpu]; - if (!tg->parent) /* the trivial, non-cgroup case */ + if (!tg->parent || !wl) /* the trivial, non-cgroup case */ return wl; for_each_sched_entity(se) { @@ -3166,8 +3941,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg) } #else -static inline unsigned long effective_load(struct task_group *tg, int cpu, - unsigned long wl, unsigned long wg) +static long effective_load(struct task_group *tg, int cpu, long wl, long wg) { return wl; } @@ -3420,11 +4194,10 @@ done: * preempt must be disabled. */ static int -select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags) +select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags) { struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL; int cpu = smp_processor_id(); - int prev_cpu = task_cpu(p); int new_cpu = cpu; int want_affine = 0; int sync = wake_flags & WF_SYNC; @@ -3904,9 +4677,12 @@ static bool yield_to_task_fair(struct rq *rq, struct task_struct *p, bool preemp static unsigned long __read_mostly max_load_balance_interval = HZ/10; +enum fbq_type { regular, remote, all }; + #define LBF_ALL_PINNED 0x01 #define LBF_NEED_BREAK 0x02 -#define LBF_SOME_PINNED 0x04 +#define LBF_DST_PINNED 0x04 +#define LBF_SOME_PINNED 0x08 struct lb_env { struct sched_domain *sd; @@ -3929,6 +4705,8 @@ struct lb_env { unsigned int loop; unsigned int loop_break; unsigned int loop_max; + + enum fbq_type fbq_type; }; /* @@ -3975,6 +4753,78 @@ task_hot(struct task_struct *p, u64 now, struct sched_domain *sd) return delta < (s64)sysctl_sched_migration_cost; } +#ifdef CONFIG_NUMA_BALANCING +/* Returns true if the destination node has incurred more faults */ +static bool migrate_improves_locality(struct task_struct *p, struct lb_env *env) +{ + int src_nid, dst_nid; + + if (!sched_feat(NUMA_FAVOUR_HIGHER) || !p->numa_faults || + !(env->sd->flags & SD_NUMA)) { + return false; + } + + src_nid = cpu_to_node(env->src_cpu); + dst_nid = cpu_to_node(env->dst_cpu); + + if (src_nid == dst_nid) + return false; + + /* Always encourage migration to the preferred node. */ + if (dst_nid == p->numa_preferred_nid) + return true; + + /* If both task and group weight improve, this move is a winner. */ + if (task_weight(p, dst_nid) > task_weight(p, src_nid) && + group_weight(p, dst_nid) > group_weight(p, src_nid)) + return true; + + return false; +} + + +static bool migrate_degrades_locality(struct task_struct *p, struct lb_env *env) +{ + int src_nid, dst_nid; + + if (!sched_feat(NUMA) || !sched_feat(NUMA_RESIST_LOWER)) + return false; + + if (!p->numa_faults || !(env->sd->flags & SD_NUMA)) + return false; + + src_nid = cpu_to_node(env->src_cpu); + dst_nid = cpu_to_node(env->dst_cpu); + + if (src_nid == dst_nid) + return false; + + /* Migrating away from the preferred node is always bad. */ + if (src_nid == p->numa_preferred_nid) + return true; + + /* If either task or group weight get worse, don't do it. */ + if (task_weight(p, dst_nid) < task_weight(p, src_nid) || + group_weight(p, dst_nid) < group_weight(p, src_nid)) + return true; + + return false; +} + +#else +static inline bool migrate_improves_locality(struct task_struct *p, + struct lb_env *env) +{ + return false; +} + +static inline bool migrate_degrades_locality(struct task_struct *p, + struct lb_env *env) +{ + return false; +} +#endif + /* * can_migrate_task - may task p from runqueue rq be migrated to this_cpu? */ @@ -3997,6 +4847,8 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env) schedstat_inc(p, se.statistics.nr_failed_migrations_affine); + env->flags |= LBF_SOME_PINNED; + /* * Remember if this task can be migrated to any other cpu in * our sched_group. We may want to revisit it if we couldn't @@ -4005,13 +4857,13 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env) * Also avoid computing new_dst_cpu if we have already computed * one in current iteration. */ - if (!env->dst_grpmask || (env->flags & LBF_SOME_PINNED)) + if (!env->dst_grpmask || (env->flags & LBF_DST_PINNED)) return 0; /* Prevent to re-select dst_cpu via env's cpus */ for_each_cpu_and(cpu, env->dst_grpmask, env->cpus) { if (cpumask_test_cpu(cpu, tsk_cpus_allowed(p))) { - env->flags |= LBF_SOME_PINNED; + env->flags |= LBF_DST_PINNED; env->new_dst_cpu = cpu; break; } @@ -4030,11 +4882,24 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env) /* * Aggressive migration if: - * 1) task is cache cold, or - * 2) too many balance attempts have failed. + * 1) destination numa is preferred + * 2) task is cache cold, or + * 3) too many balance attempts have failed. */ - tsk_cache_hot = task_hot(p, rq_clock_task(env->src_rq), env->sd); + if (!tsk_cache_hot) + tsk_cache_hot = migrate_degrades_locality(p, env); + + if (migrate_improves_locality(p, env)) { +#ifdef CONFIG_SCHEDSTATS + if (tsk_cache_hot) { + schedstat_inc(env->sd, lb_hot_gained[env->idle]); + schedstat_inc(p, se.statistics.nr_forced_migrations); + } +#endif + return 1; + } + if (!tsk_cache_hot || env->sd->nr_balance_failed > env->sd->cache_nice_tries) { @@ -4077,8 +4942,6 @@ static int move_one_task(struct lb_env *env) return 0; } -static unsigned long task_h_load(struct task_struct *p); - static const unsigned int sched_nr_migrate_break = 32; /* @@ -4291,6 +5154,10 @@ struct sg_lb_stats { unsigned int group_weight; int group_imb; /* Is there an imbalance in the group ? */ int group_has_capacity; /* Is there extra capacity in the group? */ +#ifdef CONFIG_NUMA_BALANCING + unsigned int nr_numa_running; + unsigned int nr_preferred_running; +#endif }; /* @@ -4330,7 +5197,7 @@ static inline void init_sd_lb_stats(struct sd_lb_stats *sds) /** * get_sd_load_idx - Obtain the load index for a given sched domain. * @sd: The sched_domain whose load_idx is to be obtained. - * @idle: The Idle status of the CPU for whose sd load_icx is obtained. + * @idle: The idle status of the CPU for whose sd load_idx is obtained. * * Return: The load index. */ @@ -4447,7 +5314,7 @@ void update_group_power(struct sched_domain *sd, int cpu) { struct sched_domain *child = sd->child; struct sched_group *group, *sdg = sd->groups; - unsigned long power; + unsigned long power, power_orig; unsigned long interval; interval = msecs_to_jiffies(sd->balance_interval); @@ -4459,7 +5326,7 @@ void update_group_power(struct sched_domain *sd, int cpu) return; } - power = 0; + power_orig = power = 0; if (child->flags & SD_OVERLAP) { /* @@ -4467,8 +5334,12 @@ void update_group_power(struct sched_domain *sd, int cpu) * span the current group. */ - for_each_cpu(cpu, sched_group_cpus(sdg)) - power += power_of(cpu); + for_each_cpu(cpu, sched_group_cpus(sdg)) { + struct sched_group *sg = cpu_rq(cpu)->sd->groups; + + power_orig += sg->sgp->power_orig; + power += sg->sgp->power; + } } else { /* * !SD_OVERLAP domains can assume that child groups @@ -4477,12 +5348,14 @@ void update_group_power(struct sched_domain *sd, int cpu) group = child->groups; do { + power_orig += group->sgp->power_orig; power += group->sgp->power; group = group->next; } while (group != child->groups); } - sdg->sgp->power_orig = sdg->sgp->power = power; + sdg->sgp->power_orig = power_orig; + sdg->sgp->power = power; } /* @@ -4526,13 +5399,12 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group) * cpu 3 and leave one of the cpus in the second group unused. * * The current solution to this issue is detecting the skew in the first group - * by noticing it has a cpu that is overloaded while the remaining cpus are - * idle -- or rather, there's a distinct imbalance in the cpus; see - * sg_imbalanced(). + * by noticing the lower domain failed to reach balance and had difficulty + * moving tasks due to affinity constraints. * * When this is so detected; this group becomes a candidate for busiest; see - * update_sd_pick_busiest(). And calculcate_imbalance() and - * find_busiest_group() avoid some of the usual balance conditional to allow it + * update_sd_pick_busiest(). And calculate_imbalance() and + * find_busiest_group() avoid some of the usual balance conditions to allow it * to create an effective group imbalance. * * This is a somewhat tricky proposition since the next run might not find the @@ -4540,49 +5412,36 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group) * subtle and fragile situation. */ -struct sg_imb_stats { - unsigned long max_nr_running, min_nr_running; - unsigned long max_cpu_load, min_cpu_load; -}; - -static inline void init_sg_imb_stats(struct sg_imb_stats *sgi) +static inline int sg_imbalanced(struct sched_group *group) { - sgi->max_cpu_load = sgi->max_nr_running = 0UL; - sgi->min_cpu_load = sgi->min_nr_running = ~0UL; + return group->sgp->imbalance; } -static inline void -update_sg_imb_stats(struct sg_imb_stats *sgi, - unsigned long load, unsigned long nr_running) +/* + * Compute the group capacity. + * + * Avoid the issue where N*frac(smt_power) >= 1 creates 'phantom' cores by + * first dividing out the smt factor and computing the actual number of cores + * and limit power unit capacity with that. + */ +static inline int sg_capacity(struct lb_env *env, struct sched_group *group) { - if (load > sgi->max_cpu_load) - sgi->max_cpu_load = load; - if (sgi->min_cpu_load > load) - sgi->min_cpu_load = load; + unsigned int capacity, smt, cpus; + unsigned int power, power_orig; - if (nr_running > sgi->max_nr_running) - sgi->max_nr_running = nr_running; - if (sgi->min_nr_running > nr_running) - sgi->min_nr_running = nr_running; -} + power = group->sgp->power; + power_orig = group->sgp->power_orig; + cpus = group->group_weight; -static inline int -sg_imbalanced(struct sg_lb_stats *sgs, struct sg_imb_stats *sgi) -{ - /* - * Consider the group unbalanced when the imbalance is larger - * than the average weight of a task. - * - * APZ: with cgroup the avg task weight can vary wildly and - * might not be a suitable number - should we keep a - * normalized nr_running number somewhere that negates - * the hierarchy? - */ - if ((sgi->max_cpu_load - sgi->min_cpu_load) >= sgs->load_per_task && - (sgi->max_nr_running - sgi->min_nr_running) > 1) - return 1; + /* smt := ceil(cpus / power), assumes: 1 < smt_power < 2 */ + smt = DIV_ROUND_UP(SCHED_POWER_SCALE * cpus, power_orig); + capacity = cpus / smt; /* cores */ - return 0; + capacity = min_t(unsigned, capacity, DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE)); + if (!capacity) + capacity = fix_small_capacity(env->sd, group); + + return capacity; } /** @@ -4597,12 +5456,11 @@ static inline void update_sg_lb_stats(struct lb_env *env, struct sched_group *group, int load_idx, int local_group, struct sg_lb_stats *sgs) { - struct sg_imb_stats sgi; unsigned long nr_running; unsigned long load; int i; - init_sg_imb_stats(&sgi); + memset(sgs, 0, sizeof(*sgs)); for_each_cpu_and(i, sched_group_cpus(group), env->cpus) { struct rq *rq = cpu_rq(i); @@ -4610,24 +5468,22 @@ static inline void update_sg_lb_stats(struct lb_env *env, nr_running = rq->nr_running; /* Bias balancing toward cpus of our domain */ - if (local_group) { + if (local_group) load = target_load(i, load_idx); - } else { + else load = source_load(i, load_idx); - update_sg_imb_stats(&sgi, load, nr_running); - } sgs->group_load += load; sgs->sum_nr_running += nr_running; +#ifdef CONFIG_NUMA_BALANCING + sgs->nr_numa_running += rq->nr_numa_running; + sgs->nr_preferred_running += rq->nr_preferred_running; +#endif sgs->sum_weighted_load += weighted_cpuload(i); if (idle_cpu(i)) sgs->idle_cpus++; } - if (local_group && (env->idle != CPU_NEWLY_IDLE || - time_after_eq(jiffies, group->sgp->next_update))) - update_group_power(env->sd, env->dst_cpu); - /* Adjust by relative CPU power of the group */ sgs->group_power = group->sgp->power; sgs->avg_load = (sgs->group_load*SCHED_POWER_SCALE) / sgs->group_power; @@ -4635,16 +5491,11 @@ static inline void update_sg_lb_stats(struct lb_env *env, if (sgs->sum_nr_running) sgs->load_per_task = sgs->sum_weighted_load / sgs->sum_nr_running; - sgs->group_imb = sg_imbalanced(sgs, &sgi); - - sgs->group_capacity = - DIV_ROUND_CLOSEST(sgs->group_power, SCHED_POWER_SCALE); - - if (!sgs->group_capacity) - sgs->group_capacity = fix_small_capacity(env->sd, group); - sgs->group_weight = group->group_weight; + sgs->group_imb = sg_imbalanced(group); + sgs->group_capacity = sg_capacity(env, group); + if (sgs->group_capacity > sgs->sum_nr_running) sgs->group_has_capacity = 1; } @@ -4693,14 +5544,42 @@ static bool update_sd_pick_busiest(struct lb_env *env, return false; } +#ifdef CONFIG_NUMA_BALANCING +static inline enum fbq_type fbq_classify_group(struct sg_lb_stats *sgs) +{ + if (sgs->sum_nr_running > sgs->nr_numa_running) + return regular; + if (sgs->sum_nr_running > sgs->nr_preferred_running) + return remote; + return all; +} + +static inline enum fbq_type fbq_classify_rq(struct rq *rq) +{ + if (rq->nr_running > rq->nr_numa_running) + return regular; + if (rq->nr_running > rq->nr_preferred_running) + return remote; + return all; +} +#else +static inline enum fbq_type fbq_classify_group(struct sg_lb_stats *sgs) +{ + return all; +} + +static inline enum fbq_type fbq_classify_rq(struct rq *rq) +{ + return regular; +} +#endif /* CONFIG_NUMA_BALANCING */ + /** * update_sd_lb_stats - Update sched_domain's statistics for load balancing. * @env: The load balancing environment. - * @balance: Should we balance. * @sds: variable to hold the statistics for this sched_domain. */ -static inline void update_sd_lb_stats(struct lb_env *env, - struct sd_lb_stats *sds) +static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sds) { struct sched_domain *child = env->sd->child; struct sched_group *sg = env->sd->groups; @@ -4720,11 +5599,17 @@ static inline void update_sd_lb_stats(struct lb_env *env, if (local_group) { sds->local = sg; sgs = &sds->local_stat; + + if (env->idle != CPU_NEWLY_IDLE || + time_after_eq(jiffies, sg->sgp->next_update)) + update_group_power(env->sd, env->dst_cpu); } - memset(sgs, 0, sizeof(*sgs)); update_sg_lb_stats(env, sg, load_idx, local_group, sgs); + if (local_group) + goto next_group; + /* * In case the child domain prefers tasks go to siblings * first, lower the sg capacity to one so that we'll try @@ -4735,21 +5620,25 @@ static inline void update_sd_lb_stats(struct lb_env *env, * heaviest group when it is already under-utilized (possible * with a large weight task outweighs the tasks on the system). */ - if (prefer_sibling && !local_group && - sds->local && sds->local_stat.group_has_capacity) + if (prefer_sibling && sds->local && + sds->local_stat.group_has_capacity) sgs->group_capacity = min(sgs->group_capacity, 1U); - /* Now, start updating sd_lb_stats */ - sds->total_load += sgs->group_load; - sds->total_pwr += sgs->group_power; - - if (!local_group && update_sd_pick_busiest(env, sds, sg, sgs)) { + if (update_sd_pick_busiest(env, sds, sg, sgs)) { sds->busiest = sg; sds->busiest_stat = *sgs; } +next_group: + /* Now, start updating sd_lb_stats */ + sds->total_load += sgs->group_load; + sds->total_pwr += sgs->group_power; + sg = sg->next; } while (sg != env->sd->groups); + + if (env->sd->flags & SD_NUMA) + env->fbq_type = fbq_classify_group(&sds->busiest_stat); } /** @@ -5053,15 +5942,39 @@ static struct rq *find_busiest_queue(struct lb_env *env, int i; for_each_cpu_and(i, sched_group_cpus(group), env->cpus) { - unsigned long power = power_of(i); - unsigned long capacity = DIV_ROUND_CLOSEST(power, - SCHED_POWER_SCALE); - unsigned long wl; + unsigned long power, capacity, wl; + enum fbq_type rt; + + rq = cpu_rq(i); + rt = fbq_classify_rq(rq); + /* + * We classify groups/runqueues into three groups: + * - regular: there are !numa tasks + * - remote: there are numa tasks that run on the 'wrong' node + * - all: there is no distinction + * + * In order to avoid migrating ideally placed numa tasks, + * ignore those when there's better options. + * + * If we ignore the actual busiest queue to migrate another + * task, the next balance pass can still reduce the busiest + * queue by moving tasks around inside the node. + * + * If we cannot move enough load due to this classification + * the next pass will adjust the group classification and + * allow migration of more tasks. + * + * Both cases only affect the total convergence complexity. + */ + if (rt > env->fbq_type) + continue; + + power = power_of(i); + capacity = DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE); if (!capacity) capacity = fix_small_capacity(env->sd, group); - rq = cpu_rq(i); wl = weighted_cpuload(i); /* @@ -5164,6 +6077,7 @@ static int load_balance(int this_cpu, struct rq *this_rq, int *continue_balancing) { int ld_moved, cur_ld_moved, active_balance = 0; + struct sched_domain *sd_parent = sd->parent; struct sched_group *group; struct rq *busiest; unsigned long flags; @@ -5177,6 +6091,7 @@ static int load_balance(int this_cpu, struct rq *this_rq, .idle = idle, .loop_break = sched_nr_migrate_break, .cpus = cpus, + .fbq_type = all, }; /* @@ -5268,17 +6183,17 @@ more_balance: * moreover subsequent load balance cycles should correct the * excess load moved. */ - if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) { + if ((env.flags & LBF_DST_PINNED) && env.imbalance > 0) { + + /* Prevent to re-select dst_cpu via env's cpus */ + cpumask_clear_cpu(env.dst_cpu, env.cpus); env.dst_rq = cpu_rq(env.new_dst_cpu); env.dst_cpu = env.new_dst_cpu; - env.flags &= ~LBF_SOME_PINNED; + env.flags &= ~LBF_DST_PINNED; env.loop = 0; env.loop_break = sched_nr_migrate_break; - /* Prevent to re-select dst_cpu via env's cpus */ - cpumask_clear_cpu(env.dst_cpu, env.cpus); - /* * Go back to "more_balance" rather than "redo" since we * need to continue with same src_cpu. @@ -5286,6 +6201,18 @@ more_balance: goto more_balance; } + /* + * We failed to reach balance because of affinity. + */ + if (sd_parent) { + int *group_imbalance = &sd_parent->groups->sgp->imbalance; + + if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) { + *group_imbalance = 1; + } else if (*group_imbalance) + *group_imbalance = 0; + } + /* All tasks on this runqueue were pinned by CPU affinity */ if (unlikely(env.flags & LBF_ALL_PINNED)) { cpumask_clear_cpu(cpu_of(busiest), cpus); @@ -5393,6 +6320,7 @@ void idle_balance(int this_cpu, struct rq *this_rq) struct sched_domain *sd; int pulled_task = 0; unsigned long next_balance = jiffies + HZ; + u64 curr_cost = 0; this_rq->idle_stamp = rq_clock(this_rq); @@ -5409,15 +6337,27 @@ void idle_balance(int this_cpu, struct rq *this_rq) for_each_domain(this_cpu, sd) { unsigned long interval; int continue_balancing = 1; + u64 t0, domain_cost; if (!(sd->flags & SD_LOAD_BALANCE)) continue; + if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) + break; + if (sd->flags & SD_BALANCE_NEWIDLE) { + t0 = sched_clock_cpu(this_cpu); + /* If we've pulled tasks over stop searching: */ pulled_task = load_balance(this_cpu, this_rq, sd, CPU_NEWLY_IDLE, &continue_balancing); + + domain_cost = sched_clock_cpu(this_cpu) - t0; + if (domain_cost > sd->max_newidle_lb_cost) + sd->max_newidle_lb_cost = domain_cost; + + curr_cost += domain_cost; } interval = msecs_to_jiffies(sd->balance_interval); @@ -5439,6 +6379,9 @@ void idle_balance(int this_cpu, struct rq *this_rq) */ this_rq->next_balance = next_balance; } + + if (curr_cost > this_rq->max_idle_balance_cost) + this_rq->max_idle_balance_cost = curr_cost; } /* @@ -5662,15 +6605,39 @@ static void rebalance_domains(int cpu, enum cpu_idle_type idle) /* Earliest time when we have to do rebalance again */ unsigned long next_balance = jiffies + 60*HZ; int update_next_balance = 0; - int need_serialize; + int need_serialize, need_decay = 0; + u64 max_cost = 0; update_blocked_averages(cpu); rcu_read_lock(); for_each_domain(cpu, sd) { + /* + * Decay the newidle max times here because this is a regular + * visit to all the domains. Decay ~1% per second. + */ + if (time_after(jiffies, sd->next_decay_max_lb_cost)) { + sd->max_newidle_lb_cost = + (sd->max_newidle_lb_cost * 253) / 256; + sd->next_decay_max_lb_cost = jiffies + HZ; + need_decay = 1; + } + max_cost += sd->max_newidle_lb_cost; + if (!(sd->flags & SD_LOAD_BALANCE)) continue; + /* + * Stop the load balance at this level. There is another + * CPU in our sched group which is doing load balancing more + * actively. + */ + if (!continue_balancing) { + if (need_decay) + continue; + break; + } + interval = sd->balance_interval; if (idle != CPU_IDLE) interval *= sd->busy_factor; @@ -5689,7 +6656,7 @@ static void rebalance_domains(int cpu, enum cpu_idle_type idle) if (time_after_eq(jiffies, sd->last_balance + interval)) { if (load_balance(cpu, rq, sd, idle, &continue_balancing)) { /* - * The LBF_SOME_PINNED logic could have changed + * The LBF_DST_PINNED logic could have changed * env->dst_cpu, so we can't know our idle * state even if we migrated tasks. Update it. */ @@ -5704,14 +6671,14 @@ out: next_balance = sd->last_balance + interval; update_next_balance = 1; } - + } + if (need_decay) { /* - * Stop the load balance at this level. There is another - * CPU in our sched group which is doing load balancing more - * actively. + * Ensure the rq-wide value also decays but keep it at a + * reasonable floor to avoid funnies with rq->avg_idle. */ - if (!continue_balancing) - break; + rq->max_idle_balance_cost = + max((u64)sysctl_sched_migration_cost, max_cost); } rcu_read_unlock(); diff --git a/kernel/sched/features.h b/kernel/sched/features.h index 99399f8e4799..5716929a2e3a 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -63,10 +63,23 @@ SCHED_FEAT(LB_MIN, false) /* * Apply the automatic NUMA scheduling policy. Enabled automatically * at runtime if running on a NUMA machine. Can be controlled via - * numa_balancing=. Allow PTE scanning to be forced on UMA machines - * for debugging the core machinery. + * numa_balancing= */ #ifdef CONFIG_NUMA_BALANCING SCHED_FEAT(NUMA, false) -SCHED_FEAT(NUMA_FORCE, false) + +/* + * NUMA_FAVOUR_HIGHER will favor moving tasks towards nodes where a + * higher number of hinting faults are recorded during active load + * balancing. + */ +SCHED_FEAT(NUMA_FAVOUR_HIGHER, true) + +/* + * NUMA_RESIST_LOWER will resist moving tasks towards nodes where a + * lower number of hinting faults have been recorded. As this has + * the potential to prevent a task ever migrating to a new node + * due to CPU overload it is disabled by default. + */ +SCHED_FEAT(NUMA_RESIST_LOWER, false) #endif diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c index d8da01008d39..516c3d9ceea1 100644 --- a/kernel/sched/idle_task.c +++ b/kernel/sched/idle_task.c @@ -9,7 +9,7 @@ #ifdef CONFIG_SMP static int -select_task_rq_idle(struct task_struct *p, int sd_flag, int flags) +select_task_rq_idle(struct task_struct *p, int cpu, int sd_flag, int flags) { return task_cpu(p); /* IDLE tasks as never migrated */ } diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 01970c8e64df..a848f526b941 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -246,8 +246,10 @@ static inline void rt_set_overload(struct rq *rq) * if we should look at the mask. It would be a shame * if we looked at the mask, but the mask was not * updated yet. + * + * Matched by the barrier in pull_rt_task(). */ - wmb(); + smp_wmb(); atomic_inc(&rq->rd->rto_count); } @@ -1169,13 +1171,10 @@ static void yield_task_rt(struct rq *rq) static int find_lowest_rq(struct task_struct *task); static int -select_task_rq_rt(struct task_struct *p, int sd_flag, int flags) +select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags) { struct task_struct *curr; struct rq *rq; - int cpu; - - cpu = task_cpu(p); if (p->nr_cpus_allowed == 1) goto out; @@ -1213,8 +1212,7 @@ select_task_rq_rt(struct task_struct *p, int sd_flag, int flags) */ if (curr && unlikely(rt_task(curr)) && (curr->nr_cpus_allowed < 2 || - curr->prio <= p->prio) && - (p->nr_cpus_allowed > 1)) { + curr->prio <= p->prio)) { int target = find_lowest_rq(p); if (target != -1) @@ -1630,6 +1628,12 @@ static int pull_rt_task(struct rq *this_rq) if (likely(!rt_overloaded(this_rq))) return 0; + /* + * Match the barrier from rt_set_overloaded; this guarantees that if we + * see overloaded we must also see the rto_mask bit. + */ + smp_rmb(); + for_each_cpu(cpu, this_rq->rd->rto_mask) { if (this_cpu == cpu) continue; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index b3c5653e1dca..ffc708717b70 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -6,6 +6,7 @@ #include #include #include +#include #include "cpupri.h" #include "cpuacct.h" @@ -408,6 +409,10 @@ struct rq { * remote CPUs use both these fields when doing load calculation. */ unsigned int nr_running; +#ifdef CONFIG_NUMA_BALANCING + unsigned int nr_numa_running; + unsigned int nr_preferred_running; +#endif #define CPU_LOAD_IDX_MAX 5 unsigned long cpu_load[CPU_LOAD_IDX_MAX]; unsigned long last_load_update_tick; @@ -476,6 +481,9 @@ struct rq { u64 age_stamp; u64 idle_stamp; u64 avg_idle; + + /* This is used to determine avg_idle's max value */ + u64 max_idle_balance_cost; #endif #ifdef CONFIG_IRQ_TIME_ACCOUNTING @@ -552,6 +560,12 @@ static inline u64 rq_clock_task(struct rq *rq) return rq->clock_task; } +#ifdef CONFIG_NUMA_BALANCING +extern void sched_setnuma(struct task_struct *p, int node); +extern int migrate_task_to(struct task_struct *p, int cpu); +extern int migrate_swap(struct task_struct *, struct task_struct *); +#endif /* CONFIG_NUMA_BALANCING */ + #ifdef CONFIG_SMP #define rcu_dereference_check_sched_domain(p) \ @@ -593,9 +607,22 @@ static inline struct sched_domain *highest_flag_domain(int cpu, int flag) return hsd; } +static inline struct sched_domain *lowest_flag_domain(int cpu, int flag) +{ + struct sched_domain *sd; + + for_each_domain(cpu, sd) { + if (sd->flags & flag) + break; + } + + return sd; +} + DECLARE_PER_CPU(struct sched_domain *, sd_llc); DECLARE_PER_CPU(int, sd_llc_size); DECLARE_PER_CPU(int, sd_llc_id); +DECLARE_PER_CPU(struct sched_domain *, sd_numa); struct sched_group_power { atomic_t ref; @@ -605,6 +632,7 @@ struct sched_group_power { */ unsigned int power, power_orig; unsigned long next_update; + int imbalance; /* XXX unrelated to power but shared group state */ /* * Number of busy cpus in this group. */ @@ -719,6 +747,7 @@ static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu) */ smp_wmb(); task_thread_info(p)->cpu = cpu; + p->wake_cpu = cpu; #endif } @@ -974,7 +1003,7 @@ struct sched_class { void (*put_prev_task) (struct rq *rq, struct task_struct *p); #ifdef CONFIG_SMP - int (*select_task_rq)(struct task_struct *p, int sd_flag, int flags); + int (*select_task_rq)(struct task_struct *p, int task_cpu, int sd_flag, int flags); void (*migrate_task_rq)(struct task_struct *p, int next_cpu); void (*pre_schedule) (struct rq *this_rq, struct task_struct *task); @@ -1220,6 +1249,24 @@ static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest) lock_set_subclass(&this_rq->lock.dep_map, 0, _RET_IP_); } +static inline void double_lock(spinlock_t *l1, spinlock_t *l2) +{ + if (l1 > l2) + swap(l1, l2); + + spin_lock(l1); + spin_lock_nested(l2, SINGLE_DEPTH_NESTING); +} + +static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2) +{ + if (l1 > l2) + swap(l1, l2); + + raw_spin_lock(l1); + raw_spin_lock_nested(l2, SINGLE_DEPTH_NESTING); +} + /* * double_rq_lock - safely lock two runqueues * diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h index c7edee71bce8..4ab704339656 100644 --- a/kernel/sched/stats.h +++ b/kernel/sched/stats.h @@ -59,9 +59,9 @@ static inline void sched_info_reset_dequeued(struct task_struct *t) * from dequeue_task() to account for possible rq->clock skew across cpus. The * delta taken on each cpu would annul the skew. */ -static inline void sched_info_dequeued(struct task_struct *t) +static inline void sched_info_dequeued(struct rq *rq, struct task_struct *t) { - unsigned long long now = rq_clock(task_rq(t)), delta = 0; + unsigned long long now = rq_clock(rq), delta = 0; if (unlikely(sched_info_on())) if (t->sched_info.last_queued) @@ -69,7 +69,7 @@ static inline void sched_info_dequeued(struct task_struct *t) sched_info_reset_dequeued(t); t->sched_info.run_delay += delta; - rq_sched_info_dequeued(task_rq(t), delta); + rq_sched_info_dequeued(rq, delta); } /* @@ -77,9 +77,9 @@ static inline void sched_info_dequeued(struct task_struct *t) * long it was waiting to run. We also note when it began so that we * can keep stats on how long its timeslice is. */ -static void sched_info_arrive(struct task_struct *t) +static void sched_info_arrive(struct rq *rq, struct task_struct *t) { - unsigned long long now = rq_clock(task_rq(t)), delta = 0; + unsigned long long now = rq_clock(rq), delta = 0; if (t->sched_info.last_queued) delta = now - t->sched_info.last_queued; @@ -88,7 +88,7 @@ static void sched_info_arrive(struct task_struct *t) t->sched_info.last_arrival = now; t->sched_info.pcount++; - rq_sched_info_arrive(task_rq(t), delta); + rq_sched_info_arrive(rq, delta); } /* @@ -96,11 +96,11 @@ static void sched_info_arrive(struct task_struct *t) * the timestamp if it is already not set. It's assumed that * sched_info_dequeued() will clear that stamp when appropriate. */ -static inline void sched_info_queued(struct task_struct *t) +static inline void sched_info_queued(struct rq *rq, struct task_struct *t) { if (unlikely(sched_info_on())) if (!t->sched_info.last_queued) - t->sched_info.last_queued = rq_clock(task_rq(t)); + t->sched_info.last_queued = rq_clock(rq); } /* @@ -111,15 +111,15 @@ static inline void sched_info_queued(struct task_struct *t) * sched_info_queued() to mark that it has now again started waiting on * the runqueue. */ -static inline void sched_info_depart(struct task_struct *t) +static inline void sched_info_depart(struct rq *rq, struct task_struct *t) { - unsigned long long delta = rq_clock(task_rq(t)) - + unsigned long long delta = rq_clock(rq) - t->sched_info.last_arrival; - rq_sched_info_depart(task_rq(t), delta); + rq_sched_info_depart(rq, delta); if (t->state == TASK_RUNNING) - sched_info_queued(t); + sched_info_queued(rq, t); } /* @@ -128,32 +128,34 @@ static inline void sched_info_depart(struct task_struct *t) * the idle task.) We are only called when prev != next. */ static inline void -__sched_info_switch(struct task_struct *prev, struct task_struct *next) +__sched_info_switch(struct rq *rq, + struct task_struct *prev, struct task_struct *next) { - struct rq *rq = task_rq(prev); - /* * prev now departs the cpu. It's not interesting to record * stats about how efficient we were at scheduling the idle * process, however. */ if (prev != rq->idle) - sched_info_depart(prev); + sched_info_depart(rq, prev); if (next != rq->idle) - sched_info_arrive(next); + sched_info_arrive(rq, next); } static inline void -sched_info_switch(struct task_struct *prev, struct task_struct *next) +sched_info_switch(struct rq *rq, + struct task_struct *prev, struct task_struct *next) { if (unlikely(sched_info_on())) - __sched_info_switch(prev, next); + __sched_info_switch(rq, prev, next); } #else -#define sched_info_queued(t) do { } while (0) +#define sched_info_queued(rq, t) do { } while (0) #define sched_info_reset_dequeued(t) do { } while (0) -#define sched_info_dequeued(t) do { } while (0) -#define sched_info_switch(t, next) do { } while (0) +#define sched_info_dequeued(rq, t) do { } while (0) +#define sched_info_depart(rq, t) do { } while (0) +#define sched_info_arrive(rq, next) do { } while (0) +#define sched_info_switch(rq, t, next) do { } while (0) #endif /* CONFIG_SCHEDSTATS || CONFIG_TASK_DELAY_ACCT */ /* diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c index e08fbeeb54b9..47197de8abd9 100644 --- a/kernel/sched/stop_task.c +++ b/kernel/sched/stop_task.c @@ -11,7 +11,7 @@ #ifdef CONFIG_SMP static int -select_task_rq_stop(struct task_struct *p, int sd_flag, int flags) +select_task_rq_stop(struct task_struct *p, int cpu, int sd_flag, int flags) { return task_cpu(p); /* stop tasks as never migrate */ } diff --git a/kernel/smp.c b/kernel/smp.c index 0564571dcdf7..f5768b0c816a 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -524,6 +524,11 @@ void __init setup_nr_cpu_ids(void) nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1; } +void __weak smp_announce(void) +{ + printk(KERN_INFO "Brought up %d CPUs\n", num_online_cpus()); +} + /* Called by boot processor to activate the rest. */ void __init smp_init(void) { @@ -540,7 +545,7 @@ void __init smp_init(void) } /* Any cleanup work */ - printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus()); + smp_announce(); smp_cpus_done(setup_max_cpus); } diff --git a/kernel/softirq.c b/kernel/softirq.c index 53cc09ceb0b8..b24988353458 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -29,7 +29,6 @@ #define CREATE_TRACE_POINTS #include -#include /* - No shared variables, all the data are CPU local. - If a softirq needs serialization, let it serialize itself @@ -100,13 +99,13 @@ static void __local_bh_disable(unsigned long ip, unsigned int cnt) raw_local_irq_save(flags); /* - * The preempt tracer hooks into add_preempt_count and will break + * The preempt tracer hooks into preempt_count_add and will break * lockdep because it calls back into lockdep after SOFTIRQ_OFFSET * is set and before current->softirq_enabled is cleared. * We must manually increment preempt_count here and manually * call the trace_preempt_off later. */ - preempt_count() += cnt; + __preempt_count_add(cnt); /* * Were softirqs turned off above: */ @@ -120,7 +119,7 @@ static void __local_bh_disable(unsigned long ip, unsigned int cnt) #else /* !CONFIG_TRACE_IRQFLAGS */ static inline void __local_bh_disable(unsigned long ip, unsigned int cnt) { - add_preempt_count(cnt); + preempt_count_add(cnt); barrier(); } #endif /* CONFIG_TRACE_IRQFLAGS */ @@ -134,12 +133,11 @@ EXPORT_SYMBOL(local_bh_disable); static void __local_bh_enable(unsigned int cnt) { - WARN_ON_ONCE(in_irq()); WARN_ON_ONCE(!irqs_disabled()); if (softirq_count() == cnt) trace_softirqs_on(_RET_IP_); - sub_preempt_count(cnt); + preempt_count_sub(cnt); } /* @@ -149,6 +147,7 @@ static void __local_bh_enable(unsigned int cnt) */ void _local_bh_enable(void) { + WARN_ON_ONCE(in_irq()); __local_bh_enable(SOFTIRQ_DISABLE_OFFSET); } @@ -169,12 +168,17 @@ static inline void _local_bh_enable_ip(unsigned long ip) * Keep preemption disabled until we are done with * softirq processing: */ - sub_preempt_count(SOFTIRQ_DISABLE_OFFSET - 1); + preempt_count_sub(SOFTIRQ_DISABLE_OFFSET - 1); - if (unlikely(!in_interrupt() && local_softirq_pending())) + if (unlikely(!in_interrupt() && local_softirq_pending())) { + /* + * Run softirq if any pending. And do it in its own stack + * as we may be calling this deep in a task call stack already. + */ do_softirq(); + } - dec_preempt_count(); + preempt_count_dec(); #ifdef CONFIG_TRACE_IRQFLAGS local_irq_enable(); #endif @@ -256,7 +260,7 @@ restart: " exited with %08x?\n", vec_nr, softirq_to_name[vec_nr], h->action, prev_count, preempt_count()); - preempt_count() = prev_count; + preempt_count_set(prev_count); } rcu_bh_qs(cpu); @@ -280,10 +284,11 @@ restart: account_irq_exit_time(current); __local_bh_enable(SOFTIRQ_OFFSET); + WARN_ON_ONCE(in_interrupt()); tsk_restore_flags(current, old_flags, PF_MEMALLOC); } -#ifndef __ARCH_HAS_DO_SOFTIRQ + asmlinkage void do_softirq(void) { @@ -298,13 +303,11 @@ asmlinkage void do_softirq(void) pending = local_softirq_pending(); if (pending) - __do_softirq(); + do_softirq_own_stack(); local_irq_restore(flags); } -#endif - /* * Enter an interrupt context. */ @@ -328,10 +331,25 @@ void irq_enter(void) static inline void invoke_softirq(void) { - if (!force_irqthreads) + if (!force_irqthreads) { +#ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK + /* + * We can safely execute softirq on the current stack if + * it is the irq stack, because it should be near empty + * at this stage. + */ __do_softirq(); - else +#else + /* + * Otherwise, irq_exit() is called on the task stack that can + * be potentially deep already. So call softirq in its own stack + * to prevent from any overrun. + */ + do_softirq_own_stack(); +#endif + } else { wakeup_softirqd(); + } } static inline void tick_irq_exit(void) @@ -360,7 +378,7 @@ void irq_exit(void) account_irq_exit_time(current); trace_hardirq_exit(); - sub_preempt_count(HARDIRQ_OFFSET); + preempt_count_sub(HARDIRQ_OFFSET); if (!in_interrupt() && local_softirq_pending()) invoke_softirq(); @@ -762,6 +780,10 @@ static void run_ksoftirqd(unsigned int cpu) { local_irq_disable(); if (local_softirq_pending()) { + /* + * We can safely run softirq on inline stack, as we are not deep + * in the task stack here. + */ __do_softirq(); rcu_note_context_switch(cpu); local_irq_enable(); diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index c09f2955ae30..c530bc5be7cf 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -115,6 +115,182 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg) return done.executed ? done.ret : -ENOENT; } +/* This controls the threads on each CPU. */ +enum multi_stop_state { + /* Dummy starting state for thread. */ + MULTI_STOP_NONE, + /* Awaiting everyone to be scheduled. */ + MULTI_STOP_PREPARE, + /* Disable interrupts. */ + MULTI_STOP_DISABLE_IRQ, + /* Run the function */ + MULTI_STOP_RUN, + /* Exit */ + MULTI_STOP_EXIT, +}; + +struct multi_stop_data { + int (*fn)(void *); + void *data; + /* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */ + unsigned int num_threads; + const struct cpumask *active_cpus; + + enum multi_stop_state state; + atomic_t thread_ack; +}; + +static void set_state(struct multi_stop_data *msdata, + enum multi_stop_state newstate) +{ + /* Reset ack counter. */ + atomic_set(&msdata->thread_ack, msdata->num_threads); + smp_wmb(); + msdata->state = newstate; +} + +/* Last one to ack a state moves to the next state. */ +static void ack_state(struct multi_stop_data *msdata) +{ + if (atomic_dec_and_test(&msdata->thread_ack)) + set_state(msdata, msdata->state + 1); +} + +/* This is the cpu_stop function which stops the CPU. */ +static int multi_cpu_stop(void *data) +{ + struct multi_stop_data *msdata = data; + enum multi_stop_state curstate = MULTI_STOP_NONE; + int cpu = smp_processor_id(), err = 0; + unsigned long flags; + bool is_active; + + /* + * When called from stop_machine_from_inactive_cpu(), irq might + * already be disabled. Save the state and restore it on exit. + */ + local_save_flags(flags); + + if (!msdata->active_cpus) + is_active = cpu == cpumask_first(cpu_online_mask); + else + is_active = cpumask_test_cpu(cpu, msdata->active_cpus); + + /* Simple state machine */ + do { + /* Chill out and ensure we re-read multi_stop_state. */ + cpu_relax(); + if (msdata->state != curstate) { + curstate = msdata->state; + switch (curstate) { + case MULTI_STOP_DISABLE_IRQ: + local_irq_disable(); + hard_irq_disable(); + break; + case MULTI_STOP_RUN: + if (is_active) + err = msdata->fn(msdata->data); + break; + default: + break; + } + ack_state(msdata); + } + } while (curstate != MULTI_STOP_EXIT); + + local_irq_restore(flags); + return err; +} + +struct irq_cpu_stop_queue_work_info { + int cpu1; + int cpu2; + struct cpu_stop_work *work1; + struct cpu_stop_work *work2; +}; + +/* + * This function is always run with irqs and preemption disabled. + * This guarantees that both work1 and work2 get queued, before + * our local migrate thread gets the chance to preempt us. + */ +static void irq_cpu_stop_queue_work(void *arg) +{ + struct irq_cpu_stop_queue_work_info *info = arg; + cpu_stop_queue_work(info->cpu1, info->work1); + cpu_stop_queue_work(info->cpu2, info->work2); +} + +/** + * stop_two_cpus - stops two cpus + * @cpu1: the cpu to stop + * @cpu2: the other cpu to stop + * @fn: function to execute + * @arg: argument to @fn + * + * Stops both the current and specified CPU and runs @fn on one of them. + * + * returns when both are completed. + */ +int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg) +{ + struct cpu_stop_done done; + struct cpu_stop_work work1, work2; + struct irq_cpu_stop_queue_work_info call_args; + struct multi_stop_data msdata; + + preempt_disable(); + msdata = (struct multi_stop_data){ + .fn = fn, + .data = arg, + .num_threads = 2, + .active_cpus = cpumask_of(cpu1), + }; + + work1 = work2 = (struct cpu_stop_work){ + .fn = multi_cpu_stop, + .arg = &msdata, + .done = &done + }; + + call_args = (struct irq_cpu_stop_queue_work_info){ + .cpu1 = cpu1, + .cpu2 = cpu2, + .work1 = &work1, + .work2 = &work2, + }; + + cpu_stop_init_done(&done, 2); + set_state(&msdata, MULTI_STOP_PREPARE); + + /* + * If we observe both CPUs active we know _cpu_down() cannot yet have + * queued its stop_machine works and therefore ours will get executed + * first. Or its not either one of our CPUs that's getting unplugged, + * in which case we don't care. + * + * This relies on the stopper workqueues to be FIFO. + */ + if (!cpu_active(cpu1) || !cpu_active(cpu2)) { + preempt_enable(); + return -ENOENT; + } + + /* + * Queuing needs to be done by the lowest numbered CPU, to ensure + * that works are always queued in the same order on every CPU. + * This prevents deadlocks. + */ + smp_call_function_single(min(cpu1, cpu2), + &irq_cpu_stop_queue_work, + &call_args, 0); + preempt_enable(); + + wait_for_completion(&done.completion); + + return done.executed ? done.ret : -ENOENT; +} + /** * stop_one_cpu_nowait - stop a cpu but don't wait for completion * @cpu: cpu to stop @@ -359,98 +535,14 @@ early_initcall(cpu_stop_init); #ifdef CONFIG_STOP_MACHINE -/* This controls the threads on each CPU. */ -enum stopmachine_state { - /* Dummy starting state for thread. */ - STOPMACHINE_NONE, - /* Awaiting everyone to be scheduled. */ - STOPMACHINE_PREPARE, - /* Disable interrupts. */ - STOPMACHINE_DISABLE_IRQ, - /* Run the function */ - STOPMACHINE_RUN, - /* Exit */ - STOPMACHINE_EXIT, -}; - -struct stop_machine_data { - int (*fn)(void *); - void *data; - /* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */ - unsigned int num_threads; - const struct cpumask *active_cpus; - - enum stopmachine_state state; - atomic_t thread_ack; -}; - -static void set_state(struct stop_machine_data *smdata, - enum stopmachine_state newstate) -{ - /* Reset ack counter. */ - atomic_set(&smdata->thread_ack, smdata->num_threads); - smp_wmb(); - smdata->state = newstate; -} - -/* Last one to ack a state moves to the next state. */ -static void ack_state(struct stop_machine_data *smdata) -{ - if (atomic_dec_and_test(&smdata->thread_ack)) - set_state(smdata, smdata->state + 1); -} - -/* This is the cpu_stop function which stops the CPU. */ -static int stop_machine_cpu_stop(void *data) -{ - struct stop_machine_data *smdata = data; - enum stopmachine_state curstate = STOPMACHINE_NONE; - int cpu = smp_processor_id(), err = 0; - unsigned long flags; - bool is_active; - - /* - * When called from stop_machine_from_inactive_cpu(), irq might - * already be disabled. Save the state and restore it on exit. - */ - local_save_flags(flags); - - if (!smdata->active_cpus) - is_active = cpu == cpumask_first(cpu_online_mask); - else - is_active = cpumask_test_cpu(cpu, smdata->active_cpus); - - /* Simple state machine */ - do { - /* Chill out and ensure we re-read stopmachine_state. */ - cpu_relax(); - if (smdata->state != curstate) { - curstate = smdata->state; - switch (curstate) { - case STOPMACHINE_DISABLE_IRQ: - local_irq_disable(); - hard_irq_disable(); - break; - case STOPMACHINE_RUN: - if (is_active) - err = smdata->fn(smdata->data); - break; - default: - break; - } - ack_state(smdata); - } - } while (curstate != STOPMACHINE_EXIT); - - local_irq_restore(flags); - return err; -} - int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus) { - struct stop_machine_data smdata = { .fn = fn, .data = data, - .num_threads = num_online_cpus(), - .active_cpus = cpus }; + struct multi_stop_data msdata = { + .fn = fn, + .data = data, + .num_threads = num_online_cpus(), + .active_cpus = cpus, + }; if (!stop_machine_initialized) { /* @@ -461,7 +553,7 @@ int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus) unsigned long flags; int ret; - WARN_ON_ONCE(smdata.num_threads != 1); + WARN_ON_ONCE(msdata.num_threads != 1); local_irq_save(flags); hard_irq_disable(); @@ -472,8 +564,8 @@ int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus) } /* Set the initial state and stop all online cpus. */ - set_state(&smdata, STOPMACHINE_PREPARE); - return stop_cpus(cpu_online_mask, stop_machine_cpu_stop, &smdata); + set_state(&msdata, MULTI_STOP_PREPARE); + return stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata); } int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus) @@ -513,25 +605,25 @@ EXPORT_SYMBOL_GPL(stop_machine); int stop_machine_from_inactive_cpu(int (*fn)(void *), void *data, const struct cpumask *cpus) { - struct stop_machine_data smdata = { .fn = fn, .data = data, + struct multi_stop_data msdata = { .fn = fn, .data = data, .active_cpus = cpus }; struct cpu_stop_done done; int ret; /* Local CPU must be inactive and CPU hotplug in progress. */ BUG_ON(cpu_active(raw_smp_processor_id())); - smdata.num_threads = num_active_cpus() + 1; /* +1 for local */ + msdata.num_threads = num_active_cpus() + 1; /* +1 for local */ /* No proper task established and can't sleep - busy wait for lock. */ while (!mutex_trylock(&stop_cpus_mutex)) cpu_relax(); /* Schedule work on other CPUs and execute directly for local CPU */ - set_state(&smdata, STOPMACHINE_PREPARE); + set_state(&msdata, MULTI_STOP_PREPARE); cpu_stop_init_done(&done, num_active_cpus()); - queue_stop_cpus_work(cpu_active_mask, stop_machine_cpu_stop, &smdata, + queue_stop_cpus_work(cpu_active_mask, multi_cpu_stop, &msdata, &done); - ret = stop_machine_cpu_stop(&smdata); + ret = multi_cpu_stop(&msdata); /* Busy wait for completion. */ while (!completion_done(&done.completion)) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index b2f06f3c6a3f..5c704dbf6d45 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -370,13 +370,6 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, - { - .procname = "numa_balancing_scan_period_reset", - .data = &sysctl_numa_balancing_scan_period_reset, - .maxlen = sizeof(unsigned int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, { .procname = "numa_balancing_scan_period_max_ms", .data = &sysctl_numa_balancing_scan_period_max, @@ -391,6 +384,20 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, + { + .procname = "numa_balancing_settle_count", + .data = &sysctl_numa_balancing_settle_count, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .procname = "numa_balancing_migrate_deferred", + .data = &sysctl_numa_balancing_migrate_deferred, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, #endif /* CONFIG_NUMA_BALANCING */ #endif /* CONFIG_SCHED_DEBUG */ { @@ -962,9 +969,10 @@ static struct ctl_table kern_table[] = { { .procname = "hung_task_check_count", .data = &sysctl_hung_task_check_count, - .maxlen = sizeof(unsigned long), + .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_doulongvec_minmax, + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, }, { .procname = "hung_task_timeout_secs", @@ -1049,6 +1057,7 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(sysctl_perf_event_sample_rate), .mode = 0644, .proc_handler = perf_proc_update_handler, + .extra1 = &one, }, { .procname = "perf_cpu_time_max_percent", diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig index 2b62fe86f9ec..3ce6e8c5f3fc 100644 --- a/kernel/time/Kconfig +++ b/kernel/time/Kconfig @@ -100,7 +100,7 @@ config NO_HZ_FULL # RCU_USER_QS dependency depends on HAVE_CONTEXT_TRACKING # VIRT_CPU_ACCOUNTING_GEN dependency - depends on 64BIT + depends on HAVE_VIRT_CPU_ACCOUNTING_GEN select NO_HZ_COMMON select RCU_USER_QS select RCU_NOCB_CPU diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 50a8736757f3..c9317e14aae6 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -479,6 +479,7 @@ static inline void clocksource_dequeue_watchdog(struct clocksource *cs) { } static inline void clocksource_resume_watchdog(void) { } static inline int __clocksource_watchdog_kthread(void) { return 0; } static bool clocksource_is_watchdog(struct clocksource *cs) { return false; } +void clocksource_mark_unstable(struct clocksource *cs) { } #endif /* CONFIG_CLOCKSOURCE_WATCHDOG */ @@ -537,40 +538,55 @@ static u32 clocksource_max_adjustment(struct clocksource *cs) } /** - * clocksource_max_deferment - Returns max time the clocksource can be deferred - * @cs: Pointer to clocksource - * + * clocks_calc_max_nsecs - Returns maximum nanoseconds that can be converted + * @mult: cycle to nanosecond multiplier + * @shift: cycle to nanosecond divisor (power of two) + * @maxadj: maximum adjustment value to mult (~11%) + * @mask: bitmask for two's complement subtraction of non 64 bit counters */ -static u64 clocksource_max_deferment(struct clocksource *cs) +u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask) { u64 max_nsecs, max_cycles; /* * Calculate the maximum number of cycles that we can pass to the * cyc2ns function without overflowing a 64-bit signed result. The - * maximum number of cycles is equal to ULLONG_MAX/(cs->mult+cs->maxadj) + * maximum number of cycles is equal to ULLONG_MAX/(mult+maxadj) * which is equivalent to the below. - * max_cycles < (2^63)/(cs->mult + cs->maxadj) - * max_cycles < 2^(log2((2^63)/(cs->mult + cs->maxadj))) - * max_cycles < 2^(log2(2^63) - log2(cs->mult + cs->maxadj)) - * max_cycles < 2^(63 - log2(cs->mult + cs->maxadj)) - * max_cycles < 1 << (63 - log2(cs->mult + cs->maxadj)) + * max_cycles < (2^63)/(mult + maxadj) + * max_cycles < 2^(log2((2^63)/(mult + maxadj))) + * max_cycles < 2^(log2(2^63) - log2(mult + maxadj)) + * max_cycles < 2^(63 - log2(mult + maxadj)) + * max_cycles < 1 << (63 - log2(mult + maxadj)) * Please note that we add 1 to the result of the log2 to account for * any rounding errors, ensure the above inequality is satisfied and * no overflow will occur. */ - max_cycles = 1ULL << (63 - (ilog2(cs->mult + cs->maxadj) + 1)); + max_cycles = 1ULL << (63 - (ilog2(mult + maxadj) + 1)); /* * The actual maximum number of cycles we can defer the clocksource is - * determined by the minimum of max_cycles and cs->mask. + * determined by the minimum of max_cycles and mask. * Note: Here we subtract the maxadj to make sure we don't sleep for * too long if there's a large negative adjustment. */ - max_cycles = min_t(u64, max_cycles, (u64) cs->mask); - max_nsecs = clocksource_cyc2ns(max_cycles, cs->mult - cs->maxadj, - cs->shift); + max_cycles = min(max_cycles, mask); + max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift); + + return max_nsecs; +} + +/** + * clocksource_max_deferment - Returns max time the clocksource can be deferred + * @cs: Pointer to clocksource + * + */ +static u64 clocksource_max_deferment(struct clocksource *cs) +{ + u64 max_nsecs; + max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj, + cs->mask); /* * To ensure that the clocksource does not wrap whilst we are idle, * limit the time the clocksource can be deferred by 12.5%. Please @@ -924,7 +940,7 @@ static ssize_t sysfs_override_clocksource(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { - size_t ret; + ssize_t ret; mutex_lock(&clocksource_mutex); @@ -952,7 +968,7 @@ static ssize_t sysfs_unbind_clocksource(struct device *dev, { struct clocksource *cs; char name[CS_NAME_LEN]; - size_t ret; + ssize_t ret; ret = sysfs_get_uname(buf, name, count); if (ret < 0) diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index bb2215174f05..af8d1d4f3d55 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -475,6 +475,7 @@ static void sync_cmos_clock(struct work_struct *work) * called as close as possible to 500 ms before the new second starts. * This code is run on a timer. If the clock is set, that timer * may not expire at the correct time. Thus, we adjust... + * We want the clock to be within a couple of ticks from the target. */ if (!ntp_synced()) { /* @@ -485,7 +486,7 @@ static void sync_cmos_clock(struct work_struct *work) } getnstimeofday(&now); - if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec / 2) { + if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) { struct timespec adjust = now; fail = -ENODEV; diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c index 0b479a6a22bb..68b799375981 100644 --- a/kernel/time/sched_clock.c +++ b/kernel/time/sched_clock.c @@ -8,25 +8,28 @@ #include #include #include +#include #include #include #include #include -#include +#include #include +#include +#include struct clock_data { + ktime_t wrap_kt; u64 epoch_ns; - u32 epoch_cyc; - u32 epoch_cyc_copy; + u64 epoch_cyc; + seqcount_t seq; unsigned long rate; u32 mult; u32 shift; bool suspended; }; -static void sched_clock_poll(unsigned long wrap_ticks); -static DEFINE_TIMER(sched_clock_timer, sched_clock_poll, 0, 0); +static struct hrtimer sched_clock_timer; static int irqtime = -1; core_param(irqtime, irqtime, int, 0400); @@ -35,42 +38,46 @@ static struct clock_data cd = { .mult = NSEC_PER_SEC / HZ, }; -static u32 __read_mostly sched_clock_mask = 0xffffffff; +static u64 __read_mostly sched_clock_mask; -static u32 notrace jiffy_sched_clock_read(void) +static u64 notrace jiffy_sched_clock_read(void) { - return (u32)(jiffies - INITIAL_JIFFIES); + /* + * We don't need to use get_jiffies_64 on 32-bit arches here + * because we register with BITS_PER_LONG + */ + return (u64)(jiffies - INITIAL_JIFFIES); } -static u32 __read_mostly (*read_sched_clock)(void) = jiffy_sched_clock_read; +static u32 __read_mostly (*read_sched_clock_32)(void); + +static u64 notrace read_sched_clock_32_wrapper(void) +{ + return read_sched_clock_32(); +} + +static u64 __read_mostly (*read_sched_clock)(void) = jiffy_sched_clock_read; static inline u64 notrace cyc_to_ns(u64 cyc, u32 mult, u32 shift) { return (cyc * mult) >> shift; } -static unsigned long long notrace sched_clock_32(void) +unsigned long long notrace sched_clock(void) { u64 epoch_ns; - u32 epoch_cyc; - u32 cyc; + u64 epoch_cyc; + u64 cyc; + unsigned long seq; if (cd.suspended) return cd.epoch_ns; - /* - * Load the epoch_cyc and epoch_ns atomically. We do this by - * ensuring that we always write epoch_cyc, epoch_ns and - * epoch_cyc_copy in strict order, and read them in strict order. - * If epoch_cyc and epoch_cyc_copy are not equal, then we're in - * the middle of an update, and we should repeat the load. - */ do { + seq = read_seqcount_begin(&cd.seq); epoch_cyc = cd.epoch_cyc; - smp_rmb(); epoch_ns = cd.epoch_ns; - smp_rmb(); - } while (epoch_cyc != cd.epoch_cyc_copy); + } while (read_seqcount_retry(&cd.seq, seq)); cyc = read_sched_clock(); cyc = (cyc - epoch_cyc) & sched_clock_mask; @@ -83,49 +90,46 @@ static unsigned long long notrace sched_clock_32(void) static void notrace update_sched_clock(void) { unsigned long flags; - u32 cyc; + u64 cyc; u64 ns; cyc = read_sched_clock(); ns = cd.epoch_ns + cyc_to_ns((cyc - cd.epoch_cyc) & sched_clock_mask, cd.mult, cd.shift); - /* - * Write epoch_cyc and epoch_ns in a way that the update is - * detectable in cyc_to_fixed_sched_clock(). - */ + raw_local_irq_save(flags); - cd.epoch_cyc_copy = cyc; - smp_wmb(); + write_seqcount_begin(&cd.seq); cd.epoch_ns = ns; - smp_wmb(); cd.epoch_cyc = cyc; + write_seqcount_end(&cd.seq); raw_local_irq_restore(flags); } -static void sched_clock_poll(unsigned long wrap_ticks) +static enum hrtimer_restart sched_clock_poll(struct hrtimer *hrt) { - mod_timer(&sched_clock_timer, round_jiffies(jiffies + wrap_ticks)); update_sched_clock(); + hrtimer_forward_now(hrt, cd.wrap_kt); + return HRTIMER_RESTART; } -void __init setup_sched_clock(u32 (*read)(void), int bits, unsigned long rate) +void __init sched_clock_register(u64 (*read)(void), int bits, + unsigned long rate) { - unsigned long r, w; + unsigned long r; u64 res, wrap; char r_unit; if (cd.rate > rate) return; - BUG_ON(bits > 32); WARN_ON(!irqs_disabled()); read_sched_clock = read; - sched_clock_mask = (1ULL << bits) - 1; + sched_clock_mask = CLOCKSOURCE_MASK(bits); cd.rate = rate; /* calculate the mult/shift to convert counter ticks to ns. */ - clocks_calc_mult_shift(&cd.mult, &cd.shift, rate, NSEC_PER_SEC, 0); + clocks_calc_mult_shift(&cd.mult, &cd.shift, rate, NSEC_PER_SEC, 3600); r = rate; if (r >= 4000000) { @@ -138,20 +142,14 @@ void __init setup_sched_clock(u32 (*read)(void), int bits, unsigned long rate) r_unit = ' '; /* calculate how many ns until we wrap */ - wrap = cyc_to_ns((1ULL << bits) - 1, cd.mult, cd.shift); - do_div(wrap, NSEC_PER_MSEC); - w = wrap; + wrap = clocks_calc_max_nsecs(cd.mult, cd.shift, 0, sched_clock_mask); + cd.wrap_kt = ns_to_ktime(wrap - (wrap >> 3)); /* calculate the ns resolution of this counter */ res = cyc_to_ns(1ULL, cd.mult, cd.shift); - pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lums\n", - bits, r, r_unit, res, w); + pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lluns\n", + bits, r, r_unit, res, wrap); - /* - * Start the timer to keep sched_clock() properly updated and - * sets the initial epoch. - */ - sched_clock_timer.data = msecs_to_jiffies(w - (w / 10)); update_sched_clock(); /* @@ -166,11 +164,10 @@ void __init setup_sched_clock(u32 (*read)(void), int bits, unsigned long rate) pr_debug("Registered %pF as sched_clock source\n", read); } -unsigned long long __read_mostly (*sched_clock_func)(void) = sched_clock_32; - -unsigned long long notrace sched_clock(void) +void __init setup_sched_clock(u32 (*read)(void), int bits, unsigned long rate) { - return sched_clock_func(); + read_sched_clock_32 = read; + sched_clock_register(read_sched_clock_32_wrapper, bits, rate); } void __init sched_clock_postinit(void) @@ -180,14 +177,22 @@ void __init sched_clock_postinit(void) * make it the final one one. */ if (read_sched_clock == jiffy_sched_clock_read) - setup_sched_clock(jiffy_sched_clock_read, 32, HZ); + sched_clock_register(jiffy_sched_clock_read, BITS_PER_LONG, HZ); - sched_clock_poll(sched_clock_timer.data); + update_sched_clock(); + + /* + * Start the timer to keep sched_clock() properly updated and + * sets the initial epoch. + */ + hrtimer_init(&sched_clock_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + sched_clock_timer.function = sched_clock_poll; + hrtimer_start(&sched_clock_timer, cd.wrap_kt, HRTIMER_MODE_REL); } static int sched_clock_suspend(void) { - sched_clock_poll(sched_clock_timer.data); + sched_clock_poll(&sched_clock_timer); cd.suspended = true; return 0; } @@ -195,7 +200,6 @@ static int sched_clock_suspend(void) static void sched_clock_resume(void) { cd.epoch_cyc = read_sched_clock(); - cd.epoch_cyc_copy = cd.epoch_cyc; cd.suspended = false; } diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c index 218bcb565fed..9532690daaa9 100644 --- a/kernel/time/tick-broadcast.c +++ b/kernel/time/tick-broadcast.c @@ -70,6 +70,7 @@ static bool tick_check_broadcast_device(struct clock_event_device *curdev, struct clock_event_device *newdev) { if ((newdev->features & CLOCK_EVT_FEAT_DUMMY) || + (newdev->features & CLOCK_EVT_FEAT_PERCPU) || (newdev->features & CLOCK_EVT_FEAT_C3STOP)) return false; diff --git a/kernel/time/timer_stats.c b/kernel/time/timer_stats.c index 0b537f27b559..1fb08f21302e 100644 --- a/kernel/time/timer_stats.c +++ b/kernel/time/timer_stats.c @@ -298,15 +298,15 @@ static int tstats_show(struct seq_file *m, void *v) period = ktime_to_timespec(time); ms = period.tv_nsec / 1000000; - seq_puts(m, "Timer Stats Version: v0.2\n"); + seq_puts(m, "Timer Stats Version: v0.3\n"); seq_printf(m, "Sample period: %ld.%03ld s\n", period.tv_sec, ms); if (atomic_read(&overflow_count)) - seq_printf(m, "Overflow: %d entries\n", - atomic_read(&overflow_count)); + seq_printf(m, "Overflow: %d entries\n", atomic_read(&overflow_count)); + seq_printf(m, "Collection: %s\n", timer_stats_active ? "active" : "inactive"); for (i = 0; i < nr_entries; i++) { entry = entries + i; - if (entry->timer_flag & TIMER_STATS_FLAG_DEFERRABLE) { + if (entry->timer_flag & TIMER_STATS_FLAG_DEFERRABLE) { seq_printf(m, "%4luD, %5d %-16s ", entry->count, entry->pid, entry->comm); } else { diff --git a/kernel/timer.c b/kernel/timer.c index 4296d13db3d1..6582b82fa966 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -1092,7 +1092,7 @@ static int cascade(struct tvec_base *base, struct tvec *tv, int index) static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long), unsigned long data) { - int preempt_count = preempt_count(); + int count = preempt_count(); #ifdef CONFIG_LOCKDEP /* @@ -1119,16 +1119,16 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long), lock_map_release(&lockdep_map); - if (preempt_count != preempt_count()) { + if (count != preempt_count()) { WARN_ONCE(1, "timer: %pF preempt leak: %08x -> %08x\n", - fn, preempt_count, preempt_count()); + fn, count, preempt_count()); /* * Restore the preempt count. That gives us a decent * chance to survive and extract information. If the * callback kept a lock held, bad luck, but not worse * than the BUG() we had. */ - preempt_count() = preempt_count; + preempt_count_set(count); } } diff --git a/kernel/wait.c b/kernel/wait.c index d550920e040c..de21c6305a44 100644 --- a/kernel/wait.c +++ b/kernel/wait.c @@ -92,6 +92,30 @@ prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state) } EXPORT_SYMBOL(prepare_to_wait_exclusive); +long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state) +{ + unsigned long flags; + + if (signal_pending_state(state, current)) + return -ERESTARTSYS; + + wait->private = current; + wait->func = autoremove_wake_function; + + spin_lock_irqsave(&q->lock, flags); + if (list_empty(&wait->task_list)) { + if (wait->flags & WQ_FLAG_EXCLUSIVE) + __add_wait_queue_tail(q, wait); + else + __add_wait_queue(q, wait); + } + set_current_state(state); + spin_unlock_irqrestore(&q->lock, flags); + + return 0; +} +EXPORT_SYMBOL(prepare_to_wait_event); + /** * finish_wait - clean up after waiting in a queue * @q: waitqueue waited on diff --git a/lib/hexdump.c b/lib/hexdump.c index 3f0494c9d57a..8499c810909a 100644 --- a/lib/hexdump.c +++ b/lib/hexdump.c @@ -14,6 +14,8 @@ const char hex_asc[] = "0123456789abcdef"; EXPORT_SYMBOL(hex_asc); +const char hex_asc_upper[] = "0123456789ABCDEF"; +EXPORT_SYMBOL(hex_asc_upper); /** * hex_to_bin - convert a hex digit to its real value diff --git a/lib/kobject.c b/lib/kobject.c index 962175134702..084f7b18d0c0 100644 --- a/lib/kobject.c +++ b/lib/kobject.c @@ -592,7 +592,7 @@ static void kobject_release(struct kref *kref) { struct kobject *kobj = container_of(kref, struct kobject, kref); #ifdef CONFIG_DEBUG_KOBJECT_RELEASE - pr_debug("kobject: '%s' (%p): %s, parent %p (delayed)\n", + pr_info("kobject: '%s' (%p): %s, parent %p (delayed)\n", kobject_name(kobj), kobj, __func__, kobj->parent); INIT_DELAYED_WORK(&kobj->release, kobject_delayed_cleanup); schedule_delayed_work(&kobj->release, HZ); @@ -933,10 +933,7 @@ const struct kobj_ns_type_operations *kobj_ns_ops(struct kobject *kobj) bool kobj_ns_current_may_mount(enum kobj_ns_type type) { - bool may_mount = false; - - if (type == KOBJ_NS_TYPE_NONE) - return true; + bool may_mount = true; spin_lock(&kobj_ns_type_lock); if ((type > KOBJ_NS_TYPE_NONE) && (type < KOBJ_NS_TYPES) && diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c index 6dc09d8f4c24..872a15a2a637 100644 --- a/lib/locking-selftest.c +++ b/lib/locking-selftest.c @@ -1002,7 +1002,7 @@ static void dotest(void (*testcase_fn)(void), int expected, int lockclass_mask) * Some tests (e.g. double-unlock) might corrupt the preemption * count, so restore it: */ - preempt_count() = saved_preempt_count; + preempt_count_set(saved_preempt_count); #ifdef CONFIG_TRACE_IRQFLAGS if (softirq_count()) current->softirqs_enabled = 0; diff --git a/lib/lockref.c b/lib/lockref.c index 677d036cf3c7..6f9d434c1521 100644 --- a/lib/lockref.c +++ b/lib/lockref.c @@ -3,6 +3,22 @@ #ifdef CONFIG_CMPXCHG_LOCKREF +/* + * Allow weakly-ordered memory architectures to provide barrier-less + * cmpxchg semantics for lockref updates. + */ +#ifndef cmpxchg64_relaxed +# define cmpxchg64_relaxed cmpxchg64 +#endif + +/* + * Allow architectures to override the default cpu_relax() within CMPXCHG_LOOP. + * This is useful for architectures with an expensive cpu_relax(). + */ +#ifndef arch_mutex_cpu_relax +# define arch_mutex_cpu_relax() cpu_relax() +#endif + /* * Note that the "cmpxchg()" reloads the "old" value for the * failure case. @@ -14,12 +30,13 @@ while (likely(arch_spin_value_unlocked(old.lock.rlock.raw_lock))) { \ struct lockref new = old, prev = old; \ CODE \ - old.lock_count = cmpxchg64(&lockref->lock_count, \ - old.lock_count, new.lock_count); \ + old.lock_count = cmpxchg64_relaxed(&lockref->lock_count, \ + old.lock_count, \ + new.lock_count); \ if (likely(old.lock_count == prev.lock_count)) { \ SUCCESS; \ } \ - cpu_relax(); \ + arch_mutex_cpu_relax(); \ } \ } while (0) diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c index 7deeb6297a48..1a53d497a8c5 100644 --- a/lib/percpu-refcount.c +++ b/lib/percpu-refcount.c @@ -53,6 +53,7 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release) ref->release = release; return 0; } +EXPORT_SYMBOL_GPL(percpu_ref_init); /** * percpu_ref_cancel_init - cancel percpu_ref_init() @@ -84,6 +85,7 @@ void percpu_ref_cancel_init(struct percpu_ref *ref) free_percpu(ref->pcpu_count); } } +EXPORT_SYMBOL_GPL(percpu_ref_cancel_init); static void percpu_ref_kill_rcu(struct rcu_head *rcu) { @@ -156,3 +158,4 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref, call_rcu_sched(&ref->rcu, percpu_ref_kill_rcu); } +EXPORT_SYMBOL_GPL(percpu_ref_kill_and_confirm); diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c index 4c0d0e51d49e..04abe53f12a1 100644 --- a/lib/smp_processor_id.c +++ b/lib/smp_processor_id.c @@ -9,10 +9,9 @@ notrace unsigned int debug_smp_processor_id(void) { - unsigned long preempt_count = preempt_count(); int this_cpu = raw_smp_processor_id(); - if (likely(preempt_count)) + if (likely(preempt_count())) goto out; if (irqs_disabled()) diff --git a/mm/Kconfig b/mm/Kconfig index 026771a9b097..394838f489eb 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -183,7 +183,7 @@ config MEMORY_HOTPLUG_SPARSE config MEMORY_HOTREMOVE bool "Allow for memory hot remove" select MEMORY_ISOLATION - select HAVE_BOOTMEM_INFO_NODE if X86_64 + select HAVE_BOOTMEM_INFO_NODE if (X86_64 || PPC64) depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE depends on MIGRATION diff --git a/mm/bounce.c b/mm/bounce.c index c9f0a4339a7d..5a7d58fb883b 100644 --- a/mm/bounce.c +++ b/mm/bounce.c @@ -204,6 +204,8 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig, struct bio_vec *to, *from; unsigned i; + if (force) + goto bounce; bio_for_each_segment(from, *bio_orig, i) if (page_to_pfn(from->bv_page) > queue_bounce_pfn(q)) goto bounce; diff --git a/mm/compaction.c b/mm/compaction.c index c43789388cd8..b5326b141a25 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -677,6 +677,13 @@ static void isolate_freepages(struct zone *zone, pfn -= pageblock_nr_pages) { unsigned long isolated; + /* + * This can iterate a massively long zone without finding any + * suitable migration targets, so periodically check if we need + * to schedule. + */ + cond_resched(); + if (!pfn_valid(pfn)) continue; diff --git a/mm/filemap.c b/mm/filemap.c index 1e6aec4a2d2e..ae4846ff4849 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1616,7 +1616,6 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf) struct inode *inode = mapping->host; pgoff_t offset = vmf->pgoff; struct page *page; - bool memcg_oom; pgoff_t size; int ret = 0; @@ -1625,11 +1624,7 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf) return VM_FAULT_SIGBUS; /* - * Do we have something in the page cache already? Either - * way, try readahead, but disable the memcg OOM killer for it - * as readahead is optional and no errors are propagated up - * the fault stack. The OOM killer is enabled while trying to - * instantiate the faulting page individually below. + * Do we have something in the page cache already? */ page = find_get_page(mapping, offset); if (likely(page) && !(vmf->flags & FAULT_FLAG_TRIED)) { @@ -1637,14 +1632,10 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf) * We found the page, so try async readahead before * waiting for the lock. */ - memcg_oom = mem_cgroup_toggle_oom(false); do_async_mmap_readahead(vma, ra, file, page, offset); - mem_cgroup_toggle_oom(memcg_oom); } else if (!page) { /* No page in the page cache at all */ - memcg_oom = mem_cgroup_toggle_oom(false); do_sync_mmap_readahead(vma, ra, file, offset); - mem_cgroup_toggle_oom(memcg_oom); count_vm_event(PGMAJFAULT); mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT); ret = VM_FAULT_MAJOR; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 7489884682d8..2612f60f53ee 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1278,64 +1278,105 @@ out: int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, pmd_t pmd, pmd_t *pmdp) { + struct anon_vma *anon_vma = NULL; struct page *page; unsigned long haddr = addr & HPAGE_PMD_MASK; - int target_nid; - int current_nid = -1; - bool migrated; + int page_nid = -1, this_nid = numa_node_id(); + int target_nid, last_cpupid = -1; + bool page_locked; + bool migrated = false; + int flags = 0; spin_lock(&mm->page_table_lock); if (unlikely(!pmd_same(pmd, *pmdp))) goto out_unlock; page = pmd_page(pmd); - get_page(page); - current_nid = page_to_nid(page); + BUG_ON(is_huge_zero_page(page)); + page_nid = page_to_nid(page); + last_cpupid = page_cpupid_last(page); count_vm_numa_event(NUMA_HINT_FAULTS); - if (current_nid == numa_node_id()) + if (page_nid == this_nid) { count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL); + flags |= TNF_FAULT_LOCAL; + } + + /* + * Avoid grouping on DSO/COW pages in specific and RO pages + * in general, RO pages shouldn't hurt as much anyway since + * they can be in shared cache state. + */ + if (!pmd_write(pmd)) + flags |= TNF_NO_GROUP; + /* + * Acquire the page lock to serialise THP migrations but avoid dropping + * page_table_lock if at all possible + */ + page_locked = trylock_page(page); target_nid = mpol_misplaced(page, vma, haddr); if (target_nid == -1) { - put_page(page); - goto clear_pmdnuma; + /* If the page was locked, there are no parallel migrations */ + if (page_locked) + goto clear_pmdnuma; + + /* + * Otherwise wait for potential migrations and retry. We do + * relock and check_same as the page may no longer be mapped. + * As the fault is being retried, do not account for it. + */ + spin_unlock(&mm->page_table_lock); + wait_on_page_locked(page); + page_nid = -1; + goto out; } - /* Acquire the page lock to serialise THP migrations */ + /* Page is misplaced, serialise migrations and parallel THP splits */ + get_page(page); spin_unlock(&mm->page_table_lock); - lock_page(page); + if (!page_locked) + lock_page(page); + anon_vma = page_lock_anon_vma_read(page); - /* Confirm the PTE did not while locked */ + /* Confirm the PMD did not change while page_table_lock was released */ spin_lock(&mm->page_table_lock); if (unlikely(!pmd_same(pmd, *pmdp))) { unlock_page(page); put_page(page); + page_nid = -1; goto out_unlock; } - spin_unlock(&mm->page_table_lock); - /* Migrate the THP to the requested node */ + /* + * Migrate the THP to the requested node, returns with page unlocked + * and pmd_numa cleared. + */ + spin_unlock(&mm->page_table_lock); migrated = migrate_misplaced_transhuge_page(mm, vma, pmdp, pmd, addr, page, target_nid); - if (!migrated) - goto check_same; - - task_numa_fault(target_nid, HPAGE_PMD_NR, true); - return 0; + if (migrated) { + flags |= TNF_MIGRATED; + page_nid = target_nid; + } -check_same: - spin_lock(&mm->page_table_lock); - if (unlikely(!pmd_same(pmd, *pmdp))) - goto out_unlock; + goto out; clear_pmdnuma: + BUG_ON(!PageLocked(page)); pmd = pmd_mknonnuma(pmd); set_pmd_at(mm, haddr, pmdp, pmd); VM_BUG_ON(pmd_numa(*pmdp)); update_mmu_cache_pmd(vma, addr, pmdp); + unlock_page(page); out_unlock: spin_unlock(&mm->page_table_lock); - if (current_nid != -1) - task_numa_fault(current_nid, HPAGE_PMD_NR, false); + +out: + if (anon_vma) + page_unlock_anon_vma_read(anon_vma); + + if (page_nid != -1) + task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR, flags); + return 0; } @@ -1432,6 +1473,12 @@ out: return ret; } +/* + * Returns + * - 0 if PMD could not be locked + * - 1 if PMD was locked but protections unchange and TLB flush unnecessary + * - HPAGE_PMD_NR is protections changed and TLB flush necessary + */ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, pgprot_t newprot, int prot_numa) { @@ -1440,22 +1487,34 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, if (__pmd_trans_huge_lock(pmd, vma) == 1) { pmd_t entry; - entry = pmdp_get_and_clear(mm, addr, pmd); + ret = 1; if (!prot_numa) { + entry = pmdp_get_and_clear(mm, addr, pmd); entry = pmd_modify(entry, newprot); + ret = HPAGE_PMD_NR; BUG_ON(pmd_write(entry)); } else { struct page *page = pmd_page(*pmd); - /* only check non-shared pages */ - if (page_mapcount(page) == 1 && + /* + * Do not trap faults against the zero page. The + * read-only data is likely to be read-cached on the + * local CPU cache and it is less useful to know about + * local vs remote hits on the zero page. + */ + if (!is_huge_zero_page(page) && !pmd_numa(*pmd)) { + entry = pmdp_get_and_clear(mm, addr, pmd); entry = pmd_mknuma(entry); + ret = HPAGE_PMD_NR; } } - set_pmd_at(mm, addr, pmd, entry); + + /* Set PMD if cleared earlier */ + if (ret == HPAGE_PMD_NR) + set_pmd_at(mm, addr, pmd, entry); + spin_unlock(&vma->vm_mm->page_table_lock); - ret = 1; } return ret; @@ -1636,7 +1695,7 @@ static void __split_huge_page_refcount(struct page *page, page_tail->mapping = page->mapping; page_tail->index = page->index + i; - page_nid_xchg_last(page_tail, page_nid_last(page)); + page_cpupid_xchg_last(page_tail, page_cpupid_last(page)); BUG_ON(!PageAnon(page_tail)); BUG_ON(!PageUptodate(page_tail)); @@ -2697,6 +2756,7 @@ void __split_huge_page_pmd(struct vm_area_struct *vma, unsigned long address, mmun_start = haddr; mmun_end = haddr + HPAGE_PMD_SIZE; +again: mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end); spin_lock(&mm->page_table_lock); if (unlikely(!pmd_trans_huge(*pmd))) { @@ -2719,7 +2779,14 @@ void __split_huge_page_pmd(struct vm_area_struct *vma, unsigned long address, split_huge_page(page); put_page(page); - BUG_ON(pmd_trans_huge(*pmd)); + + /* + * We don't always have down_write of mmap_sem here: a racing + * do_huge_pmd_wp_page() might have copied-on-write to another + * huge page before our split_huge_page() got the anon_vma lock. + */ + if (unlikely(pmd_trans_huge(*pmd))) + goto again; } void split_huge_page_pmd_mm(struct mm_struct *mm, unsigned long address, diff --git a/mm/hugetlb.c b/mm/hugetlb.c index b49579c7f2a5..0b7656e804d1 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -653,6 +653,7 @@ static void free_huge_page(struct page *page) BUG_ON(page_count(page)); BUG_ON(page_mapcount(page)); restore_reserve = PagePrivate(page); + ClearPagePrivate(page); spin_lock(&hugetlb_lock); hugetlb_cgroup_uncharge_page(hstate_index(h), @@ -695,8 +696,22 @@ static void prep_compound_gigantic_page(struct page *page, unsigned long order) /* we rely on prep_new_huge_page to set the destructor */ set_compound_order(page, order); __SetPageHead(page); + __ClearPageReserved(page); for (i = 1; i < nr_pages; i++, p = mem_map_next(p, page, i)) { __SetPageTail(p); + /* + * For gigantic hugepages allocated through bootmem at + * boot, it's safer to be consistent with the not-gigantic + * hugepages and clear the PG_reserved bit from all tail pages + * too. Otherwse drivers using get_user_pages() to access tail + * pages may get the reference counting wrong if they see + * PG_reserved set on a tail page (despite the head page not + * having PG_reserved set). Enforcing this consistency between + * head and tail pages allows drivers to optimize away a check + * on the head page when they need know if put_page() is needed + * after get_user_pages(). + */ + __ClearPageReserved(p); set_page_count(p, 0); p->first_page = page; } @@ -1329,9 +1344,9 @@ static void __init gather_bootmem_prealloc(void) #else page = virt_to_page(m); #endif - __ClearPageReserved(page); WARN_ON(page_count(page) != 1); prep_compound_huge_page(page, h->order); + WARN_ON(PageReserved(page)); prep_new_huge_page(h, page, page_to_nid(page)); /* * If we had gigantic hugepages allocated at boot time, we need diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c index afc2daa91c60..4c84678371eb 100644 --- a/mm/hwpoison-inject.c +++ b/mm/hwpoison-inject.c @@ -20,8 +20,6 @@ static int hwpoison_inject(void *data, u64 val) if (!capable(CAP_SYS_ADMIN)) return -EPERM; - if (!hwpoison_filter_enable) - goto inject; if (!pfn_valid(pfn)) return -ENXIO; @@ -33,6 +31,9 @@ static int hwpoison_inject(void *data, u64 val) if (!get_page_unless_zero(hpage)) return 0; + if (!hwpoison_filter_enable) + goto inject; + if (!PageLRU(p) && !PageHuge(p)) shake_page(p, 0); /* diff --git a/mm/madvise.c b/mm/madvise.c index 6975bc812542..539eeb96b323 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -343,10 +343,11 @@ static long madvise_remove(struct vm_area_struct *vma, */ static int madvise_hwpoison(int bhv, unsigned long start, unsigned long end) { + struct page *p; if (!capable(CAP_SYS_ADMIN)) return -EPERM; - for (; start < end; start += PAGE_SIZE) { - struct page *p; + for (; start < end; start += PAGE_SIZE << + compound_order(compound_head(p))) { int ret; ret = get_user_pages_fast(start, 1, 0, &p); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 1c52ddbc839b..34d3ca9572d6 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -866,6 +866,7 @@ static unsigned long mem_cgroup_read_events(struct mem_cgroup *memcg, unsigned long val = 0; int cpu; + get_online_cpus(); for_each_online_cpu(cpu) val += per_cpu(memcg->stat->events[idx], cpu); #ifdef CONFIG_HOTPLUG_CPU @@ -873,6 +874,7 @@ static unsigned long mem_cgroup_read_events(struct mem_cgroup *memcg, val += memcg->nocpu_base.events[idx]; spin_unlock(&memcg->pcp_counter_lock); #endif + put_online_cpus(); return val; } @@ -2159,110 +2161,59 @@ static void memcg_oom_recover(struct mem_cgroup *memcg) memcg_wakeup_oom(memcg); } -/* - * try to call OOM killer - */ static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order) { - bool locked; - int wakeups; - if (!current->memcg_oom.may_oom) return; - - current->memcg_oom.in_memcg_oom = 1; - /* - * As with any blocking lock, a contender needs to start - * listening for wakeups before attempting the trylock, - * otherwise it can miss the wakeup from the unlock and sleep - * indefinitely. This is just open-coded because our locking - * is so particular to memcg hierarchies. + * We are in the middle of the charge context here, so we + * don't want to block when potentially sitting on a callstack + * that holds all kinds of filesystem and mm locks. + * + * Also, the caller may handle a failed allocation gracefully + * (like optional page cache readahead) and so an OOM killer + * invocation might not even be necessary. + * + * That's why we don't do anything here except remember the + * OOM context and then deal with it at the end of the page + * fault when the stack is unwound, the locks are released, + * and when we know whether the fault was overall successful. */ - wakeups = atomic_read(&memcg->oom_wakeups); - mem_cgroup_mark_under_oom(memcg); - - locked = mem_cgroup_oom_trylock(memcg); - - if (locked) - mem_cgroup_oom_notify(memcg); - - if (locked && !memcg->oom_kill_disable) { - mem_cgroup_unmark_under_oom(memcg); - mem_cgroup_out_of_memory(memcg, mask, order); - mem_cgroup_oom_unlock(memcg); - /* - * There is no guarantee that an OOM-lock contender - * sees the wakeups triggered by the OOM kill - * uncharges. Wake any sleepers explicitely. - */ - memcg_oom_recover(memcg); - } else { - /* - * A system call can just return -ENOMEM, but if this - * is a page fault and somebody else is handling the - * OOM already, we need to sleep on the OOM waitqueue - * for this memcg until the situation is resolved. - * Which can take some time because it might be - * handled by a userspace task. - * - * However, this is the charge context, which means - * that we may sit on a large call stack and hold - * various filesystem locks, the mmap_sem etc. and we - * don't want the OOM handler to deadlock on them - * while we sit here and wait. Store the current OOM - * context in the task_struct, then return -ENOMEM. - * At the end of the page fault handler, with the - * stack unwound, pagefault_out_of_memory() will check - * back with us by calling - * mem_cgroup_oom_synchronize(), possibly putting the - * task to sleep. - */ - current->memcg_oom.oom_locked = locked; - current->memcg_oom.wakeups = wakeups; - css_get(&memcg->css); - current->memcg_oom.wait_on_memcg = memcg; - } + css_get(&memcg->css); + current->memcg_oom.memcg = memcg; + current->memcg_oom.gfp_mask = mask; + current->memcg_oom.order = order; } /** * mem_cgroup_oom_synchronize - complete memcg OOM handling + * @handle: actually kill/wait or just clean up the OOM state * - * This has to be called at the end of a page fault if the the memcg - * OOM handler was enabled and the fault is returning %VM_FAULT_OOM. + * This has to be called at the end of a page fault if the memcg OOM + * handler was enabled. * - * Memcg supports userspace OOM handling, so failed allocations must + * Memcg supports userspace OOM handling where failed allocations must * sleep on a waitqueue until the userspace task resolves the * situation. Sleeping directly in the charge context with all kinds * of locks held is not a good idea, instead we remember an OOM state * in the task and mem_cgroup_oom_synchronize() has to be called at - * the end of the page fault to put the task to sleep and clean up the - * OOM state. + * the end of the page fault to complete the OOM handling. * * Returns %true if an ongoing memcg OOM situation was detected and - * finalized, %false otherwise. + * completed, %false otherwise. */ -bool mem_cgroup_oom_synchronize(void) +bool mem_cgroup_oom_synchronize(bool handle) { + struct mem_cgroup *memcg = current->memcg_oom.memcg; struct oom_wait_info owait; - struct mem_cgroup *memcg; + bool locked; /* OOM is global, do not handle */ - if (!current->memcg_oom.in_memcg_oom) - return false; - - /* - * We invoked the OOM killer but there is a chance that a kill - * did not free up any charges. Everybody else might already - * be sleeping, so restart the fault and keep the rampage - * going until some charges are released. - */ - memcg = current->memcg_oom.wait_on_memcg; if (!memcg) - goto out; + return false; - if (test_thread_flag(TIF_MEMDIE) || fatal_signal_pending(current)) - goto out_memcg; + if (!handle) + goto cleanup; owait.memcg = memcg; owait.wait.flags = 0; @@ -2271,13 +2222,25 @@ bool mem_cgroup_oom_synchronize(void) INIT_LIST_HEAD(&owait.wait.task_list); prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE); - /* Only sleep if we didn't miss any wakeups since OOM */ - if (atomic_read(&memcg->oom_wakeups) == current->memcg_oom.wakeups) + mem_cgroup_mark_under_oom(memcg); + + locked = mem_cgroup_oom_trylock(memcg); + + if (locked) + mem_cgroup_oom_notify(memcg); + + if (locked && !memcg->oom_kill_disable) { + mem_cgroup_unmark_under_oom(memcg); + finish_wait(&memcg_oom_waitq, &owait.wait); + mem_cgroup_out_of_memory(memcg, current->memcg_oom.gfp_mask, + current->memcg_oom.order); + } else { schedule(); - finish_wait(&memcg_oom_waitq, &owait.wait); -out_memcg: - mem_cgroup_unmark_under_oom(memcg); - if (current->memcg_oom.oom_locked) { + mem_cgroup_unmark_under_oom(memcg); + finish_wait(&memcg_oom_waitq, &owait.wait); + } + + if (locked) { mem_cgroup_oom_unlock(memcg); /* * There is no guarantee that an OOM-lock contender @@ -2286,10 +2249,9 @@ out_memcg: */ memcg_oom_recover(memcg); } +cleanup: + current->memcg_oom.memcg = NULL; css_put(&memcg->css); - current->memcg_oom.wait_on_memcg = NULL; -out: - current->memcg_oom.in_memcg_oom = 0; return true; } @@ -2703,6 +2665,9 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm, || fatal_signal_pending(current))) goto bypass; + if (unlikely(task_in_memcg_oom(current))) + goto bypass; + /* * We always charge the cgroup the mm_struct belongs to. * The mm_struct's mem_cgroup changes on task migration if the @@ -2801,6 +2766,8 @@ done: return 0; nomem: *ptr = NULL; + if (gfp_mask & __GFP_NOFAIL) + return 0; return -ENOMEM; bypass: *ptr = root_mem_cgroup; diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 947ed5413279..bf3351b5115e 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1114,8 +1114,10 @@ int memory_failure(unsigned long pfn, int trapno, int flags) * shake_page could have turned it free. */ if (is_free_buddy_page(p)) { - action_result(pfn, "free buddy, 2nd try", - DELAYED); + if (flags & MF_COUNT_INCREASED) + action_result(pfn, "free buddy", DELAYED); + else + action_result(pfn, "free buddy, 2nd try", DELAYED); return 0; } action_result(pfn, "non LRU", IGNORED); @@ -1349,7 +1351,7 @@ int unpoison_memory(unsigned long pfn) * worked by memory_failure() and the page lock is not held yet. * In such case, we yield to memory_failure() and make unpoison fail. */ - if (PageTransHuge(page)) { + if (!PageHuge(page) && PageTransHuge(page)) { pr_info("MCE: Memory failure is now running on %#lx\n", pfn); return 0; } diff --git a/mm/memory.c b/mm/memory.c index ca0003947115..1f2287eaa88e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -69,8 +69,8 @@ #include "internal.h" -#ifdef LAST_NID_NOT_IN_PAGE_FLAGS -#warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_nid. +#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS +#warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid. #endif #ifndef CONFIG_NEED_MULTIPLE_NODES @@ -837,6 +837,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, */ make_migration_entry_read(&entry); pte = swp_entry_to_pte(entry); + if (pte_swp_soft_dirty(*src_pte)) + pte = pte_swp_mksoft_dirty(pte); set_pte_at(src_mm, addr, src_pte, pte); } } @@ -2719,6 +2721,14 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma, get_page(dirty_page); reuse: + /* + * Clear the pages cpupid information as the existing + * information potentially belongs to a now completely + * unrelated process. + */ + if (old_page) + page_cpupid_xchg_last(old_page, (1 << LAST_CPUPID_SHIFT) - 1); + flush_cache_page(vma, address, pte_pfn(orig_pte)); entry = pte_mkyoung(orig_pte); entry = maybe_mkwrite(pte_mkdirty(entry), vma); @@ -3519,13 +3529,16 @@ static int do_nonlinear_fault(struct mm_struct *mm, struct vm_area_struct *vma, } int numa_migrate_prep(struct page *page, struct vm_area_struct *vma, - unsigned long addr, int current_nid) + unsigned long addr, int page_nid, + int *flags) { get_page(page); count_vm_numa_event(NUMA_HINT_FAULTS); - if (current_nid == numa_node_id()) + if (page_nid == numa_node_id()) { count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL); + *flags |= TNF_FAULT_LOCAL; + } return mpol_misplaced(page, vma, addr); } @@ -3535,9 +3548,11 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma, { struct page *page = NULL; spinlock_t *ptl; - int current_nid = -1; + int page_nid = -1; + int last_cpupid; int target_nid; bool migrated = false; + int flags = 0; /* * The "pte" at this point cannot be used safely without @@ -3564,123 +3579,44 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma, pte_unmap_unlock(ptep, ptl); return 0; } + BUG_ON(is_zero_pfn(page_to_pfn(page))); + + /* + * Avoid grouping on DSO/COW pages in specific and RO pages + * in general, RO pages shouldn't hurt as much anyway since + * they can be in shared cache state. + */ + if (!pte_write(pte)) + flags |= TNF_NO_GROUP; - current_nid = page_to_nid(page); - target_nid = numa_migrate_prep(page, vma, addr, current_nid); + /* + * Flag if the page is shared between multiple address spaces. This + * is later used when determining whether to group tasks together + */ + if (page_mapcount(page) > 1 && (vma->vm_flags & VM_SHARED)) + flags |= TNF_SHARED; + + last_cpupid = page_cpupid_last(page); + page_nid = page_to_nid(page); + target_nid = numa_migrate_prep(page, vma, addr, page_nid, &flags); pte_unmap_unlock(ptep, ptl); if (target_nid == -1) { - /* - * Account for the fault against the current node if it not - * being replaced regardless of where the page is located. - */ - current_nid = numa_node_id(); put_page(page); goto out; } /* Migrate to the requested node */ - migrated = migrate_misplaced_page(page, target_nid); - if (migrated) - current_nid = target_nid; - -out: - if (current_nid != -1) - task_numa_fault(current_nid, 1, migrated); - return 0; -} - -/* NUMA hinting page fault entry point for regular pmds */ -#ifdef CONFIG_NUMA_BALANCING -static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long addr, pmd_t *pmdp) -{ - pmd_t pmd; - pte_t *pte, *orig_pte; - unsigned long _addr = addr & PMD_MASK; - unsigned long offset; - spinlock_t *ptl; - bool numa = false; - int local_nid = numa_node_id(); - - spin_lock(&mm->page_table_lock); - pmd = *pmdp; - if (pmd_numa(pmd)) { - set_pmd_at(mm, _addr, pmdp, pmd_mknonnuma(pmd)); - numa = true; + migrated = migrate_misplaced_page(page, vma, target_nid); + if (migrated) { + page_nid = target_nid; + flags |= TNF_MIGRATED; } - spin_unlock(&mm->page_table_lock); - if (!numa) - return 0; - - /* we're in a page fault so some vma must be in the range */ - BUG_ON(!vma); - BUG_ON(vma->vm_start >= _addr + PMD_SIZE); - offset = max(_addr, vma->vm_start) & ~PMD_MASK; - VM_BUG_ON(offset >= PMD_SIZE); - orig_pte = pte = pte_offset_map_lock(mm, pmdp, _addr, &ptl); - pte += offset >> PAGE_SHIFT; - for (addr = _addr + offset; addr < _addr + PMD_SIZE; pte++, addr += PAGE_SIZE) { - pte_t pteval = *pte; - struct page *page; - int curr_nid = local_nid; - int target_nid; - bool migrated; - if (!pte_present(pteval)) - continue; - if (!pte_numa(pteval)) - continue; - if (addr >= vma->vm_end) { - vma = find_vma(mm, addr); - /* there's a pte present so there must be a vma */ - BUG_ON(!vma); - BUG_ON(addr < vma->vm_start); - } - if (pte_numa(pteval)) { - pteval = pte_mknonnuma(pteval); - set_pte_at(mm, addr, pte, pteval); - } - page = vm_normal_page(vma, addr, pteval); - if (unlikely(!page)) - continue; - /* only check non-shared pages */ - if (unlikely(page_mapcount(page) != 1)) - continue; - - /* - * Note that the NUMA fault is later accounted to either - * the node that is currently running or where the page is - * migrated to. - */ - curr_nid = local_nid; - target_nid = numa_migrate_prep(page, vma, addr, - page_to_nid(page)); - if (target_nid == -1) { - put_page(page); - continue; - } - - /* Migrate to the requested node */ - pte_unmap_unlock(pte, ptl); - migrated = migrate_misplaced_page(page, target_nid); - if (migrated) - curr_nid = target_nid; - task_numa_fault(curr_nid, 1, migrated); - - pte = pte_offset_map_lock(mm, pmdp, addr, &ptl); - } - pte_unmap_unlock(orig_pte, ptl); - - return 0; -} -#else -static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long addr, pmd_t *pmdp) -{ - BUG(); +out: + if (page_nid != -1) + task_numa_fault(last_cpupid, page_nid, 1, flags); return 0; } -#endif /* CONFIG_NUMA_BALANCING */ /* * These routines also need to handle stuff like marking pages dirty @@ -3820,8 +3756,8 @@ retry: } } - if (pmd_numa(*pmd)) - return do_pmd_numa_page(mm, vma, address, pmd); + /* THP should already have been handled */ + BUG_ON(pmd_numa(*pmd)); /* * Use __pte_alloc instead of pte_alloc_map, because we can't @@ -3863,15 +3799,21 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, * space. Kernel faults are handled more gracefully. */ if (flags & FAULT_FLAG_USER) - mem_cgroup_enable_oom(); + mem_cgroup_oom_enable(); ret = __handle_mm_fault(mm, vma, address, flags); - if (flags & FAULT_FLAG_USER) - mem_cgroup_disable_oom(); - - if (WARN_ON(task_in_memcg_oom(current) && !(ret & VM_FAULT_OOM))) - mem_cgroup_oom_synchronize(); + if (flags & FAULT_FLAG_USER) { + mem_cgroup_oom_disable(); + /* + * The task may have entered a memcg OOM situation but + * if the allocation error was handled gracefully (no + * VM_FAULT_OOM), there is no need to kill anything. + * Just clean up the OOM state peacefully. + */ + if (task_in_memcg_oom(current) && !(ret & VM_FAULT_OOM)) + mem_cgroup_oom_synchronize(false); + } return ret; } diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 04729647f359..71cb253368cb 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1679,6 +1679,30 @@ struct mempolicy *get_vma_policy(struct task_struct *task, return pol; } +bool vma_policy_mof(struct task_struct *task, struct vm_area_struct *vma) +{ + struct mempolicy *pol = get_task_policy(task); + if (vma) { + if (vma->vm_ops && vma->vm_ops->get_policy) { + bool ret = false; + + pol = vma->vm_ops->get_policy(vma, vma->vm_start); + if (pol && (pol->flags & MPOL_F_MOF)) + ret = true; + mpol_cond_put(pol); + + return ret; + } else if (vma->vm_policy) { + pol = vma->vm_policy; + } + } + + if (!pol) + return default_policy.flags & MPOL_F_MOF; + + return pol->flags & MPOL_F_MOF; +} + static int apply_policy_zone(struct mempolicy *policy, enum zone_type zone) { enum zone_type dynamic_policy_zone = policy_zone; @@ -2277,6 +2301,35 @@ static void sp_free(struct sp_node *n) kmem_cache_free(sn_cache, n); } +#ifdef CONFIG_NUMA_BALANCING +static bool numa_migrate_deferred(struct task_struct *p, int last_cpupid) +{ + /* Never defer a private fault */ + if (cpupid_match_pid(p, last_cpupid)) + return false; + + if (p->numa_migrate_deferred) { + p->numa_migrate_deferred--; + return true; + } + return false; +} + +static inline void defer_numa_migrate(struct task_struct *p) +{ + p->numa_migrate_deferred = sysctl_numa_balancing_migrate_deferred; +} +#else +static inline bool numa_migrate_deferred(struct task_struct *p, int last_cpupid) +{ + return false; +} + +static inline void defer_numa_migrate(struct task_struct *p) +{ +} +#endif /* CONFIG_NUMA_BALANCING */ + /** * mpol_misplaced - check whether current page node is valid in policy * @@ -2300,6 +2353,8 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long struct zone *zone; int curnid = page_to_nid(page); unsigned long pgoff; + int thiscpu = raw_smp_processor_id(); + int thisnid = cpu_to_node(thiscpu); int polnid = -1; int ret = -1; @@ -2348,9 +2403,11 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long /* Migrate the page towards the node whose CPU is referencing it */ if (pol->flags & MPOL_F_MORON) { - int last_nid; + int last_cpupid; + int this_cpupid; - polnid = numa_node_id(); + polnid = thisnid; + this_cpupid = cpu_pid_to_cpupid(thiscpu, current->pid); /* * Multi-stage node selection is used in conjunction @@ -2373,8 +2430,25 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long * it less likely we act on an unlikely task<->page * relation. */ - last_nid = page_nid_xchg_last(page, polnid); - if (last_nid != polnid) + last_cpupid = page_cpupid_xchg_last(page, this_cpupid); + if (!cpupid_pid_unset(last_cpupid) && cpupid_to_nid(last_cpupid) != thisnid) { + + /* See sysctl_numa_balancing_migrate_deferred comment */ + if (!cpupid_match_pid(current, last_cpupid)) + defer_numa_migrate(current); + + goto out; + } + + /* + * The quadratic filter above reduces extraneous migration + * of shared pages somewhat. This code reduces it even more, + * reducing the overhead of page migrations of shared pages. + * This makes workloads with shared pages rely more on + * "move task near its memory", and less on "move memory + * towards its task", which is exactly what we want. + */ + if (numa_migrate_deferred(current, last_cpupid)) goto out; } diff --git a/mm/migrate.c b/mm/migrate.c index 9c8d5f59d30b..dfc8300ecbb2 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -107,7 +107,7 @@ void putback_movable_pages(struct list_head *l) list_del(&page->lru); dec_zone_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page)); - if (unlikely(balloon_page_movable(page))) + if (unlikely(isolated_balloon_page(page))) balloon_page_putback(page); else putback_lru_page(page); @@ -161,6 +161,8 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma, get_page(new); pte = pte_mkold(mk_pte(new, vma->vm_page_prot)); + if (pte_swp_soft_dirty(*ptep)) + pte = pte_mksoft_dirty(pte); if (is_write_migration_entry(entry)) pte = pte_mkwrite(pte); #ifdef CONFIG_HUGETLB_PAGE @@ -443,6 +445,8 @@ int migrate_huge_page_move_mapping(struct address_space *mapping, */ void migrate_page_copy(struct page *newpage, struct page *page) { + int cpupid; + if (PageHuge(page) || PageTransHuge(page)) copy_huge_page(newpage, page); else @@ -479,6 +483,13 @@ void migrate_page_copy(struct page *newpage, struct page *page) __set_page_dirty_nobuffers(newpage); } + /* + * Copy NUMA information to the new page, to prevent over-eager + * future migrations of this same page. + */ + cpupid = page_cpupid_xchg_last(page, -1); + page_cpupid_xchg_last(newpage, cpupid); + mlock_migrate_page(newpage, page); ksm_migrate_page(newpage, page); /* @@ -1498,7 +1509,7 @@ static struct page *alloc_misplaced_dst_page(struct page *page, __GFP_NOWARN) & ~GFP_IOFS, 0); if (newpage) - page_nid_xchg_last(newpage, page_nid_last(page)); + page_cpupid_xchg_last(newpage, page_cpupid_last(page)); return newpage; } @@ -1599,7 +1610,8 @@ int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page) * node. Caller is expected to have an elevated reference count on * the page that will be dropped by this function before returning. */ -int migrate_misplaced_page(struct page *page, int node) +int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, + int node) { pg_data_t *pgdat = NODE_DATA(node); int isolated; @@ -1607,10 +1619,11 @@ int migrate_misplaced_page(struct page *page, int node) LIST_HEAD(migratepages); /* - * Don't migrate pages that are mapped in multiple processes. - * TODO: Handle false sharing detection instead of this hammer + * Don't migrate file pages that are mapped in multiple processes + * with execute permissions as they are probably shared libraries. */ - if (page_mapcount(page) != 1) + if (page_mapcount(page) != 1 && page_is_file_cache(page) && + (vma->vm_flags & VM_EXEC)) goto out; /* @@ -1660,13 +1673,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, struct mem_cgroup *memcg = NULL; int page_lru = page_is_file_cache(page); - /* - * Don't migrate pages that are mapped in multiple processes. - * TODO: Handle false sharing detection instead of this hammer - */ - if (page_mapcount(page) != 1) - goto out_dropref; - /* * Rate-limit the amount of data that is being migrated to a node. * Optimal placement is no good if the memory bus is saturated and @@ -1680,7 +1686,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, if (!new_page) goto out_fail; - page_nid_xchg_last(new_page, page_nid_last(page)); + page_cpupid_xchg_last(new_page, page_cpupid_last(page)); isolated = numamigrate_isolate_page(pgdat, page); if (!isolated) { @@ -1713,12 +1719,12 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, unlock_page(new_page); put_page(new_page); /* Free it */ - unlock_page(page); + /* Retake the callers reference and putback on LRU */ + get_page(page); putback_lru_page(page); - - count_vm_events(PGMIGRATE_FAIL, HPAGE_PMD_NR); - isolated = 0; - goto out; + mod_zone_page_state(page_zone(page), + NR_ISOLATED_ANON + page_lru, -HPAGE_PMD_NR); + goto out_fail; } /* @@ -1735,9 +1741,9 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); entry = pmd_mkhuge(entry); - page_add_new_anon_rmap(new_page, vma, haddr); - + pmdp_clear_flush(vma, haddr, pmd); set_pmd_at(mm, haddr, pmd, entry); + page_add_new_anon_rmap(new_page, vma, haddr); update_mmu_cache_pmd(vma, address, &entry); page_remove_rmap(page); /* @@ -1756,7 +1762,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, count_vm_events(PGMIGRATE_SUCCESS, HPAGE_PMD_NR); count_vm_numa_events(NUMA_PAGE_MIGRATE, HPAGE_PMD_NR); -out: mod_zone_page_state(page_zone(page), NR_ISOLATED_ANON + page_lru, -HPAGE_PMD_NR); @@ -1765,6 +1770,10 @@ out: out_fail: count_vm_events(PGMIGRATE_FAIL, HPAGE_PMD_NR); out_dropref: + entry = pmd_mknonnuma(entry); + set_pmd_at(mm, haddr, pmd, entry); + update_mmu_cache_pmd(vma, address, &entry); + unlock_page(page); put_page(page); return 0; diff --git a/mm/mlock.c b/mm/mlock.c index 67ba6da7d0e3..d480cd6fc475 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -379,10 +379,14 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec, /* * Initialize pte walk starting at the already pinned page where we - * are sure that there is a pte. + * are sure that there is a pte, as it was pinned under the same + * mmap_sem write op. */ pte = get_locked_pte(vma->vm_mm, start, &ptl); - end = min(end, pmd_addr_end(start, end)); + /* Make sure we do not cross the page table boundary */ + end = pgd_addr_end(start, end); + end = pud_addr_end(start, end); + end = pmd_addr_end(start, end); /* The page next to the pinned page is the first we will try to get */ start += PAGE_SIZE; diff --git a/mm/mm_init.c b/mm/mm_init.c index 633c08863fd8..68562e92d50c 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -71,26 +71,26 @@ void __init mminit_verify_pageflags_layout(void) unsigned long or_mask, add_mask; shift = 8 * sizeof(unsigned long); - width = shift - SECTIONS_WIDTH - NODES_WIDTH - ZONES_WIDTH - LAST_NID_SHIFT; + width = shift - SECTIONS_WIDTH - NODES_WIDTH - ZONES_WIDTH - LAST_CPUPID_SHIFT; mminit_dprintk(MMINIT_TRACE, "pageflags_layout_widths", - "Section %d Node %d Zone %d Lastnid %d Flags %d\n", + "Section %d Node %d Zone %d Lastcpupid %d Flags %d\n", SECTIONS_WIDTH, NODES_WIDTH, ZONES_WIDTH, - LAST_NID_WIDTH, + LAST_CPUPID_WIDTH, NR_PAGEFLAGS); mminit_dprintk(MMINIT_TRACE, "pageflags_layout_shifts", - "Section %d Node %d Zone %d Lastnid %d\n", + "Section %d Node %d Zone %d Lastcpupid %d\n", SECTIONS_SHIFT, NODES_SHIFT, ZONES_SHIFT, - LAST_NID_SHIFT); + LAST_CPUPID_SHIFT); mminit_dprintk(MMINIT_TRACE, "pageflags_layout_pgshifts", - "Section %lu Node %lu Zone %lu Lastnid %lu\n", + "Section %lu Node %lu Zone %lu Lastcpupid %lu\n", (unsigned long)SECTIONS_PGSHIFT, (unsigned long)NODES_PGSHIFT, (unsigned long)ZONES_PGSHIFT, - (unsigned long)LAST_NID_PGSHIFT); + (unsigned long)LAST_CPUPID_PGSHIFT); mminit_dprintk(MMINIT_TRACE, "pageflags_layout_nodezoneid", "Node/Zone ID: %lu -> %lu\n", (unsigned long)(ZONEID_PGOFF + ZONEID_SHIFT), @@ -102,9 +102,9 @@ void __init mminit_verify_pageflags_layout(void) mminit_dprintk(MMINIT_TRACE, "pageflags_layout_nodeflags", "Node not in page flags"); #endif -#ifdef LAST_NID_NOT_IN_PAGE_FLAGS +#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS mminit_dprintk(MMINIT_TRACE, "pageflags_layout_nodeflags", - "Last nid not in page flags"); + "Last cpupid not in page flags"); #endif if (SECTIONS_WIDTH) { diff --git a/mm/mmzone.c b/mm/mmzone.c index 2ac0afbd68f3..bf34fb8556db 100644 --- a/mm/mmzone.c +++ b/mm/mmzone.c @@ -97,20 +97,20 @@ void lruvec_init(struct lruvec *lruvec) INIT_LIST_HEAD(&lruvec->lists[lru]); } -#if defined(CONFIG_NUMA_BALANCING) && !defined(LAST_NID_NOT_IN_PAGE_FLAGS) -int page_nid_xchg_last(struct page *page, int nid) +#if defined(CONFIG_NUMA_BALANCING) && !defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS) +int page_cpupid_xchg_last(struct page *page, int cpupid) { unsigned long old_flags, flags; - int last_nid; + int last_cpupid; do { old_flags = flags = page->flags; - last_nid = page_nid_last(page); + last_cpupid = page_cpupid_last(page); - flags &= ~(LAST_NID_MASK << LAST_NID_PGSHIFT); - flags |= (nid & LAST_NID_MASK) << LAST_NID_PGSHIFT; + flags &= ~(LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT); + flags |= (cpupid & LAST_CPUPID_MASK) << LAST_CPUPID_PGSHIFT; } while (unlikely(cmpxchg(&page->flags, old_flags, flags) != old_flags)); - return last_nid; + return last_cpupid; } #endif diff --git a/mm/mprotect.c b/mm/mprotect.c index 94722a4d6b43..a597f2ffcd6f 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -37,14 +37,12 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot) static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long end, pgprot_t newprot, - int dirty_accountable, int prot_numa, bool *ret_all_same_node) + int dirty_accountable, int prot_numa) { struct mm_struct *mm = vma->vm_mm; pte_t *pte, oldpte; spinlock_t *ptl; unsigned long pages = 0; - bool all_same_node = true; - int last_nid = -1; pte = pte_offset_map_lock(mm, pmd, addr, &ptl); arch_enter_lazy_mmu_mode(); @@ -63,15 +61,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, page = vm_normal_page(vma, addr, oldpte); if (page) { - int this_nid = page_to_nid(page); - if (last_nid == -1) - last_nid = this_nid; - if (last_nid != this_nid) - all_same_node = false; - - /* only check non-shared pages */ - if (!pte_numa(oldpte) && - page_mapcount(page) == 1) { + if (!pte_numa(oldpte)) { ptent = pte_mknuma(ptent); updated = true; } @@ -94,40 +84,27 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, swp_entry_t entry = pte_to_swp_entry(oldpte); if (is_write_migration_entry(entry)) { + pte_t newpte; /* * A protection check is difficult so * just be safe and disable write */ make_migration_entry_read(&entry); - set_pte_at(mm, addr, pte, - swp_entry_to_pte(entry)); + newpte = swp_entry_to_pte(entry); + if (pte_swp_soft_dirty(oldpte)) + newpte = pte_swp_mksoft_dirty(newpte); + set_pte_at(mm, addr, pte, newpte); + + pages++; } - pages++; } } while (pte++, addr += PAGE_SIZE, addr != end); arch_leave_lazy_mmu_mode(); pte_unmap_unlock(pte - 1, ptl); - *ret_all_same_node = all_same_node; return pages; } -#ifdef CONFIG_NUMA_BALANCING -static inline void change_pmd_protnuma(struct mm_struct *mm, unsigned long addr, - pmd_t *pmd) -{ - spin_lock(&mm->page_table_lock); - set_pmd_at(mm, addr & PMD_MASK, pmd, pmd_mknuma(*pmd)); - spin_unlock(&mm->page_table_lock); -} -#else -static inline void change_pmd_protnuma(struct mm_struct *mm, unsigned long addr, - pmd_t *pmd) -{ - BUG(); -} -#endif /* CONFIG_NUMA_BALANCING */ - static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *pud, unsigned long addr, unsigned long end, pgprot_t newprot, int dirty_accountable, int prot_numa) @@ -135,34 +112,33 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pmd_t *pmd; unsigned long next; unsigned long pages = 0; - bool all_same_node; pmd = pmd_offset(pud, addr); do { + unsigned long this_pages; + next = pmd_addr_end(addr, end); if (pmd_trans_huge(*pmd)) { if (next - addr != HPAGE_PMD_SIZE) split_huge_page_pmd(vma, addr, pmd); - else if (change_huge_pmd(vma, pmd, addr, newprot, - prot_numa)) { - pages += HPAGE_PMD_NR; - continue; + else { + int nr_ptes = change_huge_pmd(vma, pmd, addr, + newprot, prot_numa); + + if (nr_ptes) { + if (nr_ptes == HPAGE_PMD_NR) + pages++; + + continue; + } } /* fall through */ } if (pmd_none_or_clear_bad(pmd)) continue; - pages += change_pte_range(vma, pmd, addr, next, newprot, - dirty_accountable, prot_numa, &all_same_node); - - /* - * If we are changing protections for NUMA hinting faults then - * set pmd_numa if the examined pages were all on the same - * node. This allows a regular PMD to be handled as one fault - * and effectively batches the taking of the PTL - */ - if (prot_numa && all_same_node) - change_pmd_protnuma(vma->vm_mm, addr, pmd); + this_pages = change_pte_range(vma, pmd, addr, next, newprot, + dirty_accountable, prot_numa); + pages += this_pages; } while (pmd++, addr = next, addr != end); return pages; diff --git a/mm/mremap.c b/mm/mremap.c index 91b13d6a16d4..0843feb66f3d 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -25,7 +25,6 @@ #include #include #include -#include #include "internal.h" @@ -63,10 +62,8 @@ static pmd_t *alloc_new_pmd(struct mm_struct *mm, struct vm_area_struct *vma, return NULL; pmd = pmd_alloc(mm, pud, addr); - if (!pmd) { - pud_free(mm, pud); + if (!pmd) return NULL; - } VM_BUG_ON(pmd_trans_huge(*pmd)); diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 314e9d274381..6738c47f1f72 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -680,7 +680,7 @@ void pagefault_out_of_memory(void) { struct zonelist *zonelist; - if (mem_cgroup_oom_synchronize()) + if (mem_cgroup_oom_synchronize(true)) return; zonelist = node_zonelist(first_online_node, GFP_KERNEL); diff --git a/mm/page-writeback.c b/mm/page-writeback.c index f5236f804aa6..63807583d8e8 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1210,11 +1210,11 @@ static unsigned long dirty_poll_interval(unsigned long dirty, return 1; } -static long bdi_max_pause(struct backing_dev_info *bdi, - unsigned long bdi_dirty) +static unsigned long bdi_max_pause(struct backing_dev_info *bdi, + unsigned long bdi_dirty) { - long bw = bdi->avg_write_bandwidth; - long t; + unsigned long bw = bdi->avg_write_bandwidth; + unsigned long t; /* * Limit pause time for small memory systems. If sleeping for too long @@ -1226,7 +1226,7 @@ static long bdi_max_pause(struct backing_dev_info *bdi, t = bdi_dirty / (1 + bw / roundup_pow_of_two(1 + HZ / 8)); t++; - return min_t(long, t, MAX_PAUSE); + return min_t(unsigned long, t, MAX_PAUSE); } static long bdi_min_pause(struct backing_dev_info *bdi, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0ee638f76ebe..73d812f16dde 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -626,7 +626,7 @@ static inline int free_pages_check(struct page *page) bad_page(page); return 1; } - page_nid_reset_last(page); + page_cpupid_reset_last(page); if (page->flags & PAGE_FLAGS_CHECK_AT_PREP) page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP; return 0; @@ -4015,7 +4015,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, mminit_verify_page_links(page, zone, nid, pfn); init_page_count(page); page_mapcount_reset(page); - page_nid_reset_last(page); + page_cpupid_reset_last(page); SetPageReserved(page); /* * Mark the block movable so that blocks are reserved for @@ -6366,10 +6366,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) list_del(&page->lru); rmv_page_order(page); zone->free_area[order].nr_free--; -#ifdef CONFIG_HIGHMEM - if (PageHighMem(page)) - totalhigh_pages -= 1 << order; -#endif for (i = 0; i < (1 << order); i++) SetPageReserved((page+i)); pfn += (1 << order); diff --git a/mm/slab_common.c b/mm/slab_common.c index a3443278ce3a..e2e98af703ea 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -56,6 +56,7 @@ static int kmem_cache_sanity_check(struct mem_cgroup *memcg, const char *name, continue; } +#if !defined(CONFIG_SLUB) || !defined(CONFIG_SLUB_DEBUG_ON) /* * For simplicity, we won't check this in the list of memcg * caches. We have control over memcg naming, and if there @@ -69,6 +70,7 @@ static int kmem_cache_sanity_check(struct mem_cgroup *memcg, const char *name, s = NULL; return -EINVAL; } +#endif } WARN_ON(strchr(name, ' ')); /* It confuses parsers */ diff --git a/mm/swapfile.c b/mm/swapfile.c index 3963fc24fcc1..de7c904e52e5 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1824,6 +1824,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) struct filename *pathname; int i, type, prev; int err; + unsigned int old_block_size; if (!capable(CAP_SYS_ADMIN)) return -EPERM; @@ -1914,6 +1915,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) } swap_file = p->swap_file; + old_block_size = p->old_block_size; p->swap_file = NULL; p->max = 0; swap_map = p->swap_map; @@ -1938,7 +1940,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) inode = mapping->host; if (S_ISBLK(inode->i_mode)) { struct block_device *bdev = I_BDEV(inode); - set_blocksize(bdev, p->old_block_size); + set_blocksize(bdev, old_block_size); blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); } else { mutex_lock(&inode->i_mutex); diff --git a/mm/vmscan.c b/mm/vmscan.c index beb35778c69f..eea668d9cff6 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -48,6 +48,7 @@ #include #include +#include #include "internal.h" @@ -210,6 +211,7 @@ void unregister_shrinker(struct shrinker *shrinker) down_write(&shrinker_rwsem); list_del(&shrinker->list); up_write(&shrinker_rwsem); + kfree(shrinker->nr_deferred); } EXPORT_SYMBOL(unregister_shrinker); @@ -1113,7 +1115,8 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone, LIST_HEAD(clean_pages); list_for_each_entry_safe(page, next, page_list, lru) { - if (page_is_file_cache(page) && !PageDirty(page)) { + if (page_is_file_cache(page) && !PageDirty(page) && + !isolated_balloon_page(page)) { ClearPageActive(page); list_move(&page->lru, &clean_pages); } diff --git a/mm/zswap.c b/mm/zswap.c index 841e35f1db22..d93510c6aa2d 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -804,6 +804,10 @@ static void zswap_frontswap_invalidate_area(unsigned type) } tree->rbroot = RB_ROOT; spin_unlock(&tree->lock); + + zbud_destroy_pool(tree->pool); + kfree(tree); + zswap_trees[type] = NULL; } static struct zbud_ops zswap_zbud_ops = { diff --git a/net/802/mrp.c b/net/802/mrp.c index 1eb05d80b07b..3ed616215870 100644 --- a/net/802/mrp.c +++ b/net/802/mrp.c @@ -24,6 +24,11 @@ static unsigned int mrp_join_time __read_mostly = 200; module_param(mrp_join_time, uint, 0644); MODULE_PARM_DESC(mrp_join_time, "Join time in ms (default 200ms)"); + +static unsigned int mrp_periodic_time __read_mostly = 1000; +module_param(mrp_periodic_time, uint, 0644); +MODULE_PARM_DESC(mrp_periodic_time, "Periodic time in ms (default 1s)"); + MODULE_LICENSE("GPL"); static const u8 @@ -595,6 +600,24 @@ static void mrp_join_timer(unsigned long data) mrp_join_timer_arm(app); } +static void mrp_periodic_timer_arm(struct mrp_applicant *app) +{ + mod_timer(&app->periodic_timer, + jiffies + msecs_to_jiffies(mrp_periodic_time)); +} + +static void mrp_periodic_timer(unsigned long data) +{ + struct mrp_applicant *app = (struct mrp_applicant *)data; + + spin_lock(&app->lock); + mrp_mad_event(app, MRP_EVENT_PERIODIC); + mrp_pdu_queue(app); + spin_unlock(&app->lock); + + mrp_periodic_timer_arm(app); +} + static int mrp_pdu_parse_end_mark(struct sk_buff *skb, int *offset) { __be16 endmark; @@ -845,6 +868,9 @@ int mrp_init_applicant(struct net_device *dev, struct mrp_application *appl) rcu_assign_pointer(dev->mrp_port->applicants[appl->type], app); setup_timer(&app->join_timer, mrp_join_timer, (unsigned long)app); mrp_join_timer_arm(app); + setup_timer(&app->periodic_timer, mrp_periodic_timer, + (unsigned long)app); + mrp_periodic_timer_arm(app); return 0; err3: @@ -870,6 +896,7 @@ void mrp_uninit_applicant(struct net_device *dev, struct mrp_application *appl) * all pending messages before the applicant is gone. */ del_timer_sync(&app->join_timer); + del_timer_sync(&app->periodic_timer); spin_lock_bh(&app->lock); mrp_mad_event(app, MRP_EVENT_TX); diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 634debab4d54..fb7356fcfe51 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1146,7 +1146,11 @@ int hci_dev_open(__u16 dev) goto done; } - if (hdev->rfkill && rfkill_blocked(hdev->rfkill)) { + /* Check for rfkill but allow the HCI setup stage to proceed + * (which in itself doesn't cause any RF activity). + */ + if (test_bit(HCI_RFKILLED, &hdev->dev_flags) && + !test_bit(HCI_SETUP, &hdev->dev_flags)) { ret = -ERFKILL; goto done; } @@ -1566,10 +1570,13 @@ static int hci_rfkill_set_block(void *data, bool blocked) BT_DBG("%p name %s blocked %d", hdev, hdev->name, blocked); - if (!blocked) - return 0; - - hci_dev_do_close(hdev); + if (blocked) { + set_bit(HCI_RFKILLED, &hdev->dev_flags); + if (!test_bit(HCI_SETUP, &hdev->dev_flags)) + hci_dev_do_close(hdev); + } else { + clear_bit(HCI_RFKILLED, &hdev->dev_flags); + } return 0; } @@ -1591,9 +1598,13 @@ static void hci_power_on(struct work_struct *work) return; } - if (test_bit(HCI_AUTO_OFF, &hdev->dev_flags)) + if (test_bit(HCI_RFKILLED, &hdev->dev_flags)) { + clear_bit(HCI_AUTO_OFF, &hdev->dev_flags); + hci_dev_do_close(hdev); + } else if (test_bit(HCI_AUTO_OFF, &hdev->dev_flags)) { queue_delayed_work(hdev->req_workqueue, &hdev->power_off, HCI_AUTO_OFF_TIMEOUT); + } if (test_and_clear_bit(HCI_SETUP, &hdev->dev_flags)) mgmt_index_added(hdev); @@ -2209,6 +2220,9 @@ int hci_register_dev(struct hci_dev *hdev) } } + if (hdev->rfkill && rfkill_blocked(hdev->rfkill)) + set_bit(HCI_RFKILLED, &hdev->dev_flags); + set_bit(HCI_SETUP, &hdev->dev_flags); if (hdev->dev_type != HCI_AMP) diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 94aab73f89d4..8db3e89fae35 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -3557,7 +3557,11 @@ static void hci_le_ltk_request_evt(struct hci_dev *hdev, struct sk_buff *skb) cp.handle = cpu_to_le16(conn->handle); if (ltk->authenticated) - conn->sec_level = BT_SECURITY_HIGH; + conn->pending_sec_level = BT_SECURITY_HIGH; + else + conn->pending_sec_level = BT_SECURITY_MEDIUM; + + conn->enc_key_size = ltk->enc_size; hci_send_cmd(hdev, HCI_OP_LE_LTK_REPLY, sizeof(cp), &cp); diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index b3bb7bca8e60..63fa11109a1c 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -3755,6 +3755,13 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn, sk = chan->sk; + /* For certain devices (ex: HID mouse), support for authentication, + * pairing and bonding is optional. For such devices, inorder to avoid + * the ACL alive for too long after L2CAP disconnection, reset the ACL + * disc_timeout back to HCI_DISCONN_TIMEOUT during L2CAP connect. + */ + conn->hcon->disc_timeout = HCI_DISCONN_TIMEOUT; + bacpy(&bt_sk(sk)->src, conn->src); bacpy(&bt_sk(sk)->dst, conn->dst); chan->psm = psm; diff --git a/net/bluetooth/rfcomm/tty.c b/net/bluetooth/rfcomm/tty.c index 6d126faf145f..84fcf9fff3ea 100644 --- a/net/bluetooth/rfcomm/tty.c +++ b/net/bluetooth/rfcomm/tty.c @@ -569,7 +569,6 @@ static void rfcomm_dev_data_ready(struct rfcomm_dlc *dlc, struct sk_buff *skb) static void rfcomm_dev_state_change(struct rfcomm_dlc *dlc, int err) { struct rfcomm_dev *dev = dlc->owner; - struct tty_struct *tty; if (!dev) return; @@ -581,38 +580,8 @@ static void rfcomm_dev_state_change(struct rfcomm_dlc *dlc, int err) DPM_ORDER_DEV_AFTER_PARENT); wake_up_interruptible(&dev->port.open_wait); - } else if (dlc->state == BT_CLOSED) { - tty = tty_port_tty_get(&dev->port); - if (!tty) { - if (test_bit(RFCOMM_RELEASE_ONHUP, &dev->flags)) { - /* Drop DLC lock here to avoid deadlock - * 1. rfcomm_dev_get will take rfcomm_dev_lock - * but in rfcomm_dev_add there's lock order: - * rfcomm_dev_lock -> dlc lock - * 2. tty_port_put will deadlock if it's - * the last reference - * - * FIXME: when we release the lock anything - * could happen to dev, even its destruction - */ - rfcomm_dlc_unlock(dlc); - if (rfcomm_dev_get(dev->id) == NULL) { - rfcomm_dlc_lock(dlc); - return; - } - - if (!test_and_set_bit(RFCOMM_TTY_RELEASED, - &dev->flags)) - tty_port_put(&dev->port); - - tty_port_put(&dev->port); - rfcomm_dlc_lock(dlc); - } - } else { - tty_hangup(tty); - tty_kref_put(tty); - } - } + } else if (dlc->state == BT_CLOSED) + tty_port_tty_hangup(&dev->port, false); } static void rfcomm_dev_modem_status(struct rfcomm_dlc *dlc, u8 v24_sig) diff --git a/net/core/dev.c b/net/core/dev.c index 5c713f2239cc..65f829cfd928 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5247,10 +5247,12 @@ static int dev_new_index(struct net *net) /* Delayed registration/unregisteration */ static LIST_HEAD(net_todo_list); +static DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wq); static void net_set_todo(struct net_device *dev) { list_add_tail(&dev->todo_list, &net_todo_list); + dev_net(dev)->dev_unreg_count++; } static void rollback_registered_many(struct list_head *head) @@ -5918,6 +5920,12 @@ void netdev_run_todo(void) if (dev->destructor) dev->destructor(dev); + /* Report a network device has been unregistered */ + rtnl_lock(); + dev_net(dev)->dev_unreg_count--; + __rtnl_unlock(); + wake_up(&netdev_unregistering_wq); + /* Free network device */ kobject_put(&dev->dev.kobj); } @@ -6603,6 +6611,34 @@ static void __net_exit default_device_exit(struct net *net) rtnl_unlock(); } +static void __net_exit rtnl_lock_unregistering(struct list_head *net_list) +{ + /* Return with the rtnl_lock held when there are no network + * devices unregistering in any network namespace in net_list. + */ + struct net *net; + bool unregistering; + DEFINE_WAIT(wait); + + for (;;) { + prepare_to_wait(&netdev_unregistering_wq, &wait, + TASK_UNINTERRUPTIBLE); + unregistering = false; + rtnl_lock(); + list_for_each_entry(net, net_list, exit_list) { + if (net->dev_unreg_count > 0) { + unregistering = true; + break; + } + } + if (!unregistering) + break; + __rtnl_unlock(); + schedule(); + } + finish_wait(&netdev_unregistering_wq, &wait); +} + static void __net_exit default_device_exit_batch(struct list_head *net_list) { /* At exit all network devices most be removed from a network @@ -6614,7 +6650,18 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list) struct net *net; LIST_HEAD(dev_kill_list); - rtnl_lock(); + /* To prevent network device cleanup code from dereferencing + * loopback devices or network devices that have been freed + * wait here for all pending unregistrations to complete, + * before unregistring the loopback device and allowing the + * network namespace be freed. + * + * The netdev todo list containing all network devices + * unregistrations that happen in default_device_exit_batch + * will run in the rtnl_unlock() at the end of + * default_device_exit_batch. + */ + rtnl_lock_unregistering(net_list); list_for_each_entry(net, net_list, exit_list) { for_each_netdev_reverse(net, dev) { if (dev->rtnl_link_ops) diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 1929af87b260..8d7d0dd72db2 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -154,8 +154,8 @@ ipv6: if (poff >= 0) { __be32 *ports, _ports; - nhoff += poff; - ports = skb_header_pointer(skb, nhoff, sizeof(_ports), &_ports); + ports = skb_header_pointer(skb, nhoff + poff, + sizeof(_ports), &_ports); if (ports) flow->ports = *ports; } diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c index 6a2f13cee86a..3f1ec1586ae1 100644 --- a/net/core/secure_seq.c +++ b/net/core/secure_seq.c @@ -10,11 +10,24 @@ #include -static u32 net_secret[MD5_MESSAGE_BYTES / 4] ____cacheline_aligned; +#define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4) -void net_secret_init(void) +static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned; + +static void net_secret_init(void) { - get_random_bytes(net_secret, sizeof(net_secret)); + u32 tmp; + int i; + + if (likely(net_secret[0])) + return; + + for (i = NET_SECRET_SIZE; i > 0;) { + do { + get_random_bytes(&tmp, sizeof(tmp)); + } while (!tmp); + cmpxchg(&net_secret[--i], 0, tmp); + } } #ifdef CONFIG_INET @@ -42,6 +55,7 @@ __u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr, u32 hash[MD5_DIGEST_WORDS]; u32 i; + net_secret_init(); memcpy(hash, saddr, 16); for (i = 0; i < 4; i++) secret[i] = net_secret[i] + (__force u32)daddr[i]; @@ -63,6 +77,7 @@ u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr, u32 hash[MD5_DIGEST_WORDS]; u32 i; + net_secret_init(); memcpy(hash, saddr, 16); for (i = 0; i < 4; i++) secret[i] = net_secret[i] + (__force u32) daddr[i]; @@ -82,6 +97,7 @@ __u32 secure_ip_id(__be32 daddr) { u32 hash[MD5_DIGEST_WORDS]; + net_secret_init(); hash[0] = (__force __u32) daddr; hash[1] = net_secret[13]; hash[2] = net_secret[14]; @@ -96,6 +112,7 @@ __u32 secure_ipv6_id(const __be32 daddr[4]) { __u32 hash[4]; + net_secret_init(); memcpy(hash, daddr, 16); md5_transform(hash, net_secret); @@ -107,6 +124,7 @@ __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr, { u32 hash[MD5_DIGEST_WORDS]; + net_secret_init(); hash[0] = (__force u32)saddr; hash[1] = (__force u32)daddr; hash[2] = ((__force u16)sport << 16) + (__force u16)dport; @@ -121,6 +139,7 @@ u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport) { u32 hash[MD5_DIGEST_WORDS]; + net_secret_init(); hash[0] = (__force u32)saddr; hash[1] = (__force u32)daddr; hash[2] = (__force u32)dport ^ net_secret[14]; @@ -140,6 +159,7 @@ u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr, u32 hash[MD5_DIGEST_WORDS]; u64 seq; + net_secret_init(); hash[0] = (__force u32)saddr; hash[1] = (__force u32)daddr; hash[2] = ((__force u16)sport << 16) + (__force u16)dport; @@ -164,6 +184,7 @@ u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr, u64 seq; u32 i; + net_secret_init(); memcpy(hash, saddr, 16); for (i = 0; i < 4; i++) secret[i] = net_secret[i] + daddr[i]; diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 7a1874b7b8fd..cfeb85cff4f0 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -263,10 +263,8 @@ void build_ehash_secret(void) get_random_bytes(&rnd, sizeof(rnd)); } while (rnd == 0); - if (cmpxchg(&inet_ehash_secret, 0, rnd) == 0) { + if (cmpxchg(&inet_ehash_secret, 0, rnd) == 0) get_random_bytes(&ipv6_hash_secret, sizeof(ipv6_hash_secret)); - net_secret_init(); - } } EXPORT_SYMBOL(build_ehash_secret); diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index dace87f06e5f..7defdc9ba167 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -736,7 +736,7 @@ static void igmp_gq_timer_expire(unsigned long data) in_dev->mr_gq_running = 0; igmpv3_send_report(in_dev, NULL); - __in_dev_put(in_dev); + in_dev_put(in_dev); } static void igmp_ifc_timer_expire(unsigned long data) @@ -749,7 +749,7 @@ static void igmp_ifc_timer_expire(unsigned long data) igmp_ifc_start_timer(in_dev, unsolicited_report_interval(in_dev)); } - __in_dev_put(in_dev); + in_dev_put(in_dev); } static void igmp_ifc_event(struct in_device *in_dev) diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index ac9fabe0300f..63a6d6d6b875 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -623,6 +623,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, tunnel->err_count = 0; } + tos = ip_tunnel_ecn_encap(tos, inner_iph, skb); ttl = tnl_params->ttl; if (ttl == 0) { if (skb->protocol == htons(ETH_P_IP)) @@ -641,18 +642,17 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, max_headroom = LL_RESERVED_SPACE(rt->dst.dev) + sizeof(struct iphdr) + rt->dst.header_len; - if (max_headroom > dev->needed_headroom) { + if (max_headroom > dev->needed_headroom) dev->needed_headroom = max_headroom; - if (skb_cow_head(skb, dev->needed_headroom)) { - dev->stats.tx_dropped++; - dev_kfree_skb(skb); - return; - } + + if (skb_cow_head(skb, dev->needed_headroom)) { + dev->stats.tx_dropped++; + dev_kfree_skb(skb); + return; } err = iptunnel_xmit(rt, skb, fl4.saddr, fl4.daddr, protocol, - ip_tunnel_ecn_encap(tos, inner_iph, skb), ttl, df, - !net_eq(tunnel->net, dev_net(dev))); + tos, ttl, df, !net_eq(tunnel->net, dev_net(dev))); iptunnel_xmit_stats(err, &dev->stats, dev->tstats); return; @@ -853,8 +853,10 @@ int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id, /* FB netdevice is special: we have one, and only one per netns. * Allowing to move it to another netns is clearly unsafe. */ - if (!IS_ERR(itn->fb_tunnel_dev)) + if (!IS_ERR(itn->fb_tunnel_dev)) { itn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL; + ip_tunnel_add(itn, netdev_priv(itn->fb_tunnel_dev)); + } rtnl_unlock(); return PTR_RET(itn->fb_tunnel_dev); @@ -884,8 +886,6 @@ static void ip_tunnel_destroy(struct ip_tunnel_net *itn, struct list_head *head, if (!net_eq(dev_net(t->dev), net)) unregister_netdevice_queue(t->dev, head); } - if (itn->fb_tunnel_dev) - unregister_netdevice_queue(itn->fb_tunnel_dev, head); } void ip_tunnel_delete_net(struct ip_tunnel_net *itn, struct rtnl_link_ops *ops) diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c index d6c856b17fd4..c31e3ad98ef2 100644 --- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -61,7 +61,7 @@ int iptunnel_xmit(struct rtable *rt, struct sk_buff *skb, memset(IPCB(skb), 0, sizeof(*IPCB(skb))); /* Push down and install the IP header. */ - __skb_push(skb, sizeof(struct iphdr)); + skb_push(skb, sizeof(struct iphdr)); skb_reset_network_header(skb); iph = ip_hdr(skb); diff --git a/net/ipv4/netfilter/ipt_SYNPROXY.c b/net/ipv4/netfilter/ipt_SYNPROXY.c index 67e17dcda65e..b6346bf2fde3 100644 --- a/net/ipv4/netfilter/ipt_SYNPROXY.c +++ b/net/ipv4/netfilter/ipt_SYNPROXY.c @@ -267,7 +267,8 @@ synproxy_tg4(struct sk_buff *skb, const struct xt_action_param *par) if (th == NULL) return NF_DROP; - synproxy_parse_options(skb, par->thoff, th, &opts); + if (!synproxy_parse_options(skb, par->thoff, th, &opts)) + return NF_DROP; if (th->syn && !(th->ack || th->fin || th->rst)) { /* Initial SYN from client */ @@ -350,7 +351,8 @@ static unsigned int ipv4_synproxy_hook(unsigned int hooknum, /* fall through */ case TCP_CONNTRACK_SYN_SENT: - synproxy_parse_options(skb, thoff, th, &opts); + if (!synproxy_parse_options(skb, thoff, th, &opts)) + return NF_DROP; if (!th->syn && th->ack && CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) { @@ -373,7 +375,9 @@ static unsigned int ipv4_synproxy_hook(unsigned int hooknum, if (!th->syn || !th->ack) break; - synproxy_parse_options(skb, thoff, th, &opts); + if (!synproxy_parse_options(skb, thoff, th, &opts)) + return NF_DROP; + if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP) synproxy->tsoff = opts.tsval - synproxy->its; diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index bfec521c717f..193db03540ad 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -218,8 +218,10 @@ static void raw_err(struct sock *sk, struct sk_buff *skb, u32 info) if (type == ICMP_DEST_UNREACH && code == ICMP_FRAG_NEEDED) ipv4_sk_update_pmtu(skb, sk, info); - else if (type == ICMP_REDIRECT) + else if (type == ICMP_REDIRECT) { ipv4_sk_redirect(skb, sk); + return; + } /* Report error on raw socket, if: 1. User requested ip_recverr. diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 7c83cb8bf137..e6bb8256e59f 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -895,8 +895,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, skb_orphan(skb); skb->sk = sk; - skb->destructor = (sysctl_tcp_limit_output_bytes > 0) ? - tcp_wfree : sock_wfree; + skb->destructor = tcp_wfree; atomic_add(skb->truesize, &sk->sk_wmem_alloc); /* Build TCP header and checksum it. */ @@ -1840,7 +1839,6 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, while ((skb = tcp_send_head(sk))) { unsigned int limit; - tso_segs = tcp_init_tso_segs(sk, skb, mss_now); BUG_ON(!tso_segs); @@ -1869,13 +1867,20 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, break; } - /* TSQ : sk_wmem_alloc accounts skb truesize, - * including skb overhead. But thats OK. + /* TCP Small Queues : + * Control number of packets in qdisc/devices to two packets / or ~1 ms. + * This allows for : + * - better RTT estimation and ACK scheduling + * - faster recovery + * - high rates */ - if (atomic_read(&sk->sk_wmem_alloc) >= sysctl_tcp_limit_output_bytes) { + limit = max(skb->truesize, sk->sk_pacing_rate >> 10); + + if (atomic_read(&sk->sk_wmem_alloc) > limit) { set_bit(TSQ_THROTTLED, &tp->tsq_flags); break; } + limit = mss_now; if (tso_segs > 1 && !tcp_urg_mode(tp)) limit = tcp_mss_split_point(sk, skb, mss_now, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 74d2c95db57f..0ca44df51ee9 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -658,7 +658,7 @@ void __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable) break; case ICMP_REDIRECT: ipv4_sk_redirect(skb, sk); - break; + goto out; } /* diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index d6ff12617f36..cd3fb301da38 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -1499,6 +1499,33 @@ static bool ipv6_chk_same_addr(struct net *net, const struct in6_addr *addr, return false; } +/* Compares an address/prefix_len with addresses on device @dev. + * If one is found it returns true. + */ +bool ipv6_chk_custom_prefix(const struct in6_addr *addr, + const unsigned int prefix_len, struct net_device *dev) +{ + struct inet6_dev *idev; + struct inet6_ifaddr *ifa; + bool ret = false; + + rcu_read_lock(); + idev = __in6_dev_get(dev); + if (idev) { + read_lock_bh(&idev->lock); + list_for_each_entry(ifa, &idev->addr_list, if_list) { + ret = ipv6_prefix_equal(addr, &ifa->addr, prefix_len); + if (ret) + break; + } + read_unlock_bh(&idev->lock); + } + rcu_read_unlock(); + + return ret; +} +EXPORT_SYMBOL(ipv6_chk_custom_prefix); + int ipv6_chk_prefix(const struct in6_addr *addr, struct net_device *dev) { struct inet6_dev *idev; @@ -2193,43 +2220,21 @@ ok: else stored_lft = 0; if (!update_lft && !create && stored_lft) { - if (valid_lft > MIN_VALID_LIFETIME || - valid_lft > stored_lft) - update_lft = 1; - else if (stored_lft <= MIN_VALID_LIFETIME) { - /* valid_lft <= stored_lft is always true */ - /* - * RFC 4862 Section 5.5.3e: - * "Note that the preferred lifetime of - * the corresponding address is always - * reset to the Preferred Lifetime in - * the received Prefix Information - * option, regardless of whether the - * valid lifetime is also reset or - * ignored." - * - * So if the preferred lifetime in - * this advertisement is different - * than what we have stored, but the - * valid lifetime is invalid, just - * reset prefered_lft. - * - * We must set the valid lifetime - * to the stored lifetime since we'll - * be updating the timestamp below, - * else we'll set it back to the - * minimum. - */ - if (prefered_lft != ifp->prefered_lft) { - valid_lft = stored_lft; - update_lft = 1; - } - } else { - valid_lft = MIN_VALID_LIFETIME; - if (valid_lft < prefered_lft) - prefered_lft = valid_lft; - update_lft = 1; - } + const u32 minimum_lft = min( + stored_lft, (u32)MIN_VALID_LIFETIME); + valid_lft = max(valid_lft, minimum_lft); + + /* RFC4862 Section 5.5.3e: + * "Note that the preferred lifetime of the + * corresponding address is always reset to + * the Preferred Lifetime in the received + * Prefix Information option, regardless of + * whether the valid lifetime is also reset or + * ignored." + * + * So we should always update prefered_lft here. + */ + update_lft = 1; } if (update_lft) { diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index 6b26e9feafb9..7bb5446b9d73 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -618,7 +618,7 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb, struct ip6_tnl *tunnel = netdev_priv(dev); struct net_device *tdev; /* Device to other host */ struct ipv6hdr *ipv6h; /* Our new IP header */ - unsigned int max_headroom; /* The extra header space needed */ + unsigned int max_headroom = 0; /* The extra header space needed */ int gre_hlen; struct ipv6_tel_txoption opt; int mtu; @@ -693,7 +693,7 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb, skb_scrub_packet(skb, !net_eq(tunnel->net, dev_net(dev))); - max_headroom = LL_RESERVED_SPACE(tdev) + gre_hlen + dst->header_len; + max_headroom += LL_RESERVED_SPACE(tdev) + gre_hlen + dst->header_len; if (skb_headroom(skb) < max_headroom || skb_shared(skb) || (skb_cloned(skb) && !skb_clone_writable(skb, 0))) { diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 3a692d529163..a54c45ce4a48 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1015,6 +1015,8 @@ static inline int ip6_ufo_append_data(struct sock *sk, * udp datagram */ if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) { + struct frag_hdr fhdr; + skb = sock_alloc_send_skb(sk, hh_len + fragheaderlen + transhdrlen + 20, (flags & MSG_DONTWAIT), &err); @@ -1036,12 +1038,6 @@ static inline int ip6_ufo_append_data(struct sock *sk, skb->protocol = htons(ETH_P_IPV6); skb->ip_summed = CHECKSUM_PARTIAL; skb->csum = 0; - } - - err = skb_append_datato_frags(sk,skb, getfrag, from, - (length - transhdrlen)); - if (!err) { - struct frag_hdr fhdr; /* Specify the length of each IPv6 datagram fragment. * It has to be a multiple of 8. @@ -1052,15 +1048,10 @@ static inline int ip6_ufo_append_data(struct sock *sk, ipv6_select_ident(&fhdr, rt); skb_shinfo(skb)->ip6_frag_id = fhdr.identification; __skb_queue_tail(&sk->sk_write_queue, skb); - - return 0; } - /* There is not enough support do UPD LSO, - * so follow normal path - */ - kfree_skb(skb); - return err; + return skb_append_datato_frags(sk, skb, getfrag, from, + (length - transhdrlen)); } static inline struct ipv6_opt_hdr *ip6_opt_dup(struct ipv6_opt_hdr *src, @@ -1227,27 +1218,27 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, * --yoshfuji */ - cork->length += length; - if (length > mtu) { - int proto = sk->sk_protocol; - if (dontfrag && (proto == IPPROTO_UDP || proto == IPPROTO_RAW)){ - ipv6_local_rxpmtu(sk, fl6, mtu-exthdrlen); - return -EMSGSIZE; - } - - if (proto == IPPROTO_UDP && - (rt->dst.dev->features & NETIF_F_UFO)) { + if ((length > mtu) && dontfrag && (sk->sk_protocol == IPPROTO_UDP || + sk->sk_protocol == IPPROTO_RAW)) { + ipv6_local_rxpmtu(sk, fl6, mtu-exthdrlen); + return -EMSGSIZE; + } - err = ip6_ufo_append_data(sk, getfrag, from, length, - hh_len, fragheaderlen, - transhdrlen, mtu, flags, rt); - if (err) - goto error; - return 0; - } + skb = skb_peek_tail(&sk->sk_write_queue); + cork->length += length; + if (((length > mtu) || + (skb && skb_is_gso(skb))) && + (sk->sk_protocol == IPPROTO_UDP) && + (rt->dst.dev->features & NETIF_F_UFO)) { + err = ip6_ufo_append_data(sk, getfrag, from, length, + hh_len, fragheaderlen, + transhdrlen, mtu, flags, rt); + if (err) + goto error; + return 0; } - if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) + if (!skb) goto alloc_new_skb; while (length > 0) { diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 2d8f4829575b..a791552e0422 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1731,8 +1731,6 @@ static void __net_exit ip6_tnl_destroy_tunnels(struct ip6_tnl_net *ip6n) } } - t = rtnl_dereference(ip6n->tnls_wc[0]); - unregister_netdevice_queue(t->dev, &list); unregister_netdevice_many(&list); } @@ -1752,6 +1750,7 @@ static int __net_init ip6_tnl_init_net(struct net *net) if (!ip6n->fb_tnl_dev) goto err_alloc_dev; dev_net_set(ip6n->fb_tnl_dev, net); + ip6n->fb_tnl_dev->rtnl_link_ops = &ip6_link_ops; /* FB netdevice is special: we have one, and only one per netns. * Allowing to move it to another netns is clearly unsafe. */ diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c index 096cd67b737c..d18f9f903db6 100644 --- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -2034,7 +2034,7 @@ static void mld_dad_timer_expire(unsigned long data) if (idev->mc_dad_count) mld_dad_start_timer(idev, idev->mc_maxdelay); } - __in6_dev_put(idev); + in6_dev_put(idev); } static int ip6_mc_del1_src(struct ifmcaddr6 *pmc, int sfmode, @@ -2379,7 +2379,7 @@ static void mld_gq_timer_expire(unsigned long data) idev->mc_gq_running = 0; mld_send_report(idev, NULL); - __in6_dev_put(idev); + in6_dev_put(idev); } static void mld_ifc_timer_expire(unsigned long data) @@ -2392,7 +2392,7 @@ static void mld_ifc_timer_expire(unsigned long data) if (idev->mc_ifc_count) mld_ifc_start_timer(idev, idev->mc_maxdelay); } - __in6_dev_put(idev); + in6_dev_put(idev); } static void mld_ifc_event(struct inet6_dev *idev) diff --git a/net/ipv6/netfilter/ip6t_SYNPROXY.c b/net/ipv6/netfilter/ip6t_SYNPROXY.c index 19cfea8dbcaa..2748b042da72 100644 --- a/net/ipv6/netfilter/ip6t_SYNPROXY.c +++ b/net/ipv6/netfilter/ip6t_SYNPROXY.c @@ -282,7 +282,8 @@ synproxy_tg6(struct sk_buff *skb, const struct xt_action_param *par) if (th == NULL) return NF_DROP; - synproxy_parse_options(skb, par->thoff, th, &opts); + if (!synproxy_parse_options(skb, par->thoff, th, &opts)) + return NF_DROP; if (th->syn && !(th->ack || th->fin || th->rst)) { /* Initial SYN from client */ @@ -372,7 +373,8 @@ static unsigned int ipv6_synproxy_hook(unsigned int hooknum, /* fall through */ case TCP_CONNTRACK_SYN_SENT: - synproxy_parse_options(skb, thoff, th, &opts); + if (!synproxy_parse_options(skb, thoff, th, &opts)) + return NF_DROP; if (!th->syn && th->ack && CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) { @@ -395,7 +397,9 @@ static unsigned int ipv6_synproxy_hook(unsigned int hooknum, if (!th->syn || !th->ack) break; - synproxy_parse_options(skb, thoff, th, &opts); + if (!synproxy_parse_options(skb, thoff, th, &opts)) + return NF_DROP; + if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP) synproxy->tsoff = opts.tsval - synproxy->its; diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 58916bbb1728..a4ed2416399e 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -335,8 +335,10 @@ static void rawv6_err(struct sock *sk, struct sk_buff *skb, ip6_sk_update_pmtu(skb, sk, info); harderr = (np->pmtudisc == IPV6_PMTUDISC_DO); } - if (type == NDISC_REDIRECT) + if (type == NDISC_REDIRECT) { ip6_sk_redirect(skb, sk); + return; + } if (np->recverr) { u8 *payload = skb->data; if (!inet->hdrincl) diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index 7ee5cb96db34..19269453a8ea 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -566,6 +566,70 @@ static inline bool is_spoofed_6rd(struct ip_tunnel *tunnel, const __be32 v4addr, return false; } +/* Checks if an address matches an address on the tunnel interface. + * Used to detect the NAT of proto 41 packets and let them pass spoofing test. + * Long story: + * This function is called after we considered the packet as spoofed + * in is_spoofed_6rd. + * We may have a router that is doing NAT for proto 41 packets + * for an internal station. Destination a.a.a.a/PREFIX:bbbb:bbbb + * will be translated to n.n.n.n/PREFIX:bbbb:bbbb. And is_spoofed_6rd + * function will return true, dropping the packet. + * But, we can still check if is spoofed against the IP + * addresses associated with the interface. + */ +static bool only_dnatted(const struct ip_tunnel *tunnel, + const struct in6_addr *v6dst) +{ + int prefix_len; + +#ifdef CONFIG_IPV6_SIT_6RD + prefix_len = tunnel->ip6rd.prefixlen + 32 + - tunnel->ip6rd.relay_prefixlen; +#else + prefix_len = 48; +#endif + return ipv6_chk_custom_prefix(v6dst, prefix_len, tunnel->dev); +} + +/* Returns true if a packet is spoofed */ +static bool packet_is_spoofed(struct sk_buff *skb, + const struct iphdr *iph, + struct ip_tunnel *tunnel) +{ + const struct ipv6hdr *ipv6h; + + if (tunnel->dev->priv_flags & IFF_ISATAP) { + if (!isatap_chksrc(skb, iph, tunnel)) + return true; + + return false; + } + + if (tunnel->dev->flags & IFF_POINTOPOINT) + return false; + + ipv6h = ipv6_hdr(skb); + + if (unlikely(is_spoofed_6rd(tunnel, iph->saddr, &ipv6h->saddr))) { + net_warn_ratelimited("Src spoofed %pI4/%pI6c -> %pI4/%pI6c\n", + &iph->saddr, &ipv6h->saddr, + &iph->daddr, &ipv6h->daddr); + return true; + } + + if (likely(!is_spoofed_6rd(tunnel, iph->daddr, &ipv6h->daddr))) + return false; + + if (only_dnatted(tunnel, &ipv6h->daddr)) + return false; + + net_warn_ratelimited("Dst spoofed %pI4/%pI6c -> %pI4/%pI6c\n", + &iph->saddr, &ipv6h->saddr, + &iph->daddr, &ipv6h->daddr); + return true; +} + static int ipip6_rcv(struct sk_buff *skb) { const struct iphdr *iph = ip_hdr(skb); @@ -586,19 +650,9 @@ static int ipip6_rcv(struct sk_buff *skb) IPCB(skb)->flags = 0; skb->protocol = htons(ETH_P_IPV6); - if (tunnel->dev->priv_flags & IFF_ISATAP) { - if (!isatap_chksrc(skb, iph, tunnel)) { - tunnel->dev->stats.rx_errors++; - goto out; - } - } else if (!(tunnel->dev->flags&IFF_POINTOPOINT)) { - if (is_spoofed_6rd(tunnel, iph->saddr, - &ipv6_hdr(skb)->saddr) || - is_spoofed_6rd(tunnel, iph->daddr, - &ipv6_hdr(skb)->daddr)) { - tunnel->dev->stats.rx_errors++; - goto out; - } + if (packet_is_spoofed(skb, iph, tunnel)) { + tunnel->dev->stats.rx_errors++; + goto out; } __skb_tunnel_rx(skb, tunnel->dev, tunnel->net); @@ -748,7 +802,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb, neigh = dst_neigh_lookup(skb_dst(skb), &iph6->daddr); if (neigh == NULL) { - net_dbg_ratelimited("sit: nexthop == NULL\n"); + net_dbg_ratelimited("nexthop == NULL\n"); goto tx_error; } @@ -777,7 +831,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb, neigh = dst_neigh_lookup(skb_dst(skb), &iph6->daddr); if (neigh == NULL) { - net_dbg_ratelimited("sit: nexthop == NULL\n"); + net_dbg_ratelimited("nexthop == NULL\n"); goto tx_error; } @@ -1612,6 +1666,7 @@ static int __net_init sit_init_net(struct net *net) goto err_alloc_dev; } dev_net_set(sitn->fb_tunnel_dev, net); + sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops; /* FB netdevice is special: we have one, and only one per netns. * Allowing to move it to another netns is clearly unsafe. */ @@ -1646,7 +1701,6 @@ static void __net_exit sit_exit_net(struct net *net) rtnl_lock(); sit_destroy_tunnels(sitn, &list); - unregister_netdevice_queue(sitn->fb_tunnel_dev, &list); unregister_netdevice_many(&list); rtnl_unlock(); } diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index f4058150262b..72b7eaaf3ca0 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -525,8 +525,10 @@ void __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt, if (type == ICMPV6_PKT_TOOBIG) ip6_sk_update_pmtu(skb, sk, info); - if (type == NDISC_REDIRECT) + if (type == NDISC_REDIRECT) { ip6_sk_redirect(skb, sk); + goto out; + } np = inet6_sk(sk); diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c index 0578d4fa00a9..0f676908d15b 100644 --- a/net/irda/af_irda.c +++ b/net/irda/af_irda.c @@ -2563,9 +2563,8 @@ bed: jiffies + msecs_to_jiffies(val)); /* Wait for IR-LMP to call us back */ - __wait_event_interruptible(self->query_wait, - (self->cachedaddr != 0 || self->errno == -ETIME), - err); + err = __wait_event_interruptible(self->query_wait, + (self->cachedaddr != 0 || self->errno == -ETIME)); /* If watchdog is still activated, kill it! */ del_timer(&(self->watchdog)); diff --git a/net/lapb/lapb_timer.c b/net/lapb/lapb_timer.c index 54563ad8aeb1..355cc3b6fa4d 100644 --- a/net/lapb/lapb_timer.c +++ b/net/lapb/lapb_timer.c @@ -154,6 +154,7 @@ static void lapb_t1timer_expiry(unsigned long param) } else { lapb->n2count++; lapb_requeue_frames(lapb); + lapb_kick(lapb); } break; diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c index 4f69e83ff836..74fd00c27210 100644 --- a/net/netfilter/ipvs/ip_vs_core.c +++ b/net/netfilter/ipvs/ip_vs_core.c @@ -116,6 +116,7 @@ ip_vs_in_stats(struct ip_vs_conn *cp, struct sk_buff *skb) if (dest && (dest->flags & IP_VS_DEST_F_AVAILABLE)) { struct ip_vs_cpu_stats *s; + struct ip_vs_service *svc; s = this_cpu_ptr(dest->stats.cpustats); s->ustats.inpkts++; @@ -123,11 +124,14 @@ ip_vs_in_stats(struct ip_vs_conn *cp, struct sk_buff *skb) s->ustats.inbytes += skb->len; u64_stats_update_end(&s->syncp); - s = this_cpu_ptr(dest->svc->stats.cpustats); + rcu_read_lock(); + svc = rcu_dereference(dest->svc); + s = this_cpu_ptr(svc->stats.cpustats); s->ustats.inpkts++; u64_stats_update_begin(&s->syncp); s->ustats.inbytes += skb->len; u64_stats_update_end(&s->syncp); + rcu_read_unlock(); s = this_cpu_ptr(ipvs->tot_stats.cpustats); s->ustats.inpkts++; @@ -146,6 +150,7 @@ ip_vs_out_stats(struct ip_vs_conn *cp, struct sk_buff *skb) if (dest && (dest->flags & IP_VS_DEST_F_AVAILABLE)) { struct ip_vs_cpu_stats *s; + struct ip_vs_service *svc; s = this_cpu_ptr(dest->stats.cpustats); s->ustats.outpkts++; @@ -153,11 +158,14 @@ ip_vs_out_stats(struct ip_vs_conn *cp, struct sk_buff *skb) s->ustats.outbytes += skb->len; u64_stats_update_end(&s->syncp); - s = this_cpu_ptr(dest->svc->stats.cpustats); + rcu_read_lock(); + svc = rcu_dereference(dest->svc); + s = this_cpu_ptr(svc->stats.cpustats); s->ustats.outpkts++; u64_stats_update_begin(&s->syncp); s->ustats.outbytes += skb->len; u64_stats_update_end(&s->syncp); + rcu_read_unlock(); s = this_cpu_ptr(ipvs->tot_stats.cpustats); s->ustats.outpkts++; diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index c8148e487386..a3df9bddc4f7 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -460,7 +460,7 @@ static inline void __ip_vs_bind_svc(struct ip_vs_dest *dest, struct ip_vs_service *svc) { atomic_inc(&svc->refcnt); - dest->svc = svc; + rcu_assign_pointer(dest->svc, svc); } static void ip_vs_service_free(struct ip_vs_service *svc) @@ -470,18 +470,25 @@ static void ip_vs_service_free(struct ip_vs_service *svc) kfree(svc); } -static void -__ip_vs_unbind_svc(struct ip_vs_dest *dest) +static void ip_vs_service_rcu_free(struct rcu_head *head) { - struct ip_vs_service *svc = dest->svc; + struct ip_vs_service *svc; + + svc = container_of(head, struct ip_vs_service, rcu_head); + ip_vs_service_free(svc); +} - dest->svc = NULL; +static void __ip_vs_svc_put(struct ip_vs_service *svc, bool do_delay) +{ if (atomic_dec_and_test(&svc->refcnt)) { IP_VS_DBG_BUF(3, "Removing service %u/%s:%u\n", svc->fwmark, IP_VS_DBG_ADDR(svc->af, &svc->addr), ntohs(svc->port)); - ip_vs_service_free(svc); + if (do_delay) + call_rcu(&svc->rcu_head, ip_vs_service_rcu_free); + else + ip_vs_service_free(svc); } } @@ -667,11 +674,6 @@ ip_vs_trash_get_dest(struct ip_vs_service *svc, const union nf_inet_addr *daddr, IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port), atomic_read(&dest->refcnt)); - /* We can not reuse dest while in grace period - * because conns still can use dest->svc - */ - if (test_bit(IP_VS_DEST_STATE_REMOVING, &dest->state)) - continue; if (dest->af == svc->af && ip_vs_addr_equal(svc->af, &dest->addr, daddr) && dest->port == dport && @@ -697,8 +699,10 @@ out: static void ip_vs_dest_free(struct ip_vs_dest *dest) { + struct ip_vs_service *svc = rcu_dereference_protected(dest->svc, 1); + __ip_vs_dst_cache_reset(dest); - __ip_vs_unbind_svc(dest); + __ip_vs_svc_put(svc, false); free_percpu(dest->stats.cpustats); kfree(dest); } @@ -771,6 +775,7 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest, struct ip_vs_dest_user_kern *udest, int add) { struct netns_ipvs *ipvs = net_ipvs(svc->net); + struct ip_vs_service *old_svc; struct ip_vs_scheduler *sched; int conn_flags; @@ -792,13 +797,14 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest, atomic_set(&dest->conn_flags, conn_flags); /* bind the service */ - if (!dest->svc) { + old_svc = rcu_dereference_protected(dest->svc, 1); + if (!old_svc) { __ip_vs_bind_svc(dest, svc); } else { - if (dest->svc != svc) { - __ip_vs_unbind_svc(dest); + if (old_svc != svc) { ip_vs_zero_stats(&dest->stats); __ip_vs_bind_svc(dest, svc); + __ip_vs_svc_put(old_svc, true); } } @@ -998,16 +1004,6 @@ ip_vs_edit_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest) return 0; } -static void ip_vs_dest_wait_readers(struct rcu_head *head) -{ - struct ip_vs_dest *dest = container_of(head, struct ip_vs_dest, - rcu_head); - - /* End of grace period after unlinking */ - clear_bit(IP_VS_DEST_STATE_REMOVING, &dest->state); -} - - /* * Delete a destination (must be already unlinked from the service) */ @@ -1023,20 +1019,16 @@ static void __ip_vs_del_dest(struct net *net, struct ip_vs_dest *dest, */ ip_vs_rs_unhash(dest); - if (!cleanup) { - set_bit(IP_VS_DEST_STATE_REMOVING, &dest->state); - call_rcu(&dest->rcu_head, ip_vs_dest_wait_readers); - } - spin_lock_bh(&ipvs->dest_trash_lock); IP_VS_DBG_BUF(3, "Moving dest %s:%u into trash, dest->refcnt=%d\n", IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port), atomic_read(&dest->refcnt)); if (list_empty(&ipvs->dest_trash) && !cleanup) mod_timer(&ipvs->dest_trash_timer, - jiffies + IP_VS_DEST_TRASH_PERIOD); + jiffies + (IP_VS_DEST_TRASH_PERIOD >> 1)); /* dest lives in trash without reference */ list_add(&dest->t_list, &ipvs->dest_trash); + dest->idle_start = 0; spin_unlock_bh(&ipvs->dest_trash_lock); ip_vs_dest_put(dest); } @@ -1108,24 +1100,30 @@ static void ip_vs_dest_trash_expire(unsigned long data) struct net *net = (struct net *) data; struct netns_ipvs *ipvs = net_ipvs(net); struct ip_vs_dest *dest, *next; + unsigned long now = jiffies; spin_lock(&ipvs->dest_trash_lock); list_for_each_entry_safe(dest, next, &ipvs->dest_trash, t_list) { - /* Skip if dest is in grace period */ - if (test_bit(IP_VS_DEST_STATE_REMOVING, &dest->state)) - continue; if (atomic_read(&dest->refcnt) > 0) continue; + if (dest->idle_start) { + if (time_before(now, dest->idle_start + + IP_VS_DEST_TRASH_PERIOD)) + continue; + } else { + dest->idle_start = max(1UL, now); + continue; + } IP_VS_DBG_BUF(3, "Removing destination %u/%s:%u from trash\n", dest->vfwmark, - IP_VS_DBG_ADDR(dest->svc->af, &dest->addr), + IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port)); list_del(&dest->t_list); ip_vs_dest_free(dest); } if (!list_empty(&ipvs->dest_trash)) mod_timer(&ipvs->dest_trash_timer, - jiffies + IP_VS_DEST_TRASH_PERIOD); + jiffies + (IP_VS_DEST_TRASH_PERIOD >> 1)); spin_unlock(&ipvs->dest_trash_lock); } @@ -1320,14 +1318,6 @@ out: return ret; } -static void ip_vs_service_rcu_free(struct rcu_head *head) -{ - struct ip_vs_service *svc; - - svc = container_of(head, struct ip_vs_service, rcu_head); - ip_vs_service_free(svc); -} - /* * Delete a service from the service list * - The service must be unlinked, unlocked and not referenced! @@ -1376,13 +1366,7 @@ static void __ip_vs_del_service(struct ip_vs_service *svc, bool cleanup) /* * Free the service if nobody refers to it */ - if (atomic_dec_and_test(&svc->refcnt)) { - IP_VS_DBG_BUF(3, "Removing service %u/%s:%u\n", - svc->fwmark, - IP_VS_DBG_ADDR(svc->af, &svc->addr), - ntohs(svc->port)); - call_rcu(&svc->rcu_head, ip_vs_service_rcu_free); - } + __ip_vs_svc_put(svc, true); /* decrease the module use count */ ip_vs_use_count_dec(); diff --git a/net/netfilter/ipvs/ip_vs_est.c b/net/netfilter/ipvs/ip_vs_est.c index 6bee6d0c73a5..1425e9a924c4 100644 --- a/net/netfilter/ipvs/ip_vs_est.c +++ b/net/netfilter/ipvs/ip_vs_est.c @@ -59,12 +59,13 @@ static void ip_vs_read_cpu_stats(struct ip_vs_stats_user *sum, struct ip_vs_cpu_stats __percpu *stats) { int i; + bool add = false; for_each_possible_cpu(i) { struct ip_vs_cpu_stats *s = per_cpu_ptr(stats, i); unsigned int start; __u64 inbytes, outbytes; - if (i) { + if (add) { sum->conns += s->ustats.conns; sum->inpkts += s->ustats.inpkts; sum->outpkts += s->ustats.outpkts; @@ -76,6 +77,7 @@ static void ip_vs_read_cpu_stats(struct ip_vs_stats_user *sum, sum->inbytes += inbytes; sum->outbytes += outbytes; } else { + add = true; sum->conns = s->ustats.conns; sum->inpkts = s->ustats.inpkts; sum->outpkts = s->ustats.outpkts; diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c index 1383b0eadc0e..eff13c94498e 100644 --- a/net/netfilter/ipvs/ip_vs_lblc.c +++ b/net/netfilter/ipvs/ip_vs_lblc.c @@ -93,7 +93,7 @@ struct ip_vs_lblc_entry { struct hlist_node list; int af; /* address family */ union nf_inet_addr addr; /* destination IP address */ - struct ip_vs_dest __rcu *dest; /* real server (cache) */ + struct ip_vs_dest *dest; /* real server (cache) */ unsigned long lastuse; /* last used time */ struct rcu_head rcu_head; }; @@ -130,20 +130,21 @@ static struct ctl_table vs_vars_table[] = { }; #endif -static inline void ip_vs_lblc_free(struct ip_vs_lblc_entry *en) +static void ip_vs_lblc_rcu_free(struct rcu_head *head) { - struct ip_vs_dest *dest; + struct ip_vs_lblc_entry *en = container_of(head, + struct ip_vs_lblc_entry, + rcu_head); - hlist_del_rcu(&en->list); - /* - * We don't kfree dest because it is referred either by its service - * or the trash dest list. - */ - dest = rcu_dereference_protected(en->dest, 1); - ip_vs_dest_put(dest); - kfree_rcu(en, rcu_head); + ip_vs_dest_put(en->dest); + kfree(en); } +static inline void ip_vs_lblc_del(struct ip_vs_lblc_entry *en) +{ + hlist_del_rcu(&en->list); + call_rcu(&en->rcu_head, ip_vs_lblc_rcu_free); +} /* * Returns hash value for IPVS LBLC entry @@ -203,30 +204,23 @@ ip_vs_lblc_new(struct ip_vs_lblc_table *tbl, const union nf_inet_addr *daddr, struct ip_vs_lblc_entry *en; en = ip_vs_lblc_get(dest->af, tbl, daddr); - if (!en) { - en = kmalloc(sizeof(*en), GFP_ATOMIC); - if (!en) - return NULL; - - en->af = dest->af; - ip_vs_addr_copy(dest->af, &en->addr, daddr); - en->lastuse = jiffies; + if (en) { + if (en->dest == dest) + return en; + ip_vs_lblc_del(en); + } + en = kmalloc(sizeof(*en), GFP_ATOMIC); + if (!en) + return NULL; - ip_vs_dest_hold(dest); - RCU_INIT_POINTER(en->dest, dest); + en->af = dest->af; + ip_vs_addr_copy(dest->af, &en->addr, daddr); + en->lastuse = jiffies; - ip_vs_lblc_hash(tbl, en); - } else { - struct ip_vs_dest *old_dest; + ip_vs_dest_hold(dest); + en->dest = dest; - old_dest = rcu_dereference_protected(en->dest, 1); - if (old_dest != dest) { - ip_vs_dest_put(old_dest); - ip_vs_dest_hold(dest); - /* No ordering constraints for refcnt */ - RCU_INIT_POINTER(en->dest, dest); - } - } + ip_vs_lblc_hash(tbl, en); return en; } @@ -246,7 +240,7 @@ static void ip_vs_lblc_flush(struct ip_vs_service *svc) tbl->dead = 1; for (i=0; ibucket[i], list) { - ip_vs_lblc_free(en); + ip_vs_lblc_del(en); atomic_dec(&tbl->entries); } } @@ -281,7 +275,7 @@ static inline void ip_vs_lblc_full_check(struct ip_vs_service *svc) sysctl_lblc_expiration(svc))) continue; - ip_vs_lblc_free(en); + ip_vs_lblc_del(en); atomic_dec(&tbl->entries); } spin_unlock(&svc->sched_lock); @@ -335,7 +329,7 @@ static void ip_vs_lblc_check_expire(unsigned long data) if (time_before(now, en->lastuse + ENTRY_TIMEOUT)) continue; - ip_vs_lblc_free(en); + ip_vs_lblc_del(en); atomic_dec(&tbl->entries); goal--; } @@ -443,8 +437,8 @@ __ip_vs_lblc_schedule(struct ip_vs_service *svc) continue; doh = ip_vs_dest_conn_overhead(dest); - if (loh * atomic_read(&dest->weight) > - doh * atomic_read(&least->weight)) { + if ((__s64)loh * atomic_read(&dest->weight) > + (__s64)doh * atomic_read(&least->weight)) { least = dest; loh = doh; } @@ -511,7 +505,7 @@ ip_vs_lblc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb, * free up entries from the trash at any time. */ - dest = rcu_dereference(en->dest); + dest = en->dest; if ((dest->flags & IP_VS_DEST_F_AVAILABLE) && atomic_read(&dest->weight) > 0 && !is_overloaded(dest, svc)) goto out; @@ -631,7 +625,7 @@ static void __exit ip_vs_lblc_cleanup(void) { unregister_ip_vs_scheduler(&ip_vs_lblc_scheduler); unregister_pernet_subsys(&ip_vs_lblc_ops); - synchronize_rcu(); + rcu_barrier(); } diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c index 5199448697f6..0b8550089a2e 100644 --- a/net/netfilter/ipvs/ip_vs_lblcr.c +++ b/net/netfilter/ipvs/ip_vs_lblcr.c @@ -89,7 +89,7 @@ */ struct ip_vs_dest_set_elem { struct list_head list; /* list link */ - struct ip_vs_dest __rcu *dest; /* destination server */ + struct ip_vs_dest *dest; /* destination server */ struct rcu_head rcu_head; }; @@ -107,11 +107,7 @@ static void ip_vs_dest_set_insert(struct ip_vs_dest_set *set, if (check) { list_for_each_entry(e, &set->list, list) { - struct ip_vs_dest *d; - - d = rcu_dereference_protected(e->dest, 1); - if (d == dest) - /* already existed */ + if (e->dest == dest) return; } } @@ -121,7 +117,7 @@ static void ip_vs_dest_set_insert(struct ip_vs_dest_set *set, return; ip_vs_dest_hold(dest); - RCU_INIT_POINTER(e->dest, dest); + e->dest = dest; list_add_rcu(&e->list, &set->list); atomic_inc(&set->size); @@ -129,22 +125,27 @@ static void ip_vs_dest_set_insert(struct ip_vs_dest_set *set, set->lastmod = jiffies; } +static void ip_vs_lblcr_elem_rcu_free(struct rcu_head *head) +{ + struct ip_vs_dest_set_elem *e; + + e = container_of(head, struct ip_vs_dest_set_elem, rcu_head); + ip_vs_dest_put(e->dest); + kfree(e); +} + static void ip_vs_dest_set_erase(struct ip_vs_dest_set *set, struct ip_vs_dest *dest) { struct ip_vs_dest_set_elem *e; list_for_each_entry(e, &set->list, list) { - struct ip_vs_dest *d; - - d = rcu_dereference_protected(e->dest, 1); - if (d == dest) { + if (e->dest == dest) { /* HIT */ atomic_dec(&set->size); set->lastmod = jiffies; - ip_vs_dest_put(dest); list_del_rcu(&e->list); - kfree_rcu(e, rcu_head); + call_rcu(&e->rcu_head, ip_vs_lblcr_elem_rcu_free); break; } } @@ -155,16 +156,8 @@ static void ip_vs_dest_set_eraseall(struct ip_vs_dest_set *set) struct ip_vs_dest_set_elem *e, *ep; list_for_each_entry_safe(e, ep, &set->list, list) { - struct ip_vs_dest *d; - - d = rcu_dereference_protected(e->dest, 1); - /* - * We don't kfree dest because it is referred either - * by its service or by the trash dest list. - */ - ip_vs_dest_put(d); list_del_rcu(&e->list); - kfree_rcu(e, rcu_head); + call_rcu(&e->rcu_head, ip_vs_lblcr_elem_rcu_free); } } @@ -175,12 +168,9 @@ static inline struct ip_vs_dest *ip_vs_dest_set_min(struct ip_vs_dest_set *set) struct ip_vs_dest *dest, *least; int loh, doh; - if (set == NULL) - return NULL; - /* select the first destination server, whose weight > 0 */ list_for_each_entry_rcu(e, &set->list, list) { - least = rcu_dereference(e->dest); + least = e->dest; if (least->flags & IP_VS_DEST_F_OVERLOAD) continue; @@ -195,13 +185,13 @@ static inline struct ip_vs_dest *ip_vs_dest_set_min(struct ip_vs_dest_set *set) /* find the destination with the weighted least load */ nextstage: list_for_each_entry_continue_rcu(e, &set->list, list) { - dest = rcu_dereference(e->dest); + dest = e->dest; if (dest->flags & IP_VS_DEST_F_OVERLOAD) continue; doh = ip_vs_dest_conn_overhead(dest); - if ((loh * atomic_read(&dest->weight) > - doh * atomic_read(&least->weight)) + if (((__s64)loh * atomic_read(&dest->weight) > + (__s64)doh * atomic_read(&least->weight)) && (dest->flags & IP_VS_DEST_F_AVAILABLE)) { least = dest; loh = doh; @@ -232,7 +222,7 @@ static inline struct ip_vs_dest *ip_vs_dest_set_max(struct ip_vs_dest_set *set) /* select the first destination server, whose weight > 0 */ list_for_each_entry(e, &set->list, list) { - most = rcu_dereference_protected(e->dest, 1); + most = e->dest; if (atomic_read(&most->weight) > 0) { moh = ip_vs_dest_conn_overhead(most); goto nextstage; @@ -243,11 +233,11 @@ static inline struct ip_vs_dest *ip_vs_dest_set_max(struct ip_vs_dest_set *set) /* find the destination with the weighted most load */ nextstage: list_for_each_entry_continue(e, &set->list, list) { - dest = rcu_dereference_protected(e->dest, 1); + dest = e->dest; doh = ip_vs_dest_conn_overhead(dest); /* moh/mw < doh/dw ==> moh*dw < doh*mw, where mw,dw>0 */ - if ((moh * atomic_read(&dest->weight) < - doh * atomic_read(&most->weight)) + if (((__s64)moh * atomic_read(&dest->weight) < + (__s64)doh * atomic_read(&most->weight)) && (atomic_read(&dest->weight) > 0)) { most = dest; moh = doh; @@ -611,8 +601,8 @@ __ip_vs_lblcr_schedule(struct ip_vs_service *svc) continue; doh = ip_vs_dest_conn_overhead(dest); - if (loh * atomic_read(&dest->weight) > - doh * atomic_read(&least->weight)) { + if ((__s64)loh * atomic_read(&dest->weight) > + (__s64)doh * atomic_read(&least->weight)) { least = dest; loh = doh; } @@ -819,7 +809,7 @@ static void __exit ip_vs_lblcr_cleanup(void) { unregister_ip_vs_scheduler(&ip_vs_lblcr_scheduler); unregister_pernet_subsys(&ip_vs_lblcr_ops); - synchronize_rcu(); + rcu_barrier(); } diff --git a/net/netfilter/ipvs/ip_vs_nq.c b/net/netfilter/ipvs/ip_vs_nq.c index d8d9860934fe..961a6de9bb29 100644 --- a/net/netfilter/ipvs/ip_vs_nq.c +++ b/net/netfilter/ipvs/ip_vs_nq.c @@ -40,7 +40,7 @@ #include -static inline unsigned int +static inline int ip_vs_nq_dest_overhead(struct ip_vs_dest *dest) { /* @@ -59,7 +59,7 @@ ip_vs_nq_schedule(struct ip_vs_service *svc, const struct sk_buff *skb, struct ip_vs_iphdr *iph) { struct ip_vs_dest *dest, *least = NULL; - unsigned int loh = 0, doh; + int loh = 0, doh; IP_VS_DBG(6, "%s(): Scheduling...\n", __func__); @@ -92,8 +92,8 @@ ip_vs_nq_schedule(struct ip_vs_service *svc, const struct sk_buff *skb, } if (!least || - (loh * atomic_read(&dest->weight) > - doh * atomic_read(&least->weight))) { + ((__s64)loh * atomic_read(&dest->weight) > + (__s64)doh * atomic_read(&least->weight))) { least = dest; loh = doh; } diff --git a/net/netfilter/ipvs/ip_vs_sed.c b/net/netfilter/ipvs/ip_vs_sed.c index a5284cc3d882..e446b9fa7424 100644 --- a/net/netfilter/ipvs/ip_vs_sed.c +++ b/net/netfilter/ipvs/ip_vs_sed.c @@ -44,7 +44,7 @@ #include -static inline unsigned int +static inline int ip_vs_sed_dest_overhead(struct ip_vs_dest *dest) { /* @@ -63,7 +63,7 @@ ip_vs_sed_schedule(struct ip_vs_service *svc, const struct sk_buff *skb, struct ip_vs_iphdr *iph) { struct ip_vs_dest *dest, *least; - unsigned int loh, doh; + int loh, doh; IP_VS_DBG(6, "%s(): Scheduling...\n", __func__); @@ -99,8 +99,8 @@ ip_vs_sed_schedule(struct ip_vs_service *svc, const struct sk_buff *skb, if (dest->flags & IP_VS_DEST_F_OVERLOAD) continue; doh = ip_vs_sed_dest_overhead(dest); - if (loh * atomic_read(&dest->weight) > - doh * atomic_read(&least->weight)) { + if ((__s64)loh * atomic_read(&dest->weight) > + (__s64)doh * atomic_read(&least->weight)) { least = dest; loh = doh; } diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c index f4484719f3e6..f63c2388f38d 100644 --- a/net/netfilter/ipvs/ip_vs_sync.c +++ b/net/netfilter/ipvs/ip_vs_sync.c @@ -1637,12 +1637,9 @@ static int sync_thread_master(void *data) continue; } while (ip_vs_send_sync_msg(tinfo->sock, sb->mesg) < 0) { - int ret = 0; - - __wait_event_interruptible(*sk_sleep(sk), + int ret = __wait_event_interruptible(*sk_sleep(sk), sock_writeable(sk) || - kthread_should_stop(), - ret); + kthread_should_stop()); if (unlikely(kthread_should_stop())) goto done; } diff --git a/net/netfilter/ipvs/ip_vs_wlc.c b/net/netfilter/ipvs/ip_vs_wlc.c index 6dc1fa128840..b5b4650d50a9 100644 --- a/net/netfilter/ipvs/ip_vs_wlc.c +++ b/net/netfilter/ipvs/ip_vs_wlc.c @@ -35,7 +35,7 @@ ip_vs_wlc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb, struct ip_vs_iphdr *iph) { struct ip_vs_dest *dest, *least; - unsigned int loh, doh; + int loh, doh; IP_VS_DBG(6, "ip_vs_wlc_schedule(): Scheduling...\n"); @@ -71,8 +71,8 @@ ip_vs_wlc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb, if (dest->flags & IP_VS_DEST_F_OVERLOAD) continue; doh = ip_vs_dest_conn_overhead(dest); - if (loh * atomic_read(&dest->weight) > - doh * atomic_read(&least->weight)) { + if ((__s64)loh * atomic_read(&dest->weight) > + (__s64)doh * atomic_read(&least->weight)) { least = dest; loh = doh; } diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c index 6fd967c6278c..cdf4567ba9b3 100644 --- a/net/netfilter/nf_synproxy_core.c +++ b/net/netfilter/nf_synproxy_core.c @@ -24,7 +24,7 @@ int synproxy_net_id; EXPORT_SYMBOL_GPL(synproxy_net_id); -void +bool synproxy_parse_options(const struct sk_buff *skb, unsigned int doff, const struct tcphdr *th, struct synproxy_options *opts) { @@ -32,7 +32,8 @@ synproxy_parse_options(const struct sk_buff *skb, unsigned int doff, u8 buf[40], *ptr; ptr = skb_header_pointer(skb, doff + sizeof(*th), length, buf); - BUG_ON(ptr == NULL); + if (ptr == NULL) + return false; opts->options = 0; while (length > 0) { @@ -41,16 +42,16 @@ synproxy_parse_options(const struct sk_buff *skb, unsigned int doff, switch (opcode) { case TCPOPT_EOL: - return; + return true; case TCPOPT_NOP: length--; continue; default: opsize = *ptr++; if (opsize < 2) - return; + return true; if (opsize > length) - return; + return true; switch (opcode) { case TCPOPT_MSS: @@ -84,6 +85,7 @@ synproxy_parse_options(const struct sk_buff *skb, unsigned int doff, length -= opsize; } } + return true; } EXPORT_SYMBOL_GPL(synproxy_parse_options); diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c index 32ad015ee8ce..a2fef8b10b96 100644 --- a/net/sched/sch_fq.c +++ b/net/sched/sch_fq.c @@ -285,7 +285,7 @@ static struct fq_flow *fq_classify(struct sk_buff *skb, struct fq_sched_data *q) /* remove one skb from head of flow queue */ -static struct sk_buff *fq_dequeue_head(struct fq_flow *flow) +static struct sk_buff *fq_dequeue_head(struct Qdisc *sch, struct fq_flow *flow) { struct sk_buff *skb = flow->head; @@ -293,6 +293,8 @@ static struct sk_buff *fq_dequeue_head(struct fq_flow *flow) flow->head = skb->next; skb->next = NULL; flow->qlen--; + sch->qstats.backlog -= qdisc_pkt_len(skb); + sch->q.qlen--; } return skb; } @@ -418,8 +420,9 @@ static struct sk_buff *fq_dequeue(struct Qdisc *sch) struct fq_flow_head *head; struct sk_buff *skb; struct fq_flow *f; + u32 rate; - skb = fq_dequeue_head(&q->internal); + skb = fq_dequeue_head(sch, &q->internal); if (skb) goto out; fq_check_throttled(q, now); @@ -449,7 +452,7 @@ begin: goto begin; } - skb = fq_dequeue_head(f); + skb = fq_dequeue_head(sch, f); if (!skb) { head->first = f->next; /* force a pass through old_flows to prevent starvation */ @@ -466,43 +469,74 @@ begin: f->time_next_packet = now; f->credit -= qdisc_pkt_len(skb); - if (f->credit <= 0 && - q->rate_enable && - skb->sk && skb->sk->sk_state != TCP_TIME_WAIT) { - u32 rate = skb->sk->sk_pacing_rate ?: q->flow_default_rate; + if (f->credit > 0 || !q->rate_enable) + goto out; - rate = min(rate, q->flow_max_rate); - if (rate) { - u64 len = (u64)qdisc_pkt_len(skb) * NSEC_PER_SEC; - - do_div(len, rate); - /* Since socket rate can change later, - * clamp the delay to 125 ms. - * TODO: maybe segment the too big skb, as in commit - * e43ac79a4bc ("sch_tbf: segment too big GSO packets") - */ - if (unlikely(len > 125 * NSEC_PER_MSEC)) { - len = 125 * NSEC_PER_MSEC; - q->stat_pkts_too_long++; - } + if (skb->sk && skb->sk->sk_state != TCP_TIME_WAIT) { + rate = skb->sk->sk_pacing_rate ?: q->flow_default_rate; - f->time_next_packet = now + len; + rate = min(rate, q->flow_max_rate); + } else { + rate = q->flow_max_rate; + if (rate == ~0U) + goto out; + } + if (rate) { + u32 plen = max(qdisc_pkt_len(skb), q->quantum); + u64 len = (u64)plen * NSEC_PER_SEC; + + do_div(len, rate); + /* Since socket rate can change later, + * clamp the delay to 125 ms. + * TODO: maybe segment the too big skb, as in commit + * e43ac79a4bc ("sch_tbf: segment too big GSO packets") + */ + if (unlikely(len > 125 * NSEC_PER_MSEC)) { + len = 125 * NSEC_PER_MSEC; + q->stat_pkts_too_long++; } + + f->time_next_packet = now + len; } out: - sch->qstats.backlog -= qdisc_pkt_len(skb); qdisc_bstats_update(sch, skb); - sch->q.qlen--; qdisc_unthrottled(sch); return skb; } static void fq_reset(struct Qdisc *sch) { + struct fq_sched_data *q = qdisc_priv(sch); + struct rb_root *root; struct sk_buff *skb; + struct rb_node *p; + struct fq_flow *f; + unsigned int idx; - while ((skb = fq_dequeue(sch)) != NULL) + while ((skb = fq_dequeue_head(sch, &q->internal)) != NULL) kfree_skb(skb); + + if (!q->fq_root) + return; + + for (idx = 0; idx < (1U << q->fq_trees_log); idx++) { + root = &q->fq_root[idx]; + while ((p = rb_first(root)) != NULL) { + f = container_of(p, struct fq_flow, fq_node); + rb_erase(p, root); + + while ((skb = fq_dequeue_head(sch, f)) != NULL) + kfree_skb(skb); + + kmem_cache_free(fq_flow_cachep, f); + } + } + q->new_flows.first = NULL; + q->old_flows.first = NULL; + q->delayed = RB_ROOT; + q->flows = 0; + q->inactive_flows = 0; + q->throttled_flows = 0; } static void fq_rehash(struct fq_sched_data *q, @@ -645,6 +679,8 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt) while (sch->q.qlen > sch->limit) { struct sk_buff *skb = fq_dequeue(sch); + if (!skb) + break; kfree_skb(skb); drop_count++; } @@ -657,21 +693,9 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt) static void fq_destroy(struct Qdisc *sch) { struct fq_sched_data *q = qdisc_priv(sch); - struct rb_root *root; - struct rb_node *p; - unsigned int idx; - if (q->fq_root) { - for (idx = 0; idx < (1U << q->fq_trees_log); idx++) { - root = &q->fq_root[idx]; - while ((p = rb_first(root)) != NULL) { - rb_erase(p, root); - kmem_cache_free(fq_flow_cachep, - container_of(p, struct fq_flow, fq_node)); - } - } - kfree(q->fq_root); - } + fq_reset(sch); + kfree(q->fq_root); qdisc_watchdog_cancel(&q->watchdog); } diff --git a/net/sysctl_net.c b/net/sysctl_net.c index 9bc6db04be3e..e7000be321b0 100644 --- a/net/sysctl_net.c +++ b/net/sysctl_net.c @@ -47,12 +47,12 @@ static int net_ctl_permissions(struct ctl_table_header *head, /* Allow network administrator to have same access as root. */ if (ns_capable(net->user_ns, CAP_NET_ADMIN) || - uid_eq(root_uid, current_uid())) { + uid_eq(root_uid, current_euid())) { int mode = (table->mode >> 6) & 7; return (mode << 6) | (mode << 3) | mode; } /* Allow netns root group to have the same access as the root group */ - if (gid_eq(root_gid, current_gid())) { + if (in_egroup_p(root_gid)) { int mode = (table->mode >> 3) & 7; return (mode << 3) | mode; } diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c index 95c2b2689a03..7db9954f1af2 100644 --- a/security/apparmor/apparmorfs.c +++ b/security/apparmor/apparmorfs.c @@ -580,15 +580,13 @@ static struct aa_namespace *__next_namespace(struct aa_namespace *root, /* check if the next ns is a sibling, parent, gp, .. */ parent = ns->parent; - while (parent) { + while (ns != root) { mutex_unlock(&ns->lock); next = list_entry_next(ns, base.list); if (!list_entry_is_head(next, &parent->sub_ns, base.list)) { mutex_lock(&next->lock); return next; } - if (parent == root) - return NULL; ns = parent; parent = parent->parent; } diff --git a/security/apparmor/crypto.c b/security/apparmor/crypto.c index d6222ba4e919..532471d0b3a0 100644 --- a/security/apparmor/crypto.c +++ b/security/apparmor/crypto.c @@ -15,14 +15,14 @@ * it should be. */ -#include +#include #include "include/apparmor.h" #include "include/crypto.h" static unsigned int apparmor_hash_size; -static struct crypto_hash *apparmor_tfm; +static struct crypto_shash *apparmor_tfm; unsigned int aa_hash_size(void) { @@ -32,35 +32,33 @@ unsigned int aa_hash_size(void) int aa_calc_profile_hash(struct aa_profile *profile, u32 version, void *start, size_t len) { - struct scatterlist sg[2]; - struct hash_desc desc = { - .tfm = apparmor_tfm, - .flags = 0 - }; + struct { + struct shash_desc shash; + char ctx[crypto_shash_descsize(apparmor_tfm)]; + } desc; int error = -ENOMEM; u32 le32_version = cpu_to_le32(version); if (!apparmor_tfm) return 0; - sg_init_table(sg, 2); - sg_set_buf(&sg[0], &le32_version, 4); - sg_set_buf(&sg[1], (u8 *) start, len); - profile->hash = kzalloc(apparmor_hash_size, GFP_KERNEL); if (!profile->hash) goto fail; - error = crypto_hash_init(&desc); + desc.shash.tfm = apparmor_tfm; + desc.shash.flags = 0; + + error = crypto_shash_init(&desc.shash); if (error) goto fail; - error = crypto_hash_update(&desc, &sg[0], 4); + error = crypto_shash_update(&desc.shash, (u8 *) &le32_version, 4); if (error) goto fail; - error = crypto_hash_update(&desc, &sg[1], len); + error = crypto_shash_update(&desc.shash, (u8 *) start, len); if (error) goto fail; - error = crypto_hash_final(&desc, profile->hash); + error = crypto_shash_final(&desc.shash, profile->hash); if (error) goto fail; @@ -75,19 +73,19 @@ fail: static int __init init_profile_hash(void) { - struct crypto_hash *tfm; + struct crypto_shash *tfm; if (!apparmor_initialized) return 0; - tfm = crypto_alloc_hash("sha1", 0, CRYPTO_ALG_ASYNC); + tfm = crypto_alloc_shash("sha1", 0, CRYPTO_ALG_ASYNC); if (IS_ERR(tfm)) { int error = PTR_ERR(tfm); AA_ERROR("failed to setup profile sha1 hashing: %d\n", error); return error; } apparmor_tfm = tfm; - apparmor_hash_size = crypto_hash_digestsize(apparmor_tfm); + apparmor_hash_size = crypto_shash_digestsize(apparmor_tfm); aa_info_message("AppArmor sha1 policy hashing enabled"); diff --git a/security/apparmor/include/policy.h b/security/apparmor/include/policy.h index f2d4b6348cbc..c28b0f20ab53 100644 --- a/security/apparmor/include/policy.h +++ b/security/apparmor/include/policy.h @@ -360,7 +360,9 @@ static inline void aa_put_replacedby(struct aa_replacedby *p) static inline void __aa_update_replacedby(struct aa_profile *orig, struct aa_profile *new) { - struct aa_profile *tmp = rcu_dereference(orig->replacedby->profile); + struct aa_profile *tmp; + tmp = rcu_dereference_protected(orig->replacedby->profile, + mutex_is_locked(&orig->ns->lock)); rcu_assign_pointer(orig->replacedby->profile, aa_get_profile(new)); orig->flags |= PFLAG_INVALID; aa_put_profile(tmp); diff --git a/security/apparmor/policy.c b/security/apparmor/policy.c index 6172509fa2b7..705c2879d3a9 100644 --- a/security/apparmor/policy.c +++ b/security/apparmor/policy.c @@ -563,7 +563,8 @@ void __init aa_free_root_ns(void) static void free_replacedby(struct aa_replacedby *r) { if (r) { - aa_put_profile(rcu_dereference(r->profile)); + /* r->profile will not be updated any more as r is dead */ + aa_put_profile(rcu_dereference_protected(r->profile, true)); kzfree(r); } } @@ -609,6 +610,7 @@ void aa_free_profile(struct aa_profile *profile) aa_put_dfa(profile->policy.dfa); aa_put_replacedby(profile->replacedby); + kzfree(profile->hash); kzfree(profile); } diff --git a/security/selinux/avc.c b/security/selinux/avc.c index dad36a6ab45f..fc3e6628a864 100644 --- a/security/selinux/avc.c +++ b/security/selinux/avc.c @@ -746,7 +746,6 @@ inline int avc_has_perm_noaudit(u32 ssid, u32 tsid, * @tclass: target security class * @requested: requested permissions, interpreted based on @tclass * @auditdata: auxiliary audit data - * @flags: VFS walk flags * * Check the AVC to determine whether the @requested permissions are granted * for the SID pair (@ssid, @tsid), interpreting the permissions @@ -756,17 +755,15 @@ inline int avc_has_perm_noaudit(u32 ssid, u32 tsid, * permissions are granted, -%EACCES if any permissions are denied, or * another -errno upon other errors. */ -int avc_has_perm_flags(u32 ssid, u32 tsid, u16 tclass, - u32 requested, struct common_audit_data *auditdata, - unsigned flags) +int avc_has_perm(u32 ssid, u32 tsid, u16 tclass, + u32 requested, struct common_audit_data *auditdata) { struct av_decision avd; int rc, rc2; rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, 0, &avd); - rc2 = avc_audit(ssid, tsid, tclass, requested, &avd, rc, auditdata, - flags); + rc2 = avc_audit(ssid, tsid, tclass, requested, &avd, rc, auditdata); if (rc2) return rc2; return rc; diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index a5091ec06aa6..5b5231068516 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -1502,7 +1502,7 @@ static int cred_has_capability(const struct cred *cred, rc = avc_has_perm_noaudit(sid, sid, sclass, av, 0, &avd); if (audit == SECURITY_CAP_AUDIT) { - int rc2 = avc_audit(sid, sid, sclass, av, &avd, rc, &ad, 0); + int rc2 = avc_audit(sid, sid, sclass, av, &avd, rc, &ad); if (rc2) return rc2; } @@ -1525,8 +1525,7 @@ static int task_has_system(struct task_struct *tsk, static int inode_has_perm(const struct cred *cred, struct inode *inode, u32 perms, - struct common_audit_data *adp, - unsigned flags) + struct common_audit_data *adp) { struct inode_security_struct *isec; u32 sid; @@ -1539,7 +1538,7 @@ static int inode_has_perm(const struct cred *cred, sid = cred_sid(cred); isec = inode->i_security; - return avc_has_perm_flags(sid, isec->sid, isec->sclass, perms, adp, flags); + return avc_has_perm(sid, isec->sid, isec->sclass, perms, adp); } /* Same as inode_has_perm, but pass explicit audit data containing @@ -1554,7 +1553,7 @@ static inline int dentry_has_perm(const struct cred *cred, ad.type = LSM_AUDIT_DATA_DENTRY; ad.u.dentry = dentry; - return inode_has_perm(cred, inode, av, &ad, 0); + return inode_has_perm(cred, inode, av, &ad); } /* Same as inode_has_perm, but pass explicit audit data containing @@ -1569,7 +1568,7 @@ static inline int path_has_perm(const struct cred *cred, ad.type = LSM_AUDIT_DATA_PATH; ad.u.path = *path; - return inode_has_perm(cred, inode, av, &ad, 0); + return inode_has_perm(cred, inode, av, &ad); } /* Same as path_has_perm, but uses the inode from the file struct. */ @@ -1581,7 +1580,7 @@ static inline int file_path_has_perm(const struct cred *cred, ad.type = LSM_AUDIT_DATA_PATH; ad.u.path = file->f_path; - return inode_has_perm(cred, file_inode(file), av, &ad, 0); + return inode_has_perm(cred, file_inode(file), av, &ad); } /* Check whether a task can use an open file descriptor to @@ -1617,7 +1616,7 @@ static int file_has_perm(const struct cred *cred, /* av is zero if only checking access to the descriptor. */ rc = 0; if (av) - rc = inode_has_perm(cred, inode, av, &ad, 0); + rc = inode_has_perm(cred, inode, av, &ad); out: return rc; diff --git a/security/selinux/include/avc.h b/security/selinux/include/avc.h index 92d0ab561db8..f53ee3c58d0f 100644 --- a/security/selinux/include/avc.h +++ b/security/selinux/include/avc.h @@ -130,7 +130,7 @@ static inline int avc_audit(u32 ssid, u32 tsid, u16 tclass, u32 requested, struct av_decision *avd, int result, - struct common_audit_data *a, unsigned flags) + struct common_audit_data *a) { u32 audited, denied; audited = avc_audit_required(requested, avd, result, 0, &denied); @@ -138,7 +138,7 @@ static inline int avc_audit(u32 ssid, u32 tsid, return 0; return slow_avc_audit(ssid, tsid, tclass, requested, audited, denied, - a, flags); + a, 0); } #define AVC_STRICT 1 /* Ignore permissive mode. */ @@ -147,17 +147,9 @@ int avc_has_perm_noaudit(u32 ssid, u32 tsid, unsigned flags, struct av_decision *avd); -int avc_has_perm_flags(u32 ssid, u32 tsid, - u16 tclass, u32 requested, - struct common_audit_data *auditdata, - unsigned); - -static inline int avc_has_perm(u32 ssid, u32 tsid, - u16 tclass, u32 requested, - struct common_audit_data *auditdata) -{ - return avc_has_perm_flags(ssid, tsid, tclass, requested, auditdata, 0); -} +int avc_has_perm(u32 ssid, u32 tsid, + u16 tclass, u32 requested, + struct common_audit_data *auditdata); u32 avc_policy_seqno(void); diff --git a/sound/core/compress_offload.c b/sound/core/compress_offload.c index 98969541cbcc..bea523a5d852 100644 --- a/sound/core/compress_offload.c +++ b/sound/core/compress_offload.c @@ -139,6 +139,18 @@ static int snd_compr_open(struct inode *inode, struct file *f) static int snd_compr_free(struct inode *inode, struct file *f) { struct snd_compr_file *data = f->private_data; + struct snd_compr_runtime *runtime = data->stream.runtime; + + switch (runtime->state) { + case SNDRV_PCM_STATE_RUNNING: + case SNDRV_PCM_STATE_DRAINING: + case SNDRV_PCM_STATE_PAUSED: + data->stream.ops->trigger(&data->stream, SNDRV_PCM_TRIGGER_STOP); + break; + default: + break; + } + data->stream.ops->free(&data->stream); kfree(data->stream.runtime->buffer); kfree(data->stream.runtime); @@ -837,7 +849,8 @@ static int snd_compress_dev_disconnect(struct snd_device *device) struct snd_compr *compr; compr = device->device_data; - snd_unregister_device(compr->direction, compr->card, compr->device); + snd_unregister_device(SNDRV_DEVICE_TYPE_COMPRESS, compr->card, + compr->device); return 0; } diff --git a/sound/pci/ac97/ac97_codec.c b/sound/pci/ac97/ac97_codec.c index 445ca481d8d3..bf578ba2677e 100644 --- a/sound/pci/ac97/ac97_codec.c +++ b/sound/pci/ac97/ac97_codec.c @@ -175,6 +175,7 @@ static const struct ac97_codec_id snd_ac97_codec_ids[] = { { 0x54524106, 0xffffffff, "TR28026", NULL, NULL }, { 0x54524108, 0xffffffff, "TR28028", patch_tritech_tr28028, NULL }, // added by xin jin [07/09/99] { 0x54524123, 0xffffffff, "TR28602", NULL, NULL }, // only guess --jk [TR28023 = eMicro EM28023 (new CT1297)] +{ 0x54584e03, 0xffffffff, "TLV320AIC27", NULL, NULL }, { 0x54584e20, 0xffffffff, "TLC320AD9xC", NULL, NULL }, { 0x56494161, 0xffffffff, "VIA1612A", NULL, NULL }, // modified ICE1232 with S/PDIF { 0x56494170, 0xffffffff, "VIA1617A", patch_vt1617a, NULL }, // modified VT1616 with S/PDIF diff --git a/sound/pci/hda/hda_generic.c b/sound/pci/hda/hda_generic.c index ac41e9cdc976..26ad4f0aade3 100644 --- a/sound/pci/hda/hda_generic.c +++ b/sound/pci/hda/hda_generic.c @@ -3531,7 +3531,7 @@ static int create_capture_mixers(struct hda_codec *codec) if (!multi) err = create_single_cap_vol_ctl(codec, n, vol, sw, inv_dmic); - else if (!multi_cap_vol) + else if (!multi_cap_vol && !inv_dmic) err = create_bind_cap_vol_ctl(codec, n, vol, sw); else err = create_multi_cap_vol_ctl(codec); diff --git a/sound/pci/hda/patch_cirrus.c b/sound/pci/hda/patch_cirrus.c index b524f89a1f13..18d972501585 100644 --- a/sound/pci/hda/patch_cirrus.c +++ b/sound/pci/hda/patch_cirrus.c @@ -111,6 +111,9 @@ enum { /* 0x0009 - 0x0014 -> 12 test regs */ /* 0x0015 - visibility reg */ +/* Cirrus Logic CS4208 */ +#define CS4208_VENDOR_NID 0x24 + /* * Cirrus Logic CS4210 * @@ -223,6 +226,16 @@ static const struct hda_verb cs_coef_init_verbs[] = { {} /* terminator */ }; +static const struct hda_verb cs4208_coef_init_verbs[] = { + {0x01, AC_VERB_SET_POWER_STATE, 0x00}, /* AFG: D0 */ + {0x24, AC_VERB_SET_PROC_STATE, 0x01}, /* VPW: processing on */ + {0x24, AC_VERB_SET_COEF_INDEX, 0x0033}, + {0x24, AC_VERB_SET_PROC_COEF, 0x0001}, /* A1 ICS */ + {0x24, AC_VERB_SET_COEF_INDEX, 0x0034}, + {0x24, AC_VERB_SET_PROC_COEF, 0x1C01}, /* A1 Enable, A Thresh = 300mV */ + {} /* terminator */ +}; + /* Errata: CS4207 rev C0/C1/C2 Silicon * * http://www.cirrus.com/en/pubs/errata/ER880C3.pdf @@ -295,6 +308,8 @@ static int cs_init(struct hda_codec *codec) /* init_verb sequence for C0/C1/C2 errata*/ snd_hda_sequence_write(codec, cs_errata_init_verbs); snd_hda_sequence_write(codec, cs_coef_init_verbs); + } else if (spec->vendor_nid == CS4208_VENDOR_NID) { + snd_hda_sequence_write(codec, cs4208_coef_init_verbs); } snd_hda_gen_init(codec); @@ -434,6 +449,29 @@ static const struct hda_pintbl mba42_pincfgs[] = { {} /* terminator */ }; +static const struct hda_pintbl mba6_pincfgs[] = { + { 0x10, 0x032120f0 }, /* HP */ + { 0x11, 0x500000f0 }, + { 0x12, 0x90100010 }, /* Speaker */ + { 0x13, 0x500000f0 }, + { 0x14, 0x500000f0 }, + { 0x15, 0x770000f0 }, + { 0x16, 0x770000f0 }, + { 0x17, 0x430000f0 }, + { 0x18, 0x43ab9030 }, /* Mic */ + { 0x19, 0x770000f0 }, + { 0x1a, 0x770000f0 }, + { 0x1b, 0x770000f0 }, + { 0x1c, 0x90a00090 }, + { 0x1d, 0x500000f0 }, + { 0x1e, 0x500000f0 }, + { 0x1f, 0x500000f0 }, + { 0x20, 0x500000f0 }, + { 0x21, 0x430000f0 }, + { 0x22, 0x430000f0 }, + {} /* terminator */ +}; + static void cs420x_fixup_gpio_13(struct hda_codec *codec, const struct hda_fixup *fix, int action) { @@ -556,22 +594,23 @@ static int patch_cs420x(struct hda_codec *codec) /* * CS4208 support: - * Its layout is no longer compatible with CS4206/CS4207, and the generic - * parser seems working fairly well, except for trivial fixups. + * Its layout is no longer compatible with CS4206/CS4207 */ enum { + CS4208_MBA6, CS4208_GPIO0, }; static const struct hda_model_fixup cs4208_models[] = { { .id = CS4208_GPIO0, .name = "gpio0" }, + { .id = CS4208_MBA6, .name = "mba6" }, {} }; static const struct snd_pci_quirk cs4208_fixup_tbl[] = { /* codec SSID */ - SND_PCI_QUIRK(0x106b, 0x7100, "MacBookPro 6,1", CS4208_GPIO0), - SND_PCI_QUIRK(0x106b, 0x7200, "MacBookPro 6,2", CS4208_GPIO0), + SND_PCI_QUIRK(0x106b, 0x7100, "MacBookAir 6,1", CS4208_MBA6), + SND_PCI_QUIRK(0x106b, 0x7200, "MacBookAir 6,2", CS4208_MBA6), {} /* terminator */ }; @@ -588,18 +627,35 @@ static void cs4208_fixup_gpio0(struct hda_codec *codec, } static const struct hda_fixup cs4208_fixups[] = { + [CS4208_MBA6] = { + .type = HDA_FIXUP_PINS, + .v.pins = mba6_pincfgs, + .chained = true, + .chain_id = CS4208_GPIO0, + }, [CS4208_GPIO0] = { .type = HDA_FIXUP_FUNC, .v.func = cs4208_fixup_gpio0, }, }; +/* correct the 0dB offset of input pins */ +static void cs4208_fix_amp_caps(struct hda_codec *codec, hda_nid_t adc) +{ + unsigned int caps; + + caps = query_amp_caps(codec, adc, HDA_INPUT); + caps &= ~(AC_AMPCAP_OFFSET); + caps |= 0x02; + snd_hda_override_amp_caps(codec, adc, HDA_INPUT, caps); +} + static int patch_cs4208(struct hda_codec *codec) { struct cs_spec *spec; int err; - spec = cs_alloc_spec(codec, 0); /* no specific w/a */ + spec = cs_alloc_spec(codec, CS4208_VENDOR_NID); if (!spec) return -ENOMEM; @@ -609,6 +665,12 @@ static int patch_cs4208(struct hda_codec *codec) cs4208_fixups); snd_hda_apply_fixup(codec, HDA_FIXUP_ACT_PRE_PROBE); + snd_hda_override_wcaps(codec, 0x18, + get_wcaps(codec, 0x18) | AC_WCAP_STEREO); + cs4208_fix_amp_caps(codec, 0x18); + cs4208_fix_amp_caps(codec, 0x1b); + cs4208_fix_amp_caps(codec, 0x1c); + err = cs_parse_auto_config(codec); if (err < 0) goto error; diff --git a/sound/pci/hda/patch_conexant.c b/sound/pci/hda/patch_conexant.c index 4edd2d0f9a3c..ec68eaea0336 100644 --- a/sound/pci/hda/patch_conexant.c +++ b/sound/pci/hda/patch_conexant.c @@ -3231,6 +3231,7 @@ enum { CXT_FIXUP_INC_MIC_BOOST, CXT_FIXUP_HEADPHONE_MIC_PIN, CXT_FIXUP_HEADPHONE_MIC, + CXT_FIXUP_GPIO1, }; static void cxt_fixup_stereo_dmic(struct hda_codec *codec, @@ -3375,6 +3376,15 @@ static const struct hda_fixup cxt_fixups[] = { .type = HDA_FIXUP_FUNC, .v.func = cxt_fixup_headphone_mic, }, + [CXT_FIXUP_GPIO1] = { + .type = HDA_FIXUP_VERBS, + .v.verbs = (const struct hda_verb[]) { + { 0x01, AC_VERB_SET_GPIO_MASK, 0x01 }, + { 0x01, AC_VERB_SET_GPIO_DIRECTION, 0x01 }, + { 0x01, AC_VERB_SET_GPIO_DATA, 0x01 }, + { } + }, + }, }; static const struct snd_pci_quirk cxt5051_fixups[] = { @@ -3384,6 +3394,7 @@ static const struct snd_pci_quirk cxt5051_fixups[] = { static const struct snd_pci_quirk cxt5066_fixups[] = { SND_PCI_QUIRK(0x1025, 0x0543, "Acer Aspire One 522", CXT_FIXUP_STEREO_DMIC), + SND_PCI_QUIRK(0x1025, 0x054c, "Acer Aspire 3830TG", CXT_FIXUP_GPIO1), SND_PCI_QUIRK(0x1043, 0x138d, "Asus", CXT_FIXUP_HEADPHONE_MIC_PIN), SND_PCI_QUIRK(0x17aa, 0x20f2, "Lenovo T400", CXT_PINCFG_LENOVO_TP410), SND_PCI_QUIRK(0x17aa, 0x215e, "Lenovo T410", CXT_PINCFG_LENOVO_TP410), diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c index 3d8cd04455a6..50173d412ac5 100644 --- a/sound/pci/hda/patch_hdmi.c +++ b/sound/pci/hda/patch_hdmi.c @@ -936,6 +936,14 @@ static void hdmi_setup_audio_infoframe(struct hda_codec *codec, return; } + /* + * always configure channel mapping, it may have been changed by the + * user in the meantime + */ + hdmi_setup_channel_mapping(codec, pin_nid, non_pcm, ca, + channels, per_pin->chmap, + per_pin->chmap_set); + /* * sizeof(ai) is used instead of sizeof(*hdmi_ai) or * sizeof(*dp_ai) to avoid partial match/update problems when @@ -947,20 +955,10 @@ static void hdmi_setup_audio_infoframe(struct hda_codec *codec, "pin=%d channels=%d\n", pin_nid, channels); - hdmi_setup_channel_mapping(codec, pin_nid, non_pcm, ca, - channels, per_pin->chmap, - per_pin->chmap_set); hdmi_stop_infoframe_trans(codec, pin_nid); hdmi_fill_audio_infoframe(codec, pin_nid, ai.bytes, sizeof(ai)); hdmi_start_infoframe_trans(codec, pin_nid); - } else { - /* For non-pcm audio switch, setup new channel mapping - * accordingly */ - if (per_pin->non_pcm != non_pcm) - hdmi_setup_channel_mapping(codec, pin_nid, non_pcm, ca, - channels, per_pin->chmap, - per_pin->chmap_set); } per_pin->non_pcm = non_pcm; @@ -1149,32 +1147,43 @@ static int hdmi_choose_cvt(struct hda_codec *codec, } static void haswell_config_cvts(struct hda_codec *codec, - int pin_id, int mux_id) + hda_nid_t pin_nid, int mux_idx) { struct hdmi_spec *spec = codec->spec; - struct hdmi_spec_per_pin *per_pin; - int pin_idx, mux_idx; - int curr; - int err; + hda_nid_t nid, end_nid; + int cvt_idx, curr; + struct hdmi_spec_per_cvt *per_cvt; - for (pin_idx = 0; pin_idx < spec->num_pins; pin_idx++) { - per_pin = get_pin(spec, pin_idx); + /* configure all pins, including "no physical connection" ones */ + end_nid = codec->start_nid + codec->num_nodes; + for (nid = codec->start_nid; nid < end_nid; nid++) { + unsigned int wid_caps = get_wcaps(codec, nid); + unsigned int wid_type = get_wcaps_type(wid_caps); + + if (wid_type != AC_WID_PIN) + continue; - if (pin_idx == pin_id) + if (nid == pin_nid) continue; - curr = snd_hda_codec_read(codec, per_pin->pin_nid, 0, + curr = snd_hda_codec_read(codec, nid, 0, AC_VERB_GET_CONNECT_SEL, 0); + if (curr != mux_idx) + continue; - /* Choose another unused converter */ - if (curr == mux_id) { - err = hdmi_choose_cvt(codec, pin_idx, NULL, &mux_idx); - if (err < 0) - return; - snd_printdd("HDMI: choose converter %d for pin %d\n", mux_idx, pin_idx); - snd_hda_codec_write_cache(codec, per_pin->pin_nid, 0, + /* choose an unassigned converter. The conveters in the + * connection list are in the same order as in the codec. + */ + for (cvt_idx = 0; cvt_idx < spec->num_cvts; cvt_idx++) { + per_cvt = get_cvt(spec, cvt_idx); + if (!per_cvt->assigned) { + snd_printdd("choose cvt %d for pin nid %d\n", + cvt_idx, nid); + snd_hda_codec_write_cache(codec, nid, 0, AC_VERB_SET_CONNECT_SEL, - mux_idx); + cvt_idx); + break; + } } } } @@ -1216,7 +1225,7 @@ static int hdmi_pcm_open(struct hda_pcm_stream *hinfo, /* configure unused pins to choose other converters */ if (is_haswell(codec)) - haswell_config_cvts(codec, pin_idx, mux_idx); + haswell_config_cvts(codec, per_pin->pin_nid, mux_idx); snd_hda_spdif_ctls_assign(codec, pin_idx, per_cvt->cvt_nid); diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index bc07d369fac4..bf313bea7085 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -2819,6 +2819,15 @@ static void alc269_fixup_hweq(struct hda_codec *codec, alc_write_coef_idx(codec, 0x1e, coef | 0x80); } +static void alc269_fixup_headset_mic(struct hda_codec *codec, + const struct hda_fixup *fix, int action) +{ + struct alc_spec *spec = codec->spec; + + if (action == HDA_FIXUP_ACT_PRE_PROBE) + spec->parse_flags |= HDA_PINCFG_HEADSET_MIC; +} + static void alc271_fixup_dmic(struct hda_codec *codec, const struct hda_fixup *fix, int action) { @@ -3439,6 +3448,9 @@ static void alc283_fixup_chromebook(struct hda_codec *codec, /* Set to manual mode */ val = alc_read_coef_idx(codec, 0x06); alc_write_coef_idx(codec, 0x06, val & ~0x000c); + /* Enable Line1 input control by verb */ + val = alc_read_coef_idx(codec, 0x1a); + alc_write_coef_idx(codec, 0x1a, val | (1 << 4)); break; } } @@ -3493,6 +3505,15 @@ static void alc282_fixup_asus_tx300(struct hda_codec *codec, } } +static void alc290_fixup_mono_speakers(struct hda_codec *codec, + const struct hda_fixup *fix, int action) +{ + if (action == HDA_FIXUP_ACT_PRE_PROBE) + /* Remove DAC node 0x03, as it seems to be + giving mono output */ + snd_hda_override_wcaps(codec, 0x03, 0); +} + enum { ALC269_FIXUP_SONY_VAIO, ALC275_FIXUP_SONY_VAIO_GPIO2, @@ -3504,6 +3525,7 @@ enum { ALC271_FIXUP_DMIC, ALC269_FIXUP_PCM_44K, ALC269_FIXUP_STEREO_DMIC, + ALC269_FIXUP_HEADSET_MIC, ALC269_FIXUP_QUANTA_MUTE, ALC269_FIXUP_LIFEBOOK, ALC269_FIXUP_AMIC, @@ -3516,9 +3538,11 @@ enum { ALC269_FIXUP_HP_GPIO_LED, ALC269_FIXUP_INV_DMIC, ALC269_FIXUP_LENOVO_DOCK, + ALC286_FIXUP_SONY_MIC_NO_PRESENCE, ALC269_FIXUP_PINCFG_NO_HP_TO_LINEOUT, ALC269_FIXUP_DELL1_MIC_NO_PRESENCE, ALC269_FIXUP_DELL2_MIC_NO_PRESENCE, + ALC269_FIXUP_DELL3_MIC_NO_PRESENCE, ALC269_FIXUP_HEADSET_MODE, ALC269_FIXUP_HEADSET_MODE_NO_HP_MIC, ALC269_FIXUP_ASUS_X101_FUNC, @@ -3531,6 +3555,8 @@ enum { ALC269VB_FIXUP_ORDISSIMO_EVE2, ALC283_FIXUP_CHROME_BOOK, ALC282_FIXUP_ASUS_TX300, + ALC283_FIXUP_INT_MIC, + ALC290_FIXUP_MONO_SPEAKERS, }; static const struct hda_fixup alc269_fixups[] = { @@ -3599,6 +3625,10 @@ static const struct hda_fixup alc269_fixups[] = { .type = HDA_FIXUP_FUNC, .v.func = alc269_fixup_stereo_dmic, }, + [ALC269_FIXUP_HEADSET_MIC] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc269_fixup_headset_mic, + }, [ALC269_FIXUP_QUANTA_MUTE] = { .type = HDA_FIXUP_FUNC, .v.func = alc269_fixup_quanta_mute, @@ -3708,6 +3738,15 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC269_FIXUP_HEADSET_MODE_NO_HP_MIC }, + [ALC269_FIXUP_DELL3_MIC_NO_PRESENCE] = { + .type = HDA_FIXUP_PINS, + .v.pins = (const struct hda_pintbl[]) { + { 0x1a, 0x01a1913c }, /* use as headset mic, without its own jack detect */ + { } + }, + .chained = true, + .chain_id = ALC269_FIXUP_HEADSET_MODE_NO_HP_MIC + }, [ALC269_FIXUP_HEADSET_MODE] = { .type = HDA_FIXUP_FUNC, .v.func = alc_fixup_headset_mode, @@ -3716,6 +3755,15 @@ static const struct hda_fixup alc269_fixups[] = { .type = HDA_FIXUP_FUNC, .v.func = alc_fixup_headset_mode_no_hp_mic, }, + [ALC286_FIXUP_SONY_MIC_NO_PRESENCE] = { + .type = HDA_FIXUP_PINS, + .v.pins = (const struct hda_pintbl[]) { + { 0x18, 0x01a1913c }, /* use as headset mic, without its own jack detect */ + { } + }, + .chained = true, + .chain_id = ALC269_FIXUP_HEADSET_MIC + }, [ALC269_FIXUP_ASUS_X101_FUNC] = { .type = HDA_FIXUP_FUNC, .v.func = alc269_fixup_x101_headset_mic, @@ -3790,6 +3838,22 @@ static const struct hda_fixup alc269_fixups[] = { .type = HDA_FIXUP_FUNC, .v.func = alc282_fixup_asus_tx300, }, + [ALC283_FIXUP_INT_MIC] = { + .type = HDA_FIXUP_VERBS, + .v.verbs = (const struct hda_verb[]) { + {0x20, AC_VERB_SET_COEF_INDEX, 0x1a}, + {0x20, AC_VERB_SET_PROC_COEF, 0x0011}, + { } + }, + .chained = true, + .chain_id = ALC269_FIXUP_LIMIT_INT_MIC_BOOST + }, + [ALC290_FIXUP_MONO_SPEAKERS] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc290_fixup_mono_speakers, + .chained = true, + .chain_id = ALC269_FIXUP_DELL3_MIC_NO_PRESENCE, + }, }; static const struct snd_pci_quirk alc269_fixup_tbl[] = { @@ -3831,6 +3895,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1028, 0x0608, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x0609, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x0613, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1028, 0x0616, "Dell Vostro 5470", ALC290_FIXUP_MONO_SPEAKERS), SND_PCI_QUIRK(0x1028, 0x15cc, "Dell X5 Precision", ALC269_FIXUP_DELL2_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x15cd, "Dell X5 Precision", ALC269_FIXUP_DELL2_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x103c, 0x1586, "HP", ALC269_FIXUP_HP_MUTE_LED_MIC2), @@ -3853,6 +3918,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1043, 0x8398, "ASUS P1005", ALC269_FIXUP_STEREO_DMIC), SND_PCI_QUIRK(0x1043, 0x83ce, "ASUS P1005", ALC269_FIXUP_STEREO_DMIC), SND_PCI_QUIRK(0x1043, 0x8516, "ASUS X101CH", ALC269_FIXUP_ASUS_X101), + SND_PCI_QUIRK(0x104d, 0x90b6, "Sony VAIO Pro 13", ALC286_FIXUP_SONY_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x104d, 0x9073, "Sony VAIO", ALC275_FIXUP_SONY_VAIO_GPIO2), SND_PCI_QUIRK(0x104d, 0x907b, "Sony VAIO", ALC275_FIXUP_SONY_HWEQ), SND_PCI_QUIRK(0x104d, 0x9084, "Sony VAIO", ALC275_FIXUP_SONY_HWEQ), @@ -3874,7 +3940,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x17aa, 0x2214, "Thinkpad", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), SND_PCI_QUIRK(0x17aa, 0x2215, "Thinkpad", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), SND_PCI_QUIRK(0x17aa, 0x5013, "Thinkpad", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), - SND_PCI_QUIRK(0x17aa, 0x501a, "Thinkpad", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), + SND_PCI_QUIRK(0x17aa, 0x501a, "Thinkpad", ALC283_FIXUP_INT_MIC), SND_PCI_QUIRK(0x17aa, 0x5026, "Thinkpad", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), SND_PCI_QUIRK(0x17aa, 0x5109, "Thinkpad", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), SND_PCI_QUIRK(0x17aa, 0x3bf8, "Quanta FL1", ALC269_FIXUP_PCM_44K), @@ -3938,6 +4004,7 @@ static const struct hda_model_fixup alc269_fixup_models[] = { {.id = ALC269_FIXUP_STEREO_DMIC, .name = "alc269-dmic"}, {.id = ALC271_FIXUP_DMIC, .name = "alc271-dmic"}, {.id = ALC269_FIXUP_INV_DMIC, .name = "inv-dmic"}, + {.id = ALC269_FIXUP_HEADSET_MIC, .name = "headset-mic"}, {.id = ALC269_FIXUP_LENOVO_DOCK, .name = "lenovo-dock"}, {.id = ALC269_FIXUP_HP_GPIO_LED, .name = "hp-gpio-led"}, {.id = ALC269_FIXUP_DELL1_MIC_NO_PRESENCE, .name = "dell-headset-multi"}, @@ -4555,6 +4622,7 @@ static const struct snd_pci_quirk alc662_fixup_tbl[] = { SND_PCI_QUIRK(0x1028, 0x05d8, "Dell", ALC668_FIXUP_DELL_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x05db, "Dell", ALC668_FIXUP_DELL_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x103c, 0x1632, "HP RP5800", ALC662_FIXUP_HP_RP5800), + SND_PCI_QUIRK(0x1043, 0x1477, "ASUS N56VZ", ALC662_FIXUP_ASUS_MODE4), SND_PCI_QUIRK(0x1043, 0x8469, "ASUS mobo", ALC662_FIXUP_NO_JACK_DETECT), SND_PCI_QUIRK(0x105b, 0x0cd6, "Foxconn", ALC662_FIXUP_ASUS_MODE2), SND_PCI_QUIRK(0x144d, 0xc051, "Samsung R720", ALC662_FIXUP_IDEAPAD), diff --git a/sound/pci/rme9652/hdsp.c b/sound/pci/rme9652/hdsp.c index 4f255dfee450..f59a321a6d6a 100644 --- a/sound/pci/rme9652/hdsp.c +++ b/sound/pci/rme9652/hdsp.c @@ -4845,6 +4845,7 @@ static int snd_hdsp_hwdep_ioctl(struct snd_hwdep *hw, struct file *file, unsigne if ((err = hdsp_get_iobox_version(hdsp)) < 0) return err; } + memset(&hdsp_version, 0, sizeof(hdsp_version)); hdsp_version.io_type = hdsp->io_type; hdsp_version.firmware_rev = hdsp->firmware_rev; if ((err = copy_to_user(argp, &hdsp_version, sizeof(hdsp_version)))) diff --git a/sound/soc/blackfin/bf6xx-i2s.c b/sound/soc/blackfin/bf6xx-i2s.c index c02405cc007d..5810a0603f2f 100644 --- a/sound/soc/blackfin/bf6xx-i2s.c +++ b/sound/soc/blackfin/bf6xx-i2s.c @@ -88,6 +88,7 @@ static int bfin_i2s_hw_params(struct snd_pcm_substream *substream, case SNDRV_PCM_FORMAT_S8: param.spctl |= 0x70; sport->wdsize = 1; + break; case SNDRV_PCM_FORMAT_S16_LE: param.spctl |= 0xf0; sport->wdsize = 2; diff --git a/sound/soc/codecs/88pm860x-codec.c b/sound/soc/codecs/88pm860x-codec.c index 8af04343cc1a..259d1ac4492f 100644 --- a/sound/soc/codecs/88pm860x-codec.c +++ b/sound/soc/codecs/88pm860x-codec.c @@ -349,6 +349,9 @@ static int snd_soc_put_volsw_2r_st(struct snd_kcontrol *kcontrol, val = ucontrol->value.integer.value[0]; val2 = ucontrol->value.integer.value[1]; + if (val >= ARRAY_SIZE(st_table) || val2 >= ARRAY_SIZE(st_table)) + return -EINVAL; + err = snd_soc_update_bits(codec, reg, 0x3f, st_table[val].m); if (err < 0) return err; diff --git a/sound/soc/codecs/ab8500-codec.c b/sound/soc/codecs/ab8500-codec.c index b8ba0adacfce..80555d7551e6 100644 --- a/sound/soc/codecs/ab8500-codec.c +++ b/sound/soc/codecs/ab8500-codec.c @@ -1225,13 +1225,18 @@ static int anc_status_control_put(struct snd_kcontrol *kcontrol, struct ab8500_codec_drvdata *drvdata = dev_get_drvdata(codec->dev); struct device *dev = codec->dev; bool apply_fir, apply_iir; - int req, status; + unsigned int req; + int status; dev_dbg(dev, "%s: Enter.\n", __func__); mutex_lock(&drvdata->anc_lock); req = ucontrol->value.integer.value[0]; + if (req >= ARRAY_SIZE(enum_anc_state)) { + status = -EINVAL; + goto cleanup; + } if (req != ANC_APPLY_FIR_IIR && req != ANC_APPLY_FIR && req != ANC_APPLY_IIR) { dev_err(dev, "%s: ERROR: Unsupported status to set '%s'!\n", diff --git a/sound/soc/codecs/max98095.c b/sound/soc/codecs/max98095.c index 41cdd1642970..8dbcacd44e6a 100644 --- a/sound/soc/codecs/max98095.c +++ b/sound/soc/codecs/max98095.c @@ -1863,7 +1863,7 @@ static int max98095_put_eq_enum(struct snd_kcontrol *kcontrol, struct max98095_pdata *pdata = max98095->pdata; int channel = max98095_get_eq_channel(kcontrol->id.name); struct max98095_cdata *cdata; - int sel = ucontrol->value.integer.value[0]; + unsigned int sel = ucontrol->value.integer.value[0]; struct max98095_eq_cfg *coef_set; int fs, best, best_val, i; int regmask, regsave; @@ -2016,7 +2016,7 @@ static int max98095_put_bq_enum(struct snd_kcontrol *kcontrol, struct max98095_pdata *pdata = max98095->pdata; int channel = max98095_get_bq_channel(codec, kcontrol->id.name); struct max98095_cdata *cdata; - int sel = ucontrol->value.integer.value[0]; + unsigned int sel = ucontrol->value.integer.value[0]; struct max98095_biquad_cfg *coef_set; int fs, best, best_val, i; int regmask, regsave; diff --git a/sound/soc/fsl/imx-sgtl5000.c b/sound/soc/fsl/imx-sgtl5000.c index 46c5b4fdfc52..ca1be1d9dcf0 100644 --- a/sound/soc/fsl/imx-sgtl5000.c +++ b/sound/soc/fsl/imx-sgtl5000.c @@ -62,7 +62,7 @@ static int imx_sgtl5000_probe(struct platform_device *pdev) struct device_node *ssi_np, *codec_np; struct platform_device *ssi_pdev; struct i2c_client *codec_dev; - struct imx_sgtl5000_data *data; + struct imx_sgtl5000_data *data = NULL; int int_port, ext_port; int ret; @@ -128,7 +128,7 @@ static int imx_sgtl5000_probe(struct platform_device *pdev) goto fail; } - data->codec_clk = devm_clk_get(&codec_dev->dev, NULL); + data->codec_clk = clk_get(&codec_dev->dev, NULL); if (IS_ERR(data->codec_clk)) { ret = PTR_ERR(data->codec_clk); goto fail; @@ -172,6 +172,8 @@ static int imx_sgtl5000_probe(struct platform_device *pdev) return 0; fail: + if (data && !IS_ERR(data->codec_clk)) + clk_put(data->codec_clk); if (ssi_np) of_node_put(ssi_np); if (codec_np) @@ -185,6 +187,7 @@ static int imx_sgtl5000_remove(struct platform_device *pdev) struct imx_sgtl5000_data *data = platform_get_drvdata(pdev); snd_soc_unregister_card(&data->card); + clk_put(data->codec_clk); return 0; } diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index 4d0561312f3b..1a38be0d0ca8 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -1380,7 +1380,6 @@ static int soc_probe_link_dais(struct snd_soc_card *card, int num, int order) return -ENODEV; list_add(&cpu_dai->dapm.list, &card->dapm_list); - snd_soc_dapm_new_dai_widgets(&cpu_dai->dapm, cpu_dai); } if (cpu_dai->driver->probe) { diff --git a/sound/usb/usx2y/us122l.c b/sound/usb/usx2y/us122l.c index d0323a693ba2..999550bbad40 100644 --- a/sound/usb/usx2y/us122l.c +++ b/sound/usb/usx2y/us122l.c @@ -262,7 +262,9 @@ static int usb_stream_hwdep_mmap(struct snd_hwdep *hw, } area->vm_ops = &usb_stream_hwdep_vm_ops; - area->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP; + area->vm_flags |= VM_DONTDUMP; + if (!read) + area->vm_flags |= VM_DONTEXPAND; area->vm_private_data = us122l; atomic_inc(&us122l->mmap_count); out: diff --git a/sound/usb/usx2y/usbusx2yaudio.c b/sound/usb/usx2y/usbusx2yaudio.c index 63fb5219f0f8..6234a51625b1 100644 --- a/sound/usb/usx2y/usbusx2yaudio.c +++ b/sound/usb/usx2y/usbusx2yaudio.c @@ -299,19 +299,6 @@ static void usX2Y_error_urb_status(struct usX2Ydev *usX2Y, usX2Y_clients_stop(usX2Y); } -static void usX2Y_error_sequence(struct usX2Ydev *usX2Y, - struct snd_usX2Y_substream *subs, struct urb *urb) -{ - snd_printk(KERN_ERR -"Sequence Error!(hcd_frame=%i ep=%i%s;wait=%i,frame=%i).\n" -"Most probably some urb of usb-frame %i is still missing.\n" -"Cause could be too long delays in usb-hcd interrupt handling.\n", - usb_get_current_frame_number(usX2Y->dev), - subs->endpoint, usb_pipein(urb->pipe) ? "in" : "out", - usX2Y->wait_iso_frame, urb->start_frame, usX2Y->wait_iso_frame); - usX2Y_clients_stop(usX2Y); -} - static void i_usX2Y_urb_complete(struct urb *urb) { struct snd_usX2Y_substream *subs = urb->context; @@ -328,12 +315,9 @@ static void i_usX2Y_urb_complete(struct urb *urb) usX2Y_error_urb_status(usX2Y, subs, urb); return; } - if (likely((urb->start_frame & 0xFFFF) == (usX2Y->wait_iso_frame & 0xFFFF))) - subs->completed_urb = urb; - else { - usX2Y_error_sequence(usX2Y, subs, urb); - return; - } + + subs->completed_urb = urb; + { struct snd_usX2Y_substream *capsubs = usX2Y->subs[SNDRV_PCM_STREAM_CAPTURE], *playbacksubs = usX2Y->subs[SNDRV_PCM_STREAM_PLAYBACK]; diff --git a/sound/usb/usx2y/usx2yhwdeppcm.c b/sound/usb/usx2y/usx2yhwdeppcm.c index f2a1acdc4d83..814d0e887c62 100644 --- a/sound/usb/usx2y/usx2yhwdeppcm.c +++ b/sound/usb/usx2y/usx2yhwdeppcm.c @@ -244,13 +244,8 @@ static void i_usX2Y_usbpcm_urb_complete(struct urb *urb) usX2Y_error_urb_status(usX2Y, subs, urb); return; } - if (likely((urb->start_frame & 0xFFFF) == (usX2Y->wait_iso_frame & 0xFFFF))) - subs->completed_urb = urb; - else { - usX2Y_error_sequence(usX2Y, subs, urb); - return; - } + subs->completed_urb = urb; capsubs = usX2Y->subs[SNDRV_PCM_STREAM_CAPTURE]; capsubs2 = usX2Y->subs[SNDRV_PCM_STREAM_CAPTURE + 2]; playbacksubs = usX2Y->subs[SNDRV_PCM_STREAM_PLAYBACK]; diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile index ca6cb779876a..fc1502098595 100644 --- a/tools/lib/traceevent/Makefile +++ b/tools/lib/traceevent/Makefile @@ -134,14 +134,14 @@ ifeq ($(VERBOSE),1) print_install = else Q = @ - print_compile = echo ' CC '$(OBJ); - print_app_build = echo ' BUILD '$(OBJ); - print_fpic_compile = echo ' CC FPIC '$(OBJ); - print_shared_lib_compile = echo ' BUILD SHARED LIB '$(OBJ); - print_plugin_obj_compile = echo ' CC PLUGIN OBJ '$(OBJ); - print_plugin_build = echo ' CC PLUGI '$(OBJ); - print_static_lib_build = echo ' BUILD STATIC LIB '$(OBJ); - print_install = echo ' INSTALL '$1' to $(DESTDIR_SQ)$2'; + print_compile = echo ' CC '$(OBJ); + print_app_build = echo ' BUILD '$(OBJ); + print_fpic_compile = echo ' CC FPIC '$(OBJ); + print_shared_lib_compile = echo ' BUILD SHARED LIB '$(OBJ); + print_plugin_obj_compile = echo ' BUILD PLUGIN OBJ '$(OBJ); + print_plugin_build = echo ' BUILD PLUGIN '$(OBJ); + print_static_lib_build = echo ' BUILD STATIC LIB '$(OBJ); + print_install = echo ' INSTALL '$1' to $(DESTDIR_SQ)$2'; endif do_fpic_compile = \ @@ -268,7 +268,7 @@ TRACK_CFLAGS = $(subst ','\'',$(CFLAGS)):$(ARCH):$(CROSS_COMPILE) TRACEEVENT-CFLAGS: force @FLAGS='$(TRACK_CFLAGS)'; \ if test x"$$FLAGS" != x"`cat TRACEEVENT-CFLAGS 2>/dev/null`" ; then \ - echo 1>&2 " * new build flags or cross compiler"; \ + echo 1>&2 " FLAGS: * new build flags or cross compiler"; \ echo "$$FLAGS" >TRACEEVENT-CFLAGS; \ fi diff --git a/tools/perf/.gitignore b/tools/perf/.gitignore index 8f8fbc227a46..782d86e961b9 100644 --- a/tools/perf/.gitignore +++ b/tools/perf/.gitignore @@ -13,6 +13,7 @@ perf*.html common-cmds.h perf.data perf.data.old +output.svg perf-archive tags TAGS diff --git a/tools/perf/Documentation/Makefile b/tools/perf/Documentation/Makefile index 5a37a7c84e69..3ba1c0b09908 100644 --- a/tools/perf/Documentation/Makefile +++ b/tools/perf/Documentation/Makefile @@ -145,16 +145,17 @@ endif ifneq ($(findstring $(MAKEFLAGS),s),s) ifneq ($(V),1) - QUIET_ASCIIDOC = @echo ' ' ASCIIDOC $@; - QUIET_XMLTO = @echo ' ' XMLTO $@; - QUIET_DB2TEXI = @echo ' ' DB2TEXI $@; - QUIET_MAKEINFO = @echo ' ' MAKEINFO $@; - QUIET_DBLATEX = @echo ' ' DBLATEX $@; - QUIET_XSLTPROC = @echo ' ' XSLTPROC $@; - QUIET_GEN = @echo ' ' GEN $@; + QUIET_ASCIIDOC = @echo ' ASCIIDOC '$@; + QUIET_XMLTO = @echo ' XMLTO '$@; + QUIET_DB2TEXI = @echo ' DB2TEXI '$@; + QUIET_MAKEINFO = @echo ' MAKEINFO '$@; + QUIET_DBLATEX = @echo ' DBLATEX '$@; + QUIET_XSLTPROC = @echo ' XSLTPROC '$@; + QUIET_GEN = @echo ' GEN '$@; QUIET_STDERR = 2> /dev/null QUIET_SUBDIR0 = +@subdir= - QUIET_SUBDIR1 = ;$(NO_SUBDIR) echo ' ' SUBDIR $$subdir; \ + QUIET_SUBDIR1 = ;$(NO_SUBDIR) \ + echo ' SUBDIR ' $$subdir; \ $(MAKE) $(PRINT_DIR) -C $$subdir export V endif @@ -183,47 +184,43 @@ ifdef missing_tools endif do-install-man: man - $(INSTALL) -d -m 755 $(DESTDIR)$(man1dir) -# $(INSTALL) -d -m 755 $(DESTDIR)$(man5dir) -# $(INSTALL) -d -m 755 $(DESTDIR)$(man7dir) - $(INSTALL) -m 644 $(DOC_MAN1) $(DESTDIR)$(man1dir) -# $(INSTALL) -m 644 $(DOC_MAN5) $(DESTDIR)$(man5dir) -# $(INSTALL) -m 644 $(DOC_MAN7) $(DESTDIR)$(man7dir) + $(call QUIET_INSTALL, Documentation-man) \ + $(INSTALL) -d -m 755 $(DESTDIR)$(man1dir); \ +# $(INSTALL) -d -m 755 $(DESTDIR)$(man5dir); \ +# $(INSTALL) -d -m 755 $(DESTDIR)$(man7dir); \ + $(INSTALL) -m 644 $(DOC_MAN1) $(DESTDIR)$(man1dir); \ +# $(INSTALL) -m 644 $(DOC_MAN5) $(DESTDIR)$(man5dir); \ +# $(INSTALL) -m 644 $(DOC_MAN7) $(DESTDIR)$(man7dir) install-man: check-man-tools man -try-install-man: ifdef missing_tools - $(warning Please install $(missing_tools) to have the man pages installed) + DO_INSTALL_MAN = $(warning Please install $(missing_tools) to have the man pages installed) else - $(MAKE) do-install-man + DO_INSTALL_MAN = do-install-man endif +try-install-man: $(DO_INSTALL_MAN) + install-info: info - $(INSTALL) -d -m 755 $(DESTDIR)$(infodir) - $(INSTALL) -m 644 $(OUTPUT)perf.info $(OUTPUT)perfman.info $(DESTDIR)$(infodir) + $(call QUIET_INSTALL, Documentation-info) \ + $(INSTALL) -d -m 755 $(DESTDIR)$(infodir); \ + $(INSTALL) -m 644 $(OUTPUT)perf.info $(OUTPUT)perfman.info $(DESTDIR)$(infodir); \ if test -r $(DESTDIR)$(infodir)/dir; then \ - $(INSTALL_INFO) --info-dir=$(DESTDIR)$(infodir) perf.info ;\ - $(INSTALL_INFO) --info-dir=$(DESTDIR)$(infodir) perfman.info ;\ + $(INSTALL_INFO) --info-dir=$(DESTDIR)$(infodir) perf.info ;\ + $(INSTALL_INFO) --info-dir=$(DESTDIR)$(infodir) perfman.info ;\ else \ echo "No directory found in $(DESTDIR)$(infodir)" >&2 ; \ fi install-pdf: pdf - $(INSTALL) -d -m 755 $(DESTDIR)$(pdfdir) - $(INSTALL) -m 644 $(OUTPUT)user-manual.pdf $(DESTDIR)$(pdfdir) + $(call QUIET_INSTALL, Documentation-pdf) \ + $(INSTALL) -d -m 755 $(DESTDIR)$(pdfdir); \ + $(INSTALL) -m 644 $(OUTPUT)user-manual.pdf $(DESTDIR)$(pdfdir) #install-html: html # '$(SHELL_PATH_SQ)' ./install-webdoc.sh $(DESTDIR)$(htmldir) -ifneq ($(MAKECMDGOALS),clean) -ifneq ($(MAKECMDGOALS),tags) -$(OUTPUT)PERF-VERSION-FILE: .FORCE-PERF-VERSION-FILE - $(QUIET_SUBDIR0)../ $(QUIET_SUBDIR1) $(OUTPUT)PERF-VERSION-FILE - --include $(OUTPUT)PERF-VERSION-FILE -endif -endif # # Determine "include::" file references in asciidoc files. @@ -253,15 +250,17 @@ $(OUTPUT)cmd-list.made: cmd-list.perl ../command-list.txt $(MAN1_TXT) $(PERL_PATH) ./cmd-list.perl ../command-list.txt $(QUIET_STDERR) && \ date >$@ +CLEAN_FILES = \ + $(MAN_XML) $(addsuffix +,$(MAN_XML)) \ + $(MAN_HTML) $(addsuffix +,$(MAN_HTML)) \ + $(DOC_HTML) $(DOC_MAN1) $(DOC_MAN5) $(DOC_MAN7) \ + $(OUTPUT)*.texi $(OUTPUT)*.texi+ $(OUTPUT)*.texi++ \ + $(OUTPUT)perf.info $(OUTPUT)perfman.info \ + $(OUTPUT)howto-index.txt $(OUTPUT)howto/*.html $(OUTPUT)doc.dep \ + $(OUTPUT)technical/api-*.html $(OUTPUT)technical/api-index.txt \ + $(cmds_txt) $(OUTPUT)*.made clean: - $(RM) $(MAN_XML) $(addsuffix +,$(MAN_XML)) - $(RM) $(MAN_HTML) $(addsuffix +,$(MAN_HTML)) - $(RM) $(DOC_HTML) $(DOC_MAN1) $(DOC_MAN5) $(DOC_MAN7) - $(RM) $(OUTPUT)*.texi $(OUTPUT)*.texi+ $(OUTPUT)*.texi++ - $(RM) $(OUTPUT)perf.info $(OUTPUT)perfman.info - $(RM) $(OUTPUT)howto-index.txt $(OUTPUT)howto/*.html $(OUTPUT)doc.dep - $(RM) $(OUTPUT)technical/api-*.html $(OUTPUT)technical/api-index.txt - $(RM) $(cmds_txt) $(OUTPUT)*.made + $(call QUIET_CLEAN, Documentation) $(RM) $(CLEAN_FILES) $(MAN_HTML): $(OUTPUT)%.html : %.txt $(QUIET_ASCIIDOC)$(RM) $@+ $@ && \ @@ -342,5 +341,3 @@ $(patsubst %.txt,%.html,$(wildcard howto/*.txt)): %.html : %.txt #quick-install-html: # '$(SHELL_PATH_SQ)' ./install-doc-quick.sh $(HTML_REF) $(DESTDIR)$(htmldir) - -.PHONY: .FORCE-PERF-VERSION-FILE diff --git a/tools/perf/Documentation/perf-buildid-cache.txt b/tools/perf/Documentation/perf-buildid-cache.txt index e9a8349a7172..fd77d81ea748 100644 --- a/tools/perf/Documentation/perf-buildid-cache.txt +++ b/tools/perf/Documentation/perf-buildid-cache.txt @@ -21,6 +21,19 @@ OPTIONS -a:: --add=:: Add specified file to the cache. +-k:: +--kcore:: + Add specified kcore file to the cache. For the current host that is + /proc/kcore which requires root permissions to read. Be aware that + running 'perf buildid-cache' as root may update root's build-id cache + not the user's. Use the -v option to see where the file is created. + Note that the copied file contains only code sections not the whole core + image. Note also that files "kallsyms" and "modules" must also be in the + same directory and are also copied. All 3 files are created with read + permissions for root only. kcore will not be added if there is already a + kcore in the cache (with the same build-id) that has the same modules at + the same addresses. Use the -v option to see if a copy of kcore is + actually made. -r:: --remove=:: Remove specified file from the cache. diff --git a/tools/perf/Documentation/perf-kvm.txt b/tools/perf/Documentation/perf-kvm.txt index ac84db2d2334..6a06cefe9642 100644 --- a/tools/perf/Documentation/perf-kvm.txt +++ b/tools/perf/Documentation/perf-kvm.txt @@ -109,7 +109,9 @@ STAT LIVE OPTIONS -m:: --mmap-pages=:: - Number of mmap data pages. Must be a power of two. + Number of mmap data pages (must be a power of two) or size + specification with appended unit character - B/K/M/G. The + size is rounded up to have nearest pages power of two value. -a:: --all-cpus:: diff --git a/tools/perf/Documentation/perf-lock.txt b/tools/perf/Documentation/perf-lock.txt index c7f5f55634ac..ab25be28c9dc 100644 --- a/tools/perf/Documentation/perf-lock.txt +++ b/tools/perf/Documentation/perf-lock.txt @@ -48,7 +48,7 @@ REPORT OPTIONS -k:: --key=:: Sorting key. Possible values: acquired (default), contended, - wait_total, wait_max, wait_min. + avg_wait, wait_total, wait_max, wait_min. INFO OPTIONS ------------ diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index e297b74471b8..f10ab63576d7 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -87,7 +87,9 @@ OPTIONS -m:: --mmap-pages=:: - Number of mmap data pages. Must be a power of two. + Number of mmap data pages (must be a power of two) or size + specification with appended unit character - B/K/M/G. The + size is rounded up to have nearest pages power of two value. -g:: --call-graph:: @@ -166,6 +168,9 @@ following filters are defined: - u: only when the branch target is at the user level - k: only when the branch target is in the kernel - hv: only when the target is at the hypervisor level + - in_tx: only when the target is in a hardware transaction + - no_tx: only when the target is not in a hardware transaction + - abort_tx: only when the target is a hardware transaction abort + The option requires at least one branch type among any, any_call, any_ret, ind_call. @@ -176,12 +181,14 @@ is enabled for all the sampling events. The sampled branch type is the same for The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k Note that this feature may not be available on all processors. --W:: --weight:: Enable weightened sampling. An additional weight is recorded per sample and can be displayed with the weight and local_weight sort keys. This currently works for TSX abort events and some memory events in precise mode on modern Intel CPUs. +--transaction:: +Record transaction flags for transaction related events. + SEE ALSO -------- linkperf:perf-stat[1], linkperf:perf-list[1] diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 2b8097ee39d8..10a279871251 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -71,7 +71,11 @@ OPTIONS entries are displayed as "[other]". - cpu: cpu number the task ran at the time of sample - srcline: filename and line number executed at the time of sample. The - DWARF debuggin info must be provided. + DWARF debugging info must be provided. + - weight: Event specific weight, e.g. memory latency or transaction + abort cost. This is the global weight. + - local_weight: Local weight version of the weight above. + - transaction: Transaction abort flags. By default, comm, dso and symbol keys are used. (i.e. --sort comm,dso,symbol) @@ -85,6 +89,8 @@ OPTIONS - symbol_from: name of function branched from - symbol_to: name of function branched to - mispredict: "N" for predicted branch, "Y" for mispredicted branch + - in_tx: branch in TSX transaction + - abort: TSX transaction abort. And default sort keys are changed to comm, dso_from, symbol_from, dso_to and symbol_to, see '--branch-stack'. @@ -135,6 +141,14 @@ OPTIONS Default: fractal,0.5,callee,function. +--max-stack:: + Set the stack depth limit when parsing the callchain, anything + beyond the specified depth will be ignored. This is a trade-off + between information loss and faster processing especially for + workloads that can have a very long callchain stack. + + Default: 127 + -G:: --inverted:: alias for inverted caller based call graph. diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt index 73c9759005a3..80c7da6732f2 100644 --- a/tools/perf/Documentation/perf-stat.txt +++ b/tools/perf/Documentation/perf-stat.txt @@ -137,6 +137,11 @@ core number and the number of online logical processors on that physical process After starting the program, wait msecs before measuring. This is useful to filter out the startup phase of the program, which is often very different. +-T:: +--transaction:: + +Print statistics of transactional execution if supported. + EXAMPLES -------- diff --git a/tools/perf/Documentation/perf-timechart.txt b/tools/perf/Documentation/perf-timechart.txt index 1632b0efc757..3ff8bd4f0b4d 100644 --- a/tools/perf/Documentation/perf-timechart.txt +++ b/tools/perf/Documentation/perf-timechart.txt @@ -8,7 +8,8 @@ perf-timechart - Tool to visualize total system behavior during a workload SYNOPSIS -------- [verse] -'perf timechart' {record} +'perf timechart' record +'perf timechart' [] DESCRIPTION ----------- @@ -41,6 +42,18 @@ OPTIONS --symfs=:: Look for files with symbols relative to this directory. +EXAMPLES +-------- + +$ perf timechart record git pull + + [ perf record: Woken up 13 times to write data ] + [ perf record: Captured and wrote 4.253 MB perf.data (~185801 samples) ] + +$ perf timechart + + Written 10.2 seconds of trace to output.svg. + SEE ALSO -------- linkperf:perf-record[1] diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt index 58d6598a9686..c16a09e2f182 100644 --- a/tools/perf/Documentation/perf-top.txt +++ b/tools/perf/Documentation/perf-top.txt @@ -68,7 +68,9 @@ Default is to monitor all CPUS. -m :: --mmap-pages=:: - Number of mmapped data pages. + Number of mmap data pages (must be a power of two) or size + specification with appended unit character - B/K/M/G. The + size is rounded up to have nearest pages power of two value. -p :: --pid=:: @@ -112,7 +114,8 @@ Default is to monitor all CPUS. -s:: --sort:: - Sort by key(s): pid, comm, dso, symbol, parent, srcline, weight, local_weight. + Sort by key(s): pid, comm, dso, symbol, parent, srcline, weight, + local_weight, abort, in_tx, transaction -n:: --show-nr-samples:: @@ -155,6 +158,14 @@ Default is to monitor all CPUS. Default: fractal,0.5,callee. +--max-stack:: + Set the stack depth limit when parsing the callchain, anything + beyond the specified depth will be ignored. This is a trade-off + between information loss and faster processing especially for + workloads that can have a very long callchain stack. + + Default: 127 + --ignore-callees=:: Ignore callees of the function(s) matching the given regex. This has the effect of collecting the callers of each such diff --git a/tools/perf/Documentation/perf-trace.txt b/tools/perf/Documentation/perf-trace.txt index daccd2c0a48f..7b0497f95a75 100644 --- a/tools/perf/Documentation/perf-trace.txt +++ b/tools/perf/Documentation/perf-trace.txt @@ -9,6 +9,7 @@ SYNOPSIS -------- [verse] 'perf trace' +'perf trace record' DESCRIPTION ----------- @@ -16,9 +17,14 @@ This command will show the events associated with the target, initially syscalls, but other system events like pagefaults, task lifetime events, scheduling events, etc. -Initially this is a live mode only tool, but eventually will work with -perf.data files like the other tools, allowing a detached 'record' from -analysis phases. +This is a live mode tool in addition to working with perf.data files like +the other perf tools. Files can be generated using the 'perf record' command +but the session needs to include the raw_syscalls events (-e 'raw_syscalls:*'). +Alernatively, the 'perf trace record' can be used as a shortcut to +automatically include the raw_syscalls events when writing events to a file. + +The following options apply to perf trace; options to perf trace record are +found in the perf record man page. OPTIONS ------- @@ -59,7 +65,9 @@ OPTIONS -m:: --mmap-pages=:: - Number of mmap data pages. Must be a power of two. + Number of mmap data pages (must be a power of two) or size + specification with appended unit character - B/K/M/G. The + size is rounded up to have nearest pages power of two value. -C:: --cpu:: @@ -78,6 +86,21 @@ the thread executes on the designated CPUs. Default is to monitor all CPUs. --input Process events from a given perf data file. +-T +--time + Print full timestamp rather time relative to first sample. + +--comm:: + Show process COMM right beside its ID, on by default, disable with --no-comm. + +--summary:: + Show a summary of syscalls by thread with min, max, and average times (in + msec) and relative stddev. + +--tool_stats:: + Show tool stats such as number of times fd->pathname was discovered thru + hooking the open syscall return + vfs_getname or via reading /proc/pid/fd, etc. + SEE ALSO -------- linkperf:perf-record[1], linkperf:perf-script[1] diff --git a/tools/perf/Makefile b/tools/perf/Makefile index 3a0ff7fb71b6..4835618a5608 100644 --- a/tools/perf/Makefile +++ b/tools/perf/Makefile @@ -1,818 +1,79 @@ -include ../scripts/Makefile.include - -# The default target of this Makefile is... -all: - -include config/utilities.mak - -# Define V to have a more verbose compile. -# -# Define O to save output files in a separate directory. -# -# Define ARCH as name of target architecture if you want cross-builds. -# -# Define CROSS_COMPILE as prefix name of compiler if you want cross-builds. -# -# Define NO_LIBPERL to disable perl script extension. -# -# Define NO_LIBPYTHON to disable python script extension. -# -# Define PYTHON to point to the python binary if the default -# `python' is not correct; for example: PYTHON=python2 -# -# Define PYTHON_CONFIG to point to the python-config binary if -# the default `$(PYTHON)-config' is not correct. # -# Define ASCIIDOC8 if you want to format documentation with AsciiDoc 8 +# This is a simple wrapper Makefile that calls the main Makefile.perf +# with a -j option to do parallel builds # -# Define DOCBOOK_XSL_172 if you want to format man pages with DocBook XSL v1.72. +# If you want to invoke the perf build in some non-standard way then +# you can use the 'make -f Makefile.perf' method to invoke it. # -# Define LDFLAGS=-static to build a static binary. -# -# Define EXTRA_CFLAGS=-m64 or EXTRA_CFLAGS=-m32 as appropriate for cross-builds. -# -# Define NO_DWARF if you do not want debug-info analysis feature at all. -# -# Define WERROR=0 to disable treating any warnings as errors. -# -# Define NO_NEWT if you do not want TUI support. (deprecated) -# -# Define NO_SLANG if you do not want TUI support. -# -# Define NO_GTK2 if you do not want GTK+ GUI support. + # -# Define NO_DEMANGLE if you do not want C++ symbol demangling. +# Clear out the built-in rules GNU make defines by default (such as .o targets), +# so that we pass through all targets to Makefile.perf: # -# Define NO_LIBELF if you do not want libelf dependency (e.g. cross-builds) +.SUFFIXES: + # -# Define NO_LIBUNWIND if you do not want libunwind dependency for dwarf -# backtrace post unwind. +# We don't want to pass along options like -j: # -# Define NO_BACKTRACE if you do not want stack backtrace debug feature +unexport MAKEFLAGS + # -# Define NO_LIBNUMA if you do not want numa perf benchmark +# Do a parallel build with multiple jobs, based on the number of CPUs online +# in this system: 'make -j8' on a 8-CPU system, etc. # -# Define NO_LIBAUDIT if you do not want libaudit support +# (To override it, run 'make JOBS=1' and similar.) # -# Define NO_LIBBIONIC if you do not want bionic support - -ifeq ($(srctree),) -srctree := $(patsubst %/,%,$(dir $(shell pwd))) -srctree := $(patsubst %/,%,$(dir $(srctree))) -#$(info Determined 'srctree' to be $(srctree)) -endif - -ifneq ($(objtree),) -#$(info Determined 'objtree' to be $(objtree)) -endif - -ifneq ($(OUTPUT),) -#$(info Determined 'OUTPUT' to be $(OUTPUT)) -endif - -$(OUTPUT)PERF-VERSION-FILE: .FORCE-PERF-VERSION-FILE - @$(SHELL_PATH) util/PERF-VERSION-GEN $(OUTPUT) - -CC = $(CROSS_COMPILE)gcc -AR = $(CROSS_COMPILE)ar - -RM = rm -f -MKDIR = mkdir -FIND = find -INSTALL = install -FLEX = flex -BISON = bison -STRIP = strip - -LK_DIR = $(srctree)/tools/lib/lk/ -TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/ - -# include config/Makefile by default and rule out -# non-config cases -config := 1 - -NON_CONFIG_TARGETS := clean TAGS tags cscope help - -ifdef MAKECMDGOALS -ifeq ($(filter-out $(NON_CONFIG_TARGETS),$(MAKECMDGOALS)),) - config := 0 -endif +ifeq ($(JOBS),) + JOBS := $(shell grep -c ^processor /proc/cpuinfo 2>/dev/null) + ifeq ($(JOBS),) + JOBS := 1 + endif endif -ifeq ($(config),1) -include config/Makefile +# +# Only pass canonical directory names as the output directory: +# +ifneq ($(O),) + FULL_O := $(shell readlink -f $(O) || echo $(O)) endif -export prefix bindir sharedir sysconfdir - -# sparse is architecture-neutral, which means that we need to tell it -# explicitly what architecture to check for. Fix this up for yours.. -SPARSE_FLAGS = -D__BIG_ENDIAN__ -D__powerpc__ - -# Guard against environment variables -BUILTIN_OBJS = -LIB_H = -LIB_OBJS = -PYRF_OBJS = -SCRIPT_SH = - -SCRIPT_SH += perf-archive.sh - -grep-libs = $(filter -l%,$(1)) -strip-libs = $(filter-out -l%,$(1)) - -ifneq ($(OUTPUT),) - TE_PATH=$(OUTPUT) -ifneq ($(subdir),) - LK_PATH=$(OUTPUT)/../lib/lk/ -else - LK_PATH=$(OUTPUT) -endif +# +# Only accept the 'DEBUG' variable from the command line: +# +ifeq ("$(origin DEBUG)", "command line") + ifeq ($(DEBUG),) + override DEBUG = 0 + else + SET_DEBUG = "DEBUG=$(DEBUG)" + endif else - TE_PATH=$(TRACE_EVENT_DIR) - LK_PATH=$(LK_DIR) + override DEBUG = 0 endif -LIBTRACEEVENT = $(TE_PATH)libtraceevent.a -export LIBTRACEEVENT - -LIBLK = $(LK_PATH)liblk.a -export LIBLK - -# python extension build directories -PYTHON_EXTBUILD := $(OUTPUT)python_ext_build/ -PYTHON_EXTBUILD_LIB := $(PYTHON_EXTBUILD)lib/ -PYTHON_EXTBUILD_TMP := $(PYTHON_EXTBUILD)tmp/ -export PYTHON_EXTBUILD_LIB PYTHON_EXTBUILD_TMP +define print_msg + @printf ' BUILD: Doing '\''make \033[33m-j'$(JOBS)'\033[m'\'' parallel build\n' +endef -python-clean := rm -rf $(PYTHON_EXTBUILD) $(OUTPUT)python/perf.so +define make + @$(MAKE) -f Makefile.perf --no-print-directory -j$(JOBS) O=$(FULL_O) $(SET_DEBUG) $@ +endef -PYTHON_EXT_SRCS := $(shell grep -v ^\# util/python-ext-sources) -PYTHON_EXT_DEPS := util/python-ext-sources util/setup.py $(LIBTRACEEVENT) $(LIBLK) - -$(OUTPUT)python/perf.so: $(PYTHON_EXT_SRCS) $(PYTHON_EXT_DEPS) - $(QUIET_GEN)CFLAGS='$(CFLAGS)' $(PYTHON_WORD) util/setup.py \ - --quiet build_ext; \ - mkdir -p $(OUTPUT)python && \ - cp $(PYTHON_EXTBUILD_LIB)perf.so $(OUTPUT)python/ # -# No Perl scripts right now: +# Needed if no target specified: # - -SCRIPTS = $(patsubst %.sh,%,$(SCRIPT_SH)) +all: + $(print_msg) + $(make) # -# Single 'perf' binary right now: +# The clean target is not really parallel, don't print the jobs info: # -PROGRAMS += $(OUTPUT)perf - -# what 'all' will build and 'install' will install, in perfexecdir -ALL_PROGRAMS = $(PROGRAMS) $(SCRIPTS) - -# what 'all' will build but not install in perfexecdir -OTHER_PROGRAMS = $(OUTPUT)perf - -# Set paths to tools early so that they can be used for version tests. -ifndef SHELL_PATH - SHELL_PATH = /bin/sh -endif -ifndef PERL_PATH - PERL_PATH = /usr/bin/perl -endif - -export PERL_PATH - -$(OUTPUT)util/parse-events-flex.c: util/parse-events.l $(OUTPUT)util/parse-events-bison.c - $(QUIET_FLEX)$(FLEX) --header-file=$(OUTPUT)util/parse-events-flex.h $(PARSER_DEBUG_FLEX) -t util/parse-events.l > $(OUTPUT)util/parse-events-flex.c - -$(OUTPUT)util/parse-events-bison.c: util/parse-events.y - $(QUIET_BISON)$(BISON) -v util/parse-events.y -d $(PARSER_DEBUG_BISON) -o $(OUTPUT)util/parse-events-bison.c -p parse_events_ - -$(OUTPUT)util/pmu-flex.c: util/pmu.l $(OUTPUT)util/pmu-bison.c - $(QUIET_FLEX)$(FLEX) --header-file=$(OUTPUT)util/pmu-flex.h -t util/pmu.l > $(OUTPUT)util/pmu-flex.c - -$(OUTPUT)util/pmu-bison.c: util/pmu.y - $(QUIET_BISON)$(BISON) -v util/pmu.y -d -o $(OUTPUT)util/pmu-bison.c -p perf_pmu_ - -$(OUTPUT)util/parse-events.o: $(OUTPUT)util/parse-events-flex.c $(OUTPUT)util/parse-events-bison.c -$(OUTPUT)util/pmu.o: $(OUTPUT)util/pmu-flex.c $(OUTPUT)util/pmu-bison.c - -LIB_FILE=$(OUTPUT)libperf.a - -LIB_H += ../../include/uapi/linux/perf_event.h -LIB_H += ../../include/linux/rbtree.h -LIB_H += ../../include/linux/list.h -LIB_H += ../../include/uapi/linux/const.h -LIB_H += ../../include/linux/hash.h -LIB_H += ../../include/linux/stringify.h -LIB_H += util/include/linux/bitmap.h -LIB_H += util/include/linux/bitops.h -LIB_H += util/include/linux/compiler.h -LIB_H += util/include/linux/const.h -LIB_H += util/include/linux/ctype.h -LIB_H += util/include/linux/kernel.h -LIB_H += util/include/linux/list.h -LIB_H += util/include/linux/export.h -LIB_H += util/include/linux/magic.h -LIB_H += util/include/linux/poison.h -LIB_H += util/include/linux/prefetch.h -LIB_H += util/include/linux/rbtree.h -LIB_H += util/include/linux/rbtree_augmented.h -LIB_H += util/include/linux/string.h -LIB_H += util/include/linux/types.h -LIB_H += util/include/linux/linkage.h -LIB_H += util/include/asm/asm-offsets.h -LIB_H += util/include/asm/bug.h -LIB_H += util/include/asm/byteorder.h -LIB_H += util/include/asm/hweight.h -LIB_H += util/include/asm/swab.h -LIB_H += util/include/asm/system.h -LIB_H += util/include/asm/uaccess.h -LIB_H += util/include/dwarf-regs.h -LIB_H += util/include/asm/dwarf2.h -LIB_H += util/include/asm/cpufeature.h -LIB_H += util/include/asm/unistd_32.h -LIB_H += util/include/asm/unistd_64.h -LIB_H += perf.h -LIB_H += util/annotate.h -LIB_H += util/cache.h -LIB_H += util/callchain.h -LIB_H += util/build-id.h -LIB_H += util/debug.h -LIB_H += util/sysfs.h -LIB_H += util/pmu.h -LIB_H += util/event.h -LIB_H += util/evsel.h -LIB_H += util/evlist.h -LIB_H += util/exec_cmd.h -LIB_H += util/types.h -LIB_H += util/levenshtein.h -LIB_H += util/machine.h -LIB_H += util/map.h -LIB_H += util/parse-options.h -LIB_H += util/parse-events.h -LIB_H += util/quote.h -LIB_H += util/util.h -LIB_H += util/xyarray.h -LIB_H += util/header.h -LIB_H += util/help.h -LIB_H += util/session.h -LIB_H += util/strbuf.h -LIB_H += util/strlist.h -LIB_H += util/strfilter.h -LIB_H += util/svghelper.h -LIB_H += util/tool.h -LIB_H += util/run-command.h -LIB_H += util/sigchain.h -LIB_H += util/dso.h -LIB_H += util/symbol.h -LIB_H += util/color.h -LIB_H += util/values.h -LIB_H += util/sort.h -LIB_H += util/hist.h -LIB_H += util/thread.h -LIB_H += util/thread_map.h -LIB_H += util/trace-event.h -LIB_H += util/probe-finder.h -LIB_H += util/dwarf-aux.h -LIB_H += util/probe-event.h -LIB_H += util/pstack.h -LIB_H += util/cpumap.h -LIB_H += util/top.h -LIB_H += $(ARCH_INCLUDE) -LIB_H += util/cgroup.h -LIB_H += $(LIB_INCLUDE)traceevent/event-parse.h -LIB_H += util/target.h -LIB_H += util/rblist.h -LIB_H += util/intlist.h -LIB_H += util/perf_regs.h -LIB_H += util/unwind.h -LIB_H += util/vdso.h -LIB_H += ui/helpline.h -LIB_H += ui/progress.h -LIB_H += ui/util.h -LIB_H += ui/ui.h - -LIB_OBJS += $(OUTPUT)util/abspath.o -LIB_OBJS += $(OUTPUT)util/alias.o -LIB_OBJS += $(OUTPUT)util/annotate.o -LIB_OBJS += $(OUTPUT)util/build-id.o -LIB_OBJS += $(OUTPUT)util/config.o -LIB_OBJS += $(OUTPUT)util/ctype.o -LIB_OBJS += $(OUTPUT)util/sysfs.o -LIB_OBJS += $(OUTPUT)util/pmu.o -LIB_OBJS += $(OUTPUT)util/environment.o -LIB_OBJS += $(OUTPUT)util/event.o -LIB_OBJS += $(OUTPUT)util/evlist.o -LIB_OBJS += $(OUTPUT)util/evsel.o -LIB_OBJS += $(OUTPUT)util/exec_cmd.o -LIB_OBJS += $(OUTPUT)util/help.o -LIB_OBJS += $(OUTPUT)util/levenshtein.o -LIB_OBJS += $(OUTPUT)util/parse-options.o -LIB_OBJS += $(OUTPUT)util/parse-events.o -LIB_OBJS += $(OUTPUT)util/path.o -LIB_OBJS += $(OUTPUT)util/rbtree.o -LIB_OBJS += $(OUTPUT)util/bitmap.o -LIB_OBJS += $(OUTPUT)util/hweight.o -LIB_OBJS += $(OUTPUT)util/run-command.o -LIB_OBJS += $(OUTPUT)util/quote.o -LIB_OBJS += $(OUTPUT)util/strbuf.o -LIB_OBJS += $(OUTPUT)util/string.o -LIB_OBJS += $(OUTPUT)util/strlist.o -LIB_OBJS += $(OUTPUT)util/strfilter.o -LIB_OBJS += $(OUTPUT)util/top.o -LIB_OBJS += $(OUTPUT)util/usage.o -LIB_OBJS += $(OUTPUT)util/wrapper.o -LIB_OBJS += $(OUTPUT)util/sigchain.o -LIB_OBJS += $(OUTPUT)util/dso.o -LIB_OBJS += $(OUTPUT)util/symbol.o -LIB_OBJS += $(OUTPUT)util/symbol-elf.o -LIB_OBJS += $(OUTPUT)util/color.o -LIB_OBJS += $(OUTPUT)util/pager.o -LIB_OBJS += $(OUTPUT)util/header.o -LIB_OBJS += $(OUTPUT)util/callchain.o -LIB_OBJS += $(OUTPUT)util/values.o -LIB_OBJS += $(OUTPUT)util/debug.o -LIB_OBJS += $(OUTPUT)util/machine.o -LIB_OBJS += $(OUTPUT)util/map.o -LIB_OBJS += $(OUTPUT)util/pstack.o -LIB_OBJS += $(OUTPUT)util/session.o -LIB_OBJS += $(OUTPUT)util/thread.o -LIB_OBJS += $(OUTPUT)util/thread_map.o -LIB_OBJS += $(OUTPUT)util/trace-event-parse.o -LIB_OBJS += $(OUTPUT)util/parse-events-flex.o -LIB_OBJS += $(OUTPUT)util/parse-events-bison.o -LIB_OBJS += $(OUTPUT)util/pmu-flex.o -LIB_OBJS += $(OUTPUT)util/pmu-bison.o -LIB_OBJS += $(OUTPUT)util/trace-event-read.o -LIB_OBJS += $(OUTPUT)util/trace-event-info.o -LIB_OBJS += $(OUTPUT)util/trace-event-scripting.o -LIB_OBJS += $(OUTPUT)util/svghelper.o -LIB_OBJS += $(OUTPUT)util/sort.o -LIB_OBJS += $(OUTPUT)util/hist.o -LIB_OBJS += $(OUTPUT)util/probe-event.o -LIB_OBJS += $(OUTPUT)util/util.o -LIB_OBJS += $(OUTPUT)util/xyarray.o -LIB_OBJS += $(OUTPUT)util/cpumap.o -LIB_OBJS += $(OUTPUT)util/cgroup.o -LIB_OBJS += $(OUTPUT)util/target.o -LIB_OBJS += $(OUTPUT)util/rblist.o -LIB_OBJS += $(OUTPUT)util/intlist.o -LIB_OBJS += $(OUTPUT)util/vdso.o -LIB_OBJS += $(OUTPUT)util/stat.o -LIB_OBJS += $(OUTPUT)util/record.o - -LIB_OBJS += $(OUTPUT)ui/setup.o -LIB_OBJS += $(OUTPUT)ui/helpline.o -LIB_OBJS += $(OUTPUT)ui/progress.o -LIB_OBJS += $(OUTPUT)ui/util.o -LIB_OBJS += $(OUTPUT)ui/hist.o -LIB_OBJS += $(OUTPUT)ui/stdio/hist.o - -LIB_OBJS += $(OUTPUT)arch/common.o - -LIB_OBJS += $(OUTPUT)tests/parse-events.o -LIB_OBJS += $(OUTPUT)tests/dso-data.o -LIB_OBJS += $(OUTPUT)tests/attr.o -LIB_OBJS += $(OUTPUT)tests/vmlinux-kallsyms.o -LIB_OBJS += $(OUTPUT)tests/open-syscall.o -LIB_OBJS += $(OUTPUT)tests/open-syscall-all-cpus.o -LIB_OBJS += $(OUTPUT)tests/open-syscall-tp-fields.o -LIB_OBJS += $(OUTPUT)tests/mmap-basic.o -LIB_OBJS += $(OUTPUT)tests/perf-record.o -LIB_OBJS += $(OUTPUT)tests/rdpmc.o -LIB_OBJS += $(OUTPUT)tests/evsel-roundtrip-name.o -LIB_OBJS += $(OUTPUT)tests/evsel-tp-sched.o -LIB_OBJS += $(OUTPUT)tests/pmu.o -LIB_OBJS += $(OUTPUT)tests/hists_link.o -LIB_OBJS += $(OUTPUT)tests/python-use.o -LIB_OBJS += $(OUTPUT)tests/bp_signal.o -LIB_OBJS += $(OUTPUT)tests/bp_signal_overflow.o -LIB_OBJS += $(OUTPUT)tests/task-exit.o -LIB_OBJS += $(OUTPUT)tests/sw-clock.o -ifeq ($(ARCH),x86) -LIB_OBJS += $(OUTPUT)tests/perf-time-to-tsc.o -endif -LIB_OBJS += $(OUTPUT)tests/code-reading.o -LIB_OBJS += $(OUTPUT)tests/sample-parsing.o -LIB_OBJS += $(OUTPUT)tests/parse-no-sample-id-all.o - -BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o -BUILTIN_OBJS += $(OUTPUT)builtin-bench.o -# Benchmark modules -BUILTIN_OBJS += $(OUTPUT)bench/sched-messaging.o -BUILTIN_OBJS += $(OUTPUT)bench/sched-pipe.o -ifeq ($(RAW_ARCH),x86_64) -BUILTIN_OBJS += $(OUTPUT)bench/mem-memcpy-x86-64-asm.o -BUILTIN_OBJS += $(OUTPUT)bench/mem-memset-x86-64-asm.o -endif -BUILTIN_OBJS += $(OUTPUT)bench/mem-memcpy.o -BUILTIN_OBJS += $(OUTPUT)bench/mem-memset.o - -BUILTIN_OBJS += $(OUTPUT)builtin-diff.o -BUILTIN_OBJS += $(OUTPUT)builtin-evlist.o -BUILTIN_OBJS += $(OUTPUT)builtin-help.o -BUILTIN_OBJS += $(OUTPUT)builtin-sched.o -BUILTIN_OBJS += $(OUTPUT)builtin-buildid-list.o -BUILTIN_OBJS += $(OUTPUT)builtin-buildid-cache.o -BUILTIN_OBJS += $(OUTPUT)builtin-list.o -BUILTIN_OBJS += $(OUTPUT)builtin-record.o -BUILTIN_OBJS += $(OUTPUT)builtin-report.o -BUILTIN_OBJS += $(OUTPUT)builtin-stat.o -BUILTIN_OBJS += $(OUTPUT)builtin-timechart.o -BUILTIN_OBJS += $(OUTPUT)builtin-top.o -BUILTIN_OBJS += $(OUTPUT)builtin-script.o -BUILTIN_OBJS += $(OUTPUT)builtin-probe.o -BUILTIN_OBJS += $(OUTPUT)builtin-kmem.o -BUILTIN_OBJS += $(OUTPUT)builtin-lock.o -BUILTIN_OBJS += $(OUTPUT)builtin-kvm.o -BUILTIN_OBJS += $(OUTPUT)builtin-inject.o -BUILTIN_OBJS += $(OUTPUT)tests/builtin-test.o -BUILTIN_OBJS += $(OUTPUT)builtin-mem.o - -PERFLIBS = $(LIB_FILE) $(LIBLK) $(LIBTRACEEVENT) - -# We choose to avoid "if .. else if .. else .. endif endif" -# because maintaining the nesting to match is a pain. If -# we had "elif" things would have been much nicer... - --include arch/$(ARCH)/Makefile - -ifneq ($(OUTPUT),) - CFLAGS += -I$(OUTPUT) -endif - -ifdef NO_LIBELF -EXTLIBS := $(filter-out -lelf,$(EXTLIBS)) - -# Remove ELF/DWARF dependent codes -LIB_OBJS := $(filter-out $(OUTPUT)util/symbol-elf.o,$(LIB_OBJS)) -LIB_OBJS := $(filter-out $(OUTPUT)util/dwarf-aux.o,$(LIB_OBJS)) -LIB_OBJS := $(filter-out $(OUTPUT)util/probe-event.o,$(LIB_OBJS)) -LIB_OBJS := $(filter-out $(OUTPUT)util/probe-finder.o,$(LIB_OBJS)) - -BUILTIN_OBJS := $(filter-out $(OUTPUT)builtin-probe.o,$(BUILTIN_OBJS)) - -# Use minimal symbol handling -LIB_OBJS += $(OUTPUT)util/symbol-minimal.o - -else # NO_LIBELF -ifndef NO_DWARF - LIB_OBJS += $(OUTPUT)util/probe-finder.o - LIB_OBJS += $(OUTPUT)util/dwarf-aux.o -endif # NO_DWARF -endif # NO_LIBELF - -ifndef NO_LIBUNWIND - LIB_OBJS += $(OUTPUT)util/unwind.o -endif -LIB_OBJS += $(OUTPUT)tests/keep-tracking.o - -ifndef NO_LIBAUDIT - BUILTIN_OBJS += $(OUTPUT)builtin-trace.o -endif - -ifndef NO_SLANG - LIB_OBJS += $(OUTPUT)ui/browser.o - LIB_OBJS += $(OUTPUT)ui/browsers/annotate.o - LIB_OBJS += $(OUTPUT)ui/browsers/hists.o - LIB_OBJS += $(OUTPUT)ui/browsers/map.o - LIB_OBJS += $(OUTPUT)ui/browsers/scripts.o - LIB_OBJS += $(OUTPUT)ui/tui/setup.o - LIB_OBJS += $(OUTPUT)ui/tui/util.o - LIB_OBJS += $(OUTPUT)ui/tui/helpline.o - LIB_OBJS += $(OUTPUT)ui/tui/progress.o - LIB_H += ui/browser.h - LIB_H += ui/browsers/map.h - LIB_H += ui/keysyms.h - LIB_H += ui/libslang.h -endif - -ifndef NO_GTK2 - LIB_OBJS += $(OUTPUT)ui/gtk/browser.o - LIB_OBJS += $(OUTPUT)ui/gtk/hists.o - LIB_OBJS += $(OUTPUT)ui/gtk/setup.o - LIB_OBJS += $(OUTPUT)ui/gtk/util.o - LIB_OBJS += $(OUTPUT)ui/gtk/helpline.o - LIB_OBJS += $(OUTPUT)ui/gtk/progress.o - LIB_OBJS += $(OUTPUT)ui/gtk/annotate.o -endif - -ifndef NO_LIBPERL - LIB_OBJS += $(OUTPUT)util/scripting-engines/trace-event-perl.o - LIB_OBJS += $(OUTPUT)scripts/perl/Perf-Trace-Util/Context.o -endif - -ifndef NO_LIBPYTHON - LIB_OBJS += $(OUTPUT)util/scripting-engines/trace-event-python.o - LIB_OBJS += $(OUTPUT)scripts/python/Perf-Trace-Util/Context.o -endif - -ifeq ($(NO_PERF_REGS),0) - ifeq ($(ARCH),x86) - LIB_H += arch/x86/include/perf_regs.h - endif -endif - -ifndef NO_LIBNUMA - BUILTIN_OBJS += $(OUTPUT)bench/numa.o -endif - -ifdef ASCIIDOC8 - export ASCIIDOC8 -endif - -LIBS = -Wl,--whole-archive $(PERFLIBS) -Wl,--no-whole-archive -Wl,--start-group $(EXTLIBS) -Wl,--end-group - -export INSTALL SHELL_PATH - -### Build rules - -SHELL = $(SHELL_PATH) - -all: shell_compatibility_test $(ALL_PROGRAMS) $(LANG_BINDINGS) $(OTHER_PROGRAMS) - -please_set_SHELL_PATH_to_a_more_modern_shell: - @$$(:) - -shell_compatibility_test: please_set_SHELL_PATH_to_a_more_modern_shell - -strip: $(PROGRAMS) $(OUTPUT)perf - $(STRIP) $(STRIP_OPTS) $(PROGRAMS) $(OUTPUT)perf - -$(OUTPUT)perf.o: perf.c $(OUTPUT)common-cmds.h $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -include $(OUTPUT)PERF-VERSION-FILE \ - '-DPERF_HTML_PATH="$(htmldir_SQ)"' \ - $(CFLAGS) -c $(filter %.c,$^) -o $@ - -$(OUTPUT)perf: $(OUTPUT)perf.o $(BUILTIN_OBJS) $(PERFLIBS) - $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $(OUTPUT)perf.o \ - $(BUILTIN_OBJS) $(LIBS) -o $@ - -$(OUTPUT)builtin-help.o: builtin-help.c $(OUTPUT)common-cmds.h $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) \ - '-DPERF_HTML_PATH="$(htmldir_SQ)"' \ - '-DPERF_MAN_PATH="$(mandir_SQ)"' \ - '-DPERF_INFO_PATH="$(infodir_SQ)"' $< - -$(OUTPUT)builtin-timechart.o: builtin-timechart.c $(OUTPUT)common-cmds.h $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) \ - '-DPERF_HTML_PATH="$(htmldir_SQ)"' \ - '-DPERF_MAN_PATH="$(mandir_SQ)"' \ - '-DPERF_INFO_PATH="$(infodir_SQ)"' $< - -$(OUTPUT)common-cmds.h: util/generate-cmdlist.sh command-list.txt - -$(OUTPUT)common-cmds.h: $(wildcard Documentation/perf-*.txt) - $(QUIET_GEN). util/generate-cmdlist.sh > $@+ && mv $@+ $@ - -$(SCRIPTS) : % : %.sh - $(QUIET_GEN)$(INSTALL) '$@.sh' '$(OUTPUT)$@' - -# These can record PERF_VERSION -$(OUTPUT)perf.o perf.spec \ - $(SCRIPTS) \ - : $(OUTPUT)PERF-VERSION-FILE - -.SUFFIXES: -.SUFFIXES: .o .c .S .s - -# These two need to be here so that when O= is not used they take precedence -# over the general rule for .o - -$(OUTPUT)util/%-flex.o: $(OUTPUT)util/%-flex.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c -Iutil/ $(CFLAGS) -w $< - -$(OUTPUT)util/%-bison.o: $(OUTPUT)util/%-bison.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c -Iutil/ $(CFLAGS) -DYYENABLE_NLS=0 -DYYLTYPE_IS_TRIVIAL=0 -w $< - -$(OUTPUT)%.o: %.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $< -$(OUTPUT)%.i: %.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -E $(CFLAGS) $< -$(OUTPUT)%.s: %.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -S $(CFLAGS) $< -$(OUTPUT)%.o: %.S - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $< -$(OUTPUT)%.s: %.S - $(QUIET_CC)$(CC) -o $@ -E $(CFLAGS) $< - -$(OUTPUT)util/exec_cmd.o: util/exec_cmd.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) \ - '-DPERF_EXEC_PATH="$(perfexecdir_SQ)"' \ - '-DPREFIX="$(prefix_SQ)"' \ - $< - -$(OUTPUT)tests/attr.o: tests/attr.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) \ - '-DBINDIR="$(bindir_SQ)"' -DPYTHON='"$(PYTHON_WORD)"' \ - $< - -$(OUTPUT)tests/python-use.o: tests/python-use.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) \ - -DPYTHONPATH='"$(OUTPUT)python"' \ - -DPYTHON='"$(PYTHON_WORD)"' \ - $< - -$(OUTPUT)util/config.o: util/config.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $< - -$(OUTPUT)ui/browser.o: ui/browser.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DENABLE_SLFUTURE_CONST $< - -$(OUTPUT)ui/browsers/annotate.o: ui/browsers/annotate.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DENABLE_SLFUTURE_CONST $< +clean: + $(make) -$(OUTPUT)ui/browsers/hists.o: ui/browsers/hists.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DENABLE_SLFUTURE_CONST $< - -$(OUTPUT)ui/browsers/map.o: ui/browsers/map.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DENABLE_SLFUTURE_CONST $< - -$(OUTPUT)ui/browsers/scripts.o: ui/browsers/scripts.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DENABLE_SLFUTURE_CONST $< - -$(OUTPUT)util/rbtree.o: ../../lib/rbtree.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -Wno-unused-parameter -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $< - -$(OUTPUT)util/parse-events.o: util/parse-events.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -Wno-redundant-decls $< - -$(OUTPUT)util/scripting-engines/trace-event-perl.o: util/scripting-engines/trace-event-perl.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $(PERL_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes -Wno-unused-parameter -Wno-shadow -Wno-undef -Wno-switch-default $< - -$(OUTPUT)scripts/perl/Perf-Trace-Util/Context.o: scripts/perl/Perf-Trace-Util/Context.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $(PERL_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes -Wno-unused-parameter -Wno-nested-externs -Wno-undef -Wno-switch-default $< - -$(OUTPUT)util/scripting-engines/trace-event-python.o: util/scripting-engines/trace-event-python.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $(PYTHON_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes -Wno-unused-parameter -Wno-shadow $< - -$(OUTPUT)scripts/python/Perf-Trace-Util/Context.o: scripts/python/Perf-Trace-Util/Context.c $(OUTPUT)PERF-CFLAGS - $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $(PYTHON_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes -Wno-unused-parameter -Wno-nested-externs $< - -$(OUTPUT)perf-%: %.o $(PERFLIBS) - $(QUIET_LINK)$(CC) $(CFLAGS) -o $@ $(LDFLAGS) $(filter %.o,$^) $(LIBS) - -$(LIB_OBJS) $(BUILTIN_OBJS): $(LIB_H) -$(patsubst perf-%,%.o,$(PROGRAMS)): $(LIB_H) $(wildcard */*.h) - -# we compile into subdirectories. if the target directory is not the source directory, they might not exists. So -# we depend the various files onto their directories. -DIRECTORY_DEPS = $(LIB_OBJS) $(BUILTIN_OBJS) $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h -$(DIRECTORY_DEPS): | $(sort $(dir $(DIRECTORY_DEPS))) -# In the second step, we make a rule to actually create these directories -$(sort $(dir $(DIRECTORY_DEPS))): - $(QUIET_MKDIR)$(MKDIR) -p $@ 2>/dev/null - -$(LIB_FILE): $(LIB_OBJS) - $(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(LIB_OBJS) - -# libtraceevent.a -$(LIBTRACEEVENT): - $(QUIET_SUBDIR0)$(TRACE_EVENT_DIR) $(QUIET_SUBDIR1) O=$(OUTPUT) libtraceevent.a - -$(LIBTRACEEVENT)-clean: - $(QUIET_SUBDIR0)$(TRACE_EVENT_DIR) $(QUIET_SUBDIR1) O=$(OUTPUT) clean - -# if subdir is set, we've been called from above so target has been built -# already -$(LIBLK): -ifeq ($(subdir),) - $(QUIET_SUBDIR0)$(LK_DIR) $(QUIET_SUBDIR1) O=$(OUTPUT) liblk.a -endif - -$(LIBLK)-clean: -ifeq ($(subdir),) - $(QUIET_SUBDIR0)$(LK_DIR) $(QUIET_SUBDIR1) O=$(OUTPUT) clean -endif - -help: - @echo 'Perf make targets:' - @echo ' doc - make *all* documentation (see below)' - @echo ' man - make manpage documentation (access with man )' - @echo ' html - make html documentation' - @echo ' info - make GNU info documentation (access with info )' - @echo ' pdf - make pdf documentation' - @echo ' TAGS - use etags to make tag information for source browsing' - @echo ' tags - use ctags to make tag information for source browsing' - @echo ' cscope - use cscope to make interactive browsing database' - @echo '' - @echo 'Perf install targets:' - @echo ' NOTE: documentation build requires asciidoc, xmlto packages to be installed' - @echo ' HINT: use "make prefix= " to install to a particular' - @echo ' path like make prefix=/usr/local install install-doc' - @echo ' install - install compiled binaries' - @echo ' install-doc - install *all* documentation' - @echo ' install-man - install manpage documentation' - @echo ' install-html - install html documentation' - @echo ' install-info - install GNU info documentation' - @echo ' install-pdf - install pdf documentation' - @echo '' - @echo ' quick-install-doc - alias for quick-install-man' - @echo ' quick-install-man - install the documentation quickly' - @echo ' quick-install-html - install the html documentation quickly' - @echo '' - @echo 'Perf maintainer targets:' - @echo ' clean - clean all binary objects and build output' - - -DOC_TARGETS := doc man html info pdf - -INSTALL_DOC_TARGETS := $(patsubst %,install-%,$(DOC_TARGETS)) try-install-man -INSTALL_DOC_TARGETS += quick-install-doc quick-install-man quick-install-html - -# 'make doc' should call 'make -C Documentation all' -$(DOC_TARGETS): - $(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) $(@:doc=all) - -TAGS: - $(RM) TAGS - $(FIND) . -name '*.[hcS]' -print | xargs etags -a - -tags: - $(RM) tags - $(FIND) . -name '*.[hcS]' -print | xargs ctags -a - -cscope: - $(RM) cscope* - $(FIND) . -name '*.[hcS]' -print | xargs cscope -b - -### Detect prefix changes -TRACK_CFLAGS = $(subst ','\'',$(CFLAGS)):\ - $(bindir_SQ):$(perfexecdir_SQ):$(template_dir_SQ):$(prefix_SQ) - -$(OUTPUT)PERF-CFLAGS: .FORCE-PERF-CFLAGS - @FLAGS='$(TRACK_CFLAGS)'; \ - if test x"$$FLAGS" != x"`cat $(OUTPUT)PERF-CFLAGS 2>/dev/null`" ; then \ - echo 1>&2 " * new build flags or prefix"; \ - echo "$$FLAGS" >$(OUTPUT)PERF-CFLAGS; \ - fi - -### Testing rules - -# GNU make supports exporting all variables by "export" without parameters. -# However, the environment gets quite big, and some programs have problems -# with that. - -check: $(OUTPUT)common-cmds.h - if sparse; \ - then \ - for i in *.c */*.c; \ - do \ - sparse $(CFLAGS) $(SPARSE_FLAGS) $$i || exit; \ - done; \ - else \ - exit 1; \ - fi - -### Installation rules - -install-bin: all - $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(bindir_SQ)' - $(INSTALL) $(OUTPUT)perf '$(DESTDIR_SQ)$(bindir_SQ)' - $(INSTALL) $(OUTPUT)perf-archive -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)' -ifndef NO_LIBPERL - $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/Perf-Trace-Util/lib/Perf/Trace' - $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/bin' - $(INSTALL) scripts/perl/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/Perf-Trace-Util/lib/Perf/Trace' - $(INSTALL) scripts/perl/*.pl -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl' - $(INSTALL) scripts/perl/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/bin' -endif -ifndef NO_LIBPYTHON - $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace' - $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin' - $(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace' - $(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python' - $(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin' -endif - $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d' - $(INSTALL) bash_completion '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d/perf' - $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests' - $(INSTALL) tests/attr.py '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests' - $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/attr' - $(INSTALL) tests/attr/* '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/attr' - -install: install-bin try-install-man - -install-python_ext: - $(PYTHON_WORD) util/setup.py --quiet install --root='/$(DESTDIR_SQ)' - -# 'make install-doc' should call 'make -C Documentation install' -$(INSTALL_DOC_TARGETS): - $(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) $(@:-doc=) - -### Cleaning rules - -clean: $(LIBTRACEEVENT)-clean $(LIBLK)-clean - $(RM) $(LIB_OBJS) $(BUILTIN_OBJS) $(LIB_FILE) $(OUTPUT)perf-archive $(OUTPUT)perf.o $(LANG_BINDINGS) - $(RM) $(ALL_PROGRAMS) perf - $(RM) *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* - $(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) clean - $(RM) $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)PERF-CFLAGS - $(RM) $(OUTPUT)util/*-bison* - $(RM) $(OUTPUT)util/*-flex* - $(python-clean) - -.PHONY: all install clean strip $(LIBTRACEEVENT) $(LIBLK) -.PHONY: shell_compatibility_test please_set_SHELL_PATH_to_a_more_modern_shell -.PHONY: .FORCE-PERF-VERSION-FILE TAGS tags cscope .FORCE-PERF-CFLAGS +# +# All other targets get passed through: +# +%: + $(print_msg) + $(make) diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf new file mode 100644 index 000000000000..326a26e5fc1c --- /dev/null +++ b/tools/perf/Makefile.perf @@ -0,0 +1,888 @@ +include ../scripts/Makefile.include + +# The default target of this Makefile is... +all: + +include config/utilities.mak + +# Define V to have a more verbose compile. +# +# Define O to save output files in a separate directory. +# +# Define ARCH as name of target architecture if you want cross-builds. +# +# Define CROSS_COMPILE as prefix name of compiler if you want cross-builds. +# +# Define NO_LIBPERL to disable perl script extension. +# +# Define NO_LIBPYTHON to disable python script extension. +# +# Define PYTHON to point to the python binary if the default +# `python' is not correct; for example: PYTHON=python2 +# +# Define PYTHON_CONFIG to point to the python-config binary if +# the default `$(PYTHON)-config' is not correct. +# +# Define ASCIIDOC8 if you want to format documentation with AsciiDoc 8 +# +# Define DOCBOOK_XSL_172 if you want to format man pages with DocBook XSL v1.72. +# +# Define LDFLAGS=-static to build a static binary. +# +# Define EXTRA_CFLAGS=-m64 or EXTRA_CFLAGS=-m32 as appropriate for cross-builds. +# +# Define NO_DWARF if you do not want debug-info analysis feature at all. +# +# Define WERROR=0 to disable treating any warnings as errors. +# +# Define NO_NEWT if you do not want TUI support. (deprecated) +# +# Define NO_SLANG if you do not want TUI support. +# +# Define NO_GTK2 if you do not want GTK+ GUI support. +# +# Define NO_DEMANGLE if you do not want C++ symbol demangling. +# +# Define NO_LIBELF if you do not want libelf dependency (e.g. cross-builds) +# +# Define NO_LIBUNWIND if you do not want libunwind dependency for dwarf +# backtrace post unwind. +# +# Define NO_BACKTRACE if you do not want stack backtrace debug feature +# +# Define NO_LIBNUMA if you do not want numa perf benchmark +# +# Define NO_LIBAUDIT if you do not want libaudit support +# +# Define NO_LIBBIONIC if you do not want bionic support + +ifeq ($(srctree),) +srctree := $(patsubst %/,%,$(dir $(shell pwd))) +srctree := $(patsubst %/,%,$(dir $(srctree))) +#$(info Determined 'srctree' to be $(srctree)) +endif + +ifneq ($(objtree),) +#$(info Determined 'objtree' to be $(objtree)) +endif + +ifneq ($(OUTPUT),) +#$(info Determined 'OUTPUT' to be $(OUTPUT)) +endif + +$(OUTPUT)PERF-VERSION-FILE: ../../.git/HEAD + @$(SHELL_PATH) util/PERF-VERSION-GEN $(OUTPUT) + @touch $(OUTPUT)PERF-VERSION-FILE + +CC = $(CROSS_COMPILE)gcc +AR = $(CROSS_COMPILE)ar + +RM = rm -f +LN = ln -f +MKDIR = mkdir +FIND = find +INSTALL = install +FLEX = flex +BISON = bison +STRIP = strip + +LK_DIR = $(srctree)/tools/lib/lk/ +TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/ + +# include config/Makefile by default and rule out +# non-config cases +config := 1 + +NON_CONFIG_TARGETS := clean TAGS tags cscope help + +ifdef MAKECMDGOALS +ifeq ($(filter-out $(NON_CONFIG_TARGETS),$(MAKECMDGOALS)),) + config := 0 +endif +endif + +ifeq ($(config),1) +include config/Makefile +endif + +export prefix bindir sharedir sysconfdir + +# sparse is architecture-neutral, which means that we need to tell it +# explicitly what architecture to check for. Fix this up for yours.. +SPARSE_FLAGS = -D__BIG_ENDIAN__ -D__powerpc__ + +# Guard against environment variables +BUILTIN_OBJS = +LIB_H = +LIB_OBJS = +GTK_OBJS = +PYRF_OBJS = +SCRIPT_SH = + +SCRIPT_SH += perf-archive.sh + +grep-libs = $(filter -l%,$(1)) +strip-libs = $(filter-out -l%,$(1)) + +ifneq ($(OUTPUT),) + TE_PATH=$(OUTPUT) +ifneq ($(subdir),) + LK_PATH=$(OUTPUT)/../lib/lk/ +else + LK_PATH=$(OUTPUT) +endif +else + TE_PATH=$(TRACE_EVENT_DIR) + LK_PATH=$(LK_DIR) +endif + +LIBTRACEEVENT = $(TE_PATH)libtraceevent.a +export LIBTRACEEVENT + +LIBLK = $(LK_PATH)liblk.a +export LIBLK + +# python extension build directories +PYTHON_EXTBUILD := $(OUTPUT)python_ext_build/ +PYTHON_EXTBUILD_LIB := $(PYTHON_EXTBUILD)lib/ +PYTHON_EXTBUILD_TMP := $(PYTHON_EXTBUILD)tmp/ +export PYTHON_EXTBUILD_LIB PYTHON_EXTBUILD_TMP + +python-clean := $(call QUIET_CLEAN, python) $(RM) -r $(PYTHON_EXTBUILD) $(OUTPUT)python/perf.so + +PYTHON_EXT_SRCS := $(shell grep -v ^\# util/python-ext-sources) +PYTHON_EXT_DEPS := util/python-ext-sources util/setup.py $(LIBTRACEEVENT) $(LIBLK) + +$(OUTPUT)python/perf.so: $(PYTHON_EXT_SRCS) $(PYTHON_EXT_DEPS) + $(QUIET_GEN)CFLAGS='$(CFLAGS)' $(PYTHON_WORD) util/setup.py \ + --quiet build_ext; \ + mkdir -p $(OUTPUT)python && \ + cp $(PYTHON_EXTBUILD_LIB)perf.so $(OUTPUT)python/ +# +# No Perl scripts right now: +# + +SCRIPTS = $(patsubst %.sh,%,$(SCRIPT_SH)) + +# +# Single 'perf' binary right now: +# +PROGRAMS += $(OUTPUT)perf + +# what 'all' will build and 'install' will install, in perfexecdir +ALL_PROGRAMS = $(PROGRAMS) $(SCRIPTS) + +# what 'all' will build but not install in perfexecdir +OTHER_PROGRAMS = $(OUTPUT)perf + +# Set paths to tools early so that they can be used for version tests. +ifndef SHELL_PATH + SHELL_PATH = /bin/sh +endif +ifndef PERL_PATH + PERL_PATH = /usr/bin/perl +endif + +export PERL_PATH + +$(OUTPUT)util/parse-events-flex.c: util/parse-events.l $(OUTPUT)util/parse-events-bison.c + $(QUIET_FLEX)$(FLEX) --header-file=$(OUTPUT)util/parse-events-flex.h $(PARSER_DEBUG_FLEX) -t util/parse-events.l > $(OUTPUT)util/parse-events-flex.c + +$(OUTPUT)util/parse-events-bison.c: util/parse-events.y + $(QUIET_BISON)$(BISON) -v util/parse-events.y -d $(PARSER_DEBUG_BISON) -o $(OUTPUT)util/parse-events-bison.c -p parse_events_ + +$(OUTPUT)util/pmu-flex.c: util/pmu.l $(OUTPUT)util/pmu-bison.c + $(QUIET_FLEX)$(FLEX) --header-file=$(OUTPUT)util/pmu-flex.h -t util/pmu.l > $(OUTPUT)util/pmu-flex.c + +$(OUTPUT)util/pmu-bison.c: util/pmu.y + $(QUIET_BISON)$(BISON) -v util/pmu.y -d -o $(OUTPUT)util/pmu-bison.c -p perf_pmu_ + +$(OUTPUT)util/parse-events.o: $(OUTPUT)util/parse-events-flex.c $(OUTPUT)util/parse-events-bison.c +$(OUTPUT)util/pmu.o: $(OUTPUT)util/pmu-flex.c $(OUTPUT)util/pmu-bison.c + +LIB_FILE=$(OUTPUT)libperf.a + +LIB_H += ../../include/uapi/linux/perf_event.h +LIB_H += ../../include/linux/rbtree.h +LIB_H += ../../include/linux/list.h +LIB_H += ../../include/uapi/linux/const.h +LIB_H += ../../include/linux/hash.h +LIB_H += ../../include/linux/stringify.h +LIB_H += util/include/linux/bitmap.h +LIB_H += util/include/linux/bitops.h +LIB_H += util/include/linux/compiler.h +LIB_H += util/include/linux/const.h +LIB_H += util/include/linux/ctype.h +LIB_H += util/include/linux/kernel.h +LIB_H += util/include/linux/list.h +LIB_H += util/include/linux/export.h +LIB_H += util/include/linux/magic.h +LIB_H += util/include/linux/poison.h +LIB_H += util/include/linux/prefetch.h +LIB_H += util/include/linux/rbtree.h +LIB_H += util/include/linux/rbtree_augmented.h +LIB_H += util/include/linux/string.h +LIB_H += util/include/linux/types.h +LIB_H += util/include/linux/linkage.h +LIB_H += util/include/asm/asm-offsets.h +LIB_H += util/include/asm/bug.h +LIB_H += util/include/asm/byteorder.h +LIB_H += util/include/asm/hweight.h +LIB_H += util/include/asm/swab.h +LIB_H += util/include/asm/system.h +LIB_H += util/include/asm/uaccess.h +LIB_H += util/include/dwarf-regs.h +LIB_H += util/include/asm/dwarf2.h +LIB_H += util/include/asm/cpufeature.h +LIB_H += util/include/asm/unistd_32.h +LIB_H += util/include/asm/unistd_64.h +LIB_H += perf.h +LIB_H += util/annotate.h +LIB_H += util/cache.h +LIB_H += util/callchain.h +LIB_H += util/build-id.h +LIB_H += util/debug.h +LIB_H += util/sysfs.h +LIB_H += util/pmu.h +LIB_H += util/event.h +LIB_H += util/evsel.h +LIB_H += util/evlist.h +LIB_H += util/exec_cmd.h +LIB_H += util/types.h +LIB_H += util/levenshtein.h +LIB_H += util/machine.h +LIB_H += util/map.h +LIB_H += util/parse-options.h +LIB_H += util/parse-events.h +LIB_H += util/quote.h +LIB_H += util/util.h +LIB_H += util/xyarray.h +LIB_H += util/header.h +LIB_H += util/help.h +LIB_H += util/session.h +LIB_H += util/strbuf.h +LIB_H += util/strlist.h +LIB_H += util/strfilter.h +LIB_H += util/svghelper.h +LIB_H += util/tool.h +LIB_H += util/run-command.h +LIB_H += util/sigchain.h +LIB_H += util/dso.h +LIB_H += util/symbol.h +LIB_H += util/color.h +LIB_H += util/values.h +LIB_H += util/sort.h +LIB_H += util/hist.h +LIB_H += util/thread.h +LIB_H += util/thread_map.h +LIB_H += util/trace-event.h +LIB_H += util/probe-finder.h +LIB_H += util/dwarf-aux.h +LIB_H += util/probe-event.h +LIB_H += util/pstack.h +LIB_H += util/cpumap.h +LIB_H += util/top.h +LIB_H += $(ARCH_INCLUDE) +LIB_H += util/cgroup.h +LIB_H += $(LIB_INCLUDE)traceevent/event-parse.h +LIB_H += util/target.h +LIB_H += util/rblist.h +LIB_H += util/intlist.h +LIB_H += util/perf_regs.h +LIB_H += util/unwind.h +LIB_H += util/vdso.h +LIB_H += ui/helpline.h +LIB_H += ui/progress.h +LIB_H += ui/util.h +LIB_H += ui/ui.h + +LIB_OBJS += $(OUTPUT)util/abspath.o +LIB_OBJS += $(OUTPUT)util/alias.o +LIB_OBJS += $(OUTPUT)util/annotate.o +LIB_OBJS += $(OUTPUT)util/build-id.o +LIB_OBJS += $(OUTPUT)util/config.o +LIB_OBJS += $(OUTPUT)util/ctype.o +LIB_OBJS += $(OUTPUT)util/sysfs.o +LIB_OBJS += $(OUTPUT)util/pmu.o +LIB_OBJS += $(OUTPUT)util/environment.o +LIB_OBJS += $(OUTPUT)util/event.o +LIB_OBJS += $(OUTPUT)util/evlist.o +LIB_OBJS += $(OUTPUT)util/evsel.o +LIB_OBJS += $(OUTPUT)util/exec_cmd.o +LIB_OBJS += $(OUTPUT)util/help.o +LIB_OBJS += $(OUTPUT)util/levenshtein.o +LIB_OBJS += $(OUTPUT)util/parse-options.o +LIB_OBJS += $(OUTPUT)util/parse-events.o +LIB_OBJS += $(OUTPUT)util/path.o +LIB_OBJS += $(OUTPUT)util/rbtree.o +LIB_OBJS += $(OUTPUT)util/bitmap.o +LIB_OBJS += $(OUTPUT)util/hweight.o +LIB_OBJS += $(OUTPUT)util/run-command.o +LIB_OBJS += $(OUTPUT)util/quote.o +LIB_OBJS += $(OUTPUT)util/strbuf.o +LIB_OBJS += $(OUTPUT)util/string.o +LIB_OBJS += $(OUTPUT)util/strlist.o +LIB_OBJS += $(OUTPUT)util/strfilter.o +LIB_OBJS += $(OUTPUT)util/top.o +LIB_OBJS += $(OUTPUT)util/usage.o +LIB_OBJS += $(OUTPUT)util/wrapper.o +LIB_OBJS += $(OUTPUT)util/sigchain.o +LIB_OBJS += $(OUTPUT)util/dso.o +LIB_OBJS += $(OUTPUT)util/symbol.o +LIB_OBJS += $(OUTPUT)util/symbol-elf.o +LIB_OBJS += $(OUTPUT)util/color.o +LIB_OBJS += $(OUTPUT)util/pager.o +LIB_OBJS += $(OUTPUT)util/header.o +LIB_OBJS += $(OUTPUT)util/callchain.o +LIB_OBJS += $(OUTPUT)util/values.o +LIB_OBJS += $(OUTPUT)util/debug.o +LIB_OBJS += $(OUTPUT)util/machine.o +LIB_OBJS += $(OUTPUT)util/map.o +LIB_OBJS += $(OUTPUT)util/pstack.o +LIB_OBJS += $(OUTPUT)util/session.o +LIB_OBJS += $(OUTPUT)util/thread.o +LIB_OBJS += $(OUTPUT)util/thread_map.o +LIB_OBJS += $(OUTPUT)util/trace-event-parse.o +LIB_OBJS += $(OUTPUT)util/parse-events-flex.o +LIB_OBJS += $(OUTPUT)util/parse-events-bison.o +LIB_OBJS += $(OUTPUT)util/pmu-flex.o +LIB_OBJS += $(OUTPUT)util/pmu-bison.o +LIB_OBJS += $(OUTPUT)util/trace-event-read.o +LIB_OBJS += $(OUTPUT)util/trace-event-info.o +LIB_OBJS += $(OUTPUT)util/trace-event-scripting.o +LIB_OBJS += $(OUTPUT)util/svghelper.o +LIB_OBJS += $(OUTPUT)util/sort.o +LIB_OBJS += $(OUTPUT)util/hist.o +LIB_OBJS += $(OUTPUT)util/probe-event.o +LIB_OBJS += $(OUTPUT)util/util.o +LIB_OBJS += $(OUTPUT)util/xyarray.o +LIB_OBJS += $(OUTPUT)util/cpumap.o +LIB_OBJS += $(OUTPUT)util/cgroup.o +LIB_OBJS += $(OUTPUT)util/target.o +LIB_OBJS += $(OUTPUT)util/rblist.o +LIB_OBJS += $(OUTPUT)util/intlist.o +LIB_OBJS += $(OUTPUT)util/vdso.o +LIB_OBJS += $(OUTPUT)util/stat.o +LIB_OBJS += $(OUTPUT)util/record.o +LIB_OBJS += $(OUTPUT)util/srcline.o +LIB_OBJS += $(OUTPUT)util/data.o + +LIB_OBJS += $(OUTPUT)ui/setup.o +LIB_OBJS += $(OUTPUT)ui/helpline.o +LIB_OBJS += $(OUTPUT)ui/progress.o +LIB_OBJS += $(OUTPUT)ui/util.o +LIB_OBJS += $(OUTPUT)ui/hist.o +LIB_OBJS += $(OUTPUT)ui/stdio/hist.o + +LIB_OBJS += $(OUTPUT)arch/common.o + +LIB_OBJS += $(OUTPUT)tests/parse-events.o +LIB_OBJS += $(OUTPUT)tests/dso-data.o +LIB_OBJS += $(OUTPUT)tests/attr.o +LIB_OBJS += $(OUTPUT)tests/vmlinux-kallsyms.o +LIB_OBJS += $(OUTPUT)tests/open-syscall.o +LIB_OBJS += $(OUTPUT)tests/open-syscall-all-cpus.o +LIB_OBJS += $(OUTPUT)tests/open-syscall-tp-fields.o +LIB_OBJS += $(OUTPUT)tests/mmap-basic.o +LIB_OBJS += $(OUTPUT)tests/perf-record.o +LIB_OBJS += $(OUTPUT)tests/rdpmc.o +LIB_OBJS += $(OUTPUT)tests/evsel-roundtrip-name.o +LIB_OBJS += $(OUTPUT)tests/evsel-tp-sched.o +LIB_OBJS += $(OUTPUT)tests/pmu.o +LIB_OBJS += $(OUTPUT)tests/hists_link.o +LIB_OBJS += $(OUTPUT)tests/python-use.o +LIB_OBJS += $(OUTPUT)tests/bp_signal.o +LIB_OBJS += $(OUTPUT)tests/bp_signal_overflow.o +LIB_OBJS += $(OUTPUT)tests/task-exit.o +LIB_OBJS += $(OUTPUT)tests/sw-clock.o +ifeq ($(ARCH),x86) +LIB_OBJS += $(OUTPUT)tests/perf-time-to-tsc.o +endif +LIB_OBJS += $(OUTPUT)tests/code-reading.o +LIB_OBJS += $(OUTPUT)tests/sample-parsing.o +LIB_OBJS += $(OUTPUT)tests/parse-no-sample-id-all.o + +BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o +BUILTIN_OBJS += $(OUTPUT)builtin-bench.o +# Benchmark modules +BUILTIN_OBJS += $(OUTPUT)bench/sched-messaging.o +BUILTIN_OBJS += $(OUTPUT)bench/sched-pipe.o +ifeq ($(RAW_ARCH),x86_64) +BUILTIN_OBJS += $(OUTPUT)bench/mem-memcpy-x86-64-asm.o +BUILTIN_OBJS += $(OUTPUT)bench/mem-memset-x86-64-asm.o +endif +BUILTIN_OBJS += $(OUTPUT)bench/mem-memcpy.o +BUILTIN_OBJS += $(OUTPUT)bench/mem-memset.o + +BUILTIN_OBJS += $(OUTPUT)builtin-diff.o +BUILTIN_OBJS += $(OUTPUT)builtin-evlist.o +BUILTIN_OBJS += $(OUTPUT)builtin-help.o +BUILTIN_OBJS += $(OUTPUT)builtin-sched.o +BUILTIN_OBJS += $(OUTPUT)builtin-buildid-list.o +BUILTIN_OBJS += $(OUTPUT)builtin-buildid-cache.o +BUILTIN_OBJS += $(OUTPUT)builtin-list.o +BUILTIN_OBJS += $(OUTPUT)builtin-record.o +BUILTIN_OBJS += $(OUTPUT)builtin-report.o +BUILTIN_OBJS += $(OUTPUT)builtin-stat.o +BUILTIN_OBJS += $(OUTPUT)builtin-timechart.o +BUILTIN_OBJS += $(OUTPUT)builtin-top.o +BUILTIN_OBJS += $(OUTPUT)builtin-script.o +BUILTIN_OBJS += $(OUTPUT)builtin-probe.o +BUILTIN_OBJS += $(OUTPUT)builtin-kmem.o +BUILTIN_OBJS += $(OUTPUT)builtin-lock.o +BUILTIN_OBJS += $(OUTPUT)builtin-kvm.o +BUILTIN_OBJS += $(OUTPUT)builtin-inject.o +BUILTIN_OBJS += $(OUTPUT)tests/builtin-test.o +BUILTIN_OBJS += $(OUTPUT)builtin-mem.o + +PERFLIBS = $(LIB_FILE) $(LIBLK) $(LIBTRACEEVENT) + +# We choose to avoid "if .. else if .. else .. endif endif" +# because maintaining the nesting to match is a pain. If +# we had "elif" things would have been much nicer... + +-include arch/$(ARCH)/Makefile + +ifneq ($(OUTPUT),) + CFLAGS += -I$(OUTPUT) +endif + +ifdef NO_LIBELF +EXTLIBS := $(filter-out -lelf,$(EXTLIBS)) + +# Remove ELF/DWARF dependent codes +LIB_OBJS := $(filter-out $(OUTPUT)util/symbol-elf.o,$(LIB_OBJS)) +LIB_OBJS := $(filter-out $(OUTPUT)util/dwarf-aux.o,$(LIB_OBJS)) +LIB_OBJS := $(filter-out $(OUTPUT)util/probe-event.o,$(LIB_OBJS)) +LIB_OBJS := $(filter-out $(OUTPUT)util/probe-finder.o,$(LIB_OBJS)) + +BUILTIN_OBJS := $(filter-out $(OUTPUT)builtin-probe.o,$(BUILTIN_OBJS)) + +# Use minimal symbol handling +LIB_OBJS += $(OUTPUT)util/symbol-minimal.o + +else # NO_LIBELF +ifndef NO_DWARF + LIB_OBJS += $(OUTPUT)util/probe-finder.o + LIB_OBJS += $(OUTPUT)util/dwarf-aux.o +endif # NO_DWARF +endif # NO_LIBELF + +ifndef NO_LIBUNWIND + LIB_OBJS += $(OUTPUT)util/unwind.o +endif +LIB_OBJS += $(OUTPUT)tests/keep-tracking.o + +ifndef NO_LIBAUDIT + BUILTIN_OBJS += $(OUTPUT)builtin-trace.o +endif + +ifndef NO_SLANG + LIB_OBJS += $(OUTPUT)ui/browser.o + LIB_OBJS += $(OUTPUT)ui/browsers/annotate.o + LIB_OBJS += $(OUTPUT)ui/browsers/hists.o + LIB_OBJS += $(OUTPUT)ui/browsers/map.o + LIB_OBJS += $(OUTPUT)ui/browsers/scripts.o + LIB_OBJS += $(OUTPUT)ui/tui/setup.o + LIB_OBJS += $(OUTPUT)ui/tui/util.o + LIB_OBJS += $(OUTPUT)ui/tui/helpline.o + LIB_OBJS += $(OUTPUT)ui/tui/progress.o + LIB_H += ui/browser.h + LIB_H += ui/browsers/map.h + LIB_H += ui/keysyms.h + LIB_H += ui/libslang.h +endif + +ifndef NO_GTK2 + ALL_PROGRAMS += $(OUTPUT)libperf-gtk.so + + GTK_OBJS += $(OUTPUT)ui/gtk/browser.o + GTK_OBJS += $(OUTPUT)ui/gtk/hists.o + GTK_OBJS += $(OUTPUT)ui/gtk/setup.o + GTK_OBJS += $(OUTPUT)ui/gtk/util.o + GTK_OBJS += $(OUTPUT)ui/gtk/helpline.o + GTK_OBJS += $(OUTPUT)ui/gtk/progress.o + GTK_OBJS += $(OUTPUT)ui/gtk/annotate.o + +install-gtk: $(OUTPUT)libperf-gtk.so + $(call QUIET_INSTALL, 'GTK UI') \ + $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(libdir_SQ)'; \ + $(INSTALL) $(OUTPUT)libperf-gtk.so '$(DESTDIR_SQ)$(libdir_SQ)' +endif + +ifndef NO_LIBPERL + LIB_OBJS += $(OUTPUT)util/scripting-engines/trace-event-perl.o + LIB_OBJS += $(OUTPUT)scripts/perl/Perf-Trace-Util/Context.o +endif + +ifndef NO_LIBPYTHON + LIB_OBJS += $(OUTPUT)util/scripting-engines/trace-event-python.o + LIB_OBJS += $(OUTPUT)scripts/python/Perf-Trace-Util/Context.o +endif + +ifeq ($(NO_PERF_REGS),0) + ifeq ($(ARCH),x86) + LIB_H += arch/x86/include/perf_regs.h + endif +endif + +ifndef NO_LIBNUMA + BUILTIN_OBJS += $(OUTPUT)bench/numa.o +endif + +ifdef ASCIIDOC8 + export ASCIIDOC8 +endif + +LIBS = -Wl,--whole-archive $(PERFLIBS) -Wl,--no-whole-archive -Wl,--start-group $(EXTLIBS) -Wl,--end-group + +export INSTALL SHELL_PATH + +### Build rules + +SHELL = $(SHELL_PATH) + +all: shell_compatibility_test $(ALL_PROGRAMS) $(LANG_BINDINGS) $(OTHER_PROGRAMS) + +please_set_SHELL_PATH_to_a_more_modern_shell: + @$$(:) + +shell_compatibility_test: please_set_SHELL_PATH_to_a_more_modern_shell + +strip: $(PROGRAMS) $(OUTPUT)perf + $(STRIP) $(STRIP_OPTS) $(PROGRAMS) $(OUTPUT)perf + +$(OUTPUT)perf.o: perf.c $(OUTPUT)common-cmds.h $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -include $(OUTPUT)PERF-VERSION-FILE \ + '-DPERF_HTML_PATH="$(htmldir_SQ)"' \ + $(CFLAGS) -c $(filter %.c,$^) -o $@ + +$(OUTPUT)perf: $(OUTPUT)perf.o $(BUILTIN_OBJS) $(PERFLIBS) + $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $(OUTPUT)perf.o \ + $(BUILTIN_OBJS) $(LIBS) -o $@ + +$(GTK_OBJS): $(OUTPUT)%.o: %.c $(LIB_H) + $(QUIET_CC)$(CC) -o $@ -c -fPIC $(CFLAGS) $(GTK_CFLAGS) $< + +$(OUTPUT)libperf-gtk.so: $(GTK_OBJS) $(PERFLIBS) + $(QUIET_LINK)$(CC) -o $@ -shared $(ALL_LDFLAGS) $(filter %.o,$^) $(GTK_LIBS) + +$(OUTPUT)builtin-help.o: builtin-help.c $(OUTPUT)common-cmds.h $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) \ + '-DPERF_HTML_PATH="$(htmldir_SQ)"' \ + '-DPERF_MAN_PATH="$(mandir_SQ)"' \ + '-DPERF_INFO_PATH="$(infodir_SQ)"' $< + +$(OUTPUT)builtin-timechart.o: builtin-timechart.c $(OUTPUT)common-cmds.h $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) \ + '-DPERF_HTML_PATH="$(htmldir_SQ)"' \ + '-DPERF_MAN_PATH="$(mandir_SQ)"' \ + '-DPERF_INFO_PATH="$(infodir_SQ)"' $< + +$(OUTPUT)common-cmds.h: util/generate-cmdlist.sh command-list.txt + +$(OUTPUT)common-cmds.h: $(wildcard Documentation/perf-*.txt) + $(QUIET_GEN). util/generate-cmdlist.sh > $@+ && mv $@+ $@ + +$(SCRIPTS) : % : %.sh + $(QUIET_GEN)$(INSTALL) '$@.sh' '$(OUTPUT)$@' + +# These can record PERF_VERSION +$(OUTPUT)perf.o perf.spec \ + $(SCRIPTS) \ + : $(OUTPUT)PERF-VERSION-FILE + +.SUFFIXES: + +# +# If a target does not match any of the later rules then prefix it by $(OUTPUT) +# This makes targets like 'make O=/tmp/perf perf.o' work in a natural way. +# +ifneq ($(OUTPUT),) +%.o: $(OUTPUT)%.o + @echo " # Redirected target $@ => $(OUTPUT)$@" +util/%.o: $(OUTPUT)util/%.o + @echo " # Redirected target $@ => $(OUTPUT)$@" +bench/%.o: $(OUTPUT)bench/%.o + @echo " # Redirected target $@ => $(OUTPUT)$@" +tests/%.o: $(OUTPUT)tests/%.o + @echo " # Redirected target $@ => $(OUTPUT)$@" +endif + +# These two need to be here so that when O= is not used they take precedence +# over the general rule for .o + +$(OUTPUT)util/%-flex.o: $(OUTPUT)util/%-flex.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c -Iutil/ $(CFLAGS) -w $< + +$(OUTPUT)util/%-bison.o: $(OUTPUT)util/%-bison.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c -Iutil/ $(CFLAGS) -DYYENABLE_NLS=0 -DYYLTYPE_IS_TRIVIAL=0 -w $< + +$(OUTPUT)%.o: %.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $< +$(OUTPUT)%.i: %.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -E $(CFLAGS) $< +$(OUTPUT)%.s: %.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -S $(CFLAGS) $< +$(OUTPUT)%.o: %.S + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $< +$(OUTPUT)%.s: %.S + $(QUIET_CC)$(CC) -o $@ -E $(CFLAGS) $< + +$(OUTPUT)util/exec_cmd.o: util/exec_cmd.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) \ + '-DPERF_EXEC_PATH="$(perfexecdir_SQ)"' \ + '-DPREFIX="$(prefix_SQ)"' \ + $< + +$(OUTPUT)tests/attr.o: tests/attr.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) \ + '-DBINDIR="$(bindir_SQ)"' -DPYTHON='"$(PYTHON_WORD)"' \ + $< + +$(OUTPUT)tests/python-use.o: tests/python-use.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) \ + -DPYTHONPATH='"$(OUTPUT)python"' \ + -DPYTHON='"$(PYTHON_WORD)"' \ + $< + +$(OUTPUT)util/config.o: util/config.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $< + +$(OUTPUT)ui/setup.o: ui/setup.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DLIBDIR='"$(libdir_SQ)"' $< + +$(OUTPUT)ui/browser.o: ui/browser.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DENABLE_SLFUTURE_CONST $< + +$(OUTPUT)ui/browsers/annotate.o: ui/browsers/annotate.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DENABLE_SLFUTURE_CONST $< + +$(OUTPUT)ui/browsers/hists.o: ui/browsers/hists.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DENABLE_SLFUTURE_CONST $< + +$(OUTPUT)ui/browsers/map.o: ui/browsers/map.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DENABLE_SLFUTURE_CONST $< + +$(OUTPUT)ui/browsers/scripts.o: ui/browsers/scripts.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DENABLE_SLFUTURE_CONST $< + +$(OUTPUT)util/rbtree.o: ../../lib/rbtree.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -Wno-unused-parameter -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $< + +$(OUTPUT)util/parse-events.o: util/parse-events.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -Wno-redundant-decls $< + +$(OUTPUT)util/scripting-engines/trace-event-perl.o: util/scripting-engines/trace-event-perl.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $(PERL_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes -Wno-unused-parameter -Wno-shadow -Wno-undef -Wno-switch-default $< + +$(OUTPUT)scripts/perl/Perf-Trace-Util/Context.o: scripts/perl/Perf-Trace-Util/Context.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $(PERL_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes -Wno-unused-parameter -Wno-nested-externs -Wno-undef -Wno-switch-default $< + +$(OUTPUT)util/scripting-engines/trace-event-python.o: util/scripting-engines/trace-event-python.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $(PYTHON_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes -Wno-unused-parameter -Wno-shadow $< + +$(OUTPUT)scripts/python/Perf-Trace-Util/Context.o: scripts/python/Perf-Trace-Util/Context.c $(OUTPUT)PERF-CFLAGS + $(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) $(PYTHON_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes -Wno-unused-parameter -Wno-nested-externs $< + +$(OUTPUT)perf-%: %.o $(PERFLIBS) + $(QUIET_LINK)$(CC) $(CFLAGS) -o $@ $(LDFLAGS) $(filter %.o,$^) $(LIBS) + +$(LIB_OBJS) $(BUILTIN_OBJS): $(LIB_H) +$(patsubst perf-%,%.o,$(PROGRAMS)): $(LIB_H) $(wildcard */*.h) + +# we compile into subdirectories. if the target directory is not the source directory, they might not exists. So +# we depend the various files onto their directories. +DIRECTORY_DEPS = $(LIB_OBJS) $(BUILTIN_OBJS) $(GTK_OBJS) +DIRECTORY_DEPS += $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h +$(DIRECTORY_DEPS): | $(sort $(dir $(DIRECTORY_DEPS))) +# In the second step, we make a rule to actually create these directories +$(sort $(dir $(DIRECTORY_DEPS))): + $(QUIET_MKDIR)$(MKDIR) -p $@ 2>/dev/null + +$(LIB_FILE): $(LIB_OBJS) + $(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(LIB_OBJS) + +# libtraceevent.a +TE_SOURCES = $(wildcard $(TRACE_EVENT_DIR)*.[ch]) + +$(LIBTRACEEVENT): $(TE_SOURCES) + $(QUIET_SUBDIR0)$(TRACE_EVENT_DIR) $(QUIET_SUBDIR1) O=$(OUTPUT) libtraceevent.a + +$(LIBTRACEEVENT)-clean: + $(call QUIET_CLEAN, libtraceevent) + @$(MAKE) -C $(TRACE_EVENT_DIR) O=$(OUTPUT) clean >/dev/null + +LIBLK_SOURCES = $(wildcard $(LK_PATH)*.[ch]) + +# if subdir is set, we've been called from above so target has been built +# already +$(LIBLK): $(LIBLK_SOURCES) +ifeq ($(subdir),) + $(QUIET_SUBDIR0)$(LK_DIR) $(QUIET_SUBDIR1) O=$(OUTPUT) liblk.a +endif + +$(LIBLK)-clean: +ifeq ($(subdir),) + $(call QUIET_CLEAN, liblk) + @$(MAKE) -C $(LK_DIR) O=$(OUTPUT) clean >/dev/null +endif + +help: + @echo 'Perf make targets:' + @echo ' doc - make *all* documentation (see below)' + @echo ' man - make manpage documentation (access with man )' + @echo ' html - make html documentation' + @echo ' info - make GNU info documentation (access with info )' + @echo ' pdf - make pdf documentation' + @echo ' TAGS - use etags to make tag information for source browsing' + @echo ' tags - use ctags to make tag information for source browsing' + @echo ' cscope - use cscope to make interactive browsing database' + @echo '' + @echo 'Perf install targets:' + @echo ' NOTE: documentation build requires asciidoc, xmlto packages to be installed' + @echo ' HINT: use "make prefix= " to install to a particular' + @echo ' path like make prefix=/usr/local install install-doc' + @echo ' install - install compiled binaries' + @echo ' install-doc - install *all* documentation' + @echo ' install-man - install manpage documentation' + @echo ' install-html - install html documentation' + @echo ' install-info - install GNU info documentation' + @echo ' install-pdf - install pdf documentation' + @echo '' + @echo ' quick-install-doc - alias for quick-install-man' + @echo ' quick-install-man - install the documentation quickly' + @echo ' quick-install-html - install the html documentation quickly' + @echo '' + @echo 'Perf maintainer targets:' + @echo ' clean - clean all binary objects and build output' + + +DOC_TARGETS := doc man html info pdf + +INSTALL_DOC_TARGETS := $(patsubst %,install-%,$(DOC_TARGETS)) try-install-man +INSTALL_DOC_TARGETS += quick-install-doc quick-install-man quick-install-html + +# 'make doc' should call 'make -C Documentation all' +$(DOC_TARGETS): + $(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) $(@:doc=all) + +TAGS: + $(RM) TAGS + $(FIND) . -name '*.[hcS]' -print | xargs etags -a + +tags: + $(RM) tags + $(FIND) . -name '*.[hcS]' -print | xargs ctags -a + +cscope: + $(RM) cscope* + $(FIND) . -name '*.[hcS]' -print | xargs cscope -b + +### Detect prefix changes +TRACK_CFLAGS = $(subst ','\'',$(CFLAGS)):\ + $(bindir_SQ):$(perfexecdir_SQ):$(template_dir_SQ):$(prefix_SQ) + +$(OUTPUT)PERF-CFLAGS: .FORCE-PERF-CFLAGS + @FLAGS='$(TRACK_CFLAGS)'; \ + if test x"$$FLAGS" != x"`cat $(OUTPUT)PERF-CFLAGS 2>/dev/null`" ; then \ + echo 1>&2 " FLAGS: * new build flags or prefix"; \ + echo "$$FLAGS" >$(OUTPUT)PERF-CFLAGS; \ + fi + +### Testing rules + +# GNU make supports exporting all variables by "export" without parameters. +# However, the environment gets quite big, and some programs have problems +# with that. + +check: $(OUTPUT)common-cmds.h + if sparse; \ + then \ + for i in *.c */*.c; \ + do \ + sparse $(CFLAGS) $(SPARSE_FLAGS) $$i || exit; \ + done; \ + else \ + exit 1; \ + fi + +### Installation rules + +install-gtk: + +install-bin: all install-gtk + $(call QUIET_INSTALL, binaries) \ + $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(bindir_SQ)'; \ + $(INSTALL) $(OUTPUT)perf '$(DESTDIR_SQ)$(bindir_SQ)'; \ + $(LN) '$(DESTDIR_SQ)$(bindir_SQ)/perf' '$(DESTDIR_SQ)$(bindir_SQ)/trace' + $(call QUIET_INSTALL, libexec) \ + $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)' + $(call QUIET_INSTALL, perf-archive) \ + $(INSTALL) $(OUTPUT)perf-archive -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)' +ifndef NO_LIBPERL + $(call QUIET_INSTALL, perl-scripts) \ + $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'; \ + $(INSTALL) scripts/perl/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'; \ + $(INSTALL) scripts/perl/*.pl -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl'; \ + $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/bin'; \ + $(INSTALL) scripts/perl/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/bin' +endif +ifndef NO_LIBPYTHON + $(call QUIET_INSTALL, python-scripts) \ + $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'; \ + $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'; \ + $(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'; \ + $(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'; \ + $(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin' +endif + $(call QUIET_INSTALL, bash_completion-script) \ + $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d'; \ + $(INSTALL) bash_completion '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d/perf' + $(call QUIET_INSTALL, tests) \ + $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests'; \ + $(INSTALL) tests/attr.py '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests'; \ + $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/attr'; \ + $(INSTALL) tests/attr/* '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/attr' + +install: install-bin try-install-man + +install-python_ext: + $(PYTHON_WORD) util/setup.py --quiet install --root='/$(DESTDIR_SQ)' + +# 'make install-doc' should call 'make -C Documentation install' +$(INSTALL_DOC_TARGETS): + $(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) $(@:-doc=) + +### Cleaning rules + +# +# This is here, not in config/Makefile, because config/Makefile does +# not get included for the clean target: +# +config-clean: + $(call QUIET_CLEAN, config) + @$(MAKE) -C config/feature-checks clean >/dev/null + +clean: $(LIBTRACEEVENT)-clean $(LIBLK)-clean config-clean + $(call QUIET_CLEAN, core-objs) $(RM) $(LIB_OBJS) $(BUILTIN_OBJS) $(LIB_FILE) $(OUTPUT)perf-archive $(OUTPUT)perf.o $(LANG_BINDINGS) $(GTK_OBJS) + $(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf + $(call QUIET_CLEAN, core-gen) $(RM) *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)PERF-CFLAGS $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex* + $(call QUIET_CLEAN, Documentation) + @$(MAKE) -C Documentation O=$(OUTPUT) clean >/dev/null + $(python-clean) + +# +# Trick: if ../../.git does not exist - we are building out of tree for example, +# then force version regeneration: +# +ifeq ($(wildcard ../../.git/HEAD),) + GIT-HEAD-PHONY = ../../.git/HEAD +else + GIT-HEAD-PHONY = +endif + +.PHONY: all install clean config-clean strip install-gtk +.PHONY: shell_compatibility_test please_set_SHELL_PATH_to_a_more_modern_shell +.PHONY: $(GIT-HEAD-PHONY) TAGS tags cscope .FORCE-PERF-CFLAGS + diff --git a/tools/perf/arch/x86/include/perf_regs.h b/tools/perf/arch/x86/include/perf_regs.h index 7fcdcdbee917..e84ca76aae77 100644 --- a/tools/perf/arch/x86/include/perf_regs.h +++ b/tools/perf/arch/x86/include/perf_regs.h @@ -5,7 +5,7 @@ #include "../../util/types.h" #include -#ifndef ARCH_X86_64 +#ifndef HAVE_ARCH_X86_64_SUPPORT #define PERF_REGS_MASK ((1ULL << PERF_REG_X86_32_MAX) - 1) #else #define REG_NOSUPPORT ((1ULL << PERF_REG_X86_DS) | \ @@ -52,7 +52,7 @@ static inline const char *perf_reg_name(int id) return "FS"; case PERF_REG_X86_GS: return "GS"; -#ifdef ARCH_X86_64 +#ifdef HAVE_ARCH_X86_64_SUPPORT case PERF_REG_X86_R8: return "R8"; case PERF_REG_X86_R9: @@ -69,7 +69,7 @@ static inline const char *perf_reg_name(int id) return "R14"; case PERF_REG_X86_R15: return "R15"; -#endif /* ARCH_X86_64 */ +#endif /* HAVE_ARCH_X86_64_SUPPORT */ default: return NULL; } diff --git a/tools/perf/arch/x86/util/unwind.c b/tools/perf/arch/x86/util/unwind.c index 78d956eff96f..456a88cf5b37 100644 --- a/tools/perf/arch/x86/util/unwind.c +++ b/tools/perf/arch/x86/util/unwind.c @@ -4,7 +4,7 @@ #include "perf_regs.h" #include "../../util/unwind.h" -#ifdef ARCH_X86_64 +#ifdef HAVE_ARCH_X86_64_SUPPORT int unwind__arch_reg_id(int regnum) { int id; @@ -108,4 +108,4 @@ int unwind__arch_reg_id(int regnum) return id; } -#endif /* ARCH_X86_64 */ +#endif /* HAVE_ARCH_X86_64_SUPPORT */ diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion index 56e6a12aab59..62e157db2e2b 100644 --- a/tools/perf/bash_completion +++ b/tools/perf/bash_completion @@ -1,17 +1,87 @@ # perf completion -function_exists() +# Taken from git.git's completion script. +__my_reassemble_comp_words_by_ref() { - declare -F $1 > /dev/null - return $? + local exclude i j first + # Which word separators to exclude? + exclude="${1//[^$COMP_WORDBREAKS]}" + cword_=$COMP_CWORD + if [ -z "$exclude" ]; then + words_=("${COMP_WORDS[@]}") + return + fi + # List of word completion separators has shrunk; + # re-assemble words to complete. + for ((i=0, j=0; i < ${#COMP_WORDS[@]}; i++, j++)); do + # Append each nonempty word consisting of just + # word separator characters to the current word. + first=t + while + [ $i -gt 0 ] && + [ -n "${COMP_WORDS[$i]}" ] && + # word consists of excluded word separators + [ "${COMP_WORDS[$i]//[^$exclude]}" = "${COMP_WORDS[$i]}" ] + do + # Attach to the previous token, + # unless the previous token is the command name. + if [ $j -ge 2 ] && [ -n "$first" ]; then + ((j--)) + fi + first= + words_[$j]=${words_[j]}${COMP_WORDS[i]} + if [ $i = $COMP_CWORD ]; then + cword_=$j + fi + if (($i < ${#COMP_WORDS[@]} - 1)); then + ((i++)) + else + # Done. + return + fi + done + words_[$j]=${words_[j]}${COMP_WORDS[i]} + if [ $i = $COMP_CWORD ]; then + cword_=$j + fi + done } -function_exists __ltrim_colon_completions || +type _get_comp_words_by_ref &>/dev/null || +_get_comp_words_by_ref() +{ + local exclude cur_ words_ cword_ + if [ "$1" = "-n" ]; then + exclude=$2 + shift 2 + fi + __my_reassemble_comp_words_by_ref "$exclude" + cur_=${words_[cword_]} + while [ $# -gt 0 ]; do + case "$1" in + cur) + cur=$cur_ + ;; + prev) + prev=${words_[$cword_-1]} + ;; + words) + words=("${words_[@]}") + ;; + cword) + cword=$cword_ + ;; + esac + shift + done +} + +type __ltrim_colon_completions &>/dev/null || __ltrim_colon_completions() { if [[ "$1" == *:* && "$COMP_WORDBREAKS" == *:* ]]; then # Remove colon-word prefix from COMPREPLY items - local colon_word=${1%${1##*:}} + local colon_word=${1%"${1##*:}"} local i=${#COMPREPLY[*]} while [[ $((--i)) -ge 0 ]]; do COMPREPLY[$i]=${COMPREPLY[$i]#"$colon_word"} @@ -19,23 +89,18 @@ __ltrim_colon_completions() fi } -have perf && +type perf &>/dev/null && _perf() { - local cur prev cmd + local cur words cword prev cmd COMPREPLY=() - if function_exists _get_comp_words_by_ref; then - _get_comp_words_by_ref -n : cur prev - else - cur=$(_get_cword :) - prev=${COMP_WORDS[COMP_CWORD-1]} - fi + _get_comp_words_by_ref -n =: cur words cword prev - cmd=${COMP_WORDS[0]} + cmd=${words[0]} # List perf subcommands or long options - if [ $COMP_CWORD -eq 1 ]; then + if [ $cword -eq 1 ]; then if [[ $cur == --* ]]; then COMPREPLY=( $( compgen -W '--help --version \ --exec-path --html-path --paginate --no-pager \ @@ -45,18 +110,17 @@ _perf() COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) ) fi # List possible events for -e option - elif [[ $prev == "-e" && "${COMP_WORDS[1]}" == @(record|stat|top) ]]; then + elif [[ $prev == "-e" && "${words[1]}" == @(record|stat|top) ]]; then evts=$($cmd list --raw-dump) COMPREPLY=( $( compgen -W '$evts' -- "$cur" ) ) __ltrim_colon_completions $cur # List long option names elif [[ $cur == --* ]]; then - subcmd=${COMP_WORDS[1]} + subcmd=${words[1]} opts=$($cmd $subcmd --list-opts) COMPREPLY=( $( compgen -W '$opts' -- "$cur" ) ) - # Fall down to list regular files - else - _filedir fi } && -complete -F _perf perf + +complete -o bashdefault -o default -o nospace -F _perf perf 2>/dev/null \ + || complete -o default -o nospace -F _perf perf diff --git a/tools/perf/bench/mem-memcpy-arch.h b/tools/perf/bench/mem-memcpy-arch.h index a72e36cb5394..57b4ed871459 100644 --- a/tools/perf/bench/mem-memcpy-arch.h +++ b/tools/perf/bench/mem-memcpy-arch.h @@ -1,5 +1,5 @@ -#ifdef ARCH_X86_64 +#ifdef HAVE_ARCH_X86_64_SUPPORT #define MEMCPY_FN(fn, name, desc) \ extern void *fn(void *, const void *, size_t); diff --git a/tools/perf/bench/mem-memcpy.c b/tools/perf/bench/mem-memcpy.c index 8cdca43016b2..5ce71d3b72cf 100644 --- a/tools/perf/bench/mem-memcpy.c +++ b/tools/perf/bench/mem-memcpy.c @@ -58,7 +58,7 @@ struct routine routines[] = { { "default", "Default memcpy() provided by glibc", memcpy }, -#ifdef ARCH_X86_64 +#ifdef HAVE_ARCH_X86_64_SUPPORT #define MEMCPY_FN(fn, name, desc) { name, desc, fn }, #include "mem-memcpy-x86-64-asm-def.h" diff --git a/tools/perf/bench/mem-memset-arch.h b/tools/perf/bench/mem-memset-arch.h index a040fa77665b..633800cb0dcb 100644 --- a/tools/perf/bench/mem-memset-arch.h +++ b/tools/perf/bench/mem-memset-arch.h @@ -1,5 +1,5 @@ -#ifdef ARCH_X86_64 +#ifdef HAVE_ARCH_X86_64_SUPPORT #define MEMSET_FN(fn, name, desc) \ extern void *fn(void *, int, size_t); diff --git a/tools/perf/bench/mem-memset.c b/tools/perf/bench/mem-memset.c index 4a2f12081964..9af79d2b18e5 100644 --- a/tools/perf/bench/mem-memset.c +++ b/tools/perf/bench/mem-memset.c @@ -58,7 +58,7 @@ static const struct routine routines[] = { { "default", "Default memset() provided by glibc", memset }, -#ifdef ARCH_X86_64 +#ifdef HAVE_ARCH_X86_64_SUPPORT #define MEMSET_FN(fn, name, desc) { name, desc, fn }, #include "mem-memset-x86-64-asm-def.h" diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c index 30d1c3225b46..d4c83c60b9b2 100644 --- a/tools/perf/bench/numa.c +++ b/tools/perf/bench/numa.c @@ -429,14 +429,14 @@ static int parse_cpu_list(const char *arg) return 0; } -static void parse_setup_cpu_list(void) +static int parse_setup_cpu_list(void) { struct thread_data *td; char *str0, *str; int t; if (!g->p.cpu_list_str) - return; + return 0; dprintf("g->p.nr_tasks: %d\n", g->p.nr_tasks); @@ -500,8 +500,12 @@ static void parse_setup_cpu_list(void) dprintf("CPUs: %d_%d-%d#%dx%d\n", bind_cpu_0, bind_len, bind_cpu_1, step, mul); - BUG_ON(bind_cpu_0 < 0 || bind_cpu_0 >= g->p.nr_cpus); - BUG_ON(bind_cpu_1 < 0 || bind_cpu_1 >= g->p.nr_cpus); + if (bind_cpu_0 >= g->p.nr_cpus || bind_cpu_1 >= g->p.nr_cpus) { + printf("\nTest not applicable, system has only %d CPUs.\n", g->p.nr_cpus); + return -1; + } + + BUG_ON(bind_cpu_0 < 0 || bind_cpu_1 < 0); BUG_ON(bind_cpu_0 > bind_cpu_1); for (bind_cpu = bind_cpu_0; bind_cpu <= bind_cpu_1; bind_cpu += step) { @@ -541,6 +545,7 @@ out: printf("# NOTE: %d tasks bound, %d tasks unbound\n", t, g->p.nr_tasks - t); free(str0); + return 0; } static int parse_cpus_opt(const struct option *opt __maybe_unused, @@ -561,14 +566,14 @@ static int parse_node_list(const char *arg) return 0; } -static void parse_setup_node_list(void) +static int parse_setup_node_list(void) { struct thread_data *td; char *str0, *str; int t; if (!g->p.node_list_str) - return; + return 0; dprintf("g->p.nr_tasks: %d\n", g->p.nr_tasks); @@ -619,8 +624,12 @@ static void parse_setup_node_list(void) dprintf("NODEs: %d-%d #%d\n", bind_node_0, bind_node_1, step); - BUG_ON(bind_node_0 < 0 || bind_node_0 >= g->p.nr_nodes); - BUG_ON(bind_node_1 < 0 || bind_node_1 >= g->p.nr_nodes); + if (bind_node_0 >= g->p.nr_nodes || bind_node_1 >= g->p.nr_nodes) { + printf("\nTest not applicable, system has only %d nodes.\n", g->p.nr_nodes); + return -1; + } + + BUG_ON(bind_node_0 < 0 || bind_node_1 < 0); BUG_ON(bind_node_0 > bind_node_1); for (bind_node = bind_node_0; bind_node <= bind_node_1; bind_node += step) { @@ -651,6 +660,7 @@ out: printf("# NOTE: %d tasks mem-bound, %d tasks unbound\n", t, g->p.nr_tasks - t); free(str0); + return 0; } static int parse_nodes_opt(const struct option *opt __maybe_unused, @@ -1110,7 +1120,7 @@ static void *worker_thread(void *__tdata) /* Check whether our max runtime timed out: */ if (g->p.nr_secs) { timersub(&stop, &start0, &diff); - if (diff.tv_sec >= g->p.nr_secs) { + if ((u32)diff.tv_sec >= g->p.nr_secs) { g->stop_work = true; break; } @@ -1157,7 +1167,7 @@ static void *worker_thread(void *__tdata) runtime_ns_max += diff.tv_usec * 1000; if (details >= 0) { - printf(" #%2d / %2d: %14.2lf nsecs/op [val: %016lx]\n", + printf(" #%2d / %2d: %14.2lf nsecs/op [val: %016"PRIx64"]\n", process_nr, thread_nr, runtime_ns_max / bytes_done, val); } fflush(stdout); @@ -1356,8 +1366,8 @@ static int init(void) init_thread_data(); tprintf("#\n"); - parse_setup_cpu_list(); - parse_setup_node_list(); + if (parse_setup_cpu_list() || parse_setup_node_list()) + return -1; tprintf("#\n"); print_summary(); @@ -1600,7 +1610,6 @@ static int run_bench_numa(const char *name, const char **argv) return 0; err: - usage_with_options(numa_usage, options); return -1; } @@ -1701,8 +1710,7 @@ static int bench_all(void) BUG_ON(ret < 0); for (i = 0; i < nr; i++) { - if (run_bench_numa(tests[i][0], tests[i] + 1)) - return -1; + run_bench_numa(tests[i][0], tests[i] + 1); } printf("\n"); diff --git a/tools/perf/bench/sched-pipe.c b/tools/perf/bench/sched-pipe.c index 69cfba8d4c6c..07a8d7646a15 100644 --- a/tools/perf/bench/sched-pipe.c +++ b/tools/perf/bench/sched-pipe.c @@ -7,9 +7,7 @@ * Based on pipe-test-1m.c by Ingo Molnar * http://people.redhat.com/mingo/cfs-scheduler/tools/pipe-test-1m.c * Ported to perf by Hitoshi Mitake - * */ - #include "../perf.h" #include "../util/util.h" #include "../util/parse-options.h" @@ -28,12 +26,24 @@ #include #include +#include + +struct thread_data { + int nr; + int pipe_read; + int pipe_write; + pthread_t pthread; +}; + #define LOOPS_DEFAULT 1000000 -static int loops = LOOPS_DEFAULT; +static int loops = LOOPS_DEFAULT; + +/* Use processes by default: */ +static bool threaded; static const struct option options[] = { - OPT_INTEGER('l', "loop", &loops, - "Specify number of loops"), + OPT_INTEGER('l', "loop", &loops, "Specify number of loops"), + OPT_BOOLEAN('T', "threaded", &threaded, "Specify threads/process based task setup"), OPT_END() }; @@ -42,13 +52,37 @@ static const char * const bench_sched_pipe_usage[] = { NULL }; -int bench_sched_pipe(int argc, const char **argv, - const char *prefix __maybe_unused) +static void *worker_thread(void *__tdata) { - int pipe_1[2], pipe_2[2]; + struct thread_data *td = __tdata; int m = 0, i; + int ret; + + for (i = 0; i < loops; i++) { + if (!td->nr) { + ret = read(td->pipe_read, &m, sizeof(int)); + BUG_ON(ret != sizeof(int)); + ret = write(td->pipe_write, &m, sizeof(int)); + BUG_ON(ret != sizeof(int)); + } else { + ret = write(td->pipe_write, &m, sizeof(int)); + BUG_ON(ret != sizeof(int)); + ret = read(td->pipe_read, &m, sizeof(int)); + BUG_ON(ret != sizeof(int)); + } + } + + return NULL; +} + +int bench_sched_pipe(int argc, const char **argv, const char *prefix __maybe_unused) +{ + struct thread_data threads[2], *td; + int pipe_1[2], pipe_2[2]; struct timeval start, stop, diff; unsigned long long result_usec = 0; + int nr_threads = 2; + int t; /* * why does "ret" exist? @@ -58,43 +92,66 @@ int bench_sched_pipe(int argc, const char **argv, int __maybe_unused ret, wait_stat; pid_t pid, retpid __maybe_unused; - argc = parse_options(argc, argv, options, - bench_sched_pipe_usage, 0); + argc = parse_options(argc, argv, options, bench_sched_pipe_usage, 0); BUG_ON(pipe(pipe_1)); BUG_ON(pipe(pipe_2)); - pid = fork(); - assert(pid >= 0); - gettimeofday(&start, NULL); - if (!pid) { - for (i = 0; i < loops; i++) { - ret = read(pipe_1[0], &m, sizeof(int)); - ret = write(pipe_2[1], &m, sizeof(int)); - } - } else { - for (i = 0; i < loops; i++) { - ret = write(pipe_1[1], &m, sizeof(int)); - ret = read(pipe_2[0], &m, sizeof(int)); + for (t = 0; t < nr_threads; t++) { + td = threads + t; + + td->nr = t; + + if (t == 0) { + td->pipe_read = pipe_1[0]; + td->pipe_write = pipe_2[1]; + } else { + td->pipe_write = pipe_1[1]; + td->pipe_read = pipe_2[0]; } } - gettimeofday(&stop, NULL); - timersub(&stop, &start, &diff); - if (pid) { + if (threaded) { + + for (t = 0; t < nr_threads; t++) { + td = threads + t; + + ret = pthread_create(&td->pthread, NULL, worker_thread, td); + BUG_ON(ret); + } + + for (t = 0; t < nr_threads; t++) { + td = threads + t; + + ret = pthread_join(td->pthread, NULL); + BUG_ON(ret); + } + + } else { + pid = fork(); + assert(pid >= 0); + + if (!pid) { + worker_thread(threads + 0); + exit(0); + } else { + worker_thread(threads + 1); + } + retpid = waitpid(pid, &wait_stat, 0); assert((retpid == pid) && WIFEXITED(wait_stat)); - } else { - exit(0); } + gettimeofday(&stop, NULL); + timersub(&stop, &start, &diff); + switch (bench_format) { case BENCH_FORMAT_DEFAULT: - printf("# Executed %d pipe operations between two tasks\n\n", - loops); + printf("# Executed %d pipe operations between two %s\n\n", + loops, threaded ? "threads" : "processes"); result_usec = diff.tv_sec * 1000000; result_usec += diff.tv_usec; diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index 5ebd0c3b71b6..03cfa592071f 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -28,8 +28,10 @@ #include "util/hist.h" #include "util/session.h" #include "util/tool.h" +#include "util/data.h" #include "arch/common.h" +#include #include struct perf_annotate { @@ -63,7 +65,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel, return 0; } - he = __hists__add_entry(&evsel->hists, al, NULL, 1, 1); + he = __hists__add_entry(&evsel->hists, al, NULL, 1, 1, 0); if (he == NULL) return -ENOMEM; @@ -142,8 +144,18 @@ find_next: if (use_browser == 2) { int ret; + int (*annotate)(struct hist_entry *he, + struct perf_evsel *evsel, + struct hist_browser_timer *hbt); + + annotate = dlsym(perf_gtk_handle, + "hist_entry__gtk_annotate"); + if (annotate == NULL) { + ui__error("GTK browser not found!\n"); + return; + } - ret = hist_entry__gtk_annotate(he, evsel, NULL); + ret = annotate(he, evsel, NULL); if (!ret || !ann->skip_missing) return; @@ -188,9 +200,13 @@ static int __cmd_annotate(struct perf_annotate *ann) struct perf_session *session; struct perf_evsel *pos; u64 total_nr_samples; + struct perf_data_file file = { + .path = input_name, + .mode = PERF_DATA_MODE_READ, + .force = ann->force, + }; - session = perf_session__new(input_name, O_RDONLY, - ann->force, false, &ann->tool); + session = perf_session__new(&file, false, &ann->tool); if (session == NULL) return -ENOMEM; @@ -243,12 +259,21 @@ static int __cmd_annotate(struct perf_annotate *ann) } if (total_nr_samples == 0) { - ui__error("The %s file has no samples!\n", session->filename); + ui__error("The %s file has no samples!\n", file.path); goto out_delete; } - if (use_browser == 2) - perf_gtk__show_annotations(); + if (use_browser == 2) { + void (*show_annotations)(void); + + show_annotations = dlsym(perf_gtk_handle, + "perf_gtk__show_annotations"); + if (show_annotations == NULL) { + ui__error("GTK browser not found!\n"); + goto out_delete; + } + show_annotations(); + } out_delete: /* diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c index 77298bf892b8..33af80fa49cf 100644 --- a/tools/perf/builtin-bench.c +++ b/tools/perf/builtin-bench.c @@ -35,7 +35,7 @@ struct bench_suite { /* sentinel: easy for help */ #define suite_all { "all", "Test all benchmark suites", NULL } -#ifdef LIBNUMA_SUPPORT +#ifdef HAVE_LIBNUMA_SUPPORT static struct bench_suite numa_suites[] = { { "mem", "Benchmark for NUMA workloads", @@ -80,7 +80,7 @@ struct bench_subsys { }; static struct bench_subsys subsystems[] = { -#ifdef LIBNUMA_SUPPORT +#ifdef HAVE_LIBNUMA_SUPPORT { "numa", "NUMA scheduling and MM behavior", numa_suites }, diff --git a/tools/perf/builtin-buildid-cache.c b/tools/perf/builtin-buildid-cache.c index c96c8fa38243..cfede86161d8 100644 --- a/tools/perf/builtin-buildid-cache.c +++ b/tools/perf/builtin-buildid-cache.c @@ -6,6 +6,11 @@ * Copyright (C) 2010, Red Hat Inc. * Copyright (C) 2010, Arnaldo Carvalho de Melo */ +#include +#include +#include +#include +#include #include "builtin.h" #include "perf.h" #include "util/cache.h" @@ -17,6 +22,140 @@ #include "util/session.h" #include "util/symbol.h" +static int build_id_cache__kcore_buildid(const char *proc_dir, char *sbuildid) +{ + char root_dir[PATH_MAX]; + char notes[PATH_MAX]; + u8 build_id[BUILD_ID_SIZE]; + char *p; + + strlcpy(root_dir, proc_dir, sizeof(root_dir)); + + p = strrchr(root_dir, '/'); + if (!p) + return -1; + *p = '\0'; + + scnprintf(notes, sizeof(notes), "%s/sys/kernel/notes", root_dir); + + if (sysfs__read_build_id(notes, build_id, sizeof(build_id))) + return -1; + + build_id__sprintf(build_id, sizeof(build_id), sbuildid); + + return 0; +} + +static int build_id_cache__kcore_dir(char *dir, size_t sz) +{ + struct timeval tv; + struct tm tm; + char dt[32]; + + if (gettimeofday(&tv, NULL) || !localtime_r(&tv.tv_sec, &tm)) + return -1; + + if (!strftime(dt, sizeof(dt), "%Y%m%d%H%M%S", &tm)) + return -1; + + scnprintf(dir, sz, "%s%02u", dt, (unsigned)tv.tv_usec / 10000); + + return 0; +} + +static int build_id_cache__kcore_existing(const char *from_dir, char *to_dir, + size_t to_dir_sz) +{ + char from[PATH_MAX]; + char to[PATH_MAX]; + struct dirent *dent; + int ret = -1; + DIR *d; + + d = opendir(to_dir); + if (!d) + return -1; + + scnprintf(from, sizeof(from), "%s/modules", from_dir); + + while (1) { + dent = readdir(d); + if (!dent) + break; + if (dent->d_type != DT_DIR) + continue; + scnprintf(to, sizeof(to), "%s/%s/modules", to_dir, + dent->d_name); + if (!compare_proc_modules(from, to)) { + scnprintf(to, sizeof(to), "%s/%s", to_dir, + dent->d_name); + strlcpy(to_dir, to, to_dir_sz); + ret = 0; + break; + } + } + + closedir(d); + + return ret; +} + +static int build_id_cache__add_kcore(const char *filename, const char *debugdir) +{ + char dir[32], sbuildid[BUILD_ID_SIZE * 2 + 1]; + char from_dir[PATH_MAX], to_dir[PATH_MAX]; + char *p; + + strlcpy(from_dir, filename, sizeof(from_dir)); + + p = strrchr(from_dir, '/'); + if (!p || strcmp(p + 1, "kcore")) + return -1; + *p = '\0'; + + if (build_id_cache__kcore_buildid(from_dir, sbuildid)) + return -1; + + scnprintf(to_dir, sizeof(to_dir), "%s/[kernel.kcore]/%s", + debugdir, sbuildid); + + if (!build_id_cache__kcore_existing(from_dir, to_dir, sizeof(to_dir))) { + pr_debug("same kcore found in %s\n", to_dir); + return 0; + } + + if (build_id_cache__kcore_dir(dir, sizeof(dir))) + return -1; + + scnprintf(to_dir, sizeof(to_dir), "%s/[kernel.kcore]/%s/%s", + debugdir, sbuildid, dir); + + if (mkdir_p(to_dir, 0755)) + return -1; + + if (kcore_copy(from_dir, to_dir)) { + /* Remove YYYYmmddHHMMSShh directory */ + if (!rmdir(to_dir)) { + p = strrchr(to_dir, '/'); + if (p) + *p = '\0'; + /* Try to remove buildid directory */ + if (!rmdir(to_dir)) { + p = strrchr(to_dir, '/'); + if (p) + *p = '\0'; + /* Try to remove [kernel.kcore] directory */ + rmdir(to_dir); + } + } + return -1; + } + + pr_debug("kcore added to build-id cache directory %s\n", to_dir); + + return 0; +} + static int build_id_cache__add_file(const char *filename, const char *debugdir) { char sbuild_id[BUILD_ID_SIZE * 2 + 1]; @@ -82,8 +221,12 @@ static bool dso__missing_buildid_cache(struct dso *dso, int parm __maybe_unused) static int build_id_cache__fprintf_missing(const char *filename, bool force, FILE *fp) { - struct perf_session *session = perf_session__new(filename, O_RDONLY, - force, false, NULL); + struct perf_data_file file = { + .path = filename, + .mode = PERF_DATA_MODE_READ, + .force = force, + }; + struct perf_session *session = perf_session__new(&file, false, NULL); if (session == NULL) return -1; @@ -130,11 +273,14 @@ int cmd_buildid_cache(int argc, const char **argv, char const *add_name_list_str = NULL, *remove_name_list_str = NULL, *missing_filename = NULL, - *update_name_list_str = NULL; + *update_name_list_str = NULL, + *kcore_filename; const struct option buildid_cache_options[] = { OPT_STRING('a', "add", &add_name_list_str, "file list", "file(s) to add"), + OPT_STRING('k', "kcore", &kcore_filename, + "file", "kcore file to add"), OPT_STRING('r', "remove", &remove_name_list_str, "file list", "file(s) to remove"), OPT_STRING('M', "missing", &missing_filename, "file", @@ -217,5 +363,9 @@ int cmd_buildid_cache(int argc, const char **argv, } } + if (kcore_filename && + build_id_cache__add_kcore(kcore_filename, debugdir)) + pr_warning("Couldn't add %s\n", kcore_filename); + return ret; } diff --git a/tools/perf/builtin-buildid-list.c b/tools/perf/builtin-buildid-list.c index e74366a13218..ed3873b3e238 100644 --- a/tools/perf/builtin-buildid-list.c +++ b/tools/perf/builtin-buildid-list.c @@ -15,6 +15,7 @@ #include "util/parse-options.h" #include "util/session.h" #include "util/symbol.h" +#include "util/data.h" static int sysfs__fprintf_build_id(FILE *fp) { @@ -52,6 +53,11 @@ static bool dso__skip_buildid(struct dso *dso, int with_hits) static int perf_session__list_build_ids(bool force, bool with_hits) { struct perf_session *session; + struct perf_data_file file = { + .path = input_name, + .mode = PERF_DATA_MODE_READ, + .force = force, + }; symbol__elf_init(); /* @@ -60,15 +66,14 @@ static int perf_session__list_build_ids(bool force, bool with_hits) if (filename__fprintf_build_id(input_name, stdout)) goto out; - session = perf_session__new(input_name, O_RDONLY, force, false, - &build_id__mark_dso_hit_ops); + session = perf_session__new(&file, false, &build_id__mark_dso_hit_ops); if (session == NULL) return -1; /* * in pipe-mode, the only way to get the buildids is to parse * the record stream. Buildids are stored as RECORD_HEADER_BUILD_ID */ - if (with_hits || session->fd_pipe) + if (with_hits || perf_data_file__is_pipe(&file)) perf_session__process_events(session, &build_id__mark_dso_hit_ops); perf_session__fprintf_dsos_buildid(session, stdout, dso__skip_buildid, with_hits); diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c index f28799e94f2a..419d27dd708b 100644 --- a/tools/perf/builtin-diff.c +++ b/tools/perf/builtin-diff.c @@ -16,6 +16,7 @@ #include "util/sort.h" #include "util/symbol.h" #include "util/util.h" +#include "util/data.h" #include #include @@ -42,7 +43,7 @@ struct diff_hpp_fmt { struct data__file { struct perf_session *session; - const char *file; + struct perf_data_file file; int idx; struct hists *hists; struct diff_hpp_fmt fmt[PERF_HPP_DIFF__MAX_INDEX]; @@ -304,9 +305,10 @@ static int formula_fprintf(struct hist_entry *he, struct hist_entry *pair, static int hists__add_entry(struct hists *self, struct addr_location *al, u64 period, - u64 weight) + u64 weight, u64 transaction) { - if (__hists__add_entry(self, al, NULL, period, weight) != NULL) + if (__hists__add_entry(self, al, NULL, period, weight, transaction) + != NULL) return 0; return -ENOMEM; } @@ -328,7 +330,8 @@ static int diff__process_sample_event(struct perf_tool *tool __maybe_unused, if (al.filtered) return 0; - if (hists__add_entry(&evsel->hists, &al, sample->period, sample->weight)) { + if (hists__add_entry(&evsel->hists, &al, sample->period, + sample->weight, sample->transaction)) { pr_warning("problem incrementing symbol period, skipping event\n"); return -1; } @@ -599,7 +602,7 @@ static void data__fprintf(void) data__for_each_file(i, d) fprintf(stdout, "# [%d] %s %s\n", - d->idx, d->file, + d->idx, d->file.path, !d->idx ? "(Baseline)" : ""); fprintf(stdout, "#\n"); @@ -661,17 +664,16 @@ static int __cmd_diff(void) int ret = -EINVAL, i; data__for_each_file(i, d) { - d->session = perf_session__new(d->file, O_RDONLY, force, - false, &tool); + d->session = perf_session__new(&d->file, false, &tool); if (!d->session) { - pr_err("Failed to open %s\n", d->file); + pr_err("Failed to open %s\n", d->file.path); ret = -ENOMEM; goto out_delete; } ret = perf_session__process_events(d->session, &tool); if (ret) { - pr_err("Failed to process %s\n", d->file); + pr_err("Failed to process %s\n", d->file.path); goto out_delete; } @@ -1014,7 +1016,12 @@ static int data_init(int argc, const char **argv) return -ENOMEM; data__for_each_file(i, d) { - d->file = use_default ? defaults[i] : argv[i]; + struct perf_data_file *file = &d->file; + + file->path = use_default ? defaults[i] : argv[i]; + file->mode = PERF_DATA_MODE_READ, + file->force = force, + d->idx = i; } diff --git a/tools/perf/builtin-evlist.c b/tools/perf/builtin-evlist.c index 05bd9dfe875c..20b0f12763b0 100644 --- a/tools/perf/builtin-evlist.c +++ b/tools/perf/builtin-evlist.c @@ -14,13 +14,18 @@ #include "util/parse-events.h" #include "util/parse-options.h" #include "util/session.h" +#include "util/data.h" static int __cmd_evlist(const char *file_name, struct perf_attr_details *details) { struct perf_session *session; struct perf_evsel *pos; + struct perf_data_file file = { + .path = file_name, + .mode = PERF_DATA_MODE_READ, + }; - session = perf_session__new(file_name, O_RDONLY, 0, false, NULL); + session = perf_session__new(&file, 0, NULL); if (session == NULL) return -ENOMEM; diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c index afe377b2884f..4aa6d7850bcc 100644 --- a/tools/perf/builtin-inject.c +++ b/tools/perf/builtin-inject.c @@ -15,6 +15,7 @@ #include "util/tool.h" #include "util/debug.h" #include "util/build-id.h" +#include "util/data.h" #include "util/parse-options.h" @@ -231,7 +232,7 @@ static int perf_event__inject_buildid(struct perf_tool *tool, * account this as unresolved. */ } else { -#ifdef LIBELF_SUPPORT +#ifdef HAVE_LIBELF_SUPPORT pr_warning("no symbols found in %s, maybe " "install a debug package?\n", al.map->dso->long_name); @@ -345,6 +346,10 @@ static int __cmd_inject(struct perf_inject *inject) { struct perf_session *session; int ret = -EINVAL; + struct perf_data_file file = { + .path = inject->input_name, + .mode = PERF_DATA_MODE_READ, + }; signal(SIGINT, sig_handler); @@ -355,7 +360,7 @@ static int __cmd_inject(struct perf_inject *inject) inject->tool.tracing_data = perf_event__repipe_tracing_data; } - session = perf_session__new(inject->input_name, O_RDONLY, false, true, &inject->tool); + session = perf_session__new(&file, true, &inject->tool); if (session == NULL) return -ENOMEM; diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c index c2dff9cb1f2c..1126382659a9 100644 --- a/tools/perf/builtin-kmem.c +++ b/tools/perf/builtin-kmem.c @@ -13,6 +13,7 @@ #include "util/parse-options.h" #include "util/trace-event.h" +#include "util/data.h" #include "util/debug.h" @@ -101,7 +102,7 @@ static int setup_cpunode_map(void) dir1 = opendir(PATH_SYS_NODE); if (!dir1) - return -1; + return 0; while ((dent1 = readdir(dir1)) != NULL) { if (dent1->d_type != DT_DIR || @@ -486,8 +487,12 @@ static int __cmd_kmem(void) { "kmem:kfree", perf_evsel__process_free_event, }, { "kmem:kmem_cache_free", perf_evsel__process_free_event, }, }; + struct perf_data_file file = { + .path = input_name, + .mode = PERF_DATA_MODE_READ, + }; - session = perf_session__new(input_name, O_RDONLY, 0, false, &perf_kmem); + session = perf_session__new(&file, false, &perf_kmem); if (session == NULL) return -ENOMEM; diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c index 935d52216c89..188bb29b373f 100644 --- a/tools/perf/builtin-kvm.c +++ b/tools/perf/builtin-kvm.c @@ -17,6 +17,7 @@ #include "util/tool.h" #include "util/stat.h" #include "util/top.h" +#include "util/data.h" #include #include @@ -1215,10 +1216,13 @@ static int read_events(struct perf_kvm_stat *kvm) .comm = perf_event__process_comm, .ordered_samples = true, }; + struct perf_data_file file = { + .path = input_name, + .mode = PERF_DATA_MODE_READ, + }; kvm->tool = eops; - kvm->session = perf_session__new(kvm->file_name, O_RDONLY, 0, false, - &kvm->tool); + kvm->session = perf_session__new(&file, false, &kvm->tool); if (!kvm->session) { pr_err("Initializing perf session failed\n"); return -EINVAL; @@ -1426,8 +1430,9 @@ static int kvm_events_live(struct perf_kvm_stat *kvm, const struct option live_options[] = { OPT_STRING('p', "pid", &kvm->opts.target.pid, "pid", "record events on existing process id"), - OPT_UINTEGER('m', "mmap-pages", &kvm->opts.mmap_pages, - "number of mmap data pages"), + OPT_CALLBACK('m', "mmap-pages", &kvm->opts.mmap_pages, "pages", + "number of mmap data pages", + perf_evlist__parse_mmap_pages), OPT_INCR('v', "verbose", &verbose, "be more verbose (show counter open errors, etc)"), OPT_BOOLEAN('a', "all-cpus", &kvm->opts.target.system_wide, @@ -1449,6 +1454,9 @@ static int kvm_events_live(struct perf_kvm_stat *kvm, "perf kvm stat live []", NULL }; + struct perf_data_file file = { + .mode = PERF_DATA_MODE_WRITE, + }; /* event handling */ @@ -1513,7 +1521,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm, /* * perf session */ - kvm->session = perf_session__new(NULL, O_WRONLY, false, false, &kvm->tool); + kvm->session = perf_session__new(&file, false, &kvm->tool); if (kvm->session == NULL) { err = -ENOMEM; goto out; diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c index ee33ba2f05dd..33c7253295b9 100644 --- a/tools/perf/builtin-lock.c +++ b/tools/perf/builtin-lock.c @@ -15,6 +15,7 @@ #include "util/debug.h" #include "util/session.h" #include "util/tool.h" +#include "util/data.h" #include #include @@ -56,7 +57,9 @@ struct lock_stat { unsigned int nr_readlock; unsigned int nr_trylock; + /* these times are in nano sec. */ + u64 avg_wait_time; u64 wait_time_total; u64 wait_time_min; u64 wait_time_max; @@ -208,6 +211,7 @@ static struct thread_stat *thread_stat_findnew_first(u32 tid) SINGLE_KEY(nr_acquired) SINGLE_KEY(nr_contended) +SINGLE_KEY(avg_wait_time) SINGLE_KEY(wait_time_total) SINGLE_KEY(wait_time_max) @@ -244,6 +248,7 @@ static struct rb_root result; /* place to store sorted data */ struct lock_key keys[] = { DEF_KEY_LOCK(acquired, nr_acquired), DEF_KEY_LOCK(contended, nr_contended), + DEF_KEY_LOCK(avg_wait, avg_wait_time), DEF_KEY_LOCK(wait_total, wait_time_total), DEF_KEY_LOCK(wait_min, wait_time_min), DEF_KEY_LOCK(wait_max, wait_time_max), @@ -321,10 +326,12 @@ static struct lock_stat *lock_stat_findnew(void *addr, const char *name) new->addr = addr; new->name = zalloc(sizeof(char) * strlen(name) + 1); - if (!new->name) + if (!new->name) { + free(new); goto alloc_failed; - strcpy(new->name, name); + } + strcpy(new->name, name); new->wait_time_min = ULLONG_MAX; list_add(&new->hash_entry, entry); @@ -400,17 +407,17 @@ static int report_lock_acquire_event(struct perf_evsel *evsel, ls = lock_stat_findnew(addr, name); if (!ls) - return -1; + return -ENOMEM; if (ls->discard) return 0; ts = thread_stat_findnew(sample->tid); if (!ts) - return -1; + return -ENOMEM; seq = get_seq(ts, addr); if (!seq) - return -1; + return -ENOMEM; switch (seq->state) { case SEQ_STATE_UNINITIALIZED: @@ -446,7 +453,6 @@ broken: list_del(&seq->list); free(seq); goto end; - break; default: BUG_ON("Unknown state of lock sequence found!\n"); break; @@ -473,17 +479,17 @@ static int report_lock_acquired_event(struct perf_evsel *evsel, ls = lock_stat_findnew(addr, name); if (!ls) - return -1; + return -ENOMEM; if (ls->discard) return 0; ts = thread_stat_findnew(sample->tid); if (!ts) - return -1; + return -ENOMEM; seq = get_seq(ts, addr); if (!seq) - return -1; + return -ENOMEM; switch (seq->state) { case SEQ_STATE_UNINITIALIZED: @@ -508,8 +514,6 @@ static int report_lock_acquired_event(struct perf_evsel *evsel, list_del(&seq->list); free(seq); goto end; - break; - default: BUG_ON("Unknown state of lock sequence found!\n"); break; @@ -517,6 +521,7 @@ static int report_lock_acquired_event(struct perf_evsel *evsel, seq->state = SEQ_STATE_ACQUIRED; ls->nr_acquired++; + ls->avg_wait_time = ls->nr_contended ? ls->wait_time_total/ls->nr_contended : 0; seq->prev_event_time = sample->time; end: return 0; @@ -536,17 +541,17 @@ static int report_lock_contended_event(struct perf_evsel *evsel, ls = lock_stat_findnew(addr, name); if (!ls) - return -1; + return -ENOMEM; if (ls->discard) return 0; ts = thread_stat_findnew(sample->tid); if (!ts) - return -1; + return -ENOMEM; seq = get_seq(ts, addr); if (!seq) - return -1; + return -ENOMEM; switch (seq->state) { case SEQ_STATE_UNINITIALIZED: @@ -564,7 +569,6 @@ static int report_lock_contended_event(struct perf_evsel *evsel, list_del(&seq->list); free(seq); goto end; - break; default: BUG_ON("Unknown state of lock sequence found!\n"); break; @@ -572,6 +576,7 @@ static int report_lock_contended_event(struct perf_evsel *evsel, seq->state = SEQ_STATE_CONTENDED; ls->nr_contended++; + ls->avg_wait_time = ls->wait_time_total/ls->nr_contended; seq->prev_event_time = sample->time; end: return 0; @@ -591,22 +596,21 @@ static int report_lock_release_event(struct perf_evsel *evsel, ls = lock_stat_findnew(addr, name); if (!ls) - return -1; + return -ENOMEM; if (ls->discard) return 0; ts = thread_stat_findnew(sample->tid); if (!ts) - return -1; + return -ENOMEM; seq = get_seq(ts, addr); if (!seq) - return -1; + return -ENOMEM; switch (seq->state) { case SEQ_STATE_UNINITIALIZED: goto end; - break; case SEQ_STATE_ACQUIRED: break; case SEQ_STATE_READ_ACQUIRED: @@ -624,7 +628,6 @@ static int report_lock_release_event(struct perf_evsel *evsel, ls->discard = 1; bad_hist[BROKEN_RELEASE]++; goto free_seq; - break; default: BUG_ON("Unknown state of lock sequence found!\n"); break; @@ -690,7 +693,7 @@ static void print_bad_events(int bad, int total) pr_info("\n=== output for debug===\n\n"); pr_info("bad: %d, total: %d\n", bad, total); - pr_info("bad rate: %f %%\n", (double)bad / (double)total * 100); + pr_info("bad rate: %.2f %%\n", (double)bad / (double)total * 100); pr_info("histogram of events caused bad sequence\n"); for (i = 0; i < BROKEN_MAX; i++) pr_info(" %10s: %d\n", name[i], bad_hist[i]); @@ -707,6 +710,7 @@ static void print_result(void) pr_info("%10s ", "acquired"); pr_info("%10s ", "contended"); + pr_info("%15s ", "avg wait (ns)"); pr_info("%15s ", "total wait (ns)"); pr_info("%15s ", "max wait (ns)"); pr_info("%15s ", "min wait (ns)"); @@ -738,6 +742,7 @@ static void print_result(void) pr_info("%10u ", st->nr_acquired); pr_info("%10u ", st->nr_contended); + pr_info("%15" PRIu64 " ", st->avg_wait_time); pr_info("%15" PRIu64 " ", st->wait_time_total); pr_info("%15" PRIu64 " ", st->wait_time_max); pr_info("%15" PRIu64 " ", st->wait_time_min == ULLONG_MAX ? @@ -822,6 +827,18 @@ static int process_sample_event(struct perf_tool *tool __maybe_unused, return 0; } +static void sort_result(void) +{ + unsigned int i; + struct lock_stat *st; + + for (i = 0; i < LOCKHASH_SIZE; i++) { + list_for_each_entry(st, &lockhash_table[i], hash_entry) { + insert_to_result(st, compare); + } + } +} + static const struct perf_evsel_str_handler lock_tracepoints[] = { { "lock:lock_acquire", perf_evsel__process_lock_acquire, }, /* CONFIG_LOCKDEP */ { "lock:lock_acquired", perf_evsel__process_lock_acquired, }, /* CONFIG_LOCKDEP, CONFIG_LOCK_STAT */ @@ -829,51 +846,51 @@ static const struct perf_evsel_str_handler lock_tracepoints[] = { { "lock:lock_release", perf_evsel__process_lock_release, }, /* CONFIG_LOCKDEP */ }; -static int read_events(void) +static int __cmd_report(bool display_info) { + int err = -EINVAL; struct perf_tool eops = { .sample = process_sample_event, .comm = perf_event__process_comm, .ordered_samples = true, }; - session = perf_session__new(input_name, O_RDONLY, 0, false, &eops); + struct perf_data_file file = { + .path = input_name, + .mode = PERF_DATA_MODE_READ, + }; + + session = perf_session__new(&file, false, &eops); if (!session) { pr_err("Initializing perf session failed\n"); - return -1; + return -ENOMEM; } + if (!perf_session__has_traces(session, "lock record")) + goto out_delete; + if (perf_session__set_tracepoints_handlers(session, lock_tracepoints)) { pr_err("Initializing perf session tracepoint handlers failed\n"); - return -1; + goto out_delete; } - return perf_session__process_events(session, &eops); -} - -static void sort_result(void) -{ - unsigned int i; - struct lock_stat *st; + if (select_key()) + goto out_delete; - for (i = 0; i < LOCKHASH_SIZE; i++) { - list_for_each_entry(st, &lockhash_table[i], hash_entry) { - insert_to_result(st, compare); - } - } -} + err = perf_session__process_events(session, &eops); + if (err) + goto out_delete; -static int __cmd_report(void) -{ setup_pager(); + if (display_info) /* used for info subcommand */ + err = dump_info(); + else { + sort_result(); + print_result(); + } - if ((select_key() != 0) || - (read_events() != 0)) - return -1; - - sort_result(); - print_result(); - - return 0; +out_delete: + perf_session__delete(session); + return err; } static int __cmd_record(int argc, const char **argv) @@ -881,7 +898,7 @@ static int __cmd_record(int argc, const char **argv) const char *record_args[] = { "record", "-R", "-m", "1024", "-c", "1", }; - unsigned int rec_argc, i, j; + unsigned int rec_argc, i, j, ret; const char **rec_argv; for (i = 0; i < ARRAY_SIZE(lock_tracepoints); i++) { @@ -898,7 +915,7 @@ static int __cmd_record(int argc, const char **argv) rec_argc += 2 * ARRAY_SIZE(lock_tracepoints); rec_argv = calloc(rec_argc + 1, sizeof(char *)); - if (rec_argv == NULL) + if (!rec_argv) return -ENOMEM; for (i = 0; i < ARRAY_SIZE(record_args); i++) @@ -914,7 +931,9 @@ static int __cmd_record(int argc, const char **argv) BUG_ON(i != rec_argc); - return cmd_record(i, rec_argv, NULL); + ret = cmd_record(i, rec_argv, NULL); + free(rec_argv); + return ret; } int cmd_lock(int argc, const char **argv, const char *prefix __maybe_unused) @@ -934,7 +953,7 @@ int cmd_lock(int argc, const char **argv, const char *prefix __maybe_unused) }; const struct option report_options[] = { OPT_STRING('k', "key", &sort_key, "acquired", - "key for sorting (acquired / contended / wait_total / wait_max / wait_min)"), + "key for sorting (acquired / contended / avg_wait / wait_total / wait_max / wait_min)"), /* TODO: type */ OPT_END() }; @@ -972,7 +991,7 @@ int cmd_lock(int argc, const char **argv, const char *prefix __maybe_unused) if (argc) usage_with_options(report_usage, report_options); } - __cmd_report(); + rc = __cmd_report(false); } else if (!strcmp(argv[0], "script")) { /* Aliased to 'perf script' */ return cmd_script(argc, argv, prefix); @@ -985,11 +1004,7 @@ int cmd_lock(int argc, const char **argv, const char *prefix __maybe_unused) } /* recycling report_lock_ops */ trace_handler = &report_lock_ops; - setup_pager(); - if (read_events() != 0) - rc = -1; - else - rc = dump_info(); + rc = __cmd_report(true); } else { usage_with_options(lock_usage, lock_options); } diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c index 253133a6251d..31c00f186da1 100644 --- a/tools/perf/builtin-mem.c +++ b/tools/perf/builtin-mem.c @@ -5,6 +5,7 @@ #include "util/trace-event.h" #include "util/tool.h" #include "util/session.h" +#include "util/data.h" #define MEM_OPERATION_LOAD "load" #define MEM_OPERATION_STORE "store" @@ -119,10 +120,14 @@ static int process_sample_event(struct perf_tool *tool, static int report_raw_events(struct perf_mem *mem) { + struct perf_data_file file = { + .path = input_name, + .mode = PERF_DATA_MODE_READ, + }; int err = -EINVAL; int ret; - struct perf_session *session = perf_session__new(input_name, O_RDONLY, - 0, false, &mem->tool); + struct perf_session *session = perf_session__new(&file, false, + &mem->tool); if (session == NULL) return -ENOMEM; diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c index e8a66f9a6715..89acc17cf2a0 100644 --- a/tools/perf/builtin-probe.c +++ b/tools/perf/builtin-probe.c @@ -173,7 +173,7 @@ static int opt_set_target(const struct option *opt, const char *str, if (str && !params.target) { if (!strcmp(opt->long_name, "exec")) params.uprobes = true; -#ifdef DWARF_SUPPORT +#ifdef HAVE_DWARF_SUPPORT else if (!strcmp(opt->long_name, "module")) params.uprobes = false; #endif @@ -187,7 +187,7 @@ static int opt_set_target(const struct option *opt, const char *str, return ret; } -#ifdef DWARF_SUPPORT +#ifdef HAVE_DWARF_SUPPORT static int opt_show_lines(const struct option *opt __maybe_unused, const char *str, int unset __maybe_unused) { @@ -257,7 +257,7 @@ int cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused) "perf probe [] --add 'PROBEDEF' [--add 'PROBEDEF' ...]", "perf probe [] --del '[GROUP:]EVENT' ...", "perf probe --list", -#ifdef DWARF_SUPPORT +#ifdef HAVE_DWARF_SUPPORT "perf probe [] --line 'LINEDESC'", "perf probe [] --vars 'PROBEPOINT'", #endif @@ -271,7 +271,7 @@ int cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused) OPT_CALLBACK('d', "del", NULL, "[GROUP:]EVENT", "delete a probe event.", opt_del_probe_event), OPT_CALLBACK('a', "add", NULL, -#ifdef DWARF_SUPPORT +#ifdef HAVE_DWARF_SUPPORT "[EVENT=]FUNC[@SRC][+OFF|%return|:RL|;PT]|SRC:AL|SRC;PT" " [[NAME=]ARG ...]", #else @@ -283,7 +283,7 @@ int cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused) "\t\tFUNC:\tFunction name\n" "\t\tOFF:\tOffset from function entry (in byte)\n" "\t\t%return:\tPut the probe at function return\n" -#ifdef DWARF_SUPPORT +#ifdef HAVE_DWARF_SUPPORT "\t\tSRC:\tSource code path\n" "\t\tRL:\tRelative line number from function entry.\n" "\t\tAL:\tAbsolute line number in file.\n" @@ -296,7 +296,7 @@ int cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused) opt_add_probe_event), OPT_BOOLEAN('f', "force", ¶ms.force_add, "forcibly add events" " with existing name"), -#ifdef DWARF_SUPPORT +#ifdef HAVE_DWARF_SUPPORT OPT_CALLBACK('L', "line", NULL, "FUNC[:RLN[+NUM|-RLN2]]|SRC:ALN[+NUM|-ALN2]", "Show source code lines.", opt_show_lines), @@ -408,7 +408,7 @@ int cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused) return ret; } -#ifdef DWARF_SUPPORT +#ifdef HAVE_DWARF_SUPPORT if (params.show_lines && !params.uprobes) { if (params.mod_events) { pr_err(" Error: Don't use --line with" diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index a41ac41546c9..ab8d15e6e8cc 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -24,12 +24,13 @@ #include "util/symbol.h" #include "util/cpumap.h" #include "util/thread_map.h" +#include "util/data.h" #include #include #include -#ifndef HAVE_ON_EXIT +#ifndef HAVE_ON_EXIT_SUPPORT #ifndef ATEXIT_MAX #define ATEXIT_MAX 32 #endif @@ -65,12 +66,10 @@ struct perf_record { struct perf_tool tool; struct perf_record_opts opts; u64 bytes_written; - const char *output_name; + struct perf_data_file file; struct perf_evlist *evlist; struct perf_session *session; const char *progname; - int output; - unsigned int page_size; int realtime_prio; bool no_buildid; bool no_buildid_cache; @@ -85,11 +84,13 @@ static void advance_output(struct perf_record *rec, size_t size) static int write_output(struct perf_record *rec, void *buf, size_t size) { + struct perf_data_file *file = &rec->file; + while (size) { - int ret = write(rec->output, buf, size); + int ret = write(file->fd, buf, size); if (ret < 0) { - pr_err("failed to write\n"); + pr_err("failed to write perf data, error: %m\n"); return -1; } @@ -119,7 +120,7 @@ static int perf_record__mmap_read(struct perf_record *rec, { unsigned int head = perf_mmap__read_head(md); unsigned int old = md->prev; - unsigned char *data = md->base + rec->page_size; + unsigned char *data = md->base + page_size; unsigned long size; void *buf; int rc = 0; @@ -234,10 +235,6 @@ try_again: "or try again with a smaller value of -m/--mmap_pages.\n" "(current value: %d)\n", opts->mmap_pages); rc = -errno; - } else if (!is_power_of_2(opts->mmap_pages) && - (opts->mmap_pages != UINT_MAX)) { - pr_err("--mmap_pages/-m value must be a power of two."); - rc = -EINVAL; } else { pr_err("failed to mmap with %d (%s)\n", errno, strerror(errno)); rc = -errno; @@ -253,13 +250,14 @@ out: static int process_buildids(struct perf_record *rec) { - u64 size = lseek(rec->output, 0, SEEK_CUR); + struct perf_data_file *file = &rec->file; + struct perf_session *session = rec->session; + u64 size = lseek(file->fd, 0, SEEK_CUR); if (size == 0) return 0; - rec->session->fd = rec->output; - return __perf_session__process_events(rec->session, rec->post_processing_offset, + return __perf_session__process_events(session, rec->post_processing_offset, size - rec->post_processing_offset, size, &build_id__mark_dso_hit_ops); } @@ -267,17 +265,18 @@ static int process_buildids(struct perf_record *rec) static void perf_record__exit(int status, void *arg) { struct perf_record *rec = arg; + struct perf_data_file *file = &rec->file; if (status != 0) return; - if (!rec->opts.pipe_output) { + if (!file->is_pipe) { rec->session->header.data_size += rec->bytes_written; if (!rec->no_buildid) process_buildids(rec); perf_session__write_header(rec->session, rec->evlist, - rec->output, true); + file->fd, true); perf_session__delete(rec->session); perf_evlist__delete(rec->evlist); symbol__exit(); @@ -345,62 +344,26 @@ out: static int __cmd_record(struct perf_record *rec, int argc, const char **argv) { - struct stat st; - int flags; - int err, output, feat; + int err, feat; unsigned long waking = 0; const bool forks = argc > 0; struct machine *machine; struct perf_tool *tool = &rec->tool; struct perf_record_opts *opts = &rec->opts; struct perf_evlist *evsel_list = rec->evlist; - const char *output_name = rec->output_name; + struct perf_data_file *file = &rec->file; struct perf_session *session; bool disabled = false; rec->progname = argv[0]; - rec->page_size = sysconf(_SC_PAGE_SIZE); - on_exit(perf_record__sig_exit, rec); signal(SIGCHLD, sig_handler); signal(SIGINT, sig_handler); signal(SIGUSR1, sig_handler); signal(SIGTERM, sig_handler); - if (!output_name) { - if (!fstat(STDOUT_FILENO, &st) && S_ISFIFO(st.st_mode)) - opts->pipe_output = true; - else - rec->output_name = output_name = "perf.data"; - } - if (output_name) { - if (!strcmp(output_name, "-")) - opts->pipe_output = true; - else if (!stat(output_name, &st) && st.st_size) { - char oldname[PATH_MAX]; - snprintf(oldname, sizeof(oldname), "%s.old", - output_name); - unlink(oldname); - rename(output_name, oldname); - } - } - - flags = O_CREAT|O_RDWR|O_TRUNC; - - if (opts->pipe_output) - output = STDOUT_FILENO; - else - output = open(output_name, flags, S_IRUSR | S_IWUSR); - if (output < 0) { - perror("failed to create output file"); - return -1; - } - - rec->output = output; - - session = perf_session__new(output_name, O_WRONLY, - true, false, NULL); + session = perf_session__new(file, false, NULL); if (session == NULL) { pr_err("Not enough memory for reading perf file header\n"); return -1; @@ -422,7 +385,7 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv) if (forks) { err = perf_evlist__prepare_workload(evsel_list, &opts->target, - argv, opts->pipe_output, + argv, file->is_pipe, true); if (err < 0) { pr_err("Couldn't run the workload!\n"); @@ -443,13 +406,13 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv) */ on_exit(perf_record__exit, rec); - if (opts->pipe_output) { - err = perf_header__write_pipe(output); + if (file->is_pipe) { + err = perf_header__write_pipe(file->fd); if (err < 0) goto out_delete_session; } else { err = perf_session__write_header(session, evsel_list, - output, false); + file->fd, false); if (err < 0) goto out_delete_session; } @@ -462,11 +425,11 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv) goto out_delete_session; } - rec->post_processing_offset = lseek(output, 0, SEEK_CUR); + rec->post_processing_offset = lseek(file->fd, 0, SEEK_CUR); machine = &session->machines.host; - if (opts->pipe_output) { + if (file->is_pipe) { err = perf_event__synthesize_attrs(tool, session, process_synthesized_event); if (err < 0) { @@ -483,7 +446,7 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv) * return this more properly and also * propagate errors that now are calling die() */ - err = perf_event__synthesize_tracing_data(tool, output, evsel_list, + err = perf_event__synthesize_tracing_data(tool, file->fd, evsel_list, process_synthesized_event); if (err <= 0) { pr_err("Couldn't record tracing data.\n"); @@ -590,7 +553,7 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv) fprintf(stderr, "[ perf record: Captured and wrote %.3f MB %s (~%" PRIu64 " samples) ]\n", (double)rec->bytes_written / 1024.0 / 1024.0, - output_name, + file->path, rec->bytes_written / 24); return 0; @@ -618,6 +581,9 @@ static const struct branch_mode branch_modes[] = { BRANCH_OPT("any_call", PERF_SAMPLE_BRANCH_ANY_CALL), BRANCH_OPT("any_ret", PERF_SAMPLE_BRANCH_ANY_RETURN), BRANCH_OPT("ind_call", PERF_SAMPLE_BRANCH_IND_CALL), + BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORT_TX), + BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_IN_TX), + BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NO_TX), BRANCH_END }; @@ -684,7 +650,7 @@ error: return ret; } -#ifdef LIBUNWIND_SUPPORT +#ifdef HAVE_LIBUNWIND_SUPPORT static int get_stack_size(char *str, unsigned long *_size) { char *endptr; @@ -710,7 +676,7 @@ static int get_stack_size(char *str, unsigned long *_size) max_size, str); return -1; } -#endif /* LIBUNWIND_SUPPORT */ +#endif /* HAVE_LIBUNWIND_SUPPORT */ int record_parse_callchain_opt(const struct option *opt, const char *arg, int unset) @@ -748,7 +714,7 @@ int record_parse_callchain_opt(const struct option *opt, "needed for -g fp\n"); break; -#ifdef LIBUNWIND_SUPPORT +#ifdef HAVE_LIBUNWIND_SUPPORT /* Dwarf style */ } else if (!strncmp(name, "dwarf", sizeof("dwarf"))) { const unsigned long default_stack_dump_size = 8192; @@ -768,7 +734,7 @@ int record_parse_callchain_opt(const struct option *opt, if (!ret) pr_debug("callchain: stack dump size %d\n", opts->stack_dump_size); -#endif /* LIBUNWIND_SUPPORT */ +#endif /* HAVE_LIBUNWIND_SUPPORT */ } else { pr_err("callchain: Unknown -g option " "value: %s\n", arg); @@ -815,7 +781,7 @@ static struct perf_record record = { #define CALLCHAIN_HELP "do call-graph (stack chain/backtrace) recording: " -#ifdef LIBUNWIND_SUPPORT +#ifdef HAVE_LIBUNWIND_SUPPORT const char record_callchain_help[] = CALLCHAIN_HELP "[fp] dwarf"; #else const char record_callchain_help[] = CALLCHAIN_HELP "[fp]"; @@ -849,13 +815,14 @@ const struct option record_options[] = { OPT_STRING('C', "cpu", &record.opts.target.cpu_list, "cpu", "list of cpus to monitor"), OPT_U64('c', "count", &record.opts.user_interval, "event period to sample"), - OPT_STRING('o', "output", &record.output_name, "file", + OPT_STRING('o', "output", &record.file.path, "file", "output file name"), OPT_BOOLEAN('i', "no-inherit", &record.opts.no_inherit, "child tasks do not inherit counters"), OPT_UINTEGER('F', "freq", &record.opts.user_freq, "profile at this frequency"), - OPT_UINTEGER('m', "mmap-pages", &record.opts.mmap_pages, - "number of mmap data pages"), + OPT_CALLBACK('m', "mmap-pages", &record.opts.mmap_pages, "pages", + "number of mmap data pages", + perf_evlist__parse_mmap_pages), OPT_BOOLEAN(0, "group", &record.opts.group, "put the counters into a counter group"), OPT_CALLBACK_DEFAULT('g', "call-graph", &record.opts, @@ -891,6 +858,8 @@ const struct option record_options[] = { parse_branch_stack), OPT_BOOLEAN('W', "weight", &record.opts.sample_weight, "sample by weight (on special events only)"), + OPT_BOOLEAN(0, "transaction", &record.opts.sample_transaction, + "sample transaction flags (special events only)"), OPT_END() }; diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 72eae7498c09..81addcabb356 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -33,8 +33,10 @@ #include "util/thread.h" #include "util/sort.h" #include "util/hist.h" +#include "util/data.h" #include "arch/common.h" +#include #include struct perf_report { @@ -47,6 +49,7 @@ struct perf_report { bool show_threads; bool inverted_callchain; bool mem_mode; + int max_stack; struct perf_read_values show_threads_values; const char *pretty_printing_style; const char *cpu_list; @@ -88,7 +91,8 @@ static int perf_report__add_mem_hist_entry(struct perf_tool *tool, if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) { err = machine__resolve_callchain(machine, evsel, al->thread, - sample, &parent, al); + sample, &parent, al, + rep->max_stack); if (err) return err; } @@ -179,7 +183,8 @@ static int perf_report__add_branch_hist_entry(struct perf_tool *tool, if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) { err = machine__resolve_callchain(machine, evsel, al->thread, - sample, &parent, al); + sample, &parent, al, + rep->max_stack); if (err) return err; } @@ -242,24 +247,27 @@ out: return err; } -static int perf_evsel__add_hist_entry(struct perf_evsel *evsel, +static int perf_evsel__add_hist_entry(struct perf_tool *tool, + struct perf_evsel *evsel, struct addr_location *al, struct perf_sample *sample, struct machine *machine) { + struct perf_report *rep = container_of(tool, struct perf_report, tool); struct symbol *parent = NULL; int err = 0; struct hist_entry *he; if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) { err = machine__resolve_callchain(machine, evsel, al->thread, - sample, &parent, al); + sample, &parent, al, + rep->max_stack); if (err) return err; } he = __hists__add_entry(&evsel->hists, al, parent, sample->period, - sample->weight); + sample->weight, sample->transaction); if (he == NULL) return -ENOMEM; @@ -330,7 +338,8 @@ static int process_sample_event(struct perf_tool *tool, if (al.map != NULL) al.map->dso->hit = 1; - ret = perf_evsel__add_hist_entry(evsel, &al, sample, machine); + ret = perf_evsel__add_hist_entry(tool, evsel, &al, sample, + machine); if (ret < 0) pr_debug("problem incrementing symbol period, skipping event\n"); } @@ -366,8 +375,9 @@ static int perf_report__setup_sample_type(struct perf_report *rep) { struct perf_session *self = rep->session; u64 sample_type = perf_evlist__combined_sample_type(self->evlist); + bool is_pipe = perf_data_file__is_pipe(self->file); - if (!self->fd_pipe && !(sample_type & PERF_SAMPLE_CALLCHAIN)) { + if (!is_pipe && !(sample_type & PERF_SAMPLE_CALLCHAIN)) { if (sort__has_parent) { ui__error("Selected --sort parent, but no " "callchain data. Did you call " @@ -390,7 +400,7 @@ static int perf_report__setup_sample_type(struct perf_report *rep) } if (sort__mode == SORT_MODE__BRANCH) { - if (!self->fd_pipe && + if (!is_pipe && !(sample_type & PERF_SAMPLE_BRANCH_STACK)) { ui__error("Selected -b but no branch data. " "Did you call perf record without -b?\n"); @@ -486,6 +496,7 @@ static int __cmd_report(struct perf_report *rep) struct map *kernel_map; struct kmap *kernel_kmap; const char *help = "For a higher level overview, try: perf report --sort comm,dso"; + struct perf_data_file *file = session->file; signal(SIGINT, sig_handler); @@ -570,7 +581,7 @@ static int __cmd_report(struct perf_report *rep) return 0; if (nr_samples == 0) { - ui__error("The %s file has no samples!\n", session->filename); + ui__error("The %s file has no samples!\n", file->path); return 0; } @@ -591,8 +602,19 @@ static int __cmd_report(struct perf_report *rep) ret = 0; } else if (use_browser == 2) { - perf_evlist__gtk_browse_hists(session->evlist, help, - NULL, rep->min_percent); + int (*hist_browser)(struct perf_evlist *, + const char *, + struct hist_browser_timer *, + float min_pcnt); + + hist_browser = dlsym(perf_gtk_handle, + "perf_evlist__gtk_browse_hists"); + if (hist_browser == NULL) { + ui__error("GTK browser not found!\n"); + return ret; + } + hist_browser(session->evlist, help, NULL, + rep->min_percent); } } else perf_evlist__tty_browse_hists(session->evlist, rep, help); @@ -757,6 +779,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused) .ordered_samples = true, .ordering_requires_timestamps = true, }, + .max_stack = PERF_MAX_STACK_DEPTH, .pretty_printing_style = "normal", }; const struct option options[] = { @@ -787,7 +810,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused) "sort by key(s): pid, comm, dso, symbol, parent, cpu, srcline," " dso_to, dso_from, symbol_to, symbol_from, mispredict," " weight, local_weight, mem, symbol_daddr, dso_daddr, tlb, " - "snoop, locked"), + "snoop, locked, abort, in_tx, transaction"), OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization, "Show sample percentage for different cpu modes"), OPT_STRING('p', "parent", &parent_pattern, "regex", @@ -797,6 +820,10 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused) OPT_CALLBACK_DEFAULT('g', "call-graph", &report, "output_type,min_percent[,print_limit],call_order", "Display callchains using output_type (graph, flat, fractal, or none) , min percent threshold, optional print limit, callchain order, key (function or address). " "Default: fractal,0.5,callee,function", &parse_callchain_opt, callchain_default_opt), + OPT_INTEGER(0, "max-stack", &report.max_stack, + "Set the maximum stack depth when parsing the callchain, " + "anything beyond the specified depth will be ignored. " + "Default: " __stringify(PERF_MAX_STACK_DEPTH)), OPT_BOOLEAN('G', "inverted", &report.inverted_callchain, "alias for inverted call graph"), OPT_CALLBACK(0, "ignore-callees", NULL, "regex", @@ -845,6 +872,9 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused) "Don't show entries under that percent", parse_percent_limit), OPT_END() }; + struct perf_data_file file = { + .mode = PERF_DATA_MODE_READ, + }; perf_config(perf_report_config, &report); @@ -874,9 +904,11 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused) perf_hpp__init(); } + file.path = input_name; + file.force = report.force; + repeat: - session = perf_session__new(input_name, O_RDONLY, - report.force, false, &report.tool); + session = perf_session__new(&file, false, &report.tool); if (session == NULL) return -ENOMEM; diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c index d8c51b2f263f..5a46b102eb08 100644 --- a/tools/perf/builtin-sched.c +++ b/tools/perf/builtin-sched.c @@ -1446,8 +1446,12 @@ static int perf_sched__read_events(struct perf_sched *sched, { "sched:sched_migrate_task", process_sched_migrate_task_event, }, }; struct perf_session *session; + struct perf_data_file file = { + .path = input_name, + .mode = PERF_DATA_MODE_READ, + }; - session = perf_session__new(input_name, O_RDONLY, 0, false, &sched->tool); + session = perf_session__new(&file, false, &sched->tool); if (session == NULL) { pr_debug("No Memory for session\n"); return -1; diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 9c333ff3dfeb..27de6068049d 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -15,6 +15,7 @@ #include "util/evlist.h" #include "util/evsel.h" #include "util/sort.h" +#include "util/data.h" #include static char const *script_name; @@ -409,7 +410,9 @@ static void print_sample_bts(union perf_event *event, printf(" => "); /* print branch_to information */ - if (PRINT_FIELD(ADDR)) + if (PRINT_FIELD(ADDR) || + ((evsel->attr.sample_type & PERF_SAMPLE_ADDR) && + !output[attr->type].user_set)) print_sample_addr(event, sample, machine, thread, attr); printf("\n"); @@ -1113,10 +1116,14 @@ int find_scripts(char **scripts_array, char **scripts_path_array) char scripts_path[MAXPATHLEN], lang_path[MAXPATHLEN]; DIR *scripts_dir, *lang_dir; struct perf_session *session; + struct perf_data_file file = { + .path = input_name, + .mode = PERF_DATA_MODE_READ, + }; char *temp; int i = 0; - session = perf_session__new(input_name, O_RDONLY, 0, false, NULL); + session = perf_session__new(&file, false, NULL); if (!session) return -1; @@ -1317,12 +1324,17 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused) "perf script [] [script-args]", NULL }; + struct perf_data_file file = { + .mode = PERF_DATA_MODE_READ, + }; setup_scripting(); argc = parse_options(argc, argv, options, script_usage, PARSE_OPT_STOP_AT_NON_OPTION); + file.path = input_name; + if (argc > 1 && !strncmp(argv[0], "rec", strlen("rec"))) { rec_script_path = get_script_path(argv[1], RECORD_SUFFIX); if (!rec_script_path) @@ -1486,8 +1498,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused) if (!script_name) setup_pager(); - session = perf_session__new(input_name, O_RDONLY, 0, false, - &perf_script); + session = perf_session__new(&file, false, &perf_script); if (session == NULL) return -ENOMEM; @@ -1514,7 +1525,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused) return -1; } - input = open(session->filename, O_RDONLY); /* input_name */ + input = open(file.path, O_RDONLY); /* input_name */ if (input < 0) { perror("failed to open file"); return -1; diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index f686d5ff594e..1a9c95d270aa 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -46,6 +46,7 @@ #include "util/util.h" #include "util/parse-options.h" #include "util/parse-events.h" +#include "util/pmu.h" #include "util/event.h" #include "util/evlist.h" #include "util/evsel.h" @@ -70,6 +71,41 @@ static void print_counter_aggr(struct perf_evsel *counter, char *prefix); static void print_counter(struct perf_evsel *counter, char *prefix); static void print_aggr(char *prefix); +/* Default events used for perf stat -T */ +static const char * const transaction_attrs[] = { + "task-clock", + "{" + "instructions," + "cycles," + "cpu/cycles-t/," + "cpu/tx-start/," + "cpu/el-start/," + "cpu/cycles-ct/" + "}" +}; + +/* More limited version when the CPU does not have all events. */ +static const char * const transaction_limited_attrs[] = { + "task-clock", + "{" + "instructions," + "cycles," + "cpu/cycles-t/," + "cpu/tx-start/" + "}" +}; + +/* must match transaction_attrs and the beginning limited_attrs */ +enum { + T_TASK_CLOCK, + T_INSTRUCTIONS, + T_CYCLES, + T_CYCLES_IN_TX, + T_TRANSACTION_START, + T_ELISION_START, + T_CYCLES_IN_TX_CP, +}; + static struct perf_evlist *evsel_list; static struct perf_target target = { @@ -90,6 +126,7 @@ static enum aggr_mode aggr_mode = AGGR_GLOBAL; static volatile pid_t child_pid = -1; static bool null_run = false; static int detailed_run = 0; +static bool transaction_run; static bool big_num = true; static int big_num_opt = -1; static const char *csv_sep = NULL; @@ -214,7 +251,10 @@ static struct stats runtime_l1_icache_stats[MAX_NR_CPUS]; static struct stats runtime_ll_cache_stats[MAX_NR_CPUS]; static struct stats runtime_itlb_cache_stats[MAX_NR_CPUS]; static struct stats runtime_dtlb_cache_stats[MAX_NR_CPUS]; +static struct stats runtime_cycles_in_tx_stats[MAX_NR_CPUS]; static struct stats walltime_nsecs_stats; +static struct stats runtime_transaction_stats[MAX_NR_CPUS]; +static struct stats runtime_elision_stats[MAX_NR_CPUS]; static void perf_stat__reset_stats(struct perf_evlist *evlist) { @@ -236,6 +276,11 @@ static void perf_stat__reset_stats(struct perf_evlist *evlist) memset(runtime_ll_cache_stats, 0, sizeof(runtime_ll_cache_stats)); memset(runtime_itlb_cache_stats, 0, sizeof(runtime_itlb_cache_stats)); memset(runtime_dtlb_cache_stats, 0, sizeof(runtime_dtlb_cache_stats)); + memset(runtime_cycles_in_tx_stats, 0, + sizeof(runtime_cycles_in_tx_stats)); + memset(runtime_transaction_stats, 0, + sizeof(runtime_transaction_stats)); + memset(runtime_elision_stats, 0, sizeof(runtime_elision_stats)); memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats)); } @@ -274,6 +319,29 @@ static inline int nsec_counter(struct perf_evsel *evsel) return 0; } +static struct perf_evsel *nth_evsel(int n) +{ + static struct perf_evsel **array; + static int array_len; + struct perf_evsel *ev; + int j; + + /* Assumes this only called when evsel_list does not change anymore. */ + if (!array) { + list_for_each_entry(ev, &evsel_list->entries, node) + array_len++; + array = malloc(array_len * sizeof(void *)); + if (!array) + exit(ENOMEM); + j = 0; + list_for_each_entry(ev, &evsel_list->entries, node) + array[j++] = ev; + } + if (n < array_len) + return array[n]; + return NULL; +} + /* * Update various tracking values we maintain to print * more semantic information such as miss/hit ratios, @@ -285,6 +353,15 @@ static void update_shadow_stats(struct perf_evsel *counter, u64 *count) update_stats(&runtime_nsecs_stats[0], count[0]); else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES)) update_stats(&runtime_cycles_stats[0], count[0]); + else if (transaction_run && + perf_evsel__cmp(counter, nth_evsel(T_CYCLES_IN_TX))) + update_stats(&runtime_cycles_in_tx_stats[0], count[0]); + else if (transaction_run && + perf_evsel__cmp(counter, nth_evsel(T_TRANSACTION_START))) + update_stats(&runtime_transaction_stats[0], count[0]); + else if (transaction_run && + perf_evsel__cmp(counter, nth_evsel(T_ELISION_START))) + update_stats(&runtime_elision_stats[0], count[0]); else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) update_stats(&runtime_stalled_cycles_front_stats[0], count[0]); else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND)) @@ -457,6 +534,7 @@ static int __run_perf_stat(int argc, const char **argv) perror("failed to prepare workload"); return -1; } + child_pid = evsel_list->workload.pid; } if (group) @@ -628,10 +706,13 @@ static void nsec_printout(int cpu, int nr, struct perf_evsel *evsel, double avg) { double msecs = avg / 1e6; const char *fmt = csv_output ? "%.6f%s%s" : "%18.6f%s%-25s"; + char name[25]; aggr_printout(evsel, cpu, nr); - fprintf(output, fmt, msecs, csv_sep, perf_evsel__name(evsel)); + scnprintf(name, sizeof(name), "%s%s", + perf_evsel__name(evsel), csv_output ? "" : " (msec)"); + fprintf(output, fmt, msecs, csv_sep, name); if (evsel->cgrp) fprintf(output, "%s%s", csv_sep, evsel->cgrp->name); @@ -827,7 +908,7 @@ static void print_ll_cache_misses(int cpu, static void abs_printout(int cpu, int nr, struct perf_evsel *evsel, double avg) { - double total, ratio = 0.0; + double total, ratio = 0.0, total2; const char *fmt; if (csv_output) @@ -852,11 +933,10 @@ static void abs_printout(int cpu, int nr, struct perf_evsel *evsel, double avg) if (perf_evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) { total = avg_stats(&runtime_cycles_stats[cpu]); - if (total) + if (total) { ratio = avg / total; - - fprintf(output, " # %5.2f insns per cycle ", ratio); - + fprintf(output, " # %5.2f insns per cycle ", ratio); + } total = avg_stats(&runtime_stalled_cycles_front_stats[cpu]); total = max(total, avg_stats(&runtime_stalled_cycles_back_stats[cpu])); @@ -919,10 +999,47 @@ static void abs_printout(int cpu, int nr, struct perf_evsel *evsel, double avg) } else if (perf_evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) { total = avg_stats(&runtime_nsecs_stats[cpu]); + if (total) { + ratio = avg / total; + fprintf(output, " # %8.3f GHz ", ratio); + } + } else if (transaction_run && + perf_evsel__cmp(evsel, nth_evsel(T_CYCLES_IN_TX))) { + total = avg_stats(&runtime_cycles_stats[cpu]); + if (total) + fprintf(output, + " # %5.2f%% transactional cycles ", + 100.0 * (avg / total)); + } else if (transaction_run && + perf_evsel__cmp(evsel, nth_evsel(T_CYCLES_IN_TX_CP))) { + total = avg_stats(&runtime_cycles_stats[cpu]); + total2 = avg_stats(&runtime_cycles_in_tx_stats[cpu]); + if (total2 < avg) + total2 = avg; + if (total) + fprintf(output, + " # %5.2f%% aborted cycles ", + 100.0 * ((total2-avg) / total)); + } else if (transaction_run && + perf_evsel__cmp(evsel, nth_evsel(T_TRANSACTION_START)) && + avg > 0 && + runtime_cycles_in_tx_stats[cpu].n != 0) { + total = avg_stats(&runtime_cycles_in_tx_stats[cpu]); + + if (total) + ratio = total / avg; + + fprintf(output, " # %8.0f cycles / transaction ", ratio); + } else if (transaction_run && + perf_evsel__cmp(evsel, nth_evsel(T_ELISION_START)) && + avg > 0 && + runtime_cycles_in_tx_stats[cpu].n != 0) { + total = avg_stats(&runtime_cycles_in_tx_stats[cpu]); + if (total) - ratio = 1.0 * avg / total; + ratio = total / avg; - fprintf(output, " # %8.3f GHz ", ratio); + fprintf(output, " # %8.0f cycles / elision ", ratio); } else if (runtime_nsecs_stats[cpu].n != 0) { char unit = 'M'; @@ -1115,7 +1232,11 @@ static void print_stat(int argc, const char **argv) if (!csv_output) { fprintf(output, "\n"); fprintf(output, " Performance counter stats for "); - if (!perf_target__has_task(&target)) { + if (target.system_wide) + fprintf(output, "\'system wide"); + else if (target.cpu_list) + fprintf(output, "\'CPU(s) %s", target.cpu_list); + else if (!perf_target__has_task(&target)) { fprintf(output, "\'%s", argv[0]); for (i = 1; i < argc; i++) fprintf(output, " %s", argv[i]); @@ -1236,6 +1357,16 @@ static int perf_stat_init_aggr_mode(void) return 0; } +static int setup_events(const char * const *attrs, unsigned len) +{ + unsigned i; + + for (i = 0; i < len; i++) { + if (parse_events(evsel_list, attrs[i])) + return -1; + } + return 0; +} /* * Add default attributes, if there were no attributes specified or @@ -1354,6 +1485,22 @@ static int add_default_attributes(void) if (null_run) return 0; + if (transaction_run) { + int err; + if (pmu_have_event("cpu", "cycles-ct") && + pmu_have_event("cpu", "el-start")) + err = setup_events(transaction_attrs, + ARRAY_SIZE(transaction_attrs)); + else + err = setup_events(transaction_limited_attrs, + ARRAY_SIZE(transaction_limited_attrs)); + if (err < 0) { + fprintf(stderr, "Cannot set up transaction events\n"); + return -1; + } + return 0; + } + if (!evsel_list->nr_entries) { if (perf_evlist__add_default_attrs(evsel_list, default_attrs) < 0) return -1; @@ -1388,6 +1535,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused) int output_fd = 0; const char *output_name = NULL; const struct option options[] = { + OPT_BOOLEAN('T', "transaction", &transaction_run, + "hardware transaction statistics"), OPT_CALLBACK('e', "event", &evsel_list, "event", "event selector. use 'perf list' to list available events", parse_events_option), @@ -1513,8 +1662,9 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused) } else if (big_num_opt == 0) /* User passed --no-big-num */ big_num = false; - if (!argc && !perf_target__has_task(&target)) + if (!argc && perf_target__none(&target)) usage_with_options(stat_usage, options); + if (run_count < 0) { usage_with_options(stat_usage, options); } else if (run_count == 0) { diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c index c2e02319347a..e11c61d9bda4 100644 --- a/tools/perf/builtin-timechart.c +++ b/tools/perf/builtin-timechart.c @@ -36,6 +36,7 @@ #include "util/session.h" #include "util/svghelper.h" #include "util/tool.h" +#include "util/data.h" #define SUPPORT_OLD_POWER_EVENTS 1 #define PWR_EVENT_EXIT -1 @@ -990,8 +991,13 @@ static int __cmd_timechart(const char *output_name) { "power:power_frequency", process_sample_power_frequency }, #endif }; - struct perf_session *session = perf_session__new(input_name, O_RDONLY, - 0, false, &perf_timechart); + struct perf_data_file file = { + .path = input_name, + .mode = PERF_DATA_MODE_READ, + }; + + struct perf_session *session = perf_session__new(&file, false, + &perf_timechart); int ret = -EINVAL; if (session == NULL) diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index 212214162bb2..386d83324a8d 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -247,9 +247,8 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel, pthread_mutex_lock(&evsel->hists.lock); he = __hists__add_entry(&evsel->hists, al, NULL, sample->period, - sample->weight); + sample->weight, sample->transaction); pthread_mutex_unlock(&evsel->hists.lock); - if (he == NULL) return NULL; @@ -771,7 +770,8 @@ static void perf_event__process_sample(struct perf_tool *tool, sample->callchain) { err = machine__resolve_callchain(machine, evsel, al.thread, sample, - &parent, &al); + &parent, &al, + top->max_stack); if (err) return; } @@ -930,11 +930,8 @@ static int __cmd_top(struct perf_top *top) struct perf_record_opts *opts = &top->record_opts; pthread_t thread; int ret; - /* - * FIXME: perf_session__new should allow passing a O_MMAP, so that all this - * mmap reading, etc is encapsulated in it. Use O_WRONLY for now. - */ - top->session = perf_session__new(NULL, O_WRONLY, false, false, NULL); + + top->session = perf_session__new(NULL, false, NULL); if (top->session == NULL) return -ENOMEM; @@ -1051,10 +1048,11 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused) .user_freq = UINT_MAX, .user_interval = ULLONG_MAX, .freq = 4000, /* 4 KHz */ - .target = { + .target = { .uses_mmap = true, }, }, + .max_stack = PERF_MAX_STACK_DEPTH, .sym_pcnt_filter = 5, }; struct perf_record_opts *opts = &top.record_opts; @@ -1074,10 +1072,13 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused) "list of cpus to monitor"), OPT_STRING('k', "vmlinux", &symbol_conf.vmlinux_name, "file", "vmlinux pathname"), + OPT_BOOLEAN(0, "ignore-vmlinux", &symbol_conf.ignore_vmlinux, + "don't load vmlinux even if found"), OPT_BOOLEAN('K', "hide_kernel_symbols", &top.hide_kernel_symbols, "hide kernel symbols"), - OPT_UINTEGER('m', "mmap-pages", &opts->mmap_pages, - "number of mmap data pages"), + OPT_CALLBACK('m', "mmap-pages", &opts->mmap_pages, "pages", + "number of mmap data pages", + perf_evlist__parse_mmap_pages), OPT_INTEGER('r', "realtime", &top.realtime_prio, "collect data with this RT SCHED_FIFO priority"), OPT_INTEGER('d', "delay", &top.delay_secs, @@ -1103,12 +1104,16 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused) OPT_INCR('v', "verbose", &verbose, "be more verbose (show counter open errors, etc)"), OPT_STRING('s', "sort", &sort_order, "key[,key2...]", - "sort by key(s): pid, comm, dso, symbol, parent, weight, local_weight"), + "sort by key(s): pid, comm, dso, symbol, parent, weight, local_weight," + " abort, in_tx, transaction"), OPT_BOOLEAN('n', "show-nr-samples", &symbol_conf.show_nr_samples, "Show a column with the number of samples"), OPT_CALLBACK_DEFAULT('G', "call-graph", &top.record_opts, "mode[,dump_size]", record_callchain_help, &parse_callchain_opt, "fp"), + OPT_INTEGER(0, "max-stack", &top.max_stack, + "Set the maximum stack depth when parsing the callchain. " + "Default: " __stringify(PERF_MAX_STACK_DEPTH)), OPT_CALLBACK(0, "ignore-callees", NULL, "regex", "ignore callees of these functions in call graphs", report_parse_ignore_callees_opt), diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c index fd4853404727..fa620bc1db69 100644 --- a/tools/perf/builtin-trace.c +++ b/tools/perf/builtin-trace.c @@ -10,9 +10,11 @@ #include "util/strlist.h" #include "util/intlist.h" #include "util/thread_map.h" +#include "util/stat.h" #include #include +#include #include #include @@ -33,49 +35,96 @@ # define MADV_UNMERGEABLE 13 #endif -static size_t syscall_arg__scnprintf_hex(char *bf, size_t size, - unsigned long arg, - u8 arg_idx __maybe_unused, - u8 *arg_mask __maybe_unused) +struct syscall_arg { + unsigned long val; + struct thread *thread; + struct trace *trace; + void *parm; + u8 idx; + u8 mask; +}; + +struct strarray { + int offset; + int nr_entries; + const char **entries; +}; + +#define DEFINE_STRARRAY(array) struct strarray strarray__##array = { \ + .nr_entries = ARRAY_SIZE(array), \ + .entries = array, \ +} + +#define DEFINE_STRARRAY_OFFSET(array, off) struct strarray strarray__##array = { \ + .offset = off, \ + .nr_entries = ARRAY_SIZE(array), \ + .entries = array, \ +} + +static size_t __syscall_arg__scnprintf_strarray(char *bf, size_t size, + const char *intfmt, + struct syscall_arg *arg) { - return scnprintf(bf, size, "%#lx", arg); + struct strarray *sa = arg->parm; + int idx = arg->val - sa->offset; + + if (idx < 0 || idx >= sa->nr_entries) + return scnprintf(bf, size, intfmt, arg->val); + + return scnprintf(bf, size, "%s", sa->entries[idx]); } -#define SCA_HEX syscall_arg__scnprintf_hex +static size_t syscall_arg__scnprintf_strarray(char *bf, size_t size, + struct syscall_arg *arg) +{ + return __syscall_arg__scnprintf_strarray(bf, size, "%d", arg); +} -static size_t syscall_arg__scnprintf_whence(char *bf, size_t size, - unsigned long arg, - u8 arg_idx __maybe_unused, - u8 *arg_mask __maybe_unused) +#define SCA_STRARRAY syscall_arg__scnprintf_strarray + +static size_t syscall_arg__scnprintf_strhexarray(char *bf, size_t size, + struct syscall_arg *arg) { - int whence = arg; + return __syscall_arg__scnprintf_strarray(bf, size, "%#x", arg); +} - switch (whence) { -#define P_WHENCE(n) case SEEK_##n: return scnprintf(bf, size, #n) - P_WHENCE(SET); - P_WHENCE(CUR); - P_WHENCE(END); -#ifdef SEEK_DATA - P_WHENCE(DATA); -#endif -#ifdef SEEK_HOLE - P_WHENCE(HOLE); -#endif -#undef P_WHENCE - default: break; - } +#define SCA_STRHEXARRAY syscall_arg__scnprintf_strhexarray + +static size_t syscall_arg__scnprintf_fd(char *bf, size_t size, + struct syscall_arg *arg); + +#define SCA_FD syscall_arg__scnprintf_fd + +static size_t syscall_arg__scnprintf_fd_at(char *bf, size_t size, + struct syscall_arg *arg) +{ + int fd = arg->val; + + if (fd == AT_FDCWD) + return scnprintf(bf, size, "CWD"); + + return syscall_arg__scnprintf_fd(bf, size, arg); +} - return scnprintf(bf, size, "%#x", whence); +#define SCA_FDAT syscall_arg__scnprintf_fd_at + +static size_t syscall_arg__scnprintf_close_fd(char *bf, size_t size, + struct syscall_arg *arg); + +#define SCA_CLOSE_FD syscall_arg__scnprintf_close_fd + +static size_t syscall_arg__scnprintf_hex(char *bf, size_t size, + struct syscall_arg *arg) +{ + return scnprintf(bf, size, "%#lx", arg->val); } -#define SCA_WHENCE syscall_arg__scnprintf_whence +#define SCA_HEX syscall_arg__scnprintf_hex static size_t syscall_arg__scnprintf_mmap_prot(char *bf, size_t size, - unsigned long arg, - u8 arg_idx __maybe_unused, - u8 *arg_mask __maybe_unused) + struct syscall_arg *arg) { - int printed = 0, prot = arg; + int printed = 0, prot = arg->val; if (prot == PROT_NONE) return scnprintf(bf, size, "NONE"); @@ -104,10 +153,9 @@ static size_t syscall_arg__scnprintf_mmap_prot(char *bf, size_t size, #define SCA_MMAP_PROT syscall_arg__scnprintf_mmap_prot static size_t syscall_arg__scnprintf_mmap_flags(char *bf, size_t size, - unsigned long arg, u8 arg_idx __maybe_unused, - u8 *arg_mask __maybe_unused) + struct syscall_arg *arg) { - int printed = 0, flags = arg; + int printed = 0, flags = arg->val; #define P_MMAP_FLAG(n) \ if (flags & MAP_##n) { \ @@ -148,10 +196,9 @@ static size_t syscall_arg__scnprintf_mmap_flags(char *bf, size_t size, #define SCA_MMAP_FLAGS syscall_arg__scnprintf_mmap_flags static size_t syscall_arg__scnprintf_madvise_behavior(char *bf, size_t size, - unsigned long arg, u8 arg_idx __maybe_unused, - u8 *arg_mask __maybe_unused) + struct syscall_arg *arg) { - int behavior = arg; + int behavior = arg->val; switch (behavior) { #define P_MADV_BHV(n) case MADV_##n: return scnprintf(bf, size, #n) @@ -190,8 +237,38 @@ static size_t syscall_arg__scnprintf_madvise_behavior(char *bf, size_t size, #define SCA_MADV_BHV syscall_arg__scnprintf_madvise_behavior -static size_t syscall_arg__scnprintf_futex_op(char *bf, size_t size, unsigned long arg, - u8 arg_idx __maybe_unused, u8 *arg_mask) +static size_t syscall_arg__scnprintf_flock(char *bf, size_t size, + struct syscall_arg *arg) +{ + int printed = 0, op = arg->val; + + if (op == 0) + return scnprintf(bf, size, "NONE"); +#define P_CMD(cmd) \ + if ((op & LOCK_##cmd) == LOCK_##cmd) { \ + printed += scnprintf(bf + printed, size - printed, "%s%s", printed ? "|" : "", #cmd); \ + op &= ~LOCK_##cmd; \ + } + + P_CMD(SH); + P_CMD(EX); + P_CMD(NB); + P_CMD(UN); + P_CMD(MAND); + P_CMD(RW); + P_CMD(READ); + P_CMD(WRITE); +#undef P_OP + + if (op) + printed += scnprintf(bf + printed, size - printed, "%s%#x", printed ? "|" : "", op); + + return printed; +} + +#define SCA_FLOCK syscall_arg__scnprintf_flock + +static size_t syscall_arg__scnprintf_futex_op(char *bf, size_t size, struct syscall_arg *arg) { enum syscall_futex_args { SCF_UADDR = (1 << 0), @@ -201,24 +278,24 @@ static size_t syscall_arg__scnprintf_futex_op(char *bf, size_t size, unsigned lo SCF_UADDR2 = (1 << 4), SCF_VAL3 = (1 << 5), }; - int op = arg; + int op = arg->val; int cmd = op & FUTEX_CMD_MASK; size_t printed = 0; switch (cmd) { #define P_FUTEX_OP(n) case FUTEX_##n: printed = scnprintf(bf, size, #n); - P_FUTEX_OP(WAIT); *arg_mask |= SCF_VAL3|SCF_UADDR2; break; - P_FUTEX_OP(WAKE); *arg_mask |= SCF_VAL3|SCF_UADDR2|SCF_TIMEOUT; break; - P_FUTEX_OP(FD); *arg_mask |= SCF_VAL3|SCF_UADDR2|SCF_TIMEOUT; break; - P_FUTEX_OP(REQUEUE); *arg_mask |= SCF_VAL3|SCF_TIMEOUT; break; - P_FUTEX_OP(CMP_REQUEUE); *arg_mask |= SCF_TIMEOUT; break; - P_FUTEX_OP(CMP_REQUEUE_PI); *arg_mask |= SCF_TIMEOUT; break; + P_FUTEX_OP(WAIT); arg->mask |= SCF_VAL3|SCF_UADDR2; break; + P_FUTEX_OP(WAKE); arg->mask |= SCF_VAL3|SCF_UADDR2|SCF_TIMEOUT; break; + P_FUTEX_OP(FD); arg->mask |= SCF_VAL3|SCF_UADDR2|SCF_TIMEOUT; break; + P_FUTEX_OP(REQUEUE); arg->mask |= SCF_VAL3|SCF_TIMEOUT; break; + P_FUTEX_OP(CMP_REQUEUE); arg->mask |= SCF_TIMEOUT; break; + P_FUTEX_OP(CMP_REQUEUE_PI); arg->mask |= SCF_TIMEOUT; break; P_FUTEX_OP(WAKE_OP); break; - P_FUTEX_OP(LOCK_PI); *arg_mask |= SCF_VAL3|SCF_UADDR2|SCF_TIMEOUT; break; - P_FUTEX_OP(UNLOCK_PI); *arg_mask |= SCF_VAL3|SCF_UADDR2|SCF_TIMEOUT; break; - P_FUTEX_OP(TRYLOCK_PI); *arg_mask |= SCF_VAL3|SCF_UADDR2; break; - P_FUTEX_OP(WAIT_BITSET); *arg_mask |= SCF_UADDR2; break; - P_FUTEX_OP(WAKE_BITSET); *arg_mask |= SCF_UADDR2; break; + P_FUTEX_OP(LOCK_PI); arg->mask |= SCF_VAL3|SCF_UADDR2|SCF_TIMEOUT; break; + P_FUTEX_OP(UNLOCK_PI); arg->mask |= SCF_VAL3|SCF_UADDR2|SCF_TIMEOUT; break; + P_FUTEX_OP(TRYLOCK_PI); arg->mask |= SCF_VAL3|SCF_UADDR2; break; + P_FUTEX_OP(WAIT_BITSET); arg->mask |= SCF_UADDR2; break; + P_FUTEX_OP(WAKE_BITSET); arg->mask |= SCF_UADDR2; break; P_FUTEX_OP(WAIT_REQUEUE_PI); break; default: printed = scnprintf(bf, size, "%#x", cmd); break; } @@ -234,14 +311,194 @@ static size_t syscall_arg__scnprintf_futex_op(char *bf, size_t size, unsigned lo #define SCA_FUTEX_OP syscall_arg__scnprintf_futex_op +static const char *epoll_ctl_ops[] = { "ADD", "DEL", "MOD", }; +static DEFINE_STRARRAY_OFFSET(epoll_ctl_ops, 1); + +static const char *itimers[] = { "REAL", "VIRTUAL", "PROF", }; +static DEFINE_STRARRAY(itimers); + +static const char *whences[] = { "SET", "CUR", "END", +#ifdef SEEK_DATA +"DATA", +#endif +#ifdef SEEK_HOLE +"HOLE", +#endif +}; +static DEFINE_STRARRAY(whences); + +static const char *fcntl_cmds[] = { + "DUPFD", "GETFD", "SETFD", "GETFL", "SETFL", "GETLK", "SETLK", + "SETLKW", "SETOWN", "GETOWN", "SETSIG", "GETSIG", "F_GETLK64", + "F_SETLK64", "F_SETLKW64", "F_SETOWN_EX", "F_GETOWN_EX", + "F_GETOWNER_UIDS", +}; +static DEFINE_STRARRAY(fcntl_cmds); + +static const char *rlimit_resources[] = { + "CPU", "FSIZE", "DATA", "STACK", "CORE", "RSS", "NPROC", "NOFILE", + "MEMLOCK", "AS", "LOCKS", "SIGPENDING", "MSGQUEUE", "NICE", "RTPRIO", + "RTTIME", +}; +static DEFINE_STRARRAY(rlimit_resources); + +static const char *sighow[] = { "BLOCK", "UNBLOCK", "SETMASK", }; +static DEFINE_STRARRAY(sighow); + +static const char *clockid[] = { + "REALTIME", "MONOTONIC", "PROCESS_CPUTIME_ID", "THREAD_CPUTIME_ID", + "MONOTONIC_RAW", "REALTIME_COARSE", "MONOTONIC_COARSE", +}; +static DEFINE_STRARRAY(clockid); + +static const char *socket_families[] = { + "UNSPEC", "LOCAL", "INET", "AX25", "IPX", "APPLETALK", "NETROM", + "BRIDGE", "ATMPVC", "X25", "INET6", "ROSE", "DECnet", "NETBEUI", + "SECURITY", "KEY", "NETLINK", "PACKET", "ASH", "ECONET", "ATMSVC", + "RDS", "SNA", "IRDA", "PPPOX", "WANPIPE", "LLC", "IB", "CAN", "TIPC", + "BLUETOOTH", "IUCV", "RXRPC", "ISDN", "PHONET", "IEEE802154", "CAIF", + "ALG", "NFC", "VSOCK", +}; +static DEFINE_STRARRAY(socket_families); + +#ifndef SOCK_TYPE_MASK +#define SOCK_TYPE_MASK 0xf +#endif + +static size_t syscall_arg__scnprintf_socket_type(char *bf, size_t size, + struct syscall_arg *arg) +{ + size_t printed; + int type = arg->val, + flags = type & ~SOCK_TYPE_MASK; + + type &= SOCK_TYPE_MASK; + /* + * Can't use a strarray, MIPS may override for ABI reasons. + */ + switch (type) { +#define P_SK_TYPE(n) case SOCK_##n: printed = scnprintf(bf, size, #n); break; + P_SK_TYPE(STREAM); + P_SK_TYPE(DGRAM); + P_SK_TYPE(RAW); + P_SK_TYPE(RDM); + P_SK_TYPE(SEQPACKET); + P_SK_TYPE(DCCP); + P_SK_TYPE(PACKET); +#undef P_SK_TYPE + default: + printed = scnprintf(bf, size, "%#x", type); + } + +#define P_SK_FLAG(n) \ + if (flags & SOCK_##n) { \ + printed += scnprintf(bf + printed, size - printed, "|%s", #n); \ + flags &= ~SOCK_##n; \ + } + + P_SK_FLAG(CLOEXEC); + P_SK_FLAG(NONBLOCK); +#undef P_SK_FLAG + + if (flags) + printed += scnprintf(bf + printed, size - printed, "|%#x", flags); + + return printed; +} + +#define SCA_SK_TYPE syscall_arg__scnprintf_socket_type + +#ifndef MSG_PROBE +#define MSG_PROBE 0x10 +#endif +#ifndef MSG_WAITFORONE +#define MSG_WAITFORONE 0x10000 +#endif +#ifndef MSG_SENDPAGE_NOTLAST +#define MSG_SENDPAGE_NOTLAST 0x20000 +#endif +#ifndef MSG_FASTOPEN +#define MSG_FASTOPEN 0x20000000 +#endif + +static size_t syscall_arg__scnprintf_msg_flags(char *bf, size_t size, + struct syscall_arg *arg) +{ + int printed = 0, flags = arg->val; + + if (flags == 0) + return scnprintf(bf, size, "NONE"); +#define P_MSG_FLAG(n) \ + if (flags & MSG_##n) { \ + printed += scnprintf(bf + printed, size - printed, "%s%s", printed ? "|" : "", #n); \ + flags &= ~MSG_##n; \ + } + + P_MSG_FLAG(OOB); + P_MSG_FLAG(PEEK); + P_MSG_FLAG(DONTROUTE); + P_MSG_FLAG(TRYHARD); + P_MSG_FLAG(CTRUNC); + P_MSG_FLAG(PROBE); + P_MSG_FLAG(TRUNC); + P_MSG_FLAG(DONTWAIT); + P_MSG_FLAG(EOR); + P_MSG_FLAG(WAITALL); + P_MSG_FLAG(FIN); + P_MSG_FLAG(SYN); + P_MSG_FLAG(CONFIRM); + P_MSG_FLAG(RST); + P_MSG_FLAG(ERRQUEUE); + P_MSG_FLAG(NOSIGNAL); + P_MSG_FLAG(MORE); + P_MSG_FLAG(WAITFORONE); + P_MSG_FLAG(SENDPAGE_NOTLAST); + P_MSG_FLAG(FASTOPEN); + P_MSG_FLAG(CMSG_CLOEXEC); +#undef P_MSG_FLAG + + if (flags) + printed += scnprintf(bf + printed, size - printed, "%s%#x", printed ? "|" : "", flags); + + return printed; +} + +#define SCA_MSG_FLAGS syscall_arg__scnprintf_msg_flags + +static size_t syscall_arg__scnprintf_access_mode(char *bf, size_t size, + struct syscall_arg *arg) +{ + size_t printed = 0; + int mode = arg->val; + + if (mode == F_OK) /* 0 */ + return scnprintf(bf, size, "F"); +#define P_MODE(n) \ + if (mode & n##_OK) { \ + printed += scnprintf(bf + printed, size - printed, "%s", #n); \ + mode &= ~n##_OK; \ + } + + P_MODE(R); + P_MODE(W); + P_MODE(X); +#undef P_MODE + + if (mode) + printed += scnprintf(bf + printed, size - printed, "|%#x", mode); + + return printed; +} + +#define SCA_ACCMODE syscall_arg__scnprintf_access_mode + static size_t syscall_arg__scnprintf_open_flags(char *bf, size_t size, - unsigned long arg, - u8 arg_idx, u8 *arg_mask) + struct syscall_arg *arg) { - int printed = 0, flags = arg; + int printed = 0, flags = arg->val; if (!(flags & O_CREAT)) - *arg_mask |= 1 << (arg_idx + 1); /* Mask the mode parm */ + arg->mask |= 1 << (arg->idx + 1); /* Mask the mode parm */ if (flags == 0) return scnprintf(bf, size, "RDONLY"); @@ -291,32 +548,225 @@ static size_t syscall_arg__scnprintf_open_flags(char *bf, size_t size, #define SCA_OPEN_FLAGS syscall_arg__scnprintf_open_flags +static size_t syscall_arg__scnprintf_eventfd_flags(char *bf, size_t size, + struct syscall_arg *arg) +{ + int printed = 0, flags = arg->val; + + if (flags == 0) + return scnprintf(bf, size, "NONE"); +#define P_FLAG(n) \ + if (flags & EFD_##n) { \ + printed += scnprintf(bf + printed, size - printed, "%s%s", printed ? "|" : "", #n); \ + flags &= ~EFD_##n; \ + } + + P_FLAG(SEMAPHORE); + P_FLAG(CLOEXEC); + P_FLAG(NONBLOCK); +#undef P_FLAG + + if (flags) + printed += scnprintf(bf + printed, size - printed, "%s%#x", printed ? "|" : "", flags); + + return printed; +} + +#define SCA_EFD_FLAGS syscall_arg__scnprintf_eventfd_flags + +static size_t syscall_arg__scnprintf_pipe_flags(char *bf, size_t size, + struct syscall_arg *arg) +{ + int printed = 0, flags = arg->val; + +#define P_FLAG(n) \ + if (flags & O_##n) { \ + printed += scnprintf(bf + printed, size - printed, "%s%s", printed ? "|" : "", #n); \ + flags &= ~O_##n; \ + } + + P_FLAG(CLOEXEC); + P_FLAG(NONBLOCK); +#undef P_FLAG + + if (flags) + printed += scnprintf(bf + printed, size - printed, "%s%#x", printed ? "|" : "", flags); + + return printed; +} + +#define SCA_PIPE_FLAGS syscall_arg__scnprintf_pipe_flags + +static size_t syscall_arg__scnprintf_signum(char *bf, size_t size, struct syscall_arg *arg) +{ + int sig = arg->val; + + switch (sig) { +#define P_SIGNUM(n) case SIG##n: return scnprintf(bf, size, #n) + P_SIGNUM(HUP); + P_SIGNUM(INT); + P_SIGNUM(QUIT); + P_SIGNUM(ILL); + P_SIGNUM(TRAP); + P_SIGNUM(ABRT); + P_SIGNUM(BUS); + P_SIGNUM(FPE); + P_SIGNUM(KILL); + P_SIGNUM(USR1); + P_SIGNUM(SEGV); + P_SIGNUM(USR2); + P_SIGNUM(PIPE); + P_SIGNUM(ALRM); + P_SIGNUM(TERM); + P_SIGNUM(STKFLT); + P_SIGNUM(CHLD); + P_SIGNUM(CONT); + P_SIGNUM(STOP); + P_SIGNUM(TSTP); + P_SIGNUM(TTIN); + P_SIGNUM(TTOU); + P_SIGNUM(URG); + P_SIGNUM(XCPU); + P_SIGNUM(XFSZ); + P_SIGNUM(VTALRM); + P_SIGNUM(PROF); + P_SIGNUM(WINCH); + P_SIGNUM(IO); + P_SIGNUM(PWR); + P_SIGNUM(SYS); + default: break; + } + + return scnprintf(bf, size, "%#x", sig); +} + +#define SCA_SIGNUM syscall_arg__scnprintf_signum + +#define TCGETS 0x5401 + +static const char *tioctls[] = { + "TCGETS", "TCSETS", "TCSETSW", "TCSETSF", "TCGETA", "TCSETA", "TCSETAW", + "TCSETAF", "TCSBRK", "TCXONC", "TCFLSH", "TIOCEXCL", "TIOCNXCL", + "TIOCSCTTY", "TIOCGPGRP", "TIOCSPGRP", "TIOCOUTQ", "TIOCSTI", + "TIOCGWINSZ", "TIOCSWINSZ", "TIOCMGET", "TIOCMBIS", "TIOCMBIC", + "TIOCMSET", "TIOCGSOFTCAR", "TIOCSSOFTCAR", "FIONREAD", "TIOCLINUX", + "TIOCCONS", "TIOCGSERIAL", "TIOCSSERIAL", "TIOCPKT", "FIONBIO", + "TIOCNOTTY", "TIOCSETD", "TIOCGETD", "TCSBRKP", [0x27] = "TIOCSBRK", + "TIOCCBRK", "TIOCGSID", "TCGETS2", "TCSETS2", "TCSETSW2", "TCSETSF2", + "TIOCGRS485", "TIOCSRS485", "TIOCGPTN", "TIOCSPTLCK", + "TIOCGDEV||TCGETX", "TCSETX", "TCSETXF", "TCSETXW", "TIOCSIG", + "TIOCVHANGUP", "TIOCGPKT", "TIOCGPTLCK", "TIOCGEXCL", + [0x50] = "FIONCLEX", "FIOCLEX", "FIOASYNC", "TIOCSERCONFIG", + "TIOCSERGWILD", "TIOCSERSWILD", "TIOCGLCKTRMIOS", "TIOCSLCKTRMIOS", + "TIOCSERGSTRUCT", "TIOCSERGETLSR", "TIOCSERGETMULTI", "TIOCSERSETMULTI", + "TIOCMIWAIT", "TIOCGICOUNT", [0x60] = "FIOQSIZE", +}; + +static DEFINE_STRARRAY_OFFSET(tioctls, 0x5401); + +#define STRARRAY(arg, name, array) \ + .arg_scnprintf = { [arg] = SCA_STRARRAY, }, \ + .arg_parm = { [arg] = &strarray__##array, } + static struct syscall_fmt { const char *name; const char *alias; - size_t (*arg_scnprintf[6])(char *bf, size_t size, unsigned long arg, u8 arg_idx, u8 *arg_mask); + size_t (*arg_scnprintf[6])(char *bf, size_t size, struct syscall_arg *arg); + void *arg_parm[6]; bool errmsg; bool timeout; bool hexret; } syscall_fmts[] = { - { .name = "access", .errmsg = true, }, + { .name = "access", .errmsg = true, + .arg_scnprintf = { [1] = SCA_ACCMODE, /* mode */ }, }, { .name = "arch_prctl", .errmsg = true, .alias = "prctl", }, { .name = "brk", .hexret = true, .arg_scnprintf = { [0] = SCA_HEX, /* brk */ }, }, - { .name = "mmap", .hexret = true, }, + { .name = "clock_gettime", .errmsg = true, STRARRAY(0, clk_id, clockid), }, + { .name = "close", .errmsg = true, + .arg_scnprintf = { [0] = SCA_CLOSE_FD, /* fd */ }, }, { .name = "connect", .errmsg = true, }, - { .name = "fstat", .errmsg = true, .alias = "newfstat", }, - { .name = "fstatat", .errmsg = true, .alias = "newfstatat", }, + { .name = "dup", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "dup2", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "dup3", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "epoll_ctl", .errmsg = true, STRARRAY(1, op, epoll_ctl_ops), }, + { .name = "eventfd2", .errmsg = true, + .arg_scnprintf = { [1] = SCA_EFD_FLAGS, /* flags */ }, }, + { .name = "faccessat", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, }, + { .name = "fadvise64", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "fallocate", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "fchdir", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "fchmod", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "fchmodat", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, }, + { .name = "fchown", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "fchownat", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, }, + { .name = "fcntl", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ + [1] = SCA_STRARRAY, /* cmd */ }, + .arg_parm = { [1] = &strarray__fcntl_cmds, /* cmd */ }, }, + { .name = "fdatasync", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "flock", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ + [1] = SCA_FLOCK, /* cmd */ }, }, + { .name = "fsetxattr", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "fstat", .errmsg = true, .alias = "newfstat", + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "fstatat", .errmsg = true, .alias = "newfstatat", + .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, }, + { .name = "fstatfs", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "fsync", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "ftruncate", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, { .name = "futex", .errmsg = true, .arg_scnprintf = { [1] = SCA_FUTEX_OP, /* op */ }, }, + { .name = "futimesat", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, }, + { .name = "getdents", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "getdents64", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "getitimer", .errmsg = true, STRARRAY(0, which, itimers), }, + { .name = "getrlimit", .errmsg = true, STRARRAY(0, resource, rlimit_resources), }, { .name = "ioctl", .errmsg = true, - .arg_scnprintf = { [2] = SCA_HEX, /* arg */ }, }, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ + [1] = SCA_STRHEXARRAY, /* cmd */ + [2] = SCA_HEX, /* arg */ }, + .arg_parm = { [1] = &strarray__tioctls, /* cmd */ }, }, + { .name = "kill", .errmsg = true, + .arg_scnprintf = { [1] = SCA_SIGNUM, /* sig */ }, }, + { .name = "linkat", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, }, { .name = "lseek", .errmsg = true, - .arg_scnprintf = { [2] = SCA_WHENCE, /* whence */ }, }, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ + [2] = SCA_STRARRAY, /* whence */ }, + .arg_parm = { [2] = &strarray__whences, /* whence */ }, }, { .name = "lstat", .errmsg = true, .alias = "newlstat", }, { .name = "madvise", .errmsg = true, .arg_scnprintf = { [0] = SCA_HEX, /* start */ [2] = SCA_MADV_BHV, /* behavior */ }, }, + { .name = "mkdirat", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, }, + { .name = "mknodat", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, }, + { .name = "mlock", .errmsg = true, + .arg_scnprintf = { [0] = SCA_HEX, /* addr */ }, }, + { .name = "mlockall", .errmsg = true, + .arg_scnprintf = { [0] = SCA_HEX, /* addr */ }, }, { .name = "mmap", .hexret = true, .arg_scnprintf = { [0] = SCA_HEX, /* addr */ [2] = SCA_MMAP_PROT, /* prot */ @@ -327,24 +777,91 @@ static struct syscall_fmt { { .name = "mremap", .hexret = true, .arg_scnprintf = { [0] = SCA_HEX, /* addr */ [4] = SCA_HEX, /* new_addr */ }, }, + { .name = "munlock", .errmsg = true, + .arg_scnprintf = { [0] = SCA_HEX, /* addr */ }, }, { .name = "munmap", .errmsg = true, .arg_scnprintf = { [0] = SCA_HEX, /* addr */ }, }, + { .name = "name_to_handle_at", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, }, + { .name = "newfstatat", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, }, { .name = "open", .errmsg = true, .arg_scnprintf = { [1] = SCA_OPEN_FLAGS, /* flags */ }, }, { .name = "open_by_handle_at", .errmsg = true, - .arg_scnprintf = { [2] = SCA_OPEN_FLAGS, /* flags */ }, }, + .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ + [2] = SCA_OPEN_FLAGS, /* flags */ }, }, { .name = "openat", .errmsg = true, - .arg_scnprintf = { [2] = SCA_OPEN_FLAGS, /* flags */ }, }, + .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ + [2] = SCA_OPEN_FLAGS, /* flags */ }, }, + { .name = "pipe2", .errmsg = true, + .arg_scnprintf = { [1] = SCA_PIPE_FLAGS, /* flags */ }, }, { .name = "poll", .errmsg = true, .timeout = true, }, { .name = "ppoll", .errmsg = true, .timeout = true, }, - { .name = "pread", .errmsg = true, .alias = "pread64", }, - { .name = "pwrite", .errmsg = true, .alias = "pwrite64", }, - { .name = "read", .errmsg = true, }, - { .name = "recvfrom", .errmsg = true, }, + { .name = "pread", .errmsg = true, .alias = "pread64", + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "preadv", .errmsg = true, .alias = "pread", + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "prlimit64", .errmsg = true, STRARRAY(1, resource, rlimit_resources), }, + { .name = "pwrite", .errmsg = true, .alias = "pwrite64", + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "pwritev", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "read", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "readlinkat", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, }, + { .name = "readv", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "recvfrom", .errmsg = true, + .arg_scnprintf = { [3] = SCA_MSG_FLAGS, /* flags */ }, }, + { .name = "recvmmsg", .errmsg = true, + .arg_scnprintf = { [3] = SCA_MSG_FLAGS, /* flags */ }, }, + { .name = "recvmsg", .errmsg = true, + .arg_scnprintf = { [2] = SCA_MSG_FLAGS, /* flags */ }, }, + { .name = "renameat", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, }, + { .name = "rt_sigaction", .errmsg = true, + .arg_scnprintf = { [0] = SCA_SIGNUM, /* sig */ }, }, + { .name = "rt_sigprocmask", .errmsg = true, STRARRAY(0, how, sighow), }, + { .name = "rt_sigqueueinfo", .errmsg = true, + .arg_scnprintf = { [1] = SCA_SIGNUM, /* sig */ }, }, + { .name = "rt_tgsigqueueinfo", .errmsg = true, + .arg_scnprintf = { [2] = SCA_SIGNUM, /* sig */ }, }, { .name = "select", .errmsg = true, .timeout = true, }, - { .name = "socket", .errmsg = true, }, + { .name = "sendmmsg", .errmsg = true, + .arg_scnprintf = { [3] = SCA_MSG_FLAGS, /* flags */ }, }, + { .name = "sendmsg", .errmsg = true, + .arg_scnprintf = { [2] = SCA_MSG_FLAGS, /* flags */ }, }, + { .name = "sendto", .errmsg = true, + .arg_scnprintf = { [3] = SCA_MSG_FLAGS, /* flags */ }, }, + { .name = "setitimer", .errmsg = true, STRARRAY(0, which, itimers), }, + { .name = "setrlimit", .errmsg = true, STRARRAY(0, resource, rlimit_resources), }, + { .name = "shutdown", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "socket", .errmsg = true, + .arg_scnprintf = { [0] = SCA_STRARRAY, /* family */ + [1] = SCA_SK_TYPE, /* type */ }, + .arg_parm = { [0] = &strarray__socket_families, /* family */ }, }, + { .name = "socketpair", .errmsg = true, + .arg_scnprintf = { [0] = SCA_STRARRAY, /* family */ + [1] = SCA_SK_TYPE, /* type */ }, + .arg_parm = { [0] = &strarray__socket_families, /* family */ }, }, { .name = "stat", .errmsg = true, .alias = "newstat", }, + { .name = "symlinkat", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, }, + { .name = "tgkill", .errmsg = true, + .arg_scnprintf = { [2] = SCA_SIGNUM, /* sig */ }, }, + { .name = "tkill", .errmsg = true, + .arg_scnprintf = { [1] = SCA_SIGNUM, /* sig */ }, }, { .name = "uname", .errmsg = true, .alias = "newuname", }, + { .name = "unlinkat", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, }, + { .name = "utimensat", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FDAT, /* dirfd */ }, }, + { .name = "write", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, + { .name = "writev", .errmsg = true, + .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, }, }; static int syscall_fmt__cmp(const void *name, const void *fmtp) @@ -364,8 +881,8 @@ struct syscall { const char *name; bool filtered; struct syscall_fmt *fmt; - size_t (**arg_scnprintf)(char *bf, size_t size, - unsigned long arg, u8 arg_idx, u8 *args_mask); + size_t (**arg_scnprintf)(char *bf, size_t size, struct syscall_arg *arg); + void **arg_parm; }; static size_t fprintf_duration(unsigned long t, FILE *fp) @@ -389,11 +906,24 @@ struct thread_trace { unsigned long nr_events; char *entry_str; double runtime_ms; + struct { + int max; + char **table; + } paths; + + struct intlist *syscall_stats; }; static struct thread_trace *thread_trace__new(void) { - return zalloc(sizeof(struct thread_trace)); + struct thread_trace *ttrace = zalloc(sizeof(struct thread_trace)); + + if (ttrace) + ttrace->paths.max = -1; + + ttrace->syscall_stats = intlist__new(NULL); + + return ttrace; } static struct thread_trace *thread__trace(struct thread *thread, FILE *fp) @@ -421,26 +951,140 @@ fail: struct trace { struct perf_tool tool; - int audit_machine; + struct { + int machine; + int open_id; + } audit; struct { int max; struct syscall *table; } syscalls; struct perf_record_opts opts; - struct machine host; + struct machine *host; u64 base_time; + bool full_time; FILE *output; unsigned long nr_events; struct strlist *ev_qualifier; bool not_ev_qualifier; + bool live; + const char *last_vfs_getname; struct intlist *tid_list; struct intlist *pid_list; bool sched; bool multiple_threads; + bool summary; + bool show_comm; + bool show_tool_stats; double duration_filter; double runtime_ms; + struct { + u64 vfs_getname, proc_getname; + } stats; }; +static int trace__set_fd_pathname(struct thread *thread, int fd, const char *pathname) +{ + struct thread_trace *ttrace = thread->priv; + + if (fd > ttrace->paths.max) { + char **npath = realloc(ttrace->paths.table, (fd + 1) * sizeof(char *)); + + if (npath == NULL) + return -1; + + if (ttrace->paths.max != -1) { + memset(npath + ttrace->paths.max + 1, 0, + (fd - ttrace->paths.max) * sizeof(char *)); + } else { + memset(npath, 0, (fd + 1) * sizeof(char *)); + } + + ttrace->paths.table = npath; + ttrace->paths.max = fd; + } + + ttrace->paths.table[fd] = strdup(pathname); + + return ttrace->paths.table[fd] != NULL ? 0 : -1; +} + +static int thread__read_fd_path(struct thread *thread, int fd) +{ + char linkname[PATH_MAX], pathname[PATH_MAX]; + struct stat st; + int ret; + + if (thread->pid_ == thread->tid) { + scnprintf(linkname, sizeof(linkname), + "/proc/%d/fd/%d", thread->pid_, fd); + } else { + scnprintf(linkname, sizeof(linkname), + "/proc/%d/task/%d/fd/%d", thread->pid_, thread->tid, fd); + } + + if (lstat(linkname, &st) < 0 || st.st_size + 1 > (off_t)sizeof(pathname)) + return -1; + + ret = readlink(linkname, pathname, sizeof(pathname)); + + if (ret < 0 || ret > st.st_size) + return -1; + + pathname[ret] = '\0'; + return trace__set_fd_pathname(thread, fd, pathname); +} + +static const char *thread__fd_path(struct thread *thread, int fd, + struct trace *trace) +{ + struct thread_trace *ttrace = thread->priv; + + if (ttrace == NULL) + return NULL; + + if (fd < 0) + return NULL; + + if ((fd > ttrace->paths.max || ttrace->paths.table[fd] == NULL)) + if (!trace->live) + return NULL; + ++trace->stats.proc_getname; + if (thread__read_fd_path(thread, fd)) { + return NULL; + } + + return ttrace->paths.table[fd]; +} + +static size_t syscall_arg__scnprintf_fd(char *bf, size_t size, + struct syscall_arg *arg) +{ + int fd = arg->val; + size_t printed = scnprintf(bf, size, "%d", fd); + const char *path = thread__fd_path(arg->thread, fd, arg->trace); + + if (path) + printed += scnprintf(bf + printed, size - printed, "<%s>", path); + + return printed; +} + +static size_t syscall_arg__scnprintf_close_fd(char *bf, size_t size, + struct syscall_arg *arg) +{ + int fd = arg->val; + size_t printed = syscall_arg__scnprintf_fd(bf, size, arg); + struct thread_trace *ttrace = arg->thread->priv; + + if (ttrace && fd >= 0 && fd <= ttrace->paths.max) { + free(ttrace->paths.table[fd]); + ttrace->paths.table[fd] = NULL; + } + + return printed; +} + static bool trace__filter_duration(struct trace *trace, double t) { return t < (trace->duration_filter * NSEC_PER_MSEC); @@ -454,10 +1098,12 @@ static size_t trace__fprintf_tstamp(struct trace *trace, u64 tstamp, FILE *fp) } static bool done = false; +static bool interrupted = false; -static void sig_handler(int sig __maybe_unused) +static void sig_handler(int sig) { done = true; + interrupted = sig == SIGINT; } static size_t trace__fprintf_entry_head(struct trace *trace, struct thread *thread, @@ -466,8 +1112,11 @@ static size_t trace__fprintf_entry_head(struct trace *trace, struct thread *thre size_t printed = trace__fprintf_tstamp(trace, tstamp, fp); printed += fprintf_duration(duration, fp); - if (trace->multiple_threads) + if (trace->multiple_threads) { + if (trace->show_comm) + printed += fprintf(fp, "%.14s/", thread->comm); printed += fprintf(fp, "%d ", thread->tid); + } return printed; } @@ -506,16 +1155,17 @@ static int trace__symbols_init(struct trace *trace, struct perf_evlist *evlist) if (err) return err; - machine__init(&trace->host, "", HOST_KERNEL_ID); - machine__create_kernel_maps(&trace->host); + trace->host = machine__new_host(); + if (trace->host == NULL) + return -ENOMEM; if (perf_target__has_task(&trace->opts.target)) { err = perf_event__synthesize_thread_map(&trace->tool, evlist->threads, trace__tool_process, - &trace->host); + trace->host); } else { err = perf_event__synthesize_threads(&trace->tool, trace__tool_process, - &trace->host); + trace->host); } if (err) @@ -533,6 +1183,9 @@ static int syscall__set_arg_fmts(struct syscall *sc) if (sc->arg_scnprintf == NULL) return -1; + if (sc->fmt) + sc->arg_parm = sc->fmt->arg_parm; + for (field = sc->tp_format->format.fields->next; field; field = field->next) { if (sc->fmt && sc->fmt->arg_scnprintf[idx]) sc->arg_scnprintf[idx] = sc->fmt->arg_scnprintf[idx]; @@ -548,7 +1201,7 @@ static int trace__read_syscall_info(struct trace *trace, int id) { char tp_name[128]; struct syscall *sc; - const char *name = audit_syscall_to_name(id, trace->audit_machine); + const char *name = audit_syscall_to_name(id, trace->audit.machine); if (name == NULL) return -1; @@ -603,32 +1256,52 @@ static int trace__read_syscall_info(struct trace *trace, int id) } static size_t syscall__scnprintf_args(struct syscall *sc, char *bf, size_t size, - unsigned long *args) + unsigned long *args, struct trace *trace, + struct thread *thread) { - int i = 0; size_t printed = 0; if (sc->tp_format != NULL) { struct format_field *field; - u8 mask = 0, bit = 1; + u8 bit = 1; + struct syscall_arg arg = { + .idx = 0, + .mask = 0, + .trace = trace, + .thread = thread, + }; for (field = sc->tp_format->format.fields->next; field; - field = field->next, ++i, bit <<= 1) { - if (mask & bit) + field = field->next, ++arg.idx, bit <<= 1) { + if (arg.mask & bit) + continue; + /* + * Suppress this argument if its value is zero and + * and we don't have a string associated in an + * strarray for it. + */ + if (args[arg.idx] == 0 && + !(sc->arg_scnprintf && + sc->arg_scnprintf[arg.idx] == SCA_STRARRAY && + sc->arg_parm[arg.idx])) continue; printed += scnprintf(bf + printed, size - printed, "%s%s: ", printed ? ", " : "", field->name); - - if (sc->arg_scnprintf && sc->arg_scnprintf[i]) { - printed += sc->arg_scnprintf[i](bf + printed, size - printed, - args[i], i, &mask); + if (sc->arg_scnprintf && sc->arg_scnprintf[arg.idx]) { + arg.val = args[arg.idx]; + if (sc->arg_parm) + arg.parm = sc->arg_parm[arg.idx]; + printed += sc->arg_scnprintf[arg.idx](bf + printed, + size - printed, &arg); } else { printed += scnprintf(bf + printed, size - printed, - "%ld", args[i]); + "%ld", args[arg.idx]); } } } else { + int i = 0; + while (i < 6) { printed += scnprintf(bf + printed, size - printed, "%sarg%d: %ld", @@ -644,10 +1317,8 @@ typedef int (*tracepoint_handler)(struct trace *trace, struct perf_evsel *evsel, struct perf_sample *sample); static struct syscall *trace__syscall_info(struct trace *trace, - struct perf_evsel *evsel, - struct perf_sample *sample) + struct perf_evsel *evsel, int id) { - int id = perf_evsel__intval(evsel, sample, "id"); if (id < 0) { @@ -688,6 +1359,32 @@ out_cant_read: return NULL; } +static void thread__update_stats(struct thread_trace *ttrace, + int id, struct perf_sample *sample) +{ + struct int_node *inode; + struct stats *stats; + u64 duration = 0; + + inode = intlist__findnew(ttrace->syscall_stats, id); + if (inode == NULL) + return; + + stats = inode->priv; + if (stats == NULL) { + stats = malloc(sizeof(struct stats)); + if (stats == NULL) + return; + init_stats(stats); + inode->priv = stats; + } + + if (ttrace->entry_time && sample->time > ttrace->entry_time) + duration = sample->time - ttrace->entry_time; + + update_stats(stats, duration); +} + static int trace__sys_enter(struct trace *trace, struct perf_evsel *evsel, struct perf_sample *sample) { @@ -695,7 +1392,8 @@ static int trace__sys_enter(struct trace *trace, struct perf_evsel *evsel, void *args; size_t printed = 0; struct thread *thread; - struct syscall *sc = trace__syscall_info(trace, evsel, sample); + int id = perf_evsel__intval(evsel, sample, "id"); + struct syscall *sc = trace__syscall_info(trace, evsel, id); struct thread_trace *ttrace; if (sc == NULL) @@ -704,8 +1402,7 @@ static int trace__sys_enter(struct trace *trace, struct perf_evsel *evsel, if (sc->filtered) return 0; - thread = machine__findnew_thread(&trace->host, sample->pid, - sample->tid); + thread = machine__findnew_thread(trace->host, sample->pid, sample->tid); ttrace = thread__trace(thread, trace->output); if (ttrace == NULL) return -1; @@ -728,7 +1425,8 @@ static int trace__sys_enter(struct trace *trace, struct perf_evsel *evsel, msg = ttrace->entry_str; printed += scnprintf(msg + printed, 1024 - printed, "%s(", sc->name); - printed += syscall__scnprintf_args(sc, msg + printed, 1024 - printed, args); + printed += syscall__scnprintf_args(sc, msg + printed, 1024 - printed, + args, trace, thread); if (!strcmp(sc->name, "exit_group") || !strcmp(sc->name, "exit")) { if (!trace->duration_filter) { @@ -747,7 +1445,8 @@ static int trace__sys_exit(struct trace *trace, struct perf_evsel *evsel, int ret; u64 duration = 0; struct thread *thread; - struct syscall *sc = trace__syscall_info(trace, evsel, sample); + int id = perf_evsel__intval(evsel, sample, "id"); + struct syscall *sc = trace__syscall_info(trace, evsel, id); struct thread_trace *ttrace; if (sc == NULL) @@ -756,14 +1455,22 @@ static int trace__sys_exit(struct trace *trace, struct perf_evsel *evsel, if (sc->filtered) return 0; - thread = machine__findnew_thread(&trace->host, sample->pid, - sample->tid); + thread = machine__findnew_thread(trace->host, sample->pid, sample->tid); ttrace = thread__trace(thread, trace->output); if (ttrace == NULL) return -1; + if (trace->summary) + thread__update_stats(ttrace, id, sample); + ret = perf_evsel__intval(evsel, sample, "ret"); + if (id == trace->audit.open_id && ret >= 0 && trace->last_vfs_getname) { + trace__set_fd_pathname(thread, ret, trace->last_vfs_getname); + trace->last_vfs_getname = NULL; + ++trace->stats.vfs_getname; + } + ttrace = thread->priv; ttrace->exit_time = sample->time; @@ -808,12 +1515,19 @@ out: return 0; } +static int trace__vfs_getname(struct trace *trace, struct perf_evsel *evsel, + struct perf_sample *sample) +{ + trace->last_vfs_getname = perf_evsel__rawptr(evsel, sample, "pathname"); + return 0; +} + static int trace__sched_stat_runtime(struct trace *trace, struct perf_evsel *evsel, struct perf_sample *sample) { u64 runtime = perf_evsel__intval(evsel, sample, "runtime"); double runtime_ms = (double)runtime / NSEC_PER_MSEC; - struct thread *thread = machine__findnew_thread(&trace->host, + struct thread *thread = machine__findnew_thread(trace->host, sample->pid, sample->tid); struct thread_trace *ttrace = thread__trace(thread, trace->output); @@ -861,7 +1575,7 @@ static int trace__process_sample(struct perf_tool *tool, if (skip_sample(trace, sample)) return 0; - if (trace->base_time == 0) + if (!trace->full_time && trace->base_time == 0) trace->base_time = sample->time; if (handler) @@ -901,6 +1615,51 @@ static int parse_target_str(struct trace *trace) return 0; } +static int trace__record(int argc, const char **argv) +{ + unsigned int rec_argc, i, j; + const char **rec_argv; + const char * const record_args[] = { + "record", + "-R", + "-m", "1024", + "-c", "1", + "-e", "raw_syscalls:sys_enter,raw_syscalls:sys_exit", + }; + + rec_argc = ARRAY_SIZE(record_args) + argc; + rec_argv = calloc(rec_argc + 1, sizeof(char *)); + + if (rec_argv == NULL) + return -ENOMEM; + + for (i = 0; i < ARRAY_SIZE(record_args); i++) + rec_argv[i] = record_args[i]; + + for (j = 0; j < (unsigned int)argc; j++, i++) + rec_argv[i] = argv[j]; + + return cmd_record(i, rec_argv, NULL); +} + +static size_t trace__fprintf_thread_summary(struct trace *trace, FILE *fp); + +static void perf_evlist__add_vfs_getname(struct perf_evlist *evlist) +{ + struct perf_evsel *evsel = perf_evsel__newtp("probe", "vfs_getname", + evlist->nr_entries); + if (evsel == NULL) + return; + + if (perf_evsel__field(evsel, "pathname") == NULL) { + perf_evsel__delete(evsel); + return; + } + + evsel->handler.func = trace__vfs_getname; + perf_evlist__add(evlist, evsel); +} + static int trace__run(struct trace *trace, int argc, const char **argv) { struct perf_evlist *evlist = perf_evlist__new(); @@ -909,23 +1668,23 @@ static int trace__run(struct trace *trace, int argc, const char **argv) unsigned long before; const bool forks = argc > 0; + trace->live = true; + if (evlist == NULL) { fprintf(trace->output, "Not enough memory to run!\n"); goto out; } if (perf_evlist__add_newtp(evlist, "raw_syscalls", "sys_enter", trace__sys_enter) || - perf_evlist__add_newtp(evlist, "raw_syscalls", "sys_exit", trace__sys_exit)) { - fprintf(trace->output, "Couldn't read the raw_syscalls tracepoints information!\n"); - goto out_delete_evlist; - } + perf_evlist__add_newtp(evlist, "raw_syscalls", "sys_exit", trace__sys_exit)) + goto out_error_tp; + + perf_evlist__add_vfs_getname(evlist); if (trace->sched && - perf_evlist__add_newtp(evlist, "sched", "sched_stat_runtime", - trace__sched_stat_runtime)) { - fprintf(trace->output, "Couldn't read the sched_stat_runtime tracepoint information!\n"); - goto out_delete_evlist; - } + perf_evlist__add_newtp(evlist, "sched", "sched_stat_runtime", + trace__sched_stat_runtime)) + goto out_error_tp; err = perf_evlist__create_maps(evlist, &trace->opts.target); if (err < 0) { @@ -954,10 +1713,8 @@ static int trace__run(struct trace *trace, int argc, const char **argv) } err = perf_evlist__open(evlist); - if (err < 0) { - fprintf(trace->output, "Couldn't create the events: %s\n", strerror(errno)); - goto out_delete_maps; - } + if (err < 0) + goto out_error_open; err = perf_evlist__mmap(evlist, UINT_MAX, false); if (err < 0) { @@ -990,11 +1747,11 @@ again: continue; } - if (trace->base_time == 0) + if (!trace->full_time && trace->base_time == 0) trace->base_time = sample.time; if (type != PERF_RECORD_SAMPLE) { - trace__process_event(trace, &trace->host, event); + trace__process_event(trace, trace->host, event); continue; } @@ -1014,24 +1771,36 @@ again: handler = evsel->handler.func; handler(trace, evsel, &sample); - if (done) - goto out_unmap_evlist; + if (interrupted) + goto out_disable; } } if (trace->nr_events == before) { - if (done) - goto out_unmap_evlist; + int timeout = done ? 100 : -1; - poll(evlist->pollfd, evlist->nr_fds, -1); + if (poll(evlist->pollfd, evlist->nr_fds, timeout) > 0) + goto again; + } else { + goto again; } - if (done) - perf_evlist__disable(evlist); +out_disable: + perf_evlist__disable(evlist); - goto again; + if (!err) { + if (trace->summary) + trace__fprintf_thread_summary(trace, trace->output); + + if (trace->show_tool_stats) { + fprintf(trace->output, "Stats:\n " + " vfs_getname : %" PRIu64 "\n" + " proc_getname: %" PRIu64 "\n", + trace->stats.vfs_getname, + trace->stats.proc_getname); + } + } -out_unmap_evlist: perf_evlist__munmap(evlist); out_close_evlist: perf_evlist__close(evlist); @@ -1040,7 +1809,22 @@ out_delete_maps: out_delete_evlist: perf_evlist__delete(evlist); out: + trace->live = false; return err; +{ + char errbuf[BUFSIZ]; + +out_error_tp: + perf_evlist__strerror_tp(evlist, errno, errbuf, sizeof(errbuf)); + goto out_error; + +out_error_open: + perf_evlist__strerror_open(evlist, errno, errbuf, sizeof(errbuf)); + +out_error: + fprintf(trace->output, "%s\n", errbuf); + goto out_delete_evlist; +} } static int trace__replay(struct trace *trace) @@ -1048,13 +1832,18 @@ static int trace__replay(struct trace *trace) const struct perf_evsel_str_handler handlers[] = { { "raw_syscalls:sys_enter", trace__sys_enter, }, { "raw_syscalls:sys_exit", trace__sys_exit, }, + { "probe:vfs_getname", trace__vfs_getname, }, + }; + struct perf_data_file file = { + .path = input_name, + .mode = PERF_DATA_MODE_READ, }; - struct perf_session *session; int err = -1; trace->tool.sample = trace__process_sample; trace->tool.mmap = perf_event__process_mmap; + trace->tool.mmap2 = perf_event__process_mmap2; trace->tool.comm = perf_event__process_comm; trace->tool.exit = perf_event__process_exit; trace->tool.fork = perf_event__process_fork; @@ -1071,11 +1860,12 @@ static int trace__replay(struct trace *trace) if (symbol__init() < 0) return -1; - session = perf_session__new(input_name, O_RDONLY, 0, false, - &trace->tool); + session = perf_session__new(&file, false, &trace->tool); if (session == NULL) return -ENOMEM; + trace->host = &session->machines.host; + err = perf_session__set_tracepoints_handlers(session, handlers); if (err) goto out; @@ -1100,6 +1890,9 @@ static int trace__replay(struct trace *trace) if (err) pr_err("Failed to process events, error %d", err); + else if (trace->summary) + trace__fprintf_thread_summary(trace, trace->output); + out: perf_session__delete(session); @@ -1110,47 +1903,111 @@ static size_t trace__fprintf_threads_header(FILE *fp) { size_t printed; - printed = fprintf(fp, "\n _____________________________________________________________________\n"); - printed += fprintf(fp," __) Summary of events (__\n\n"); - printed += fprintf(fp," [ task - pid ] [ events ] [ ratio ] [ runtime ]\n"); - printed += fprintf(fp," _____________________________________________________________________\n\n"); + printed = fprintf(fp, "\n _____________________________________________________________________________\n"); + printed += fprintf(fp, " __) Summary of events (__\n\n"); + printed += fprintf(fp, " [ task - pid ] [ events ] [ ratio ] [ runtime ]\n"); + printed += fprintf(fp, " syscall count min max avg stddev\n"); + printed += fprintf(fp, " msec msec msec %%\n"); + printed += fprintf(fp, " _____________________________________________________________________________\n\n"); return printed; } -static size_t trace__fprintf_thread_summary(struct trace *trace, FILE *fp) +static size_t thread__dump_stats(struct thread_trace *ttrace, + struct trace *trace, FILE *fp) { - size_t printed = trace__fprintf_threads_header(fp); - struct rb_node *nd; - - for (nd = rb_first(&trace->host.threads); nd; nd = rb_next(nd)) { - struct thread *thread = rb_entry(nd, struct thread, rb_node); - struct thread_trace *ttrace = thread->priv; - const char *color; - double ratio; - - if (ttrace == NULL) - continue; - - ratio = (double)ttrace->nr_events / trace->nr_events * 100.0; - - color = PERF_COLOR_NORMAL; - if (ratio > 50.0) - color = PERF_COLOR_RED; - else if (ratio > 25.0) - color = PERF_COLOR_GREEN; - else if (ratio > 5.0) - color = PERF_COLOR_YELLOW; - - printed += color_fprintf(fp, color, "%20s", thread->comm); - printed += fprintf(fp, " - %-5d :%11lu [", thread->tid, ttrace->nr_events); - printed += color_fprintf(fp, color, "%5.1f%%", ratio); - printed += fprintf(fp, " ] %10.3f ms\n", ttrace->runtime_ms); + struct stats *stats; + size_t printed = 0; + struct syscall *sc; + struct int_node *inode = intlist__first(ttrace->syscall_stats); + + if (inode == NULL) + return 0; + + printed += fprintf(fp, "\n"); + + /* each int_node is a syscall */ + while (inode) { + stats = inode->priv; + if (stats) { + double min = (double)(stats->min) / NSEC_PER_MSEC; + double max = (double)(stats->max) / NSEC_PER_MSEC; + double avg = avg_stats(stats); + double pct; + u64 n = (u64) stats->n; + + pct = avg ? 100.0 * stddev_stats(stats)/avg : 0.0; + avg /= NSEC_PER_MSEC; + + sc = &trace->syscalls.table[inode->i]; + printed += fprintf(fp, "%24s %14s : ", "", sc->name); + printed += fprintf(fp, "%5" PRIu64 " %8.3f %8.3f", + n, min, max); + printed += fprintf(fp, " %8.3f %6.2f\n", avg, pct); + } + + inode = intlist__next(inode); } + printed += fprintf(fp, "\n\n"); + return printed; } +/* struct used to pass data to per-thread function */ +struct summary_data { + FILE *fp; + struct trace *trace; + size_t printed; +}; + +static int trace__fprintf_one_thread(struct thread *thread, void *priv) +{ + struct summary_data *data = priv; + FILE *fp = data->fp; + size_t printed = data->printed; + struct trace *trace = data->trace; + struct thread_trace *ttrace = thread->priv; + const char *color; + double ratio; + + if (ttrace == NULL) + return 0; + + ratio = (double)ttrace->nr_events / trace->nr_events * 100.0; + + color = PERF_COLOR_NORMAL; + if (ratio > 50.0) + color = PERF_COLOR_RED; + else if (ratio > 25.0) + color = PERF_COLOR_GREEN; + else if (ratio > 5.0) + color = PERF_COLOR_YELLOW; + + printed += color_fprintf(fp, color, "%20s", thread->comm); + printed += fprintf(fp, " - %-5d :%11lu [", thread->tid, ttrace->nr_events); + printed += color_fprintf(fp, color, "%5.1f%%", ratio); + printed += fprintf(fp, " ] %10.3f ms\n", ttrace->runtime_ms); + printed += thread__dump_stats(ttrace, trace, fp); + + data->printed += printed; + + return 0; +} + +static size_t trace__fprintf_thread_summary(struct trace *trace, FILE *fp) +{ + struct summary_data data = { + .fp = fp, + .trace = trace + }; + data.printed = trace__fprintf_threads_header(fp); + + machine__for_each_thread(trace->host, trace__fprintf_one_thread, &data); + + return data.printed; +} + static int trace__set_duration(const struct option *opt, const char *str, int unset __maybe_unused) { @@ -1182,10 +2039,15 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused) const char * const trace_usage[] = { "perf trace [] []", "perf trace [] -- []", + "perf trace record [] []", + "perf trace record [] -- []", NULL }; struct trace trace = { - .audit_machine = audit_detect_machine(), + .audit = { + .machine = audit_detect_machine(), + .open_id = audit_name_to_syscall("open", trace.audit.machine), + }, .syscalls = { . max = -1, }, @@ -1200,10 +2062,14 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused) .mmap_pages = 1024, }, .output = stdout, + .show_comm = true, }; const char *output_name = NULL; const char *ev_qualifier_str = NULL; const struct option trace_options[] = { + OPT_BOOLEAN(0, "comm", &trace.show_comm, + "show the thread COMM next to its id"), + OPT_BOOLEAN(0, "tool_stats", &trace.show_tool_stats, "show tool stats"), OPT_STRING('e', "expr", &ev_qualifier_str, "expr", "list of events to trace"), OPT_STRING('o', "output", &output_name, "file", "output file name"), @@ -1218,8 +2084,9 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused) "list of cpus to monitor"), OPT_BOOLEAN(0, "no-inherit", &trace.opts.no_inherit, "child tasks do not inherit counters"), - OPT_UINTEGER('m', "mmap-pages", &trace.opts.mmap_pages, - "number of mmap data pages"), + OPT_CALLBACK('m', "mmap-pages", &trace.opts.mmap_pages, "pages", + "number of mmap data pages", + perf_evlist__parse_mmap_pages), OPT_STRING('u', "uid", &trace.opts.target.uid_str, "user", "user to profile"), OPT_CALLBACK(0, "duration", &trace, "float", @@ -1227,11 +2094,18 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused) trace__set_duration), OPT_BOOLEAN(0, "sched", &trace.sched, "show blocking scheduler events"), OPT_INCR('v', "verbose", &verbose, "be more verbose"), + OPT_BOOLEAN('T', "time", &trace.full_time, + "Show full timestamp, not time relative to first start"), + OPT_BOOLEAN(0, "summary", &trace.summary, + "Show syscall summary with statistics"), OPT_END() }; int err; char bf[BUFSIZ]; + if ((argc > 1) && (strcmp(argv[1], "record") == 0)) + return trace__record(argc-2, &argv[2]); + argc = parse_options(argc, argv, trace_options, trace_usage, 0); if (output_name != NULL) { @@ -1279,9 +2153,6 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused) else err = trace__run(&trace, argc, argv); - if (trace.sched && !err) - trace__fprintf_thread_summary(&trace, trace.output); - out_close: if (output_name != NULL) fclose(trace.output); diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index 346ee929d250..c516d6ba6716 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -23,7 +23,7 @@ ifeq ($(ARCH),x86_64) endif ifeq (${IS_X86_64}, 1) RAW_ARCH := x86_64 - CFLAGS += -DARCH_X86_64 + CFLAGS += -DHAVE_ARCH_X86_64_SUPPORT ARCH_INCLUDE = ../../arch/x86/lib/memcpy_64.S ../../arch/x86/lib/memset_64.S endif NO_PERF_REGS := 0 @@ -31,7 +31,7 @@ ifeq ($(ARCH),x86_64) endif ifeq ($(NO_PERF_REGS),0) - CFLAGS += -DHAVE_PERF_REGS + CFLAGS += -DHAVE_PERF_REGS_SUPPORT endif ifeq ($(src-perf),) @@ -51,7 +51,6 @@ LIB_INCLUDE := $(srctree)/tools/lib/ # include ARCH specific config -include $(src-perf)/arch/$(ARCH)/Makefile -include $(src-perf)/config/feature-tests.mak include $(src-perf)/config/utilities.mak ifeq ($(call get-executable,$(FLEX)),) @@ -67,10 +66,7 @@ ifneq ($(WERROR),0) CFLAGS += -Werror endif -ifeq ("$(origin DEBUG)", "command line") - PERF_DEBUG = $(DEBUG) -endif -ifndef PERF_DEBUG +ifeq ($(DEBUG),0) CFLAGS += -O6 endif @@ -87,22 +83,127 @@ CFLAGS += -Wall CFLAGS += -Wextra CFLAGS += -std=gnu99 -EXTLIBS = -lelf -lpthread -lrt -lm +EXTLIBS = -lelf -lpthread -lrt -lm -ldl -ifeq ($(call try-cc,$(SOURCE_HELLO),$(CFLAGS) -Werror -fstack-protector-all,-fstack-protector-all),y) - CFLAGS += -fstack-protector-all +ifneq ($(OUTPUT),) + OUTPUT_FEATURES = $(OUTPUT)config/feature-checks/ + $(shell mkdir -p $(OUTPUT_FEATURES)) endif -ifeq ($(call try-cc,$(SOURCE_HELLO),$(CFLAGS) -Werror -Wstack-protector,-Wstack-protector),y) - CFLAGS += -Wstack-protector +feature_check = $(eval $(feature_check_code)) +define feature_check_code + feature-$(1) := $(shell $(MAKE) OUTPUT=$(OUTPUT_FEATURES) LDFLAGS=$(LDFLAGS) -C config/feature-checks test-$1 >/dev/null 2>/dev/null && echo 1 || echo 0) +endef + +feature_set = $(eval $(feature_set_code)) +define feature_set_code + feature-$(1) := 1 +endef + +# +# Build the feature check binaries in parallel, ignore errors, ignore return value and suppress output: +# + +# +# Note that this is not a complete list of all feature tests, just +# those that are typically built on a fully configured system. +# +# [ Feature tests not mentioned here have to be built explicitly in +# the rule that uses them - an example for that is the 'bionic' +# feature check. ] +# +CORE_FEATURE_TESTS = \ + backtrace \ + dwarf \ + fortify-source \ + glibc \ + gtk2 \ + gtk2-infobar \ + libaudit \ + libbfd \ + libelf \ + libelf-getphdrnum \ + libelf-mmap \ + libnuma \ + libperl \ + libpython \ + libpython-version \ + libslang \ + libunwind \ + on-exit \ + stackprotector \ + stackprotector-all + +# +# So here we detect whether test-all was rebuilt, to be able +# to skip the print-out of the long features list if the file +# existed before and after it was built: +# +ifeq ($(wildcard $(OUTPUT)config/feature-checks/test-all),) + test-all-failed := 1 +else + test-all-failed := 0 +endif + +# +# Special fast-path for the 'all features are available' case: +# +$(call feature_check,all,$(MSG)) + +# +# Just in case the build freshly failed, make sure we print the +# feature matrix: +# +ifeq ($(feature-all), 0) + test-all-failed := 1 endif -ifeq ($(call try-cc,$(SOURCE_HELLO),$(CFLAGS) -Werror -Wvolatile-register-var,-Wvolatile-register-var),y) - CFLAGS += -Wvolatile-register-var +ifeq ($(test-all-failed),1) + $(info ) + $(info Auto-detecting system features:) endif -ifndef PERF_DEBUG - ifeq ($(call try-cc,$(SOURCE_HELLO),$(CFLAGS) -D_FORTIFY_SOURCE=2,-D_FORTIFY_SOURCE=2),y) +ifeq ($(feature-all), 1) + # + # test-all.c passed - just set all the core feature flags to 1: + # + $(foreach feat,$(CORE_FEATURE_TESTS),$(call feature_set,$(feat))) +else + $(shell $(MAKE) OUTPUT=$(OUTPUT_FEATURES) LDFLAGS=$(LDFLAGS) -i -j -C config/feature-checks $(CORE_FEATURE_TESTS) >/dev/null 2>&1) + $(foreach feat,$(CORE_FEATURE_TESTS),$(call feature_check,$(feat))) +endif + +# +# Print the result of the feature test: +# +feature_print = $(eval $(feature_print_code)) $(info $(MSG)) + +define feature_print_code + ifeq ($(feature-$(1)), 1) + MSG = $(shell printf '...%30s: [ \033[32mon\033[m ]' $(1)) + else + MSG = $(shell printf '...%30s: [ \033[31mOFF\033[m ]' $(1)) + endif +endef + +# +# Only print out our features if we rebuilt the testcases or if a test failed: +# +ifeq ($(test-all-failed), 1) + $(foreach feat,$(CORE_FEATURE_TESTS),$(call feature_print,$(feat))) + $(info ) +endif + +ifeq ($(feature-stackprotector-all), 1) + CFLAGS += -fstack-protector-all +endif + +ifeq ($(feature-stackprotector), 1) + CFLAGS += -Wstack-protector +endif + +ifeq ($(DEBUG),0) + ifeq ($(feature-fortify-source), 1) CFLAGS += -D_FORTIFY_SOURCE=2 endif endif @@ -128,84 +229,74 @@ CFLAGS += -I$(LIB_INCLUDE) CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE ifndef NO_BIONIC -ifeq ($(call try-cc,$(SOURCE_BIONIC),$(CFLAGS),bionic),y) - BIONIC := 1 - EXTLIBS := $(filter-out -lrt,$(EXTLIBS)) - EXTLIBS := $(filter-out -lpthread,$(EXTLIBS)) + $(feature_check,bionic) + ifeq ($(feature-bionic), 1) + BIONIC := 1 + EXTLIBS := $(filter-out -lrt,$(EXTLIBS)) + EXTLIBS := $(filter-out -lpthread,$(EXTLIBS)) + endif endif -endif # NO_BIONIC ifdef NO_LIBELF NO_DWARF := 1 NO_DEMANGLE := 1 NO_LIBUNWIND := 1 else -FLAGS_LIBELF=$(CFLAGS) $(LDFLAGS) $(EXTLIBS) -ifneq ($(call try-cc,$(SOURCE_LIBELF),$(FLAGS_LIBELF),libelf),y) - FLAGS_GLIBC=$(CFLAGS) $(LDFLAGS) - ifeq ($(call try-cc,$(SOURCE_GLIBC),$(FLAGS_GLIBC),glibc),y) - LIBC_SUPPORT := 1 - endif - ifeq ($(BIONIC),1) - LIBC_SUPPORT := 1 - endif - ifeq ($(LIBC_SUPPORT),1) - msg := $(warning No libelf found, disables 'probe' tool, please install elfutils-libelf-devel/libelf-dev); + ifeq ($(feature-libelf), 0) + ifeq ($(feature-glibc), 1) + LIBC_SUPPORT := 1 + endif + ifeq ($(BIONIC),1) + LIBC_SUPPORT := 1 + endif + ifeq ($(LIBC_SUPPORT),1) + msg := $(warning No libelf found, disables 'probe' tool, please install elfutils-libelf-devel/libelf-dev); - NO_LIBELF := 1 - NO_DWARF := 1 - NO_DEMANGLE := 1 + NO_LIBELF := 1 + NO_DWARF := 1 + NO_DEMANGLE := 1 + else + msg := $(error No gnu/libc-version.h found, please install glibc-dev[el]/glibc-static); + endif else - msg := $(error No gnu/libc-version.h found, please install glibc-dev[el]/glibc-static); - endif -else - # for linking with debug library, run like: - # make DEBUG=1 LIBDW_DIR=/opt/libdw/ - ifdef LIBDW_DIR - LIBDW_CFLAGS := -I$(LIBDW_DIR)/include - LIBDW_LDFLAGS := -L$(LIBDW_DIR)/lib - endif + # for linking with debug library, run like: + # make DEBUG=1 LIBDW_DIR=/opt/libdw/ + ifdef LIBDW_DIR + LIBDW_CFLAGS := -I$(LIBDW_DIR)/include + LIBDW_LDFLAGS := -L$(LIBDW_DIR)/lib + endif - FLAGS_DWARF=$(CFLAGS) $(LIBDW_CFLAGS) -ldw -lz -lelf $(LIBDW_LDFLAGS) $(LDFLAGS) $(EXTLIBS) - ifneq ($(call try-cc,$(SOURCE_DWARF),$(FLAGS_DWARF),libdw),y) - msg := $(warning No libdw.h found or old libdw.h found or elfutils is older than 0.138, disables dwarf support. Please install new elfutils-devel/libdw-dev); - NO_DWARF := 1 - endif # Dwarf support -endif # SOURCE_LIBELF + ifneq ($(feature-dwarf), 1) + msg := $(warning No libdw.h found or old libdw.h found or elfutils is older than 0.138, disables dwarf support. Please install new elfutils-devel/libdw-dev); + NO_DWARF := 1 + endif # Dwarf support + endif # libelf support endif # NO_LIBELF ifndef NO_LIBELF -CFLAGS += -DLIBELF_SUPPORT -FLAGS_LIBELF=$(CFLAGS) $(LDFLAGS) $(EXTLIBS) -ifeq ($(call try-cc,$(SOURCE_ELF_MMAP),$(FLAGS_LIBELF),-DLIBELF_MMAP),y) - CFLAGS += -DLIBELF_MMAP -endif -ifeq ($(call try-cc,$(SOURCE_ELF_GETPHDRNUM),$(FLAGS_LIBELF),-DHAVE_ELF_GETPHDRNUM),y) - CFLAGS += -DHAVE_ELF_GETPHDRNUM -endif + CFLAGS += -DHAVE_LIBELF_SUPPORT -# include ARCH specific config --include $(src-perf)/arch/$(ARCH)/Makefile + ifeq ($(feature-libelf-mmap), 1) + CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT + endif -ifndef NO_DWARF -ifeq ($(origin PERF_HAVE_DWARF_REGS), undefined) - msg := $(warning DWARF register mappings have not been defined for architecture $(ARCH), DWARF support disabled); - NO_DWARF := 1 -else - CFLAGS += -DDWARF_SUPPORT $(LIBDW_CFLAGS) - LDFLAGS += $(LIBDW_LDFLAGS) - EXTLIBS += -lelf -ldw -endif # PERF_HAVE_DWARF_REGS -endif # NO_DWARF + ifeq ($(feature-libelf-getphdrnum), 1) + CFLAGS += -DHAVE_ELF_GETPHDRNUM_SUPPORT + endif -endif # NO_LIBELF + # include ARCH specific config + -include $(src-perf)/arch/$(ARCH)/Makefile -ifndef NO_LIBELF -CFLAGS += -DLIBELF_SUPPORT -FLAGS_LIBELF=$(CFLAGS) $(LDFLAGS) $(EXTLIBS) -ifeq ($(call try-cc,$(SOURCE_ELF_MMAP),$(FLAGS_LIBELF),-DLIBELF_MMAP),y) - CFLAGS += -DLIBELF_MMAP -endif # try-cc + ifndef NO_DWARF + ifeq ($(origin PERF_HAVE_DWARF_REGS), undefined) + msg := $(warning DWARF register mappings have not been defined for architecture $(ARCH), DWARF support disabled); + NO_DWARF := 1 + else + CFLAGS += -DHAVE_DWARF_SUPPORT $(LIBDW_CFLAGS) + LDFLAGS += $(LIBDW_LDFLAGS) + EXTLIBS += -lelf -ldw + endif # PERF_HAVE_DWARF_REGS + endif # NO_DWARF endif # NO_LIBELF # There's only x86 (both 32 and 64) support for CFI unwind so far @@ -214,34 +305,35 @@ ifneq ($(ARCH),x86) endif ifndef NO_LIBUNWIND -# for linking with debug library, run like: -# make DEBUG=1 LIBUNWIND_DIR=/opt/libunwind/ -ifdef LIBUNWIND_DIR - LIBUNWIND_CFLAGS := -I$(LIBUNWIND_DIR)/include - LIBUNWIND_LDFLAGS := -L$(LIBUNWIND_DIR)/lib -endif + # + # For linking with debug library, run like: + # + # make DEBUG=1 LIBUNWIND_DIR=/opt/libunwind/ + # + ifdef LIBUNWIND_DIR + LIBUNWIND_CFLAGS := -I$(LIBUNWIND_DIR)/include + LIBUNWIND_LDFLAGS := -L$(LIBUNWIND_DIR)/lib + endif -FLAGS_UNWIND=$(LIBUNWIND_CFLAGS) $(CFLAGS) $(LIBUNWIND_LDFLAGS) $(LDFLAGS) $(EXTLIBS) $(LIBUNWIND_LIBS) -ifneq ($(call try-cc,$(SOURCE_LIBUNWIND),$(FLAGS_UNWIND),libunwind),y) - msg := $(warning No libunwind found, disabling post unwind support. Please install libunwind-dev[el] >= 0.99); - NO_LIBUNWIND := 1 -endif # Libunwind support -endif # NO_LIBUNWIND + ifneq ($(feature-libunwind), 1) + msg := $(warning No libunwind found, disabling post unwind support. Please install libunwind-dev[el] >= 0.99); + NO_LIBUNWIND := 1 + endif +endif ifndef NO_LIBUNWIND - CFLAGS += -DLIBUNWIND_SUPPORT + CFLAGS += -DHAVE_LIBUNWIND_SUPPORT EXTLIBS += $(LIBUNWIND_LIBS) CFLAGS += $(LIBUNWIND_CFLAGS) LDFLAGS += $(LIBUNWIND_LDFLAGS) -endif # NO_LIBUNWIND +endif ifndef NO_LIBAUDIT - FLAGS_LIBAUDIT = $(CFLAGS) $(LDFLAGS) -laudit - ifneq ($(call try-cc,$(SOURCE_LIBAUDIT),$(FLAGS_LIBAUDIT),libaudit),y) + ifneq ($(feature-libaudit), 1) msg := $(warning No libaudit.h found, disables 'trace' tool, please install audit-libs-devel or libaudit-dev); NO_LIBAUDIT := 1 else - CFLAGS += -DLIBAUDIT_SUPPORT + CFLAGS += -DHAVE_LIBAUDIT_SUPPORT EXTLIBS += -laudit endif endif @@ -251,30 +343,30 @@ ifdef NO_NEWT endif ifndef NO_SLANG - FLAGS_SLANG=$(CFLAGS) $(LDFLAGS) $(EXTLIBS) -I/usr/include/slang -lslang - ifneq ($(call try-cc,$(SOURCE_SLANG),$(FLAGS_SLANG),libslang),y) + ifneq ($(feature-libslang), 1) msg := $(warning slang not found, disables TUI support. Please install slang-devel or libslang-dev); NO_SLANG := 1 else # Fedora has /usr/include/slang/slang.h, but ubuntu /usr/include/slang.h CFLAGS += -I/usr/include/slang - CFLAGS += -DSLANG_SUPPORT + CFLAGS += -DHAVE_SLANG_SUPPORT EXTLIBS += -lslang endif endif ifndef NO_GTK2 FLAGS_GTK2=$(CFLAGS) $(LDFLAGS) $(EXTLIBS) $(shell pkg-config --libs --cflags gtk+-2.0 2>/dev/null) - ifneq ($(call try-cc,$(SOURCE_GTK2),$(FLAGS_GTK2),gtk2),y) + ifneq ($(feature-gtk2), 1) msg := $(warning GTK2 not found, disables GTK2 support. Please install gtk2-devel or libgtk2.0-dev); NO_GTK2 := 1 else - ifeq ($(call try-cc,$(SOURCE_GTK2_INFOBAR),$(FLAGS_GTK2),-DHAVE_GTK_INFO_BAR),y) - CFLAGS += -DHAVE_GTK_INFO_BAR + ifeq ($(feature-gtk2-infobar), 1) + GTK_CFLAGS := -DHAVE_GTK_INFO_BAR_SUPPORT endif - CFLAGS += -DGTK2_SUPPORT - CFLAGS += $(shell pkg-config --cflags gtk+-2.0 2>/dev/null) - EXTLIBS += $(shell pkg-config --libs gtk+-2.0 2>/dev/null) + CFLAGS += -DHAVE_GTK2_SUPPORT + GTK_CFLAGS += $(shell pkg-config --cflags gtk+-2.0 2>/dev/null) + GTK_LIBS := $(shell pkg-config --libs gtk+-2.0 2>/dev/null) + EXTLIBS += -ldl endif endif @@ -290,7 +382,7 @@ else PERL_EMBED_CCOPTS = `perl -MExtUtils::Embed -e ccopts 2>/dev/null` FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) $(PERL_EMBED_LDOPTS) - ifneq ($(call try-cc,$(SOURCE_PERL_EMBED),$(FLAGS_PERL_EMBED),perl),y) + ifneq ($(feature-libperl), 1) CFLAGS += -DNO_LIBPERL NO_LIBPERL := 1 else @@ -335,11 +427,11 @@ else PYTHON_EMBED_CCOPTS := $(shell $(PYTHON_CONFIG_SQ) --cflags 2>/dev/null) FLAGS_PYTHON_EMBED := $(PYTHON_EMBED_CCOPTS) $(PYTHON_EMBED_LDOPTS) - ifneq ($(call try-cc,$(SOURCE_PYTHON_EMBED),$(FLAGS_PYTHON_EMBED),python),y) + ifneq ($(feature-libpython), 1) $(call disable-python,Python.h (for Python 2.x)) else - ifneq ($(call try-cc,$(SOURCE_PYTHON_VERSION),$(FLAGS_PYTHON_EMBED),python version),y) + ifneq ($(feature-libpython-version), 1) $(warning Python 3 is not yet supported; please set) $(warning PYTHON and/or PYTHON_CONFIG appropriately.) $(warning If you also have Python 2 installed, then) @@ -362,33 +454,30 @@ else endif endif +ifeq ($(feature-libbfd), 1) + EXTLIBS += -lbfd +endif + ifdef NO_DEMANGLE CFLAGS += -DNO_DEMANGLE else - ifdef HAVE_CPLUS_DEMANGLE + ifdef HAVE_CPLUS_DEMANGLE_SUPPORT EXTLIBS += -liberty - CFLAGS += -DHAVE_CPLUS_DEMANGLE + CFLAGS += -DHAVE_CPLUS_DEMANGLE_SUPPORT else - FLAGS_BFD=$(CFLAGS) $(LDFLAGS) $(EXTLIBS) -DPACKAGE='perf' -lbfd - has_bfd := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD),libbfd) - ifeq ($(has_bfd),y) - EXTLIBS += -lbfd - else - FLAGS_BFD_IBERTY=$(FLAGS_BFD) -liberty - has_bfd_iberty := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD_IBERTY),liberty) - ifeq ($(has_bfd_iberty),y) + ifneq ($(feature-libbfd), 1) + $(feature_check,liberty) + ifeq ($(feature-liberty), 1) EXTLIBS += -lbfd -liberty else - FLAGS_BFD_IBERTY_Z=$(FLAGS_BFD_IBERTY) -lz - has_bfd_iberty_z := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD_IBERTY_Z),libz) - ifeq ($(has_bfd_iberty_z),y) + $(feature_check,liberty-z) + ifeq ($(feature-liberty-z), 1) EXTLIBS += -lbfd -liberty -lz else - FLAGS_CPLUS_DEMANGLE=$(CFLAGS) $(LDFLAGS) $(EXTLIBS) -liberty - has_cplus_demangle := $(call try-cc,$(SOURCE_CPLUS_DEMANGLE),$(FLAGS_CPLUS_DEMANGLE),demangle) - ifeq ($(has_cplus_demangle),y) + $(feature_check,cplus-demangle) + ifeq ($(feature-cplus-demangle), 1) EXTLIBS += -liberty - CFLAGS += -DHAVE_CPLUS_DEMANGLE + CFLAGS += -DHAVE_CPLUS_DEMANGLE_SUPPORT else msg := $(warning No bfd.h/libbfd found, install binutils-dev[el]/zlib-static to gain symbol demangling) CFLAGS += -DNO_DEMANGLE @@ -399,31 +488,28 @@ else endif endif -ifndef NO_STRLCPY - ifeq ($(call try-cc,$(SOURCE_STRLCPY),,-DHAVE_STRLCPY),y) - CFLAGS += -DHAVE_STRLCPY - endif +ifneq ($(filter -lbfd,$(EXTLIBS)),) + CFLAGS += -DHAVE_LIBBFD_SUPPORT endif ifndef NO_ON_EXIT - ifeq ($(call try-cc,$(SOURCE_ON_EXIT),,-DHAVE_ON_EXIT),y) - CFLAGS += -DHAVE_ON_EXIT + ifeq ($(feature-on-exit), 1) + CFLAGS += -DHAVE_ON_EXIT_SUPPORT endif endif ifndef NO_BACKTRACE - ifeq ($(call try-cc,$(SOURCE_BACKTRACE),,-DBACKTRACE_SUPPORT),y) - CFLAGS += -DBACKTRACE_SUPPORT + ifeq ($(feature-backtrace), 1) + CFLAGS += -DHAVE_BACKTRACE_SUPPORT endif endif ifndef NO_LIBNUMA - FLAGS_LIBNUMA = $(CFLAGS) $(LDFLAGS) -lnuma - ifneq ($(call try-cc,$(SOURCE_LIBNUMA),$(FLAGS_LIBNUMA),libnuma),y) + ifeq ($(feature-libnuma), 0) msg := $(warning No numa.h found, disables 'perf bench numa mem' benchmark, please install numa-libs-devel or libnuma-dev); NO_LIBNUMA := 1 else - CFLAGS += -DLIBNUMA_SUPPORT + CFLAGS += -DHAVE_LIBNUMA_SUPPORT EXTLIBS += -lnuma endif endif @@ -459,7 +545,12 @@ else sysconfdir = $(prefix)/etc ETC_PERFCONFIG = etc/perfconfig endif +ifeq ($(IS_X86_64),1) +lib = lib64 +else lib = lib +endif +libdir = $(prefix)/$(lib) # Shell quote (do not use $(call) to accommodate ancient setups); ETC_PERFCONFIG_SQ = $(subst ','\'',$(ETC_PERFCONFIG)) @@ -472,6 +563,7 @@ template_dir_SQ = $(subst ','\'',$(template_dir)) htmldir_SQ = $(subst ','\'',$(htmldir)) prefix_SQ = $(subst ','\'',$(prefix)) sysconfdir_SQ = $(subst ','\'',$(sysconfdir)) +libdir_SQ = $(subst ','\'',$(libdir)) ifneq ($(filter /%,$(firstword $(perfexecdir))),) perfexec_instdir = $(perfexecdir) diff --git a/tools/perf/config/feature-checks/Makefile b/tools/perf/config/feature-checks/Makefile new file mode 100644 index 000000000000..452b67cc4d7b --- /dev/null +++ b/tools/perf/config/feature-checks/Makefile @@ -0,0 +1,144 @@ + +FILES= \ + test-all \ + test-backtrace \ + test-bionic \ + test-dwarf \ + test-fortify-source \ + test-glibc \ + test-gtk2 \ + test-gtk2-infobar \ + test-hello \ + test-libaudit \ + test-libbfd \ + test-liberty \ + test-liberty-z \ + test-cplus-demangle \ + test-libelf \ + test-libelf-getphdrnum \ + test-libelf-mmap \ + test-libnuma \ + test-libperl \ + test-libpython \ + test-libpython-version \ + test-libslang \ + test-libunwind \ + test-on-exit \ + test-stackprotector-all \ + test-stackprotector + +CC := $(CC) -MD + +all: $(FILES) + +BUILD = $(CC) $(LDFLAGS) -o $(OUTPUT)$@ $@.c + +############################### + +test-all: + $(BUILD) -Werror -fstack-protector -fstack-protector-all -O2 -Werror -D_FORTIFY_SOURCE=2 -ldw -lelf -lnuma -lunwind -lunwind-x86_64 -lelf -laudit -I/usr/include/slang -lslang $(shell pkg-config --libs --cflags gtk+-2.0 2>/dev/null) $(FLAGS_PERL_EMBED) $(FLAGS_PYTHON_EMBED) -DPACKAGE='"perf"' -lbfd -ldl + +test-hello: + $(BUILD) + +test-stackprotector-all: + $(BUILD) -Werror -fstack-protector-all + +test-stackprotector: + $(BUILD) -Werror -fstack-protector -Wstack-protector + +test-fortify-source: + $(BUILD) -O2 -Werror -D_FORTIFY_SOURCE=2 + +test-bionic: + $(BUILD) + +test-libelf: + $(BUILD) -lelf + +test-glibc: + $(BUILD) + +test-dwarf: + $(BUILD) -ldw + +test-libelf-mmap: + $(BUILD) -lelf + +test-libelf-getphdrnum: + $(BUILD) -lelf + +test-libnuma: + $(BUILD) -lnuma + +test-libunwind: + $(BUILD) -lunwind -lunwind-x86_64 -lelf + +test-libaudit: + $(BUILD) -laudit + +test-libslang: + $(BUILD) -I/usr/include/slang -lslang + +test-gtk2: + $(BUILD) $(shell pkg-config --libs --cflags gtk+-2.0 2>/dev/null) + +test-gtk2-infobar: + $(BUILD) $(shell pkg-config --libs --cflags gtk+-2.0 2>/dev/null) + +grep-libs = $(filter -l%,$(1)) +strip-libs = $(filter-out -l%,$(1)) + +PERL_EMBED_LDOPTS = $(shell perl -MExtUtils::Embed -e ldopts 2>/dev/null) +PERL_EMBED_LDFLAGS = $(call strip-libs,$(PERL_EMBED_LDOPTS)) +PERL_EMBED_LIBADD = $(call grep-libs,$(PERL_EMBED_LDOPTS)) +PERL_EMBED_CCOPTS = `perl -MExtUtils::Embed -e ccopts 2>/dev/null` +FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) $(PERL_EMBED_LDOPTS) + +test-libperl: + $(BUILD) $(FLAGS_PERL_EMBED) + +override PYTHON := python +override PYTHON_CONFIG := python-config + +escape-for-shell-sq = $(subst ','\'',$(1)) +shell-sq = '$(escape-for-shell-sq)' + +PYTHON_CONFIG_SQ = $(call shell-sq,$(PYTHON_CONFIG)) + +PYTHON_EMBED_LDOPTS = $(shell $(PYTHON_CONFIG_SQ) --ldflags 2>/dev/null) +PYTHON_EMBED_LDFLAGS = $(call strip-libs,$(PYTHON_EMBED_LDOPTS)) +PYTHON_EMBED_LIBADD = $(call grep-libs,$(PYTHON_EMBED_LDOPTS)) +PYTHON_EMBED_CCOPTS = $(shell $(PYTHON_CONFIG_SQ) --cflags 2>/dev/null) +FLAGS_PYTHON_EMBED = $(PYTHON_EMBED_CCOPTS) $(PYTHON_EMBED_LDOPTS) + +test-libpython: + $(BUILD) $(FLAGS_PYTHON_EMBED) + +test-libpython-version: + $(BUILD) $(FLAGS_PYTHON_EMBED) + +test-libbfd: + $(BUILD) -DPACKAGE='"perf"' -lbfd -ldl + +test-liberty: + $(CC) -o $(OUTPUT)$@ test-libbfd.c -DPACKAGE='"perf"' -lbfd -ldl -liberty + +test-liberty-z: + $(CC) -o $(OUTPUT)$@ test-libbfd.c -DPACKAGE='"perf"' -lbfd -ldl -liberty -lz + +test-cplus-demangle: + $(BUILD) -liberty + +test-on-exit: + $(BUILD) + +test-backtrace: + $(BUILD) + +-include *.d + +############################### + +clean: + rm -f $(FILES) *.d diff --git a/tools/perf/config/feature-checks/test-all.c b/tools/perf/config/feature-checks/test-all.c new file mode 100644 index 000000000000..50d431892a0c --- /dev/null +++ b/tools/perf/config/feature-checks/test-all.c @@ -0,0 +1,106 @@ +/* + * test-all.c: Try to build all the main testcases at once. + * + * A well-configured system will have all the prereqs installed, so we can speed + * up auto-detection on such systems. + */ + +/* + * Quirk: Python and Perl headers cannot be in arbitrary places, so keep + * these 3 testcases at the top: + */ +#define main main_test_libpython +# include "test-libpython.c" +#undef main + +#define main main_test_libpython_version +# include "test-libpython-version.c" +#undef main + +#define main main_test_libperl +# include "test-libperl.c" +#undef main + +#define main main_test_hello +# include "test-hello.c" +#undef main + +#define main main_test_libelf +# include "test-libelf.c" +#undef main + +#define main main_test_libelf_mmap +# include "test-libelf-mmap.c" +#undef main + +#define main main_test_glibc +# include "test-glibc.c" +#undef main + +#define main main_test_dwarf +# include "test-dwarf.c" +#undef main + +#define main main_test_libelf_getphdrnum +# include "test-libelf-getphdrnum.c" +#undef main + +#define main main_test_libunwind +# include "test-libunwind.c" +#undef main + +#define main main_test_libaudit +# include "test-libaudit.c" +#undef main + +#define main main_test_libslang +# include "test-libslang.c" +#undef main + +#define main main_test_gtk2 +# include "test-gtk2.c" +#undef main + +#define main main_test_gtk2_infobar +# include "test-gtk2-infobar.c" +#undef main + +#define main main_test_libbfd +# include "test-libbfd.c" +#undef main + +#define main main_test_on_exit +# include "test-on-exit.c" +#undef main + +#define main main_test_backtrace +# include "test-backtrace.c" +#undef main + +#define main main_test_libnuma +# include "test-libnuma.c" +#undef main + +int main(int argc, char *argv[]) +{ + main_test_libpython(); + main_test_libpython_version(); + main_test_libperl(); + main_test_hello(); + main_test_libelf(); + main_test_libelf_mmap(); + main_test_glibc(); + main_test_dwarf(); + main_test_libelf_getphdrnum(); + main_test_libunwind(); + main_test_libaudit(); + main_test_libslang(); + main_test_gtk2(argc, argv); + main_test_gtk2_infobar(argc, argv); + main_test_libbfd(); + main_test_on_exit(); + main_test_backtrace(); + main_test_libnuma(); + + return 0; +} diff --git a/tools/perf/config/feature-checks/test-backtrace.c b/tools/perf/config/feature-checks/test-backtrace.c new file mode 100644 index 000000000000..7124aa1dc8fb --- /dev/null +++ b/tools/perf/config/feature-checks/test-backtrace.c @@ -0,0 +1,13 @@ +#include +#include + +int main(void) +{ + void *backtrace_fns[10]; + size_t entries; + + entries = backtrace(backtrace_fns, 10); + backtrace_symbols_fd(backtrace_fns, entries, 1); + + return 0; +} diff --git a/tools/perf/config/feature-checks/test-bionic.c b/tools/perf/config/feature-checks/test-bionic.c new file mode 100644 index 000000000000..eac24e9513eb --- /dev/null +++ b/tools/perf/config/feature-checks/test-bionic.c @@ -0,0 +1,6 @@ +#include + +int main(void) +{ + return __ANDROID_API__; +} diff --git a/tools/perf/config/feature-checks/test-cplus-demangle.c b/tools/perf/config/feature-checks/test-cplus-demangle.c new file mode 100644 index 000000000000..610c686e0009 --- /dev/null +++ b/tools/perf/config/feature-checks/test-cplus-demangle.c @@ -0,0 +1,14 @@ +extern int printf(const char *format, ...); +extern char *cplus_demangle(const char *, int); + +int main(void) +{ + char symbol[4096] = "FieldName__9ClassNameFd"; + char *tmp; + + tmp = cplus_demangle(symbol, 0); + + printf("demangled symbol: {%s}\n", tmp); + + return 0; +} diff --git a/tools/perf/config/feature-checks/test-dwarf.c b/tools/perf/config/feature-checks/test-dwarf.c new file mode 100644 index 000000000000..3fc1801ce4a9 --- /dev/null +++ b/tools/perf/config/feature-checks/test-dwarf.c @@ -0,0 +1,10 @@ +#include +#include +#include + +int main(void) +{ + Dwarf *dbg = dwarf_begin(0, DWARF_C_READ); + + return (long)dbg; +} diff --git a/tools/perf/config/feature-checks/test-fortify-source.c b/tools/perf/config/feature-checks/test-fortify-source.c new file mode 100644 index 000000000000..c9f398d87868 --- /dev/null +++ b/tools/perf/config/feature-checks/test-fortify-source.c @@ -0,0 +1,6 @@ +#include + +int main(void) +{ + return puts("hi"); +} diff --git a/tools/perf/config/feature-checks/test-glibc.c b/tools/perf/config/feature-checks/test-glibc.c new file mode 100644 index 000000000000..b0820345cd98 --- /dev/null +++ b/tools/perf/config/feature-checks/test-glibc.c @@ -0,0 +1,8 @@ +#include + +int main(void) +{ + const char *version = gnu_get_libc_version(); + + return (long)version; +} diff --git a/tools/perf/config/feature-checks/test-gtk2-infobar.c b/tools/perf/config/feature-checks/test-gtk2-infobar.c new file mode 100644 index 000000000000..397b4646d066 --- /dev/null +++ b/tools/perf/config/feature-checks/test-gtk2-infobar.c @@ -0,0 +1,11 @@ +#pragma GCC diagnostic ignored "-Wstrict-prototypes" +#include +#pragma GCC diagnostic error "-Wstrict-prototypes" + +int main(int argc, char *argv[]) +{ + gtk_init(&argc, &argv); + gtk_info_bar_new(); + + return 0; +} diff --git a/tools/perf/config/feature-checks/test-gtk2.c b/tools/perf/config/feature-checks/test-gtk2.c new file mode 100644 index 000000000000..6bd80e509439 --- /dev/null +++ b/tools/perf/config/feature-checks/test-gtk2.c @@ -0,0 +1,10 @@ +#pragma GCC diagnostic ignored "-Wstrict-prototypes" +#include +#pragma GCC diagnostic error "-Wstrict-prototypes" + +int main(int argc, char *argv[]) +{ + gtk_init(&argc, &argv); + + return 0; +} diff --git a/tools/perf/config/feature-checks/test-hello.c b/tools/perf/config/feature-checks/test-hello.c new file mode 100644 index 000000000000..c9f398d87868 --- /dev/null +++ b/tools/perf/config/feature-checks/test-hello.c @@ -0,0 +1,6 @@ +#include + +int main(void) +{ + return puts("hi"); +} diff --git a/tools/perf/config/feature-checks/test-libaudit.c b/tools/perf/config/feature-checks/test-libaudit.c new file mode 100644 index 000000000000..afc019f08641 --- /dev/null +++ b/tools/perf/config/feature-checks/test-libaudit.c @@ -0,0 +1,10 @@ +#include + +extern int printf(const char *format, ...); + +int main(void) +{ + printf("error message: %s\n", audit_errno_to_name(0)); + + return audit_open(); +} diff --git a/tools/perf/config/feature-checks/test-libbfd.c b/tools/perf/config/feature-checks/test-libbfd.c new file mode 100644 index 000000000000..24059907e990 --- /dev/null +++ b/tools/perf/config/feature-checks/test-libbfd.c @@ -0,0 +1,15 @@ +#include + +extern int printf(const char *format, ...); + +int main(void) +{ + char symbol[4096] = "FieldName__9ClassNameFd"; + char *tmp; + + tmp = bfd_demangle(0, symbol, 0); + + printf("demangled symbol: {%s}\n", tmp); + + return 0; +} diff --git a/tools/perf/config/feature-checks/test-libelf-getphdrnum.c b/tools/perf/config/feature-checks/test-libelf-getphdrnum.c new file mode 100644 index 000000000000..d710459306c3 --- /dev/null +++ b/tools/perf/config/feature-checks/test-libelf-getphdrnum.c @@ -0,0 +1,8 @@ +#include + +int main(void) +{ + size_t dst; + + return elf_getphdrnum(0, &dst); +} diff --git a/tools/perf/config/feature-checks/test-libelf-mmap.c b/tools/perf/config/feature-checks/test-libelf-mmap.c new file mode 100644 index 000000000000..564427d7ef18 --- /dev/null +++ b/tools/perf/config/feature-checks/test-libelf-mmap.c @@ -0,0 +1,8 @@ +#include + +int main(void) +{ + Elf *elf = elf_begin(0, ELF_C_READ_MMAP, 0); + + return (long)elf; +} diff --git a/tools/perf/config/feature-checks/test-libelf.c b/tools/perf/config/feature-checks/test-libelf.c new file mode 100644 index 000000000000..08db322d8957 --- /dev/null +++ b/tools/perf/config/feature-checks/test-libelf.c @@ -0,0 +1,8 @@ +#include + +int main(void) +{ + Elf *elf = elf_begin(0, ELF_C_READ, 0); + + return (long)elf; +} diff --git a/tools/perf/config/feature-checks/test-libnuma.c b/tools/perf/config/feature-checks/test-libnuma.c new file mode 100644 index 000000000000..4763d9cd587d --- /dev/null +++ b/tools/perf/config/feature-checks/test-libnuma.c @@ -0,0 +1,9 @@ +#include +#include + +int main(void) +{ + numa_available(); + + return 0; +} diff --git a/tools/perf/config/feature-checks/test-libperl.c b/tools/perf/config/feature-checks/test-libperl.c new file mode 100644 index 000000000000..8871f6a0fdb4 --- /dev/null +++ b/tools/perf/config/feature-checks/test-libperl.c @@ -0,0 +1,9 @@ +#include +#include + +int main(void) +{ + perl_alloc(); + + return 0; +} diff --git a/tools/perf/config/feature-checks/test-libpython-version.c b/tools/perf/config/feature-checks/test-libpython-version.c new file mode 100644 index 000000000000..facea122d812 --- /dev/null +++ b/tools/perf/config/feature-checks/test-libpython-version.c @@ -0,0 +1,10 @@ +#include + +#if PY_VERSION_HEX >= 0x03000000 + #error +#endif + +int main(void) +{ + return 0; +} diff --git a/tools/perf/config/feature-checks/test-libpython.c b/tools/perf/config/feature-checks/test-libpython.c new file mode 100644 index 000000000000..b24b28ad6324 --- /dev/null +++ b/tools/perf/config/feature-checks/test-libpython.c @@ -0,0 +1,8 @@ +#include + +int main(void) +{ + Py_Initialize(); + + return 0; +} diff --git a/tools/perf/config/feature-checks/test-libslang.c b/tools/perf/config/feature-checks/test-libslang.c new file mode 100644 index 000000000000..22ff22ed94d1 --- /dev/null +++ b/tools/perf/config/feature-checks/test-libslang.c @@ -0,0 +1,6 @@ +#include + +int main(void) +{ + return SLsmg_init_smg(); +} diff --git a/tools/perf/config/feature-checks/test-libunwind.c b/tools/perf/config/feature-checks/test-libunwind.c new file mode 100644 index 000000000000..43b9369bcab7 --- /dev/null +++ b/tools/perf/config/feature-checks/test-libunwind.c @@ -0,0 +1,27 @@ +#include +#include + +extern int UNW_OBJ(dwarf_search_unwind_table) (unw_addr_space_t as, + unw_word_t ip, + unw_dyn_info_t *di, + unw_proc_info_t *pi, + int need_unwind_info, void *arg); + + +#define dwarf_search_unwind_table UNW_OBJ(dwarf_search_unwind_table) + +static unw_accessors_t accessors; + +int main(void) +{ + unw_addr_space_t addr_space; + + addr_space = unw_create_addr_space(&accessors, 0); + if (addr_space) + return 0; + + unw_init_remote(NULL, addr_space, NULL); + dwarf_search_unwind_table(addr_space, 0, NULL, NULL, 0, NULL); + + return 0; +} diff --git a/tools/perf/config/feature-checks/test-on-exit.c b/tools/perf/config/feature-checks/test-on-exit.c new file mode 100644 index 000000000000..8e88b16e6ded --- /dev/null +++ b/tools/perf/config/feature-checks/test-on-exit.c @@ -0,0 +1,16 @@ +#include +#include + +static void exit_fn(int status, void *__data) +{ + printf("exit status: %d, data: %d\n", status, *(int *)__data); +} + +static int data = 123; + +int main(void) +{ + on_exit(exit_fn, &data); + + return 321; +} diff --git a/tools/perf/config/feature-checks/test-stackprotector-all.c b/tools/perf/config/feature-checks/test-stackprotector-all.c new file mode 100644 index 000000000000..c9f398d87868 --- /dev/null +++ b/tools/perf/config/feature-checks/test-stackprotector-all.c @@ -0,0 +1,6 @@ +#include + +int main(void) +{ + return puts("hi"); +} diff --git a/tools/perf/config/feature-checks/test-stackprotector.c b/tools/perf/config/feature-checks/test-stackprotector.c new file mode 100644 index 000000000000..c9f398d87868 --- /dev/null +++ b/tools/perf/config/feature-checks/test-stackprotector.c @@ -0,0 +1,6 @@ +#include + +int main(void) +{ + return puts("hi"); +} diff --git a/tools/perf/config/feature-checks/test-volatile-register-var.c b/tools/perf/config/feature-checks/test-volatile-register-var.c new file mode 100644 index 000000000000..c9f398d87868 --- /dev/null +++ b/tools/perf/config/feature-checks/test-volatile-register-var.c @@ -0,0 +1,6 @@ +#include + +int main(void) +{ + return puts("hi"); +} diff --git a/tools/perf/config/feature-tests.mak b/tools/perf/config/feature-tests.mak deleted file mode 100644 index d5a8dd44945f..000000000000 --- a/tools/perf/config/feature-tests.mak +++ /dev/null @@ -1,246 +0,0 @@ -define SOURCE_HELLO -#include -int main(void) -{ - return puts(\"hi\"); -} -endef - -ifndef NO_DWARF -define SOURCE_DWARF -#include -#include -#include -#ifndef _ELFUTILS_PREREQ -#error -#endif - -int main(void) -{ - Dwarf *dbg = dwarf_begin(0, DWARF_C_READ); - return (long)dbg; -} -endef -endif - -define SOURCE_LIBELF -#include - -int main(void) -{ - Elf *elf = elf_begin(0, ELF_C_READ, 0); - return (long)elf; -} -endef - -define SOURCE_GLIBC -#include - -int main(void) -{ - const char *version = gnu_get_libc_version(); - return (long)version; -} -endef - -define SOURCE_BIONIC -#include - -int main(void) -{ - return __ANDROID_API__; -} -endef - -define SOURCE_ELF_MMAP -#include -int main(void) -{ - Elf *elf = elf_begin(0, ELF_C_READ_MMAP, 0); - return (long)elf; -} -endef - -define SOURCE_ELF_GETPHDRNUM -#include -int main(void) -{ - size_t dst; - return elf_getphdrnum(0, &dst); -} -endef - -ifndef NO_SLANG -define SOURCE_SLANG -#include - -int main(void) -{ - return SLsmg_init_smg(); -} -endef -endif - -ifndef NO_GTK2 -define SOURCE_GTK2 -#pragma GCC diagnostic ignored \"-Wstrict-prototypes\" -#include -#pragma GCC diagnostic error \"-Wstrict-prototypes\" - -int main(int argc, char *argv[]) -{ - gtk_init(&argc, &argv); - - return 0; -} -endef - -define SOURCE_GTK2_INFOBAR -#pragma GCC diagnostic ignored \"-Wstrict-prototypes\" -#include -#pragma GCC diagnostic error \"-Wstrict-prototypes\" - -int main(void) -{ - gtk_info_bar_new(); - - return 0; -} -endef -endif - -ifndef NO_LIBPERL -define SOURCE_PERL_EMBED -#include -#include - -int main(void) -{ -perl_alloc(); -return 0; -} -endef -endif - -ifndef NO_LIBPYTHON -define SOURCE_PYTHON_VERSION -#include -#if PY_VERSION_HEX >= 0x03000000 - #error -#endif -int main(void) -{ - return 0; -} -endef -define SOURCE_PYTHON_EMBED -#include -int main(void) -{ - Py_Initialize(); - return 0; -} -endef -endif - -define SOURCE_BFD -#include - -int main(void) -{ - bfd_demangle(0, 0, 0); - return 0; -} -endef - -define SOURCE_CPLUS_DEMANGLE -extern char *cplus_demangle(const char *, int); - -int main(void) -{ - cplus_demangle(0, 0); - return 0; -} -endef - -define SOURCE_STRLCPY -#include -extern size_t strlcpy(char *dest, const char *src, size_t size); - -int main(void) -{ - strlcpy(NULL, NULL, 0); - return 0; -} -endef - -ifndef NO_LIBUNWIND -define SOURCE_LIBUNWIND -#include -#include - -extern int UNW_OBJ(dwarf_search_unwind_table) (unw_addr_space_t as, - unw_word_t ip, - unw_dyn_info_t *di, - unw_proc_info_t *pi, - int need_unwind_info, void *arg); - - -#define dwarf_search_unwind_table UNW_OBJ(dwarf_search_unwind_table) - -int main(void) -{ - unw_addr_space_t addr_space; - addr_space = unw_create_addr_space(NULL, 0); - unw_init_remote(NULL, addr_space, NULL); - dwarf_search_unwind_table(addr_space, 0, NULL, NULL, 0, NULL); - return 0; -} -endef -endif - -ifndef NO_BACKTRACE -define SOURCE_BACKTRACE -#include -#include - -int main(void) -{ - backtrace(NULL, 0); - backtrace_symbols(NULL, 0); - return 0; -} -endef -endif - -ifndef NO_LIBAUDIT -define SOURCE_LIBAUDIT -#include - -int main(void) -{ - printf(\"error message: %s\n\", audit_errno_to_name(0)); - return audit_open(); -} -endef -endif - -define SOURCE_ON_EXIT -#include - -int main(void) -{ - return on_exit(NULL, NULL); -} -endef - -define SOURCE_LIBNUMA -#include -#include - -int main(void) -{ - numa_available(); - return 0; -} -endef diff --git a/tools/perf/config/utilities.mak b/tools/perf/config/utilities.mak index 94d2d4f9c35d..f168debc5be2 100644 --- a/tools/perf/config/utilities.mak +++ b/tools/perf/config/utilities.mak @@ -179,16 +179,9 @@ _ge_attempt = $(if $(get-executable),$(get-executable),$(_gea_warn)$(call _gea_e _gea_warn = $(warning The path '$(1)' is not executable.) _gea_err = $(if $(1),$(error Please set '$(1)' appropriately)) -# try-cc -# Usage: option = $(call try-cc, source-to-build, cc-options, msg) -ifneq ($(V),1) -TRY_CC_OUTPUT= > /dev/null 2>&1 +ifneq ($(findstring $(MAKEFLAGS),s),s) + ifneq ($(V),1) + QUIET_CLEAN = @printf ' CLEAN %s\n' $1; + QUIET_INSTALL = @printf ' INSTALL %s\n' $1; + endif endif -TRY_CC_MSG=echo " CHK $(3)" 1>&2; - -try-cc = $(shell sh -c \ - 'TMP="$(OUTPUT)$(TMPOUT).$$$$"; \ - $(TRY_CC_MSG) \ - echo "$(1)" | \ - $(CC) -x c - $(2) -o "$$TMP" $(TRY_CC_OUTPUT) && echo y; \ - rm -f "$$TMP"') diff --git a/tools/perf/perf.c b/tools/perf/perf.c index 85e1aed95204..8b38b4e80ec2 100644 --- a/tools/perf/perf.c +++ b/tools/perf/perf.c @@ -49,14 +49,14 @@ static struct cmd_struct commands[] = { { "version", cmd_version, 0 }, { "script", cmd_script, 0 }, { "sched", cmd_sched, 0 }, -#ifdef LIBELF_SUPPORT +#ifdef HAVE_LIBELF_SUPPORT { "probe", cmd_probe, 0 }, #endif { "kmem", cmd_kmem, 0 }, { "lock", cmd_lock, 0 }, { "kvm", cmd_kvm, 0 }, { "test", cmd_test, 0 }, -#ifdef LIBAUDIT_SUPPORT +#ifdef HAVE_LIBAUDIT_SUPPORT { "trace", cmd_trace, 0 }, #endif { "inject", cmd_inject, 0 }, @@ -456,6 +456,7 @@ int main(int argc, const char **argv) { const char *cmd; + /* The page_size is placed in util object. */ page_size = sysconf(_SC_PAGE_SIZE); cmd = perf_extract_argv0_path(argv[0]); @@ -480,7 +481,14 @@ int main(int argc, const char **argv) fprintf(stderr, "cannot handle %s internally", cmd); goto out; } - +#ifdef HAVE_LIBAUDIT_SUPPORT + if (!prefixcmp(cmd, "trace")) { + set_buildid_dir(); + setup_path(); + argv[0] = "trace"; + return cmd_trace(argc, argv, NULL); + } +#endif /* Look for flags.. */ argv++; argc--; diff --git a/tools/perf/perf.h b/tools/perf/perf.h index cf20187eee0a..f61c230beec4 100644 --- a/tools/perf/perf.h +++ b/tools/perf/perf.h @@ -182,7 +182,9 @@ struct ip_callchain { struct branch_flags { u64 mispred:1; u64 predicted:1; - u64 reserved:62; + u64 in_tx:1; + u64 abort:1; + u64 reserved:60; }; struct branch_entry { @@ -218,7 +220,6 @@ struct perf_record_opts { bool no_delay; bool no_inherit; bool no_samples; - bool pipe_output; bool raw_samples; bool sample_address; bool sample_weight; @@ -231,6 +232,7 @@ struct perf_record_opts { u64 default_interval; u64 user_interval; u16 stack_dump_size; + bool sample_transaction; }; #endif diff --git a/tools/perf/tests/dso-data.c b/tools/perf/tests/dso-data.c index dffe0551acaa..9cc81a3eb9b4 100644 --- a/tools/perf/tests/dso-data.c +++ b/tools/perf/tests/dso-data.c @@ -35,6 +35,7 @@ static char *test_file(int size) if (size != write(fd, buf, size)) templ = NULL; + free(buf); close(fd); return templ; } diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c index 4228ffc0d968..025503a22ff7 100644 --- a/tools/perf/tests/hists_link.c +++ b/tools/perf/tests/hists_link.c @@ -222,7 +222,8 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine) &sample) < 0) goto out; - he = __hists__add_entry(&evsel->hists, &al, NULL, 1, 1); + he = __hists__add_entry(&evsel->hists, &al, NULL, + 1, 1, 0); if (he == NULL) goto out; @@ -244,7 +245,8 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine) &sample) < 0) goto out; - he = __hists__add_entry(&evsel->hists, &al, NULL, 1, 1); + he = __hists__add_entry(&evsel->hists, &al, NULL, 1, 1, + 0); if (he == NULL) goto out; diff --git a/tools/perf/tests/perf-record.c b/tools/perf/tests/perf-record.c index b8a7056519ac..82ac71550091 100644 --- a/tools/perf/tests/perf-record.c +++ b/tools/perf/tests/perf-record.c @@ -45,7 +45,7 @@ int test__PERF_RECORD(void) }; cpu_set_t cpu_mask; size_t cpu_mask_size = sizeof(cpu_mask); - struct perf_evlist *evlist = perf_evlist__new(); + struct perf_evlist *evlist = perf_evlist__new_default(); struct perf_evsel *evsel; struct perf_sample sample; const char *cmd = "sleep"; @@ -65,16 +65,6 @@ int test__PERF_RECORD(void) goto out; } - /* - * We need at least one evsel in the evlist, use the default - * one: "cycles". - */ - err = perf_evlist__add_default(evlist); - if (err < 0) { - pr_debug("Not enough memory to create evsel\n"); - goto out_delete_evlist; - } - /* * Create maps of threads and cpus to monitor. In this case * we start with all threads and cpus (-1, -1) but then in diff --git a/tools/perf/tests/task-exit.c b/tools/perf/tests/task-exit.c index 28fe5894b061..b07f8a14e15d 100644 --- a/tools/perf/tests/task-exit.c +++ b/tools/perf/tests/task-exit.c @@ -37,20 +37,11 @@ int test__task_exit(void) signal(SIGCHLD, sig_handler); signal(SIGUSR1, sig_handler); - evlist = perf_evlist__new(); + evlist = perf_evlist__new_default(); if (evlist == NULL) { - pr_debug("perf_evlist__new\n"); + pr_debug("perf_evlist__new_default\n"); return -1; } - /* - * We need at least one evsel in the evlist, use the default - * one: "cycles". - */ - err = perf_evlist__add_default(evlist); - if (err < 0) { - pr_debug("Not enough memory to create evsel\n"); - goto out_free_evlist; - } /* * Create maps of threads and cpus to monitor. In this case @@ -117,7 +108,6 @@ out_close_evlist: perf_evlist__close(evlist); out_delete_maps: perf_evlist__delete_maps(evlist); -out_free_evlist: perf_evlist__delete(evlist); return err; } diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 08545ae46992..f0697a3aede0 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -442,35 +442,37 @@ static bool annotate_browser__callq(struct annotate_browser *browser, { struct map_symbol *ms = browser->b.priv; struct disasm_line *dl = browser->selection; - struct symbol *sym = ms->sym; struct annotation *notes; - struct symbol *target; - u64 ip; + struct addr_map_symbol target = { + .map = ms->map, + .addr = map__objdump_2mem(ms->map, dl->ops.target.addr), + }; char title[SYM_TITLE_MAX_SIZE]; if (!ins__is_call(dl->ins)) return false; - ip = ms->map->map_ip(ms->map, dl->ops.target.addr); - target = map__find_symbol(ms->map, ip, NULL); - if (target == NULL) { + if (map_groups__find_ams(&target, NULL) || + map__rip_2objdump(target.map, target.map->map_ip(target.map, + target.addr)) != + dl->ops.target.addr) { ui_helpline__puts("The called function was not found."); return true; } - notes = symbol__annotation(target); + notes = symbol__annotation(target.sym); pthread_mutex_lock(¬es->lock); - if (notes->src == NULL && symbol__alloc_hist(target) < 0) { + if (notes->src == NULL && symbol__alloc_hist(target.sym) < 0) { pthread_mutex_unlock(¬es->lock); ui__warning("Not enough memory for annotating '%s' symbol!\n", - target->name); + target.sym->name); return true; } pthread_mutex_unlock(¬es->lock); - symbol__tui_annotate(target, ms->map, evsel, hbt); - sym_title(sym, ms->map, title, sizeof(title)); + symbol__tui_annotate(target.sym, target.map, evsel, hbt); + sym_title(ms->sym, ms->map, title, sizeof(title)); ui_browser__show_title(&browser->b, title); return true; } diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c index f538794615db..9c7ff8d31b27 100644 --- a/tools/perf/ui/gtk/annotate.c +++ b/tools/perf/ui/gtk/annotate.c @@ -154,9 +154,9 @@ static int perf_gtk__annotate_symbol(GtkWidget *window, struct symbol *sym, return 0; } -int symbol__gtk_annotate(struct symbol *sym, struct map *map, - struct perf_evsel *evsel, - struct hist_browser_timer *hbt) +static int symbol__gtk_annotate(struct symbol *sym, struct map *map, + struct perf_evsel *evsel, + struct hist_browser_timer *hbt) { GtkWidget *window; GtkWidget *notebook; @@ -226,6 +226,13 @@ int symbol__gtk_annotate(struct symbol *sym, struct map *map, return 0; } +int hist_entry__gtk_annotate(struct hist_entry *he, + struct perf_evsel *evsel, + struct hist_browser_timer *hbt) +{ + return symbol__gtk_annotate(he->ms.sym, he->ms.map, evsel, hbt); +} + void perf_gtk__show_annotations(void) { GtkWidget *window; diff --git a/tools/perf/ui/gtk/browser.c b/tools/perf/ui/gtk/browser.c index c95012cdb438..c24d91221290 100644 --- a/tools/perf/ui/gtk/browser.c +++ b/tools/perf/ui/gtk/browser.c @@ -43,7 +43,7 @@ const char *perf_gtk__get_percent_color(double percent) return NULL; } -#ifdef HAVE_GTK_INFO_BAR +#ifdef HAVE_GTK_INFO_BAR_SUPPORT GtkWidget *perf_gtk__setup_info_bar(void) { GtkWidget *info_bar; diff --git a/tools/perf/ui/gtk/gtk.h b/tools/perf/ui/gtk/gtk.h index 3d96785ef155..8576cf194872 100644 --- a/tools/perf/ui/gtk/gtk.h +++ b/tools/perf/ui/gtk/gtk.h @@ -12,7 +12,7 @@ struct perf_gtk_context { GtkWidget *main_window; GtkWidget *notebook; -#ifdef HAVE_GTK_INFO_BAR +#ifdef HAVE_GTK_INFO_BAR_SUPPORT GtkWidget *info_bar; GtkWidget *message_label; #endif @@ -20,6 +20,9 @@ struct perf_gtk_context { guint statbar_ctx_id; }; +int perf_gtk__init(void); +void perf_gtk__exit(bool wait_for_ok); + extern struct perf_gtk_context *pgctx; static inline bool perf_gtk__is_active_context(struct perf_gtk_context *ctx) @@ -39,7 +42,7 @@ void perf_gtk__resize_window(GtkWidget *window); const char *perf_gtk__get_percent_color(double percent); GtkWidget *perf_gtk__setup_statusbar(void); -#ifdef HAVE_GTK_INFO_BAR +#ifdef HAVE_GTK_INFO_BAR_SUPPORT GtkWidget *perf_gtk__setup_info_bar(void); #else static inline GtkWidget *perf_gtk__setup_info_bar(void) @@ -48,4 +51,17 @@ static inline GtkWidget *perf_gtk__setup_info_bar(void) } #endif +struct perf_evsel; +struct perf_evlist; +struct hist_entry; +struct hist_browser_timer; + +int perf_evlist__gtk_browse_hists(struct perf_evlist *evlist, const char *help, + struct hist_browser_timer *hbt, + float min_pcnt); +int hist_entry__gtk_annotate(struct hist_entry *he, + struct perf_evsel *evsel, + struct hist_browser_timer *hbt); +void perf_gtk__show_annotations(void); + #endif /* _PERF_GTK_H_ */ diff --git a/tools/perf/ui/gtk/util.c b/tools/perf/ui/gtk/util.c index c06942a41c78..696c1fbe4248 100644 --- a/tools/perf/ui/gtk/util.c +++ b/tools/perf/ui/gtk/util.c @@ -53,7 +53,7 @@ static int perf_gtk__error(const char *format, va_list args) return 0; } -#ifdef HAVE_GTK_INFO_BAR +#ifdef HAVE_GTK_INFO_BAR_SUPPORT static int perf_gtk__warning_info_bar(const char *format, va_list args) { char *msg; @@ -105,7 +105,7 @@ static int perf_gtk__warning_statusbar(const char *format, va_list args) struct perf_error_ops perf_gtk_eops = { .error = perf_gtk__error, -#ifdef HAVE_GTK_INFO_BAR +#ifdef HAVE_GTK_INFO_BAR_SUPPORT .warning = perf_gtk__warning_info_bar, #else .warning = perf_gtk__warning_statusbar, diff --git a/tools/perf/ui/setup.c b/tools/perf/ui/setup.c index 47d9a571f261..5df5140a9f29 100644 --- a/tools/perf/ui/setup.c +++ b/tools/perf/ui/setup.c @@ -1,10 +1,64 @@ #include +#include #include "../util/cache.h" #include "../util/debug.h" #include "../util/hist.h" pthread_mutex_t ui__lock = PTHREAD_MUTEX_INITIALIZER; +void *perf_gtk_handle; + +#ifdef HAVE_GTK2_SUPPORT +static int setup_gtk_browser(void) +{ + int (*perf_ui_init)(void); + + if (perf_gtk_handle) + return 0; + + perf_gtk_handle = dlopen(PERF_GTK_DSO, RTLD_LAZY); + if (perf_gtk_handle == NULL) { + char buf[PATH_MAX]; + scnprintf(buf, sizeof(buf), "%s/%s", LIBDIR, PERF_GTK_DSO); + perf_gtk_handle = dlopen(buf, RTLD_LAZY); + } + if (perf_gtk_handle == NULL) + return -1; + + perf_ui_init = dlsym(perf_gtk_handle, "perf_gtk__init"); + if (perf_ui_init == NULL) + goto out_close; + + if (perf_ui_init() == 0) + return 0; + +out_close: + dlclose(perf_gtk_handle); + return -1; +} + +static void exit_gtk_browser(bool wait_for_ok) +{ + void (*perf_ui_exit)(bool); + + if (perf_gtk_handle == NULL) + return; + + perf_ui_exit = dlsym(perf_gtk_handle, "perf_gtk__exit"); + if (perf_ui_exit == NULL) + goto out_close; + + perf_ui_exit(wait_for_ok); + +out_close: + dlclose(perf_gtk_handle); + + perf_gtk_handle = NULL; +} +#else +static inline int setup_gtk_browser(void) { return -1; } +static inline void exit_gtk_browser(bool wait_for_ok __maybe_unused) {} +#endif void setup_browser(bool fallback_to_pager) { @@ -17,8 +71,11 @@ void setup_browser(bool fallback_to_pager) switch (use_browser) { case 2: - if (perf_gtk__init() == 0) + if (setup_gtk_browser() == 0) break; + printf("GTK browser requested but could not find %s\n", + PERF_GTK_DSO); + sleep(1); /* fall through */ case 1: use_browser = 1; @@ -39,7 +96,7 @@ void exit_browser(bool wait_for_ok) { switch (use_browser) { case 2: - perf_gtk__exit(wait_for_ok); + exit_gtk_browser(wait_for_ok); break; case 1: diff --git a/tools/perf/ui/ui.h b/tools/perf/ui/ui.h index 70cb0d4eb8aa..ab88383f8be8 100644 --- a/tools/perf/ui/ui.h +++ b/tools/perf/ui/ui.h @@ -6,13 +6,14 @@ #include extern pthread_mutex_t ui__lock; +extern void *perf_gtk_handle; extern int use_browser; void setup_browser(bool fallback_to_pager); void exit_browser(bool wait_for_ok); -#ifdef SLANG_SUPPORT +#ifdef HAVE_SLANG_SUPPORT int ui__init(void); void ui__exit(bool wait_for_ok); #else @@ -23,17 +24,6 @@ static inline int ui__init(void) static inline void ui__exit(bool wait_for_ok __maybe_unused) {} #endif -#ifdef GTK2_SUPPORT -int perf_gtk__init(void); -void perf_gtk__exit(bool wait_for_ok); -#else -static inline int perf_gtk__init(void) -{ - return -1; -} -static inline void perf_gtk__exit(bool wait_for_ok __maybe_unused) {} -#endif - void ui__refresh_dimensions(bool force); #endif /* _PERF_UI_H_ */ diff --git a/tools/perf/util/PERF-VERSION-GEN b/tools/perf/util/PERF-VERSION-GEN index 15a77b7c0e36..ce7a804b2951 100755 --- a/tools/perf/util/PERF-VERSION-GEN +++ b/tools/perf/util/PERF-VERSION-GEN @@ -40,7 +40,7 @@ else VC=unset fi test "$VN" = "$VC" || { - echo >&2 "PERF_VERSION = $VN" + echo >&2 " PERF_VERSION = $VN" echo "#define PERF_VERSION \"$VN\"" >$GVF } diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 7eae5488ecea..cf6242c92ee2 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -825,20 +825,16 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map, dl->ops.target.offset = dl->ops.target.addr - map__rip_2objdump(map, sym->start); - /* - * kcore has no symbols, so add the call target name if it is on the - * same map. - */ + /* kcore has no symbols, so add the call target name */ if (dl->ins && ins__is_call(dl->ins) && !dl->ops.target.name) { - struct symbol *s; - u64 ip = dl->ops.target.addr; - - if (ip >= map->start && ip <= map->end) { - ip = map->map_ip(map, ip); - s = map__find_symbol(map, ip, NULL); - if (s && s->start == ip) - dl->ops.target.name = strdup(s->name); - } + struct addr_map_symbol target = { + .map = map, + .addr = dl->ops.target.addr, + }; + + if (!map_groups__find_ams(&target, NULL) && + target.sym->start == target.al_addr) + dl->ops.target.name = strdup(target.sym->name); } disasm__add(¬es->src->source, dl); @@ -879,6 +875,8 @@ int symbol__annotate(struct symbol *sym, struct map *map, size_t privsize) FILE *file; int err = 0; char symfs_filename[PATH_MAX]; + struct kcore_extract kce; + bool delete_extract = false; if (filename) { snprintf(symfs_filename, sizeof(symfs_filename), "%s%s", @@ -940,6 +938,23 @@ fallback: pr_debug("annotating [%p] %30s : [%p] %30s\n", dso, dso->long_name, sym, sym->name); + if (dso__is_kcore(dso)) { + kce.kcore_filename = symfs_filename; + kce.addr = map__rip_2objdump(map, sym->start); + kce.offs = sym->start; + kce.len = sym->end + 1 - sym->start; + if (!kcore_extract__create(&kce)) { + delete_extract = true; + strlcpy(symfs_filename, kce.extract_filename, + sizeof(symfs_filename)); + if (free_filename) { + free(filename); + free_filename = false; + } + filename = symfs_filename; + } + } + snprintf(command, sizeof(command), "%s %s%s --start-address=0x%016" PRIx64 " --stop-address=0x%016" PRIx64 @@ -972,6 +987,8 @@ fallback: pclose(file); out_free_filename: + if (delete_extract) + kcore_extract__delete(&kce); if (free_filename) free(filename); return err; @@ -1070,7 +1087,7 @@ static void symbol__free_source_line(struct symbol *sym, int len) (sizeof(src_line->p) * (src_line->nr_pcnt - 1)); for (i = 0; i < len; i++) { - free(src_line->path); + free_srcline(src_line->path); src_line = (void *)src_line + sizeof_src_line; } @@ -1081,13 +1098,11 @@ static void symbol__free_source_line(struct symbol *sym, int len) /* Get the filename:line for the colored entries */ static int symbol__get_source_line(struct symbol *sym, struct map *map, struct perf_evsel *evsel, - struct rb_root *root, int len, - const char *filename) + struct rb_root *root, int len) { u64 start; int i, k; int evidx = evsel->idx; - char cmd[PATH_MAX * 2]; struct source_line *src_line; struct annotation *notes = symbol__annotation(sym); struct sym_hist *h = annotation__histogram(notes, evidx); @@ -1115,10 +1130,7 @@ static int symbol__get_source_line(struct symbol *sym, struct map *map, start = map__rip_2objdump(map, sym->start); for (i = 0; i < len; i++) { - char *path = NULL; - size_t line_len; u64 offset; - FILE *fp; double percent_max = 0.0; src_line->nr_pcnt = nr_pcnt; @@ -1135,23 +1147,9 @@ static int symbol__get_source_line(struct symbol *sym, struct map *map, goto next; offset = start + i; - sprintf(cmd, "addr2line -e %s %016" PRIx64, filename, offset); - fp = popen(cmd, "r"); - if (!fp) - goto next; - - if (getline(&path, &line_len, fp) < 0 || !line_len) - goto next_close; - - src_line->path = malloc(sizeof(char) * line_len + 1); - if (!src_line->path) - goto next_close; - - strcpy(src_line->path, path); + src_line->path = get_srcline(map->dso, offset); insert_source_line(&tmp_root, src_line); - next_close: - pclose(fp); next: src_line = (void *)src_line + sizeof_src_line; } @@ -1192,7 +1190,7 @@ static void print_summary(struct rb_root *root, const char *filename) path = src_line->path; color = get_percent_color(percent_max); - color_fprintf(stdout, color, " %s", path); + color_fprintf(stdout, color, " %s\n", path); node = rb_next(node); } @@ -1356,7 +1354,6 @@ int symbol__tty_annotate(struct symbol *sym, struct map *map, bool full_paths, int min_pcnt, int max_lines) { struct dso *dso = map->dso; - const char *filename = dso->long_name; struct rb_root source_line = RB_ROOT; u64 len; @@ -1366,9 +1363,8 @@ int symbol__tty_annotate(struct symbol *sym, struct map *map, len = symbol__size(sym); if (print_lines) { - symbol__get_source_line(sym, map, evsel, &source_line, - len, filename); - print_summary(&source_line, filename); + symbol__get_source_line(sym, map, evsel, &source_line, len); + print_summary(&source_line, dso->long_name); } symbol__annotate_printf(sym, map, evsel, full_paths, diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h index af755156d278..834b7b57b788 100644 --- a/tools/perf/util/annotate.h +++ b/tools/perf/util/annotate.h @@ -150,7 +150,7 @@ int symbol__tty_annotate(struct symbol *sym, struct map *map, struct perf_evsel *evsel, bool print_lines, bool full_paths, int min_pcnt, int max_lines); -#ifdef SLANG_SUPPORT +#ifdef HAVE_SLANG_SUPPORT int symbol__tui_annotate(struct symbol *sym, struct map *map, struct perf_evsel *evsel, struct hist_browser_timer *hbt); @@ -165,30 +165,6 @@ static inline int symbol__tui_annotate(struct symbol *sym __maybe_unused, } #endif -#ifdef GTK2_SUPPORT -int symbol__gtk_annotate(struct symbol *sym, struct map *map, - struct perf_evsel *evsel, - struct hist_browser_timer *hbt); - -static inline int hist_entry__gtk_annotate(struct hist_entry *he, - struct perf_evsel *evsel, - struct hist_browser_timer *hbt) -{ - return symbol__gtk_annotate(he->ms.sym, he->ms.map, evsel, hbt); -} - -void perf_gtk__show_annotations(void); -#else -static inline int hist_entry__gtk_annotate(struct hist_entry *he __maybe_unused, - struct perf_evsel *evsel __maybe_unused, - struct hist_browser_timer *hbt __maybe_unused) -{ - return 0; -} - -static inline void perf_gtk__show_annotations(void) {} -#endif - extern const char *disassembler_style; #endif /* __PERF_ANNOTATE_H */ diff --git a/tools/perf/util/cache.h b/tools/perf/util/cache.h index 26e367239873..7b176dd02e1a 100644 --- a/tools/perf/util/cache.h +++ b/tools/perf/util/cache.h @@ -70,8 +70,7 @@ extern char *perf_path(const char *fmt, ...) __attribute__((format (printf, 1, 2 extern char *perf_pathdup(const char *fmt, ...) __attribute__((format (printf, 1, 2))); -#ifndef HAVE_STRLCPY +/* Matches the libc/libbsd function attribute so we declare this unconditionally: */ extern size_t strlcpy(char *dest, const char *src, size_t size); -#endif #endif /* __PERF_CACHE_H */ diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c index 482f68081cd8..e3970e3eaacf 100644 --- a/tools/perf/util/callchain.c +++ b/tools/perf/util/callchain.c @@ -21,12 +21,6 @@ __thread struct callchain_cursor callchain_cursor; -#define chain_for_each_child(child, parent) \ - list_for_each_entry(child, &parent->children, siblings) - -#define chain_for_each_child_safe(child, next, parent) \ - list_for_each_entry_safe(child, next, &parent->children, siblings) - static void rb_insert_callchain(struct rb_root *root, struct callchain_node *chain, enum chain_mode mode) @@ -71,10 +65,16 @@ static void __sort_chain_flat(struct rb_root *rb_root, struct callchain_node *node, u64 min_hit) { + struct rb_node *n; struct callchain_node *child; - chain_for_each_child(child, node) + n = rb_first(&node->rb_root_in); + while (n) { + child = rb_entry(n, struct callchain_node, rb_node_in); + n = rb_next(n); + __sort_chain_flat(rb_root, child, min_hit); + } if (node->hit && node->hit >= min_hit) rb_insert_callchain(rb_root, node, CHAIN_FLAT); @@ -94,11 +94,16 @@ sort_chain_flat(struct rb_root *rb_root, struct callchain_root *root, static void __sort_chain_graph_abs(struct callchain_node *node, u64 min_hit) { + struct rb_node *n; struct callchain_node *child; node->rb_root = RB_ROOT; + n = rb_first(&node->rb_root_in); + + while (n) { + child = rb_entry(n, struct callchain_node, rb_node_in); + n = rb_next(n); - chain_for_each_child(child, node) { __sort_chain_graph_abs(child, min_hit); if (callchain_cumul_hits(child) >= min_hit) rb_insert_callchain(&node->rb_root, child, @@ -117,13 +122,18 @@ sort_chain_graph_abs(struct rb_root *rb_root, struct callchain_root *chain_root, static void __sort_chain_graph_rel(struct callchain_node *node, double min_percent) { + struct rb_node *n; struct callchain_node *child; u64 min_hit; node->rb_root = RB_ROOT; min_hit = ceil(node->children_hit * min_percent); - chain_for_each_child(child, node) { + n = rb_first(&node->rb_root_in); + while (n) { + child = rb_entry(n, struct callchain_node, rb_node_in); + n = rb_next(n); + __sort_chain_graph_rel(child, min_percent); if (callchain_cumul_hits(child) >= min_hit) rb_insert_callchain(&node->rb_root, child, @@ -173,19 +183,26 @@ create_child(struct callchain_node *parent, bool inherit_children) return NULL; } new->parent = parent; - INIT_LIST_HEAD(&new->children); INIT_LIST_HEAD(&new->val); if (inherit_children) { - struct callchain_node *next; + struct rb_node *n; + struct callchain_node *child; + + new->rb_root_in = parent->rb_root_in; + parent->rb_root_in = RB_ROOT; - list_splice(&parent->children, &new->children); - INIT_LIST_HEAD(&parent->children); + n = rb_first(&new->rb_root_in); + while (n) { + child = rb_entry(n, struct callchain_node, rb_node_in); + child->parent = new; + n = rb_next(n); + } - chain_for_each_child(next, new) - next->parent = new; + /* make it the first child */ + rb_link_node(&new->rb_node_in, NULL, &parent->rb_root_in.rb_node); + rb_insert_color(&new->rb_node_in, &parent->rb_root_in); } - list_add_tail(&new->siblings, &parent->children); return new; } @@ -223,7 +240,7 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor) } } -static void +static struct callchain_node * add_child(struct callchain_node *parent, struct callchain_cursor *cursor, u64 period) @@ -235,6 +252,19 @@ add_child(struct callchain_node *parent, new->children_hit = 0; new->hit = period; + return new; +} + +static s64 match_chain(struct callchain_cursor_node *node, + struct callchain_list *cnode) +{ + struct symbol *sym = node->sym; + + if (cnode->ms.sym && sym && + callchain_param.key == CCKEY_FUNCTION) + return cnode->ms.sym->start - sym->start; + else + return cnode->ip - node->ip; } /* @@ -272,9 +302,33 @@ split_add_child(struct callchain_node *parent, /* create a new child for the new branch if any */ if (idx_total < cursor->nr) { + struct callchain_node *first; + struct callchain_list *cnode; + struct callchain_cursor_node *node; + struct rb_node *p, **pp; + parent->hit = 0; - add_child(parent, cursor, period); parent->children_hit += period; + + node = callchain_cursor_current(cursor); + new = add_child(parent, cursor, period); + + /* + * This is second child since we moved parent's children + * to new (first) child above. + */ + p = parent->rb_root_in.rb_node; + first = rb_entry(p, struct callchain_node, rb_node_in); + cnode = list_first_entry(&first->val, struct callchain_list, + list); + + if (match_chain(node, cnode) < 0) + pp = &p->rb_left; + else + pp = &p->rb_right; + + rb_link_node(&new->rb_node_in, p, pp); + rb_insert_color(&new->rb_node_in, &parent->rb_root_in); } else { parent->hit = period; } @@ -291,16 +345,40 @@ append_chain_children(struct callchain_node *root, u64 period) { struct callchain_node *rnode; + struct callchain_cursor_node *node; + struct rb_node **p = &root->rb_root_in.rb_node; + struct rb_node *parent = NULL; + + node = callchain_cursor_current(cursor); + if (!node) + return; /* lookup in childrens */ - chain_for_each_child(rnode, root) { - unsigned int ret = append_chain(rnode, cursor, period); + while (*p) { + s64 ret; + struct callchain_list *cnode; - if (!ret) + parent = *p; + rnode = rb_entry(parent, struct callchain_node, rb_node_in); + cnode = list_first_entry(&rnode->val, struct callchain_list, + list); + + /* just check first entry */ + ret = match_chain(node, cnode); + if (ret == 0) { + append_chain(rnode, cursor, period); goto inc_children_hit; + } + + if (ret < 0) + p = &parent->rb_left; + else + p = &parent->rb_right; } /* nothing in children, add to the current node */ - add_child(root, cursor, period); + rnode = add_child(root, cursor, period); + rb_link_node(&rnode->rb_node_in, parent, p); + rb_insert_color(&rnode->rb_node_in, &root->rb_root_in); inc_children_hit: root->children_hit += period; @@ -325,28 +403,20 @@ append_chain(struct callchain_node *root, */ list_for_each_entry(cnode, &root->val, list) { struct callchain_cursor_node *node; - struct symbol *sym; node = callchain_cursor_current(cursor); if (!node) break; - sym = node->sym; - - if (cnode->ms.sym && sym && - callchain_param.key == CCKEY_FUNCTION) { - if (cnode->ms.sym->start != sym->start) - break; - } else if (cnode->ip != node->ip) + if (match_chain(node, cnode) != 0) break; - if (!found) - found = true; + found = true; callchain_cursor_advance(cursor); } - /* matches not, relay on the parent */ + /* matches not, relay no the parent */ if (!found) { cursor->curr = curr_snap; cursor->pos = start; @@ -395,8 +465,9 @@ merge_chain_branch(struct callchain_cursor *cursor, struct callchain_node *dst, struct callchain_node *src) { struct callchain_cursor_node **old_last = cursor->last; - struct callchain_node *child, *next_child; + struct callchain_node *child; struct callchain_list *list, *next_list; + struct rb_node *n; int old_pos = cursor->nr; int err = 0; @@ -412,12 +483,16 @@ merge_chain_branch(struct callchain_cursor *cursor, append_chain_children(dst, cursor, src->hit); } - chain_for_each_child_safe(child, next_child, src) { + n = rb_first(&src->rb_root_in); + while (n) { + child = container_of(n, struct callchain_node, rb_node_in); + n = rb_next(n); + rb_erase(&child->rb_node_in, &src->rb_root_in); + err = merge_chain_branch(cursor, dst, child); if (err) break; - list_del(&child->siblings); free(child); } diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h index 2b585bc308cf..7bb36022377f 100644 --- a/tools/perf/util/callchain.h +++ b/tools/perf/util/callchain.h @@ -21,11 +21,11 @@ enum chain_order { struct callchain_node { struct callchain_node *parent; - struct list_head siblings; - struct list_head children; struct list_head val; - struct rb_node rb_node; /* to sort nodes in an rbtree */ - struct rb_root rb_root; /* sorted tree of children */ + struct rb_node rb_node_in; /* to insert nodes in an rbtree */ + struct rb_node rb_node; /* to sort nodes in an output tree */ + struct rb_root rb_root_in; /* input tree of children */ + struct rb_root rb_root; /* sorted output tree of children */ unsigned int val_nr; u64 hit; u64 children_hit; @@ -86,13 +86,12 @@ extern __thread struct callchain_cursor callchain_cursor; static inline void callchain_init(struct callchain_root *root) { - INIT_LIST_HEAD(&root->node.siblings); - INIT_LIST_HEAD(&root->node.children); INIT_LIST_HEAD(&root->node.val); root->node.parent = NULL; root->node.hit = 0; root->node.children_hit = 0; + root->node.rb_root_in = RB_ROOT; root->max_depth = 0; } diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c new file mode 100644 index 000000000000..7d09faf85cf1 --- /dev/null +++ b/tools/perf/util/data.c @@ -0,0 +1,120 @@ +#include +#include +#include +#include +#include +#include + +#include "data.h" +#include "util.h" + +static bool check_pipe(struct perf_data_file *file) +{ + struct stat st; + bool is_pipe = false; + int fd = perf_data_file__is_read(file) ? + STDIN_FILENO : STDOUT_FILENO; + + if (!file->path) { + if (!fstat(fd, &st) && S_ISFIFO(st.st_mode)) + is_pipe = true; + } else { + if (!strcmp(file->path, "-")) + is_pipe = true; + } + + if (is_pipe) + file->fd = fd; + + return file->is_pipe = is_pipe; +} + +static int check_backup(struct perf_data_file *file) +{ + struct stat st; + + if (!stat(file->path, &st) && st.st_size) { + /* TODO check errors properly */ + char oldname[PATH_MAX]; + snprintf(oldname, sizeof(oldname), "%s.old", + file->path); + unlink(oldname); + rename(file->path, oldname); + } + + return 0; +} + +static int open_file_read(struct perf_data_file *file) +{ + struct stat st; + int fd; + + fd = open(file->path, O_RDONLY); + if (fd < 0) { + int err = errno; + + pr_err("failed to open %s: %s", file->path, strerror(err)); + if (err == ENOENT && !strcmp(file->path, "perf.data")) + pr_err(" (try 'perf record' first)"); + pr_err("\n"); + return -err; + } + + if (fstat(fd, &st) < 0) + goto out_close; + + if (!file->force && st.st_uid && (st.st_uid != geteuid())) { + pr_err("file %s not owned by current user or root\n", + file->path); + goto out_close; + } + + if (!st.st_size) { + pr_info("zero-sized file (%s), nothing to do!\n", + file->path); + goto out_close; + } + + file->size = st.st_size; + return fd; + + out_close: + close(fd); + return -1; +} + +static int open_file_write(struct perf_data_file *file) +{ + if (check_backup(file)) + return -1; + + return open(file->path, O_CREAT|O_RDWR|O_TRUNC, S_IRUSR|S_IWUSR); +} + +static int open_file(struct perf_data_file *file) +{ + int fd; + + fd = perf_data_file__is_read(file) ? + open_file_read(file) : open_file_write(file); + + file->fd = fd; + return fd < 0 ? -1 : 0; +} + +int perf_data_file__open(struct perf_data_file *file) +{ + if (check_pipe(file)) + return 0; + + if (!file->path) + file->path = "perf.data"; + + return open_file(file); +} + +void perf_data_file__close(struct perf_data_file *file) +{ + close(file->fd); +} diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h new file mode 100644 index 000000000000..8c2df80152a5 --- /dev/null +++ b/tools/perf/util/data.h @@ -0,0 +1,48 @@ +#ifndef __PERF_DATA_H +#define __PERF_DATA_H + +#include + +enum perf_data_mode { + PERF_DATA_MODE_WRITE, + PERF_DATA_MODE_READ, +}; + +struct perf_data_file { + const char *path; + int fd; + bool is_pipe; + bool force; + unsigned long size; + enum perf_data_mode mode; +}; + +static inline bool perf_data_file__is_read(struct perf_data_file *file) +{ + return file->mode == PERF_DATA_MODE_READ; +} + +static inline bool perf_data_file__is_write(struct perf_data_file *file) +{ + return file->mode == PERF_DATA_MODE_WRITE; +} + +static inline int perf_data_file__is_pipe(struct perf_data_file *file) +{ + return file->is_pipe; +} + +static inline int perf_data_file__fd(struct perf_data_file *file) +{ + return file->fd; +} + +static inline unsigned long perf_data_file__size(struct perf_data_file *file) +{ + return file->size; +} + +int perf_data_file__open(struct perf_data_file *file); +void perf_data_file__close(struct perf_data_file *file); + +#endif /* __PERF_DATA_H */ diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c index e3c1ff8512c8..af4c687cc49b 100644 --- a/tools/perf/util/dso.c +++ b/tools/perf/util/dso.c @@ -7,19 +7,20 @@ char dso__symtab_origin(const struct dso *dso) { static const char origin[] = { - [DSO_BINARY_TYPE__KALLSYMS] = 'k', - [DSO_BINARY_TYPE__VMLINUX] = 'v', - [DSO_BINARY_TYPE__JAVA_JIT] = 'j', - [DSO_BINARY_TYPE__DEBUGLINK] = 'l', - [DSO_BINARY_TYPE__BUILD_ID_CACHE] = 'B', - [DSO_BINARY_TYPE__FEDORA_DEBUGINFO] = 'f', - [DSO_BINARY_TYPE__UBUNTU_DEBUGINFO] = 'u', - [DSO_BINARY_TYPE__BUILDID_DEBUGINFO] = 'b', - [DSO_BINARY_TYPE__SYSTEM_PATH_DSO] = 'd', - [DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE] = 'K', - [DSO_BINARY_TYPE__GUEST_KALLSYMS] = 'g', - [DSO_BINARY_TYPE__GUEST_KMODULE] = 'G', - [DSO_BINARY_TYPE__GUEST_VMLINUX] = 'V', + [DSO_BINARY_TYPE__KALLSYMS] = 'k', + [DSO_BINARY_TYPE__VMLINUX] = 'v', + [DSO_BINARY_TYPE__JAVA_JIT] = 'j', + [DSO_BINARY_TYPE__DEBUGLINK] = 'l', + [DSO_BINARY_TYPE__BUILD_ID_CACHE] = 'B', + [DSO_BINARY_TYPE__FEDORA_DEBUGINFO] = 'f', + [DSO_BINARY_TYPE__UBUNTU_DEBUGINFO] = 'u', + [DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO] = 'o', + [DSO_BINARY_TYPE__BUILDID_DEBUGINFO] = 'b', + [DSO_BINARY_TYPE__SYSTEM_PATH_DSO] = 'd', + [DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE] = 'K', + [DSO_BINARY_TYPE__GUEST_KALLSYMS] = 'g', + [DSO_BINARY_TYPE__GUEST_KMODULE] = 'G', + [DSO_BINARY_TYPE__GUEST_VMLINUX] = 'V', }; if (dso == NULL || dso->symtab_type == DSO_BINARY_TYPE__NOT_FOUND) @@ -64,6 +65,28 @@ int dso__binary_type_file(struct dso *dso, enum dso_binary_type type, symbol_conf.symfs, dso->long_name); break; + case DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO: + { + char *last_slash; + size_t len; + size_t dir_size; + + last_slash = dso->long_name + dso->long_name_len; + while (last_slash != dso->long_name && *last_slash != '/') + last_slash--; + + len = scnprintf(file, size, "%s", symbol_conf.symfs); + dir_size = last_slash - dso->long_name + 2; + if (dir_size > (size - len)) { + ret = -1; + break; + } + len += scnprintf(file + len, dir_size, "%s", dso->long_name); + len += scnprintf(file + len , size - len, ".debug%s", + last_slash); + break; + } + case DSO_BINARY_TYPE__BUILDID_DEBUGINFO: if (!dso->has_build_id) { ret = -1; @@ -427,6 +450,7 @@ struct dso *dso__new(const char *name) dso->rel = 0; dso->sorted_by_name = 0; dso->has_build_id = 0; + dso->has_srcline = 1; dso->kernel = DSO_TYPE_USER; dso->needs_swap = DSO_SWAP__UNSET; INIT_LIST_HEAD(&dso->node); diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h index b793053335d6..9ac666abbe7e 100644 --- a/tools/perf/util/dso.h +++ b/tools/perf/util/dso.h @@ -6,6 +6,7 @@ #include #include "types.h" #include "map.h" +#include "build-id.h" enum dso_binary_type { DSO_BINARY_TYPE__KALLSYMS = 0, @@ -23,6 +24,7 @@ enum dso_binary_type { DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE, DSO_BINARY_TYPE__KCORE, DSO_BINARY_TYPE__GUEST_KCORE, + DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO, DSO_BINARY_TYPE__NOT_FOUND, }; @@ -81,6 +83,7 @@ struct dso { enum dso_binary_type data_type; u8 adjust_symbols:1; u8 has_build_id:1; + u8 has_srcline:1; u8 hit:1; u8 annotate_warned:1; u8 sname_alloc:1; diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c index e23bde19d590..7defd77105d0 100644 --- a/tools/perf/util/dwarf-aux.c +++ b/tools/perf/util/dwarf-aux.c @@ -426,7 +426,7 @@ static int __die_search_func_cb(Dwarf_Die *fn_die, void *data) * @die_mem: a buffer for result DIE * * Search a non-inlined function DIE which includes @addr. Stores the - * DIE to @die_mem and returns it if found. Returns NULl if failed. + * DIE to @die_mem and returns it if found. Returns NULL if failed. */ Dwarf_Die *die_find_realfunc(Dwarf_Die *cu_die, Dwarf_Addr addr, Dwarf_Die *die_mem) @@ -453,16 +453,33 @@ static int __die_find_inline_cb(Dwarf_Die *die_mem, void *data) return DIE_FIND_CB_CONTINUE; } +/** + * die_find_top_inlinefunc - Search the top inlined function at given address + * @sp_die: a subprogram DIE which including @addr + * @addr: target address + * @die_mem: a buffer for result DIE + * + * Search an inlined function DIE which includes @addr. Stores the + * DIE to @die_mem and returns it if found. Returns NULL if failed. + * Even if several inlined functions are expanded recursively, this + * doesn't trace it down, and returns the topmost one. + */ +Dwarf_Die *die_find_top_inlinefunc(Dwarf_Die *sp_die, Dwarf_Addr addr, + Dwarf_Die *die_mem) +{ + return die_find_child(sp_die, __die_find_inline_cb, &addr, die_mem); +} + /** * die_find_inlinefunc - Search an inlined function at given address - * @cu_die: a CU DIE which including @addr + * @sp_die: a subprogram DIE which including @addr * @addr: target address * @die_mem: a buffer for result DIE * * Search an inlined function DIE which includes @addr. Stores the - * DIE to @die_mem and returns it if found. Returns NULl if failed. + * DIE to @die_mem and returns it if found. Returns NULL if failed. * If several inlined functions are expanded recursively, this trace - * it and returns deepest one. + * it down and returns deepest one. */ Dwarf_Die *die_find_inlinefunc(Dwarf_Die *sp_die, Dwarf_Addr addr, Dwarf_Die *die_mem) diff --git a/tools/perf/util/dwarf-aux.h b/tools/perf/util/dwarf-aux.h index 8658d41697d2..b4fe90c6cb2d 100644 --- a/tools/perf/util/dwarf-aux.h +++ b/tools/perf/util/dwarf-aux.h @@ -79,7 +79,11 @@ extern Dwarf_Die *die_find_child(Dwarf_Die *rt_die, extern Dwarf_Die *die_find_realfunc(Dwarf_Die *cu_die, Dwarf_Addr addr, Dwarf_Die *die_mem); -/* Search an inlined function including given address */ +/* Search the top inlined function including given address */ +extern Dwarf_Die *die_find_top_inlinefunc(Dwarf_Die *sp_die, Dwarf_Addr addr, + Dwarf_Die *die_mem); + +/* Search the deepest inlined function including given address */ extern Dwarf_Die *die_find_inlinefunc(Dwarf_Die *sp_die, Dwarf_Addr addr, Dwarf_Die *die_mem); diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c index 9b393e7dca6f..63df031fc9c7 100644 --- a/tools/perf/util/event.c +++ b/tools/perf/util/event.c @@ -187,7 +187,7 @@ static int perf_event__synthesize_mmap_events(struct perf_tool *tool, return -1; } - event->header.type = PERF_RECORD_MMAP2; + event->header.type = PERF_RECORD_MMAP; /* * Just like the kernel, see __perf_event_mmap in kernel/perf_event.c */ @@ -198,7 +198,6 @@ static int perf_event__synthesize_mmap_events(struct perf_tool *tool, char prot[5]; char execname[PATH_MAX]; char anonstr[] = "//anon"; - unsigned int ino; size_t size; ssize_t n; @@ -209,13 +208,10 @@ static int perf_event__synthesize_mmap_events(struct perf_tool *tool, strcpy(execname, ""); /* 00400000-0040c000 r-xp 00000000 fd:01 41038 /bin/cat */ - n = sscanf(bf, "%"PRIx64"-%"PRIx64" %s %"PRIx64" %x:%x %u %s\n", - &event->mmap2.start, &event->mmap2.len, prot, - &event->mmap2.pgoff, &event->mmap2.maj, - &event->mmap2.min, - &ino, execname); - - event->mmap2.ino = (u64)ino; + n = sscanf(bf, "%"PRIx64"-%"PRIx64" %s %"PRIx64" %*x:%*x %*u %s\n", + &event->mmap.start, &event->mmap.len, prot, + &event->mmap.pgoff, + execname); if (n != 8) continue; @@ -227,15 +223,15 @@ static int perf_event__synthesize_mmap_events(struct perf_tool *tool, strcpy(execname, anonstr); size = strlen(execname) + 1; - memcpy(event->mmap2.filename, execname, size); + memcpy(event->mmap.filename, execname, size); size = PERF_ALIGN(size, sizeof(u64)); - event->mmap2.len -= event->mmap.start; - event->mmap2.header.size = (sizeof(event->mmap2) - - (sizeof(event->mmap2.filename) - size)); - memset(event->mmap2.filename + size, 0, machine->id_hdr_size); - event->mmap2.header.size += machine->id_hdr_size; - event->mmap2.pid = tgid; - event->mmap2.tid = pid; + event->mmap.len -= event->mmap.start; + event->mmap.header.size = (sizeof(event->mmap) - + (sizeof(event->mmap.filename) - size)); + memset(event->mmap.filename + size, 0, machine->id_hdr_size); + event->mmap.header.size += machine->id_hdr_size; + event->mmap.pid = tgid; + event->mmap.tid = pid; if (process(tool, event, &synth_sample, machine) != 0) { rc = -1; diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h index c67ecc457d29..752709ccfb00 100644 --- a/tools/perf/util/event.h +++ b/tools/perf/util/event.h @@ -61,6 +61,12 @@ struct read_event { u64 id; }; +struct throttle_event { + struct perf_event_header header; + u64 time; + u64 id; + u64 stream_id; +}; #define PERF_SAMPLE_MASK \ (PERF_SAMPLE_IP | PERF_SAMPLE_TID | \ @@ -69,6 +75,9 @@ struct read_event { PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD | \ PERF_SAMPLE_IDENTIFIER) +/* perf sample has 16 bits size limit */ +#define PERF_SAMPLE_MAX_SIZE (1 << 16) + struct sample_event { struct perf_event_header header; u64 array[]; @@ -111,6 +120,7 @@ struct perf_sample { u64 stream_id; u64 period; u64 weight; + u64 transaction; u32 cpu; u32 raw_size; u64 data_src; @@ -177,6 +187,7 @@ union perf_event { struct fork_event fork; struct lost_event lost; struct read_event read; + struct throttle_event throttle; struct sample_event sample; struct attr_event attr; struct event_type_event event_type; diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index f9f77bee0b1b..85c4c80bcac8 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -18,6 +18,7 @@ #include #include "parse-events.h" +#include "parse-options.h" #include @@ -49,6 +50,18 @@ struct perf_evlist *perf_evlist__new(void) return evlist; } +struct perf_evlist *perf_evlist__new_default(void) +{ + struct perf_evlist *evlist = perf_evlist__new(); + + if (evlist && perf_evlist__add_default(evlist)) { + perf_evlist__delete(evlist); + evlist = NULL; + } + + return evlist; +} + /** * perf_evlist__set_id_pos - set the positions of event ids. * @evlist: selected event list @@ -527,7 +540,7 @@ union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx) if ((old & md->mask) + size != ((old + size) & md->mask)) { unsigned int offset = old; unsigned int len = min(sizeof(*event), size), cpy; - void *dst = &md->event_copy; + void *dst = md->event_copy; do { cpy = min(md->mask + 1 - (offset & md->mask), len); @@ -537,7 +550,7 @@ union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx) len -= cpy; } while (len); - event = &md->event_copy; + event = (union perf_event *) md->event_copy; } old += size; @@ -595,9 +608,36 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, return 0; } -static int perf_evlist__mmap_per_cpu(struct perf_evlist *evlist, int prot, int mask) +static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx, + int prot, int mask, int cpu, int thread, + int *output) { struct perf_evsel *evsel; + + list_for_each_entry(evsel, &evlist->entries, node) { + int fd = FD(evsel, cpu, thread); + + if (*output == -1) { + *output = fd; + if (__perf_evlist__mmap(evlist, idx, prot, mask, + *output) < 0) + return -1; + } else { + if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0) + return -1; + } + + if ((evsel->attr.read_format & PERF_FORMAT_ID) && + perf_evlist__id_add_fd(evlist, evsel, cpu, thread, fd) < 0) + return -1; + } + + return 0; +} + +static int perf_evlist__mmap_per_cpu(struct perf_evlist *evlist, int prot, + int mask) +{ int cpu, thread; int nr_cpus = cpu_map__nr(evlist->cpus); int nr_threads = thread_map__nr(evlist->threads); @@ -607,23 +647,9 @@ static int perf_evlist__mmap_per_cpu(struct perf_evlist *evlist, int prot, int m int output = -1; for (thread = 0; thread < nr_threads; thread++) { - list_for_each_entry(evsel, &evlist->entries, node) { - int fd = FD(evsel, cpu, thread); - - if (output == -1) { - output = fd; - if (__perf_evlist__mmap(evlist, cpu, - prot, mask, output) < 0) - goto out_unmap; - } else { - if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, output) != 0) - goto out_unmap; - } - - if ((evsel->attr.read_format & PERF_FORMAT_ID) && - perf_evlist__id_add_fd(evlist, evsel, cpu, thread, fd) < 0) - goto out_unmap; - } + if (perf_evlist__mmap_per_evsel(evlist, cpu, prot, mask, + cpu, thread, &output)) + goto out_unmap; } } @@ -635,9 +661,9 @@ out_unmap: return -1; } -static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist, int prot, int mask) +static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist, int prot, + int mask) { - struct perf_evsel *evsel; int thread; int nr_threads = thread_map__nr(evlist->threads); @@ -645,23 +671,9 @@ static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist, int prot, in for (thread = 0; thread < nr_threads; thread++) { int output = -1; - list_for_each_entry(evsel, &evlist->entries, node) { - int fd = FD(evsel, 0, thread); - - if (output == -1) { - output = fd; - if (__perf_evlist__mmap(evlist, thread, - prot, mask, output) < 0) - goto out_unmap; - } else { - if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, output) != 0) - goto out_unmap; - } - - if ((evsel->attr.read_format & PERF_FORMAT_ID) && - perf_evlist__id_add_fd(evlist, evsel, 0, thread, fd) < 0) - goto out_unmap; - } + if (perf_evlist__mmap_per_evsel(evlist, thread, prot, mask, 0, + thread, &output)) + goto out_unmap; } return 0; @@ -672,20 +684,70 @@ out_unmap: return -1; } -/** perf_evlist__mmap - Create per cpu maps to receive events - * - * @evlist - list of events - * @pages - map length in pages - * @overwrite - overwrite older events? - * - * If overwrite is false the user needs to signal event consuption using: - * - * struct perf_mmap *m = &evlist->mmap[cpu]; - * unsigned int head = perf_mmap__read_head(m); +static size_t perf_evlist__mmap_size(unsigned long pages) +{ + /* 512 kiB: default amount of unprivileged mlocked memory */ + if (pages == UINT_MAX) + pages = (512 * 1024) / page_size; + else if (!is_power_of_2(pages)) + return 0; + + return (pages + 1) * page_size; +} + +int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str, + int unset __maybe_unused) +{ + unsigned int pages, val, *mmap_pages = opt->value; + size_t size; + static struct parse_tag tags[] = { + { .tag = 'B', .mult = 1 }, + { .tag = 'K', .mult = 1 << 10 }, + { .tag = 'M', .mult = 1 << 20 }, + { .tag = 'G', .mult = 1 << 30 }, + { .tag = 0 }, + }; + + val = parse_tag_value(str, tags); + if (val != (unsigned int) -1) { + /* we got file size value */ + pages = PERF_ALIGN(val, page_size) / page_size; + if (!is_power_of_2(pages)) { + pages = next_pow2(pages); + pr_info("rounding mmap pages size to %u (%u pages)\n", + pages * page_size, pages); + } + } else { + /* we got pages count value */ + char *eptr; + pages = strtoul(str, &eptr, 10); + if (*eptr != '\0') { + pr_err("failed to parse --mmap_pages/-m value\n"); + return -1; + } + } + + size = perf_evlist__mmap_size(pages); + if (!size) { + pr_err("--mmap_pages/-m value must be a power of two."); + return -1; + } + + *mmap_pages = pages; + return 0; +} + +/** + * perf_evlist__mmap - Create mmaps to receive events. + * @evlist: list of events + * @pages: map length in pages + * @overwrite: overwrite older events? * - * perf_mmap__write_tail(m, head) + * If @overwrite is %false the user needs to signal event consumption using + * perf_mmap__write_tail(). Using perf_evlist__mmap_read() does this + * automatically. * - * Using perf_evlist__read_on_cpu does this automatically. + * Return: %0 on success, negative error code otherwise. */ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages, bool overwrite) @@ -695,14 +757,6 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages, const struct thread_map *threads = evlist->threads; int prot = PROT_READ | (overwrite ? 0 : PROT_WRITE), mask; - /* 512 kiB: default amount of unprivileged mlocked memory */ - if (pages == UINT_MAX) - pages = (512 * 1024) / page_size; - else if (!is_power_of_2(pages)) - return -EINVAL; - - mask = pages * page_size - 1; - if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist) < 0) return -ENOMEM; @@ -710,7 +764,9 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages, return -ENOMEM; evlist->overwrite = overwrite; - evlist->mmap_len = (pages + 1) * page_size; + evlist->mmap_len = perf_evlist__mmap_size(pages); + pr_debug("mmap size %zuB\n", evlist->mmap_len); + mask = evlist->mmap_len - page_size - 1; list_for_each_entry(evsel, &evlist->entries, node) { if ((evsel->attr.read_format & PERF_FORMAT_ID) && @@ -1066,3 +1122,66 @@ size_t perf_evlist__fprintf(struct perf_evlist *evlist, FILE *fp) return printed + fprintf(fp, "\n");; } + +int perf_evlist__strerror_tp(struct perf_evlist *evlist __maybe_unused, + int err, char *buf, size_t size) +{ + char sbuf[128]; + + switch (err) { + case ENOENT: + scnprintf(buf, size, "%s", + "Error:\tUnable to find debugfs\n" + "Hint:\tWas your kernel was compiled with debugfs support?\n" + "Hint:\tIs the debugfs filesystem mounted?\n" + "Hint:\tTry 'sudo mount -t debugfs nodev /sys/kernel/debug'"); + break; + case EACCES: + scnprintf(buf, size, + "Error:\tNo permissions to read %s/tracing/events/raw_syscalls\n" + "Hint:\tTry 'sudo mount -o remount,mode=755 %s'\n", + debugfs_mountpoint, debugfs_mountpoint); + break; + default: + scnprintf(buf, size, "%s", strerror_r(err, sbuf, sizeof(sbuf))); + break; + } + + return 0; +} + +int perf_evlist__strerror_open(struct perf_evlist *evlist __maybe_unused, + int err, char *buf, size_t size) +{ + int printed, value; + char sbuf[128], *emsg = strerror_r(err, sbuf, sizeof(sbuf)); + + switch (err) { + case EACCES: + case EPERM: + printed = scnprintf(buf, size, + "Error:\t%s.\n" + "Hint:\tCheck /proc/sys/kernel/perf_event_paranoid setting.", emsg); + + if (filename__read_int("/proc/sys/kernel/perf_event_paranoid", &value)) + break; + + printed += scnprintf(buf + printed, size - printed, "\nHint:\t"); + + if (value >= 2) { + printed += scnprintf(buf + printed, size - printed, + "For your workloads it needs to be <= 1\nHint:\t"); + } + printed += scnprintf(buf + printed, size - printed, + "For system wide tracing it needs to be set to -1"); + + printed += scnprintf(buf + printed, size - printed, + ".\nHint:\tThe current value is %d.", value); + break; + default: + scnprintf(buf, size, "%s", emsg); + break; + } + + return 0; +} diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index 880d7139d2fb..7f8f1aeb9cfe 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -21,7 +21,7 @@ struct perf_mmap { void *base; int mask; unsigned int prev; - union perf_event event_copy; + char event_copy[PERF_SAMPLE_MAX_SIZE]; }; struct perf_evlist { @@ -31,7 +31,7 @@ struct perf_evlist { int nr_groups; int nr_fds; int nr_mmaps; - int mmap_len; + size_t mmap_len; int id_pos; int is_pos; u64 combined_sample_type; @@ -53,6 +53,7 @@ struct perf_evsel_str_handler { }; struct perf_evlist *perf_evlist__new(void); +struct perf_evlist *perf_evlist__new_default(void); void perf_evlist__init(struct perf_evlist *evlist, struct cpu_map *cpus, struct thread_map *threads); void perf_evlist__exit(struct perf_evlist *evlist); @@ -103,6 +104,10 @@ int perf_evlist__prepare_workload(struct perf_evlist *evlist, bool want_signal); int perf_evlist__start_workload(struct perf_evlist *evlist); +int perf_evlist__parse_mmap_pages(const struct option *opt, + const char *str, + int unset); + int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages, bool overwrite); void perf_evlist__munmap(struct perf_evlist *evlist); @@ -163,6 +168,9 @@ static inline struct perf_evsel *perf_evlist__last(struct perf_evlist *evlist) size_t perf_evlist__fprintf(struct perf_evlist *evlist, FILE *fp); +int perf_evlist__strerror_tp(struct perf_evlist *evlist, int err, char *buf, size_t size); +int perf_evlist__strerror_open(struct perf_evlist *evlist, int err, char *buf, size_t size); + static inline unsigned int perf_mmap__read_head(struct perf_mmap *mm) { struct perf_event_mmap_page *pc = mm->base; diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 0ce9febf1ba0..3a334f001997 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -678,9 +678,11 @@ void perf_evsel__config(struct perf_evsel *evsel, attr->sample_type |= PERF_SAMPLE_WEIGHT; attr->mmap = track; - attr->mmap2 = track && !perf_missing_features.mmap2; attr->comm = track; + if (opts->sample_transaction) + attr->sample_type |= PERF_SAMPLE_TRANSACTION; + /* * XXX see the function comment above * @@ -983,6 +985,7 @@ static size_t perf_event_attr__fprintf(struct perf_event_attr *attr, FILE *fp) ret += PRINT_ATTR2(exclude_host, exclude_guest); ret += PRINT_ATTR2N("excl.callchain_kern", exclude_callchain_kernel, "excl.callchain_user", exclude_callchain_user); + ret += PRINT_ATTR_U32(mmap2); ret += PRINT_ATTR_U32(wakeup_events); ret += PRINT_ATTR_U32(wakeup_watermark); @@ -1214,6 +1217,7 @@ static int perf_evsel__parse_id_sample(const struct perf_evsel *evsel, sample->pid = u.val32[0]; sample->tid = u.val32[1]; + array--; } return 0; @@ -1453,6 +1457,9 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event, array = (void *)array + sz; OVERFLOW_CHECK_u64(array); data->user_stack.size = *array++; + if (WARN_ONCE(data->user_stack.size > sz, + "user stack dump failure\n")) + return -EFAULT; } } @@ -1470,6 +1477,12 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event, array++; } + data->transaction = 0; + if (type & PERF_SAMPLE_TRANSACTION) { + data->transaction = *array; + array++; + } + return 0; } diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h index 4a7bdc713bab..5aa68cddc7d9 100644 --- a/tools/perf/util/evsel.h +++ b/tools/perf/util/evsel.h @@ -197,6 +197,12 @@ static inline bool perf_evsel__match2(struct perf_evsel *e1, (e1->attr.config == e2->attr.config); } +#define perf_evsel__cmp(a, b) \ + ((a) && \ + (b) && \ + (a)->attr.type == (b)->attr.type && \ + (a)->attr.config == (b)->attr.config) + int __perf_evsel__read_on_cpu(struct perf_evsel *evsel, int cpu, int thread, bool scale); diff --git a/tools/perf/util/generate-cmdlist.sh b/tools/perf/util/generate-cmdlist.sh index 3ac38031d534..36a885d2cd22 100755 --- a/tools/perf/util/generate-cmdlist.sh +++ b/tools/perf/util/generate-cmdlist.sh @@ -22,7 +22,7 @@ do }' "Documentation/perf-$cmd.txt" done -echo "#ifdef LIBELF_SUPPORT" +echo "#ifdef HAVE_LIBELF_SUPPORT" sed -n -e 's/^perf-\([^ ]*\)[ ].* full.*/\1/p' command-list.txt | sort | while read cmd @@ -35,5 +35,5 @@ do p }' "Documentation/perf-$cmd.txt" done -echo "#endif /* LIBELF_SUPPORT */" +echo "#endif /* HAVE_LIBELF_SUPPORT */" echo "};" diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c index ce69901176d8..26d9520a0c1b 100644 --- a/tools/perf/util/header.c +++ b/tools/perf/util/header.c @@ -22,6 +22,7 @@ #include "vdso.h" #include "strbuf.h" #include "build-id.h" +#include "data.h" static bool no_buildid_cache = false; @@ -2189,7 +2190,7 @@ int perf_header__fprintf_info(struct perf_session *session, FILE *fp, bool full) { struct header_print_data hd; struct perf_header *header = &session->header; - int fd = session->fd; + int fd = perf_data_file__fd(session->file); hd.fp = fp; hd.full = full; @@ -2650,7 +2651,8 @@ static int perf_header__read_pipe(struct perf_session *session) struct perf_header *header = &session->header; struct perf_pipe_file_header f_header; - if (perf_file_header__read_pipe(&f_header, header, session->fd, + if (perf_file_header__read_pipe(&f_header, header, + perf_data_file__fd(session->file), session->repipe) < 0) { pr_debug("incompatible file format\n"); return -EINVAL; @@ -2751,23 +2753,36 @@ static int perf_evlist__prepare_tracepoint_events(struct perf_evlist *evlist, int perf_session__read_header(struct perf_session *session) { + struct perf_data_file *file = session->file; struct perf_header *header = &session->header; struct perf_file_header f_header; struct perf_file_attr f_attr; u64 f_id; int nr_attrs, nr_ids, i, j; - int fd = session->fd; + int fd = perf_data_file__fd(file); session->evlist = perf_evlist__new(); if (session->evlist == NULL) return -ENOMEM; - if (session->fd_pipe) + if (perf_data_file__is_pipe(file)) return perf_header__read_pipe(session); if (perf_file_header__read(&f_header, header, fd) < 0) return -EINVAL; + /* + * Sanity check that perf.data was written cleanly; data size is + * initialized to 0 and updated only if the on_exit function is run. + * If data size is still 0 then the file contains only partial + * information. Just warn user and process it as much as it can. + */ + if (f_header.data.size == 0) { + pr_warning("WARNING: The %s file's data size field is 0 which is unexpected.\n" + "Was the 'perf record' command properly terminated?\n", + file->path); + } + nr_attrs = f_header.attrs.size / f_header.attr_size; lseek(fd, f_header.attrs.offset, SEEK_SET); @@ -2978,18 +2993,19 @@ int perf_event__process_tracing_data(struct perf_tool *tool __maybe_unused, struct perf_session *session) { ssize_t size_read, padding, size = event->tracing_data.size; - off_t offset = lseek(session->fd, 0, SEEK_CUR); + int fd = perf_data_file__fd(session->file); + off_t offset = lseek(fd, 0, SEEK_CUR); char buf[BUFSIZ]; /* setup for reading amidst mmap */ - lseek(session->fd, offset + sizeof(struct tracing_data_event), + lseek(fd, offset + sizeof(struct tracing_data_event), SEEK_SET); - size_read = trace_report(session->fd, &session->pevent, + size_read = trace_report(fd, &session->pevent, session->repipe); padding = PERF_ALIGN(size_read, sizeof(u64)) - size_read; - if (readn(session->fd, buf, padding) < 0) { + if (readn(fd, buf, padding) < 0) { pr_err("%s: reading input file", __func__); return -1; } diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index 9ff6cf3e9a99..cca03831f41a 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -160,6 +160,10 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h) hists__new_col_len(hists, HISTC_MEM_LVL, 21 + 3); hists__new_col_len(hists, HISTC_LOCAL_WEIGHT, 12); hists__new_col_len(hists, HISTC_GLOBAL_WEIGHT, 12); + + if (h->transaction) + hists__new_col_len(hists, HISTC_TRANSACTION, + hist_entry__transaction_len()); } void hists__output_recalc_col_len(struct hists *hists, int max_rows) @@ -346,7 +350,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists, struct rb_node **p; struct rb_node *parent = NULL; struct hist_entry *he; - int cmp; + int64_t cmp; p = &hists->entries_in->rb_node; @@ -466,7 +470,7 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self, struct hist_entry *__hists__add_entry(struct hists *self, struct addr_location *al, struct symbol *sym_parent, u64 period, - u64 weight) + u64 weight, u64 transaction) { struct hist_entry entry = { .thread = al->thread, @@ -487,6 +491,7 @@ struct hist_entry *__hists__add_entry(struct hists *self, .hists = self, .branch_info = NULL, .mem_info = NULL, + .transaction = transaction, }; return add_hist_entry(self, &entry, al, period, weight); @@ -530,6 +535,7 @@ void hist_entry__free(struct hist_entry *he) { free(he->branch_info); free(he->mem_info); + free_srcline(he->srcline); free(he); } @@ -884,7 +890,7 @@ static struct hist_entry *hists__add_dummy_entry(struct hists *hists, struct rb_node **p; struct rb_node *parent = NULL; struct hist_entry *he; - int cmp; + int64_t cmp; if (sort__need_collapse) root = &hists->entries_collapsed; diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h index 1329b6b6ffe6..20b175808cd3 100644 --- a/tools/perf/util/hist.h +++ b/tools/perf/util/hist.h @@ -45,6 +45,8 @@ enum hist_column { HISTC_CPU, HISTC_SRCLINE, HISTC_MISPREDICT, + HISTC_IN_TX, + HISTC_ABORT, HISTC_SYMBOL_FROM, HISTC_SYMBOL_TO, HISTC_DSO_FROM, @@ -57,6 +59,7 @@ enum hist_column { HISTC_MEM_TLB, HISTC_MEM_LVL, HISTC_MEM_SNOOP, + HISTC_TRANSACTION, HISTC_NR_COLS, /* Last entry */ }; @@ -82,9 +85,10 @@ struct hists { struct hist_entry *__hists__add_entry(struct hists *self, struct addr_location *al, struct symbol *parent, u64 period, - u64 weight); + u64 weight, u64 transaction); int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right); int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right); +int hist_entry__transaction_len(void); int hist_entry__sort_snprintf(struct hist_entry *self, char *bf, size_t size, struct hists *hists); void hist_entry__free(struct hist_entry *); @@ -183,7 +187,7 @@ struct hist_browser_timer { int refresh; }; -#ifdef SLANG_SUPPORT +#ifdef HAVE_SLANG_SUPPORT #include "../ui/keysyms.h" int hist_entry__tui_annotate(struct hist_entry *he, struct perf_evsel *evsel, struct hist_browser_timer *hbt); @@ -224,20 +228,5 @@ static inline int script_browse(const char *script_opt __maybe_unused) #define K_SWITCH_INPUT_DATA -3000 #endif -#ifdef GTK2_SUPPORT -int perf_evlist__gtk_browse_hists(struct perf_evlist *evlist, const char *help, - struct hist_browser_timer *hbt __maybe_unused, - float min_pcnt); -#else -static inline -int perf_evlist__gtk_browse_hists(struct perf_evlist *evlist __maybe_unused, - const char *help __maybe_unused, - struct hist_browser_timer *hbt __maybe_unused, - float min_pcnt __maybe_unused) -{ - return 0; -} -#endif - unsigned int hists__sort_list_width(struct hists *self); #endif /* __PERF_HIST_H */ diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h index cf6727e99c44..8f149655f497 100644 --- a/tools/perf/util/include/dwarf-regs.h +++ b/tools/perf/util/include/dwarf-regs.h @@ -1,7 +1,7 @@ #ifndef _PERF_DWARF_REGS_H_ #define _PERF_DWARF_REGS_H_ -#ifdef DWARF_SUPPORT +#ifdef HAVE_DWARF_SUPPORT const char *get_arch_regstr(unsigned int n); #endif diff --git a/tools/perf/util/include/linux/compiler.h b/tools/perf/util/include/linux/compiler.h index 96b919dae11c..b003ad7200b2 100644 --- a/tools/perf/util/include/linux/compiler.h +++ b/tools/perf/util/include/linux/compiler.h @@ -2,20 +2,29 @@ #define _PERF_LINUX_COMPILER_H_ #ifndef __always_inline -#define __always_inline inline +# define __always_inline inline __attribute__((always_inline)) #endif + #define __user + #ifndef __attribute_const__ -#define __attribute_const__ +# define __attribute_const__ #endif #ifndef __maybe_unused -#define __maybe_unused __attribute__((unused)) +# define __maybe_unused __attribute__((unused)) +#endif + +#ifndef __packed +# define __packed __attribute__((__packed__)) #endif -#define __packed __attribute__((__packed__)) #ifndef __force -#define __force +# define __force +#endif + +#ifndef __weak +# define __weak __attribute__((weak)) #endif #endif diff --git a/tools/perf/util/intlist.c b/tools/perf/util/intlist.c index 11a8d86f7fea..89715b64a315 100644 --- a/tools/perf/util/intlist.c +++ b/tools/perf/util/intlist.c @@ -20,6 +20,7 @@ static struct rb_node *intlist__node_new(struct rblist *rblist __maybe_unused, if (node != NULL) { node->i = i; + node->priv = NULL; rc = &node->rb_node; } @@ -57,22 +58,36 @@ void intlist__remove(struct intlist *ilist, struct int_node *node) rblist__remove_node(&ilist->rblist, &node->rb_node); } -struct int_node *intlist__find(struct intlist *ilist, int i) +static struct int_node *__intlist__findnew(struct intlist *ilist, + int i, bool create) { - struct int_node *node; + struct int_node *node = NULL; struct rb_node *rb_node; if (ilist == NULL) return NULL; - node = NULL; - rb_node = rblist__find(&ilist->rblist, (void *)((long)i)); + if (create) + rb_node = rblist__findnew(&ilist->rblist, (void *)((long)i)); + else + rb_node = rblist__find(&ilist->rblist, (void *)((long)i)); + if (rb_node) node = container_of(rb_node, struct int_node, rb_node); return node; } +struct int_node *intlist__find(struct intlist *ilist, int i) +{ + return __intlist__findnew(ilist, i, false); +} + +struct int_node *intlist__findnew(struct intlist *ilist, int i) +{ + return __intlist__findnew(ilist, i, true); +} + static int intlist__parse_list(struct intlist *ilist, const char *s) { char *sep; diff --git a/tools/perf/util/intlist.h b/tools/perf/util/intlist.h index 62351dad848f..aa6877d36858 100644 --- a/tools/perf/util/intlist.h +++ b/tools/perf/util/intlist.h @@ -9,6 +9,7 @@ struct int_node { struct rb_node rb_node; int i; + void *priv; }; struct intlist { @@ -23,6 +24,7 @@ int intlist__add(struct intlist *ilist, int i); struct int_node *intlist__entry(const struct intlist *ilist, unsigned int idx); struct int_node *intlist__find(struct intlist *ilist, int i); +struct int_node *intlist__findnew(struct intlist *ilist, int i); static inline bool intlist__has_entry(struct intlist *ilist, int i) { diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 933d14f287ca..ea93425cce95 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -46,6 +46,23 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid) return 0; } +struct machine *machine__new_host(void) +{ + struct machine *machine = malloc(sizeof(*machine)); + + if (machine != NULL) { + machine__init(machine, "", HOST_KERNEL_ID); + + if (machine__create_kernel_maps(machine) < 0) + goto out_delete; + } + + return machine; +out_delete: + free(machine); + return NULL; +} + static void dsos__delete(struct list_head *dsos) { struct dso *pos, *n; @@ -776,75 +793,44 @@ static int machine__set_modules_path(struct machine *machine) return map_groups__set_modules_path_dir(&machine->kmaps, modules_path); } -static int machine__create_modules(struct machine *machine) +static int machine__create_module(void *arg, const char *name, u64 start) { - char *line = NULL; - size_t n; - FILE *file; + struct machine *machine = arg; struct map *map; + + map = machine__new_module(machine, start, name); + if (map == NULL) + return -1; + + dso__kernel_module_get_build_id(map->dso, machine->root_dir); + + return 0; +} + +static int machine__create_modules(struct machine *machine) +{ const char *modules; char path[PATH_MAX]; - if (machine__is_default_guest(machine)) + if (machine__is_default_guest(machine)) { modules = symbol_conf.default_guest_modules; - else { - sprintf(path, "%s/proc/modules", machine->root_dir); + } else { + snprintf(path, PATH_MAX, "%s/proc/modules", machine->root_dir); modules = path; } - if (symbol__restricted_filename(path, "/proc/modules")) + if (symbol__restricted_filename(modules, "/proc/modules")) return -1; - file = fopen(modules, "r"); - if (file == NULL) + if (modules__parse(modules, machine, machine__create_module)) return -1; - while (!feof(file)) { - char name[PATH_MAX]; - u64 start; - char *sep; - int line_len; - - line_len = getline(&line, &n, file); - if (line_len < 0) - break; - - if (!line) - goto out_failure; - - line[--line_len] = '\0'; /* \n */ - - sep = strrchr(line, 'x'); - if (sep == NULL) - continue; - - hex2u64(sep + 1, &start); - - sep = strchr(line, ' '); - if (sep == NULL) - continue; - - *sep = '\0'; - - snprintf(name, sizeof(name), "[%s]", line); - map = machine__new_module(machine, start, name); - if (map == NULL) - goto out_delete_line; - dso__kernel_module_get_build_id(map->dso, machine->root_dir); - } + if (!machine__set_modules_path(machine)) + return 0; - free(line); - fclose(file); + pr_debug("Problems setting modules path maps, continuing anyway...\n"); - if (machine__set_modules_path(machine) < 0) { - pr_debug("Problems setting modules path maps, continuing anyway...\n"); - } return 0; - -out_delete_line: - free(line); -out_failure: - return -1; } int machine__create_kernel_maps(struct machine *machine) @@ -1267,10 +1253,12 @@ static int machine__resolve_callchain_sample(struct machine *machine, struct thread *thread, struct ip_callchain *chain, struct symbol **parent, - struct addr_location *root_al) + struct addr_location *root_al, + int max_stack) { u8 cpumode = PERF_RECORD_MISC_USER; - unsigned int i; + int chain_nr = min(max_stack, (int)chain->nr); + int i; int err; callchain_cursor_reset(&callchain_cursor); @@ -1280,7 +1268,7 @@ static int machine__resolve_callchain_sample(struct machine *machine, return 0; } - for (i = 0; i < chain->nr; i++) { + for (i = 0; i < chain_nr; i++) { u64 ip; struct addr_location al; @@ -1352,12 +1340,14 @@ int machine__resolve_callchain(struct machine *machine, struct thread *thread, struct perf_sample *sample, struct symbol **parent, - struct addr_location *root_al) + struct addr_location *root_al, + int max_stack) { int ret; ret = machine__resolve_callchain_sample(machine, thread, - sample->callchain, parent, root_al); + sample->callchain, parent, + root_al, max_stack); if (ret) return ret; @@ -1376,3 +1366,26 @@ int machine__resolve_callchain(struct machine *machine, sample); } + +int machine__for_each_thread(struct machine *machine, + int (*fn)(struct thread *thread, void *p), + void *priv) +{ + struct rb_node *nd; + struct thread *thread; + int rc = 0; + + for (nd = rb_first(&machine->threads); nd; nd = rb_next(nd)) { + thread = rb_entry(nd, struct thread, rb_node); + rc = fn(thread, priv); + if (rc != 0) + return rc; + } + + list_for_each_entry(thread, &machine->dead_threads, node) { + rc = fn(thread, priv); + if (rc != 0) + return rc; + } + return rc; +} diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h index 58a6be1fc739..4c1f5d567f54 100644 --- a/tools/perf/util/machine.h +++ b/tools/perf/util/machine.h @@ -74,6 +74,7 @@ char *machine__mmap_name(struct machine *machine, char *bf, size_t size); void machines__set_symbol_filter(struct machines *machines, symbol_filter_t symbol_filter); +struct machine *machine__new_host(void); int machine__init(struct machine *machine, const char *root_dir, pid_t pid); void machine__exit(struct machine *machine); void machine__delete_dead_threads(struct machine *machine); @@ -91,7 +92,8 @@ int machine__resolve_callchain(struct machine *machine, struct thread *thread, struct perf_sample *sample, struct symbol **parent, - struct addr_location *root_al); + struct addr_location *root_al, + int max_stack); /* * Default guest kernel is defined by parameter --guestkallsyms @@ -165,4 +167,8 @@ void machines__destroy_kernel_maps(struct machines *machines); size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp); +int machine__for_each_thread(struct machine *machine, + int (*fn)(struct thread *thread, void *p), + void *priv); + #endif /* __PERF_MACHINE_H */ diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c index 4f6680d2043b..ef5bc913ca7a 100644 --- a/tools/perf/util/map.c +++ b/tools/perf/util/map.c @@ -172,7 +172,7 @@ int map__load(struct map *map, symbol_filter_t filter) pr_warning(", continuing without symbols\n"); return -1; } else if (nr == 0) { -#ifdef LIBELF_SUPPORT +#ifdef HAVE_LIBELF_SUPPORT const size_t len = strlen(name); const size_t real_len = len - sizeof(DSO__DELETED); @@ -252,10 +252,16 @@ size_t map__fprintf_dsoname(struct map *map, FILE *fp) return fprintf(fp, "%s", dsoname); } -/* +/** + * map__rip_2objdump - convert symbol start address to objdump address. + * @map: memory map + * @rip: symbol start address + * * objdump wants/reports absolute IPs for ET_EXEC, and RIPs for ET_DYN. * map->dso->adjust_symbols==1 for ET_EXEC-like cases except ET_REL which is * relative to section start. + * + * Return: Address suitable for passing to "objdump --start-address=" */ u64 map__rip_2objdump(struct map *map, u64 rip) { @@ -268,6 +274,29 @@ u64 map__rip_2objdump(struct map *map, u64 rip) return map->unmap_ip(map, rip); } +/** + * map__objdump_2mem - convert objdump address to a memory address. + * @map: memory map + * @ip: objdump address + * + * Closely related to map__rip_2objdump(), this function takes an address from + * objdump and converts it to a memory address. Note this assumes that @map + * contains the address. To be sure the result is valid, check it forwards + * e.g. map__rip_2objdump(map->map_ip(map, map__objdump_2mem(map, ip))) == ip + * + * Return: Memory address. + */ +u64 map__objdump_2mem(struct map *map, u64 ip) +{ + if (!map->dso->adjust_symbols) + return map->unmap_ip(map, ip); + + if (map->dso->rel) + return map->unmap_ip(map, ip + map->pgoff); + + return ip; +} + void map_groups__init(struct map_groups *mg) { int i; @@ -371,6 +400,23 @@ struct symbol *map_groups__find_symbol_by_name(struct map_groups *mg, return NULL; } +int map_groups__find_ams(struct addr_map_symbol *ams, symbol_filter_t filter) +{ + if (ams->addr < ams->map->start || ams->addr > ams->map->end) { + if (ams->map->groups == NULL) + return -1; + ams->map = map_groups__find(ams->map->groups, ams->map->type, + ams->addr); + if (ams->map == NULL) + return -1; + } + + ams->al_addr = ams->map->map_ip(ams->map, ams->addr); + ams->sym = map__find_symbol(ams->map, ams->al_addr, filter); + + return ams->sym ? 0 : -1; +} + size_t __map_groups__fprintf_maps(struct map_groups *mg, enum map_type type, int verbose, FILE *fp) { diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h index 4886ca280536..e4e259c3ba16 100644 --- a/tools/perf/util/map.h +++ b/tools/perf/util/map.h @@ -84,6 +84,9 @@ static inline u64 identity__map_ip(struct map *map __maybe_unused, u64 ip) /* rip/ip <-> addr suitable for passing to `objdump --start-address=` */ u64 map__rip_2objdump(struct map *map, u64 rip); +/* objdump address -> memory address */ +u64 map__objdump_2mem(struct map *map, u64 ip); + struct symbol; typedef int (*symbol_filter_t)(struct map *map, struct symbol *sym); @@ -167,6 +170,10 @@ struct symbol *map_groups__find_symbol_by_name(struct map_groups *mg, struct map **mapp, symbol_filter_t filter); +struct addr_map_symbol; + +int map_groups__find_ams(struct addr_map_symbol *ams, symbol_filter_t filter); + static inline struct symbol *map_groups__find_function_by_name(struct map_groups *mg, const char *name, struct map **mapp, diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 98125319b158..c90e55cf7e82 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -998,8 +998,10 @@ void print_tracepoint_events(const char *subsys_glob, const char *event_glob, char evt_path[MAXPATHLEN]; char dir_path[MAXPATHLEN]; - if (debugfs_valid_mountpoint(tracing_events_path)) + if (debugfs_valid_mountpoint(tracing_events_path)) { + printf(" [ Tracepoints not available: %s ]\n", strerror(errno)); return; + } sys_dir = opendir(tracing_events_path); if (!sys_dir) diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l index 91346b753960..343299575b30 100644 --- a/tools/perf/util/parse-events.l +++ b/tools/perf/util/parse-events.l @@ -126,6 +126,37 @@ modifier_bp [rwx]{1,3} } +{ +config { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG); } +config1 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG1); } +config2 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG2); } +name { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NAME); } +period { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); } +branch_type { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE); } +, { return ','; } +"/" { BEGIN(INITIAL); return '/'; } +{name_minus} { return str(yyscanner, PE_NAME); } +} + +{ +{modifier_bp} { return str(yyscanner, PE_MODIFIER_BP); } +: { return ':'; } +{num_dec} { return value(yyscanner, 10); } +{num_hex} { return value(yyscanner, 16); } + /* + * We need to separate 'mem:' scanner part, in order to get specific + * modifier bits parsed out. Otherwise we would need to handle PE_NAME + * and we'd need to parse it manually. During the escape from + * state we need to put the escaping char back, so we dont miss it. + */ +. { unput(*yytext); BEGIN(INITIAL); } + /* + * We destroy the scanner after reaching EOF, + * but anyway just to be sure get back to INIT state. + */ +<> { BEGIN(INITIAL); } +} + cpu-cycles|cycles { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES); } stalled-cycles-frontend|idle-cycles-frontend { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); } stalled-cycles-backend|idle-cycles-backend { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); } @@ -162,18 +193,6 @@ speculative-read|speculative-load | refs|Reference|ops|access | misses|miss { return str(yyscanner, PE_NAME_CACHE_OP_RESULT); } -{ -config { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG); } -config1 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG1); } -config2 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG2); } -name { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NAME); } -period { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); } -branch_type { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE); } -, { return ','; } -"/" { BEGIN(INITIAL); return '/'; } -{name_minus} { return str(yyscanner, PE_NAME); } -} - mem: { BEGIN(mem); return PE_PREFIX_MEM; } r{num_raw_hex} { return raw(yyscanner); } {num_dec} { return value(yyscanner, 10); } @@ -189,25 +208,7 @@ r{num_raw_hex} { return raw(yyscanner); } "}" { return '}'; } = { return '='; } \n { } - -{ -{modifier_bp} { return str(yyscanner, PE_MODIFIER_BP); } -: { return ':'; } -{num_dec} { return value(yyscanner, 10); } -{num_hex} { return value(yyscanner, 16); } - /* - * We need to separate 'mem:' scanner part, in order to get specific - * modifier bits parsed out. Otherwise we would need to handle PE_NAME - * and we'd need to parse it manually. During the escape from - * state we need to put the escaping char back, so we dont miss it. - */ -. { unput(*yytext); BEGIN(INITIAL); } - /* - * We destroy the scanner after reaching EOF, - * but anyway just to be sure get back to INIT state. - */ -<> { BEGIN(INITIAL); } -} +. { } %% diff --git a/tools/perf/util/path.c b/tools/perf/util/path.c index a8c49548ca48..5d13cb45b317 100644 --- a/tools/perf/util/path.c +++ b/tools/perf/util/path.c @@ -22,19 +22,23 @@ static const char *get_perf_dir(void) return "."; } -#ifndef HAVE_STRLCPY -size_t strlcpy(char *dest, const char *src, size_t size) +/* + * If libc has strlcpy() then that version will override this + * implementation: + */ +size_t __weak strlcpy(char *dest, const char *src, size_t size) { size_t ret = strlen(src); if (size) { size_t len = (ret >= size) ? size - 1 : ret; + memcpy(dest, src, len); dest[len] = '\0'; } + return ret; } -#endif static char *get_pathname(void) { diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h index 5a4f2b6f3738..a3d42cd74919 100644 --- a/tools/perf/util/perf_regs.h +++ b/tools/perf/util/perf_regs.h @@ -1,7 +1,7 @@ #ifndef __PERF_REGS_H #define __PERF_REGS_H -#ifdef HAVE_PERF_REGS +#ifdef HAVE_PERF_REGS_SUPPORT #include #else #define PERF_REGS_MASK 0 @@ -10,5 +10,5 @@ static inline const char *perf_reg_name(int id __maybe_unused) { return NULL; } -#endif /* HAVE_PERF_REGS */ +#endif /* HAVE_PERF_REGS_SUPPORT */ #endif /* __PERF_REGS_H */ diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c index bc9d8069d376..64362fe45b71 100644 --- a/tools/perf/util/pmu.c +++ b/tools/perf/util/pmu.c @@ -637,3 +637,19 @@ void print_pmu_events(const char *event_glob, bool name_only) printf("\n"); free(aliases); } + +bool pmu_have_event(const char *pname, const char *name) +{ + struct perf_pmu *pmu; + struct perf_pmu_alias *alias; + + pmu = NULL; + while ((pmu = perf_pmu__scan(pmu)) != NULL) { + if (strcmp(pname, pmu->name)) + continue; + list_for_each_entry(alias, &pmu->aliases, list) + if (!strcmp(alias->name, name)) + return true; + } + return false; +} diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h index 6b2cbe2d4cc3..1179b26f244a 100644 --- a/tools/perf/util/pmu.h +++ b/tools/perf/util/pmu.h @@ -42,6 +42,7 @@ int perf_pmu__format_parse(char *dir, struct list_head *head); struct perf_pmu *perf_pmu__scan(struct perf_pmu *pmu); void print_pmu_events(const char *event_glob, bool name_only); +bool pmu_have_event(const char *pname, const char *name); int perf_pmu__test(void); #endif /* __PMU_H */ diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index aa04bf9c9ad7..779b2dacd43f 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -201,7 +201,7 @@ static int convert_to_perf_probe_point(struct probe_trace_point *tp, return 0; } -#ifdef DWARF_SUPPORT +#ifdef HAVE_DWARF_SUPPORT /* Open new debuginfo of given module */ static struct debuginfo *open_debuginfo(const char *module) { @@ -630,7 +630,7 @@ int show_available_vars(struct perf_probe_event *pevs, int npevs, return ret; } -#else /* !DWARF_SUPPORT */ +#else /* !HAVE_DWARF_SUPPORT */ static int kprobe_convert_to_perf_probe(struct probe_trace_point *tp, struct perf_probe_point *pp) diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c index 20c7299a9d4e..f0692737ebf1 100644 --- a/tools/perf/util/probe-finder.c +++ b/tools/perf/util/probe-finder.c @@ -118,7 +118,6 @@ static const Dwfl_Callbacks offline_callbacks = { static int debuginfo__init_offline_dwarf(struct debuginfo *self, const char *path) { - Dwfl_Module *mod; int fd; fd = open(path, O_RDONLY); @@ -129,11 +128,11 @@ static int debuginfo__init_offline_dwarf(struct debuginfo *self, if (!self->dwfl) goto error; - mod = dwfl_report_offline(self->dwfl, "", "", fd); - if (!mod) + self->mod = dwfl_report_offline(self->dwfl, "", "", fd); + if (!self->mod) goto error; - self->dbg = dwfl_module_getdwarf(mod, &self->bias); + self->dbg = dwfl_module_getdwarf(self->mod, &self->bias); if (!self->dbg) goto error; @@ -676,37 +675,42 @@ static int find_variable(Dwarf_Die *sc_die, struct probe_finder *pf) } /* Convert subprogram DIE to trace point */ -static int convert_to_trace_point(Dwarf_Die *sp_die, Dwarf_Addr paddr, - bool retprobe, struct probe_trace_point *tp) +static int convert_to_trace_point(Dwarf_Die *sp_die, Dwfl_Module *mod, + Dwarf_Addr paddr, bool retprobe, + struct probe_trace_point *tp) { Dwarf_Addr eaddr, highaddr; - const char *name; - - /* Copy the name of probe point */ - name = dwarf_diename(sp_die); - if (name) { - if (dwarf_entrypc(sp_die, &eaddr) != 0) { - pr_warning("Failed to get entry address of %s\n", - dwarf_diename(sp_die)); - return -ENOENT; - } - if (dwarf_highpc(sp_die, &highaddr) != 0) { - pr_warning("Failed to get end address of %s\n", - dwarf_diename(sp_die)); - return -ENOENT; - } - if (paddr > highaddr) { - pr_warning("Offset specified is greater than size of %s\n", - dwarf_diename(sp_die)); - return -EINVAL; - } - tp->symbol = strdup(name); - if (tp->symbol == NULL) - return -ENOMEM; - tp->offset = (unsigned long)(paddr - eaddr); - } else - /* This function has no name. */ - tp->offset = (unsigned long)paddr; + GElf_Sym sym; + const char *symbol; + + /* Verify the address is correct */ + if (dwarf_entrypc(sp_die, &eaddr) != 0) { + pr_warning("Failed to get entry address of %s\n", + dwarf_diename(sp_die)); + return -ENOENT; + } + if (dwarf_highpc(sp_die, &highaddr) != 0) { + pr_warning("Failed to get end address of %s\n", + dwarf_diename(sp_die)); + return -ENOENT; + } + if (paddr > highaddr) { + pr_warning("Offset specified is greater than size of %s\n", + dwarf_diename(sp_die)); + return -EINVAL; + } + + /* Get an appropriate symbol from symtab */ + symbol = dwfl_module_addrsym(mod, paddr, &sym, NULL); + if (!symbol) { + pr_warning("Failed to find symbol at 0x%lx\n", + (unsigned long)paddr); + return -ENOENT; + } + tp->offset = (unsigned long)(paddr - sym.st_value); + tp->symbol = strdup(symbol); + if (!tp->symbol) + return -ENOMEM; /* Return probe must be on the head of a subprogram */ if (retprobe) { @@ -1149,7 +1153,7 @@ static int add_probe_trace_event(Dwarf_Die *sc_die, struct probe_finder *pf) tev = &tf->tevs[tf->ntevs++]; /* Trace point should be converted from subprogram DIE */ - ret = convert_to_trace_point(&pf->sp_die, pf->addr, + ret = convert_to_trace_point(&pf->sp_die, tf->mod, pf->addr, pf->pev->point.retprobe, &tev->point); if (ret < 0) return ret; @@ -1181,7 +1185,7 @@ int debuginfo__find_trace_events(struct debuginfo *self, { struct trace_event_finder tf = { .pf = {.pev = pev, .callback = add_probe_trace_event}, - .max_tevs = max_tevs}; + .mod = self->mod, .max_tevs = max_tevs}; int ret; /* Allocate result tevs array */ @@ -1250,7 +1254,7 @@ static int add_available_vars(Dwarf_Die *sc_die, struct probe_finder *pf) vl = &af->vls[af->nvls++]; /* Trace point should be converted from subprogram DIE */ - ret = convert_to_trace_point(&pf->sp_die, pf->addr, + ret = convert_to_trace_point(&pf->sp_die, af->mod, pf->addr, pf->pev->point.retprobe, &vl->point); if (ret < 0) return ret; @@ -1289,6 +1293,7 @@ int debuginfo__find_available_vars_at(struct debuginfo *self, { struct available_var_finder af = { .pf = {.pev = pev, .callback = add_available_vars}, + .mod = self->mod, .max_vls = max_vls, .externs = externs}; int ret; @@ -1322,8 +1327,8 @@ int debuginfo__find_probe_point(struct debuginfo *self, unsigned long addr, struct perf_probe_point *ppt) { Dwarf_Die cudie, spdie, indie; - Dwarf_Addr _addr, baseaddr; - const char *fname = NULL, *func = NULL, *tmp; + Dwarf_Addr _addr = 0, baseaddr = 0; + const char *fname = NULL, *func = NULL, *basefunc = NULL, *tmp; int baseline = 0, lineno = 0, ret = 0; /* Adjust address with bias */ @@ -1344,27 +1349,36 @@ int debuginfo__find_probe_point(struct debuginfo *self, unsigned long addr, /* Find a corresponding function (name, baseline and baseaddr) */ if (die_find_realfunc(&cudie, (Dwarf_Addr)addr, &spdie)) { /* Get function entry information */ - tmp = dwarf_diename(&spdie); - if (!tmp || + func = basefunc = dwarf_diename(&spdie); + if (!func || dwarf_entrypc(&spdie, &baseaddr) != 0 || - dwarf_decl_line(&spdie, &baseline) != 0) + dwarf_decl_line(&spdie, &baseline) != 0) { + lineno = 0; goto post; - func = tmp; + } - if (addr == (unsigned long)baseaddr) + fname = dwarf_decl_file(&spdie); + if (addr == (unsigned long)baseaddr) { /* Function entry - Relative line number is 0 */ lineno = baseline; - else if (die_find_inlinefunc(&spdie, (Dwarf_Addr)addr, - &indie)) { + goto post; + } + + /* Track down the inline functions step by step */ + while (die_find_top_inlinefunc(&spdie, (Dwarf_Addr)addr, + &indie)) { + /* There is an inline function */ if (dwarf_entrypc(&indie, &_addr) == 0 && - _addr == addr) + _addr == addr) { /* * addr is at an inline function entry. * In this case, lineno should be the call-site - * line number. + * line number. (overwrite lineinfo) */ lineno = die_get_call_lineno(&indie); - else { + fname = die_get_call_file(&indie); + break; + } else { /* * addr is in an inline function body. * Since lineno points one of the lines @@ -1372,19 +1386,27 @@ int debuginfo__find_probe_point(struct debuginfo *self, unsigned long addr, * be the entry line of the inline function. */ tmp = dwarf_diename(&indie); - if (tmp && - dwarf_decl_line(&spdie, &baseline) == 0) - func = tmp; + if (!tmp || + dwarf_decl_line(&indie, &baseline) != 0) + break; + func = tmp; + spdie = indie; } } + /* Verify the lineno and baseline are in a same file */ + tmp = dwarf_decl_file(&spdie); + if (!tmp || strcmp(tmp, fname) != 0) + lineno = 0; } post: /* Make a relative line number or an offset */ if (lineno) ppt->line = lineno - baseline; - else if (func) + else if (basefunc) { ppt->offset = addr - (unsigned long)baseaddr; + func = basefunc; + } /* Duplicate strings */ if (func) { diff --git a/tools/perf/util/probe-finder.h b/tools/perf/util/probe-finder.h index 17e94d0c36f9..3f0c29dd6ac5 100644 --- a/tools/perf/util/probe-finder.h +++ b/tools/perf/util/probe-finder.h @@ -14,7 +14,7 @@ static inline int is_c_varname(const char *name) return isalpha(name[0]) || name[0] == '_'; } -#ifdef DWARF_SUPPORT +#ifdef HAVE_DWARF_SUPPORT #include "dwarf-aux.h" @@ -23,6 +23,7 @@ static inline int is_c_varname(const char *name) /* debug information structure */ struct debuginfo { Dwarf *dbg; + Dwfl_Module *mod; Dwfl *dwfl; Dwarf_Addr bias; }; @@ -77,6 +78,7 @@ struct probe_finder { struct trace_event_finder { struct probe_finder pf; + Dwfl_Module *mod; /* For solving symbols */ struct probe_trace_event *tevs; /* Found trace events */ int ntevs; /* Number of trace events */ int max_tevs; /* Max number of trace events */ @@ -84,6 +86,7 @@ struct trace_event_finder { struct available_var_finder { struct probe_finder pf; + Dwfl_Module *mod; /* For solving symbols */ struct variable_list *vls; /* Found variable lists */ int nvls; /* Number of variable lists */ int max_vls; /* Max no. of variable lists */ @@ -102,6 +105,6 @@ struct line_finder { int found; }; -#endif /* DWARF_SUPPORT */ +#endif /* HAVE_DWARF_SUPPORT */ #endif /*_PROBE_FINDER_H */ diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c index 71b5412bbbb9..06efd027b1db 100644 --- a/tools/perf/util/python.c +++ b/tools/perf/util/python.c @@ -33,13 +33,6 @@ int eprintf(int level, const char *fmt, ...) # define PyVarObject_HEAD_INIT(type, size) PyObject_HEAD_INIT(type) size, #endif -struct throttle_event { - struct perf_event_header header; - u64 time; - u64 id; - u64 stream_id; -}; - PyMODINIT_FUNC initperf(void); #define member_def(type, member, ptype, help) \ @@ -1036,6 +1029,7 @@ PyMODINIT_FUNC initperf(void) pyrf_cpu_map__setup_types() < 0) return; + /* The page_size is placed in util object. */ page_size = sysconf(_SC_PAGE_SIZE); Py_INCREF(&pyrf_evlist__type); diff --git a/tools/perf/util/rblist.c b/tools/perf/util/rblist.c index a16cdd2625ad..0dfe27d99458 100644 --- a/tools/perf/util/rblist.c +++ b/tools/perf/util/rblist.c @@ -48,10 +48,12 @@ void rblist__remove_node(struct rblist *rblist, struct rb_node *rb_node) rblist->node_delete(rblist, rb_node); } -struct rb_node *rblist__find(struct rblist *rblist, const void *entry) +static struct rb_node *__rblist__findnew(struct rblist *rblist, + const void *entry, + bool create) { struct rb_node **p = &rblist->entries.rb_node; - struct rb_node *parent = NULL; + struct rb_node *parent = NULL, *new_node = NULL; while (*p != NULL) { int rc; @@ -67,7 +69,26 @@ struct rb_node *rblist__find(struct rblist *rblist, const void *entry) return parent; } - return NULL; + if (create) { + new_node = rblist->node_new(rblist, entry); + if (new_node) { + rb_link_node(new_node, parent, p); + rb_insert_color(new_node, &rblist->entries); + ++rblist->nr_entries; + } + } + + return new_node; +} + +struct rb_node *rblist__find(struct rblist *rblist, const void *entry) +{ + return __rblist__findnew(rblist, entry, false); +} + +struct rb_node *rblist__findnew(struct rblist *rblist, const void *entry) +{ + return __rblist__findnew(rblist, entry, true); } void rblist__init(struct rblist *rblist) diff --git a/tools/perf/util/rblist.h b/tools/perf/util/rblist.h index 6d0cae5ae83d..ff9913b994c2 100644 --- a/tools/perf/util/rblist.h +++ b/tools/perf/util/rblist.h @@ -32,6 +32,7 @@ void rblist__delete(struct rblist *rblist); int rblist__add_node(struct rblist *rblist, const void *new_entry); void rblist__remove_node(struct rblist *rblist, struct rb_node *rb_node); struct rb_node *rblist__find(struct rblist *rblist, const void *entry); +struct rb_node *rblist__findnew(struct rblist *rblist, const void *entry); struct rb_node *rblist__entry(const struct rblist *rblist, unsigned int idx); static inline bool rblist__empty(const struct rblist *rblist) diff --git a/tools/perf/util/scripting-engines/trace-event-perl.c b/tools/perf/util/scripting-engines/trace-event-perl.c index a85e4ae5f3ac..c0c9795c4f02 100644 --- a/tools/perf/util/scripting-engines/trace-event-perl.c +++ b/tools/perf/util/scripting-engines/trace-event-perl.c @@ -282,7 +282,7 @@ static void perl_process_tracepoint(union perf_event *perf_event __maybe_unused, event = find_cache_event(evsel); if (!event) - die("ug! no event found for type %" PRIu64, evsel->attr.config); + die("ug! no event found for type %" PRIu64, (u64)evsel->attr.config); pid = raw_field_value(event, "common_pid", data); diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 70ffa41518f3..854c5aa4db0d 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -16,73 +16,34 @@ #include "perf_regs.h" #include "vdso.h" -static int perf_session__open(struct perf_session *self, bool force) +static int perf_session__open(struct perf_session *self) { - struct stat input_stat; - - if (!strcmp(self->filename, "-")) { - self->fd_pipe = true; - self->fd = STDIN_FILENO; - - if (perf_session__read_header(self) < 0) - pr_err("incompatible file format (rerun with -v to learn more)"); - - return 0; - } - - self->fd = open(self->filename, O_RDONLY); - if (self->fd < 0) { - int err = errno; - - pr_err("failed to open %s: %s", self->filename, strerror(err)); - if (err == ENOENT && !strcmp(self->filename, "perf.data")) - pr_err(" (try 'perf record' first)"); - pr_err("\n"); - return -errno; - } - - if (fstat(self->fd, &input_stat) < 0) - goto out_close; - - if (!force && input_stat.st_uid && (input_stat.st_uid != geteuid())) { - pr_err("file %s not owned by current user or root\n", - self->filename); - goto out_close; - } - - if (!input_stat.st_size) { - pr_info("zero-sized file (%s), nothing to do!\n", - self->filename); - goto out_close; - } + struct perf_data_file *file = self->file; if (perf_session__read_header(self) < 0) { pr_err("incompatible file format (rerun with -v to learn more)"); - goto out_close; + return -1; } + if (perf_data_file__is_pipe(file)) + return 0; + if (!perf_evlist__valid_sample_type(self->evlist)) { pr_err("non matching sample_type"); - goto out_close; + return -1; } if (!perf_evlist__valid_sample_id_all(self->evlist)) { pr_err("non matching sample_id_all"); - goto out_close; + return -1; } if (!perf_evlist__valid_read_format(self->evlist)) { pr_err("non matching read_format"); - goto out_close; + return -1; } - self->size = input_stat.st_size; return 0; - -out_close: - close(self->fd); - self->fd = -1; - return -1; } void perf_session__set_id_hdr_size(struct perf_session *session) @@ -106,39 +67,36 @@ static void perf_session__destroy_kernel_maps(struct perf_session *self) machines__destroy_kernel_maps(&self->machines); } -struct perf_session *perf_session__new(const char *filename, int mode, - bool force, bool repipe, - struct perf_tool *tool) +struct perf_session *perf_session__new(struct perf_data_file *file, + bool repipe, struct perf_tool *tool) { struct perf_session *self; - struct stat st; - size_t len; - - if (!filename || !strlen(filename)) { - if (!fstat(STDIN_FILENO, &st) && S_ISFIFO(st.st_mode)) - filename = "-"; - else - filename = "perf.data"; - } - - len = strlen(filename); - self = zalloc(sizeof(*self) + len); - if (self == NULL) + self = zalloc(sizeof(*self)); + if (!self) goto out; - memcpy(self->filename, filename, len); self->repipe = repipe; INIT_LIST_HEAD(&self->ordered_samples.samples); INIT_LIST_HEAD(&self->ordered_samples.sample_cache); INIT_LIST_HEAD(&self->ordered_samples.to_free); machines__init(&self->machines); - if (mode == O_RDONLY) { - if (perf_session__open(self, force) < 0) + if (file) { + if (perf_data_file__open(file)) goto out_delete; - perf_session__set_id_hdr_size(self); - } else if (mode == O_WRONLY) { + + self->file = file; + + if (perf_data_file__is_read(file)) { + if (perf_session__open(self) < 0) + goto out_close; + + perf_session__set_id_hdr_size(self); + } + } + + if (!file || perf_data_file__is_write(file)) { /* * In O_RDONLY mode this will be performed when reading the * kernel MMAP event, in perf_event__process_mmap(). @@ -153,10 +111,13 @@ struct perf_session *perf_session__new(const char *filename, int mode, tool->ordered_samples = false; } -out: return self; -out_delete: + + out_close: + perf_data_file__close(file); + out_delete: perf_session__delete(self); + out: return NULL; } @@ -193,7 +154,8 @@ void perf_session__delete(struct perf_session *self) perf_session__delete_threads(self); perf_session_env__delete(&self->header.env); machines__exit(&self->machines); - close(self->fd); + if (self->file) + perf_data_file__close(self->file); free(self); vdso__exit(); } @@ -256,6 +218,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool) tool->sample = process_event_sample_stub; if (tool->mmap == NULL) tool->mmap = process_event_stub; + if (tool->mmap2 == NULL) + tool->mmap2 = process_event_stub; if (tool->comm == NULL) tool->comm = process_event_stub; if (tool->fork == NULL) @@ -395,6 +359,17 @@ static void perf_event__read_swap(union perf_event *event, bool sample_id_all) swap_sample_id_all(event, &event->read + 1); } +static void perf_event__throttle_swap(union perf_event *event, + bool sample_id_all) +{ + event->throttle.time = bswap_64(event->throttle.time); + event->throttle.id = bswap_64(event->throttle.id); + event->throttle.stream_id = bswap_64(event->throttle.stream_id); + + if (sample_id_all) + swap_sample_id_all(event, &event->throttle + 1); +} + static u8 revbyte(u8 b) { int rev = (b >> 4) | ((b & 0xf) << 4); @@ -440,6 +415,9 @@ void perf_event__attr_swap(struct perf_event_attr *attr) attr->bp_type = bswap_32(attr->bp_type); attr->bp_addr = bswap_64(attr->bp_addr); attr->bp_len = bswap_64(attr->bp_len); + attr->branch_sample_type = bswap_64(attr->branch_sample_type); + attr->sample_regs_user = bswap_64(attr->sample_regs_user); + attr->sample_stack_user = bswap_32(attr->sample_stack_user); swap_bitfield((u8 *) (&attr->read_format + 1), sizeof(u64)); } @@ -480,6 +458,8 @@ static perf_event__swap_op perf_event__swap_ops[] = { [PERF_RECORD_EXIT] = perf_event__task_swap, [PERF_RECORD_LOST] = perf_event__all64_swap, [PERF_RECORD_READ] = perf_event__read_swap, + [PERF_RECORD_THROTTLE] = perf_event__throttle_swap, + [PERF_RECORD_UNTHROTTLE] = perf_event__throttle_swap, [PERF_RECORD_SAMPLE] = perf_event__all64_swap, [PERF_RECORD_HEADER_ATTR] = perf_event__hdr_attr_swap, [PERF_RECORD_HEADER_EVENT_TYPE] = perf_event__event_type_swap, @@ -858,6 +838,9 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event, if (sample_type & PERF_SAMPLE_DATA_SRC) printf(" . data_src: 0x%"PRIx64"\n", sample->data_src); + if (sample_type & PERF_SAMPLE_TRANSACTION) + printf("... transaction: %" PRIx64 "\n", sample->transaction); + if (sample_type & PERF_SAMPLE_READ) sample_read__printf(sample, evsel->attr.read_format); } @@ -1029,6 +1012,7 @@ static int perf_session_deliver_event(struct perf_session *session, static int perf_session__process_user_event(struct perf_session *session, union perf_event *event, struct perf_tool *tool, u64 file_offset) { + int fd = perf_data_file__fd(session->file); int err; dump_event(session, event, file_offset, NULL); @@ -1042,7 +1026,7 @@ static int perf_session__process_user_event(struct perf_session *session, union return err; case PERF_RECORD_HEADER_TRACING_DATA: /* setup for reading amidst mmap */ - lseek(session->fd, file_offset, SEEK_SET); + lseek(fd, file_offset, SEEK_SET); return tool->tracing_data(tool, event, session); case PERF_RECORD_HEADER_BUILD_ID: return tool->build_id(tool, event, session); @@ -1168,6 +1152,7 @@ volatile int session_done; static int __perf_session__process_pipe_events(struct perf_session *self, struct perf_tool *tool) { + int fd = perf_data_file__fd(self->file); union perf_event *event; uint32_t size, cur_size = 0; void *buf = NULL; @@ -1186,7 +1171,7 @@ static int __perf_session__process_pipe_events(struct perf_session *self, return -errno; more: event = buf; - err = readn(self->fd, event, sizeof(struct perf_event_header)); + err = readn(fd, event, sizeof(struct perf_event_header)); if (err <= 0) { if (err == 0) goto done; @@ -1218,7 +1203,7 @@ more: p += sizeof(struct perf_event_header); if (size - sizeof(struct perf_event_header)) { - err = readn(self->fd, p, size - sizeof(struct perf_event_header)); + err = readn(fd, p, size - sizeof(struct perf_event_header)); if (err <= 0) { if (err == 0) { pr_err("unexpected end of event stream\n"); @@ -1245,7 +1230,9 @@ more: if (!session_done()) goto more; done: - err = 0; + /* do the final flush for ordered samples */ + self->ordered_samples.next_flush = ULLONG_MAX; + err = flush_sample_queue(self, tool); out_err: free(buf); perf_session__warn_about_errors(self, tool); @@ -1297,6 +1284,7 @@ int __perf_session__process_events(struct perf_session *session, u64 data_offset, u64 data_size, u64 file_size, struct perf_tool *tool) { + int fd = perf_data_file__fd(session->file); u64 head, page_offset, file_offset, file_pos, progress_next; int err, mmap_prot, mmap_flags, map_idx = 0; size_t mmap_size; @@ -1310,7 +1298,7 @@ int __perf_session__process_events(struct perf_session *session, file_offset = page_offset; head = data_offset - page_offset; - if (data_offset + data_size < file_size) + if (data_size && (data_offset + data_size < file_size)) file_size = data_offset + data_size; progress_next = file_size / 16; @@ -1329,7 +1317,7 @@ int __perf_session__process_events(struct perf_session *session, mmap_flags = MAP_PRIVATE; } remap: - buf = mmap(NULL, mmap_size, mmap_prot, mmap_flags, session->fd, + buf = mmap(NULL, mmap_size, mmap_prot, mmap_flags, fd, file_offset); if (buf == MAP_FAILED) { pr_err("failed to mmap file\n"); @@ -1374,13 +1362,13 @@ more: "Processing events..."); } - err = 0; if (session_done()) - goto out_err; + goto out; if (file_pos < file_size) goto more; +out: /* do the final flush for ordered samples */ session->ordered_samples.next_flush = ULLONG_MAX; err = flush_sample_queue(session, tool); @@ -1394,16 +1382,17 @@ out_err: int perf_session__process_events(struct perf_session *self, struct perf_tool *tool) { + u64 size = perf_data_file__size(self->file); int err; if (perf_session__register_idle_thread(self) == NULL) return -ENOMEM; - if (!self->fd_pipe) + if (!perf_data_file__is_pipe(self->file)) err = __perf_session__process_events(self, self->header.data_offset, self->header.data_size, - self->size, tool); + size, tool); else err = __perf_session__process_pipe_events(self, tool); @@ -1523,7 +1512,8 @@ void perf_evsel__print_ip(struct perf_evsel *evsel, union perf_event *event, if (symbol_conf.use_callchain && sample->callchain) { if (machine__resolve_callchain(machine, evsel, al.thread, - sample, NULL, NULL) != 0) { + sample, NULL, NULL, + PERF_MAX_STACK_DEPTH) != 0) { if (verbose) error("Failed to resolve callchain. Skipping\n"); return; @@ -1627,13 +1617,14 @@ int perf_session__cpu_bitmap(struct perf_session *session, void perf_session__fprintf_info(struct perf_session *session, FILE *fp, bool full) { + int fd = perf_data_file__fd(session->file); struct stat st; int ret; if (session == NULL || fp == NULL) return; - ret = fstat(session->fd, &st); + ret = fstat(fd, &st); if (ret == -1) return; diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h index 04bf7373a7e5..27c74d38b868 100644 --- a/tools/perf/util/session.h +++ b/tools/perf/util/session.h @@ -7,6 +7,7 @@ #include "machine.h" #include "symbol.h" #include "thread.h" +#include "data.h" #include #include @@ -29,16 +30,13 @@ struct ordered_samples { struct perf_session { struct perf_header header; - unsigned long size; struct machines machines; struct perf_evlist *evlist; struct pevent *pevent; struct events_stats stats; - int fd; - bool fd_pipe; bool repipe; struct ordered_samples ordered_samples; - char filename[1]; + struct perf_data_file *file; }; #define PRINT_IP_OPT_IP (1<<0) @@ -49,9 +47,8 @@ struct perf_session { struct perf_tool; -struct perf_session *perf_session__new(const char *filename, int mode, - bool force, bool repipe, - struct perf_tool *tool); +struct perf_session *perf_session__new(struct perf_data_file *file, + bool repipe, struct perf_tool *tool); void perf_session__delete(struct perf_session *session); void perf_event_header__bswap(struct perf_event_header *self); diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index 5f118a089519..1f9821db9e77 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -182,9 +182,19 @@ static int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r) static int64_t sort__sym_cmp(struct hist_entry *left, struct hist_entry *right) { + int64_t ret; + if (!left->ms.sym && !right->ms.sym) return right->level - left->level; + /* + * comparing symbol address alone is not enough since it's a + * relative address within a dso. + */ + ret = sort__dso_cmp(left, right); + if (ret != 0) + return ret; + return _sort__sym_cmp(left->ms.sym, right->ms.sym); } @@ -243,50 +253,32 @@ struct sort_entry sort_sym = { static int64_t sort__srcline_cmp(struct hist_entry *left, struct hist_entry *right) { - return (int64_t)(right->ip - left->ip); + if (!left->srcline) { + if (!left->ms.map) + left->srcline = SRCLINE_UNKNOWN; + else { + struct map *map = left->ms.map; + left->srcline = get_srcline(map->dso, + map__rip_2objdump(map, left->ip)); + } + } + if (!right->srcline) { + if (!right->ms.map) + right->srcline = SRCLINE_UNKNOWN; + else { + struct map *map = right->ms.map; + right->srcline = get_srcline(map->dso, + map__rip_2objdump(map, right->ip)); + } + } + return strcmp(left->srcline, right->srcline); } static int hist_entry__srcline_snprintf(struct hist_entry *self, char *bf, size_t size, unsigned int width __maybe_unused) { - FILE *fp = NULL; - char cmd[PATH_MAX + 2], *path = self->srcline, *nl; - size_t line_len; - - if (path != NULL) - goto out_path; - - if (!self->ms.map) - goto out_ip; - - if (!strncmp(self->ms.map->dso->long_name, "/tmp/perf-", 10)) - goto out_ip; - - snprintf(cmd, sizeof(cmd), "addr2line -e %s %016" PRIx64, - self->ms.map->dso->long_name, self->ip); - fp = popen(cmd, "r"); - if (!fp) - goto out_ip; - - if (getline(&path, &line_len, fp) < 0 || !line_len) - goto out_ip; - self->srcline = strdup(path); - if (self->srcline == NULL) - goto out_ip; - - nl = strchr(self->srcline, '\n'); - if (nl != NULL) - *nl = '\0'; - path = self->srcline; -out_path: - if (fp) - pclose(fp); - return repsep_snprintf(bf, size, "%s", path); -out_ip: - if (fp) - pclose(fp); - return repsep_snprintf(bf, size, "%-#*llx", BITS_PER_LONG / 4, self->ip); + return repsep_snprintf(bf, size, "%s", self->srcline); } struct sort_entry sort_srcline = { @@ -858,6 +850,127 @@ struct sort_entry sort_mem_snoop = { .se_width_idx = HISTC_MEM_SNOOP, }; +static int64_t +sort__abort_cmp(struct hist_entry *left, struct hist_entry *right) +{ + return left->branch_info->flags.abort != + right->branch_info->flags.abort; +} + +static int hist_entry__abort_snprintf(struct hist_entry *self, char *bf, + size_t size, unsigned int width) +{ + static const char *out = "."; + + if (self->branch_info->flags.abort) + out = "A"; + return repsep_snprintf(bf, size, "%-*s", width, out); +} + +struct sort_entry sort_abort = { + .se_header = "Transaction abort", + .se_cmp = sort__abort_cmp, + .se_snprintf = hist_entry__abort_snprintf, + .se_width_idx = HISTC_ABORT, +}; + +static int64_t +sort__in_tx_cmp(struct hist_entry *left, struct hist_entry *right) +{ + return left->branch_info->flags.in_tx != + right->branch_info->flags.in_tx; +} + +static int hist_entry__in_tx_snprintf(struct hist_entry *self, char *bf, + size_t size, unsigned int width) +{ + static const char *out = "."; + + if (self->branch_info->flags.in_tx) + out = "T"; + + return repsep_snprintf(bf, size, "%-*s", width, out); +} + +struct sort_entry sort_in_tx = { + .se_header = "Branch in transaction", + .se_cmp = sort__in_tx_cmp, + .se_snprintf = hist_entry__in_tx_snprintf, + .se_width_idx = HISTC_IN_TX, +}; + +static int64_t +sort__transaction_cmp(struct hist_entry *left, struct hist_entry *right) +{ + return left->transaction - right->transaction; +} + +static inline char *add_str(char *p, const char *str) +{ + strcpy(p, str); + return p + strlen(str); +} + +static struct txbit { + unsigned flag; + const char *name; + int skip_for_len; +} txbits[] = { + { PERF_TXN_ELISION, "EL ", 0 }, + { PERF_TXN_TRANSACTION, "TX ", 1 }, + { PERF_TXN_SYNC, "SYNC ", 1 }, + { PERF_TXN_ASYNC, "ASYNC ", 0 }, + { PERF_TXN_RETRY, "RETRY ", 0 }, + { PERF_TXN_CONFLICT, "CON ", 0 }, + { PERF_TXN_CAPACITY_WRITE, "CAP-WRITE ", 1 }, + { PERF_TXN_CAPACITY_READ, "CAP-READ ", 0 }, + { 0, NULL, 0 } +}; + +int hist_entry__transaction_len(void) +{ + int i; + int len = 0; + + for (i = 0; txbits[i].name; i++) { + if (!txbits[i].skip_for_len) + len += strlen(txbits[i].name); + } + len += 4; /* :XX */ + return len; +} + +static int hist_entry__transaction_snprintf(struct hist_entry *self, char *bf, + size_t size, unsigned int width) +{ + u64 t = self->transaction; + char buf[128]; + char *p = buf; + int i; + + buf[0] = 0; + for (i = 0; txbits[i].name; i++) + if (txbits[i].flag & t) + p = add_str(p, txbits[i].name); + if (t && !(t & (PERF_TXN_SYNC|PERF_TXN_ASYNC))) + p = add_str(p, "NEITHER "); + if (t & PERF_TXN_ABORT_MASK) { + sprintf(p, ":%" PRIx64, + (t & PERF_TXN_ABORT_MASK) >> + PERF_TXN_ABORT_SHIFT); + p += strlen(p); + } + + return repsep_snprintf(bf, size, "%-*s", width, buf); +} + +struct sort_entry sort_transaction = { + .se_header = "Transaction ", + .se_cmp = sort__transaction_cmp, + .se_snprintf = hist_entry__transaction_snprintf, + .se_width_idx = HISTC_TRANSACTION, +}; + struct sort_dimension { const char *name; struct sort_entry *entry; @@ -876,6 +989,7 @@ static struct sort_dimension common_sort_dimensions[] = { DIM(SORT_SRCLINE, "srcline", sort_srcline), DIM(SORT_LOCAL_WEIGHT, "local_weight", sort_local_weight), DIM(SORT_GLOBAL_WEIGHT, "weight", sort_global_weight), + DIM(SORT_TRANSACTION, "transaction", sort_transaction), }; #undef DIM @@ -888,6 +1002,8 @@ static struct sort_dimension bstack_sort_dimensions[] = { DIM(SORT_SYM_FROM, "symbol_from", sort_sym_from), DIM(SORT_SYM_TO, "symbol_to", sort_sym_to), DIM(SORT_MISPREDICT, "mispredict", sort_mispredict), + DIM(SORT_IN_TX, "in_tx", sort_in_tx), + DIM(SORT_ABORT, "abort", sort_abort), }; #undef DIM diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h index 4e80dbd271e7..bf4333694d3a 100644 --- a/tools/perf/util/sort.h +++ b/tools/perf/util/sort.h @@ -85,6 +85,7 @@ struct hist_entry { struct map_symbol ms; struct thread *thread; u64 ip; + u64 transaction; s32 cpu; struct hist_entry_diff diff; @@ -145,6 +146,7 @@ enum sort_type { SORT_SRCLINE, SORT_LOCAL_WEIGHT, SORT_GLOBAL_WEIGHT, + SORT_TRANSACTION, /* branch stack specific sort keys */ __SORT_BRANCH_STACK, @@ -153,6 +155,8 @@ enum sort_type { SORT_SYM_FROM, SORT_SYM_TO, SORT_MISPREDICT, + SORT_ABORT, + SORT_IN_TX, /* memory mode specific sort keys */ __SORT_MEMORY_MODE, diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c new file mode 100644 index 000000000000..d11aefbc4b8d --- /dev/null +++ b/tools/perf/util/srcline.c @@ -0,0 +1,265 @@ +#include +#include +#include + +#include + +#include "util/dso.h" +#include "util/util.h" +#include "util/debug.h" + +#ifdef HAVE_LIBBFD_SUPPORT + +/* + * Implement addr2line using libbfd. + */ +#define PACKAGE "perf" +#include + +struct a2l_data { + const char *input; + unsigned long addr; + + bool found; + const char *filename; + const char *funcname; + unsigned line; + + bfd *abfd; + asymbol **syms; +}; + +static int bfd_error(const char *string) +{ + const char *errmsg; + + errmsg = bfd_errmsg(bfd_get_error()); + fflush(stdout); + + if (string) + pr_debug("%s: %s\n", string, errmsg); + else + pr_debug("%s\n", errmsg); + + return -1; +} + +static int slurp_symtab(bfd *abfd, struct a2l_data *a2l) +{ + long storage; + long symcount; + asymbol **syms; + bfd_boolean dynamic = FALSE; + + if ((bfd_get_file_flags(abfd) & HAS_SYMS) == 0) + return bfd_error(bfd_get_filename(abfd)); + + storage = bfd_get_symtab_upper_bound(abfd); + if (storage == 0L) { + storage = bfd_get_dynamic_symtab_upper_bound(abfd); + dynamic = TRUE; + } + if (storage < 0L) + return bfd_error(bfd_get_filename(abfd)); + + syms = malloc(storage); + if (dynamic) + symcount = bfd_canonicalize_dynamic_symtab(abfd, syms); + else + symcount = bfd_canonicalize_symtab(abfd, syms); + + if (symcount < 0) { + free(syms); + return bfd_error(bfd_get_filename(abfd)); + } + + a2l->syms = syms; + return 0; +} + +static void find_address_in_section(bfd *abfd, asection *section, void *data) +{ + bfd_vma pc, vma; + bfd_size_type size; + struct a2l_data *a2l = data; + + if (a2l->found) + return; + + if ((bfd_get_section_flags(abfd, section) & SEC_ALLOC) == 0) + return; + + pc = a2l->addr; + vma = bfd_get_section_vma(abfd, section); + size = bfd_get_section_size(section); + + if (pc < vma || pc >= vma + size) + return; + + a2l->found = bfd_find_nearest_line(abfd, section, a2l->syms, pc - vma, + &a2l->filename, &a2l->funcname, + &a2l->line); +} + +static struct a2l_data *addr2line_init(const char *path) +{ + bfd *abfd; + struct a2l_data *a2l = NULL; + + abfd = bfd_openr(path, NULL); + if (abfd == NULL) + return NULL; + + if (!bfd_check_format(abfd, bfd_object)) + goto out; + + a2l = zalloc(sizeof(*a2l)); + if (a2l == NULL) + goto out; + + a2l->abfd = abfd; + a2l->input = strdup(path); + if (a2l->input == NULL) + goto out; + + if (slurp_symtab(abfd, a2l)) + goto out; + + return a2l; + +out: + if (a2l) { + free((void *)a2l->input); + free(a2l); + } + bfd_close(abfd); + return NULL; +} + +static void addr2line_cleanup(struct a2l_data *a2l) +{ + if (a2l->abfd) + bfd_close(a2l->abfd); + free((void *)a2l->input); + free(a2l->syms); + free(a2l); +} + +static int addr2line(const char *dso_name, unsigned long addr, + char **file, unsigned int *line) +{ + int ret = 0; + struct a2l_data *a2l; + + a2l = addr2line_init(dso_name); + if (a2l == NULL) { + pr_warning("addr2line_init failed for %s\n", dso_name); + return 0; + } + + a2l->addr = addr; + bfd_map_over_sections(a2l->abfd, find_address_in_section, a2l); + + if (a2l->found && a2l->filename) { + *file = strdup(a2l->filename); + *line = a2l->line; + + if (*file) + ret = 1; + } + + addr2line_cleanup(a2l); + return ret; +} + +#else /* HAVE_LIBBFD_SUPPORT */ + +static int addr2line(const char *dso_name, unsigned long addr, + char **file, unsigned int *line_nr) +{ + FILE *fp; + char cmd[PATH_MAX]; + char *filename = NULL; + size_t len; + char *sep; + int ret = 0; + + scnprintf(cmd, sizeof(cmd), "addr2line -e %s %016"PRIx64, + dso_name, addr); + + fp = popen(cmd, "r"); + if (fp == NULL) { + pr_warning("popen failed for %s\n", dso_name); + return 0; + } + + if (getline(&filename, &len, fp) < 0 || !len) { + pr_warning("addr2line has no output for %s\n", dso_name); + goto out; + } + + sep = strchr(filename, '\n'); + if (sep) + *sep = '\0'; + + if (!strcmp(filename, "??:0")) { + pr_debug("no debugging info in %s\n", dso_name); + free(filename); + goto out; + } + + sep = strchr(filename, ':'); + if (sep) { + *sep++ = '\0'; + *file = filename; + *line_nr = strtoul(sep, NULL, 0); + ret = 1; + } +out: + pclose(fp); + return ret; +} +#endif /* HAVE_LIBBFD_SUPPORT */ + +char *get_srcline(struct dso *dso, unsigned long addr) +{ + char *file = NULL; + unsigned line = 0; + char *srcline; + char *dso_name = dso->long_name; + size_t size; + + if (!dso->has_srcline) + return SRCLINE_UNKNOWN; + + if (dso_name[0] == '[') + goto out; + + if (!strncmp(dso_name, "/tmp/perf-", 10)) + goto out; + + if (!addr2line(dso_name, addr, &file, &line)) + goto out; + + /* just calculate actual length */ + size = snprintf(NULL, 0, "%s:%u", file, line) + 1; + + srcline = malloc(size); + if (srcline) + snprintf(srcline, size, "%s:%u", file, line); + else + srcline = SRCLINE_UNKNOWN; + + free(file); + return srcline; + +out: + dso->has_srcline = 0; + return SRCLINE_UNKNOWN; +} + +void free_srcline(char *srcline) +{ + if (srcline && strcmp(srcline, SRCLINE_UNKNOWN) != 0) + free(srcline); +} diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c index a9c829be5216..eed0b96302af 100644 --- a/tools/perf/util/symbol-elf.c +++ b/tools/perf/util/symbol-elf.c @@ -8,7 +8,7 @@ #include "symbol.h" #include "debug.h" -#ifndef HAVE_ELF_GETPHDRNUM +#ifndef HAVE_ELF_GETPHDRNUM_SUPPORT static int elf_getphdrnum(Elf *elf, size_t *dst) { GElf_Ehdr gehdr; @@ -487,27 +487,27 @@ int filename__read_debuglink(const char *filename, char *debuglink, ek = elf_kind(elf); if (ek != ELF_K_ELF) - goto out_close; + goto out_elf_end; if (gelf_getehdr(elf, &ehdr) == NULL) { pr_err("%s: cannot get elf header.\n", __func__); - goto out_close; + goto out_elf_end; } sec = elf_section_by_name(elf, &ehdr, &shdr, ".gnu_debuglink", NULL); if (sec == NULL) - goto out_close; + goto out_elf_end; data = elf_getdata(sec, NULL); if (data == NULL) - goto out_close; + goto out_elf_end; /* the start of this section is a zero-terminated string */ strncpy(debuglink, data->d_buf, size); +out_elf_end: elf_end(elf); - out_close: close(fd); out: @@ -1018,6 +1018,601 @@ int file__read_maps(int fd, bool exe, mapfn_t mapfn, void *data, return err; } +static int copy_bytes(int from, off_t from_offs, int to, off_t to_offs, u64 len) +{ + ssize_t r; + size_t n; + int err = -1; + char *buf = malloc(page_size); + + if (buf == NULL) + return -1; + + if (lseek(to, to_offs, SEEK_SET) != to_offs) + goto out; + + if (lseek(from, from_offs, SEEK_SET) != from_offs) + goto out; + + while (len) { + n = page_size; + if (len < n) + n = len; + /* Use read because mmap won't work on proc files */ + r = read(from, buf, n); + if (r < 0) + goto out; + if (!r) + break; + n = r; + r = write(to, buf, n); + if (r < 0) + goto out; + if ((size_t)r != n) + goto out; + len -= n; + } + + err = 0; +out: + free(buf); + return err; +} + +struct kcore { + int fd; + int elfclass; + Elf *elf; + GElf_Ehdr ehdr; +}; + +static int kcore__open(struct kcore *kcore, const char *filename) +{ + GElf_Ehdr *ehdr; + + kcore->fd = open(filename, O_RDONLY); + if (kcore->fd == -1) + return -1; + + kcore->elf = elf_begin(kcore->fd, ELF_C_READ, NULL); + if (!kcore->elf) + goto out_close; + + kcore->elfclass = gelf_getclass(kcore->elf); + if (kcore->elfclass == ELFCLASSNONE) + goto out_end; + + ehdr = gelf_getehdr(kcore->elf, &kcore->ehdr); + if (!ehdr) + goto out_end; + + return 0; + +out_end: + elf_end(kcore->elf); +out_close: + close(kcore->fd); + return -1; +} + +static int kcore__init(struct kcore *kcore, char *filename, int elfclass, + bool temp) +{ + GElf_Ehdr *ehdr; + + kcore->elfclass = elfclass; + + if (temp) + kcore->fd = mkstemp(filename); + else + kcore->fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0400); + if (kcore->fd == -1) + return -1; + + kcore->elf = elf_begin(kcore->fd, ELF_C_WRITE, NULL); + if (!kcore->elf) + goto out_close; + + if (!gelf_newehdr(kcore->elf, elfclass)) + goto out_end; + + ehdr = gelf_getehdr(kcore->elf, &kcore->ehdr); + if (!ehdr) + goto out_end; + + return 0; + +out_end: + elf_end(kcore->elf); +out_close: + close(kcore->fd); + unlink(filename); + return -1; +} + +static void kcore__close(struct kcore *kcore) +{ + elf_end(kcore->elf); + close(kcore->fd); +} + +static int kcore__copy_hdr(struct kcore *from, struct kcore *to, size_t count) +{ + GElf_Ehdr *ehdr = &to->ehdr; + GElf_Ehdr *kehdr = &from->ehdr; + + memcpy(ehdr->e_ident, kehdr->e_ident, EI_NIDENT); + ehdr->e_type = kehdr->e_type; + ehdr->e_machine = kehdr->e_machine; + ehdr->e_version = kehdr->e_version; + ehdr->e_entry = 0; + ehdr->e_shoff = 0; + ehdr->e_flags = kehdr->e_flags; + ehdr->e_phnum = count; + ehdr->e_shentsize = 0; + ehdr->e_shnum = 0; + ehdr->e_shstrndx = 0; + + if (from->elfclass == ELFCLASS32) { + ehdr->e_phoff = sizeof(Elf32_Ehdr); + ehdr->e_ehsize = sizeof(Elf32_Ehdr); + ehdr->e_phentsize = sizeof(Elf32_Phdr); + } else { + ehdr->e_phoff = sizeof(Elf64_Ehdr); + ehdr->e_ehsize = sizeof(Elf64_Ehdr); + ehdr->e_phentsize = sizeof(Elf64_Phdr); + } + + if (!gelf_update_ehdr(to->elf, ehdr)) + return -1; + + if (!gelf_newphdr(to->elf, count)) + return -1; + + return 0; +} + +static int kcore__add_phdr(struct kcore *kcore, int idx, off_t offset, + u64 addr, u64 len) +{ + GElf_Phdr gphdr; + GElf_Phdr *phdr; + + phdr = gelf_getphdr(kcore->elf, idx, &gphdr); + if (!phdr) + return -1; + + phdr->p_type = PT_LOAD; + phdr->p_flags = PF_R | PF_W | PF_X; + phdr->p_offset = offset; + phdr->p_vaddr = addr; + phdr->p_paddr = 0; + phdr->p_filesz = len; + phdr->p_memsz = len; + phdr->p_align = page_size; + + if (!gelf_update_phdr(kcore->elf, idx, phdr)) + return -1; + + return 0; +} + +static off_t kcore__write(struct kcore *kcore) +{ + return elf_update(kcore->elf, ELF_C_WRITE); +} + +struct phdr_data { + off_t offset; + u64 addr; + u64 len; +}; + +struct kcore_copy_info { + u64 stext; + u64 etext; + u64 first_symbol; + u64 last_symbol; + u64 first_module; + u64 last_module_symbol; + struct phdr_data kernel_map; + struct phdr_data modules_map; +}; + +static int kcore_copy__process_kallsyms(void *arg, const char *name, char type, + u64 start) +{ + struct kcore_copy_info *kci = arg; + + if (!symbol_type__is_a(type, MAP__FUNCTION)) + return 0; + + if (strchr(name, '[')) { + if (start > kci->last_module_symbol) + kci->last_module_symbol = start; + return 0; + } + + if (!kci->first_symbol || start < kci->first_symbol) + kci->first_symbol = start; + + if (!kci->last_symbol || start > kci->last_symbol) + kci->last_symbol = start; + + if (!strcmp(name, "_stext")) { + kci->stext = start; + return 0; + } + + if (!strcmp(name, "_etext")) { + kci->etext = start; + return 0; + } + + return 0; +} + +static int kcore_copy__parse_kallsyms(struct kcore_copy_info *kci, + const char *dir) +{ + char kallsyms_filename[PATH_MAX]; + + scnprintf(kallsyms_filename, PATH_MAX, "%s/kallsyms", dir); + + if (symbol__restricted_filename(kallsyms_filename, "/proc/kallsyms")) + return -1; + + if (kallsyms__parse(kallsyms_filename, kci, + kcore_copy__process_kallsyms) < 0) + return -1; + + return 0; +} + +static int kcore_copy__process_modules(void *arg, + const char *name __maybe_unused, + u64 start) +{ + struct kcore_copy_info *kci = arg; + + if (!kci->first_module || start < kci->first_module) + kci->first_module = start; + + return 0; +} + +static int kcore_copy__parse_modules(struct kcore_copy_info *kci, + const char *dir) +{ + char modules_filename[PATH_MAX]; + + scnprintf(modules_filename, PATH_MAX, "%s/modules", dir); + + if (symbol__restricted_filename(modules_filename, "/proc/modules")) + return -1; + + if (modules__parse(modules_filename, kci, + kcore_copy__process_modules) < 0) + return -1; + + return 0; +} + +static void kcore_copy__map(struct phdr_data *p, u64 start, u64 end, u64 pgoff, + u64 s, u64 e) +{ + if (p->addr || s < start || s >= end) + return; + + p->addr = s; + p->offset = (s - start) + pgoff; + p->len = e < end ? e - s : end - s; +} + +static int kcore_copy__read_map(u64 start, u64 len, u64 pgoff, void *data) +{ + struct kcore_copy_info *kci = data; + u64 end = start + len; + + kcore_copy__map(&kci->kernel_map, start, end, pgoff, kci->stext, + kci->etext); + + kcore_copy__map(&kci->modules_map, start, end, pgoff, kci->first_module, + kci->last_module_symbol); + + return 0; +} + +static int kcore_copy__read_maps(struct kcore_copy_info *kci, Elf *elf) +{ + if (elf_read_maps(elf, true, kcore_copy__read_map, kci) < 0) + return -1; + + return 0; +} + +static int kcore_copy__calc_maps(struct kcore_copy_info *kci, const char *dir, + Elf *elf) +{ + if (kcore_copy__parse_kallsyms(kci, dir)) + return -1; + + if (kcore_copy__parse_modules(kci, dir)) + return -1; + + if (kci->stext) + kci->stext = round_down(kci->stext, page_size); + else + kci->stext = round_down(kci->first_symbol, page_size); + + if (kci->etext) { + kci->etext = round_up(kci->etext, page_size); + } else if (kci->last_symbol) { + kci->etext = round_up(kci->last_symbol, page_size); + kci->etext += page_size; + } + + kci->first_module = round_down(kci->first_module, page_size); + + if (kci->last_module_symbol) { + kci->last_module_symbol = round_up(kci->last_module_symbol, + page_size); + kci->last_module_symbol += page_size; + } + + if (!kci->stext || !kci->etext) + return -1; + + if (kci->first_module && !kci->last_module_symbol) + return -1; + + return kcore_copy__read_maps(kci, elf); +} + +static int kcore_copy__copy_file(const char *from_dir, const char *to_dir, + const char *name) +{ + char from_filename[PATH_MAX]; + char to_filename[PATH_MAX]; + + scnprintf(from_filename, PATH_MAX, "%s/%s", from_dir, name); + scnprintf(to_filename, PATH_MAX, "%s/%s", to_dir, name); + + return copyfile_mode(from_filename, to_filename, 0400); +} + +static int kcore_copy__unlink(const char *dir, const char *name) +{ + char filename[PATH_MAX]; + + scnprintf(filename, PATH_MAX, "%s/%s", dir, name); + + return unlink(filename); +} + +static int kcore_copy__compare_fds(int from, int to) +{ + char *buf_from; + char *buf_to; + ssize_t ret; + size_t len; + int err = -1; + + buf_from = malloc(page_size); + buf_to = malloc(page_size); + if (!buf_from || !buf_to) + goto out; + + while (1) { + /* Use read because mmap won't work on proc files */ + ret = read(from, buf_from, page_size); + if (ret < 0) + goto out; + + if (!ret) + break; + + len = ret; + + if (readn(to, buf_to, len) != (int)len) + goto out; + + if (memcmp(buf_from, buf_to, len)) + goto out; + } + + err = 0; +out: + free(buf_to); + free(buf_from); + return err; +} + +static int kcore_copy__compare_files(const char *from_filename, + const char *to_filename) +{ + int from, to, err = -1; + + from = open(from_filename, O_RDONLY); + if (from < 0) + return -1; + + to = open(to_filename, O_RDONLY); + if (to < 0) + goto out_close_from; + + err = kcore_copy__compare_fds(from, to); + + close(to); +out_close_from: + close(from); + return err; +} + +static int kcore_copy__compare_file(const char *from_dir, const char *to_dir, + const char *name) +{ + char from_filename[PATH_MAX]; + char to_filename[PATH_MAX]; + + scnprintf(from_filename, PATH_MAX, "%s/%s", from_dir, name); + scnprintf(to_filename, PATH_MAX, "%s/%s", to_dir, name); + + return kcore_copy__compare_files(from_filename, to_filename); +} + +/** + * kcore_copy - copy kallsyms, modules and kcore from one directory to another. + * @from_dir: from directory + * @to_dir: to directory + * + * This function copies kallsyms, modules and kcore files from one directory to + * another. kallsyms and modules are copied entirely. Only code segments are + * copied from kcore. It is assumed that two segments suffice: one for the + * kernel proper and one for all the modules. The code segments are determined + * from kallsyms and modules files. The kernel map starts at _stext or the + * lowest function symbol, and ends at _etext or the highest function symbol. + * The module map starts at the lowest module address and ends at the highest + * module symbol. Start addresses are rounded down to the nearest page. End + * addresses are rounded up to the nearest page. An extra page is added to the + * highest kernel symbol and highest module symbol to, hopefully, encompass that + * symbol too. Because it contains only code sections, the resulting kcore is + * unusual. One significant peculiarity is that the mapping (start -> pgoff) + * is not the same for the kernel map and the modules map. That happens because + * the data is copied adjacently whereas the original kcore has gaps. Finally, + * kallsyms and modules files are compared with their copies to check that + * modules have not been loaded or unloaded while the copies were taking place. + * + * Return: %0 on success, %-1 on failure. + */ +int kcore_copy(const char *from_dir, const char *to_dir) +{ + struct kcore kcore; + struct kcore extract; + size_t count = 2; + int idx = 0, err = -1; + off_t offset = page_size, sz, modules_offset = 0; + struct kcore_copy_info kci = { .stext = 0, }; + char kcore_filename[PATH_MAX]; + char extract_filename[PATH_MAX]; + + if (kcore_copy__copy_file(from_dir, to_dir, "kallsyms")) + return -1; + + if (kcore_copy__copy_file(from_dir, to_dir, "modules")) + goto out_unlink_kallsyms; + + scnprintf(kcore_filename, PATH_MAX, "%s/kcore", from_dir); + scnprintf(extract_filename, PATH_MAX, "%s/kcore", to_dir); + + if (kcore__open(&kcore, kcore_filename)) + goto out_unlink_modules; + + if (kcore_copy__calc_maps(&kci, from_dir, kcore.elf)) + goto out_kcore_close; + + if (kcore__init(&extract, extract_filename, kcore.elfclass, false)) + goto out_kcore_close; + + if (!kci.modules_map.addr) + count -= 1; + + if (kcore__copy_hdr(&kcore, &extract, count)) + goto out_extract_close; + + if (kcore__add_phdr(&extract, idx++, offset, kci.kernel_map.addr, + kci.kernel_map.len)) + goto out_extract_close; + + if (kci.modules_map.addr) { + modules_offset = offset + kci.kernel_map.len; + if (kcore__add_phdr(&extract, idx, modules_offset, + kci.modules_map.addr, kci.modules_map.len)) + goto out_extract_close; + } + + sz = kcore__write(&extract); + if (sz < 0 || sz > offset) + goto out_extract_close; + + if (copy_bytes(kcore.fd, kci.kernel_map.offset, extract.fd, offset, + kci.kernel_map.len)) + goto out_extract_close; + + if (modules_offset && copy_bytes(kcore.fd, kci.modules_map.offset, + extract.fd, modules_offset, + kci.modules_map.len)) + goto out_extract_close; + + if (kcore_copy__compare_file(from_dir, to_dir, "modules")) + goto out_extract_close; + + if (kcore_copy__compare_file(from_dir, to_dir, "kallsyms")) + goto out_extract_close; + + err = 0; + +out_extract_close: + kcore__close(&extract); + if (err) + unlink(extract_filename); +out_kcore_close: + kcore__close(&kcore); +out_unlink_modules: + if (err) + kcore_copy__unlink(to_dir, "modules"); +out_unlink_kallsyms: + if (err) + kcore_copy__unlink(to_dir, "kallsyms"); + + return err; +} + +int kcore_extract__create(struct kcore_extract *kce) +{ + struct kcore kcore; + struct kcore extract; + size_t count = 1; + int idx = 0, err = -1; + off_t offset = page_size, sz; + + if (kcore__open(&kcore, kce->kcore_filename)) + return -1; + + strcpy(kce->extract_filename, PERF_KCORE_EXTRACT); + if (kcore__init(&extract, kce->extract_filename, kcore.elfclass, true)) + goto out_kcore_close; + + if (kcore__copy_hdr(&kcore, &extract, count)) + goto out_extract_close; + + if (kcore__add_phdr(&extract, idx, offset, kce->addr, kce->len)) + goto out_extract_close; + + sz = kcore__write(&extract); + if (sz < 0 || sz > offset) + goto out_extract_close; + + if (copy_bytes(kcore.fd, kce->offs, extract.fd, offset, kce->len)) + goto out_extract_close; + + err = 0; + +out_extract_close: + kcore__close(&extract); + if (err) + unlink(kce->extract_filename); +out_kcore_close: + kcore__close(&kcore); + + return err; +} + +void kcore_extract__delete(struct kcore_extract *kce) +{ + unlink(kce->extract_filename); +} + void symbol__elf_init(void) { elf_version(EV_CURRENT); diff --git a/tools/perf/util/symbol-minimal.c b/tools/perf/util/symbol-minimal.c index 3a802c300fc5..2d2dd0532b5a 100644 --- a/tools/perf/util/symbol-minimal.c +++ b/tools/perf/util/symbol-minimal.c @@ -308,6 +308,21 @@ int file__read_maps(int fd __maybe_unused, bool exe __maybe_unused, return -1; } +int kcore_extract__create(struct kcore_extract *kce __maybe_unused) +{ + return -1; +} + +void kcore_extract__delete(struct kcore_extract *kce __maybe_unused) +{ +} + +int kcore_copy(const char *from_dir __maybe_unused, + const char *to_dir __maybe_unused) +{ + return -1; +} + void symbol__elf_init(void) { } diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index 7eb0362f4ffd..c0c36965fff0 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -51,6 +51,7 @@ static enum dso_binary_type binary_type_symtab[] = { DSO_BINARY_TYPE__SYSTEM_PATH_DSO, DSO_BINARY_TYPE__GUEST_KMODULE, DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE, + DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO, DSO_BINARY_TYPE__NOT_FOUND, }; @@ -159,10 +160,12 @@ again: if (choose_best_symbol(curr, next) == SYMBOL_A) { rb_erase(&next->rb_node, symbols); + symbol__delete(next); goto again; } else { nd = rb_next(&curr->rb_node); rb_erase(&curr->rb_node, symbols); + symbol__delete(curr); } } } @@ -499,6 +502,64 @@ out_failure: return -1; } +int modules__parse(const char *filename, void *arg, + int (*process_module)(void *arg, const char *name, + u64 start)) +{ + char *line = NULL; + size_t n; + FILE *file; + int err = 0; + + file = fopen(filename, "r"); + if (file == NULL) + return -1; + + while (1) { + char name[PATH_MAX]; + u64 start; + char *sep; + ssize_t line_len; + + line_len = getline(&line, &n, file); + if (line_len < 0) { + if (feof(file)) + break; + err = -1; + goto out; + } + + if (!line) { + err = -1; + goto out; + } + + line[--line_len] = '\0'; /* \n */ + + sep = strrchr(line, 'x'); + if (sep == NULL) + continue; + + hex2u64(sep + 1, &start); + + sep = strchr(line, ' '); + if (sep == NULL) + continue; + + *sep = '\0'; + + scnprintf(name, sizeof(name), "[%s]", line); + + err = process_module(arg, name, start); + if (err) + break; + } +out: + free(line); + fclose(file); + return err; +} + struct process_kallsyms_args { struct map *map; struct dso *dso; @@ -739,51 +800,242 @@ bool symbol__restricted_filename(const char *filename, return restricted; } -struct kcore_mapfn_data { - struct dso *dso; - enum map_type type; - struct list_head maps; +struct module_info { + struct rb_node rb_node; + char *name; + u64 start; }; -static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data) +static void add_module(struct module_info *mi, struct rb_root *modules) { - struct kcore_mapfn_data *md = data; - struct map *map; + struct rb_node **p = &modules->rb_node; + struct rb_node *parent = NULL; + struct module_info *m; - map = map__new2(start, md->dso, md->type); - if (map == NULL) + while (*p != NULL) { + parent = *p; + m = rb_entry(parent, struct module_info, rb_node); + if (strcmp(mi->name, m->name) < 0) + p = &(*p)->rb_left; + else + p = &(*p)->rb_right; + } + rb_link_node(&mi->rb_node, parent, p); + rb_insert_color(&mi->rb_node, modules); +} + +static void delete_modules(struct rb_root *modules) +{ + struct module_info *mi; + struct rb_node *next = rb_first(modules); + + while (next) { + mi = rb_entry(next, struct module_info, rb_node); + next = rb_next(&mi->rb_node); + rb_erase(&mi->rb_node, modules); + free(mi->name); + free(mi); + } +} + +static struct module_info *find_module(const char *name, + struct rb_root *modules) +{ + struct rb_node *n = modules->rb_node; + + while (n) { + struct module_info *m; + int cmp; + + m = rb_entry(n, struct module_info, rb_node); + cmp = strcmp(name, m->name); + if (cmp < 0) + n = n->rb_left; + else if (cmp > 0) + n = n->rb_right; + else + return m; + } + + return NULL; +} + +static int __read_proc_modules(void *arg, const char *name, u64 start) +{ + struct rb_root *modules = arg; + struct module_info *mi; + + mi = zalloc(sizeof(struct module_info)); + if (!mi) return -ENOMEM; - map->end = map->start + len; - map->pgoff = pgoff; + mi->name = strdup(name); + mi->start = start; - list_add(&map->node, &md->maps); + if (!mi->name) { + free(mi); + return -ENOMEM; + } + + add_module(mi, modules); + + return 0; +} + +static int read_proc_modules(const char *filename, struct rb_root *modules) +{ + if (symbol__restricted_filename(filename, "/proc/modules")) + return -1; + + if (modules__parse(filename, modules, __read_proc_modules)) { + delete_modules(modules); + return -1; + } return 0; } +int compare_proc_modules(const char *from, const char *to) +{ + struct rb_root from_modules = RB_ROOT; + struct rb_root to_modules = RB_ROOT; + struct rb_node *from_node, *to_node; + struct module_info *from_m, *to_m; + int ret = -1; + + if (read_proc_modules(from, &from_modules)) + return -1; + + if (read_proc_modules(to, &to_modules)) + goto out_delete_from; + + from_node = rb_first(&from_modules); + to_node = rb_first(&to_modules); + while (from_node) { + if (!to_node) + break; + + from_m = rb_entry(from_node, struct module_info, rb_node); + to_m = rb_entry(to_node, struct module_info, rb_node); + + if (from_m->start != to_m->start || + strcmp(from_m->name, to_m->name)) + break; + + from_node = rb_next(from_node); + to_node = rb_next(to_node); + } + + if (!from_node && !to_node) + ret = 0; + + delete_modules(&to_modules); +out_delete_from: + delete_modules(&from_modules); + + return ret; +} + +static int do_validate_kcore_modules(const char *filename, struct map *map, + struct map_groups *kmaps) +{ + struct rb_root modules = RB_ROOT; + struct map *old_map; + int err; + + err = read_proc_modules(filename, &modules); + if (err) + return err; + + old_map = map_groups__first(kmaps, map->type); + while (old_map) { + struct map *next = map_groups__next(old_map); + struct module_info *mi; + + if (old_map == map || old_map->start == map->start) { + /* The kernel map */ + old_map = next; + continue; + } + + /* Module must be in memory at the same address */ + mi = find_module(old_map->dso->short_name, &modules); + if (!mi || mi->start != old_map->start) { + err = -EINVAL; + goto out; + } + + old_map = next; + } +out: + delete_modules(&modules); + return err; +} + /* - * If kallsyms is referenced by name then we look for kcore in the same + * If kallsyms is referenced by name then we look for filename in the same * directory. */ -static bool kcore_filename_from_kallsyms_filename(char *kcore_filename, - const char *kallsyms_filename) +static bool filename_from_kallsyms_filename(char *filename, + const char *base_name, + const char *kallsyms_filename) { char *name; - strcpy(kcore_filename, kallsyms_filename); - name = strrchr(kcore_filename, '/'); + strcpy(filename, kallsyms_filename); + name = strrchr(filename, '/'); if (!name) return false; - if (!strcmp(name, "/kallsyms")) { - strcpy(name, "/kcore"); + name += 1; + + if (!strcmp(name, "kallsyms")) { + strcpy(name, base_name); return true; } return false; } +static int validate_kcore_modules(const char *kallsyms_filename, + struct map *map) +{ + struct map_groups *kmaps = map__kmap(map)->kmaps; + char modules_filename[PATH_MAX]; + + if (!filename_from_kallsyms_filename(modules_filename, "modules", + kallsyms_filename)) + return -EINVAL; + + if (do_validate_kcore_modules(modules_filename, map, kmaps)) + return -EINVAL; + + return 0; +} + +struct kcore_mapfn_data { + struct dso *dso; + enum map_type type; + struct list_head maps; +}; + +static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data) +{ + struct kcore_mapfn_data *md = data; + struct map *map; + + map = map__new2(start, md->dso, md->type); + if (map == NULL) + return -ENOMEM; + + map->end = map->start + len; + map->pgoff = pgoff; + + list_add(&map->node, &md->maps); + + return 0; +} + static int dso__load_kcore(struct dso *dso, struct map *map, const char *kallsyms_filename) { @@ -800,8 +1052,12 @@ static int dso__load_kcore(struct dso *dso, struct map *map, if (map != machine->vmlinux_maps[map->type]) return -EINVAL; - if (!kcore_filename_from_kallsyms_filename(kcore_filename, - kallsyms_filename)) + if (!filename_from_kallsyms_filename(kcore_filename, "kcore", + kallsyms_filename)) + return -EINVAL; + + /* All modules must be present at their original addresses */ + if (validate_kcore_modules(kallsyms_filename, map)) return -EINVAL; md.dso = dso; @@ -1188,6 +1444,105 @@ out: return err; } +static int find_matching_kcore(struct map *map, char *dir, size_t dir_sz) +{ + char kallsyms_filename[PATH_MAX]; + struct dirent *dent; + int ret = -1; + DIR *d; + + d = opendir(dir); + if (!d) + return -1; + + while (1) { + dent = readdir(d); + if (!dent) + break; + if (dent->d_type != DT_DIR) + continue; + scnprintf(kallsyms_filename, sizeof(kallsyms_filename), + "%s/%s/kallsyms", dir, dent->d_name); + if (!validate_kcore_modules(kallsyms_filename, map)) { + strlcpy(dir, kallsyms_filename, dir_sz); + ret = 0; + break; + } + } + + closedir(d); + + return ret; +} + +static char *dso__find_kallsyms(struct dso *dso, struct map *map) +{ + u8 host_build_id[BUILD_ID_SIZE]; + char sbuild_id[BUILD_ID_SIZE * 2 + 1]; + bool is_host = false; + char path[PATH_MAX]; + + if (!dso->has_build_id) { + /* + * Last resort, if we don't have a build-id and couldn't find + * any vmlinux file, try the running kernel kallsyms table. + */ + goto proc_kallsyms; + } + + if (sysfs__read_build_id("/sys/kernel/notes", host_build_id, + sizeof(host_build_id)) == 0) + is_host = dso__build_id_equal(dso, host_build_id); + + build_id__sprintf(dso->build_id, sizeof(dso->build_id), sbuild_id); + + /* Use /proc/kallsyms if possible */ + if (is_host) { + DIR *d; + int fd; + + /* If no cached kcore go with /proc/kallsyms */ + scnprintf(path, sizeof(path), "%s/[kernel.kcore]/%s", + buildid_dir, sbuild_id); + d = opendir(path); + if (!d) + goto proc_kallsyms; + closedir(d); + + /* + * Do not check the build-id cache, until we know we cannot use + * /proc/kcore. + */ + fd = open("/proc/kcore", O_RDONLY); + if (fd != -1) { + close(fd); + /* If module maps match go with /proc/kallsyms */ + if (!validate_kcore_modules("/proc/kallsyms", map)) + goto proc_kallsyms; + } + + /* Find kallsyms in build-id cache with kcore */ + if (!find_matching_kcore(map, path, sizeof(path))) + return strdup(path); + + goto proc_kallsyms; + } + + scnprintf(path, sizeof(path), "%s/[kernel.kallsyms]/%s", + buildid_dir, sbuild_id); + + if (access(path, F_OK)) { + pr_err("No kallsyms or vmlinux with build-id %s was found\n", + sbuild_id); + return NULL; + } + + return strdup(path); + +proc_kallsyms: + return strdup("/proc/kallsyms"); +} + static int dso__load_kernel_sym(struct dso *dso, struct map *map, symbol_filter_t filter) { @@ -1214,7 +1569,7 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map, goto do_kallsyms; } - if (symbol_conf.vmlinux_name != NULL) { + if (!symbol_conf.ignore_vmlinux && symbol_conf.vmlinux_name != NULL) { err = dso__load_vmlinux(dso, map, symbol_conf.vmlinux_name, filter); if (err > 0) { @@ -1226,7 +1581,7 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map, return err; } - if (vmlinux_path != NULL) { + if (!symbol_conf.ignore_vmlinux && vmlinux_path != NULL) { err = dso__load_vmlinux_path(dso, map, filter); if (err > 0) return err; @@ -1236,51 +1591,11 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map, if (symbol_conf.symfs[0] != 0) return -1; - /* - * Say the kernel DSO was created when processing the build-id header table, - * we have a build-id, so check if it is the same as the running kernel, - * using it if it is. - */ - if (dso->has_build_id) { - u8 kallsyms_build_id[BUILD_ID_SIZE]; - char sbuild_id[BUILD_ID_SIZE * 2 + 1]; - - if (sysfs__read_build_id("/sys/kernel/notes", kallsyms_build_id, - sizeof(kallsyms_build_id)) == 0) { - if (dso__build_id_equal(dso, kallsyms_build_id)) { - kallsyms_filename = "/proc/kallsyms"; - goto do_kallsyms; - } - } - /* - * Now look if we have it on the build-id cache in - * $HOME/.debug/[kernel.kallsyms]. - */ - build_id__sprintf(dso->build_id, sizeof(dso->build_id), - sbuild_id); - - if (asprintf(&kallsyms_allocated_filename, - "%s/.debug/[kernel.kallsyms]/%s", - getenv("HOME"), sbuild_id) == -1) { - pr_err("Not enough memory for kallsyms file lookup\n"); - return -1; - } - - kallsyms_filename = kallsyms_allocated_filename; + kallsyms_allocated_filename = dso__find_kallsyms(dso, map); + if (!kallsyms_allocated_filename) + return -1; - if (access(kallsyms_filename, F_OK)) { - pr_err("No kallsyms or vmlinux with build-id %s " - "was found\n", sbuild_id); - free(kallsyms_allocated_filename); - return -1; - } - } else { - /* - * Last resort, if we don't have a build-id and couldn't find - * any vmlinux file, try the running kernel kallsyms table. - */ - kallsyms_filename = "/proc/kallsyms"; - } + kallsyms_filename = kallsyms_allocated_filename; do_kallsyms: err = dso__load_kallsyms(dso, kallsyms_filename, map, filter); diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h index fd5b70ea2981..07de8fea2f48 100644 --- a/tools/perf/util/symbol.h +++ b/tools/perf/util/symbol.h @@ -13,7 +13,7 @@ #include #include "build-id.h" -#ifdef LIBELF_SUPPORT +#ifdef HAVE_LIBELF_SUPPORT #include #include #endif @@ -21,7 +21,7 @@ #include "dso.h" -#ifdef HAVE_CPLUS_DEMANGLE +#ifdef HAVE_CPLUS_DEMANGLE_SUPPORT extern char *cplus_demangle(const char *, int); static inline char *bfd_demangle(void __maybe_unused *v, const char *c, int i) @@ -46,7 +46,7 @@ static inline char *bfd_demangle(void __maybe_unused *v, * libelf 0.8.x and earlier do not support ELF_C_READ_MMAP; * for newer versions we can use mmap to reduce memory usage: */ -#ifdef LIBELF_MMAP +#ifdef HAVE_LIBELF_MMAP_SUPPORT # define PERF_ELF_C_READ_MMAP ELF_C_READ_MMAP #else # define PERF_ELF_C_READ_MMAP ELF_C_READ @@ -85,6 +85,7 @@ struct symbol_conf { unsigned short priv_size; unsigned short nr_events; bool try_vmlinux_path, + ignore_vmlinux, show_kernel_path, use_modules, sort_by_name, @@ -178,7 +179,7 @@ struct symsrc { int fd; enum dso_binary_type type; -#ifdef LIBELF_SUPPORT +#ifdef HAVE_LIBELF_SUPPORT Elf *elf; GElf_Ehdr ehdr; @@ -222,6 +223,9 @@ int sysfs__read_build_id(const char *filename, void *bf, size_t size); int kallsyms__parse(const char *filename, void *arg, int (*process_symbol)(void *arg, const char *name, char type, u64 start)); +int modules__parse(const char *filename, void *arg, + int (*process_module)(void *arg, const char *name, + u64 start)); int filename__read_debuglink(const char *filename, char *debuglink, size_t size); @@ -252,4 +256,21 @@ typedef int (*mapfn_t)(u64 start, u64 len, u64 pgoff, void *data); int file__read_maps(int fd, bool exe, mapfn_t mapfn, void *data, bool *is_64_bit); +#define PERF_KCORE_EXTRACT "/tmp/perf-kcore-XXXXXX" + +struct kcore_extract { + char *kcore_filename; + u64 addr; + u64 offs; + u64 len; + char extract_filename[sizeof(PERF_KCORE_EXTRACT)]; + int fd; +}; + +int kcore_extract__create(struct kcore_extract *kce); +void kcore_extract__delete(struct kcore_extract *kce); + +int kcore_copy(const char *from_dir, const char *to_dir); +int compare_proc_modules(const char *from, const char *to); + #endif /* __PERF_SYMBOL */ diff --git a/tools/perf/util/top.h b/tools/perf/util/top.h index b554ffc462b6..88cfeaff600b 100644 --- a/tools/perf/util/top.h +++ b/tools/perf/util/top.h @@ -24,6 +24,7 @@ struct perf_top { u64 exact_samples; u64 guest_us_samples, guest_kernel_samples; int print_entries, count_filter, delay_secs; + int max_stack; bool hide_kernel_symbols, hide_user_symbols, zero; bool use_tui, use_stdio; bool kptr_restrict_warned; diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c index e9e1c03f927d..6681f71f2f95 100644 --- a/tools/perf/util/trace-event-parse.c +++ b/tools/perf/util/trace-event-parse.c @@ -120,42 +120,6 @@ raw_field_value(struct event_format *event, const char *name, void *data) return val; } -void *raw_field_ptr(struct event_format *event, const char *name, void *data) -{ - struct format_field *field; - - field = pevent_find_any_field(event, name); - if (!field) - return NULL; - - if (field->flags & FIELD_IS_DYNAMIC) { - int offset; - - offset = *(int *)(data + field->offset); - offset &= 0xffff; - - return data + offset; - } - - return data + field->offset; -} - -int trace_parse_common_type(struct pevent *pevent, void *data) -{ - struct pevent_record record; - - record.data = data; - return pevent_data_type(pevent, &record); -} - -int trace_parse_common_pid(struct pevent *pevent, void *data) -{ - struct pevent_record record; - - record.data = data; - return pevent_data_pid(pevent, &record); -} - unsigned long long read_size(struct event_format *event, void *ptr, int size) { return pevent_read_number(event->pevent, ptr, size); diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h index fafe1a40444a..04df63114109 100644 --- a/tools/perf/util/trace-event.h +++ b/tools/perf/util/trace-event.h @@ -11,8 +11,6 @@ union perf_event; struct perf_tool; struct thread; -extern struct pevent *perf_pevent; - int bigendian(void); struct pevent *read_trace_init(int file_bigendian, int host_bigendian); @@ -23,26 +21,19 @@ int parse_ftrace_file(struct pevent *pevent, char *buf, unsigned long size); int parse_event_file(struct pevent *pevent, char *buf, unsigned long size, char *sys); -struct pevent_record *trace_peek_data(struct pevent *pevent, int cpu); - unsigned long long raw_field_value(struct event_format *event, const char *name, void *data); -void *raw_field_ptr(struct event_format *event, const char *name, void *data); void parse_proc_kallsyms(struct pevent *pevent, char *file, unsigned int size); void parse_ftrace_printk(struct pevent *pevent, char *file, unsigned int size); ssize_t trace_report(int fd, struct pevent **pevent, bool repipe); -int trace_parse_common_type(struct pevent *pevent, void *data); -int trace_parse_common_pid(struct pevent *pevent, void *data); - struct event_format *trace_find_next_event(struct pevent *pevent, struct event_format *event); unsigned long long read_size(struct event_format *event, void *ptr, int size); unsigned long long eval_flag(const char *flag); -struct pevent_record *trace_read_data(struct pevent *pevent, int cpu); int read_tracing_data(int fd, struct list_head *pattrs); struct tracing_data { diff --git a/tools/perf/util/unwind.h b/tools/perf/util/unwind.h index cb6bc503a792..ec0c71a2ca2e 100644 --- a/tools/perf/util/unwind.h +++ b/tools/perf/util/unwind.h @@ -13,7 +13,7 @@ struct unwind_entry { typedef int (*unwind_entry_cb_t)(struct unwind_entry *entry, void *arg); -#ifdef LIBUNWIND_SUPPORT +#ifdef HAVE_LIBUNWIND_SUPPORT int unwind__get_entries(unwind_entry_cb_t cb, void *arg, struct machine *machine, struct thread *thread, @@ -31,5 +31,5 @@ unwind__get_entries(unwind_entry_cb_t cb __maybe_unused, { return 0; } -#endif /* LIBUNWIND_SUPPORT */ +#endif /* HAVE_LIBUNWIND_SUPPORT */ #endif /* __UNWIND_H */ diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c index 6d17b18e915d..c25e57b3acb2 100644 --- a/tools/perf/util/util.c +++ b/tools/perf/util/util.c @@ -1,7 +1,7 @@ #include "../perf.h" #include "util.h" #include -#ifdef BACKTRACE_SUPPORT +#ifdef HAVE_BACKTRACE_SUPPORT #include #endif #include @@ -55,17 +55,20 @@ int mkdir_p(char *path, mode_t mode) return (stat(path, &st) && mkdir(path, mode)) ? -1 : 0; } -static int slow_copyfile(const char *from, const char *to) +static int slow_copyfile(const char *from, const char *to, mode_t mode) { - int err = 0; + int err = -1; char *line = NULL; size_t n; FILE *from_fp = fopen(from, "r"), *to_fp; + mode_t old_umask; if (from_fp == NULL) goto out; + old_umask = umask(mode ^ 0777); to_fp = fopen(to, "w"); + umask(old_umask); if (to_fp == NULL) goto out_fclose_from; @@ -82,7 +85,7 @@ out: return err; } -int copyfile(const char *from, const char *to) +int copyfile_mode(const char *from, const char *to, mode_t mode) { int fromfd, tofd; struct stat st; @@ -93,13 +96,13 @@ int copyfile(const char *from, const char *to) goto out; if (st.st_size == 0) /* /proc? do it slowly... */ - return slow_copyfile(from, to); + return slow_copyfile(from, to, mode); fromfd = open(from, O_RDONLY); if (fromfd < 0) goto out; - tofd = creat(to, 0755); + tofd = creat(to, mode); if (tofd < 0) goto out_close_from; @@ -121,6 +124,11 @@ out: return err; } +int copyfile(const char *from, const char *to) +{ + return copyfile_mode(from, to, 0755); +} + unsigned long convert_unit(unsigned long value, char *unit) { *unit = ' '; @@ -204,7 +212,7 @@ int hex2u64(const char *ptr, u64 *long_val) } /* Obtain a backtrace and print it to stdout. */ -#ifdef BACKTRACE_SUPPORT +#ifdef HAVE_BACKTRACE_SUPPORT void dump_stack(void) { void *array[16]; @@ -361,3 +369,45 @@ int parse_nsec_time(const char *str, u64 *ptime) *ptime = time_sec * NSEC_PER_SEC + time_nsec; return 0; } + +unsigned long parse_tag_value(const char *str, struct parse_tag *tags) +{ + struct parse_tag *i = tags; + + while (i->tag) { + char *s; + + s = strchr(str, i->tag); + if (s) { + unsigned long int value; + char *endptr; + + value = strtoul(str, &endptr, 10); + if (s != endptr) + break; + + value *= i->mult; + return value; + } + i++; + } + + return (unsigned long) -1; +} + +int filename__read_int(const char *filename, int *value) +{ + char line[64]; + int fd = open(filename, O_RDONLY), err = -1; + + if (fd < 0) + return -1; + + if (read(fd, line, sizeof(line)) > 0) { + *value = atoi(line); + err = 0; + } + + close(fd); + return err; +} diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h index a53535949043..c8f362daba87 100644 --- a/tools/perf/util/util.h +++ b/tools/perf/util/util.h @@ -128,6 +128,8 @@ void put_tracing_file(char *file); #endif #endif +#define PERF_GTK_DSO "libperf-gtk.so" + /* General helper functions */ extern void usage(const char *err) NORETURN; extern void die(const char *err, ...) NORETURN __attribute__((format (printf, 1, 2))); @@ -241,6 +243,7 @@ static inline int sane_case(int x, int high) int mkdir_p(char *path, mode_t mode); int copyfile(const char *from, const char *to); +int copyfile_mode(const char *from, const char *to, mode_t mode); s64 perf_atoll(const char *str); char **argv_split(const char *str, int *argcp); @@ -270,6 +273,13 @@ bool is_power_of_2(unsigned long n) return (n != 0 && ((n & (n - 1)) == 0)); } +static inline unsigned next_pow2(unsigned x) +{ + if (!x) + return 1; + return 1ULL << (32 - __builtin_clz(x - 1)); +} + size_t hex_width(u64 v); int hex2u64(const char *ptr, u64 *val); @@ -281,4 +291,20 @@ void dump_stack(void); extern unsigned int page_size; void get_term_dimensions(struct winsize *ws); + +struct parse_tag { + char tag; + int mult; +}; + +unsigned long parse_tag_value(const char *str, struct parse_tag *tags); + +#define SRCLINE_UNKNOWN ((char *) "??:0") + +struct dso; + +char *get_srcline(struct dso *dso, unsigned long addr); +void free_srcline(char *srcline); + +int filename__read_int(const char *filename, int *value); #endif /* GIT_COMPAT_UTIL_H */ diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include index 0d0506d55c71..ee76544deecb 100644 --- a/tools/scripts/Makefile.include +++ b/tools/scripts/Makefile.include @@ -59,21 +59,22 @@ QUIET_SUBDIR0 = +$(MAKE) $(COMMAND_O) -C # space to separate -C and subdir QUIET_SUBDIR1 = ifneq ($(findstring $(MAKEFLAGS),s),s) -ifneq ($(V),1) - QUIET_CC = @echo ' ' CC $@; - QUIET_AR = @echo ' ' AR $@; - QUIET_LINK = @echo ' ' LINK $@; - QUIET_MKDIR = @echo ' ' MKDIR $@; - QUIET_GEN = @echo ' ' GEN $@; + ifneq ($(V),1) + QUIET_CC = @echo ' CC '$@; + QUIET_AR = @echo ' AR '$@; + QUIET_LINK = @echo ' LINK '$@; + QUIET_MKDIR = @echo ' MKDIR '$@; + QUIET_GEN = @echo ' GEN '$@; QUIET_SUBDIR0 = +@subdir= - QUIET_SUBDIR1 = ;$(NO_SUBDIR) echo ' ' SUBDIR $$subdir; \ + QUIET_SUBDIR1 = ;$(NO_SUBDIR) \ + echo ' SUBDIR '$$subdir; \ $(MAKE) $(PRINT_DIR) -C $$subdir - QUIET_FLEX = @echo ' ' FLEX $@; - QUIET_BISON = @echo ' ' BISON $@; + QUIET_FLEX = @echo ' FLEX '$@; + QUIET_BISON = @echo ' BISON '$@; descend = \ - +@echo ' ' DESCEND $(1); \ + +@echo ' DESCEND '$(1); \ mkdir -p $(OUTPUT)$(1) && \ $(MAKE) $(COMMAND_O) subdir=$(if $(subdir),$(subdir)/$(1),$(1)) $(PRINT_DIR) -C $(1) $(2) -endif + endif endif diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c index 4fa655d68a81..41bd85559d4b 100644 --- a/tools/testing/selftests/timers/posix_timers.c +++ b/tools/testing/selftests/timers/posix_timers.c @@ -151,7 +151,7 @@ static int check_timer_create(int which) fflush(stdout); done = 0; - timer_create(which, NULL, &id); + err = timer_create(which, NULL, &id); if (err < 0) { perror("Can't create timer\n"); return -1; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 979bff485fb0..a9dd682cf5e3 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1064,10 +1064,12 @@ EXPORT_SYMBOL_GPL(gfn_to_hva); unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable) { struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn); - if (writable) + unsigned long hva = __gfn_to_hva_many(slot, gfn, NULL, false); + + if (!kvm_is_error_hva(hva) && writable) *writable = !memslot_is_readonly(slot); - return __gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL, false); + return hva; } static int kvm_read_hva(void *data, void __user *hva, int len)