Anton Vorontsov [Thu, 9 Jul 2009 18:36:44 +0000 (22:36 +0400)]
powerpc/85xx: Add support for I2C EEPROMs on MPC8548CDS boards
This patch simply adds four eeprom nodes to MPC8548CDS' device tree.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Vorontsov [Fri, 24 Jul 2009 21:42:17 +0000 (01:42 +0400)]
powerpc/83xx: Add support for MPC8377E-WLAN boards
MPC8377E-WLAN are basically RDB boards except:
- RAM extended to 512 MB;
- NAND flash removed, NOR flash extended to 64 MB;
- Vitesse VSC7385 5-port switch removed, RTL8211B PHY added;
- Power management MCU removed;
- PCI slot removed, another mini-PCI slot added (IRQ routing changed);
- USB3300 PHY's ID pin grounded, thus USB port is host-only.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: Timur Tabi <timur@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
in case the interrupt controller was used in an earlier life then it is
possible it is that some of its sources were used and are still unmask.
If the (unmasked) device is active and is creating interrupts (or one
interrupts was pending since the interrupts were disabled) then the boot
process "ends" very soon. Once external interrupts are enabled, we land in
-> do_IRQ
-> call ppc_md.get_irq()
-> ipic_read() gets the source number
-> irq_linear_revmap(source)
-> revmap[source] == NO_IRQ
-> irq_find_mapping(source) returns NO_IRQ because no source
is registered
-> source is NO_IRQ, ppc_spurious_interrupts gets incremented, no
further action.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
x = \(kmalloc\|kcalloc\|kzalloc\)(...);
... when != x == NULL
when != x != NULL
when != (x || ...)
(
kfree(x)
|
f(...,C,...,x,...)
|
*f(...,x,...)
|
*x->f
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kumar Gala [Thu, 30 Jul 2009 22:56:38 +0000 (17:56 -0500)]
powerpc/85xx: Move mpc8536ds.dts to address-cells/size-cells = <2>
Change the top-level #address-cells and #size-cells to <2> so the
mpc8536ds.dts is easier to deal with both a true 32-bit physical
or 36-bit physical address space.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stefan Roese [Wed, 29 Jul 2009 01:41:06 +0000 (01:41 +0000)]
powerpc/40x: Update kilauea defconfig to support NAND, RTC and HWMON
This patch adds support for the following devices to the Kilauea
defconfig file:
- PPC4xx NAND controller (NDFC)
- I2C RTC (Dallas DS1338)
- I2C HWMON (Dallas DS1775)
Signed-off-by: Stefan Roese <sr@denx.de> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stefan Roese [Wed, 29 Jul 2009 01:40:56 +0000 (01:40 +0000)]
powerpc/40x: Update Kilauea dts to support NAND, RTC and HWMON
This patch adds support for the following devices to the Kilauea dts:
- PPC4xx NAND controller (NDFC)
- I2C RTC (Dallas DS1338)
- I2C HWMON (Dallas DS1775)
Additionally the partitioning of the NOR FLASH is changed. The dtb
partition has been missing. Fixed in this patch.
Signed-off-by: Stefan Roese <sr@denx.de> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kumar Gala [Wed, 5 Aug 2009 03:33:32 +0000 (22:33 -0500)]
powerpc/mm: Fix switch_mmu_context to iterate of the proper list of cpus
Introduced a temporary variable into our iterating over the list cpus
that are threads on the same core. For some reason Ben forgot how for
loops work.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/mm: Fix encoding of page table cache numbers
The mask used to encode the page table cache number in the
batch when freeing page tables was too small for the new
possible values of MMU page sizes. This increases it along
with a comment explaining the constraints.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This contains all the bits that didn't fit in previous patches :-) This
includes the actual exception handlers assembly, the changes to the
kernel entry, other misc bits and wiring it all up in Kconfig.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/mm: Add support for SPARSEMEM_VMEMMAP on 64-bit Book3E
The base TLB support didn't include support for SPARSEMEM_VMEMMAP, though
we did carve out some virtual space for it, the necessary support code
wasn't there. This implements it by using 16M pages for now, though the
page size could easily be changed at runtime if necessary.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Add TLB management code for 64-bit Book3E
This adds the TLB miss handler assembly, the low level TLB flush routines
along with the necessary hook for dealing with our virtual page tables
or indirect TLB entries that need to be flushes when PTE pages are freed.
There is currently no support for hugetlbfs
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/mm: Move around mmu_gathers definition on 64-bit
The definition for the global structure mmu_gathers, used by generic code,
is currently defined in multiple places not including anything used by
64-bit Book3E. This changes it by moving to one place common to all
processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Add PACA fields specific to 64-bit Book3E processors
This adds various fields in the PACA that are for use specifically
by Book3E processors, such as exception save areas, current pgd
pointer, special exceptions kernel stacks etc...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Add memory management headers for new 64-bit BookE
This adds the PTE and pgtable format definitions, along with changes
to the kernel memory map and other definitions related to implementing
support for 64-bit Book3E. This also shields some asm-offset bits that
are currently only relevant on 32-bit
We also move the definition of the "linux" page size constants to
the common mmu.h file and add a few sizes that are relevant to
embedded processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds various SPRs defined on 64-bit BookE, along with changes
to the definition of the base MSR values to add the values needed
for 64-bit Book3E.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
That patch used to just add a hook to page table flushing but
pulling that string brought out a whole bunch of issues, so it
now does that and more:
- We now make the RCU batching of page freeing SMP only, as I
believe it was intended initially. We make a few more things compile
to nothing on !CONFIG_SMP
- Some macros are turned into functions, though that forced me to
out of line a few stuffs due to unsolvable include depenencies,
however it's probably better that way anyway, it's not -that-
critical code path.
- 32-bit didn't call pte_free_finish() on tlb_flush() which means
that it wouldn't push out the batch to RCU for delayed freeing when
a bunch of page tables have been freed, they would just stay in there
until the batch gets full.
64-bit BookE will use that hook to maintain the virtually linear
page tables or the indirect entries in the TLB when using the
HW loader.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, a single ifdef covers SLB related bits and more generic ppc64
related bits, split this in two separate ifdef's since 64-bit BookE will
need one but not the other.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Our 64-bit hash context handling has no init function, but 64-bit Book3E
will use the common mmu_context_nohash.c code which does, so define an
empty inline mmu_context_init() for 64-bit server and call it from
our 64-bit setup_arch()
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org>
powerpc/mm: Make low level TLB flush ops on BookE take additional args
We need to pass down whether the page is direct or indirect and we'll
need to pass the page size to _tlbil_va and _tlbivax_bcast
We also add a new low level _tlbil_pid_noind() which does a TLB flush
by PID but avoids flushing indirect entries if possible
This implements those new prototypes but defines them with inlines
or macros so that no additional arguments are actually passed on current
processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Modify some ppc_asm.h macros to accomodate 64-bits Book3E
The way I intend to use tophys/tovirt on 64-bit BookE is different
from the "trick" that we currently play for 32-bit BookE so change
the condition of definition of these macros to make it so.
Also, make sure we only use rfid and mtmsrd instead of rfi and mtmsr
for 64-bit server processors, not all 64-bit processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org>
powerpc/mm: Add support for early ioremap on non-hash 64-bit processors
This adds some code to do early ioremap's using page tables instead of
bolting entries in the hash table. This will be used by the upcoming
64-bits BookE port.
The patch also changes the test for early vs. late ioremap to use
slab_is_available() instead of our old hackish mem_init_done.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/mm: Add HW threads support to no_hash TLB management
The current "no hash" MMU context management code is written with
the assumption that one CPU == one TLB. This is not the case on
implementations that support HW multithreading, where several
linux CPUs can share the same TLB.
This adds some basic support for this to our context management
and our TLB flushing code.
It also cleans up the optional debugging output a bit
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/of: Remove useless register save/restore when calling OF back
enter_prom() used to save and restore registers such as CTR, XER etc..
which are volatile, or SRR0,1... which we don't care about. This
removes a bunch of useless code and while at it turns an mtmsrd into
an MTMSRD macro which will be useful to Book3E.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/mm: Fix misplaced #endif in pgtable-ppc64-64k.h
A misplaced #endif causes more definitions than intended to be
protected by #ifndef __ASSEMBLY__. This breaks upcoming 64-bit
BookE support patch when using 64k pages.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The truncate syscall has a signed long parameter, so when using a 32-
bit userspace with a 64-bit kernel the argument is zero-extended
instead of sign-extended. Adding the compat_sys_truncate function
fixes the issue.
This was noticed during an LSB truncate test failure. The test was
checking for the correct error number set when truncate is called with
a length of -1. The test can be found at:
Frans Pop [Thu, 23 Jul 2009 08:57:18 +0000 (08:57 +0000)]
powerpc: Makefile simplification through use of cc-ifversion
Signed-off-by: Frans Pop <elendil@planet.nl> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This change the SPRG used to store the PACA on ppc64 from
SPRG3 to SPRG1. SPRG3 is user readable on most processors
and we want to use it for other things. We change the scratch
SPRG used by exception vectors from SRPG1 to SPRG2.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/mm: Fix definitions of FORCE_MAX_ZONEORDER in Kconfig
The current definitions set ranges and defaults for 32 and 64-bit
only using "PPC_STD_MMU" which means hash based MMU. This uselessly
restrict the usefulness for the upcoming 64-bit BookE port, but more
than that, it's broken on 32-bit since the only 32-bit platform
supporting multiple page sizes currently is 44x which does -not-
have PPC_STD_MMU_32 set.
This fixes it by using PPC64 and PPC32 instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Remove use of a second scratch SPRG in STAB code
The STAB code used on Power3 and RS/64 uses a second scratch SPRG to
save a GPR in order to decide whether to go to do_stab_bolted_* or
to handle a normal data access exception.
This prevents our scheme of freeing SPRG3 which is user visible for
user uses since we cannot use SPRG0 which, on RS/64, seems to be
read-only for supervisor mode (like POWER4).
This reworks the STAB exception entry to use the PACA as temporary
storage instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Use names rather than numbers for SPRGs (v2)
The kernel uses SPRG registers for various purposes, typically in
low level assembly code as scratch registers or to hold per-cpu
global infos such as the PACA or the current thread_info pointer.
We want to be able to easily shuffle the usage of those registers
as some implementations have specific constraints realted to some
of them, for example, some have userspace readable aliases, etc..
and the current choice isn't always the best.
This patch should not change any code generation, and replaces the
usage of SPRN_SPRGn everywhere in the kernel with a named replacement
and adds documentation next to the definition of the names as to
what those are used for on each processor family.
The only parts that still use the original numbers are bits of KVM
or suspend/resume code that just blindly needs to save/restore all
the SPRGs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Mon, 13 Jul 2009 20:53:53 +0000 (20:53 +0000)]
powerpc: Preload application text segment instead of TASK_UNMAPPED_BASE
TASK_UNMAPPED_BASE is not used with the new top down mmap layout. We can
reuse this preload slot by loading in the segment at 0x10000000, where almost
all PowerPC binaries are linked at.
On a microbenchmark that bounces a token between two 64bit processes over pipes
and calls gettimeofday each iteration (to access the VDSO), both the 32bit and
64bit context switch rate improves (tested on a 4GHz POWER6):
Anton Blanchard [Mon, 13 Jul 2009 20:53:52 +0000 (20:53 +0000)]
powerpc: Rearrange SLB preload code
With the new top down layout it is likely that the pc and stack will be in the
same segment, because the pc is most likely in a library allocated via a top
down mmap. Right now we bail out early if these segments match.
Rearrange the SLB preload code to sanity check all SLB preload addresses
are not in the kernel, then check all addresses for conflicts.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Mon, 13 Jul 2009 20:53:51 +0000 (20:53 +0000)]
powerpc: Move 64bit VDSO to improve context switch performance
On 64bit applications the VDSO is the only thing in segment 0. Since the VDSO
is position independent we can remove the hint and let get_unmapped_area pick
an area. This will mean the vdso will be near other mmaps and will share
an SLB entry:
On a microbenchmark that bounces a token between two 64bit processes over pipes
and calls gettimeofday each iteration (to access the VDSO), our context switch
rate goes from 268k to 277k ctx switches/sec (tested on a 4GHz POWER6).
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: expose the multi-bit ops that underlie single-bit ops.
The bitops.h functions that operate on a single bit in a bitfield are
implemented by operating on the corresponding word location. In all
cases the inner logic is valid if the mask being applied has more than
one bit set, so this patch exposes those inner operations. Indeed,
set_bits() was already available, but it duplicated code from
set_bit() (rather than making the latter a wrapper) - it was also
missing the PPC405_ERR77() workaround and the "volatile" address
qualifier present in other APIs. This corrects that, and exposes the
other multi-bit equivalents.
One advantage of these multi-bit forms is that they allow word-sized
variables to essentially be their own spinlocks, eg. very useful for
state machines where an atomic "flags" variable can obviate the need
for any additional locking.
Signed-off-by: Geoff Thorpe <geoff@geoffthorpe.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/mpic: Fix MPIC_BROKEN_REGREAD on non broken MPICs
The workaround enabled by CONFIG_MPIC_BROKEN_REGREAD does not work
on non-broken MPICs. The symptom is no interrupts being received.
The fix is twofold. Firstly the code was broken for multiple isus,
we need to index into the shadow array with the src_no, not the idx.
Secondly, we always do the read, but only use the VECPRI_MASK and
VECPRI_ACTIVITY bits from the hardware, the rest of "val" comes
from the shadow.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Wed, 19 Aug 2009 02:41:47 +0000 (19:41 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
security: Fix prompt for LSM_MMAP_MIN_ADDR
security: Make LSM_MMAP_MIN_ADDR default match its help text.
Linus Torvalds [Wed, 19 Aug 2009 02:41:05 +0000 (19:41 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: use the right flag for get_vm_area()
percpu, sparc64: fix sparse possible cpu map handling
init: set nr_cpu_ids before setup_per_cpu_areas()
Bo Liu [Tue, 18 Aug 2009 21:11:19 +0000 (14:11 -0700)]
mm: build_zonelists(): move clear node_load[] to __build_all_zonelists()
If node_load[] is cleared everytime build_zonelists() is
called,node_load[] will have no help to find the next node that should
appear in the given node's fallback list.
Because of the bug, zonelist's node_order is not calculated as expected.
This bug affects on big machine, which has asynmetric node distance.
This (my bug) is very old but no one has reported this for a long time.
Maybe because the number of asynmetric NUMA is very small and they use
cpuset for customizing node memory allocation fallback.
[akpm@linux-foundation.org: fix CONFIG_NUMA=n build] Signed-off-by: Bo Liu <bo-liu@hotmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Graff Yang [Tue, 18 Aug 2009 21:11:17 +0000 (14:11 -0700)]
nommu: check fd read permission in validate_mmap_request()
According to the POSIX (1003.1-2008), the file descriptor shall have been
opened with read permission, regardless of the protection options specified to
mmap(). The ltp test cases mmap06/07 need this.
Signed-off-by: Graff Yang <graff.yang@gmail.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Tue, 18 Aug 2009 21:11:17 +0000 (14:11 -0700)]
spi_s3c24xx: fix transfer setup code
Since the changes to the bitbang driver, there is the possibility we will
be called with either the speed_hz or bpw values zero. We take these to
mean that the default values (8 bits per word, or maximum bus speed).
Signed-off-by: Ben Dooks <ben@simtec.co.uk> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Tue, 18 Aug 2009 21:11:16 +0000 (14:11 -0700)]
spi_s3c24xx: fix clock rate calculation
Currently the clock rate calculation may round as pleased, which means
that it is possible that we will round down and end up with a faster clock
rate than intended.
Change the calculation to use DIV_ROUND_UP() to ensure that we end up with
a clock rate either the same as or lower than the user requested one.
Signed-off-by: Ben Dooks <ben@simtec.co.uk> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 18 Aug 2009 21:11:12 +0000 (14:11 -0700)]
mmc: add the new linux-mmc mailing list to MAINTAINERS
There are a number of individual MMC drivers listed in MAINTAINERS. I
didn't modify those records. Perhaps I should have.
Cc: <linux-mmc@vger.kernel.org> Cc: Manuel Lauss <manuel.lauss@gmail.com> Cc: Nicolas Pitre <nico@cam.org> Cc: Pierre Ossman <drzeus@drzeus.cx> Cc: Pavel Pisa <ppisa@pikron.com> Cc: Jarkko Lavinen <jarkko.lavinen@nokia.com> Cc: Ben Dooks <ben-linux@fluff.org> Cc: Sascha Sommer <saschasommer@freenet.de> Cc: Ian Molton <ian@mnementh.co.uk> Cc: Joseph Chan <JosephChan@via.com.tw> Cc: Harald Welte <HaraldWelte@viatech.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 18 Aug 2009 21:11:10 +0000 (14:11 -0700)]
mm: revert "oom: move oom_adj value"
The commit 2ff05b2b (oom: move oom_adj value) moveed the oom_adj value to
the mm_struct. It was a very good first step for sanitize OOM.
However Paul Menage reported the commit makes regression to his job
scheduler. Current OOM logic can kill OOM_DISABLED process.
Why? His program has the code of similar to the following.
...
set_oom_adj(OOM_DISABLE); /* The job scheduler never killed by oom */
...
if (vfork() == 0) {
set_oom_adj(0); /* Invoked child can be killed */
execve("foo-bar-cmd");
}
....
vfork() parent and child are shared the same mm_struct. then above
set_oom_adj(0) doesn't only change oom_adj for vfork() child, it's also
change oom_adj for vfork() parent. Then, vfork() parent (job scheduler)
lost OOM immune and it was killed.
Actually, fork-setting-exec idiom is very frequently used in userland program.
We must not break this assumption.
Then, this patch revert commit 2ff05b2b and related commit.
Reverted commit list
---------------------
- commit 2ff05b2b4e (oom: move oom_adj value from task_struct to mm_struct)
- commit 4d8b9135c3 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
- commit 8123681022 (oom: only oom kill exiting tasks with attached memory)
- commit 933b787b57 (mm: copy over oom_adj value at fork time)
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Tue, 18 Aug 2009 21:11:08 +0000 (14:11 -0700)]
vfs: make get_sb_pseudo set s_maxbytes to value that can be cast to signed
get_sb_pseudo sets s_maxbytes to ~0ULL which becomes negative when cast
to a signed value. Fix it to use MAX_LFS_FILESIZE which casts properly
to a positive signed value.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Steve French <smfrench@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Robert Love <rlove@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Jones [Tue, 18 Aug 2009 17:47:37 +0000 (13:47 -0400)]
security: Make LSM_MMAP_MIN_ADDR default match its help text.
Commit 788084aba2ab7348257597496befcbccabdc98a3 added the LSM_MMAP_MIN_ADDR
option, whose help text states "For most ia64, ppc64 and x86 users with lots
of address space a value of 65536 is reasonable and should cause no problems."
Which implies that it's default setting was typoed.
Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (60 commits)
net: restore gnet_stats_basic to previous definition
NETROM: Fix use of static buffer
e1000e: fix use of pci_enable_pcie_error_reporting
e1000e: WoL does not work on 82577/82578 with manageability enabled
cnic: Fix locking in init/exit calls.
cnic: Fix locking in start/stop calls.
bnx2: Use mutex on slow path cnic calls.
cnic: Refine registration with bnx2.
cnic: Fix symbol_put_addr() panic on ia64.
gre: Fix MTU calculation for bound GRE tunnels
pegasus: Add new device ID.
drivers/net: fixed drivers that support netpoll use ndo_start_xmit()
via-velocity: Fix test of mii_status bit VELOCITY_DUPLEX_FULL
rt2x00: fix memory corruption in rf cache, add a sanity check
ixgbe: Fix receive on real device when VLANs are configured
ixgbe: Do not return 0 in ixgbe_fcoe_ddp() upon FCP_RSP in DDP completion
netxen: free napi resources during detach
netxen: remove netxen workqueue
ixgbe: fix issues setting rx-usecs with legacy interrupts
can: fix oops caused by wrong rtnl newlink usage
...
Thomas Gleixner [Mon, 17 Aug 2009 12:07:16 +0000 (14:07 +0200)]
genirq: Wake up irq thread after action has been installed
The wake_up_process() of the new irq thread in __setup_irq() is too
early as the irqaction is not yet fully initialized especially
action->irq is not yet set. The interrupt thread might dereference the
wrong irq descriptor.
Move the wakeup after the action is installed and action->irq has been
set.
Reported-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Buesch <mb@bu3sch.de>
Eric Dumazet [Sun, 16 Aug 2009 09:36:49 +0000 (09:36 +0000)]
net: restore gnet_stats_basic to previous definition
In 5e140dfc1fe87eae27846f193086724806b33c7d "net: reorder struct Qdisc
for better SMP performance" the definition of struct gnet_stats_basic
changed incompatibly, as copies of this struct are shipped to
userland via netlink.
Restoring old behavior is not welcome, for performance reason.
Fix is to use a private structure for kernel, and
teach gnet_stats_copy_basic() to convert from kernel to user land,
using legacy structure (struct gnet_stats_basic)
Based on a report and initial patch from Michael Spang.
Reported-by: Michael Spang <mspang@csclub.uwaterloo.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ralf Baechle [Tue, 18 Aug 2009 01:05:32 +0000 (18:05 -0700)]
NETROM: Fix use of static buffer
The static variable used by nr_call_to_digi might result in corruption if
multiple threads are trying to usee a node or neighbour via ioctl. Fixed
by having the caller pass a structure in. This is safe because nr_add_node
rsp. nr_add_neigh will allocate a permanent structure, if needed.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
NeilBrown [Tue, 18 Aug 2009 00:35:26 +0000 (10:35 +1000)]
Fix new incorrect error return from do_md_stop.
Recent commit c8c00a6915a2e3d10416e8bdd3138429beb96210
changed the exit paths in do_md_stop and was not quite
careful enough. There is one path were 'err' now needs
to be cleared but it isn't.
So setting an array to readonly (with mdadm --readonly) will
work, but will incorrectly report and error: ENXIO.
Linus Torvalds [Mon, 17 Aug 2009 20:38:58 +0000 (13:38 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
security: define round_hint_to_min in !CONFIG_SECURITY
Security/SELinux: seperate lsm specific mmap_min_addr
SELinux: call cap_file_mmap in selinux_file_mmap
Capabilities: move cap_file_mmap to commoncap.c
Eric Paris [Mon, 17 Aug 2009 01:51:55 +0000 (21:51 -0400)]
inotify: start watch descriptor count at 1
The inotify_add_watch man page specifies that inotify_add_watch() will
return a non-negative integer. However, historically the inotify
watches started at 1, not at 0.
Turns out that the inotifywait program provided by the inotify-tools
package doesn't properly handle a 0 watch descriptor. In 7e790dd5 we
changed from starting at 1 to starting at 0. This patch starts at 1,
just like in previous kernels, but also just like in previous kernels
it's possible for it to wrap back to 0. This preserves the kernel
functionality exactly like it was before the patch (neither method broke
the spec)
Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Paris [Mon, 17 Aug 2009 01:51:49 +0000 (21:51 -0400)]
inotify: tail drop inotify q_overflow events
In f44aebcc the tail drop logic of events with no file backing
(q_overflow and in_ignored) was reversed so IN_IGNORED events would
never be tail dropped. This now means that Q_OVERFLOW events are NOT
tail dropped. The fix is to not tail drop IN_IGNORED, but to tail drop
Q_OVERFLOW.
Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Paris [Mon, 17 Aug 2009 01:51:44 +0000 (21:51 -0400)]
notify: unused event private race
inotify decides if private data it passed to get added to an event was
used by checking list_empty(). But it's possible that the event may
have been dequeued and the private event removed so it would look empty.
The fix is to use the return code from fsnotify_add_notify_event rather
than looking at the list.
Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes warnings like this:
CC fs/proc/meminfo.o
In file included from /work/linux/include/linux/mmzone.h:20,
from /work/linux/include/linux/gfp.h:4,
from /work/linux/include/linux/mm.h:8,
from /work/linux/fs/proc/meminfo.c:5:
/work/linux/arch/mips/include/asm/page.h:36:1: warning: "HPAGE_SIZE" redefined
In file included from /work/linux/fs/proc/meminfo.c:2:
/work/linux/include/linux/hugetlb.h:107:1: warning: this is the location of the previous definition
The safe solution is to not initialize MCEs if we dont know on
what CPU we are running (or if that CPU's support code got
disabled in the config).
Also be a bit more defensive on 32-bit systems: dont do a
boot-time dump of pending MCEs not just on the specific system
that we found a problem with (Pentium-M), but earlier ones as
well.
Now this problem is probably not common and disabling CPU
support is rare - but still being more defensive in something
we turned on for a wide range of CPUs is prudent.
x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUs
On my legacy Pentium M laptop (Acer Extensa 2900) I get bogus MCE on a cold
boot with CONFIG_X86_NEW_MCE enabled, i.e. (after decoding it with mcelog):
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 1 MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: Data CACHE Level-1 UNKNOWN Error
STATUS f200000000000195 MCGSTATUS 0
To verify that this is not a CONFIG_X86_NEW_MCE bug I also modified
the CONFIG_X86_OLD_MCE code (which doesn't log any MCEs) to dump
content of STATUS MSR before it is cleared during initialization. ]
Since the bogus MCE results in a kernel taint (which in turn disables
lockdep support) don't log boot MCEs on Pentium M (model == 13) CPUs
by default ("mce=bootlog" boot parameter can be be used to get the old
behavior).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
The locking in xfs_iget_cache_hit currently has numerous problems:
- we clear the reclaim tag without i_flags_lock which protects
modifications to it
- we call inode_init_always which can sleep with pag_ici_lock
held (this is oss.sgi.com BZ #819)
- we acquire and drop i_flags_lock a lot and thus provide no
consistency between the various flags we set/clear under it
This patch fixes all that with a major revamp of the locking in
the function. The new version acquires i_flags_lock early and
only drops it once we need to call into inode_init_always or before
calling xfs_ilock.
This patch fixes a bug seen in the wild where we race modifying the
reclaim tag.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Felix Blyakher <felixb@sgi.com>
Eric Paris [Fri, 31 Jul 2009 16:54:11 +0000 (12:54 -0400)]
Security/SELinux: seperate lsm specific mmap_min_addr
Currently SELinux enforcement of controls on the ability to map low memory
is determined by the mmap_min_addr tunable. This patch causes SELinux to
ignore the tunable and instead use a seperate Kconfig option specific to how
much space the LSM should protect.
The tunable will now only control the need for CAP_SYS_RAWIO and SELinux
permissions will always protect the amount of low memory designated by
CONFIG_LSM_MMAP_MIN_ADDR.
This allows users who need to disable the mmap_min_addr controls (usual reason
being they run WINE as a non-root user) to do so and still have SELinux
controls preventing confined domains (like a web server) from being able to
map some area of low memory.
Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
Eric Paris [Fri, 31 Jul 2009 16:54:05 +0000 (12:54 -0400)]
SELinux: call cap_file_mmap in selinux_file_mmap
Currently SELinux does not check CAP_SYS_RAWIO in the file_mmap hook. This
means there is no DAC check on the ability to mmap low addresses in the
memory space. This function adds the DAC check for CAP_SYS_RAWIO while
maintaining the selinux check on mmap_zero. This means that processes
which need to mmap low memory will need CAP_SYS_RAWIO and mmap_zero but will
NOT need the SELinux sys_rawio capability.
Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
Eric Paris [Fri, 31 Jul 2009 16:53:58 +0000 (12:53 -0400)]
Capabilities: move cap_file_mmap to commoncap.c
Currently we duplicate the mmap_min_addr test in cap_file_mmap and in
security_file_mmap if !CONFIG_SECURITY. This patch moves cap_file_mmap
into commoncap.c and then calls that function directly from
security_file_mmap ifndef CONFIG_SECURITY like all of the other capability
checks are done.
Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
Leonardo Potenza [Sun, 16 Aug 2009 16:55:48 +0000 (18:55 +0200)]
x86: Annotate section mismatch warnings in kernel/apic/x2apic_uv_x.c
The function uv_acpi_madt_oem_check() has been marked __init,
the struct apic_x2apic_uv_x has been marked __refdata.
The aim is to address the following section mismatch messages:
WARNING: arch/x86/kernel/apic/built-in.o(.data+0x1368): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary()
The variable apic_x2apic_uv_x references
the function __cpuinit uv_wakeup_secondary()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,
WARNING: arch/x86/kernel/built-in.o(.data+0x68e8): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary()
The variable apic_x2apic_uv_x references
the function __cpuinit uv_wakeup_secondary()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,
WARNING: arch/x86/built-in.o(.text+0x7b36f): Section mismatch in reference from the function uv_acpi_madt_oem_check() to the function .init.text:early_ioremap()
The function uv_acpi_madt_oem_check() references
the function __init early_ioremap().
This is often because uv_acpi_madt_oem_check lacks a __init
annotation or the annotation of early_ioremap is wrong.
WARNING: arch/x86/built-in.o(.text+0x7b38d): Section mismatch in reference from the function uv_acpi_madt_oem_check() to the function .init.text:early_iounmap()
The function uv_acpi_madt_oem_check() references
the function __init early_iounmap().
This is often because uv_acpi_madt_oem_check lacks a __init
annotation or the annotation of early_iounmap is wrong.
WARNING: arch/x86/built-in.o(.data+0x8668): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary()
The variable apic_x2apic_uv_x references
the function __cpuinit uv_wakeup_secondary()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,
Xiaotian Feng [Fri, 14 Aug 2009 14:35:52 +0000 (14:35 +0000)]
e1000e: fix use of pci_enable_pcie_error_reporting
commit 111b9dc5 ("e1000e: add aer support") introduces pcie aer
support for e1000e, but it is not reasonable to disable it in
e1000_remove but enable it in e1000_resume. This patch enables aer
support in e1000_probe.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Fri, 14 Aug 2009 14:35:33 +0000 (14:35 +0000)]
e1000e: WoL does not work on 82577/82578 with manageability enabled
With manageability (Intel AMT) enabled via BIOS, PHY wakeup does not get
configured on newer parts which use PHY wakeup vs. MAC wakeup which causes
WoL to not work. The driver should configure PHY wakeup whether or not
manageability is enabled.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 14 Aug 2009 15:49:47 +0000 (15:49 +0000)]
cnic: Fix locking in init/exit calls.
The slow path ulp_init and ulp_exit calls to the bnx2i driver
are sleepable calls and therefore should not be protected using
rcu_read_lock. Fix it by using mutex and refcount during these
calls. cnic_unregister_driver() will now wait for the refcount
to go to zero before completing the call.
Signed-off-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 14 Aug 2009 15:49:46 +0000 (15:49 +0000)]
cnic: Fix locking in start/stop calls.
The slow path ulp_start and ulp_stop calls to the bnx2i driver
are sleepable calls and therefore should not be protected using
rcu_read_lock. Fix it by using mutex and setting a bit during
these calls. cnic_unregister_device() will now wait for the bit
to clear before completing the call.
Signed-off-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 14 Aug 2009 15:49:44 +0000 (15:49 +0000)]
cnic: Refine registration with bnx2.
Register and unregister with bnx2 during NETDEV_UP and NETDEV_DOWN
events. This simplifies the sequence of events and allows locking
fixes in the next patch.
Signed-off-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 14 Aug 2009 15:49:43 +0000 (15:49 +0000)]
cnic: Fix symbol_put_addr() panic on ia64.
When the cnic driver tries to grab a symbol from bnx2 when bnx2 is
running init code, symbol_get() will succeed but symbol_put_addr()
will hit BUG() a moment later. module_text_address() fails because
bnx2 is still in init code.
This is fixed by using symbol_put() instead which does the exact
opposite of symbol_get().
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>