Brian Gerst [Tue, 10 Feb 2009 14:51:46 +0000 (09:51 -0500)]
x86: pass in pt_regs pointer for syscalls that need it
Some syscalls need to access the pt_regs structure, either to copy
user register state or to modifiy it. This patch adds stubs to load
the address of the pt_regs struct into the %eax register, and changes
the syscalls to regparm(1) to receive the pt_regs pointer as the
first argument.
Brian Gerst [Tue, 10 Feb 2009 14:51:45 +0000 (09:51 -0500)]
x86: use pt_regs pointer in do_device_not_available()
The generic exception handler (error_code) passes in the pt_regs
pointer and the error code (unused in this case). The commit
"x86: fix math_emu register frame access" changed this to pass by
value, which doesn't work correctly with stack protector enabled.
Change it back to use the pt_regs pointer.
Ingo Molnar [Wed, 11 Feb 2009 11:17:29 +0000 (12:17 +0100)]
stackprotector: fix multi-word cross-builds
Stackprotector builds were failing if CROSS_COMPILER was more than
a single world (such as when distcc was used) - because the check
scripts used $1 instead of $*.
Tejun Heo [Wed, 11 Feb 2009 07:31:00 +0000 (16:31 +0900)]
x86: fix x86_32 stack protector bugs
Impact: fix x86_32 stack protector
Brian Gerst found out that %gs was being initialized to stack_canary
instead of stack_canary - 20, which basically gave the same canary
value for all threads. Fixing this also exposed the following bugs.
* cpu_idle() didn't call boot_init_stack_canary()
* stack canary switching in switch_to() was being done too late making
the initial run of a new thread use the old stack canary value.
Fix all of them and while at it update comment in cpu_idle() about
calling boot_init_stack_canary().
Tejun Heo [Mon, 9 Feb 2009 13:17:40 +0000 (22:17 +0900)]
x86: implement x86_32 stack protector
Impact: stack protector for x86_32
Implement stack protector for x86_32. GDT entry 28 is used for it.
It's set to point to stack_canary-20 and have the length of 24 bytes.
CONFIG_CC_STACKPROTECTOR turns off CONFIG_X86_32_LAZY_GS and sets %gs
to the stack canary segment on entry. As %gs is otherwise unused by
the kernel, the canary can be anywhere. It's defined as a percpu
variable.
x86_32 exception handlers take register frame on stack directly as
struct pt_regs. With -fstack-protector turned on, gcc copies the
whole structure after the stack canary and (of course) doesn't copy
back on return thus losing all changed. For now, -fno-stack-protector
is added to all files which contain those functions. We definitely
need something better.
Tejun Heo [Mon, 9 Feb 2009 13:17:40 +0000 (22:17 +0900)]
x86: make lazy %gs optional on x86_32
Impact: pt_regs changed, lazy gs handling made optional, add slight
overhead to SAVE_ALL, simplifies error_code path a bit
On x86_32, %gs hasn't been used by kernel and handled lazily. pt_regs
doesn't have place for it and gs is saved/loaded only when necessary.
In preparation for stack protector support, this patch makes lazy %gs
handling optional by doing the followings.
* Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs.
* Save and restore %gs along with other registers in entry_32.S unless
LAZY_GS. Note that this unfortunately adds "pushl $0" on SAVE_ALL
even when LAZY_GS. However, it adds no overhead to common exit path
and simplifies entry path with error code.
* Define different user_gs accessors depending on LAZY_GS and add
lazy_save_gs() and lazy_load_gs() which are noop if !LAZY_GS. The
lazy_*_gs() ops are used to save, load and clear %gs lazily.
* Define ELF_CORE_COPY_KERNEL_REGS() which always read %gs directly.
xen and lguest changes need to be verified.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tejun Heo [Mon, 9 Feb 2009 13:17:40 +0000 (22:17 +0900)]
x86: add %gs accessors for x86_32
Impact: cleanup
On x86_32, %gs is handled lazily. It's not saved and restored on
kernel entry/exit but only when necessary which usually is during task
switch but there are few other places. Currently, it's done by
calling savesegment() and loadsegment() explicitly. Define
get_user_gs(), set_user_gs() and task_user_gs() and use them instead.
While at it, clean up register access macros in signal.c.
This cleans up code a bit and will help future changes.
Tejun Heo [Mon, 9 Feb 2009 13:17:39 +0000 (22:17 +0900)]
stackprotector: update make rules
Impact: no default -fno-stack-protector if stackp is enabled, cleanup
Stackprotector make rules had the following problems.
* cc support test and warning are scattered across makefile and
kernel/panic.c.
* -fno-stack-protector was always added regardless of configuration.
Update such that cc support test and warning are contained in makefile
and -fno-stack-protector is added iff stackp is turned off. While at
it, prepare for 32bit support.
Tejun Heo [Mon, 9 Feb 2009 13:17:39 +0000 (22:17 +0900)]
elf: add ELF_CORE_COPY_KERNEL_REGS()
ELF core dump is used for both user land core dump and kernel crash
dump. Depending on architecture, register might need to be accessed
differently for userland and kernel. Allow architectures to define
ELF_CORE_COPY_KERNEL_REGS() and use different operation for kernel
register dump.
Tejun Heo [Mon, 9 Feb 2009 13:17:39 +0000 (22:17 +0900)]
x86: fix math_emu register frame access
do_device_not_available() is the handler for #NM and it declares that
it takes a unsigned long and calls math_emu(), which takes a long
argument and surprisingly expects the stack frame starting at the zero
argument would match struct math_emu_info, which isn't true regardless
of configuration in the current code.
This patch makes do_device_not_available() take struct pt_regs like
other exception handlers and initialize struct math_emu_info with
pointer to it and pass pointer to the math_emu_info to math_emulate()
like normal C functions do. This way, unless gcc makes a copy of
struct pt_regs in do_device_not_available(), the register frame is
correctly accessed regardless of kernel configuration or compiler
used.
This doesn't fix all math_emu problems but it at least gets it
somewhat working.
Alok Kataria [Fri, 6 Feb 2009 18:29:35 +0000 (10:29 -0800)]
x86, vmi: put a missing paravirt_release_pmd in pgd_dtor
Commit 6194ba6ff6ccf8d5c54c857600843c67aa82c407 ("x86: don't special-case
pmd allocations as much") made changes to the way we handle pmd allocations,
and while doing that it dropped a call to paravirt_release_pd on the
pgd page from the pgd_dtor code path.
As a result of this missing release, the hypervisor is now unaware of the
pgd page being freed, and as a result it ends up tracking this page as a
page table page.
After this the guest may start using the same page for other purposes, and
depending on what use the page is put to, it may result in various performance
and/or functional issues ( hangs, reboots).
Since this release is only required for VMI, I now release the pgd page from
the (vmi)_pgd_free hook.
Signed-off-by: Alok N Kataria <akataria@vmware.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org>
x86: add clflush before monitor for Intel 7400 series
For Intel 7400 series CPUs, the recommendation is to use a clflush on the
monitored address just before monitor and mwait pair [1].
This clflush makes sure that there are no false wakeups from mwait when the
monitored address was recently written to.
[1] "MONITOR/MWAIT Recommendations for Intel Xeon Processor 7400 series"
section in specification update document of 7400 series
http://download.intel.com/design/xeon/specupdt/32033601.pdf
Brian Gerst [Sun, 8 Feb 2009 14:58:40 +0000 (09:58 -0500)]
x86: fix abuse of per_cpu_offset
Impact: bug fix
Don't use per_cpu_offset() to determine if it valid to access a
per-cpu variable for a given cpu number. It is not a valid assumption
on x86-64 anymore. Use cpu_possible() instead.
Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Brian Gerst [Sun, 8 Feb 2009 14:58:39 +0000 (09:58 -0500)]
x86: use linker to offset symbols by __per_cpu_load
Impact: cleanup and bug fix
Use the linker to create symbols for certain per-cpu variables
that are offset by __per_cpu_load. This allows the removal of
the runtime fixup of the GDT pointer, which fixes a bug with
resume reported by Jiri Slaby.
radeonfb: Fix resume from D3Cold on some platforms
For historical reason, this driver used its own saving/restoring
of the PCI config space, and used the state of it on resume as
an indication as to whether it needed to re-POST the chip or not.
This methods breaks with the later core changes since the core will
have restored things for us.
This patch fixes it by removing that custom code, using standard
core methods to save/restore state, and testing for the need to
re-POST by comparing the content of a few key PLL registers.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
aty128fb: Properly save PCI state before changing PCI PM level
This fixes aty128fb to properly save the PCI config space -before- it
potentially switches the PM state of the chip. This avoids a
warning with the new PM core and is the right thing to do anyway.
I also replaced the hand-coded switch to D2 with a call to the
genericc pci_set_power_state() and removed the code that switches it
back to D0 since the generic code is doing that for us nowadays.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
atyfb: Properly save PCI state before changing PCI PM level
This fixes atyfb to properly save the PCI config space -before- it
potentially switches the PM state of the chip. This avoids a
warning with the new PM core and is the right thing to do anyway.
I also slightly cleaned up the code that checks whether we are
running on a PowerMac to do a runtime check instead of a compile
check only, and replaced a deprecated number with the proper
symbolic constant.
Finally, I removed the useless switch to D0 from resume since
the core does it for us.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cornelia Huck [Tue, 20 Jan 2009 14:31:31 +0000 (15:31 +0100)]
async: Rename _special -> _domain for clarity.
Rename the async_*_special() functions to async_*_domain(), which
describes the purpose of these functions much better.
[Broke up long lines to silence checkpatch]
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cornelia Huck [Mon, 19 Jan 2009 12:45:28 +0000 (13:45 +0100)]
async: Fix running list handling.
async_schedule() should pass in async_running as the running
list, and run_one_entry() should put the entry to be run on
the provided running list instead of always on the generic one.
Reported-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Linus Torvalds [Sat, 7 Feb 2009 18:46:30 +0000 (10:46 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI PM: make the PM core more careful with drivers using the new PM framework
PCI PM: Read power state from device after trying to change it on resume
PCI PM: Do not disable and enable bridges during suspend-resume
PCI: PCIe portdrv: Simplify suspend and resume
PCI PM: Fix saving of device state in pci_legacy_suspend
PCI PM: Check if the state has been saved before trying to restore it
PCI PM: Fix handling of devices without drivers
PCI: return error on failure to read PCI ROMs
PCI: properly clean up ASPM link state on device remove
Rusty Russell [Sat, 7 Feb 2009 07:45:56 +0000 (18:15 +1030)]
module: remove over-zealous check in __module_get()
Impact: fix spurious BUG_ON() triggered under load
module_refcount() isn't reliable outside stop_machine(), as demonstrated
by Karsten Keil <kkeil@suse.de>, networking can trigger it under load
(an inc on one cpu and dec on another while module_refcount() is tallying
can give false results, for example).
Almost noone should be using __module_get, but that's another issue.
Linus Torvalds [Sat, 7 Feb 2009 16:30:20 +0000 (08:30 -0800)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (30 commits)
ACPI: Kconfig text - Fix the ACPI_CONTAINER module name according to the real module name.
eeepc-laptop: fix oops when changing backlight brightness during eeepc-laptop init
ACPICA: Fix table entry truncation calculation
ACPI: Enable bit 11 in _PDC to advertise hw coord
ACPI: struct device - replace bus_id with dev_name(), dev_set_name()
ACPI: add missing KERN_* constants to printks
ACPI: dock: Don't eval _STA on every show_docked sysfs read
ACPI: disable ACPI cleanly when bad RSDP found
ACPI: delete CPU_IDLE=n code
ACPI: cpufreq: Remove deprecated /proc/acpi/processor/../performance proc entries
ACPI: make some IO ports off-limits to AML
ACPICA: add debug dump of BIOS _OSI strings
ACPI: proc_dir_entry 'video/VGA' already registered
ACPI: Skip the first two elements in the _BCL package
ACPI: remove BM_RLD access from idle entry path
ACPI: remove locking from PM1x_STS register reads
eeepc-laptop: use netlink interface
eeepc-laptop: Implement rfkill hotplugging in eeepc-laptop
eeepc-laptop: Check return values from rfkill_register
eeepc-laptop: Add support for extended hotkeys
...
Darren Salt [Sat, 7 Feb 2009 06:02:07 +0000 (01:02 -0500)]
eeepc-laptop: fix oops when changing backlight brightness during eeepc-laptop init
I got the following oops while changing the backlight brightness during
startup. When it happens, it prevents use of the hotkeys, Fn-Fx, and the
lid button.
It's a clear use-before-init, as I verified by testing with an
appropriately-placed "else printk".
Signed-off-by: Darren Salt <linux@youmustbejoking.demon.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
Myron Stowe [Fri, 30 Jan 2009 22:44:53 +0000 (15:44 -0700)]
ACPICA: Fix table entry truncation calculation
During early boot, ACPI RSDT/XSDT table entries are gathered into the
'initial_tables[]' array. This array is currently statically defined (see
./drivers/acpi/tables.c). When there are more table entries than can be
held in the 'initial_tables[]' array, the message "Truncating N table
entries!" is output. As currently implemented, this message will always
erroneously calculate N as 0.
This patch fixes the calculation that determines how many table entries
will be missing (truncated).
This modification may be used under either the GPL or the BSD-style
license used for Intel ACPI CA code.
Signed-off-by: Myron Stowe <myron.stowe@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
Bit 11 in intel PDC definitions is meant for OS capability to handle
hardware coordination of P-states. In Linux we have always supported
hwardware coordination of P-states. Just let the BIOSes know that we
support it, by setting this bit.
Some BIOSes use this bit to choose between hardware or software coordination
and without this change below, BIOSes switch to software coordination, which
is not very optimal in terms of power consumption and extra wakeups from idle.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Frank Seidel [Wed, 4 Feb 2009 16:03:07 +0000 (17:03 +0100)]
ACPI: add missing KERN_* constants to printks
According to kerneljanitors todo list all printk calls (beginning
a new line) should have an according KERN_* constant.
Those are the missing peaces here for the acpi subsystem.
Signed-off-by: Frank Seidel <frank@f-seidel.de> Signed-off-by: Len Brown <len.brown@intel.com>
Holger Macht [Tue, 20 Jan 2009 11:18:24 +0000 (12:18 +0100)]
ACPI: dock: Don't eval _STA on every show_docked sysfs read
Some devices trigger a DEVICE_CHECK on every evalutation of _STA. This
can also be seen in commit 8b59560a3baf2e7c24e0fb92ea5d09eca92805db
(ACPI: dock: avoid check _STA method). If an undock is processed, the
dock driver sends a uevent and userspace might read the show_docked
property in sysfs. This causes an evaluation of _STA of the particular
device which causes the dock driver to immediately dock again.
In any case, evaluation of _STA (show_docked) does not necessarily mean
that we are docked, so check with the internal device structure.
http://bugzilla.kernel.org/show_bug.cgi?id=12360
Signed-off-by: Holger Macht <hmacht@suse.de> Signed-off-by: Len Brown <len.brown@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (37 commits)
Btrfs: Make sure dir is non-null before doing S_ISGID checks
Btrfs: Fix memory leak in cache_drop_leaf_ref
Btrfs: don't return congestion in write_cache_pages as often
Btrfs: Only prep for btree deletion balances when nodes are mostly empty
Btrfs: fix btrfs_unlock_up_safe to walk the entire path
Btrfs: change btrfs_del_leaf to drop locks earlier
Btrfs: Change btrfs_truncate_inode_items to stop when it hits the inode
Btrfs: Don't try to compress pages past i_size
Btrfs: join the transaction in __btrfs_setxattr
Btrfs: Handle SGID bit when creating inodes
Btrfs: Make btrfs_drop_snapshot work in larger and more efficient chunks
Btrfs: Change btree locking to use explicit blocking points
Btrfs: hash_lock is no longer needed
Btrfs: disable leak debugging checks in extent_io.c
Btrfs: sort references by byte number during btrfs_inc_ref
Btrfs: async threads should try harder to find work
Btrfs: selinux support
Btrfs: make btrfs acls selectable
Btrfs: Catch missed bios in the async bio submission thread
Btrfs: fix readdir on 32 bit machines
...
Tyler Hicks [Sat, 7 Feb 2009 00:06:51 +0000 (18:06 -0600)]
eCryptfs: Regression in unencrypted filename symlinks
The addition of filename encryption caused a regression in unencrypted
filename symlink support. ecryptfs_copy_filename() is used when dealing
with unencrypted filenames and it reported that the new, copied filename
was a character longer than it should have been.
This caused the return value of readlink() to count the NULL byte of the
symlink target. Most applications don't care about the extra NULL byte,
but a version control system (bzr) helped in discovering the bug.
Roland McGrath [Sat, 7 Feb 2009 02:15:18 +0000 (18:15 -0800)]
x86-64: fix int $0x80 -ENOSYS return
One of my past fixes to this code introduced a different new bug.
When using 32-bit "int $0x80" entry for a bogus syscall number,
the return value is not correctly set to -ENOSYS. This only happens
when neither syscall-audit nor syscall tracing is enabled (i.e., never
seen if auditd ever started). Test program:
/* gcc -o int80-badsys -m32 -g int80-badsys.c
Run on x86-64 kernel.
Note to reproduce the bug you need auditd never to have started. */
#include <errno.h>
#include <stdio.h>
int
main (void)
{
long res;
asm ("int $0x80" : "=a" (res) : "0" (99999));
printf ("bad syscall returns %ld\n", res);
return res != -ENOSYS;
}
The fix makes the int $0x80 path match the sysenter and syscall paths.
Reported-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Roland McGrath <roland@redhat.com>
Roland McGrath [Sat, 7 Feb 2009 01:34:07 +0000 (17:34 -0800)]
elf core dump: fix get_user use
The elf_core_dump() code does its work with set_fs(KERNEL_DS) in force,
so vma_dump_size() needs to switch back with set_fs(USER_DS) to safely
use get_user() for a normal user-space address.
Checking for VM_READ optimizes out the case where get_user() would fail
anyway. The vm_file check here was already superfluous given the control
flow earlier in the function, so that is a cleanup/optimization unrelated
to other changes but an obvious and trivial one.
Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Roland McGrath <roland@redhat.com>
moved the place in which the 'safeness' of a SUID/SGID exec was performed to
before de_thread() was called. This means that LSM_UNSAFE_SHARE is now
calculated incorrectly. This flag is set if any of the usage counts for
fs_struct, files_struct and sighand_struct are greater than 1 at the time the
determination is made. All of which are true for threads created by the
pthread library.
However, since we wish to make the security calculation before irrevocably
damaging the process so that we can return it an error code in the case where
we decide we want to reject the exec request on this basis, we have to make the
determination before calling de_thread().
So, instead, we count up the number of threads (CLONE_THREAD) that are sharing
our fs_struct (CLONE_FS), files_struct (CLONE_FILES) and sighand_structs
(CLONE_SIGHAND/CLONE_THREAD) with us. These will be killed by de_thread() and
so can be discounted by check_unsafe_exec().
We do have to be careful because CLONE_THREAD does not imply FS or FILES.
We _assume_ that there will be no extra references to these structs held by the
threads we're going to kill.
This can be tested with the attached pair of programs. Build the two programs
using the Makefile supplied, and run ./test1 as a non-root user. If
successful, you should see something like:
[dhowells@andromeda tmp]$ ./test1
--TEST1--
uid=4043, euid=4043 suid=4043
exec ./test2
--TEST2--
uid=4043, euid=0 suid=0
SUCCESS - Correct effective user ID
Reported-by: David Smith <dsmith@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: David Smith <dsmith@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
Dave Kleikamp [Fri, 6 Feb 2009 20:59:26 +0000 (14:59 -0600)]
vfs: Don't call attach_nobh_buffers() with an empty list
This is a modification of a patch by Bill Pemberton <wfp5p@virginia.edu>
nobh_write_end() could call attach_nobh_buffers() with head == NULL.
This would result in a trap when attach_nobh_buffers() attempted to
access bh->b_this_page.
This can be illustrated by running the writev01 testcase from LTP on jfs.
This error was introduced by commit 5b41e74a "vfs: fix data leak in
nobh_write_end()". That patch did not take into account that if
PageMappedToDisk() is true upon entry to nobh_write_begin(), then no
buffers will be allocated for the page. In that case, we won't have to
worry about a failed write leaving unitialized data in the page.
Of course, head != NULL implies !page_has_buffers(page), so no need to
test both.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Dmitri Monakhov <dmonakhov@openvz.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 6 Feb 2009 16:48:16 +0000 (08:48 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
ieee1394: dv1394: move deprecation message from module init to file open
firewire: core: Remove card from list of cards when enable fails
I created commit 7971db5a4b4176ad5df590fce07a962c643a2740 on a machine
where I forgot to set user.name and user.email before. The default
values were not optimal.
Li Zefan [Fri, 6 Feb 2009 08:17:19 +0000 (08:17 +0000)]
fork.c: fix NULL pointer dereference when nr_threads == threads-max
I happened to forked lots of processes, and hit NULL pointer dereference.
It is because in copy_process() after checking max_threads, 0 is returned
but not -EAGAIN.
Linus Torvalds [Fri, 6 Feb 2009 15:41:10 +0000 (07:41 -0800)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md: Ensure an md array never has too many devices.
md: Fix a bug in linear.c causing which_dev() to return the wrong device.
md: Allow read error in a single drive raid1 to be passed up.
Stefan Richter [Tue, 3 Feb 2009 16:54:31 +0000 (17:54 +0100)]
ieee1394: dv1394: move deprecation message from module init to file open
On many Linux installations, the dv1394 driver will be auto-loaded
whenever an AV/C device (e.g. camcorder or audio device) is plugged in.
An irritating message would then appear in the kernel log.
Defer this message to until a dv1394 character device file is actually
used by a program. Also include the program name in the message and
update the message slightly.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Clemens Ladisch [Fri, 6 Feb 2009 07:13:07 +0000 (08:13 +0100)]
sound: usb-audio: handle wMaxPacketSize for FIXED_ENDPOINT devices
For audio devices that do not have proper audio descriptors (e.g.,
Edirol UA-20), we use hardcoded parameters from our quirks list.
However, we must still read the maximum packet size from the standard
endpoint descriptor; otherwise, we might use packets that are too big
and therefore rejected by the USB core.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Cc: <stable@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
NeilBrown [Fri, 6 Feb 2009 07:02:46 +0000 (18:02 +1100)]
md: Ensure an md array never has too many devices.
Each different metadata format supported by md supports a
different maximum number of devices.
We really should be enforcing this maximum in the kernel, but
we aren't quite doing that properly.
We currently only enforce it at the 'hot_add' point, which is an
older interface which is not used by current userspace.
We need to also enforce it at 'add_new_disk' time for active arrays
and at 'do_md_run' time when starting a new array.
So move the test from 'hot_add' into 'bind_rdev_to_array' which is
called from both 'hot_add' and 'add_new_disk, and add a new
test in 'analyse_sbs' which is called from 'do_md_run'.
This bug (or missing feature) has been around "forever" and so
the patch is suitable for any -stable that is currently maintained.
which_dev() computes the device holding a given sector by shifting
down the sector number to a 32 bit range, dividing by the array
spacing and looking up the resulting index in the hash table of
the array.
Because the computed index might be slightly too small, a loop at
the end of which_dev() increases the index until the given sector
actually falls into the range of the device associated with that index.
The changes of the above mentioned commit caused this loop to check
whether the _index_ rather than the sector number is small enough,
effectively bypassing the loop and thus possibly returning the wrong
device.
As reported by Simon Kirby, this leads to errors such as
NeilBrown [Fri, 6 Feb 2009 04:06:47 +0000 (15:06 +1100)]
md: Allow read error in a single drive raid1 to be passed up.
If a raid1 only has a single working device and gets a read error,
we choose to simply return that error up to the filesystem (or whatever)
rather than failing the whole array.
However the codes doesn't quite do that. We attempt a readbalance
which allocates the same drive, so we retry the read - indefinitely.
Instead: If read_balance in the error case chooses the same drive that just
failed, treat it as a failure and don't retry.
Linus Torvalds [Fri, 6 Feb 2009 00:12:38 +0000 (16:12 -0800)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
Revert "configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item()"
Linus Torvalds [Fri, 6 Feb 2009 00:11:54 +0000 (16:11 -0800)]
Merge branch 'sh/for-2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* 'sh/for-2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: Fix up T-bit error handling in SH-4A mutex fastpath.
sh: Fix up spurious syscall restarting.
sh: fcnvds fix with denormalized numbers on SH-4 FPU.
sh: Only reserve memory under CONFIG_ZERO_PAGE_OFFSET when it != 0.
sh: Handle calling csum_partial with misaligned data
sh: ap325rxa: Enable ov772x in defconfig.
sh: ap325rxa: Add ov772x support.
sh: ap325rxa: control camera power toggling.
sh: mach-migor: Enable ov772x and tw9910 in defconfig.
Herbert Xu [Thu, 5 Feb 2009 23:15:50 +0000 (15:15 -0800)]
ipv6: Copy cork options in ip6_append_data
As the options passed to ip6_append_data may be ephemeral, we need
to duplicate it for corking. This patch applies the simplest fix
which is to memdup all the relevant bits.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Thu, 5 Feb 2009 21:30:05 +0000 (00:30 +0300)]
seq_file: fix big-enough lseek() + read()
lseek() further than length of the file will leave stale ->index
(second-to-last during iteration). Next seq_read() will not notice
that ->f_pos is big enough to return 0, but will print last item
as if ->f_pos is pointing to it.
Eric Biederman [Wed, 4 Feb 2009 23:12:25 +0000 (15:12 -0800)]
seq_file: move traverse so it can be used from seq_read
In 2.6.25 some /proc files were converted to use the seq_file
infrastructure. But seq_files do not correctly support pread(), which
broke some usersapce applications.
To handle pread correctly we can't assume that f_pos is where we left it
in seq_read. So move traverse() so that we can eventually use it in
seq_read and do thus some day support pread().
Signed-off-by: Eric Biederman <ebiederm@xmission.com> Cc: Paul Turner <pjt@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dean Nelson [Wed, 4 Feb 2009 23:12:24 +0000 (15:12 -0800)]
sgi-xp: fix writing past the end of kzalloc()'d space
A missing type cast results in writing way beyond the end of a kzalloc()'d
memory segment resulting in slab corruption. But it seems like the better
solution is to define ->recv_msg_slots as a 'void *' rather than a
'struct xpc_notify_mq_msg_uv *' and add the type cast.
Signed-off-by: Dean Nelson <dcn@sgi.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Wed, 4 Feb 2009 23:12:18 +0000 (15:12 -0800)]
drivers/video/backlight: rename da903x to da903x_bl
Currently both da903x backlight and voltage reulator drivers have the
same name. Rename the backlight driver to allow use of both drivers as
modules.
Signed-off-by: Mike Rapoport <mike@compulab.co.il> Acked-by: Eric Miao <eric.miao@marvell.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
atmel-ssc: fix misuse of dev_dbg when requested ssc instance is not found
The ssc pointer is not valid when the id is not found in the list.
Convert the message from a debug one into an error message and avoid
dereferencing the bad pointer.
Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Huang Weiyi <weiyi.huang@gmail.com> Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> Cc: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Carsten Otte [Wed, 4 Feb 2009 23:12:16 +0000 (15:12 -0800)]
do_wp_page: fix regression with execute in place
Fix do_wp_page for VM_MIXEDMAP mappings.
In the case where pfn_valid returns 0 for a pfn at the beginning of
do_wp_page and the mapping is not shared writable, the code branches to
label `gotten:' with old_page == NULL.
In case the vma is locked (vma->vm_flags & VM_LOCKED), lock_page,
clear_page_mlock, and unlock_page try to access the old_page.
This patch checks whether old_page is valid before it is dereferenced.
Johannes Weiner [Wed, 4 Feb 2009 23:12:14 +0000 (15:12 -0800)]
wait: prevent exclusive waiter starvation
With exclusive waiters, every process woken up through the wait queue must
ensure that the next waiter down the line is woken when it has finished.
Interruptible waiters don't do that when aborting due to a signal. And if
an aborting waiter is concurrently woken up through the waitqueue, noone
will ever wake up the next waiter.
This has been observed with __wait_on_bit_lock() used by
lock_page_killable(): the first contender on the queue was aborting when
the actual lock holder woke it up concurrently. The aborted contender
didn't acquire the lock and therefor never did an unlock followed by
waking up the next waiter.
Add abort_exclusive_wait() which removes the process' wait descriptor from
the waitqueue, iff still queued, or wakes up the next waiter otherwise.
It does so under the waitqueue lock. Racing with a wake up means the
aborting process is either already woken (removed from the queue) and will
wake up the next waiter, or it will remove itself from the queue and the
concurrent wake up will apply to the next waiter after it.
Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and
__wait_on_bit_lock() when they were interrupted by other means than a wake
up through the queue.
[akpm@linux-foundation.org: coding-style fixes] Reported-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Mentored-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chuck Lever <cel@citi.umich.edu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> ["after some testing"] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Martin Kebert [Wed, 4 Feb 2009 23:12:12 +0000 (15:12 -0800)]
lis3lv02d: add axes knowledge for HP 6710
Add support for the HP laptops of model 6710x for having correctly setup
axes.
Signed-off-by: Martin Kebert <gkmarty@gmail.com> Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Herrmann [Wed, 4 Feb 2009 23:12:11 +0000 (15:12 -0800)]
lis3lv02d: add axes knowledge for HP 6730
Add support for the HP laptops of model 6730x for having correctly setup
axes.
Signed-off-by: Pavel Herrmann <morpheus.ibis@gmail.com> Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Frans Pop <elendil@planet.nl> Cc: Larry Finger <Larry.Finger@lwfinger.net> Cc: Len Brown <lenb@kernel.org> Acked-by: Matthew Garrett <mjg@redhat.com> Reported-by: Helge Deller <deller@gmx.de> Testted-by: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 4 Feb 2009 23:12:06 +0000 (15:12 -0800)]
revert "rlimit: permit setting RLIMIT_NOFILE to RLIM_INFINITY"
Revert commit 0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f because it causes
(arguably poorly designed) existing userspace to spend interminable
periods closing billions of not-open file descriptors.
We could bring this back, with some sort of opt-in tunable in /proc, which
defaults to "off".
Peter's alanysis follows:
: I spent several hours trying to get to the bottom of a serious
: performance issue that appeared on one of our servers after upgrading to
: 2.6.28. In the end it's what could be considered a userspace bug that
: was triggered by a change in 2.6.28. Since this might also affect other
: people I figured I'd at least document what I found here, and maybe we
: can even do something about it:
:
:
: So, I upgraded some of debian.org's machines to 2.6.28.1 and immediately
: the team maintaining our ftp archive complained that one of their
: scripts that previously ran in a few minutes still hadn't even come
: close to being done after an hour or so. Downgrading to 2.6.27 fixed
: that.
:
: Turns out that script is forking a lot and something in it or python or
: whereever closes all the file descriptors it doesn't want to pass on.
: That is, it starts at zero and goes up to ulimit -n/RLIMIT_NOFILE and
: closes them all with a few exceptions.
:
: Turns out that takes a long time when your limit -n is now 2^20 (1048576).
:
: With 2.6.27.* the ulimit -n was the standard 1024, but with 2.6.28 it is
: now a thousand times that.
:
: 2.6.28 included a patch titled "rlimit: permit setting RLIMIT_NOFILE to
: RLIM_INFINITY" (0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f)[1] that
: allows, as the title implies, to set the limit for number of files to
: infinity.
:
: Closer investigation showed that the broken default ulimit did not apply
: to "system" processes (like stuff started from init). In the end I
: could establish that all processes that passed through pam_limit at one
: point had the bad resource limit.
:
: Apparently the pam library in Debian etch (4.0) initializes the limits
: to some default values when it doesn't have any settings in limit.conf
: to override them. Turns out that for nofiles this is RLIM_INFINITY.
: Commenting out "case RLIMIT_NOFILE" in pam_limit.c:267 of our pam
: package version 0.79-5 fixes that - tho I'm not sure what side effects
: that has.
:
: Debian lenny (the upcoming 5.0 version) doesn't have this issue as it
: uses a different pam (version).
Reported-by: Peter Palfrader <weasel@debian.org> Cc: Adam Tkac <vonsch@gmail.com> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <stable@kernel.org> [2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>