Merge dm_create() and dm_create_with_minor() by introducing the special value
DM_ANY_MINOR to request the allocation of the next available minor number.
Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kevin Corry [Mon, 26 Jun 2006 07:27:28 +0000 (00:27 -0700)]
[PATCH] dm mirror log: sector size fix
On-disk logs for dm-mirror devices are currently hard-coded to use 512 byte
hard-sector-sizes. This patch fixes dm-log so it will work with devices with
non-512-byte hard-sector-sizes.
To maintain full compatibility, instead of moving the clean-bits bitset to a
bitset, and enlarges the disk-header buffer to encompass both the header and
the bitset. The I/O routines for the bitset are removed, and the I/O routines
for the disk-header now also read/write the bitset.
Signed-off-by: Kevin Corry <kevcorry@us.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Neil Brown [Mon, 26 Jun 2006 07:27:26 +0000 (00:27 -0700)]
[PATCH] dm: mirror sector offset fix
The device-mapper core does not perform any remapping of bios before passing
them to the targets. If a particular mapping begins part-way into a device,
targets obtain the sector relative to the start of the mapping by subtracting
ti->begin.
The dm-raid1 target didn't do this everywhere: this patch fixes it, taking
care to subtract ti->begin exactly once for each bio.
[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Mahoney [Mon, 26 Jun 2006 07:27:25 +0000 (00:27 -0700)]
[PATCH] dm: fix block device initialisation
In alloc_dev(), we register the device with the block layer and then continue
to initialize the device. But register_disk() makes the device available to
be opened before we have completed initialising it.
This patch moves the final bits of the initialization above the disk
registration.
[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Mahoney [Mon, 26 Jun 2006 07:27:25 +0000 (00:27 -0700)]
[PATCH] dm: add module ref counting
The reference counting on dm-mod is zero if no mapped devices are open. This
is incorrect, and can lead to an oops if the module is unloaded while mapped
devices exist.
This patch claims a reference to the module whenever a device is created, and
drops it again when the device is freed.
Devices must be removed before dm-mod is unloaded.
[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Mahoney [Mon, 26 Jun 2006 07:27:23 +0000 (00:27 -0700)]
[PATCH] dm: add DMF_FREEING
There is a chicken and egg problem between the block layer and dm in which the
gendisk associated with a mapping keeps a reference-less pointer to the
mapped_device.
This patch uses a new flag DMF_FREEING to indicate when the mapped_device is
no longer valid. This is checked to prevent any attempt to open the device
from succeeding while the device is being destroyed.
[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Mahoney [Mon, 26 Jun 2006 07:27:21 +0000 (00:27 -0700)]
[PATCH] dm: fix idr minor allocation
One part of the system can attempt to use a mapped device before another has
finished initialising it or while it is being freed.
This patch introduces a place holder value, MINOR_ALLOCED, to mark the minor
as allocated but in a state where it can't be used, such as mid-allocation or
mid-free. At the end of the initialization, it replaces the place holder with
the pointer to the mapped_device, making it available to the rest of the dm
subsystem.
[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Persistent snapshots currently store a private copy of the chunk size.
Userspace also supplies the chunk size when loading a snapshot. Ensure
consistency by only storing the chunk_size in one place instead of two.
Currently the two sizes will differ if the chunk size supplied by userspace
does not match the chunk size an existing snapshot actually uses. Amongst
other problems, this causes an incorrect 'percentage full' to be reported.
The patch ensures consistency by only storing the chunk_size in one place,
removing it from struct pstore. Some initialisation is delayed until the
correct chunk_size is known. If read_header() discovers that the wrong chunk
size was supplied, the 'area' buffer (which the header already got read into)
is reinitialised to the correct size.
[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]
Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] VT binding: Do not create a device file for class device 'fbcon'
The class device "fbcon" does not need to be a device file. Do not create one
by passing a major and minor number of zero to
class_device_create()/destroy().
[PATCH] VT binding: Make VT binding a Kconfig option
To enable this feature, CONFIG_VT_HW_CONSOLE_BINDING must be set to 'y'. This
feature will default to 'n' to minimize users accidentally corrupting their
virtual terminals.
[PATCH] VT binding: Add sysfs control to the VT layer
Add sysfs control to the VT layer. A new sysfs class, 'vtconsole', and class
devices 'vtcon[n]' are added. Each class device file has the following
attributes:
/sys/class/vtconsole/vtcon[n]/name - read-only attribute showing the
name of the current backend
/sys/class/vtconsole/vtcon[n]/bind - read/write attribute
where: 0 - backend is unbound/unbind backend from the VT layer
1 - backend is bound/bind backend to the VT layer
In addition, if any of the consoles are in KD_GRAPHICS mode, binding and
unbinding will not succeed. KD_GRAPHICS mode usually indicates that the
underlying console hardware is used for other purposes other than displaying
text (ie X). This feature should prevent binding/unbinding from interfering
with a graphics application using the VT.
[PATCH] VT binding: Update fbcon to support binding
The control for binding/unbinding is moved from fbcon to the console layer.
Thus the fbcon sysfs attributes, attach and detach, are also gone.
1. Add a notifier event that tells fbcon if a framebuffer driver has been
unregistered. If no registered driver remains, fbcon will unregister
itself from the console layer.
2. Replaced calls to give_up_console() with unregister_con_driver().
3. Still use take_over_console() instead of register_con_driver() to
maintain compatibility
4. Respect the parameter first_fb_vc and last_fb_vc instead of using 0 and
MAX_NR_CONSOLES - 1. These parameters are settable by the user.
5. When fbcon is completely unbound from the console layer, fbcon will
also release (iow, decrement module reference counts to zero) all fbdev
drivers. In other words, a bind or unbind request from the console layer
will propagate down to the framebuffer drivers.
6. If fbcon is not bound to the console, it will ignore all notifier
events (except driver registration and unregistration) and all sysfs
requests.
[PATCH] VT binding: Add binding/unbinding support for the VT console
The framebuffer console is now able to dynamically bind and unbind from the VT
console layer. Due to the way the VT console layer works, the drivers
themselves decide when to bind or unbind. However, it was decided that
binding must be controlled, not by the drivers themselves, but by the VT
console layer. With this, dynamic binding is possible for all VT console
drivers, not just fbcon.
Thus, the VT console layer will impose the following to all VT console
drivers:
- all registered VT console drivers will be entered in a private list
- drivers can register themselves to the VT console layer, but they cannot
decide when to bind or unbind. (Exception: To maintain backwards
compatibility, take_over_console() will automatically bind the driver after
registration.)
- drivers can remove themselves from the list by unregistering from the VT
console layer. A prerequisite for unregistration is that the driver must not
be bound.
The following functions are new in the vt.c:
register_con_driver() - public function, this function adds the VT console
driver to an internal list maintained by the VT console
bind_con_driver() - private function, it binds the driver to the console
take_over_console() is changed to call register_con_driver() followed by a
bind_con_driver(). This is the only time drivers can decide when to bind to
the VT layer. This is to maintain backwards compatibility.
unbind_con_driver() - private function, it unbinds the driver from its
console. The vacated consoles will be taken over by the default boot console
driver.
unregister_con_driver() - public function, removes the driver from the
internal list maintained by the VT console. It will only succeed if the
driver is currently unbound.
con_is_bound() checks if the driver is currently bound or not
give_up_console() is just a wrapper to unregister_con_driver().
There are also 3 additional functions meant to be called only by the tty layer
for sysfs control:
vt_bind() - calls bind_con_driver()
vt_unbind() - calls unbind_con_driver()
vt_show_drivers() - shows the list of registered drivers
Most VT console drivers will continue to work as is, but might have problems
when unbinding or binding which should be fixable with minimal changes.
[PATCH] Detaching fbcon: add capability to attach/detach fbcon
Add the ability to detach and attach the framebuffer console to and from the
vt layer. This is done by echo'ing any value to sysfs attributes located in
class/graphics/fbcon. The two attributes are:
attach - bind fbcon to the vt layer
detach - unbind fbcon from the vt layer
Once fbcon is detached from the vt layer, fbcon can be unloaded if compiled as
a module. This feature is quite useful for developers who work on the
framebuffer or console subsystem. This is also useful for users who want to
go to text mode or graphics mode without having to reboot.
Directly unloading the fbcon module is not possible because the vt layer
increments the module reference count for all bound consoles. Detaching fbcon
decrements the module reference count to zero so unloading becomes possible.
[PATCH] Detaching fbcon: sdd sysfs class device entry for fbcon
In order for this feature to work, an interface will be needed. The most
appropriate is sysfs. However, the framebuffer console has no sysfs entry
yet. This will create a sysfs class device entry for fbcon under
/sys/class/graphics.
Add a class_device entry 'fbcon' under class 'graphics'. Console-specific
attributes which where previously under class/graphics/fb[x] are moved to
class/graphics/fbcon. These attributes, 'con_rotate' and 'con_rotate_all',
are also renamed to 'rotate' and 'rotate_all' respectively.
[PATCH] Detaching fbcon: remove calls to pci_disable_device()
Detaching fbcon allows individual drivers to be unloaded. However several
drivers call pci_disable_device() upon exit. This function will disable the
BAR's which will kill VGA text mode and/or affect X/DRM.
To prevent this, remove calls to pci_disable_device() from several drivers.
To allow for detaching fbcon, it must be able to give up the console.
However, the function give_up_console() is plain broken. It just sets the
entries in the console driver map to NULL, it leaves the vt layer without a
console driver, and does not decrement the module reference count. Calling
give_up_console() is guaranteed to hang the machine..
To fix this problem, ensure that the virtual consoles are not left dangling
without a driver. All systems have a default boot driver (either vgacon or
dummycon) which is never unloaded. For those vt's that lost their driver, the
default boot driver is reassigned back to them.
[PATCH] Detaching fbcon: fix vgacon to allow retaking of the console
One of the limitations of the framebuffer console system is its inablity to
unload or detach itself from the console layer. And once it loads, it also
locks in framebuffer drivers preventing their unload. Although the con2fbmap
utility does provide a means to unload individual drivers, it requires that at
least one framebuffer driver is loaded for use by fbcon.
With this change, it is possible to detach fbcon from the console layer. If it
is detached, it will reattach the boot console driver (which is permanently
loaded) back to the console layer so the system can continue to work. As a
consequence, fbcon will also decrement its reference count of individual
framebuffer drivers, allowing all of these drivers to be unloaded even if
fbcon is still loaded.
Unless you use drivers that restores the display to text mode (rivafb and
i810fb, for example), detaching fbcon does require assistance from userspace
tools (ie, vbetools) for text mode to be restored completely. Without the
help of these tools, fbcon will leave the VGA console corrupted. The methods
that can be used will be described in Documentation/fb/fbcon.txt.
Because the vt layer also increments the module reference count for each
console driver, fbcon cannot be directly unloaded. It must be detached first
prior to unload.
Similarly, fbcon can be reattached to the console layer without having to
reload the module. A nice feature if fbcon is compiled statically.
Attaching and detaching fbcon is done via sysfs attributes. A class device
entry for fbcon is created in /sys/class/graphics. The two attributes that
controls this feature are detach and attach. Two other attributes that are
piggybacked under /sys/class/graphics/fb[n] that are fbcon-specific,
'con_rotate' and 'con_rotate_all' are moved to fbcon. They are renamed as
'rotate' and 'rotate_all' respectively.
Overall, this feature is a great help for developers working in the
framebuffer or console layer. There is not need to continually reboot the
kernel for every small change. It is also useful for regular users who wants
to choose between a graphical console or a text console without having to
reboot.
Example usage for x86:
/* start in text mode */
modprobe xxxfb
modprobe fbcon
/* graphical mode with fbcon using xxxfb */
echo 1 > /sys/class/graphics/fbcon/detach
/* back to text mode, will produce corrupt display unless vbetool is used */
rmmod xxxfb
modprobe yyyfb
/* back to graphical mode with fbcon using yyyfb */
Before trying out this feature, please read Documentation/fb/fbcon.txt.
This patch:
In order for fbcon to detach itself from the console layer, vgacon, which is a
boot console driver, must be fixed so it can retake the console multiple
times, not just during init. The following needs to be done:
- remove __init from the vgacon_startup, this is called again by
take_over_console().
- vc->rows and vc->cols are set manually by vgacon during init. After init,
vc_resize() can be used
- make sure the scrollback_buffer is not reallocated
Modify the sysfs description of a video mode such that modes are tagged with
their scan type, (p)rogessive, (i)nterlaced, (d)ouble scan. For example,
U:1920x1080i-50. This is useful to disambiguate some of the 'consumer' video
timings found in CEA-861 (especially those for EDTV).
Signed-off-by: Daniel R Thompson <daniel.thompson@st.com> Signed-off-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
WARNING: drivers/video/aty/atyfb.o - Section mismatch: reference to
.init.text:aty_init_cursor from .text between 'aty_init' (at offset 0x241d)
and 'atyfb_blank'
WARNING: drivers/video/macmodes.o - Section mismatch: reference to
.init.text:mac_find_mode from __ksymtab after '__ksymtab_mac_find_mode' (at
offset 0x10)
[PATCH] fbdev: Fix logo rotation if width != height
Logo drawing crashes or produces a corrupt display if the logo width and
height are not equal. The dimensions are transposed prior to the actual
rotation and the width is used instead of the height in the actual rotation
code. These produce a corrupt image.
[PATCH] neofb: fix unblank logic interfering with lid toggled backlight
This is a fix for the most annoying problem that remained with neofb:
After "setterm -powersave powerdown" the console blanker will disable the
backlight after the given timeout expires. If this happens after the lid
has been shut, we read "LCD off" from the register and store that in the
driver. Once the lid is opened, the backlight turns on, but any key press
that would awaken the blanked console will switch the backlight off again.
The workaround so far was to use the "display config toggle" Fn key combo -
once if no external display is attached, otherwise as often as required to
restore the desired display setup.
The following patch fixes the issue at least for the LCD-only case, with no
external monitor attached. Other display setup permutations are pending
further testing, but so far I can guarantee at least no negative change in
behaviour, if any at all.
Signed-off-by: Christian Trefzer <ctrefzer@gmx.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
CONFIG_FB = m and CONFIG_{BACKLIGHT:LCD}_CLASS_DEVICE = y is possible
resulting in link errors. Fix by making backlight and lcd class also depend
on FB
Arnaud Patard [Mon, 26 Jun 2006 07:26:45 +0000 (00:26 -0700)]
[PATCH] s3c2410fb: Fix resume
regs.lcdcon1 was not updated on suspend. The result was a garbaged display on
resume. This bug was first noticed by Christer Weinigel. This patch is a
modified version of the one he sent to me.
David Hollister [Mon, 26 Jun 2006 07:26:41 +0000 (00:26 -0700)]
[PATCH] vt: Delay the update of the visible console
Delay the update of the visible framebuffer console until all other consoles
have been initialized in order to avoid losing information. This only seems
to be a problem with modules, not with built-in drivers.
Signed-off-by: David Hollister <david.hollister@amd.com> Signed-off-by: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] atyfb: Remove unneeded calls to wait_for_idle
The drawing functions of atyfb is unecessary syncing the GPU which is
affecting performance. Remove the calls, any direct access by fbcon to the
framebuffer will always be preceeded by a call to atyfb_sync().
- In the 2-bit scheme of the cursor image, just set the first bit to be
always zero (turn off transparency and/or XOR), and just do the masking
manually
- The cursor color is converted into 32-bit RGBA8888 using struct fb_cmap.
Each component in the cmap is u16 in size, so mask the upper 8 bits.
nVidia is churning out chipsets like there's no tomorrow. And even though the
pci_device_id table now has numerous entries, it is still not guaranteed that
all supported devices are included or will be included.
Fortunately, nvidiafb has chipset detection logic built in. So, change the
contents of the pci_device_id table so it will capture all nVidia devices of
the display class. Unsupported chipsets will then be filtered out by
nvidiafb's detection logic.
[PATCH] fbdev: More accurate sync range extrapolation
The EDID block should specify the display's operating limits (vertical and
horizontal sync ranges, and maximum dot clock). If not given by the EDID
block, the ranges are extrapolated from the modelist. However, the
computation used is only a rough approximation, and the resulting values may
not reflect the actual capability of the display. This problem is frequently
encountered when the EDID block has a single entry, the single mode entry will
fail validation.
To prevent this, calculate the values based on the same method used in
fb_validate_mode().
[PATCH] savagefb: Add state save and_restore hooks
Reported by: Rich (Bugzilla Bug 6417)
"if savage driver is used in x.org together with savagefb, it results in
seriously garbled and distorted screen - coupled with severe slowdowns."
This bug is the result of Xorg unable to handle savagefb altering the
hardware which results in X failing to start properly and/or failed console
switching.
Add savagefb_state_save and savagefb_state_restore. These hooks will only
save and restore the extended VGA registers. Standard VGA registers will be
left alone. This is enough to make savagefb play nicely with the latest
Xorg savage driver, and perhaps with savage DRI. (Transient screen artifacts
may appear before X loads and during console switches).
(Unfortunately, blanking the screen also leaves Xorg in a blanked state, so I
have to unblank the screen before Xorg loads. So I doubt if the transient
screen artifacts will be completely invisible but hopefully it will only be
for a shorter duration (not much).)
[PATCH] savagefb: Allocate space for current and saved register states
Allocate space for 2 register states: 'current' for the current state of
the hardware, and 'saved', to be used for restoring the hardware to a sane
state. This is in preparation for the addition of state save and restore
hooks to make savagefb work together with the latest Xorg savage driver.
in ide_dump_opcode() takes the ide_lock in an irq-unsafe manner, i.e. this
function expects to be called with irqs disabled. But
ide_dump_ata[pi]_status() doesnt do that - it enables interrupts specifically.
That is a no-no - what guarantees that another IDE port couldnt generate an
IDE interrupt while we are dumping this error? The fix is to turn the
irq-enabling in these functions into irq-disabling.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sergei Shtylyov [Mon, 26 Jun 2006 07:26:16 +0000 (00:26 -0700)]
[PATCH] ide: pdc202xx_old: remove the obsolete busproc
Remove the busproc from pdc202xx_old.c because:
- it handles the obsolete HDIO_TRISTATE_HWIF ioctl instead of the modern
HDIO_SET_BUSSTATE, so treats its argument wrong;
- I don't think that tristating both channels is good idea (probably can't
be done otherwise since there seems to be only single bit controlling this).
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sergei Shtylyov [Mon, 26 Jun 2006 07:26:15 +0000 (00:26 -0700)]
[PATCH] ide: actually honor drive's minimum PIO/DMA cycle times
The function ide_timing_compute() fails to *actually* take drive's
specified minimum PIO/DMA cycle times into account -- when doing this, it
calls ide_timing_merge() on the 'struct ide_timing' argument which contains
garbage at the moment, and then ultimately destroys the read cycle time by
quantizing the ide_timing[] entry, instead of copying from that entry to
the argument structure, and only then doing a merge/quantize.
Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Al Boldi [Mon, 26 Jun 2006 07:26:13 +0000 (00:26 -0700)]
[PATCH] ide-io: increase timeout value to allow for slave wakeup
During an STR resume cycle, the ide master disk times-out when there is
also a slave present (especially CD). Increasing the timeout in ide-io
from 10,000 to 100,000 fixes this problem.
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alan Cox [Mon, 26 Jun 2006 07:26:12 +0000 (00:26 -0700)]
[PATCH] Fix IDE locking error
This bit us a few kernels ago, and for some reason never made it's way
upstream.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=144743
Kernel panic - not syncing: drivers/ide/pci/piix.c:231:
spin_lock(drivers/ide/ide.c:c03cef28) already locked by driver/ide/ide-iops.c/1153.
Signed-off-by: Dave Jones <davej@redhat.com> Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Cc: Dave Jones <davej@redhat.com> Cc: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove a call to hwif->tuneproc() on the error path of
config_chipset_for_dma(), as its single caller
(pdc202xx_config_drive_xfer_rate()) will do the call in that case.
Signed-off-by: Tobias Oed <tobiasoed@hotmail.com> Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
After the previous patch SIGNAL_GROUP_EXIT implies a pending SIGKILL, we
can remove this check from copy_process() because we already checked
!signal_pending().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Mon, 26 Jun 2006 07:26:09 +0000 (00:26 -0700)]
[PATCH] coredump: shutdown current process first
This patch optimizes zap_threads() for the case when there are no ->mm
users except the current's thread group. In that case we can avoid
'for_each_process()' loop.
It also adds a useful invariant: SIGNAL_GROUP_EXIT (if checked under
->siglock) always implies that all threads (except may be current) have
pending SIGKILL.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Mon, 26 Jun 2006 07:26:08 +0000 (00:26 -0700)]
[PATCH] coredump: some code relocations
This is a preparation for the next patch. No functional changes.
Basically, this patch moves '->flags & SIGNAL_GROUP_EXIT' check into
zap_threads(), and 'complete(vfork_done)' into coredump_wait outside of
->mmap_sem protected area.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Mon, 26 Jun 2006 07:26:08 +0000 (00:26 -0700)]
[PATCH] coredump: don't take tasklist_lock
This patch removes tasklist_lock from zap_threads().
This is safe wrt:
do_exit:
The caller holds mm->mmap_sem. This means that task which
shares the same ->mm can't pass exit_mm(), so it can't be
unhashed from init_task.tasks or ->thread_group lists.
fork:
None of sub-threads can fork after zap_process(leader). All
processes which were created before this point should be
visible to zap_threads() because copy_process() adds the new
process to the tail of init_task.tasks list, and ->siglock
lock/unlock provides a memory barrier.
de_thread:
It does list_replace_rcu(&leader->tasks, ¤t->tasks).
So zap_threads() will see either old or new leader, it does
not matter. However, it can change p->sighand, so we should
use lock_task_sighand() in zap_process().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Mon, 26 Jun 2006 07:26:06 +0000 (00:26 -0700)]
[PATCH] coredump: speedup SIGKILL sending
With this patch a thread group is killed atomically under ->siglock. This is
faster because we can use sigaddset() instead of force_sig_info() and this is
used in further patches.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Mon, 26 Jun 2006 07:26:05 +0000 (00:26 -0700)]
[PATCH] coredump: optimize ->mm users traversal
zap_threads() iterates over all threads to find those ones which share
current->mm. All threads in the thread group share the same ->mm, so we can
skip entire thread group if it has another ->mm.
This patch shifts the killing of thread group into the newly added
zap_process() function. This looks as unnecessary complication, but it is
used in further patches.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric Paris [Mon, 26 Jun 2006 07:26:03 +0000 (00:26 -0700)]
[PATCH] SELinux: Add sockcreate node to procattr API
Below is a patch to add a new /proc/self/attr/sockcreate A process may write a
context into this interface and all subsequent sockets created will be labeled
with that context. This is the same idea as the fscreate interface where a
process can specify the label of a file about to be created. At this time one
envisioned user of this will be xinetd. It will be able to better label
sockets for the actual services. At this time all sockets take the label of
the creating process, so all xinitd sockets would just be labeled the same.
I tested this by creating a tcp sender and listener. The sender was able to
write to this new proc file and then create sockets with the specified label.
I am able to be sure the new label was used since the avc denial messages
kicked out by the kernel included both the new security permission
setsockcreate and all the socket denials were for the new label, not the label
of the running process.
Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Mon, 26 Jun 2006 07:26:01 +0000 (00:26 -0700)]
[PATCH] simplify/fix first_tid()
first_tid:
/* If nr exceeds the number of threads there is nothing todo */
if (nr) {
if (nr >= get_nr_threads(leader))
goto done;
}
This is not reliable: sub-threads can exit after this check, so the
'for' loop below can overlap and proc_task_readdir() can return an
already filldir'ed dirents.
for (; pos && pid_alive(pos); pos = next_thread(pos)) {
if (--nr > 0)
continue;
Off-by-one error, will return 'leader' when nr == 1.
This patch tries to fix these problems and simplify the code.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] proc: Remove tasklist_lock from proc_task_readdir.
This is just like my previous removal of tasklist_lock from first_tgid, and
next_tgid. It simply had to wait until it was rcu safe to walk the thread
list.
This should be the last instance of the tasklist_lock in proc. So user
processes should not be able to influence the tasklist lock hold times.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In process of getting proc_fd_access_allowed to work it has developed a few
warts. In particular the special case that always allows introspection and
the special case to allow inspection of kernel threads.
The special case for introspection is needed for /proc/self/mem.
The special case for kernel threads really should be overridable
by security modules.
So consolidate these checks into ptrace.c:may_attach().
The check to always allow introspection is trivial.
The check to allow access to kernel threads, and zombies is a little
trickier. mem_read and mem_write already verify an mm exists so it isn't
needed twice. proc_fd_access_allowed only doesn't want a check to verify
task->mm exits, s it prevents all access to kernel threads. So just move
the task->mm check into ptrace_attach where it is needed for practical
reasons.
I did a quick audit and none of the security modules in the kernel seem to
care if they are passed a task without an mm into security_ptrace. So the
above move should be safe and it allows security modules to come up with
more restrictive policy.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] proc: Use sane permission checks on the /proc/<pid>/fd/ symlinks
Since 2.2 we have been doing a chroot check to see if it is appropriate to
return a read or follow one of these magic symlinks. The chroot check was
asking a question about the visibility of files to the calling process and
it was actually checking the destination process, and not the files
themselves. That test was clearly bogus.
In my first pass through I simply fixed the test to check the visibility of
the files themselves. That naive approach to fixing the permissions was
too strict and resulted in cases where a task could not even see all of
it's file descriptors.
What has disturbed me about relaxing this check is that file descriptors
are per-process private things, and they are occasionaly used a user space
capability tokens. Looking a little farther into the symlink path on /proc
I did find userid checks and a check for capability (CAP_DAC_OVERRIDE) so
there were permissions checking this.
But I was still concerned about privacy. Besides /proc there is only one
other way to find out this kind of information, and that is ptrace. ptrace
has been around for a long time and it has a well established security
model.
So after thinking about it I finally realized that the permission checks
that make sense are the permission checks applied to ptrace_attach. The
checks are simple per process, and won't cause nasty surprises for people
coming from less capable unices.
Unfortunately there is one case that the current ptrace_attach test does
not cover: Zombies and kernel threads. Single stepping those kinds of
processes is impossible. Being able to see which file descriptors are open
on these tasks is important to lsof, fuser and friends. So for these
special processes I made the rule you can't find out unless you have
CAP_SYS_PTRACE.
These proc permission checks should now conform to the principle of least
surprise. As well as using much less code to implement :)
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The code doesn't need to sleep to when making this check so I can just do the
comparison and not worry about the reference counts.
TODO: While looking at this I realized that my original cleanup did not push
the permission check far enough down into the stack. The call of
proc_check_dentry_visible needs to move out of the generic proc
readlink/follow link code and into the individual get_link instances.
Otherwise the shared resources checks are not quite correct (shared
files_struct does not require a shared fs_struct), and there are races with
unshare.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Incrementally update my proc-dont-lock-task_structs-indefinitely patches so
that they work with struct pid instead of struct task_ref.
Mostly this is a straight 1-1 substitution.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Every inode in /proc holds a reference to a struct task_struct. If a
directory or file is opened and remains open after the the task exits this
pinning continues. With 8K stacks on a 32bit machine the amount pinned per
file descriptor is about 10K.
Normally I would figure a reasonable per user process limit is about 100
processes. With 80 processes, with a 1000 file descriptors each I can trigger
the 00M killer on a 32bit kernel, because I have pinned about 800MB of useless
data.
This patch replaces the struct task_struct pointer with a pointer to a struct
task_ref which has a struct task_struct pointer. The so the pinning of dead
tasks does not happen.
The code now has to contend with the fact that the task may now exit at any
time. Which is a little but not muh more complicated.
With this change it takes about 1000 processes each opening up 1000 file
descriptors before I can trigger the OOM killer. Much better.
[mlp@google.com: task_mmu small fixes] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Paul Jackson <pj@sgi.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Albert Cahalan <acahalan@gmail.com> Signed-off-by: Prasanna Meda <mlp@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] proc: make PROC_NUMBUF the buffer size for holding integers as strings
Currently in /proc at several different places we define buffers to hold a
process id, or a file descriptor . In most of them we use either a hard coded
number or a different define. Modify them all to use PROC_NUMBUF, so the code
has a chance of being maintained.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Like the bug Oleg spotted in first_tid there was also a small off by one
error in first_tgid, when a seek was done on the /proc directory. This
fixes that and changes the code structure to make it a little more obvious
what is going on.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>