Pierre Ossman [Wed, 29 Nov 2006 11:10:52 +0000 (12:10 +0100)]
MMC: Always use a sector size of 512 bytes
Both MMC and SD specifications specify (although a bit unclearly in the MMC
case) that a sector size of 512 bytes must always be supported by the card.
Cards can report larger "native" size than this, and cards >= 2 GB even
must do so. Most other readers use 512 bytes even for these cards. We should
do the same to be compatible.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Herbert Xu [Wed, 29 Nov 2006 11:06:04 +0000 (12:06 +0100)]
SCTP: Always linearise packet on input
I was looking at a RHEL5 bug report involving Xen and SCTP
(https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212550).
It turns out that SCTP wasn't written to handle skb fragments at
all. The absence of any calls to skb_may_pull is testament to
that.
It just so happens that Xen creates fragmented packets more often
than other scenarios (header & data split when going from domU to
dom0). That's what caused this bug to show up.
Until someone has the time sits down and audits the entire net/sctp
directory, here is a conservative and safe solution that simply
linearises all packets on input.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Al Viro [Wed, 29 Nov 2006 10:40:22 +0000 (11:40 +0100)]
add forgotten ->b_data in memcpy() call in ext3/resize.c (oopsable)
sbi->s_group_desc is an array of pointers to buffer_head. memcpy() of
buffer size from address of buffer_head is a bad idea - it will generate
junk in any case, may oops if buffer_head is close to the end of slab
page and next page is not mapped and isn't what was intended there.
IOW, ->b_data is missing in that call. Fortunately, result doesn't go
into the primary on-disk data structures, so only backup ones get crap
written to them; that had allowed this bug to remain unnoticed until
now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Olaf Kirch [Wed, 29 Nov 2006 09:59:22 +0000 (10:59 +0100)]
[UDP]: Make udp_encap_rcv use pskb_may_pull
Make udp_encap_rcv use pskb_may_pull
IPsec with NAT-T breaks on some notebooks using the latest e1000 chipset,
when header split is enabled. When receiving sufficiently large packets, the
driver puts everything up to and including the UDP header into the header
portion of the skb, and the rest goes into the paged part. udp_encap_rcv
forgets to use pskb_may_pull, and fails to decapsulate it. Instead, it
passes it up it to the IKE daemon.
Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Alan Stern [Sat, 25 Nov 2006 01:47:52 +0000 (02:47 +0100)]
USB: UHCI: Increase port-reset completion delay for HP controllers
This patch (as657) increases the port-reset completion delay in uhci-hcd
for HP's embedded controllers. Unlike other UHCI controllers, the HP
chips can take as long as 250 us to carry out the processing associated
with finishing a port reset.
This fixes Novell bug #148761.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
The hptiop just got merged with a horrible amount of really bad ioctl
code that is against the standards for new scsi drivers. This patch
backs it out (and fixes a small bug where scsi_add_host is called to
early). We can re-add proper APIs once we agree on them.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Updates:
- don't bypass SYNCHRONIZE_CACHE command
- return SCSI_MLQUEUE_HOST_BUSY when no free request slots
- move scsi_remove_host() to the begin of hpt_remove(), or it will
not work after resources being released.
Signed-off-by: HighPoint Linux Team <linux@highpoint-tech.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
HighPoint RocketRAID 3220/3320 series 8 channel PCI-X SATA RAID Host
Adapters.
Fixes from original submission:
Merge Andrew Morton's patches:
- Provide locking for global list
- Fix debug printks
- uninline function with multiple callsites
- coding style fixups
- remove unneeded casts of void*
- kfree(NULL) is legal
- Don't "succeed" if register_chrdev() failed - otherwise we'll later
unregister a not-registered chrdev.
- Don't return from hptiop_do_ioctl() with the spinlock held.
- uninline __hpt_do_ioctl()
Update for Arjan van de Ven's comments:
- put all asm/ includes after the linux/ ones
- replace mdelay with msleep
- add pci posting flush
- do not set pci command reqister in map_pci_bar
- do not try merging sg elements in hptiop_buildsgl()
- remove unused outstandingcommands member from hba structure
- remove unimplemented hptiop_abort() handler
- remove typedef u32 hpt_id_t
Other updates:
- fix endianess
Signed-off-by: HighPoint Linux Team <linux@highpoint-tech.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Hidetoshi Seto [Fri, 24 Nov 2006 02:11:19 +0000 (03:11 +0100)]
sysfs: remove duplicated dput in sysfs_update_file
Following function can drops d_count twice against one reference
by lookup_one_len.
<SOURCE>
/**
* sysfs_update_file - update the modified timestamp on an object attribute.
* @kobj: object we're acting for.
* @attr: attribute descriptor.
*/
int sysfs_update_file(struct kobject * kobj, const struct attribute * attr)
{
struct dentry * dir = kobj->dentry;
struct dentry * victim;
int res = -ENOENT;
mutex_lock(&dir->d_inode->i_mutex);
victim = lookup_one_len(attr->name, dir, strlen(attr->name));
if (!IS_ERR(victim)) {
/* make sure dentry is really there */
if (victim->d_inode &&
(victim->d_parent->d_inode == dir->d_inode)) {
victim->d_inode->i_mtime = CURRENT_TIME;
fsnotify_modify(victim);
/**
* Drop reference from initial sysfs_get_dentry().
*/
dput(victim);
res = 0;
} else
d_drop(victim);
/**
* Drop the reference acquired from sysfs_get_dentry() above.
*/
dput(victim);
}
mutex_unlock(&dir->d_inode->i_mutex);
return res;
}
</SOURCE>
PCI-hotplug (drivers/pci/hotplug/pci_hotplug_core.c) is only user of
this function. I confirmed that dentry of /sys/bus/pci/slots/XXX/*
have negative d_count value.
This patch removes unnecessary dput().
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Kirill Korotaev [Fri, 24 Nov 2006 02:08:27 +0000 (03:08 +0100)]
fix sys_getppid oopses on debug kernel
sys_getppid() optimization can access a freed memory. On kernels with
DEBUG_SLAB turned ON, this results in Oops. As Dave Hansen noted, this
optimization is also unsafe for memory hotplug.
So this patch always takes the lock to be safe.
Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Al Viro [Fri, 24 Nov 2006 02:03:34 +0000 (03:03 +0100)]
[IPX]: Annotate and fix IPX checksum
Calculation of IPX checksum got buggered about 2.4.0. The old variant
mangled the packet; that got fixed, but calculation itself got buggered.
Restored the correct logics, fixed a subtle breakage we used to have even
back then: if the sum is 0 mod 0xffff, we want to return 0, not 0xffff.
The latter has special meaning for IPX (cheksum disabled). Observation
(and obvious fix) nicked from history of FreeBSD ipx_cksum.c...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Need to check some more cases in IPX receive. If the skb is purely
fragments, the IPX header needs to be extracted. The function
pskb_may_pull() may in theory invalidate all the pointers in the skb,
so references to ipx header must be refreshed.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
This patch will linearize and check there is enough data.
It handles the pprop case as well as avoiding a whole audit of
the routing code.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Adrian Bunk [Thu, 23 Nov 2006 01:11:07 +0000 (02:11 +0100)]
[SCSI] advansys pci tweaks.
Remove a lot of duplicate #defines from the advansys driver,
and make them look like PCI IDs as defined elsewhere in the kernel.
Also add a module table so that it automatically gets picked up
by tools relying on modinfo output (like say, distro installers).
Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Randy Dunlap [Thu, 23 Nov 2006 01:09:36 +0000 (02:09 +0100)]
advansys section fixes
Priority: not critical.
Mark 3 functions __init. Saves a little memory.
This makes these functions' calls to AdvWaitEEPCmd() (which is __init)
be clean (i.e., eliminates text -> init -> text call chain).
Fix multiple section mismatch warnings:
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet3550EEPConfig' (at offset 0x7a22) and 'AdvSet38C0800EEPConfig'
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet3550EEPConfig' (at offset 0x7a4e) and 'AdvSet38C0800EEPConfig'
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet3550EEPConfig' (at offset 0x7a79) and 'AdvSet38C0800EEPConfig'
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet3550EEPConfig' (at offset 0x7aa2) and 'AdvSet38C0800EEPConfig'
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet3550EEPConfig' (at offset 0x7abb) and 'AdvSet38C0800EEPConfig'
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet38C0800EEPConfig' (at offset 0x7ae0) and 'AdvSet38C1600EEPConfig'
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet38C0800EEPConfig' (at offset 0x7b0c) and 'AdvSet38C1600EEPConfig'
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet38C0800EEPConfig' (at offset 0x7b37) and 'AdvSet38C1600EEPConfig'
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet38C0800EEPConfig' (at offset 0x7b60) and 'AdvSet38C1600EEPConfig'
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet38C0800EEPConfig' (at offset 0x7b79) and 'AdvSet38C1600EEPConfig'
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet38C1600EEPConfig' (at offset 0x7b9e) and 'AdvExeScsiQueue'
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet38C1600EEPConfig' (at offset 0x7bca) and 'AdvExeScsiQueue'
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet38C1600EEPConfig' (at offset 0x7bf5) and 'AdvExeScsiQueue'
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet38C1600EEPConfig' (at offset 0x7c1e) and 'AdvExeScsiQueue'
WARNING: drivers/scsi/advansys.o - Section mismatch: reference to .init.text: from .text between 'AdvSet38C1600EEPConfig' (at offset 0x7c37) and 'AdvExeScsiQueue'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
POWERPC: Make alignment exception always check exception table
The alignment exception used to only check the exception table for
-EFAULT, not for other errors. That opens an oops window if we can
coerce the kernel into getting an alignment exception for other reasons
in what would normally be a user-protected accessor, which can be done
via some of the futex ops. This fixes it by always checking the
exception tables.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Artur Skawina [Mon, 20 Nov 2006 21:32:56 +0000 (22:32 +0100)]
sis900 adm7001 PHY support
this patch is required to get a SIS964 based motherboard ethernet working
(FSC D1875) (picking the #1 transceiver, instead of the last one, in case
no known ones were found might be a better default, and would have worked
in this case too)
Signed-off-by: Artur Skawina <art_k@o2.pl> Signed-off-by: Adrian Bunk <bunk@stusta.de>
'This is based on the proposed patches flying around but also checks that
the device in question is new enough to have word 93 rather thanb blindly
assuming word 93 == 0 means SATA (see ATA-5, ATA-7)' -- Alan Cox
Required for my SATA drive on an Asus Pundit-R to operate above 33MBps.
Signed-off-by: Michael-Luke Jones <mlj28@cam.ac.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Diego Calleja [Mon, 20 Nov 2006 21:25:17 +0000 (22:25 +0100)]
Fix BeFS slab corruption
In bugzilla #6941, Jens Kilian reported:
"The function befs_utf2nls (in fs/befs/linuxvfs.c) writes a 0 byte past the
end of a block of memory allocated via kmalloc(), leading to memory
corruption. This happens only for filenames which are pure ASCII and a
multiple of 4 bytes in length. [...]
Without DEBUG_SLAB, this leads to further corruption and hard lockups; I
believe this is the bug which has made kernels later than 2.6.8 unusable
for me. (This must be due to changes in memory management, the bug has
been in the BeFS driver since the time it was introduced (AFAICT).)
Steps to reproduce:
Create a directory (in BeOS, naturally :-) with files named, e.g.,
"1", "22", "333", "4444", ... Mount it in Linux and do an "ls" or "find""
This patch implements the suggested fix. Credits to Jens Kilian for
debugging the problem and finding the right fix.
Signed-off-by: Diego Calleja <diegocg@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
David Miller [Sun, 19 Nov 2006 23:21:04 +0000 (00:21 +0100)]
[RTNETLINK]: Fix IFLA_ADDRESS handling.
The ->set_mac_address handlers expect a pointer to a
sockaddr which contains the MAC address, whereas
IFLA_ADDRESS provides just the MAC address itself.
So whip up a sockaddr to wrap around the netlink
attribute for the ->set_mac_address call.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Dmitry Mishin [Fri, 17 Nov 2006 16:53:07 +0000 (17:53 +0100)]
Fix timer race in dst GC code
Replace add_timer() by mod_timer() in dst_run_gc
in order to avoid BUG message.
CPU1 CPU2
dst_run_gc() entered dst_run_gc() entered
spin_lock(&dst_lock) .....
del_timer(&dst_gc_timer) fail to get lock
.... mod_timer() <--- puts
timer back
to the list
add_timer(&dst_gc_timer) <--- BUG because timer is in list already.
Found during OpenVZ internal testing.
At first we thought that it is OpenVZ specific as we
added dst_run_gc(0) call in dst_dev_event(),
but as Alexey pointed to me it is possible to trigger
this condition in mainstream kernel.
F.e. timer has fired on CPU2, but the handler was preeempted
by an irq before dst_lock is tried.
Meanwhile, someone on CPU1 adds an entry to gc list and
starts the timer.
If CPU2 was preempted long enough, this timer can expire
simultaneously with resuming timer handler on CPU1, arriving
exactly to the situation described.
Badari Pulavarty [Fri, 17 Nov 2006 16:47:22 +0000 (17:47 +0100)]
ext3 -nobh option causes oops
For files other than IFREG, nobh option doesn't make sense. Modifications
to them are journalled and needs buffer heads to do that. Without this
patch, we get kernel oops in page_buffers().
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Paul Fulghum [Thu, 16 Nov 2006 23:13:41 +0000 (00:13 +0100)]
synclink_gt fix receive tty error handling
Fix receive tty error handling in synclink_gt driver.
Adrian reported compiler warning for incorrect bit test
against char variable. I determined these and other
device specific error bits were incorrectly defined.
Signed-off-by: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Daniel Ritz [Wed, 15 Nov 2006 16:07:33 +0000 (17:07 +0100)]
fix via586 irq routing for pirq 5
fix interrput routing for via 586 bridges. pirq can be 5 which needs to be
mapped to INTD. but currently the access functions can handle only pirq 1-4.
this is similar to the other via chipsets where pirq 4 and 5 are both mapped
to INTD. fixes bugzilla #7490
Signed-off-by: Daniel Ritz <daniel.ritz@gmx.ch> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Bob Moore [Wed, 15 Nov 2006 15:20:37 +0000 (16:20 +0100)]
Reduce ACPI verbosity on null handle condition
As detailed at http://bugs.gentoo.org/131534 :
2.6.16 converted many ACPI debug messages into error or warning
messages. One extraneous message was incorrectly converted, resulting in
logs being flooded by "Handle is NULL and Pathname is relative" messages
on some systems.
This patch (part of a larger ACPICA commit) converts the message back to
debug level.
Signed-off-by: Daniel Drake <dsd@gentoo.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Fix longstanding load balancing bug in the scheduler
The scheduler will stop load balancing if the most busy processor contains
processes pinned via processor affinity.
The scheduler currently only does one search for busiest cpu. If it cannot
pull any tasks away from the busiest cpu because they were pinned then the
scheduler goes into a corner and sulks leaving the idle processors idle.
F.e. If you have processor 0 busy running four tasks pinned via taskset,
there are none on processor 1 and one just started two processes on
processor 2 then the scheduler will not move one of the two processes away
from processor 2.
This patch fixes that issue by forcing the scheduler to come out of its
corner and retrying the load balancing by considering other processors for
load balancing.
This patch was originally developed by John Hawkes and discussed at
I have removed extraneous material and gone back to equipping struct rq
with the cpu the queue is associated with since this makes the patch much
easier and it is likely that others in the future will have the same
difficulty of figuring out which processor owns which runqueue.
The overhead added through these patches is a single word on the stack if
the kernel is configured to support 32 cpus or less (32 bit). For 32 bit
environments the maximum number of cpus that can be configued is 255 which
would result in the use of 32 bytes additional on the stack. On IA64 up to
1k cpus can be configured which will result in the use of 128 additional
bytes on the stack. The maximum additional cache footprint is one
cacheline. Typically memory use will be much less than a cacheline and the
additional cpumask will be placed on the stack in a cacheline that already
contains other local variable.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Paul Mackerras [Fri, 10 Nov 2006 23:28:30 +0000 (00:28 +0100)]
nvidia fbdev: fix powerpc xmon scribbles
xmon writes garbage on the screen because the nvidia console driver has
changed the line pitch from what the firmware set it to. Fix it by making
the nvidia driver inform the btext engine (which xmon uses if the screen is
its output device) about changes to display resolution.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Paul Mackerras [Fri, 10 Nov 2006 23:17:57 +0000 (00:17 +0100)]
[POWERPC] Fix return value from memcpy
As pointed out by Herbert Xu <herbert@gondor.apana.org.au>, our
memcpy implementation didn't return the destination pointer as its
return value, and there is code in the kernel that expects that.
This fixes it.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Herbert Xu [Fri, 10 Nov 2006 23:15:10 +0000 (00:15 +0100)]
[NET]: Update frag_list in pskb_trim
When pskb_trim has to defer to ___pksb_trim to trim the frag_list part of
the packet, the frag_list is not updated to reflect the trimming. This
will usually work fine until you hit something that uses the packet length
or tail from the frag_list.
Examples include esp_output and ip_fragment.
Another problem caused by this is that you can end up with a linear packet
with a frag_list attached.
It is possible to get away with this if we audit everything to make sure
that they always consult skb->len before going down onto frag_list. In
fact we can do the samething for the paged part as well to avoid copying
the data area of the skb. For now though, let's do the conservative fix
and update frag_list.
Many thanks to Marco Berizzi for helping me to track down this bug.
This 4-year old bug took 3 months to track down. Marco was very patient
indeed :)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Jean Delvare [Fri, 10 Nov 2006 23:13:32 +0000 (00:13 +0100)]
scx200_acb: Fix the block transactions
The scx200_acb i2c bus driver pretends to support SMBus block
transactions, but in fact it implements the more simple I2C block
transactions. Additionally, it lacks sanity checks on the length
of the block transactions, which could lead to a buffer overrun.
This fixes an oops reported by Alexander Atanasov:
http://marc.theaimsgroup.com/?l=linux-kernel&m=114970382125094
Thanks to Ben Gardner for fixing my bugs :)
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Jeff Mahoney [Thu, 9 Nov 2006 10:31:23 +0000 (11:31 +0100)]
[DISKLABEL] SUN: Fix signed int usage for sector count
The current sun disklabel code uses a signed int for the sector count.
When partitions larger than 1 TB are used, the cast to a sector_t causes
the partition sizes to be invalid:
This patch switches the sector count to an unsigned int to fix this.
Eric Sandeen also submitted the same patch.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Larry Woodman [Thu, 9 Nov 2006 10:05:38 +0000 (11:05 +0100)]
[NET]: __alloc_pages() failures reported due to fragmentation
We have seen a couple of __alloc_pages() failures due to
fragmentation, there is plenty of free memory but no large order pages
available. I think the problem is in sock_alloc_send_pskb(), the
gfp_mask includes __GFP_REPEAT but its never used/passed to the page
allocator. Shouldnt the gfp_mask be passed to alloc_skb() ?
Signed-off-by: Larry Woodman <lwoodman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Herbert Xu [Thu, 9 Nov 2006 10:03:56 +0000 (11:03 +0100)]
[NET]: Set truesize in pskb_copy
Since pskb_copy tacks on the non-linear bits from the original
skb, it needs to count them in the truesize field of the new skb.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
John Heffner [Thu, 9 Nov 2006 10:01:54 +0000 (11:01 +0100)]
[TCP]: Don't use highmem in tcp hash size calculation.
This patch removes consideration of high memory when determining TCP
hash table sizes. Taking into account high memory results in tcp_mem
values that are too large.
Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Kirill Korotaev [Wed, 8 Nov 2006 07:12:01 +0000 (08:12 +0100)]
[IPV4]: Limit rt cache size properly.
During OpenVZ stress testing we found that UDP traffic with random src
can generate too much excessive rt hash growing leading finally to OOM
and kernel panics.
It was found that for 4GB i686 system (having 1048576 total pages and
225280 normal zone pages) kernel allocates the following route hash:
syslog: IP route cache hash table entries: 262144 (order: 8, 1048576
bytes) => ip_rt_max_size = 4194304 entries, i.e. max rt size is 4194304 * 256b = 1Gb of RAM > normal_zone
Attached the patch which removes HASH_HIGHMEM flag from
alloc_large_system_hash() call.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Marcel Holtmann [Wed, 8 Nov 2006 07:10:30 +0000 (08:10 +0100)]
Don't allow chmod() on the /proc/<pid>/ files
This just turns off chmod() on the /proc/<pid>/ files, since there is no
good reason to allow it, and had we disallowed it originally, the nasty
/proc race exploit wouldn't have been possible.
The other patches already fixed the problem chmod() could cause, so this
is really just some final mop-up..
This particular version is based off a patch by Eugene and Marcel which
had much better naming than my original equivalent one.
Herbert Xu [Wed, 8 Nov 2006 06:47:29 +0000 (07:47 +0100)]
[NET]: Add missing UFO initialisations
This bug was unknowingly fixed the GSO patches (or rather, its effect was
unknown at the time).
Thanks to Marco Berizzi's persistence which is documented in the thread
"ipsec tunnel asymmetrical mtu", we now know that it can have highly
non-obvious symptoms.
What happens is that uninitialised uso_size fields can cause packets to
be incorrectly identified as UFO, which means that it does not get
fragmented even if it's over the MTU.
The fix is simple enough.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
from mm/memory.c:
1434 static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va)
1435 {
1436 /*
1437 * If the source page was a PFN mapping, we don't have
1438 * a "struct page" for it. We do a best-effort copy by
1439 * just copying from the original user address. If that
1440 * fails, we just zero-fill it. Live with it.
1441 */
1442 if (unlikely(!src)) {
1443 void *kaddr = kmap_atomic(dst, KM_USER0);
1444 void __user *uaddr = (void __user *)(va & PAGE_MASK);
1445
1446 /*
1447 * This really shouldn't fail, because the page is there
1448 * in the page tables. But it might just be unreadable,
1449 * in which case we just give up and fill the result with
1450 * zeroes.
1451 */
1452 if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
1453 memset(kaddr, 0, PAGE_SIZE);
1454 kunmap_atomic(kaddr, KM_USER0);
#### D-cache have to be flushed here.
#### It seems it is just forgotten.
1455 return;
1456
1457 }
1458 copy_user_highpage(dst, src, va);
#### Ok here. flush_dcache_page() called from this func if arch need it
1459 }
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Thomas Graf [Tue, 7 Nov 2006 14:30:21 +0000 (15:30 +0100)]
PKT_SCHED: Fix error handling while dumping actions
"return -err" and blindly inheriting the error code in the netlink
failure exception handler causes errors codes to be returned as
positive value therefore making them being ignored by the caller.
May lead to sending out incomplete netlink messages.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Dave Jones [Tue, 7 Nov 2006 14:14:04 +0000 (15:14 +0100)]
[CPUFREQ] Make powernow-k7 work on SMP kernels.
Even though powernow-k7 doesn't work in SMP environments,
it can work on an SMP configured kernel if there's only
one CPU present, however recalibrate_cpu_khz was returning
-EINVAL on such kernels, so we failed to init the cpufreq driver.
Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
As reported by Mark Dowd <Mark_Dowd@McAfee.com>, ip6_tables is susceptible
to a fragmentation attack causing false negatives on extension header
matches.
When extension headers occur in the non-first fragment after the fragment
header (possibly with an incorrect nexthdr value in the fragment header)
a rule looking for this extension header will never match.
Drop fragments that are at offset 0 and don't contain the final protocol
header regardless of the ruleset, since this should not happen normally.
Since all extension headers are before the protocol header this makes sure
an extension header is either not present or in the first fragment, where
we can properly parse it.
With help from Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
As reported by Mark Dowd <Mark_Dowd@McAfee.com>, ip6_tables is susceptible
to a fragmentation attack causing false negatives on protocol matches.
When the protocol header doesn't follow the fragment header immediately,
the fragment header contains the protocol number of the next extension
header. When the extension header and the protocol header are sent in
a second fragment a rule like "ip6tables .. -p udp -j DROP" will never
match.
Drop fragments that are at offset 0 and don't contain the final protocol
header regardless of the ruleset, since this should not happen normally.
With help from Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Neil Brown [Sun, 5 Nov 2006 08:03:18 +0000 (09:03 +0100)]
knfsd: Fix race that can disable NFS server.
This is a long standing bug that seems to have only recently become
apparent, presumably due to increasing use of NFS over TCP - many
distros seem to be making it the default.
The SK_CONN bit gets set when a listening socket may be ready
for an accept, just as SK_DATA is set when data may be available.
It is entirely possible for svc_tcp_accept to be called with neither
of these set. It doesn't happen often but there is a small race in
svc_sock_enqueue as SK_CONN and SK_DATA are tested outside the
spin_lock. They could be cleared immediately after the test and
before the lock is gained.
This normally shouldn't be a problem. The sockets are non-blocking so
trying to read() or accept() when ther is nothing to do is not a problem.
However: svc_tcp_recvfrom makes the decision "Should I accept() or
should I read()" based on whether SK_CONN is set or not. This usually
works but is not safe. The decision should be based on whether it is
a TCP_LISTEN socket or a TCP_CONNECTED socket.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Thomas Gleixner [Sun, 5 Nov 2006 08:02:46 +0000 (09:02 +0100)]
posix-cpu-timers: prevent signal delivery starvation
The integer divisions in the timer accounting code can round the result
down to 0. Adding 0 is without effect and the signal delivery stops.
Clamp the division result to minimum 1 to avoid this.
Problem was reported by Seongbae Park <spark@google.com>, who provided
also an inital patch.
Roland sayeth:
I have had some more time to think about the problem, and to reproduce it
using Toyo's test case. For the record, if my understanding of the problem
is correct, this happens only in one very particular case. First, the
expiry time has to be so soon that in cputime_t units (usually 1s/HZ ticks)
it's < nthreads so the division yields zero. Second, it only affects each
thread that is so new that its CPU time accumulation is zero so now+0 is
still zero and ->it_*_expires winds up staying zero. For the VIRT and PROF
clocks when cputime_t is tick granularity (or the SCHED clock on
configurations where sched_clock's value only advances on clock ticks), this
is not hard to arrange with new threads starting up and blocking before they
accumulate a whole tick of CPU time. That's what happens in Toyo's test
case.
Note that in general it is fine for that division to round down to zero,
and set each thread's expiry time to its "now" time. The problem only
arises with thread's whose "now" value is still zero, so that now+0 winds up
0 and is interpreted as "not set" instead of ">= now". So it would be a
sufficient and more precise fix to just use max(ticks, 1) inside the loop
when setting each it_*_expires value.
But, it does no harm to round the division up to one and always advance
every thread's expiry time. If the thread didn't already fire timers for
the expiry time of "now", there is no expectation that it will do so before
the next tick anyway. So I followed Thomas's patch in lifting the max out
of the loops.
This patch also covers the reload cases, which are harder to write a test
for (and I didn't try). I've tested it with Toyo's case and it fixes that.
[toyoa@mvista.com: fix: min_t -> max_t] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
James Morris [Sun, 5 Nov 2006 08:00:45 +0000 (09:00 +0100)]
[IPV6]: fix lockup via /proc/net/ip6_flowlabel (CVE-2006-5619)
There's a bug in the seqfile handling for /proc/net/ip6_flowlabel, where,
after finding a flowlabel, the code will loop forever not finding any
further flowlabels, first traversing the rest of the hash bucket then just
looping.
This patch fixes the problem by breaking after the hash bucket has been
traversed.
Note that this bug can cause lockups and oopses, and is trivially invoked
by an unpriveleged user.
Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
[S390] fix user readable uninitialised kernel memory, take 2.
The previous patch to correct the copy_from_user padding is quite
broken. The execute instruction needs to be done via the register %r4,
not via %r2 and 31 bit doesn't know the instructions lgr and ahji.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
[S390] fix user readable uninitialised kernel memory (CVE-2006-5174)
A user space program can read uninitialised kernel memory
by appending to a file from a bad address and then reading
the result back. The cause is the copy_from_user function
that does not clear the remaining bytes of the kernel
buffer after it got a fault on the user space address.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>