Herbert Xu [Sat, 19 May 2007 04:57:38 +0000 (14:57 +1000)]
[PATCH] CRYPTO: api: Read module pointer before freeing algorithm
The function crypto_mod_put first frees the algorithm and then drops
the reference to its module. Unfortunately we read the module pointer
after freeing the algorithm, and that pointer sits inside the
object that we just freed.
So this patch reads the module pointer out before we free the object.
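The ordering can be illustrated with a minimal user-space sketch. The types and helpers below (struct sketch_module, struct sketch_alg, sketch_alg_put, sketch_mod_put) are hypothetical stand-ins and not the kernel's crypto API; they only show why the owner pointer has to be copied out while the object is still valid.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a module and an algorithm object. */
struct sketch_module { int refcnt; };
struct sketch_alg { struct sketch_module *owner; int refcnt; };

static void sketch_module_put(struct sketch_module *mod)
{
    mod->refcnt--;
}

/* Drops one reference; frees the algorithm when the count hits zero. */
static void sketch_alg_put(struct sketch_alg *alg)
{
    if (--alg->refcnt == 0)
        free(alg);
}

/* Fixed ordering: copy the owner pointer out while alg is still valid. */
static void sketch_mod_put(struct sketch_alg *alg)
{
    struct sketch_module *owner = alg->owner;

    sketch_alg_put(alg);      /* alg may be freed here */
    sketch_module_put(owner); /* safe: uses the saved copy */
}

/* Exercises the sketch; returns the module refcount afterwards,
 * or -1 if the allocation failed. */
static int sketch_demo(void)
{
    struct sketch_module mod = { 1 };
    struct sketch_alg *alg = malloc(sizeof(*alg));

    if (!alg)
        return -1;
    alg->owner = &mod;
    alg->refcnt = 1;
    sketch_mod_put(alg);
    return mod.refcnt;
}
```

Reading alg->owner after sketch_alg_put() instead would touch freed memory, which is the pattern described above.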
Thanks to Luca Tettamanti for reporting this.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dave Kleikamp [Wed, 16 May 2007 03:53:36 +0000 (22:53 -0500)]
[PATCH] JFS: Fix race waking up jfsIO kernel thread
It's possible for a journal I/O request to be added to the log_redrive
queue and the jfsIO thread to be awakened after the thread releases
log_redrive_lock but before it sets its state to TASK_INTERRUPTIBLE.
The jfsIO thread should set the state before giving up the spinlock, so
the waking thread will really wake it.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tejun Heo [Thu, 10 May 2007 14:45:17 +0000 (16:45 +0200)]
[PATCH] driver-core: don't free devt_attr till the device is released
Currently, devt_attr for the "dev" file is freed immediately on device
removal, but if the "dev" sysfs file is open when a device is removed,
sysfs will still access its attribute structure for later operations,
including close, resulting in a jump to a garbage address. Fix it by
postponing the freeing of devt_attr to device release time.
Note that devt_attr for class_device is already freed on release.
This bug was reported by Chris Rankin as bugzilla bug#8198.
Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Chris Rankin <rankincj@yahoo.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jorge Boncompte [Thu, 3 May 2007 01:14:27 +0000 (03:14 +0200)]
[PATCH] {ip, nf}_nat_proto_gre: do not modify/corrupt GREv0 packets through NAT
While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack
and nat modules to a 2.4.32 kernel I noticed that the gre_key function
returns a wrong pointer to the GRE key of a version 0 packet thus
corrupting the packet payload.
The intended behaviour for GREv0 packets is to act like
nf_conntrack_proto_generic/nf_nat_proto_unknown, so I have ripped out the
offending functions (not used anymore) and modified the
nf_nat_proto_gre modules to not touch version 0 (non PPTP) packets.
Signed-off-by: Jorge Boncompte <jorge@dti2.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dan Williams [Wed, 2 May 2007 18:43:19 +0000 (11:43 -0700)]
[PATCH] iop13xx: fix i/o address translation
PCI devices were being programmed with an incorrect base address value.
This patch moves I/O space into a 16-bit addressable region and corrects
the i/o offset.
Much thanks to Martin Michlmayr for tracking this issue and testing
debug patches.
Cc: Martin Michlmayr <tbm@cyrius.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Rientjes [Fri, 27 Apr 2007 16:11:10 +0000 (12:11 -0400)]
[PATCH] oom: kill all threads that share mm with killed task
oom_kill_task() calls __oom_kill_task() to OOM kill a selected task.
When finding other threads that share an mm with that task, we need to
kill those individual threads and not the same one.
When creating a new connection by sending an unknown chunk type, we
don't transition to a valid state, causing a NULL pointer dereference in
sctp_packet when accessing sctp_timeouts[SCTP_CONNTRACK_NONE].
Fix by not creating a new conntrack entry if the initial state is invalid.
Noticed by Vilmos Nebehaj <vilmos.nebehaj@ramsys.hu>
CC: Kiran Kumar Immidi <immidi_kiran@yahoo.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Allow in-place crypto operations. Also remove the coherent user flag
(we use it automagically now), and by default use the user written
key rather than the HW hidden key - this makes crypto just work without
any special considerations, and that's OK, since it's our only usage
model.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
There's a really rare and obscure bug in CFQ, that causes a crash in
cfq_dispatch_insert() due to rq == NULL. One example of that is seen
here:
http://lkml.org/lkml/2007/4/15/41
Neil correctly diagnosed the situation for how this can happen, read
that analysis here:
http://lkml.org/lkml/2007/4/25/57
This looks like it requires md to trigger, even though it should
potentially be possible to do with O_DIRECT (at least if you edit the
kernel and doctor some of the unplug calls).
The fix is to move the ->next_rq update to when we add a request to the
rbtree. Then we remove the possibility for a request to exist in the
rbtree code, but not have ->next_rq correctly updated.
Jean Delvare [Wed, 25 Apr 2007 07:51:01 +0000 (09:51 +0200)]
hwmon/w83627ehf: Fix the fan5 clock divider write
Users have been complaining about the w83627ehf driver flooding their
logs with debug messages like:
w83627ehf 9191-0a10: Increasing fan 4 clock divider from 64 to 128
or:
w83627ehf 9191-0290: Increasing fan 4 clock divider from 4 to 8
The reason is that we failed to actually write the LSB of the encoded
clock divider value for that fan, causing the next read to report the
same old value again and again.
Additionally, the fan number was improperly reported, making the bug
harder to find.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jeff Mahoney [Mon, 23 Apr 2007 21:41:17 +0000 (14:41 -0700)]
reiserfs: fix xattr root locking/refcount bug
The listxattr() and getxattr() operations are only protected by a read
lock. As a result, if either of these operations run in parallel, a race
condition exists where the xattr_root will end up being cached twice, which
results in the leaking of a reference and a BUG() on umount.
This patch refactors get_xa_root(), __get_xa_root(), and create_xa_root(),
into one get_xa_root() function that takes the appropriate locking around
the entire critical section.
Reported, diagnosed and tested by Andrea Righi <a.righi@cineca.it>
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Andrea Righi <a.righi@cineca.it> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Edward Shishkin <edward@namesys.com> Cc: Alex Zarochentsev <zam@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Taskstats fix the structure members alignment issue
We broke the alignment of members of taskstats to the 8 byte boundary
with the CSA patches. In the current kernel, the taskstats structure is
not suitable for use by 32 bit applications in a 64 bit kernel.
On x86_64
Offsets of taskstats' members (64 bit kernel, 64 bit application)
One way to solve the problem without re-arranging structure members
is to pack the structure. The patch adds an __attribute__((aligned(8))) to
the taskstats structure members so that 32 bit applications using taskstats
can work with a 64 bit kernel.
Using __attribute__((packed)) would break the 64 bit alignment of members.
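The difference between the two attributes can be checked with a small sketch. The structures below are hypothetical two-member stand-ins, not the real taskstats layout, and the attribute syntax assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the offset of 'b' depends on the ABI
 * (typically 8 on x86_64 and 4 on i386). */
struct stats_natural {
    uint8_t a;
    uint64_t b;
};

/* Forcing 8-byte alignment on the member gives the same offset
 * for 32 bit and 64 bit builds. */
struct stats_aligned {
    uint8_t a;
    uint64_t b __attribute__((aligned(8)));
};

/* Packing removes all padding, so 'b' is no longer 64 bit aligned. */
struct stats_packed {
    uint8_t a;
    uint64_t b;
} __attribute__((packed));

static size_t natural_offset_of_b(void)
{
    return offsetof(struct stats_natural, b);
}

static size_t aligned_offset_of_b(void)
{
    return offsetof(struct stats_aligned, b);
}

static size_t packed_offset_of_b(void)
{
    return offsetof(struct stats_packed, b);
}
```

With the aligned variant the member offset no longer depends on whether the application is built for 32 bit or 64 bit, which is the property the fix relies on.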
The fix was tested on x86_64. After the fix, we got
Offsets of taskstats' members (64 bit kernel, 64 bit application)
NR_FILE_PAGES must be accounted for depending on the zone that the page
belongs to. If we replace the page in the radix tree then we may have to
shift the count to another zone.
Suggested-by: Ethan Solomita <solo@google.com> Cc: Martin Bligh <mbligh@mbligh.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix possible NULL pointer access in 8250 serial driver
I encountered the following kernel panic. The cause of this problem was
NULL pointer access in check_modem_status() in 8250.c. I confirmed this
problem is fixed by the attached patch, but I don't know whether this is the
correct fix.
Fix the possible NULL pointer access in check_modem_status() in 8250.c. The
check_modem_status() would access 'info' member of uart_port structure, but it
is not initialized before uart_open() is called. The check_modem_status() can
be called through /proc/tty/driver/serial before uart_open() is called.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Taku Izumi <izumi2005@soft.fujitsu.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
fix OOM killing processes wrongly thought MPOL_BIND
I only have CONFIG_NUMA=y for build testing: surprised when trying a memhog
to see lots of other processes killed with "No available memory
(MPOL_BIND)". memhog is killed correctly once we initialize nodemask in
constrained_alloc().
Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: William Irwin <bill.irwin@oracle.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While digging through my MAP_FIXED changes, I found a rather obvious
bug in the /dev/mem mmap implementation for nommu archs. get_unmapped_area()
is expected to return an address, not a pfn.
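As a reminder of the difference, a page frame number and an address differ by the page shift. The sketch below assumes 4 KiB pages; SKETCH_PAGE_SHIFT and sketch_pfn_to_addr are illustrative names, not kernel symbols.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* A pfn counts pages; an address counts bytes. */
static uint64_t sketch_pfn_to_addr(uint64_t pfn)
{
    return pfn << SKETCH_PAGE_SHIFT;
}
```

Returning the pfn where an address is expected would hand back a value that is too small by a factor of the page size.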
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
James Bottomley [Fri, 6 Apr 2007 16:14:56 +0000 (11:14 -0500)]
3w-xxxx: fix oops caused by incorrect REQUEST_SENSE handling
3w-xxxx emulates a REQUEST_SENSE response by simply returning nothing.
Unfortunately, it's assuming that the REQUEST_SENSE command is
implemented with use_sg == 0, which is no longer the case. The oops
occurs because it's clearing the scatterlist in request_buffer instead
of the memory region.
This is fixed by using tw_transfer_internal() to transfer correctly to
the scatterlist.
Acked-by: adam radford <aradford@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PATCH] vt: fix potential race in VT_WAITACTIVE handler
On a multiprocessor machine the VT_WAITACTIVE ioctl call may return 0 if
fg_console has already been updated in redraw_screen() but the console
switch itself hasn't been completed. Fix this by checking fg_console in
vt_waitactive() with the console sem held.
Zwane Mwaikambo [Thu, 19 Apr 2007 20:33:13 +0000 (16:33 -0400)]
x86: Don't probe for DDC on VBE1.2
VBE1.2 doesn't support function 15h (DDC) resulting in a 'hang' whilst
uncompressing kernel with some video cards. Make sure we check VBE version
before fiddling around with DDC.
http://bugzilla.kernel.org/show_bug.cgi?id=1458
Opened: 2003-10-30 09:12 Last update: 2007-02-13 22:03
Much thanks to Tobias Hain for help in testing and investigating the bug.
Tested on:
i386, Chips & Technologies 65548 VESA VBE 1.2
CONFIG_VIDEO_SELECT=Y
CONFIG_FIRMWARE_EDID=Y
Alan Cox [Tue, 17 Apr 2007 23:59:01 +0000 (23:59 +0000)]
exec.c: fix coredump to pipe problem and obscure "security hole"
The patch checks for "|" in the pattern not the output and doesn't nail a
pid on to a piped name (as it is a program name not a file)
Also fixes a very very obscure security corner case. If you happen to have
decided on a core pattern that starts with the program name then the user
can run a program called "|myevilhack" as it stands. I doubt anyone does
this.
Signed-off-by: Alan Cox <alan@redhat.com> Confirmed-by: Christopher S. Aker <caker@theshore.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
cache_k8_northbridges() is storing config values to incorrect locations
(in flush_words) and it is also overflowing beyond the allocation, causing
slab verification failures.
Olaf Kirch [Wed, 18 Apr 2007 22:14:14 +0000 (15:14 -0700)]
Fix IRDA oops'er
This fixes an OOPS due to incorrect socket orphaning in the
IRDA stack.
[IrDA]: Correctly handling socket error
This patch fixes an oops first reported in mid 2006 - see
http://lkml.org/lkml/2006/8/29/358 The cause of this bug report is that
when an error is signalled on the socket, irda_recvmsg_stream returns
without removing a local wait_queue variable from the socket's sk_sleep
queue. This causes havoc further down the road.
In response to this problem, a patch was made that invoked sock_orphan on
the socket when receiving a disconnect indication. This is not a good fix,
as this sets sk_sleep to NULL, causing applications sleeping in recvmsg
(and other places) to oops.
This is against the latest net-2.6 and should be considered for -stable
inclusion.
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The netpoll UDP input handler needs to pull up the UDP headers
and handle receive checksum offloading properly, just like
the normal UDP input path does, else we get corrupted
checksums.
[NET]: Fix UDP checksum issue in net poll mode.
In net poll mode, the current checksum function doesn't consider the
kind of packet which is padded to reach a specific minimum length. I
believe that's the problem causing my test case to fail. The following
patch fixes this issue.
Signed-off-by: Aubrey.Li <aubreylee@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While building a test kernel for the new esp driver (against
git-current), I hit this bug. Trivial fix, put the inline declaration
in the right place. :)
Signed-off-by: Tom "spot" Callaway <tcallawa@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Miller [Tue, 17 Apr 2007 21:40:46 +0000 (14:40 -0700)]
Fix compat sys_ipc() on sparc64
The 32-bit syscall trampoline for sys_ipc() on sparc64
was sign extending various arguments, which is bogus when
using compat_sys_ipc() since that function expects zero
extended copies of all the arguments.
This bug breaks the sparc64 kernel when built with gcc-4.2.x
among other things.
[SPARC64]: Fix arg passing to compat_sys_ipc().
Do not sign extend args using the sys32_ipc stub, that is
buggy and unnecessary.
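The effect of the two widenings on a 32-bit argument can be sketched in plain C. The helper names are hypothetical, and the cast to int32_t assumes the usual two's complement behaviour.

```c
#include <stdint.h>

/* What a sign-extending stub produces for a 32-bit argument.
 * Assumes two's complement conversion to int32_t. */
static uint64_t widen_sign_extended(uint32_t arg)
{
    return (uint64_t)(int64_t)(int32_t)arg;
}

/* What a function expecting zero-extended arguments wants to see. */
static uint64_t widen_zero_extended(uint32_t arg)
{
    return (uint64_t)arg;
}
```

The two agree for values below 0x80000000, so the bug only shows up for arguments with the top bit set, such as large sizes or user addresses in the upper half of the 32-bit space.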
Based upon an excellent report by Mikael Pettersson.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Miller [Tue, 17 Apr 2007 21:37:25 +0000 (14:37 -0700)]
Fix sparc64 SBUS IOMMU allocator
[SPARC64]: Fix SBUS IOMMU allocation code.
There are several IOMMU allocator bugs. Instead of trying to fix this
overly complicated code, just mirror the PCI IOMMU arena allocator
which is very stable and well stress tested.
I tried to make the code as identical as possible so we can switch
sun4u PCI and SBUS over to a common piece of IOMMU code. All that
will be needed are two callbacks, one to do a full IOMMU flush and one
to do a streaming buffer flush.
This patch gets rid of a lot of hangs and mysterious crashes on SBUS
sparc64 systems, at least for me.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sys_madvise has down_write of mmap_sem, then madvise_remove calls
vmtruncate_range which takes i_mutex and i_alloc_sem: no, we can
easily devise deadlocks from that ordering.
Have madvise_remove drop mmap_sem while calling vmtruncate_range: luckily,
since madvise_remove doesn't split or merge vmas, it's easy to handle
this case with a NULL prev, without restructuring sys_madvise. (Though
sad to retake mmap_sem when it's unlikely to be needed, and certainly
down_read is sufficient for MADV_REMOVE, unlike the other madvices.)
holepunch: fix disconnected pages after second truncate
shmem_truncate_range has its own truncate_inode_pages_range, to free any
pages racily instantiated while it was in progress: a SHMEM_PAGEIN flag
is set when this might have happened. But holepunching gets no chance
to clear that flag at the start of vmtruncate_range, so it's always set
(unless a truncate came just before), so holepunch almost always does
this second truncate_inode_pages_range.
shmem holepunch has unlikely swap<->file races hereabouts whatever we do
(without a fuller rework than is fit for this release): I was going to
skip the second truncate in the punch_hole case, but Miklos points out
that would make holepunch correctness more vulnerable to swapoff. So
keep the second truncate, but follow it by an unmap_mapping_range to
eliminate the disconnected pages (freed from pagecache while still
mapped in userspace) that it might have left behind.
Miklos Szeredi observes that during truncation of shmem page directories,
info->lock is released to improve latency (after lowering i_size and
next_index to exclude races); but this is quite wrong for holepunching,
which receives no such protection from i_size or next_index, and is left
vulnerable to races with shmem_unuse, shmem_getpage and shmem_writepage.
Hold info->lock throughout when holepunching? No, any user could prevent
rescheduling for far too long. Instead take info->lock just when needed:
in shmem_free_swp when removing the swap entries, and whenever removing
a directory page from the level above. But so long as we remove before
scanning, we can safely skip taking the lock at the lower levels, except
at misaligned start and end of the hole.
holepunch: fix shmem_truncate_range punching too far
Miklos Szeredi observes BUG_ON(!entry) in shmem_writepage() triggered
in rare circumstances, because shmem_truncate_range() erroneously
removes partially truncated directory pages at the end of the range:
later reclaim on pages pointing to these removed directories triggers
the BUG. Indeed, and it can also cause data loss beyond the hole.
Fix this as in the patch proposed by Miklos, but distinguish between
"limit" (how far we need to search: ignore truncation's next_index
optimization in the holepunch case - if there are races it's more
consistent to act on the whole range specified) and "upper_limit"
(how far we can free directory pages: generally we must be careful
to keep partially punched pages, but can relax at end of file -
i_size being held stable by i_mutex).
Avi Kivity [Sun, 22 Apr 2007 09:28:49 +0000 (12:28 +0300)]
KVM: MMU: Fix host memory corruption on i386 with >= 4GB ram
PAGE_MASK is an unsigned long, so using it to mask physical addresses on
i386 (which are 64-bit wide) leads to truncation. This can result in
page->private of unrelated memory pages being modified, with disastrous
results.
Fix by not using PAGE_MASK for physical addresses; instead calculate
the correct value directly from PAGE_SIZE. Also fix a similar BUG_ON().
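The truncation can be reproduced in user space by modelling the 32-bit unsigned long with uint32_t. This is a sketch assuming 4 KiB pages; the names are illustrative and not the KVM code.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Models PAGE_MASK on i386, where unsigned long is 32 bits wide.
 * When combined with a 64-bit address the mask is zero-extended,
 * so the upper 32 bits of the address are cleared. */
static uint64_t mask_with_32bit_page_mask(uint64_t phys)
{
    uint32_t page_mask = ~(uint32_t)(SKETCH_PAGE_SIZE - 1);

    return phys & page_mask;
}

/* Computes the mask at the width of the physical address instead. */
static uint64_t mask_with_64bit_page_mask(uint64_t phys)
{
    return phys & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
}
```

For a physical address just above 4GB, such as 0x100001234, the first helper yields 0x1000 (a different page entirely) while the second yields 0x100001000.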
Avi Kivity [Sun, 22 Apr 2007 09:28:05 +0000 (12:28 +0300)]
KVM: MMU: Fix guest writes to nonpae pde
KVM shadow page tables are always in pae mode, regardless of the guest
setting. This means that a guest pde (mapping 4MB of memory) is mapped
to two shadow pdes (mapping 2MB each).
When the guest writes to a pte or pde, we intercept the write and emulate it.
We also remove any shadowed mappings corresponding to the write. Since the
mmu did not account for the doubling in the number of pdes, it removed the
wrong entry, resulting in a mismatch between shadow page tables and guest
page tables, followed shortly by guest memory corruption.
This patch fixes the problem by detecting the special case of writing to
a non-pae pde and adjusting the address and number of shadow pdes zapped
accordingly.
This patch removes bogus zeroing of unused bits in output reports,
introduced in Simon's patch in commit d4ae650a.
According to the specification, any sane device should not care
about values of unused bits.
What is worse, the zeroing is done in a way which is broken and
might clear certain bits in output reports which are actually
_used_ - a device that has multiple fields with one value of
the size 1 bit each might serve as an example of why this is
bogus - the second call of hid_output_report() would clear the
first bit of report, which has already been set up previously.
This patch will break LEDs on SpaceNavigator, because this device
is broken and takes into account the bits which it shouldn't touch.
The quirk for this particular device will be provided in a separate
patch.
IB/mthca: Fix data corruption after FMR unmap on Sinai
In mthca_arbel_fmr_unmap(), the high bits of the key are masked off.
This gets rid of the effect of adjust_key(), which makes sure that
bits 3 and 23 of the key are equal when the Sinai throughput
optimization is enabled, and so it may happen that an FMR will end up
with bits 3 and 23 in the key being different. This causes data
corruption, because when enabling the throughput optimization, the
driver promises the HCA firmware that bits 3 and 23 of all memory keys
will always be equal.
Fix by re-applying adjust_key() after masking the key.
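A rough sketch of the invariant follows. The helper below is an assumption about how such an adjustment could work (copy bit 3 into bit 23 and drop anything above), and the 17-bit index mask is a made-up value; neither is taken from the driver source.

```c
#include <stdint.h>

/* Hypothetical adjustment: make bit 23 a copy of bit 3. */
static uint32_t adjust_key_sketch(uint32_t key)
{
    return ((key << 20) & 0x800000u) | (key & 0x7fffffu);
}

/* Checks the invariant promised to the firmware. */
static int bits_3_and_23_equal(uint32_t key)
{
    return ((key >> 3) & 1u) == ((key >> 23) & 1u);
}

/* Masking off the high bits (here with a made-up 17-bit index mask)
 * clears bit 23 but leaves bit 3 alone. */
static uint32_t mask_high_bits_sketch(uint32_t key)
{
    return key & 0x1ffffu;
}
```

A key of 0x800008 satisfies the invariant; after masking it becomes 0x8, which does not, and re-applying the adjustment restores 0x800008.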
Thanks to Or Gerlitz for reproducing the problem, and Ariel Shahar for
help in debugging.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[IPV4] nl_fib_lookup: Initialise res.r before fib_res_put(&res)
When CONFIG_IP_MULTIPLE_TABLES is enabled, the code in nl_fib_lookup()
needs to initialize the res.r field before fib_res_put(&res) - unlike
fib_lookup(), a direct call to ->tb_lookup does not set this field.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A security issue is emerging. Disallow Routing Header Type 0 by default
as we have been doing for IPv4.
Note: We allow RH2 by default because it is harmless.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Replies to NETLINK_FIB_LOOKUP messages were misrouted back to kernel,
which resulted in infinite recursion and stack overflow.
The bug is present in all kernel versions since the feature appeared.
The patch also makes some minimal cleanup:
1. Return something consistent (-ENOENT) when fib table is missing
2. Do not crash when queue is empty (does not happen, but still)
3. Put result of lookup
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Brian Pomerantz [Mon, 2 Apr 2007 06:49:41 +0000 (23:49 -0700)]
fix page leak during core dump
When the dump cannot occur, most likely because of a full file system, and
the page to be written is the zero page, the call to page_cache_release()
is missed.
Signed-off-by: Brian Pomerantz <bapper@mvista.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: David Howells <dhowells@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
"When we block_prepare_write() failed while ext3_prepare_write() we jump to
"failure" label and call ext3_prepare_failure() witch search last mapped bh
and invoke commit_write until it. This is wrong, because some bh from the
beginning to the last mapped bh may not be uptodate. As a result we commit to
disk not-uptodate page content which contains garbage from previous usage."
and
"Unexpected file size increasing."
The call trace is the same as in the first issue, but the result is different.
For example, we have a file whose i_size is zero. We want to write two blocks,
but the fs has only one free block.
->ext3_prepare_write(...from == 0, to == 2048)
retry:
->block_prepare_write() == -ENOSPC# we failed but allocated one block here.
->ext3_prepare_failure()
->commit_write( from == 0, to == 1024) # after this i_size becomes 1024 :)
if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
goto retry;
Finally, when all retries have been spent, ext3_prepare_failure returns
-ENOSPC, but i_size was increased and later block trimming procedures can't
help here.
We don't appear to have the horsepower to fix these issues, so let's put
things back the way they were for now.
Based on this, those last two statements of fill_result_tf()
appear to me to be in the wrong order, in that the tf->flags
are uninitialized at the point where tf_read() is invoked.
So for lba48 commands, tf_read() won't be reading back the
full lba48 register contents.
Correct?
This patch corrects fill_result_tf() so that the flags
get copied to result_tf before they are used by tf_read().
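The dependency between the two statements can be sketched as follows. The structures, flag value and function names are simplified, hypothetical stand-ins for the libata ones; only the ordering is the point.

```c
#include <stdint.h>

#define SKETCH_TFLAG_LBA48 0x1u

/* Simplified stand-ins for a taskfile and the device registers. */
struct sketch_tf {
    unsigned int flags;
    uint8_t lbal;
    uint8_t hob_lbal;
};

struct sketch_regs {
    uint8_t lbal;
    uint8_t hob_lbal;
};

/* Reads the high-order byte back only when the LBA48 flag is set. */
static void sketch_tf_read(const struct sketch_regs *regs,
                           struct sketch_tf *tf)
{
    tf->lbal = regs->lbal;
    if (tf->flags & SKETCH_TFLAG_LBA48)
        tf->hob_lbal = regs->hob_lbal;
}

/* Corrected order: copy the flags first, then read the registers. */
static void sketch_fill_result(const struct sketch_regs *regs,
                               const struct sketch_tf *cmd_tf,
                               struct sketch_tf *result)
{
    result->flags = cmd_tf->flags;
    sketch_tf_read(regs, result);
}

/* Exercises the sketch; returns the high-order byte read back. */
static uint8_t sketch_demo_hob(void)
{
    struct sketch_regs regs = { 0x11, 0x22 };
    struct sketch_tf cmd = { SKETCH_TFLAG_LBA48, 0, 0 };
    struct sketch_tf res = { 0, 0, 0 };

    sketch_fill_result(&regs, &cmd, &res);
    return res.hob_lbal;
}
```

With the two statements swapped, the read would run with flags still unset and the high-order byte would never be filled in.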
Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jeff@garzik.org> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Conke Hu [Tue, 10 Apr 2007 17:06:56 +0000 (13:06 -0400)]
ahci.c: walkaround for SB600 SATA internal error issue
There is a HW issue in the ATI SB600 SATA controller: PxSERR.E is set under
some conditions where it should not be. For example, when there is no media
in a SATA CD/DVD drive or the media is not ready, the AHCI controller fails
to execute ATAPI commands and reports PORT_IRQ_TF_ERR, but the ATI SB600 SATA
controller sets PxSERR.E at the same time, which is not necessary.
This patch just ignores the INTERNAL ERROR in such cases.
Without this patch, the ahci error handler will report many errors as
below:
----------- cut from dmesg -----------
ata9: soft resetting port
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x2
ata9.00: (irq_stat 0x40000001)
ata9.00: cmd a0/00:00:00:00:20/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
res 51/24:03:00:00:20/00:00:00:00:00/a0 Emask 0x40 (internal error)
ata9: soft resetting port
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x2
ata9.00: (irq_stat 0x40000001)
ata9.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x43 data 12 in
res 51/24:03:00:00:00/00:00:00:00:00/a0 Emask 0x40 (internal error)
-------- end cut ---------
Signed-off-by: Conke Hu <conke.hu@amd.com> Signed-off-by: Jeff Garzik <jeff@garzik.org> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
softmac: avoid assert in ieee80211softmac_wx_get_rate
An unconfigured bcm43xx device can hit an assert() during wx_get_rate
queries. This is because bcm43xx calls ieee80211softmac_start late
(i.e. during open instead of probe).
Neil Brown [Wed, 4 Apr 2007 19:29:43 +0000 (15:29 -0400)]
knfsd: allow nfsd READDIR to return 64bit cookies
From Neil Brown <neilb@suse.de>
->readdir passes loff_t offsets (used as nfs cookies) to
nfs3svc_encode_entry{,_plus}, but when they pass them on to encode_entry they
become an 'off_t', which isn't good.
So filesystems that returned 64bit offsets would lose.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
IDE error recovery is using IDLE IMMEDIATE if the drive is busy or has DRQ set.
This violates the ATA spec (can only send IDLE IMMEDIATE when drive is not
busy) and really hoses up some drives (modern drives will not be able to
recover using this error handling). The correct thing to do is issue a SRST
followed by a SET FEATURES command. This is what Western Digital recommends
for error recovery and what Western Digital says Windows does. It also does
not violate the ATA spec as far as I can tell.
Bart:
* port the patch over the current tree
* undo the recalibration code removal
* send SET FEATURES command after checking for good drive status
* don't check whether the current request is of REQ_TYPE_ATA_{CMD,TASK}
type because we need to send SET FEATURES before handling any requests
* some pre-ATA4 drives require INITIALIZE DEVICE PARAMETERS command before
other commands (except IDENTIFY) so send SET FEATURES only if there are
no pending drive->special requests
* update comments and patch description
* any bugs introduced by this patch are mine and not Suleiman's :-)
David Miller [Tue, 10 Apr 2007 20:39:35 +0000 (13:39 -0700)]
Fix TCP slow_start_after_idle sysctl
[TCP]: slow_start_after_idle should influence cwnd validation too
For the cases that slow_start_after_idle is meant to deal
with, it is almost a certainty that the congestion window
tests will think the connection is application limited and
we'll thus decrease the cwnd there too. This defeats the
whole point of setting slow_start_after_idle to zero.
So test it there too.
We do not cancel out the entire tcp_cwnd_validate() function
so that if the sysctl is changed we still have the validation
state maintained.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Userspace uses an integer for TCA_TCINDEX_SHIFT, the kernel was changed
to expect and use a u16 value in 2.6.11, which broke compatibility on
big endian machines. Change back to use int.
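Why this only breaks on big endian machines can be shown by laying out the payload bytes explicitly. The helpers below are illustrative and do not use the netlink attribute API; the byte order is spelled out by hand so the result does not depend on the host.

```c
#include <stdint.h>

/* Lays out a 32-bit value the way a big endian machine stores an int. */
static void store_be32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Reads only the first two bytes, as a u16 reader would. */
static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Reads all four bytes, matching what userspace wrote. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint32_t shift_seen_as_u16(uint32_t shift)
{
    uint8_t payload[4];

    store_be32(payload, shift);
    return load_be16(payload);
}

static uint32_t shift_seen_as_int(uint32_t shift)
{
    uint8_t payload[4];

    store_be32(payload, shift);
    return load_be32(payload);
}
```

On a little endian machine the low-order bytes come first, so a u16 read of a small int happens to give the right answer, which is presumably why the mismatch went unnoticed there.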
Reported by Ole Reinartz <ole.reinartz@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Herbert Xu [Tue, 10 Apr 2007 20:37:24 +0000 (13:37 -0700)]
Fix IPSEC replay window handling
[IPSEC]: Reject packets within replay window but outside the bit mask
Up until this point we've accepted replay window settings greater than
32 but our bit mask can only accommodate 32 packets. Thus any packet
with a sequence number within the window but outside the bit mask would
be accepted.
This patch causes those packets to be rejected instead.
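A simplified sketch of such a check is given below. The function name, arguments and return convention are hypothetical and the logic is reduced to the essentials; it is not the xfrm code.

```c
#include <stdint.h>

/* Returns 0 if the packet is accepted, -1 if it is rejected.
 * last_seq is the highest sequence number seen so far, bitmap has
 * bit N set if (last_seq - N) was already received. */
static int sketch_replay_check(uint32_t last_seq, uint32_t bitmap,
                               uint32_t window, uint32_t seq)
{
    uint32_t diff;

    if (seq == 0)
        return -1;
    if (seq > last_seq)
        return 0;              /* newer than anything seen so far */

    diff = last_seq - seq;
    if (diff >= window)
        return -1;             /* older than the replay window */
    if (diff >= 32)
        return -1;             /* inside the window, outside the bit mask */
    if (bitmap & (1u << diff))
        return -1;             /* already seen: replay */
    return 0;
}
```

Without the diff >= 32 test, a window setting of 64 would let a packet 40 behind the newest one through even though the 32-bit mask has no way to record whether it was seen before.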
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The sense buffer code in scsi_send_eh_cmnd was changed to use
alloc_page() and a scatter list, but the sense data copy was not
updated to match so what we actually get in the sense buffer is total
garbage starting with the kernel address of the struct page we got.
Basically the stack frame of scsi_send_eh_cmnd() is what ends up
in the sense buffer.
Depending upon how pointers look on a given platform, you can
end up getting sr_ioctl.c errors when you mount a cdrom. If
the CDROM gives a check condition for GPCMD_GET_CONFIGURATION issued
by drivers/cdrom/cdrom.c:cdrom_mmc_profile(), sr_ioctl will
spit out this error message in sr_do_ioctl() with the way pointers
are on sparc64:
[IPv6]: Fix incorrect length check in rawv6_sendmsg()
In article <20070329.142644.70222545.davem@davemloft.net> (at Thu, 29 Mar 2007 14:26:44 -0700 (PDT)), David Miller <davem@davemloft.net> says:
> From: Sridhar Samudrala <sri@us.ibm.com>
> Date: Thu, 29 Mar 2007 14:17:28 -0700
>
> > The check for length in rawv6_sendmsg() is incorrect.
> > As len is an unsigned int, (len < 0) will never be TRUE.
> > I think checking for IPV6_MAXPLEN(65535) is better.
> >
> > Is it possible to send ipv6 jumbo packets using raw
> > sockets? If so, we can remove this check.
>
> I don't see why such a limitation against jumbo would exist,
> does anyone else?
>
> Thanks for catching this Sridhar. A good compiler should simply
> fail to compile "if (x < 0)" when 'x' is an unsigned type, don't
> you think :-)
Dave, we use "int" for returning value,
so we should fix this anyway, IMHO;
we should not allow len > INT_MAX.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
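A minimal sketch of the kind of check discussed above, assuming a size_t length and an int return value. The function name check_send_len is hypothetical, and the choice of -EMSGSIZE as the error code is an assumption of this sketch.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the length check at the top of a sendmsg
 * handler that takes an unsigned length but returns an int byte count. */
static int check_send_len(size_t len)
{
	/* "len < 0" can never be true for an unsigned type, so compare
	 * against the largest value the int return type can carry. */
	if (len > (size_t)INT_MAX)
		return -EMSGSIZE;
	return 0;
}
```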
Patrick McHardy [Tue, 10 Apr 2007 20:29:44 +0000 (13:29 -0700)]
Fix IFB net driver input device crashes
[IFB]: Fix crash on input device removal
The input_device pointer is not refcounted, which means the device may
disappear while packets are queued, causing a crash when ifb passes packets
with a stale skb->dev pointer to netif_rx().
Fix by storing the interface index instead and doing a lookup where necessary.
Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Patrick McHardy [Tue, 10 Apr 2007 12:47:21 +0000 (14:47 +0200)]
NETFILTER: ipt_CLUSTERIP: fix oops in checkentry function
[NETFILTER]: ipt_CLUSTERIP: fix oops in checkentry function
clusterip_config_find_get() already increments the entry's reference
counter, so there is no reason to do it again in the checkentry() callback.
This causes the config to be freed before it is removed from the list,
resulting in a crash when adding the next rule.
Signed-off-by: Jaroslav Kysela <perex@suse.cz> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Neil Brown [Wed, 11 Apr 2007 03:31:07 +0000 (13:31 +1000)]
Fix calculation for size of filemap_attr array in md/bitmap.
If 'num_pages' were ever 1 more than a multiple of 8 (32-bit platforms)
or of 16 (64-bit platforms), filemap_attr would be allocated one
'unsigned long' shorter than required. We need a round-up in there.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
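The arithmetic can be sketched as follows. This is not the md/bitmap code itself: the function names are hypothetical, and the figure of 4 attribute bits per page is an assumption of the sketch. The word size is a parameter so both the 32-bit and 64-bit cases can be checked on any host.

```c
#include <stddef.h>

#define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))
#define ROUND_UP(n, d)      (DIV_ROUND_UP(n, d) * (d))

/* Assumption for this sketch: 4 attribute bits are kept per page. */
#define ATTR_BITS_PER_PAGE 4

/* Truncating version: the bits-to-bytes division rounds down before
 * the result is rounded up to a whole number of words. */
static size_t attr_bytes_truncating(size_t num_pages, size_t word_size)
{
	size_t bytes = num_pages * ATTR_BITS_PER_PAGE / 8;

	return ROUND_UP(bytes, word_size);
}

/* Rounded version: the bits-to-bytes division rounds up first. */
static size_t attr_bytes_rounded(size_t num_pages, size_t word_size)
{
	size_t bytes = DIV_ROUND_UP(num_pages * ATTR_BITS_PER_PAGE, 8);

	return ROUND_UP(bytes, word_size);
}
```

For 9 pages with 4-byte words, the truncating version yields 4 bytes although 36 bits are needed, while the rounded version yields 8. For 17 pages with 8-byte words the results are 8 and 16 respectively.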
Adam Kropelin [Wed, 11 Apr 2007 09:13:13 +0000 (11:13 +0200)]
HID: Do not discard truncated input reports
Truncated reports should not be discarded, since discarding them prevents
buggy devices from communicating with userspace.
Prior to the regression introduced in 2.6.20, a shorter-than-expected
report in hid_input_report() was passed through after having the missing
bytes cleared. This behavior was established over a few patches in the
2.6.early-teens days, including commit cd6104572bca9e4afe0dcdb8ecd65ef90b01297b.
This patch restores the previous behavior and fixes the regression.
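The restored behavior, clearing the missing bytes and then passing the report on, might be sketched like this. The helper pad_truncated_report is hypothetical and assumes the caller's buffer is at least as large as the expected report size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: buf holds `received` valid bytes in a buffer of
 * at least `expected` bytes. Zero the missing tail instead of dropping
 * the report, and return the length the caller should go on to parse. */
static size_t pad_truncated_report(unsigned char *buf, size_t received,
				   size_t expected)
{
	if (received < expected)
		memset(buf + received, 0, expected - received);
	return expected;
}
```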
The ADEF bits in the TSCR register have different meanings in read
and write mode. For this reason ADEF has to be reset on every
read-modify-write operation.
This patch introduces a special write function for this register, which
takes care of it.
Thanks to Holger Magnussen for pointing my nose at this problem.
The driver needs to turn off carrier when the interface is down; otherwise
it can confuse bonding and bridging, and carrier appears to be on
immediately when the interface is brought back up.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Gleixner [Mon, 2 Apr 2007 12:25:31 +0000 (14:25 +0200)]
i386: fix file_read_actor() and pipe_read() for original i386 systems
The __copy_to_user_inatomic() calls in file_read_actor() and pipe_read()
are broken on original i386 machines, where WP-works-ok == false, as
__copy_to_user_inatomic() on such systems calls functions which might
sleep and/or contain cond_resched() calls inside of a kmap_atomic()
region.
The original check for WP-works-ok was in access_ok(), but got moved
during the 2.5 series to fix a race vs. swap.
Return the number of bytes to copy in the case where we are in an atomic
region, so the non-atomic code paths in file_read_actor() and
pipe_read() are taken.
This could be optimized to avoid the kmap_atomic by moving the check for
WP-works-ok into fault_in_pages_writeable(), but this is more intrusive
and can be done later.
Jan Beulich [Sun, 1 Apr 2007 18:41:26 +0000 (20:41 +0200)]
kbuild: fix dependency generation
Commit 2e3646e51b2d6415549b310655df63e7e0d7a080 changed the way
the split config tree is built, but failed to also adjust fixdep
accordingly - if changing a config option from or to m, files
referencing the respective CONFIG_..._MODULE (but not the
corresponding CONFIG_...) didn't get rebuilt.
The problem is that tristate symbols are represented with three
different states:
SYMBOL=n => no symbol defined
SYMBOL=y => CONFIG_SYMBOL defined to '1'
SYMBOL=m => CONFIG_SYMBOL_MODULE defined to '1'
But conf_split_config does not distinguish between the =y and =m cases,
so only the =y case is honoured.
This is fixed in fixdep so when a CONFIG symbol with
_MODULE is found we skip that part and only look
for the CONFIG_SYMBOL version.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
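The suffix handling described above could look roughly like the sketch below. The function strip_module_suffix is a hypothetical illustration, not the fixdep code: given a symbol name and its length, it returns the length with a trailing "_MODULE" dropped, so that the dependency lookup falls back to the plain CONFIG_SYMBOL name.

```c
#include <stddef.h>
#include <string.h>

/* Return the length of the symbol name with a trailing "_MODULE"
 * removed; names without that suffix are returned unchanged. */
static size_t strip_module_suffix(const char *name, size_t len)
{
	static const char suffix[] = "_MODULE";
	const size_t slen = sizeof(suffix) - 1;

	if (len > slen && memcmp(name + len - slen, suffix, slen) == 0)
		return len - slen;
	return len;
}
```

For example, "CONFIG_FOO_MODULE" (17 characters) is reduced to the first 10 characters, "CONFIG_FOO", and "CONFIG_FOO" itself is left alone.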
There was a typo in commit b40b478e9972ec14cf144f1a03f88918789cbfe0,
preventing it from working - 32bit binaries crashed hopelessly before
the below fix and work perfectly now.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[chrisw: update changelog to reflect -stable commit id] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Jean Delvare [Thu, 5 Apr 2007 06:52:46 +0000 (23:52 -0700)]
APPLETALK: Fix a remotely triggerable crash
When we receive an AppleTalk frame shorter than what its header says,
we still attempt to verify its checksum, and trip on the BUG_ON() at
the end of function atalk_sum_skb() because of the length mismatch.
This has security implications because this can be triggered by simply
sending a specially crafted ethernet frame to a target victim,
effectively crashing that host. Thus this qualifies, I think, as a
remote DoS. Here is the frame I used to trigger the crash, in npg
format:
<Appletalk Killer>
{
# Ethernet header -----
XX XX XX XX XX XX # Destination MAC
00 00 00 00 00 00 # Source MAC
00 1D # Length
The destination MAC address must be set to that of the victim.
The severity is mitigated by two requirements:
* The target host must have the appletalk kernel module loaded. I
suspect this isn't so frequent.
* AppleTalk frames are non-IP, thus I guess they can only travel on
local networks. I am no network expert though, maybe it is possible
to somehow encapsulate AppleTalk packets over IP.
The bug was reported back in June 2004:
http://bugzilla.kernel.org/show_bug.cgi?id=2979
But it wasn't investigated, and was closed in July 2006 as both
reporters had vanished meanwhile.
This code was new in kernel 2.6.0-test5:
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=7ab442d7e0a76402c12553ee256f756097cae2d2
And not modified since then, so we can assume that vanilla kernels
2.6.0-test5 and later, and distribution kernels based thereon, are
affected.
Note that I still do not know for sure what triggered the bug in the
real-world cases. The frame could have been corrupted by the kernel if
we have a bug hiding somewhere. But more likely, we are receiving the
faulty frame from the network.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
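A length sanity check of the kind implied above can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: it assumes a 13-byte extended DDP header whose first big-endian 16-bit field carries the datagram length in its low 10 bits, and the name ddp_frame_len_ok is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions of this sketch: extended DDP header size and the mask
 * selecting the 10-bit datagram length from the first 16-bit field. */
#define DDP_HDR_LEN  13
#define DDP_LEN_MASK 0x03ff

/* Return true only if the frame is at least as long as its header
 * claims, so a checksum pass never walks past the received data. */
static bool ddp_frame_len_ok(const unsigned char *frame, size_t received)
{
	uint16_t len_hops;
	size_t claimed;

	if (received < DDP_HDR_LEN)
		return false;		/* not even a full header */

	len_hops = (uint16_t)((frame[0] << 8) | frame[1]);
	claimed = len_hops & DDP_LEN_MASK;

	if (claimed < DDP_HDR_LEN)
		return false;		/* header claims less than itself */
	return claimed <= received;	/* reject frames shorter than claimed */
}
```

A frame that claims 64 bytes but arrives with only 20 would be dropped before any checksum is computed.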
Daniel Drake [Tue, 27 Mar 2007 05:32:15 +0000 (21:32 -0800)]
generic_serial: fix decoding of baud rate
Commit d720bc4b8fc5d6d179ef094908d4fbb5e436ffad partially removed a private
implementation of baud speed decoding. However it doesn't seem to be
complete: after the speed is decoded, it is still being used as an index to
a local speed table (array overrun, no doubt).
This was found by Graham Murray who noticed it caused a 2.6.19 regression
with the SX driver: https://bugs.gentoo.org/170554
Signed-off-by: Daniel Drake <dsd@gentoo.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jeff Garzik [Wed, 28 Mar 2007 22:39:22 +0000 (18:39 -0400)]
libata: sata_mv: Fix 50xx irq mask
[libata] sata_mv: Fix 50xx irq mask
IRQ mask bits assumed a 60xx or newer generation chip, which is very
wrong for the 50xx series. Luckily both generations shared the per-port
interrupt mask bits, leaving only the "misc chip features" bits to be
completely mismatched.
Fix 50xx by ensuring we only program bits that exist.
Mark Lord [Wed, 28 Mar 2007 22:35:21 +0000 (18:35 -0400)]
libata bugfix: HDIO_DRIVE_TASK
I was trying to use HDIO_DRIVE_TASK for something today,
and discovered that the libata implementation does not copy
over the upper four LBA bits from args[6].
This is serious, as any tools using this ioctl would have their
commands applied to the wrong sectors on the drive, possibly resulting
in disk corruption.
Ideally, newer apps should use SG_IO/ATA_16 directly,
avoiding this bug. But with libata poised to displace drivers/ide,
better compatibility here is a must.
This patch fixes libata to use the upper four LBA bits passed
in from the ioctl.
The original drivers/ide implementation copies over all bits
except for the master/slave select bit. With this patch,
libata will copy only the four high-order LBA bits,
just in case there are assumptions elsewhere in libata (?).
Signed-off-by: Mark Lord <mlord@pobox.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
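To illustrate the masking described above, here is a small sketch with a hypothetical helper, merge_lba_high_nibble. It assumes the four high-order LBA bits occupy the low nibble of args[6] and that the upper nibble of the device register, including the device select bit, is left as the driver set it up.

```c
#include <stdint.h>

/* Hypothetical helper: take only the four high-order LBA bits from the
 * ioctl's args[6] and merge them into the device register value,
 * keeping the driver-chosen upper nibble untouched. */
static uint8_t merge_lba_high_nibble(uint8_t device, uint8_t arg6)
{
	return (uint8_t)((device & 0xf0) | (arg6 & 0x0f));
}
```

Any bits the caller sets in the upper nibble of args[6] are ignored in this sketch, which mirrors the cautious "copy only the four high-order LBA bits" approach of the patch description.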