When we want to tear down an inode that lost the add to the cache race
in XFS we must not call into ->destroy_inode because that would delete
the inode that won the race from the inode cache radix tree.
This patch provides the __destroy_inode helper needed to fix this,
the actual fix will be in th next patch. As XFS was the only reason
destroy_inode was exported we shift the export to the new __destroy_inode.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently inode_init_always calls into ->destroy_inode if the additional
initialization fails. That's not only counter-intuitive because
inode_init_always did not allocate the inode structure, but in case of
XFS it's actively harmful as ->destroy_inode might delete the inode from
a radix-tree that has never been added. This in turn might end up
deleting the inode for the same inum that has been instanciated by
another process and cause lots of cause subtile problems.
Also in the case of re-initializing a reclaimable inode in XFS it would
free an inode we still want to keep alive.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If the BIOS reports an invalid throttling state (which seems to be
fairly common after system boot), a reset is done to state T0.
Because of a check in acpi_processor_get_throttling_ptc(), the reset
never actually gets executed, which results in the error reoccurring
on every access of for example /proc/acpi/processor/CPU0/throttling.
Add a 'force' option to acpi_processor_set_throttling() to ensure
the reset really takes effect.
This patch, together with the next one, fixes a regression introduced in
2.6.30, listed on the regression list. They have been available for 2.5
months now in bugzilla, but have not been picked up, despite various
reminders and without any reason given.
Google shows that numerous people are hitting this issue. The issue is in
itself relatively minor, but the bug in the code is clear.
The patches have been in all my kernels and today testing has shown that
throttling works correctly with the patches applied when the system
overheats (http://bugzilla.kernel.org/show_bug.cgi?id=13918#c14).
Signed-off-by: Frans Pop <elendil@planet.nl> Acked-by: Zhang Rui <rui.zhang@intel.com> Cc: Len Brown <lenb@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In a non-sparse extend, we correctly allocate (and zero) the clusters between
the old_i_size and pos, but we don't zero the portions of the cluster we're
writing to outside of pos<->len.
It handles clustersize > pagesize and blocksize < pagesize.
This dma_ops->dma_supported is first called in platform_dma_init() during kernel
boot. Then dma_ops->dma_supported will be called recursively in
iommu_dma_supported.
Kernel can not boot because kernel can not get out of iommu_dma_supported until
it runs out of stack memory.
Ulrich Drepper correctly points out that there is generally padding in
the structure on 64-bit hosts, and that copying the structure from
kernel to user space can leak information from the kernel stack in those
padding bytes.
Avoid the whole issue by just copying the three members one by one
instead, which also means that the function also can avoid the need for
a stack frame. This also happens to match how we copy the new structure
from user space, so it all even makes sense.
[ The obvious solution of adding a memset() generates horrid code, gcc
does really stupid things. ]
Moving the setting and clearing of the mutex's to
_config_request. There was a mutex deadlock when diag reset is called from
inside _config_request, so diag reset was moved to outside the mutexs.
Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Signed-off-by: Eric Moore <Eric.moore@lsi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix another ocurring when the system resumes. This oops was due to driver
setting the pci drvdata to NULL on the prior hibernation. Becuase it was
set to NULL, upon resmume we assume the pci drvdata is non-zero, and we oops.
To fix the ooops, we don't set pci drvdata to NULL at hibernation time.
Fix oops ocurring at hibernation time. This oops was due to the firmware fault
watchdog timer still running after we freed resources. To fix the issue we need
to terminate the watchdog timer at hibernation time.
Inhibit 0x3117 loginfos - during cable pull, there are too many printks going
to the syslog, this is have impact on how fast the interrupt routine can handle
keeping up with command completions; this was the root cause to the config
pages timeouts.
When a volume is activated, the driver will recieve a pair of ir config change
events to remove the foreign volume, then add the native.
In the process of the removal event, the hidden raid componet is removed from
the parent.When the disks is added back, the adding of the port fails becuase
there is no instance of the device in its parent.
To fix this issue, the driver needs to call mpt2sas_transport_update_links()
prior to calling _scsih_add_device. In addition, we added sanity checks on
volume add and removal to ignore events for foreign volumes.
Kernel panic is seen because driver did not tear down the port which should
be dnoe using mpt2sas_transport_port_remove(). without this fix When expander
is added back we would oops inside sas_port_add_phy.
OCZ Vertex SSD can't do HPA and not in a usual way. It reports HPA,
allows unlocking but then fails all IOs which fall in the unlocked
area. Quirk it so that HPA unlocking is not used for the device.
Do all key clearing except sending sommands to device when rfkill
enabled. When rfkill enabled the interface is brought down and will
be brought back up correctly after rfkill is enabled again.
Same change is not needed for iwl3945 as it ignores return code when
sending key clearing command to device.
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=13742
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Frans Pop <elendil@planet.nl> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
(Not needed upstream, due to the major rewrite in 2.6.31)
Due to rfkill and iwlwifi mishmash of SW / HW killswitch representation,
we have race conditions which make unable turn wifi radio on, after enable
and disable again killswitch. I can observe this problem on my laptop
with iwl3945 device.
In rfkill core HW switch and SW switch are separate 'states'. Device can
be only in one of 3 states: RFKILL_STATE_SOFT_BLOCKED, RFKILL_STATE_UNBLOCKED,
RFKILL_STATE_HARD_BLOCKED. Whereas in iwlwifi driver we have separate bits
STATUS_RF_KILL_HW and STATUS_RF_KILL_SW for HW and SW switches - radio can be
turned on, only if both bits are cleared.
In this particular race conditions, radio can not be turned on if in driver
STATUS_RF_KILL_SW bit is set, and rfkill core is in state
RFKILL_STATE_HARD_BLOCKED, because rfkill core is unable to call
rfkill->toggle_radio(). This situation can be entered in case:
- killswitch is turned on
- rfkill core 'see' button change first and move to RFKILL_STATE_SOFT_BLOCKED
also call ->toggle_radio() and STATE_RF_KILL_SW in driver is set
- iwl3945 get info about button from hardware to set STATUS_RF_KILL_HW bit and
force rfkill to move to RFKILL_STATE_HARD_BLOCKED
- killsiwtch is turend off
- driver clear STATUS_RF_KILL_HW
- rfkill core is unable to clear STATUS_RF_KILL_SW in driver
Additionally call to rfkill_epo() when STATUS_RF_KILL_HW in driver is set
cause move to the same situation.
In 2.6.31 this problem is fixed due to _total_ rewrite of rfkill subsystem.
This is a quite small fix for 2.6.30.x in iwlwifi driver. We are changing
internal rfkill state to always have below relations true:
So far, KVM copied the emulated_msrs (only MSR_IA32_MISC_ENABLE) to a
wrong address in user space due to broken pointer arithmetic. This
caused subtle corruption up there (missing MSR_IA32_MISC_ENABLE had
probably no practical relevance). Moreover, the size check for the
user-provided kvm_msr_list forgot about emulated MSRs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
kvm_notify_acked_irq does not check irq type, so that it sometimes
interprets msi vector as irq. As a result, ack notifiers are not
called, which typially hangs the guest. The fix is to track and
check irq type.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
kvm_mmu_change_mmu_pages mishandles the case where n_alloc_mmu_pages is
smaller then n_free_mmu_pages, by not checking if the result of
the subtraction is negative.
Its a valid condition which can happen if a large number of pages has
been recently freed.
MTRR, PAT, MCE, and MCA are all supported (to some extent) but not reported.
Vista requires these features, so if userspace relies on kernel cpuid
reporting, it loses support for Vista.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We ignore writes to the performance counters and performance event
selector registers already. Kaspersky antivirus reads the eventsel
MSR causing it to crash with the current behaviour.
Return 0 as data when the eventsel registers are read to stop the
crash.
Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A pte that is shadowed when the guest EFER.NXE=1 is not valid when
EFER.NXE=0; if bit 63 is set, the pte should cause a fault, and since the
shadow EFER always has NX enabled, this won't happen.
Fix by using a different shadow page table for different EFER.NXE bits. This
allows vcpus to run correctly with different values of EFER.NXE, and for
transitions on this bit to be handled correctly without requiring a full
flush.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We currently unblock shadow interrupt state when we skip an instruction,
but failing to do so when we actually emulate one. This blocks interrupts
in key instruction blocks, in particular sti; hlt; sequences
If the instruction emulated is an sti, we have to block shadow interrupts.
The same goes for mov ss. pop ss also needs it, but we don't currently
emulate it.
Without this patch, I cannot boot gpxe option roms at vmx machines.
This is described at https://bugzilla.redhat.com/show_bug.cgi?id=494469
Signed-off-by: Glauber Costa <glommer@redhat.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Glauber Costa [Mon, 3 Aug 2009 17:57:52 +0000 (14:57 -0300)]
KVM: Introduce {set/get}_interrupt_shadow()
This patch introduces set/get_interrupt_shadow(), that does exactly
what the name suggests. It also replaces open code that explicitly does
it with the now existent functions. It differs slightly from upstream,
because upstream merged it after gleb's interrupt rework, that we don't
ship.
This patch replaces drop_interrupt_shadow with the more
general set_interrupt_shadow, that can either drop or raise
it, depending on its parameter. It also adds ->get_interrupt_shadow()
for future use.
Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes the wrong headphone output routing for MacBookPro 3,1/4,1
quirk with ALC889A codec, which caused the silent headphone output.
Also, this gives the individual Headphone and Speaker volume controls.
This patch fixes the bug that was reported in
http://bugzilla.kernel.org/show_bug.cgi?id=14053
If we're in the case where we need to force a reencode and then resend of
the RPC request, due to xprt_transmit failing with a networking error, then
we _must_ retransmit the entire request.
Summary:
Kernel panic arise when stack protection is enabled, since strncat will
add a null terminating byte '\0'; So in functions
like this one (wmi_query_block):
char wc[4]="WC";
....
strncat(method, block->object_id, 2);
...
the length of wc should be n+1 (wc[5]) or stack protection
fault will arise. This is not noticeable when stack protection is
disabled,but , isn't good either.
Config used: [CONFIG_CC_STACKPROTECTOR_ALL=y,
CONFIG_CC_STACKPROTECTOR=y]
do {
ret = waitpid(pid, &status, 0);
} while (ret == -1 && errno == EINTR);
}
return 0;
}
quickly creates an unkillable task.
If copy_process(CLONE_THREAD) races with de_thread()
copy_signal()->atomic(signal->count) breaks the signal->notify_count
logic, and the execing thread can hang forever in kernel space.
Change copy_process() to increment count/live only when we know for sure
we can't fail. In this case the forked thread will take care of its
reference to signal correctly.
If copy_process() fails, check CLONE_THREAD flag. If it it set - do
nothing, the counters were not changed and current belongs to the same
thread group. If it is not set, ->signal must be released in any case
(and ->count must be == 1), the forked child is the only thread in the
thread group.
We need more cleanups here, in particular signal->count should not be used
by de_thread/__exit_signal at all. This patch only fixes the bug.
snd_interval_list() expected a sorted list but did not document this, so
there are drivers that give it an unsorted list. To fix this, change
the algorithm to work with any list.
This fixes the "Slave PCM not usable" error with USB devices that have
multiple alternate settings with sample rates in decreasing order, such
as the Philips Askey VC010 WebCam.
http://bugzilla.kernel.org/show_bug.cgi?id=14028
Reported-and-tested-by: Andrzej <adkadk@gmail.com> Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The oops did not reveal any more details about the real stack
that we have and the system got into an infinite loop of
recursive pagefaults.
So i booted with CONFIG_STACK_TRACER=y and the 'stacktrace' boot
parameter. The box did not crash (timings/conditions probably
changed a tiny bit to trigger the catastrophic crash), but the
/debug/tracing/stack_trace file was rather revealing:
There's a lot of fat functions on that stack trace, but
the largest of all is do_one_initcall(). This is due to
the boot trace entry variables being on the stack.
Fixing this is relatively easy, initcalls are fundamentally
serialized, so we can move the local variables to file scope.
Note that this large stack footprint was present for a
couple of months already - what pushed my system over
the edge was the addition of kmemleak to the call-chain:
2.6.30's commit 8a0bdec194c21c8fdef840989d0d7b742bb5d4bc removed
user_shm_lock() calls in hugetlb_file_setup() but left the
user_shm_unlock call in shm_destroy().
In detail:
Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock()
is not called in hugetlb_file_setup(). However, user_shm_unlock() is
called in any case in shm_destroy() and in the following
atomic_dec_and_lock(&up->__count) in free_uid() is executed and if
up->__count gets zero, also cleanup_user_struct() is scheduled.
Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set.
However, the ref counter up->__count gets unexpectedly non-positive and
the corresponding structs are freed even though there are live
references to them, resulting in a kernel oops after a lots of
shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles and CONFIG_USER_SCHED set.
Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the
time of shm_destroy() may give a different answer from at the time
of hugetlb_file_setup(). And fixed newseg()'s no_id error path,
which has missed user_shm_unlock() ever since it came in 2.6.9.
Reported-by: Stefan Huber <shuber2@gmail.com> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Tested-by: Stefan Huber <shuber2@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In commit a8e7d49aa7be728c4ae241a75a2a124cdcabc0c5 ("Fix race in
create_empty_buffers() vs __set_page_dirty_buffers()"), I removed a test
for a NULL page mapping unintentionally when some of the code inside
__set_page_dirty() was moved to the callers.
That removal generally didn't matter, since a filesystem would serialize
truncation (which clears the page mapping) against writing (which marks
the buffer dirty), so locking at a higher level (either per-page or an
inode at a time) should mean that the buffer page would be stable. And
indeed, nothing bad seemed to happen.
Except it turns out that apparently reiserfs does something odd when
under load and writing out the journal, and we have a number of bugzilla
entries that look similar:
We splice skbs from the pending queue for a TID
onto the local pending queue when tearing down a
block ack request. This is not necessary unless we
actually have received a request to start a block ack
request (rate control, for example). If we never received
that request we should not be splicing the tid pending
queue as it would be null, causing a panic.
Not sure yet how exactly we allowed through a call when the
tid state does not have at least HT_ADDBA_REQUESTED_MSK set,
that will require some further review as it is not quite
obvious.
For more information see the bug report:
http://bugzilla.kernel.org/show_bug.cgi?id=13922
This fixes this oops:
BUG: unable to handle kernel NULL pointer dereference at 00000030
IP: [<f8806c70>] ieee80211_agg_splice_packets+0x40/0xc0 [mac80211]
*pdpt = 0000000002d1e001 *pde = 0000000000000000
Thread overran stack, or stack corrupted
Oops: 0000 [#1] SMP
last sysfs file: /sys/module/aes_generic/initstate
Modules linked in: <bleh>
Testedy-by: Jack Lau <jackelectronics@hotmail.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change rt2x00_rf_read() and rt2x00_rf_write() to subtract 1 from the rf
register number. This is needed because the rf registers are enumerated
starting with one. The size of the rf register cache is just enough to
hold all registers, so writing to the highest register was corrupting
memory. Add a check to make sure that the rf register number is valid.
Signed-off-by: Pavel Roskin <proski@gnu.org> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If node_load[] is cleared everytime build_zonelists() is
called,node_load[] will have no help to find the next node that should
appear in the given node's fallback list.
Because of the bug, zonelist's node_order is not calculated as expected.
This bug affects on big machine, which has asynmetric node distance.
This (my bug) is very old but no one has reported this for a long time.
Maybe because the number of asynmetric NUMA is very small and they use
cpuset for customizing node memory allocation fallback.
[akpm@linux-foundation.org: fix CONFIG_NUMA=n build] Signed-off-by: Bo Liu <bo-liu@hotmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As noted in 83d349f35e1ae72268c5104dbf9ab2ae635425d4 ("x86: don't send
an IPI to the empty set of CPU's"), some APIC's will be very unhappy
with an empty destination mask. That commit added a WARN_ON() for that
case, and avoided the resulting problem, but didn't fix the underlying
reason for why those empty mask cases happened.
This fixes that, by checking the result of 'cpumask_andnot()' of the
current CPU actually has any other CPU's left in the set of CPU's to be
sent a TLB flush, and not calling down to the IPI code if the mask is
empty.
The reason this started happening at all is that we started passing just
the CPU mask pointers around in commit 4595f9620 ("x86: change
flush_tlb_others to take a const struct cpumask"), and when we did that,
the cpumask was no longer thread-local.
Before that commit, flush_tlb_mm() used to create it's own copy of
'mm->cpu_vm_mask' and pass that copy down to the low-level flush
routines after having tested that it was not empty. But after changing
it to just pass down the CPU mask pointer, the lower level TLB flush
routines would now get a pointer to that 'mm->cpu_vm_mask', and that
could still change - and become empty - after the test due to other
CPU's having flushed their own TLB's.
The default_send_IPI_mask_logical() function uses the "flat" APIC mode
to send an IPI to a set of CPU's at once, but if that set happens to be
empty, some older local APIC's will apparently be rather unhappy. So
just warn if a caller gives us an empty mask, and ignore it.
This fixes a regression in 2.6.30.x, due to commit 4595f9620 ("x86:
change flush_tlb_others to take a const struct cpumask"), documented
here:
http://bugzilla.kernel.org/show_bug.cgi?id=13933
which causes a silent lock-up. It only seems to happen on PPro, P2, P3
and Athlon XP cores. Most developers sadly (or not so sadly, if you're
a developer..) have more modern CPU's. Also, on x86-64 we don't use the
flat APIC mode, so it would never trigger there even if the APIC didn't
like sending an empty IPI mask.
Reported-by: Pavel Vilim <wylda@volny.cz> Reported-and-tested-by: Thomas Björnell <thomas.bjornell@gmail.com> Reported-and-tested-by: Martin Rogge <marogge@onlinehome.de> Cc: Mike Travis <travis@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When 'and'ing two bitmasks (where 'andnot' is a variation on it), some
cases want to know whether the result is the empty set or not. In
particular, the TLB IPI sending code wants to do cpumask operations and
determine if there are any CPU's left in the final set.
So this just makes the bitmask (and cpumask) functions return a boolean
for whether the result has any bits set.
It was first set to 1 in pollwake() (now __pollwake() ), tested and
later set to 0 in poll_schedule_timeout(), but not initialized before.
As a result when the process needs to sleep, triggered was likely to be
non-zero even if pollwake() is not called before the first
poll_schedule_timeout(), meaning schedule_hrtimeout_range() would not be
called and an extra loop calling all ->poll() would be done.
This patch initialize triggered to 0 in poll_initwait() so the ->poll()
are not called twice before the process goes to sleep when it needs to.
This patch fixes the napi list handling when an ehea interface is shut
down to avoid corruption of the napi list.
Signed-off-by: Hannes Hering <hering2@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I'm using ide on 2.6.30.1 with xfs filesystem. I noticed a kernel memory
leak after writing lots of data, the kmalloc-96 slab cache keeps
growing. It seems the struct ide_cmd kmalloced by idedisk_prepare_flush
is never kfreed.
Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Simon Kirby <sim@netnation.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We can't call nfs_readdata_release()/nfs_writedata_release() without
first initialising and referencing args.context. Doing so inside
nfs_direct_read_schedule_segment()/nfs_direct_write_schedule_segment()
causes an Oops.
We should rather be calling nfs_readdata_free()/nfs_writedata_free() in
those cases.
Looking at the O_DIRECT code, the "struct nfs_direct_req" is already
referencing the nfs_open_context for us. Since the readdata and writedata
structures carry a reference to that, we can simplify things by getting rid
of the extra nfs_open_context references, so that we can replace all
instances of nfs_readdata_release()/nfs_writedata_release().
Robert Richter [Wed, 12 Aug 2009 15:59:52 +0000 (17:59 +0200)]
ring-buffer: Fix advance of reader in rb_buffer_peek()
Backport for 2.6.30-stable of:
469535a ring-buffer: Fix advance of reader in rb_buffer_peek()
When calling rb_buffer_peek() from ring_buffer_consume() and a
padding event is returned, the function rb_advance_reader() is
called twice. This may lead to missing samples or under high
workloads to the warning below. This patch fixes this. If a padding
event is returned by rb_buffer_peek() it will be consumed by the
calling function now.
Also, I simplified some code in ring_buffer_consume().
kernel_sendpage() does the proper default case handling for when the
socket doesn't have a native sendpage implementation.
Now, arguably this might be something that we could instead solve by
just specifying that all protocols should do it themselves at the
protocol level, but we really only care about the common protocols.
Does anybody really care about sendpage on something like Appletalk? Not
likely.
Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Julien TINNES <julien@cr0.org> Acked-by: Tavis Ormandy <taviso@sdf.lonestar.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The problem is minor, but without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.
Now we do not need to re-check task->mm after ptrace_may_access(), it
can't be changed to the new mm under us.
Strictly speaking, this also fixes another very minor problem. Unless
security check fails or the task exits mm_for_maps() should never
return NULL, the caller should get either old or new ->mm.
mm_for_maps() takes ->mmap_sem after security checks, this looks
strange and obfuscates the locking rules. Move this lock to its
single caller, m_start().
It would be nice to kill __ptrace_may_access(). It requires task_lock(),
but this lock is only needed to read mm->flags in the middle.
Convert mm_for_maps() to use ptrace_may_access(), this also simplifies
the code a little bit.
Also, we do not need to take ->mmap_sem in advance. In fact I think
mm_for_maps() should not play with ->mmap_sem at all, the caller should
take this lock.
With or without this patch, without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.
With CONFIG_STACK_PROTECTOR turned on, VMI doesn't boot with
more than one processor. The problem is with the gs value not
being initialized correctly when registering the secondary
processor for VMI's case.
The patch below initializes the gs value for the AP to
__KERNEL_STACK_CANARY. Without this the secondary processor
keeps on taking a GP on every gs access.
access_ok() checks must be done on every part of the userspace structure
that is accessed. If access_ok() on one part of the struct succeeded, it
does not imply it will succeed on other parts of the struct. (Does
depend on the architecture implementation of access_ok()).
This changes the __get_user() users to first check access_ok() on the
data structure.
Signed-off-by: Michael Buesch <mb@bu3sch.de> Cc: Pete Zaitcev <zaitcev@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1272) changes the error code returned when an open call
for a USB device node fails to locate the corresponding device. The
appropriate error code is -ENODEV, not -ENOENT.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Attached patch adds USB vendor and product IDs for Bayer's USB to serial
converter cable used by Bayer blood glucose meters. It seems to be a
FT232RL based device and works without any problem with ftdi_sio driver
when this patch is applied. See: http://winglucofacts.com/cables/
Signed-off-by: Marko Hänninen <bugitus@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The FIEMAP_IOC_FIEMAP mapping ioctl was missing a 32-bit compat handler,
which means that 32-bit suerspace on 64-bit kernels cannot use this ioctl
command.
The structure is nicely aligned, padded, and sized, so it is just this
simple.
Tested w/ 32-bit ioctl tester (from Josef) on a 64-bit kernel on ext4.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Cc: Mark Lord <lkml@rtr.ca> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Josef Bacik <josef@redhat.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While looking at Jens Rosenboom bug report
(http://lkml.org/lkml/2009/7/27/35) about strange sys_futex call done from
a dying "ps" program, we found following problem.
clone() syscall has special support for TID of created threads. This
support includes two features.
One (CLONE_CHILD_SETTID) is to set an integer into user memory with the
TID value.
One (CLONE_CHILD_CLEARTID) is to clear this same integer once the created
thread dies.
The integer location is a user provided pointer, provided at clone()
time.
kernel keeps this pointer value into current->clear_child_tid.
At execve() time, we should make sure kernel doesnt keep this user
provided pointer, as full user memory is replaced by a new one.
As glibc fork() actually uses clone() syscall with CLONE_CHILD_SETTID and
CLONE_CHILD_CLEARTID set, chances are high that we might corrupt user
memory in forked processes.
Following sequence could happen:
1) bash (or any program) starts a new process, by a fork() call that
glibc maps to a clone( ... CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
...) syscall
2) When new process starts, its current->clear_child_tid is set to a
location that has a meaning only in bash (or initial program) context
(&THREAD_SELF->tid)
3) This new process does the execve() syscall to start a new program.
current->clear_child_tid is left unchanged (a non NULL value)
4) If this new program creates some threads, and initial thread exits,
kernel will attempt to clear the integer pointed by
current->clear_child_tid from mm_release() :
/*
* We don't check the error code - if userspace has
* not set up a proper pointer then tough luck.
*/
<< here >> put_user(0, tidptr);
sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
}
5) OR : if new program is not multi-threaded, but spied by /proc/pid
users (ps command for example), mm_users > 1, and the exiting program
could corrupt 4 bytes in a persistent memory area (shm or memory mapped
file)
If current->clear_child_tid points to a writeable portion of memory of the
new program, kernel happily and silently corrupts 4 bytes of memory, with
unexpected effects.
Fix is straightforward and should not break any sane program.
The v1.x metadata does not have a fixed size and can grow
when devices are added.
If it grows enough to require an extra sector of storage,
we need to update the 'sb_size' to match.
Without this, md can write out an incomplete superblock with a
bad checksum, which will be rejected when trying to re-assemble
the array.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Parentheses are required or the comparison occurs before the bitand.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The WAKE_MCAST bit is tested twice, the first should be WAKE_UCAST.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Jie Yang <jie.yang@atheros.com> Cc: Jay Cliburn <jcliburn@gmail.com> Cc: Chris Snook <csnook@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Increase the command ORB data structure to transport up to 16 bytes long
CDBs (instead of 12 bytes), and tell the SCSI mid layer about it. This
is notably necessary for READ CAPACITY(16) and friends, i.e. support of
large disks.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Increase the command ORB data structure to transport up to 16 bytes long
CDBs (instead of 12 bytes), and tell the SCSI mid layer about it. This
is notably necessary for READ CAPACITY(16) and friends, i.e. support of
large disks.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I've tested TSL2550 driver and I've found a bug: when light is off,
returned value from tsl2550_calculate_lux function is -1 when it should
be 0 (sensor correctly read that light was off).
I think the bug is that a zero c0 value (approximated value of ch0) is
misinterpreted as an error.
Signed-off-by: Michele Jr De Candia <michele.decandia@valueteam.com> Acked-by: Rodolfo Giometti <giometti@linux.it> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are some broken devices that report multiple DMA xfer modes
enabled at once (ATA spec doesn't allow it) but otherwise work fine
with DMA so just delete ide_id_dma_bug().
[ As discovered by detective work by Frans and Bart, due to how
handling of the ID block was handled before commit c419993
("ide-iops: only clear DMA words on setting DMA mode") this
check was always seeing zeros in the fields or other similar
garbage. Therefore this check wasn't actually checking anything.
Now that the tests actually check the real bits, all we see are
devices that trigger the check yet work perfectly fine, therefore
killing this useless check is the best thing to do. -DaveM ]
Reported-by: Frans Pop <elendil@planet.nl> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add ide_host_enable_irqs() helper and use it in ide_host_register()
before registering ports. Then remove no longer needed IRQ unmasking
from in init_irq().
This should fix the problem with "screaming" shared IRQ on the first
port (after request_irq() call while we have the unexpected IRQ pending
on the second port) which was uncovered by my rework of the serialized
interfaces support.
Reported-by: Frans Pop <elendil@planet.nl> Tested-by: Frans Pop <elendil@planet.nl> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When an array is changed from RAID6 to RAID5, fewer drives are
needed. So any device that is made superfluous by the level
conversion must be marked as not-active.
For the RAID6->RAID5 conversion, this will be a drive which only
has 'Q' blocks on it.
Changeset 3869c4aa18835c8c61b44bd0f3ace36e9d3b5bd0
that went in after 2.6.30-rc1 was a seemingly small change to _set_memory_wc()
to make it complaint with SDM requirements. But, introduced a nasty bug, which
can result in crash and/or strange corruptions when set_memory_wc is used.
One such crash reported here
http://lkml.org/lkml/2009/7/30/94
Actually, that changeset introduced two bugs.
* change_page_attr_set() takes &addr as first argument and can the addr value
might have changed on return, even for single page change_page_attr_set()
call. That will make the second change_page_attr_set() in this routine
operate on unrelated addr, that can eventually cause strange corruptions
and bad page state crash.
* The second change_page_attr_set() call, before setting _PAGE_CACHE_WC, should
clear the earlier _PAGE_CACHE_UC_MINUS, as otherwise cache attribute will not
be WC (will be UC instead).
The patch below fixes both these problems. Sending a single patch to fix both
the problems, as the change is to the same line of code. The change to have a
addr_copy is not very clean. But, it is simpler than making more changes
through various routines in pageattr.c.
A huge thanks to Jerome for reporting this problem and providing a simple test
case that helped us root cause the problem.
Reported-by: Jerome Glisse <glisse@freedesktop.org> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20090730214319.GA1889@linux-os.sc.intel.com> Acked-by: Dave Airlie <airlied@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
11static inline unsigned long native_save_fl(void)
12{
13 unsigned long flags;
14
15 asm volatile("# __raw_save_flags\n\t"
16 "pushf ; pop %0"
17 : "=g" (flags)
18 : /* no input */
19 : "memory");
20
21 return flags;
22}
If gcc chooses to put flags on the stack, for instance because this is
inlined into a larger function with more register pressure, the offset
of the flags variable from the stack pointer will change when the
pushf is performed. gcc doesn't attempt to understand that fact, and
address used for pop will still be the same. It will write to
somewhere near flags on the stack but not actually into it and
overwrite some other value.
I saw this happen in the ide_device_add_all function when running in a
simulator I work on. I'm assuming that some quirk of how the simulated
hardware is set up caused the code path this is on to be executed when
it normally wouldn't.
A simple fix might be to change "=g" to "=r".
Reported-by: Gabe Black <spamforgabe@umich.edu> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The code was incorrectly reserving memtypes using the page
virtual address instead of the physical address. Furthermore,
the code was not ignoring highmem pages as it ought to.
( upstream does not pass in highmem pages yet - but upcoming
graphics code will do it and there's no reason to not handle
this properly in the CPA APIs.)
Narayanan reports "The regression is around 15%. There is no disk controller
as our setup is based on Samsung OneNAND used as a memory mapped device on a
OMAP2430 based board."
The page allocator tries to preserve contiguous PFN ordering when returning
pages such that repeated callers to the allocator have a strong chance of
getting physically contiguous pages, particularly when external fragmentation
is low. However, of the bulk of the allocations have __GFP_COLD set as they
are due to aio_read() for example, then the PFNs are in reverse PFN order.
This can cause performance degration when used with IO controllers that could
have merged the requests.
This patch attempts to preserve the contiguous ordering of PFNs for users of
__GFP_COLD.
This is because hugetlb_unreserve_pages() is unconditionally removing
blocks_per_huge_page(h) on each call rather than using the freed amount.
If there were 0 blocks, it goes negative, resulting in the above.
usb0 and usb1 mux settings in the sicrl register were swapped (twice!)
in mpc834x_usb_cfg(), leading to various strange issues with fsl-ehci
and full speed devices.
The USB port config on mpc834x is done using 2 muxes: Port 0 is always
used for MPH port 0, and port 1 can either be used for MPH port 1 or DR
(unless DR uses UTMI phy or OTG, then it uses both ports) - See 8349 RM
figure 1-4..
mpc8349_usb_cfg() had this inverted for the DR, and it also had the bit
positions of the usb0 / usb1 mux settings swapped. It would basically
work if you specified port1 instead of port0 for the MPH controller (and
happened to use ULPI phys), which is what all the 834x dts have done,
even though that configuration is physically invalid.
Instead fix mpc8349_usb_cfg() and adjust the dts files to match reality.
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This fixes regression (battery "vanishing" on resume) introduced by
commit d0c71fe7ebc180f1b7bc7da1d39a07fc19eec768 ("ACPI Suspend: Enable
ACPI during resume if SCI_EN is not set") and also the issue with
the "screaming" IRQ 9.
Reported-and-tested-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Acked-by: Len Brown <lenb@kernel.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
These pointers can be NULL, the is_mesh() case isn't
ever hit in the current kernel, but cmp_ies() can be
hit under certain conditions.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
loff_t is a signed type. If userspace passes a negative ppos, the "count"
range check is weakened. "count"s bigger than HPEE_MAX_LENGTH will pass the check.
Also, if ppos is negative, the readb(eisa_eeprom_addr + *ppos) will poke in random
memory.
About a half events are missing when we splice_read
from trace_pipe. They are unexpectedly consumed because we ignore
the TRACE_TYPE_NO_CONSUME return value used by the function graph
tracer when it needs to consume the events by itself to walk on
the ring buffer.
The same problem appears with ftrace_dump()
Example of an output before this patch:
1) | ktime_get_real() {
1) 2.846 us | read_hpet();
1) 4.558 us | }
1) 6.195 us | }
After this patch:
0) | ktime_get_real() {
0) | getnstimeofday() {
0) 1.960 us | read_hpet();
0) 3.597 us | }
0) 5.196 us | }
The fix also applies on 2.6.30
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4A6EEC52.90704@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When print_graph_entry() computes a function call entry event, it needs
to also check the next entry to guess if it matches the return event of
the current function entry.
In order to look at this next event, it needs to consume the current
entry before going ahead in the ring buffer.
However, if the current event that gets consumed is the last one in the
ring buffer head page, the ring_buffer may reuse the page for writers.
The consumed entry will then become invalid because of possible
racy overwriting.
Me must then handle this entry by making a copy of it.
The fix also applies on 2.6.30
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4A6EEAEC.3050508@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Andrea Gelmini gave me a report that a kernel oops hit on a nilfs
filesystem with a 1KB block size when doing rsync.
This turned out to be caused by an inconsistency of dirty state
between a page and its buffers storing b-tree node blocks.
If the page had multiple buffers split over multiple logs, and if the
logs were written at a time, a dirty flag remained in the page even
every dirty flag in the buffers was cleared.
This will fix the failure by dropping the dirty flag properly for
pages with the discrete multiple b-tree nodes.
Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Andrea Gelmini <andrea.gelmini@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently, the ThinkPad-ACPI bay and dock drivers are completely
broken, and cause a NULL pointer derreference in kernel mode (and,
therefore, an OOPS) when they try to issue events (i.e. on dock,
undock, bay ejection, etc).
OTOH, the standard ACPI dock driver can handle the hotplug bays and
docks of the ThinkPads just fine (including batteries) as of 2.6.27.
In fact, it does a much better job of it than thinkpad-acpi ever did.
It is just not worth the hassle to find a way to fix this crap without
breaking the (deprecated) thinkpad-acpi dock/bay ABI. This is old,
deprecated code that sees little testing or use.
As a quick fix suitable for -stable backports, mark the thinkpad-acpi
bay and dock subdrivers as BROKEN in Kconfig. The dead code will be
removed by a later patch.
This fixes bugzilla #13669, and should be applied to 2.6.27 and later.
Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Reported-by: Joerg Platte <jplatte@naasa.net> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>