The missing break here means that we always return early and the
function is a no-op.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 9fdca9df (spi: omap2-mcspi: convert to module_platform_driver)
broke the SPI display/panel driver probe on RX-51/N900. The exact cause is
not fully understood, but it seems to be related to the probe order. SPI
communication to the panel driver (spi1.2) fails unless the touchscreen
(spi1.0) has been probed/initialized before. When the omap2-mcspi driver
was converted to a platform driver, it resulted in that the devices are
probed immediately after the board registers them in the order they are
listed in the board file.
Fix the issue by moving the touchscreen before the panel in the SPI
device list.
The invalid guest state emulation loop does not check halt_request
which causes 100% cpu loop while guest is in halt and in invalid
state, but more serious issue is that this leaves halt_request set, so
random instruction emulated by vm86 #GP exit can be interpreted
as halt which causes guest hang. Fix both problems by handling
halt_request in emulation loop.
Reported-by: Tomas Papan <tomas.papan@gmail.com> Tested-by: Tomas Papan <tomas.papan@gmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jerry Hoemann [Tue, 30 Apr 2013 21:15:55 +0000 (15:15 -0600)]
x86/mm: account for PGDIR_SIZE alignment
Patch for -stable. Function find_early_table_space removed upstream.
Fixes panic in alloc_low_page due to pgt_buf overflow during
init_memory_mapping.
find_early_table_space sizes pgt_buf based upon the size of the
memory being mapped, but it does not take into account the alignment
of the memory. When the region being mapped spans a 512GB (PGDIR_SIZE)
alignment, a panic from alloc_low_pages occurs.
kernel_physical_mapping_init takes into account PGDIR_SIZE alignment.
This causes an extra call to alloc_low_page to be made. This extra call
isn't accounted for by find_early_table_space and causes a kernel panic.
Change is to take into account PGDIR_SIZE alignment in find_early_table_space.
Signed-off-by: Jerry Hoemann <jerry.hoemann@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
audit_trim_trees() calls get_tree(). If a failure occurs we must call
put_tree().
[akpm@linux-foundation.org: run put_tree() before mutex_lock() for small scalability improvement] Signed-off-by: Chen Gang <gang.chen@asianux.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ftrace_dump() had a lot of issues. What ftrace_dump() does, is when
ftrace_dump_on_oops is set (via a kernel parameter or sysctl), it
will dump out the ftrace buffers to the console when either a oops,
panic, or a sysrq-z occurs.
This was written a long time ago when ftrace was fragile to recursion.
But it wasn't written well even for that.
There's a possible deadlock that can occur if a ftrace_dump() is happening
and an NMI triggers another dump. This is because it grabs a lock
before checking if the dump ran.
It also totally disables ftrace, and tracing for no good reasons.
As the ring_buffer now checks if it is read via a oops or NMI, where
there's a chance that the buffer gets corrupted, it will disable
itself. No need to have ftrace_dump() do the same.
ftrace_dump() is now cleaned up where it uses an atomic counter to
make sure only one dump happens at a time. A simple atomic_inc_return()
is enough that is needed for both other CPUs and NMIs. No need for
a spinlock, as if one CPU is running the dump, no other CPU needs
to do it too.
The tracing_on variable is turned off and not turned on. The original
code did this, but it wasn't pretty. By just disabling this variable
we get the result of not seeing traces that happen between crashes.
For sysrq-z, it doesn't get turned on, but the user can always write
a '1' to the tracing_on file. If they are using sysrq-z, then they should
know about tracing_on.
The new code is much easier to read and less error prone. No more
deadlock possibility when an NMI triggers here.
Reported-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As discussed in this thread
http://lists.freedesktop.org/archives/dri-devel/2013-April/037411.html
GMBUS based DVO transmitter detection seems to be unreliable which could
result in an unusable DVO port.
The attached patch fixes this by falling back to bit banging mode for
the time DVO transmitter detection is in progress.
Signed-off-by: David Müller <d.mueller@elsoft.ch> Tested-by: David Müller <d.mueller@elsoft.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The "Mobile Sandy Bridge CPUs" in the Fujitsu Esprimo Q900
mini desktop PCs are probably misleading the LVDS detection
code in intel_lvds_supported. Nothing is connected to the
LVDS ports in these systems.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When checking if an autofs mount point is busy it isn't sufficient to
only check if it's a mount point.
For example, if the mount of an offset mountpoint in a tree is denied
for this host by its export and the dentry becomes a process working
directory the check incorrectly returns the mount as not in use at
expire.
This can happen since the default when mounting within a tree is
nostrict, which means ingnore mount fails on mounts within the tree and
continue. The nostrict option is meant to allow mounting in this case.
Signed-off-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Device tree node /rtas/ibm,associativity-reference-points would
index into /cpus/PowerPCxxxx/ibm,associativity based on form0 or
form1 encoding detected by ibm,architecture-vec-5 property.
All modern systems use form1 and current kernel code is correct.
However, on older systems with form0 encoding, the numa distance
will get hard coded as LOCAL_DISTANCE for all nodes. This causes
task scheduling anomaly since scheduler will skip building numa
level domain (topmost domain with all cpus) if all numa distances
are same. (value of 'level' in sched_init_numa() will remain 0)
Prior to the above commit:
((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
Restoring compatible behavior with this patch for old powerpc systems
with device tree where numa distance are encoded as form0.
Commit b4cbb197c7e7 ("vm: add vm_iomap_memory() helper function") added
a helper function wrapper around io_remap_pfn_range(), and every other
architecture defined it in <asm/pgtable.h>.
The s390 choice of <asm/io.h> may make sense, but is not very convenient
for this case, and gratuitous differences like that cause unexpected errors like this:
mm/memory.c: In function 'vm_iomap_memory':
mm/memory.c:2439:2: error: implicit declaration of function 'io_remap_pfn_range' [-Werror=implicit-function-declaration]
Glory be the kbuild test robot who noticed this, bisected it, and
reported it to the guilty parties (ie me).
The adp5520 unfortunately also clears the BL_EN bit when the nSTNDBY bit is
cleared. So we need to make sure to restore it during resume if it was set
before suspend.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
PARTITION_SUPPORT needs to be set before doing the compare on version
number so the bit width test does not get invalid data. Before this
patch, a Sandisk iNAND eMMC card would detect 1-bit width although
the hardware supports 4-bit.
Only affects old emmc devices - pre 4.4 devices.
Reported-by: Elad Yi <elad.yi@gmail.com> Signed-off-by: Philip Rakity <prakity@yahoo.com> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fox the Kconfig documentation for CONFIG_EXT4_DEBUG to match the
change made by commit a0b30c1229: ext4: use module parameters instead
of debugfs for mballoc_debug
Trying to run an application which was trying to put data into half of
memory using shmget(), we found that having a shmall value below 8EiB-8TiB
would prevent us from using anything more than 8TiB. By setting
kernel.shmall greater than 8EiB-8TiB would make the job work.
In the newseg() function, ns->shm_tot which, at 8TiB is INT_MAX.
When a full scan 2.4 and 5 GHz scan is scheduled, but then the 2.4 GHz
part of the scan disables a 5.2 GHz channel due to, e.g. receiving
country or frequency information, that 5.2 GHz channel might already
be in the list of channels to scan next. Then, when the driver checks
if it should do a passive scan, that will return false and attempt an
active scan. This is not only wrong but can also lead to the iwlwifi
device firmware crashing since it checks regulatory as well.
Fix this by not setting the channel flags to just disabled but rather
OR'ing in the disabled flag. That way, even if the race happens, the
channel will be scanned passively which is still (mostly) correct.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The seconds field of an nfstime4 structure is 64bit, but we are assuming
that the first 32bits are zero-filled. So if the client tries to set
atime to a value before the epoch (touch -t 196001010101), then the
server will save the wrong value on disk.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Don't actually close any opens until we don't need them at all.
This means being left with write access when it's not really necessary,
but that's better than putting a file that might still have posix locks
held on it, as we have been.
Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A server shouldn't normally return NFS4ERR_GRACE if the client holds a
delegation, since no conflicting lock reclaims can be granted, however
the spec does not require the server to grant the open in this
instance
After a server reboot, the reclaimer thread will recover all the existing
locks. For locks that are blocked, however, it will change the value
of block->b_status to nlm_lck_denied_grace_period in order to signal that
they need to wake up and resend the original blocking lock request.
Due to a bug, however, the block->b_status never gets reset after the
blocked locks have been woken up, and so the process goes into an
infinite loop of resends until the blocked lock is satisfied.
Vitaliy reported that a per cpu HPET timer interrupt crashes the
system during hibernation. What happens is that the per cpu HPET timer
gets shut down when the nonboot cpus are stopped. When the nonboot
cpus are onlined again the HPET code sets up the MSI interrupt which
fires before the clock event device is registered. The event handler
is still set to hrtimer_interrupt, which then crashes the machine due
to highres mode not being active.
See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=700333
There is no real good way to avoid that in the HPET code. The HPET
code alrady has a mechanism to detect spurious interrupts when event
handler == NULL for a similar reason.
We can handle that in the clockevent/tick layer and replace the
previous functional handler with a dummy handler like we do in
tick_setup_new_device().
The original clockevents code did this in clockevents_exchange_device(),
but that got removed by commit 7c1e76897 (clockevents: prevent
clockevent event_handler ending up handler_noop) which forgot to fix
it up in tick_shutdown(). Same issue with the broadcast device.
Reported-by: Vitaliy Fillipov <vitalif@yourcmc.ru> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: 700333@bugs.debian.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The 3rd parameter of flex_array_prealloc() is the number of elements,
not the index of the last element.
The effect of the bug is, when opening cgroup.procs, a flex array will
be allocated and all elements of the array is allocated with
GFP_KERNEL flag, but the last one is GFP_ATOMIC, and if we fail to
allocate memory for it, it'll trigger a BUG_ON().
There's a bug where rtc alarms are ignored after the rtc cmos suspends
but before the system finishes suspend. Since hpet emulation is
disabled and it still handles the interrupts, a wake event is never
registered which is done from the rtc layer.
This patch reverts commit d1b2efa83fbf ("rtc: disable hpet emulation on
suspend") which disabled hpet emulation. To fix the problem mentioned
in that commit, hpet_rtc_timer_init() is called directly on resume.
Signed-off-by: Derek Basehore <dbasehore@chromium.org> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we change the system time to a low value like this, the value of
timekeeper->offs_real will be a negative value.
It seems that the WARN occurs because an hrtimer has been started in the time
between the releasing of the timekeeper lock and the IPI call (via a call to
on_each_cpu) in clock_was_set() in the do_settimeofday() code. The end result
is that a REALTIME_CLOCK timer has been added with softexpires = expires =
KTIME_MAX. The hrtimer_interrupt() fires/is called and the loop at
kernel/hrtimer.c:1289 is executed. In this loop the code subtracts the
clock base's offset (which was set to timekeeper->offs_real in
do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which
was KTIME_MAX):
KTIME_MAX - (a negative value) = overflow
A simple check for an overflow can resolve this problem. Using KTIME_MAX
instead of the overflow value will result in the hrtimer function being run,
and the reprogramming of the timer after that.
Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
[jstultz: Tweaked commit subject] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
One can trigger an overflow when using ktime_add_ns() on a 32bit
architecture not supporting CONFIG_KTIME_SCALAR.
When passing a very high value for u64 nsec, e.g. 7881299347898368000
the do_div() function converts this value to seconds (7881299347) which
is still to high to pass to the ktime_set() function as long. The result
in is a negative value.
The problem on my system occurs in the tick-sched.c,
tick_nohz_stop_sched_tick() when time_delta is set to
timekeeping_max_deferment(). The check for time_delta < KTIME_MAX is
valid, thus ktime_add_ns() is called with a too large value resulting in
a negative expire value. This leads to an endless loop in the ticker code:
time_delta: 7881299347898368000
expires = ktime_add_ns(last_update, time_delta)
expires: negative value
This fix caps the value to KTIME_MAX.
This error doesn't occurs on 64bit or architectures supporting
CONFIG_KTIME_SCALAR (e.g. ARM, x86-32).
Signed-off-by: David Engraf <david.engraf@sysgo.com>
[jstultz: Minor tweaks to commit message & header] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 88a8516a2128 (ALSA: usbaudio: implement USB autosuspend)
introduced autopm for all USB audio/MIDI devices. However, many MIDI
devices, such as synthesizers, do not merely transmit MIDI messages but
use their MIDI inputs to control other functions. With autopm, these
devices would get powered down as soon as the last MIDI port device is
closed on the host.
Even some plain MIDI interfaces could get broken: they automatically
send Active Sensing messages while powered up, but as soon as these
messages cease, the receiving device would interpret this as an
accidental disconnection.
Commit f5f165418cab (ALSA: usb-audio: Fix missing autopm for MIDI input)
introduced another regression: some devices (e.g. the Roland GAIA SH-01)
are self-powered but do a reset whenever the USB interface's power state
changes.
To work around all this, just disable autopm for all USB MIDI devices.
There is a kernel memory leak observed when the proc file
/proc/fs/fscache/stats is read.
The reason is that in fscache_stats_open, single_open is called and the
respective release function is not called during release. Hence fix
with correct release function - single_release().
The list of output registers is
: "=r"(ret) : "r"(iha), "r"(pte):"memory");
The constraint "=r" means that the GCC has to maintain that these vars
are in registers and contain valid info when the program flow leaves
the assembly block (output registers).
But "=r" also means that GCC can put them in registers that are used
as input registers. Input registers are iha, pte on the example.
If the predicate p7 is true, the 8th assembly instruction
"(p7) mov %0=r0;"
is the first one which writes to a register which is maintained by the
register constraints; it sets %0. %0 means the first register operand;
it is ret here.
This instruction might overwrite the %2 register (pte) which is needed
by the next instruction:
"(p7) st8 [%2]=r9;;"
Whether it really happens depends on how GCC decides what registers it
uses and how it optimizes the code.
The attached patch fixes the register operand constraints in
arch/ia64/kvm/vtlb.c.
The register constraints should be
: "=&r"(ret) : "r"(iha), "r"(pte):"memory");
The & means that GCC must not use any of the input registers to place
this output register in.
This is Debian bug#702639
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702639).
The patch is applicable on Kernel 3.9-rc1, 3.2.35 and many other versions.
Signed-off-by: Stephan Schreiber <info@fs-driver.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The list of output registers is
: "=r" (r8), "=r" (prev)
The constraint "=r" means that the GCC has to maintain that these vars
are in registers and contain valid info when the program flow leaves
the assembly block (output registers).
But "=r" also means that GCC can put them in registers that are used
as input registers. Input registers are uaddr, newval, oldval on the
example.
The second assembly instruction
" mov %0=r0 \n"
is the first one which writes to a register; it sets %0 to 0. %0 means
the first register operand; it is r8 here. (The r0 is read-only and
always 0 on the Itanium; it can be used if an immediate zero value is
needed.)
This instruction might overwrite one of the other registers which are
still needed.
Whether it really happens depends on how GCC decides what registers it
uses and how it optimizes the code.
The objdump utility can give us disassembly.
The futex_atomic_cmpxchg_inatomic() function is inline, so we have to
look for a module that uses the funtion. This is the
cmpxchg_futex_value_locked() function in
kernel/futex.c:
static int cmpxchg_futex_value_locked(u32 *curval, u32 __user *uaddr,
u32 uval, u32 newval)
{
int ret;
pagefault_disable();
ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
pagefault_enable();
return ret;
}
Now the disassembly. At first from the Kernel package 3.2.23 which has
been compiled with GCC 4.4, remeber this Kernel seemed to work:
objdump -d linux-3.2.23/debian/build/build_ia64_none_mckinley/kernel/futex.o
The lines
2b0: 0a 00 00 00 22 00 [MMI] mf;;
2b6: 80 00 00 00 42 00 mov r8=r0
2bc: 00 00 04 00 nop.i 0x0
2c0: 0b 00 20 40 2a 04 [MMI] mov.m ar.ccv=r8;;
2c6: 10 1a 85 22 20 00 cmpxchg4.acq r33=[r33],r35,ar.ccv
2cc: 00 00 04 00 nop.i 0x0;;
are the instructions of the assembly block.
The line
2b6: 80 00 00 00 42 00 mov r8=r0
sets the r8 register to 0 and after that
2c0: 0b 00 20 40 2a 04 [MMI] mov.m ar.ccv=r8;;
prepares the 'oldvalue' for the cmpxchg but it takes it from r8. This
is wrong.
What happened here is what I explained above: An input register is
overwritten which is still needed.
The register operand constraints in futex.h are wrong.
(The problem doesn't occur when the Kernel is compiled with GCC 4.6.)
The attached patch fixes the register operand constraints in futex.h.
The code after patching of it:
static inline int
futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
u32 oldval, u32 newval)
{
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;
I also initialized the 'r8' var with the C programming language.
The _asm qualifier on the definition of the 'r8' var forces GCC to use
the r8 processor register for it.
I don't believe that we should use inline assembly for zeroing out a
local variable.
The constraint is
"+r" (r8)
what means that it is both an input register and an output register.
Note that the page fault handler will modify the r8 register which
will be the return value of the function.
The real fix is
"=&r" (prev)
The & means that GCC must not use any of the input registers to place
this output register in.
Patched the Kernel 3.2.23 and compiled it with GCC4.4:
Much better.
There is a
270: 05 40 00 00 00 e1 [MLX] mov r8=r0
which was generated by C code r8 = 0. Below
2b6: 00 10 81 54 08 00 mov.m ar.ccv=r34
what means that oldval is no longer overwritten.
This is Debian bug#702641
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702641).
The patch is applicable on Kernel 3.9-rc1, 3.2.23 and many other versions.
Signed-off-by: Stephan Schreiber <info@fs-driver.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Found problem on system that firmware that could handle pci aer.
Firmware get error reporting after pci injecting error, before os boots.
But after os boots, firmware can not get report anymore, even pci=noaer
is passed.
Root cause: BIOS _OSC has problem with query bit checking.
It turns out that BIOS vendor is copying example code from ACPI Spec.
In ACPI Spec 5.0, page 290:
If (Not(And(CDW1,1))) // Query flag clear?
{ // Disable GPEs for features granted native control.
If (And(CTRL,0x01)) // Hot plug control granted?
{
Store(0,HPCE) // clear the hot plug SCI enable bit
Store(1,HPCS) // clear the hot plug SCI status bit
}
...
}
When Query flag is set, And(CDW1,1) will be 1, Not(1) will return 0xfffffffe.
So it will get into code path that should be for control set only.
BIOS acpi code should be changed to "If (LEqual(And(CDW1,1), 0)))"
Current kernel code is using _OSC query to notify firmware about support
from OS and then use _OSC to set control bits.
During query support, current code is using all possible controls.
So will execute code that should be only for control set stage.
That will have problem when pci=noaer or aer firmware_first is used.
As firmware have that control set for os aer already in query support stage,
but later will not os aer handling.
We should avoid passing all possible controls, just use osc_control_set
instead.
That should workaround BIOS bugs with affected systems on the field
as more bios vendors are copying sample code from ACPI spec.
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
But this was hideously wrong. First of all these initializations
are now down far too late. Specifically after all the other cpus
have been brought up and initialized their own CMC vectors from
smp_callin(). Also ia64_mca_late_init() may be called from any cpu
so the line:
ia64_mca_cmc_vector_setup(); /* Setup vector on BSP */
is generally not executed on the BSP, and so the CMC vector isn't
setup at all on that processor.
Make use of the arch_early_irq_init() hook to get this code executed
at just the right moment: not too early, not too late.
Reported-by: Fred Hartnett <fred.hartnett@hp.com> Tested-by: Fred Hartnett <fred.hartnett@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The TX_FIFO register is 10 bits wide. The lower 8 bits are the data to be
written, while the upper two bits are flags to indicate stop/start.
The driver apparently attempted to optimize write access, by only writing a
byte in those cases where the stop/start bits are zero. However, we have
seen cases where the lower byte is duplicated onto the upper byte by the
hardware, which causes inadvertent stop/starts.
This patch changes the write access to the transmit FIFO to always be 16 bits
wide.
Signed off by: Steven A. Falco <sfalco@harris.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, the depth reported in the stack tracer stack_trace file
does not match the stack_max_size file. This is because the stack_max_size
includes the overhead of stack tracer itself while the depth does not.
The first time a max is triggered, a calculation is not performed that
figures out the overhead of the stack tracer and subtracts it from
the stack_max_size variable. The overhead is stored and is subtracted
from the reported stack size for comparing for a new max.
Now the stack_max_size corresponds to the reported depth:
While testing against and older gcc on x86 that uses mcount instead
of fentry, I found that pasing in ip + MCOUNT_INSN_SIZE let the
stack trace show one more function deep which was missing before.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When gcc 4.6 on x86 is used, the function tracer will use the new
option -mfentry which does a call to "fentry" at every function
instead of "mcount". The significance of this is that fentry is
called as the first operation of the function instead of the mcount
usage of being called after the stack.
This causes the stack tracer to show some bogus results for the size
of the last function traced, as well as showing "ftrace_call" instead
of the function. This is due to the stack frame not being set up
by the function that is about to be traced.
The 216 size for ftrace_call includes both the ftrace_call stack
(which includes the saving of registers it does), as well as the
stack size of the parent.
To fix this, if CC_USING_FENTRY is defined, then the stack_tracer
will reserve the first item in stack_dump_trace[] array when
calling save_stack_trace(), and it will fill it in with the parent ip.
Then the code will look for the parent pointer on the stack and
give the real size of the parent's stack pointer:
I'm Cc'ing stable, although it's not urgent, as it only shows bogus
size for item #0, the rest of the trace is legit. It should still be
corrected in previous stable releases.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use the stack of stack_trace_call() instead of check_stack() as
the test pointer for max stack size. It makes it a bit cleaner
and a little more accurate.
Adding stable, as a later fix depends on this patch.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fbcon: don't lose the console font across generic->chip driver switch
uses a pointer in vc->vc_font.data to load font into the new driver.
However if the font is actually freed, we need to clear the data
so that we don't reload font from dangling pointer.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=892340 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We first tried to avoid updating atime/mtime entirely (commit b0de59b5733d: "TTY: do not update atime/mtime on read/write"), and then
limited it to only update it occasionally (commit 37b7f3c76595: "TTY:
fix atime/mtime regression"), but it turns out that this was both
insufficient and overkill.
It was insufficient because we let people attach to the shared ptmx node
to see activity without even reading atime/mtime, and it was overkill
because the "only once a minute" means that you can't really tell an
idle person from an active one with 'w'.
So this tries to fix the problem properly. It marks the shared ptmx
node as un-notifiable, and it lowers the "only once a minute" to a few
seconds instead - still long enough that you can't time individual
keystrokes, but short enough that you can tell whether somebody is
active or not.
The serial core uses device_find_child() but does not drop the reference to
the retrieved child after using it. This patch add the missing put_device().
What I have done to test this issue.
I used a machine with an AMBA PL011 serial driver. I tested the patch on
next-20120408 because the last branch [next-20120415] does not boot on this
board.
For test purpose, I added some pr_info() messages to print the refcount
after device_find_child() (lines: 1937,2009), and after put_device()
(lines: 1947, 2021).
smpboot: Booting Node 0 Processor 1 APIC 0x2
installing Xen timer for CPU 1
BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179
in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1
Call Trace:
[<ffffffff810c1fea>] __might_sleep+0xda/0x100
[<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0
[<ffffffff81303758>] ? kasprintf+0x38/0x40
[<ffffffff813036eb>] kvasprintf+0x5b/0x90
[<ffffffff81303758>] kasprintf+0x38/0x40
[<ffffffff81044510>] xen_setup_timer+0x30/0xb0
[<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30
[<ffffffff81666d0a>] start_secondary+0x19c/0x1a8
The solution to that is use kasprintf in the CPU hotplug path
that 'online's the CPU. That is, do it in in xen_hvm_cpu_notify,
and remove the call to in xen_hvm_setup_cpu_clockevents.
Unfortunatly the later is not a good idea as the bootup path
does not use xen_hvm_cpu_notify so we would end up never allocating
timer%d interrupt lines when booting. As such add the check for
atomic() to continue.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case a machine supports memory hotplug all active memory increments
present at IPL time have been initialized with a "usecount" of 1.
This is wrong if the memory increment size is larger than the memory
section size of the memory hotplug code. If that is the case the
usecount must be initialized with the number of memory sections that
fit into one memory increment.
Otherwise it is possible to put a memory increment into standby state
even if there are still active sections.
Afterwards addressing exceptions might happen which cause the kernel
to panic.
However even worse, if a memory increment was put into standby state
and afterwards into active state again, it's contents would have been
zeroed, leading to memory corruption.
This was only an issue for machines that support standby memory and
have at least 256GB memory.
This is broken since commit fdb1bb15 "[S390] sclp/memory hotplug: fix
initial usecount of increments".
Many cards based on CY7C68300A/B/C use the USB ID 04b4:6830 but only the
B and C variants (EZ-USB AT2LP) support the ATA Command Block
functionality, according to the data sheets. The A variant (EZ-USB AT2)
locks up if ATACB is attempted, until a typical 30 seconds timeout runs
out and a USB reset is performed.
https://bugs.launchpad.net/bugs/428469
It seems that one way to spot a CY7C68300A (at least where the card
manufacturer left Cypress' EEPROM default vaules, against Cypress'
recommendations) is to look at the USB string descriptor indices.
A http://media.digikey.com/pdf/Data%20Sheets/Cypress%20PDFs/CY7C68300A.pdf
B http://www.farnell.com/datasheets/43456.pdf
C http://www.cypress.com/?rID=14189
Note that a CY7C68300B/C chip appears as CY7C68300A if it is running
in Backward Compatibility Mode, and if ATACB would be supported in this
case there is anyway no way to tell which chip it really is.
For 5 years my external USB drive has been locking up for half a minute
when plugged in and ata_id is run by udev, or anytime hdparm or similar
is run on it.
Finally looking at the /correct/ datasheet I think I found the reason. I
am aware the quirk in this patch is a bit hacky, but the hardware
manufacturers haven't made it easy for us.
When usbfs receives a ctrl-request from userspace it calls check_ctrlrecip,
which for a request with USB_RECIP_ENDPOINT tries to map this to an interface
to see if this interface is claimed, except for ctrl-requests with a type of
USB_TYPE_VENDOR.
When trying to use this device: http://www.akaipro.com/eiepro
redirected to a Windows vm running on qemu on top of Linux.
The windows driver makes a ctrl-req with USB_TYPE_CLASS and
USB_RECIP_ENDPOINT with index 0, and the mapping of the endpoint (0) to
the interface fails since ep 0 is the ctrl endpoint and thus never is
part of an interface.
This patch fixes this ctrl-req failing by skipping the checkintf call for
USB_RECIP_ENDPOINT ctrl-reqs on the ctrl endpoint.
Reported-by: Dave Stikkolorum <d.r.stikkolorum@hhs.nl> Tested-by: Dave Stikkolorum <d.r.stikkolorum@hhs.nl> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current ST Micro Connect Lite uses the FT4232H hi-speed quad USB
UART FTDI chip. It is also possible to drive STM reference targets
populated with an on-board JTAG debugger based on the FT2232H chip with
the same STMicroelectronics tools.
For this reason, the ST Micro Connect Lite PIDs should be
ST_STMCLT_2232_PID: 0x3746
ST_STMCLT_4232_PID: 0x3747
Signed-off-by: Adrian Thomasset <adrian.thomasset@st.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 85fe402 (fs: do not assign default i_ino in new_inode), the
initialisation of i_ino was removed from new_inode() and pushed down
into the callers. However spufs_new_inode() was not updated.
This exhibits as no files appearing in /spu, because all our dirents
have a zero inode, which readdir() seems to dislike.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In __after_prom_start we copy the kernel down to zero in two calls to
copy_and_flush. After the first call (copy from 0 to copy_to_here:)
we jump to the newly copied code soon after.
Unfortunately there's no isync between the copy of this code and the
jump to it. Hence it's possible that stale instructions could still be
in the icache or pipeline before we branch to it.
We've seen this on real machines and it's results in no console output
after:
calling quiesce...
returning from prom_init
The below adds an isync to ensure that the copy and flushing has
completed before any branching to the new instructions occurs.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is no-one that really require atomic64_t support on sparc32.
But several drivers fails to build without proper atomic64 support.
And for an allyesconfig build for sparc32 this is annoying.
Include the generic atomic64_t support for sparc32.
This has a text footprint cost:
$size vmlinux (before atomic64_t support)
text data bss dec hex filename 3578860 134260 108781 3821901 3a514d vmlinux
$size vmlinux (after atomic64_t support)
text data bss dec hex filename 3579892 130684 108781 3819357 3a475d vmlinux
data decreases - but I fail to explain why!
I have rebuild twice to check my numbers.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path)
added a bug in IP defragmentation handling, as non refcounted
dst could escape an RCU protected section.
Commit 64f3b9e203bd068 (net: ip_expire() must revalidate route) fixed
the case of timeouts, but not the general problem.
Tom Parkin noticed crashes in UDP stack and provided a patch,
but further analysis permitted us to pinpoint the root cause.
Before queueing a packet into a frag list, we must drop its dst,
as this dst has limited lifetime (RCU protected)
When/if a packet is finally reassembled, we use the dst of the very
last skb, still protected by RCU and valid, as the dst of the
reassembled packet.
Use same logic in IPv6, as there is no need to hold dst references.
Reported-by: Tom Parkin <tparkin@katalix.com> Tested-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sizeof() when applied to a pointer typed expression gives the size of the
pointer, not that of the pointed data.
Introduced by commit 3ce5ef(netrom: fix info leak via msg_name in nr_recvmsg)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code in set_orig_addr() does not initialize all of the members of
struct sockaddr_tipc when filling the sockaddr info -- namely the union
is only partly filled. This will make recv_msg() and recv_stream() --
the only users of this function -- leak kernel stack memory as the
msg_name member is a local variable in net/socket.c.
Additionally to that both recv_msg() and recv_stream() fail to update
the msg_namelen member to 0 while otherwise returning with 0, i.e.
"success". This is the case for, e.g., non-blocking sockets. This will
lead to a 128 byte kernel stack leak in net/socket.c.
Fix the first issue by initializing the memory of the union with
memset(0). Fix the second one by setting msg_namelen to 0 early as it
will be updated later if we're going to fill the msg_name member.
Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code in rose_recvmsg() does not initialize all of the members of
struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info.
Nor does it initialize the padding bytes of the structure inserted by
the compiler for alignment. This will lead to leaking uninitialized
kernel stack bytes in net/socket.c.
Fix the issue by initializing the memory used for sockaddr info with
memset(0).
Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case msg_name is set the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of
struct sockaddr_ax25 inserted by the compiler for alignment. Also
the sax25_ndigis member does not get assigned, leaking four more
bytes.
Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.
Fix both issues by initializing the memory with memset(0).
Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For stream sockets the code misses to update the msg_namelen member
to 0 and therefore makes net/socket.c leak the local, uninitialized
sockaddr_storage variable to userland -- 128 bytes of kernel stack
memory. The msg_namelen update is also missing for datagram sockets
in case the socket is shutting down during receive.
Fix both issues by setting msg_namelen to 0 early. It will be
updated later if we're going to fill the msg_name member.
Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.
Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about iucv_sock_recvmsg() not filling the msg_name in case it was set.
Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.
Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about irda_recvmsg_dgram() not filling the msg_name in case it was
set.
Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.
Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about caif_seqpkt_recvmsg() not filling the msg_name in case it was
set.
Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If RFCOMM_DEFER_SETUP is set in the flags, rfcomm_sock_recvmsg() returns
early with 0 without updating the possibly set msg_namelen member. This,
in turn, leads to a 128 byte kernel stack leak in net/socket.c.
Fix this by updating msg_namelen in this case. For all other cases it
will be handled in bt_sock_stream_recvmsg().
Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case the socket is already shutting down, bt_sock_recvmsg() returns
with 0 without updating msg_namelen leading to net/socket.c leaking the
local, uninitialized sockaddr_storage variable to userland -- 128 bytes
of kernel stack memory.
Fix this by moving the msg_namelen assignment in front of the shutdown
test.
Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When msg_namelen is non-zero the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of struct
sockaddr_ax25 inserted by the compiler for alignment. Additionally the
msg_namelen value is updated to sizeof(struct full_sockaddr_ax25) but is
not always filled up to this size.
Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.
Fix both issues by initializing the memory with memset(0).
Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.
Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about vcc_recvmsg() not filling the msg_name in case it was set.
Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 257b5358b32f ("scm: Capture the full credentials of the scm
sender") changed the credentials passing code to pass in the effective
uid/gid instead of the real uid/gid.
Obviously this doesn't matter most of the time (since normally they are
the same), but it results in differences for suid binaries when the wrong
uid/gid ends up being used.
This just undoes that (presumably unintentional) part of the commit.
Reported-by: Andy Lutomirski <luto@amacapital.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: David S. Miller <davem@davemloft.net> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd090dfc634d (tcp: tcp_replace_ts_recent() should not be called
from tcp_validate_incoming()) introduced a TS ecr bug in slow path
processing.
1 A > B P. 1:10001(10000) ack 1 <nop,nop,TS val 1001 ecr 200>
2 B < A . 1:1(0) ack 1 win 257 <sack 9001:10001,TS val 300 ecr 1001>
3 A > B . 1:1001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>
4 A > B . 1001:2001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>
(ecr 200 should be ecr 300 in packets 3 & 4)
Problem is tcp_ack() can trigger send of new packets (retransmits),
reflecting the prior TSval, instead of the TSval contained in the
currently processed incoming packet.
Fix this by calling tcp_replace_ts_recent() from tcp_ack() after the
checks, but before the actions.
Reported-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For sensitive data like keying material, it is common practice to zero
out keys before returning the memory back to the allocator. Thus, use
kzfree instead of kfree.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a bug in cookie_v4_check (net/ipv4/syncookies.c):
flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk),
RT_SCOPE_UNIVERSE, IPPROTO_TCP,
inet_sk_flowi_flags(sk),
(opt && opt->srr) ? opt->faddr : ireq->rmt_addr,
ireq->loc_addr, th->source, th->dest);
Here we do not respect sk->sk_bound_dev_if, therefore wrong dst_entry may be
taken. This dst_entry is used by new socket (get_cookie_sock ->
tcp_v4_syn_recv_sock), so its packets may take the wrong path.
Signed-off-by: Dmitry Popov <dp@highloadlab.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
to reset nf_trace in nf_reset(). This is wrong and unnecessary.
nf_reset() is used in the following cases:
- when passing packets up the the socket layer, at which point we want to
release all netfilter references that might keep modules pinned while
the packet is queued. nf_trace doesn't matter anymore at this point.
- when encapsulating or decapsulating IPsec packets. We want to continue
tracing these packets after IPsec processing.
- when passing packets through virtual network devices. Only devices on
that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
used anymore. Its not entirely clear whether those packets should
be traced after that, however we've always done that.
- when passing packets through virtual network devices that make the
packet cross network namespace boundaries. This is the only cases
where we clearly want to reset nf_trace and is also what the
original patch intended to fix.
Add a new function nf_reset_trace() and use it in dev_forward_skb() to
fix this properly.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was reported that the following LSB test case failed
https://lsbbugs.linuxfoundation.org/attachment.cgi?id=2144 because we
were not coallescing unix stream messages when the application was
expecting us to.
The problem was that the first send was before the socket was accepted
and thus sock->sk_socket was NULL in maybe_add_creds, and the second
send after the socket was accepted had a non-NULL value for sk->socket
and thus we could tell the credentials were not needed so we did not
bother.
The unnecessary credentials on the first message cause
unix_stream_recvmsg to start verifying that all messages had the same
credentials before coallescing and then the coallescing failed because
the second message had no credentials.
Ignoring credentials when we don't care in unix_stream_recvmsg fixes a
long standing pessimization which would fail to coallesce messages when
reading from a unix stream socket if the senders were different even if
we did not care about their credentials.
I have tested this and verified that the in the LSB test case mentioned
above that the messages do coallesce now, while the were failing to
coallesce without this change.
Reported-by: Karel Srot <ksrot@redhat.com> Reported-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While enslaving a new device and after IFF_BONDING flag is set, in case
of failure it is not stripped from the device's priv_flags while
cleaning up, which could lead to other problems.
Cleaning at err_close because the flag is set after dev_open().
v2: no change
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices. In
some cases (bond/team) a single address list is synched down
to multiple devices. At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.
Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo)
interface. After down-up, routes of other interface's IPv6 addresses through
'lo' are lost.
IPv6 addresses assigned to all interfaces are routed through 'lo' for internal
communication. Once 'lo' is down, those routing entries are removed from routing
table. But those removed entries are not being re-created properly when 'lo' is
brought up. So IPv6 addresses of other interfaces becomes unreachable from the
same machine. Also this breaks communication with other machines because of
NDISC packet processing failure.
This patch fixes this issue by reading all interface's IPv6 addresses and adding
them to IPv6 routing table while bringing up 'lo'.
==Testing==
Before applying the patch:
$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
2000::20/128 :: Un 0 1 0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$
After applying the patch:
$ route -A inet6
Kernel IPv6 routing
table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
2000::20/128 :: Un 0 1 0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
2000::20/128 :: Un 0 1 0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$
Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com> Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
currently cbq works incorrectly for limits > 10% real link bandwidth,
and practically does not work for limits > 50% real link bandwidth.
Below are results of experiments taken on 1 Gbit link
In shaper | Actual Result
-----------+---------------
100M | 108 Mbps
200M | 244 Mbps
300M | 412 Mbps
500M | 893 Mbps
This happen because of q->now changes incorrectly in cbq_dequeue():
when it is called before real end of packet transmitting,
L2T is greater than real time delay, q_now gets an extra boost
but never compensate it.
To fix this problem we prevent change of q->now until its synchronization
with real time.
Signed-off-by: Vasily Averin <vvs@openvz.org> Reviewed-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As reported by Dave Kleikamp, when we emit cross calls to do batched
TLB flush processing we have a race because we do not synchronize on
the sibling cpus completing the cross call.
So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
and either flushes are missed or flushes will flush the wrong
addresses.
Fix this by using generic infrastructure to synchonize on the
completion of the cross call.
This first required getting the flush_tlb_pending() call out from
switch_to() which operates with locks held and interrupts disabled.
The problem is that smp_call_function_many() cannot be invoked with
IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().
We get the batch processing outside of locked IRQ disabled sections by
using some ideas from the powerpc port. Namely, we only batch inside
of arch_{enter,leave}_lazy_mmu_mode() calls. If we're not in such a
region, we flush TLBs synchronously.
1) Get rid of xcall_flush_tlb_pending and per-cpu type
implementations.
a) Add 'active' member to struct tlb_batch
b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
c) Set 'active' in arch_enter_lazy_mmu_mode()
d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
e) Check 'active' in tlb_batch_add_one() and do a synchronous
flush if it's clear.
4) Add infrastructure for synchronous TLB page flushes.
a) Implement __flush_tlb_page and per-cpu variants, patch
as needed.
b) Likewise for xcall_flush_tlb_page.
c) Implement smp_flush_tlb_page() to invoke the cross-call.
d) Wire up global_flush_tlb_page() to the right routine based
upon CONFIG_SMP
5) It turns out that singleton batches are very common, 2 out of every
3 batch flushes have only a single entry in them.
The batch flush waiting is very expensive, both because of the poll
on sibling cpu completeion, as well as because passing the tlb batch
pointer to the sibling cpus invokes a shared memory dereference.
Therefore, in flush_tlb_pending(), if there is only one entry in
the batch perform a completely asynchronous global_flush_tlb_page()
instead.
Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit b0de59b5733d ("TTY: do not update atime/mtime on read/write")
we removed timestamps from tty inodes to fix a security issue and waited
if something breaks. Well, 'w', the utility to find out logged users
and their inactivity time broke. It shows that users are inactive since
the time they logged in.
To revert to the old behaviour while still preventing attackers to
guess the password length, we update the timestamps in one-minute
intervals by this patch.
On http://vladz.devzero.fr/013_ptmx-timing.php, we can see how to find
out length of a password using timestamps of /dev/ptmx. It is
documented in "Timing Analysis of Keystrokes and Timing Attacks on
SSH". To avoid that problem, do not update time when reading
from/writing to a TTY.
I am afraid of regressions as this is a behavior we have since 0.97
and apps may expect the time to be current, e.g. for monitoring
whether there was a change on the TTY. Now, there is no change. So
this would better have a lot of testing before it goes upstream.
While trying to track down a tree log replay bug I noticed that fsck was always
complaining about nbytes not being right for our fsynced file. That is because
the new fsync stuff doesn't wait for ordered extents to complete, so the inodes
nbytes are not necessarily updated properly when we log it. So to fix this we
need to set nbytes to whatever it is on the inode that is on disk, so when we
replay the extents we can just add the bytes that are being added as we replay
the extent. This makes it work for the case that we have the wrong nbytes or
the case that we logged everything and nbytes is actually correct. With this
I'm no longer getting nbytes errors out of btrfsck.
Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com> Signed-off-by: Lingzhu Xiang <lxiang@redhat.com> Reviewed-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is my example conversion of a few existing mmap users. The mtdchar
case is actually disabled right now (and stays disabled), but I did it
because it showed up on my "git grep", and I was familiar with the code
due to fixing an overflow problem in the code in commit 9c603e53d380
("mtdchar: fix offset overflow detection").
This is my example conversion of a few existing mmap users. The HPET
case is simple, widely available, and easy to test (Clemens Ladisch sent
a trivial test-program for it).
This is my example conversion of a few existing mmap users. The
fb_mmap() case is a good example because it is a bit more complicated
than some: fb_mmap() mmaps one of two different memory areas depending
on the page offset of the mmap (but happily there is never any mixing of
the two, so the helper function still works).