During hours of testing, one crashed with weird errors and while I have
no direct evidence, I suspect something like the race above happened.
This patch extends the page lock to being held until the pmd_numa is
cleared to prevent migration starting in parallel while the pmd_numa is
being cleared. It also flushes the old pmd entry and orders pagetable
insertion before rmap insertion.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-9-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mel Gorman [Mon, 7 Oct 2013 10:28:45 +0000 (11:28 +0100)]
mm: numa: Sanitize task_numa_fault() callsites
There are three callers of task_numa_fault():
- do_huge_pmd_numa_page():
Accounts against the current node, not the node where the
page resides, unless we migrated, in which case it accounts
against the node we migrated to.
- do_numa_page():
Accounts against the current node, not the node where the
page resides, unless we migrated, in which case it accounts
against the node we migrated to.
- do_pmd_numa_page():
Accounts not at all when the page isn't migrated, otherwise
accounts against the node we migrated towards.
This seems wrong to me; all three sites should have the same
sementaics, furthermore we should accounts against where the page
really is, we already know where the task is.
So modify all three sites to always account; we did after all receive
the fault; and always account to where the page is after migration,
regardless of success.
They all still differ on when they clear the PTE/PMD; ideally that
would get sorted too.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-8-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mel Gorman [Mon, 7 Oct 2013 10:28:44 +0000 (11:28 +0100)]
mm: Prevent parallel splits during THP migration
THP migrations are serialised by the page lock but on its own that does
not prevent THP splits. If the page is split during THP migration then
the pmd_same checks will prevent page table corruption but the unlock page
and other fix-ups potentially will cause corruption. This patch takes the
anon_vma lock to prevent parallel splits during migration.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-7-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mel Gorman [Mon, 7 Oct 2013 10:28:43 +0000 (11:28 +0100)]
mm: Wait for THP migrations to complete during NUMA hinting faults
The locking for migrating THP is unusual. While normal page migration
prevents parallel accesses using a migration PTE, THP migration relies on
a combination of the page_table_lock, the page lock and the existance of
the NUMA hinting PTE to guarantee safety but there is a bug in the scheme.
If a THP page is currently being migrated and another thread traps a
fault on the same page it checks if the page is misplaced. If it is not,
then pmd_numa is cleared. The problem is that it checks if the page is
misplaced without holding the page lock meaning that the racing thread
can be migrating the THP when the second thread clears the NUMA bit
and faults a stale page.
This patch checks if the page is potentially being migrated and stalls
using the lock_page if it is potentially being migrated before checking
if the page is misplaced or not.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-6-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sun, 27 Oct 2013 17:45:00 +0000 (10:45 -0700)]
Merge branch 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
"This is a 2-line patch to save the CPU register which holds our task
thread info pointer before calling a firmware function and then to
restore it again afterwards.
This is necessary because on some 64bit machines the high-order 32bits
are being clobbered by the firmware call, and thus we failed to bring
up secondary CPUs (and instead crashed the kernel) in some situations
eg if we had more than 4GB RAM. This patch fixes a bug which has been
since ever in the parisc linux kernel and which prevented some people
to use a 64bit kernel"
* 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Do not crash 64bit SMP kernels on machines with >= 4GB RAM
Linus Torvalds [Sun, 27 Oct 2013 17:28:35 +0000 (10:28 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"The tree contains three fixes:
- Two tooling fixes
- Reversal of the new 'MMAP2' extended mmap record ABI, introduced in
this merge window. (Patches were proposed to fix it but it was all
a bit late and we felt it's safer to just delay the ABI one more
kernel release and do it right)"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Disable PERF_RECORD_MMAP2 support
perf scripting perl: Fix build error on Fedora 12
perf probe: Fix to initialize fname always before use it
Linus Torvalds [Sun, 27 Oct 2013 17:18:15 +0000 (10:18 -0700)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
"This tree fixes a boot crash in CONFIG_DEBUG_MUTEXES=y kernels, on
kernels built with GCC 3.x (there are still such distros)"
Side note: it's not just a fix for old gcc versions, it's also removing
an incredibly broken/subtle check that LLVM had issues with, and that
made no sense.
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mutex: Avoid gcc version dependent __builtin_constant_p() usage
Pull SCSI target fixes from Nicholas Bellinger:
"Here are the outstanding target pending fixes for v3.12-rc7.
This includes a number of EXTENDED_COPY related fixes as a result of
Thomas and Doug's continuing testing and feedback.
Also included is an important vhost/scsi fix that addresses a long
standing issue where the 'write' parameter for get_user_pages_fast()
was incorrectly set for virtio-scsi WRITEs -> DMA_TO_DEVICE, and not
for virtio-scsi READs -> DMA_FROM_DEVICE.
This resulted in random userspace segfaults and other unpleasantness
on KVM host, and unfortunately has been an issue since the initial
merge of vhost/scsi in v3.6. This patch is CC'ed to stable, along
with two other less critical items"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
vhost/scsi: Fix incorrect usage of get_user_pages_fast write parameter
target/pscsi: fix return value check
target: Fail XCOPY for non matching source + destination block_size
target: Generate failure for XCOPY I/O with non-zero scsi_status
target: Add missing XCOPY I/O operation sense_buffer
iser-target: check device before dereferencing its variable
target: Return an error for WRITE SAME with ANCHOR==1
target: Fix assignment of LUN in tracepoints
target: Reject EXTENDED_COPY when emulate_3pc is disabled
target: Allow non zero ListID in EXTENDED_COPY parameter list
target: Make target_do_xcopy failures return INVALID_PARAMETER_LIST
Linus Torvalds [Sun, 27 Oct 2013 17:13:03 +0000 (10:13 -0700)]
Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dmaengine fixes from Vinod Koul:
"Here is the late fixes pull request for dmaengine while you fly back
from KS.
We have a new dmaengine ML hosted by vger so a patch for that along
with addition of Dave as driver mainatainer for ioat. Other fixes are
memeory leak fixes on edma driver, small fixes on rcar-hpbdma driver
by Sergei"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: edma: fix another memory leak
dma: edma: Fix memory leak
MAINTAINERS: add to ioatdma maintainer list
MAINTAINERS: add the new dmaengine mailing list
Helge Deller [Sat, 26 Oct 2013 21:19:25 +0000 (23:19 +0200)]
parisc: Do not crash 64bit SMP kernels on machines with >= 4GB RAM
Since the beginning of the parisc-linux port, sometimes 64bit SMP kernels were
not able to bring up other CPUs than the monarch CPU and instead crashed the
kernel. The reason was unclear, esp. since it involved various machines (e.g.
J5600, J6750 and SuperDome). Testing showed, that those crashes didn't happened
when less than 4GB were installed, or if a 32bit Linux kernel was booted.
In the end, the fix for those SMP problems is trivial:
During the early phase of the initialization of the CPUs, including the monarch
CPU, the PDC_PSW firmware function to enable WIDE (=64bit) mode is called.
It's documented that this firmware function may clobber various registers, and
one one of those possibly clobbered registers is %cr30 which holds the task
thread info pointer.
Now, if %cr30 would always have been clobbered, then this bug would have been
detected much earlier. But lots of testing finally showed, that - at least for
%cr30 - on some machines only the upper 32bits of the 64bit register suddenly
turned zero after the firmware call.
So, after finding the root cause, the explanation for the various crashes
became clear:
- On 32bit SMP Linux kernels all upper 32bit were zero, so we didn't faced this
problem.
- Monarch CPUs in 64bit mode always booted sucessfully, because the inital task
thread info pointer was below 4GB.
- Secondary CPUs booted sucessfully on machines with less than 4GB RAM because
the upper 32bit were zero anyay.
- Secondary CPus failed to boot if we had more than 4GB RAM and the task thread
info pointer was located above the 4GB boundary.
Finally, the patch to fix this problem is trivial by saving the %cr30 register
before the firmware call and restoring it afterwards.
Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: <stable@vger.kernel.org> # 2.6.12+ Signed-off-by: Helge Deller <deller@gmx.de>
Linus Torvalds [Sat, 26 Oct 2013 03:38:47 +0000 (04:38 +0100)]
Merge tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from
"These fix two bugs in the intel_pstate driver, a hibernate bug leading
to nasty resume failures sometimes and acpi-cpufreq initialization bug
that causes problems to happen during module unload when intel_pstate
is in use.
Specifics:
- Fix for rounding errors in intel_pstate causing CPU utilization to
be underestimated from Brennan Shacklett.
- intel_pstate fix to always use the correct max pstate value when
computing the min pstate from Dirk Brandewie.
- Hibernation fix for deadlocking resume in cases when the probing of
the device containing the image is deferred from Russ Dill.
- acpi-cpufreq fix to prevent the module from staying in memory when
the driver cannot be registered and then attempting to unregister
things that have never been registered on exit"
* tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
acpi-cpufreq: Fail initialization if driver cannot be registered
PM / hibernate: Move software_resume to late_initcall_sync
intel_pstate: Correct calculation of min pstate value
intel_pstate: Improve accuracy by not truncating until final result
Linus Torvalds [Fri, 25 Oct 2013 19:15:13 +0000 (20:15 +0100)]
Merge tag 'for-linus-20131025' of git://git.infradead.org/linux-mtd
Pull final mtd fixes from Brian Norris:
"A few more last-minute regression fixes, prepared jointly by me and
David Woodhouse:
- Revert pxa3xx to its old name to avoid breaking existing
'mtdparts=' boot strings.
- Return GPMI NAND to its legacy ECC layout for backwards
compatibility. We will revisit this in 3.13.
A note from David on the latter fix: 'This leaves a harmless cosmetic
warning about an unused function. At this point in the cycle I really
don't care.'"
* tag 'for-linus-20131025' of git://git.infradead.org/linux-mtd:
mtd: gpmi: fix ECC regression
mtd: nand: pxa3xx: Fix registered MTD name
vhost/scsi: Fix incorrect usage of get_user_pages_fast write parameter
This patch addresses a long-standing bug where the get_user_pages_fast()
write parameter used for setting the underlying page table entry permission
bits was incorrectly set to write=1 for data_direction=DMA_TO_DEVICE, and
passed into get_user_pages_fast() via vhost_scsi_map_iov_to_sgl().
However, this parameter is intended to signal WRITEs to pinned userspace
PTEs for the virtio-scsi DMA_FROM_DEVICE -> READ payload case, and *not*
for the virtio-scsi DMA_TO_DEVICE -> WRITE payload case.
This bug would manifest itself as random process segmentation faults on
KVM host after repeated vhost starts + stops and/or with lots of vhost
endpoints + LUNs.
Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Asias He <asias@redhat.com> Cc: <stable@vger.kernel.org> # 3.6+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Wei Yongjun [Fri, 25 Oct 2013 13:53:33 +0000 (21:53 +0800)]
target/pscsi: fix return value check
In case of error, the function scsi_host_lookup() returns NULL
pointer not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with NULL test.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Linus Torvalds [Fri, 25 Oct 2013 17:16:47 +0000 (18:16 +0100)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes (try two) from Al Viro:
"nfsd performance regression fix + seq_file lseek(2) fix"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
seq_file: always update file->f_pos in seq_lseek()
nfsd regression since delayed fput()
David Woodhouse [Fri, 25 Oct 2013 14:03:59 +0000 (15:03 +0100)]
mtd: gpmi: fix ECC regression
The "legacy" ECC layout used until 3.12-rc1 uses all the OOB area by
computing the ECC strength and ECC step size ourselves.
Commit 2febcdf84b ("mtd: gpmi: set the BCHs geometry with the ecc info")
makes the driver use the ECC info (ECC strength and ECC step size)
provided by the MTD code, and creates a different NAND ECC layout
for the BCH, and use the new ECC layout. This causes a regression:
We can not mount the ubifs which was created by the old NAND ECC layout.
This patch fixes this issue by reverting to the legacy ECC layout.
We will probably introduce a new device-tree property to indicate that
the new ECC layout can be used. For now though, for the imminent 3.12
release, we just unconditionally revert to the 3.11 behaviour.
This leaves a harmless cosmetic warning about an unused function. At
this point in the cycle I really don't care.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Acked-by: Huang Shijie <b32955@freescale.com> Acked-by: Marek Vasut <marex@denx.de> Tested-by: Marek Vasut <marex@denx.de>
Gu Zheng [Fri, 25 Oct 2013 10:15:06 +0000 (18:15 +0800)]
seq_file: always update file->f_pos in seq_lseek()
This issue was first pointed out by Jiaxing Wang several months ago, but no
further comments:
https://lkml.org/lkml/2013/6/29/41
As we know pread() does not change f_pos, so after pread(), file->f_pos
and m->read_pos become different. And seq_lseek() does not update file->f_pos
if offset equals to m->read_pos, so after pread() and seq_lseek()(lseek to
m->read_pos), then a subsequent read may read from a wrong position, the
following program produces the problem:
char str1[32] = { 0 };
char str2[32] = { 0 };
int poffset = 10;
int count = 20;
/*open any seq file*/
int fd = open("/proc/modules", O_RDONLY);
acpi-cpufreq: Fail initialization if driver cannot be registered
Make acpi_cpufreq_init() return error codes when the driver cannot be
registered so that the module doesn't stay useless in memory and so
that acpi_cpufreq_exit() doesn't attempt to unregister things that
have never been registered when the module is unloaded.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Linus Torvalds [Fri, 25 Oct 2013 10:49:23 +0000 (11:49 +0100)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"There's really only one bugfix in this branch, which is a fix for
timers on the integrator platform. Since Linus Walleij is
resurrecting support for the platform it seems valuable to get the fix
into 3.12 even though the regression has been around a while.
The rest are a handful of maintainers updates. If you prefer to hold
those until 3.13 then just merge the first patch on the branch which
is the fix"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
MAINTAINERS: Add maintainers entry for Rockchip SoCs
MAINTAINERS: Tegra updates, and driver ownership
MAINTAINERS: ARM: mvebu: add Sebastian Hesselbarth
ARM: integrator: deactivate timer0 on the Integrator/CP
Linus Torvalds [Fri, 25 Oct 2013 06:32:01 +0000 (07:32 +0100)]
Merge tag 'ecryptfs-3.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fixes from Tyler Hicks:
"Two important fixes
- Fix long standing memory leak in the (rarely used) public key
support
- Fix large file corruption on 32 bit architectures"
* tag 'ecryptfs-3.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: fix 32 bit corruption issue
ecryptfs: Fix memory leakage in keystore.c
Russ Dill [Thu, 24 Oct 2013 13:25:26 +0000 (14:25 +0100)]
PM / hibernate: Move software_resume to late_initcall_sync
software_resume is being called after deferred_probe_initcall in
drivers base. If the probing of the device that contains the resume
image is deferred, and the system has been instructed to wait for
it to show up, this wait will occur in software_resume. This causes
a deadlock.
Move software_resume into late_initcall_sync so that it happens
after all the other late_initcalls.
Signed-off-by: Russ Dill <Russ.Dill@ti.com> Acked-by: Pavel Machek <Pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There's no advantage in using a hardcoded name for the mtd device.
Instead use the provided by the platform_device.
The MTD name was changed to use the one provided by the platform_device.
However, this can be problematic as some users want to set partitions
using the kernel parameter 'mtdparts', where the name is needed.
Therefore, to avoid regressions in users relying in 'mtdparts' we revert
the change and use the previous one 'pxa3xx_nand-0'.
While at it, let's put a big comment and prevent this change from happening
ever again.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com> Reported-by: Lars Duesing <lars.duesing@camelotsweb.de> Cc: stable@vger.kernel.org # v3.11+ Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
target: Fail XCOPY for non matching source + destination block_size
This patch adds an explicit check + failure for XCOPY I/O to source +
destination devices with a non-matching block_size.
This limitiation is currently due to the fact that the scatterlist
memory allocated for the XCOPY READ operation is passed zero-copy
to the XCOPY WRITE operation.
Reported-by: Thomas Glanzmann <thomas@glanzmann.de> Reported-by: Douglas Gilbert <dgilbert@interlog.com> Cc: Thomas Glanzmann <thomas@glanzmann.de> Cc: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
target: Generate failure for XCOPY I/O with non-zero scsi_status
This patch adds the missing non-zero se_cmd->scsi_status check required
for local XCOPY I/O within target_xcopy_issue_pt_cmd() to signal an
exception case failure.
This will trigger the generation of SAM_STAT_CHECK_CONDITION status
from within target_xcopy_do_work() process context code.
Reported-by: Thomas Glanzmann <thomas@glanzmann.de> Reported-by: Douglas Gilbert <dgilbert@interlog.com> Cc: Thomas Glanzmann <thomas@glanzmann.de> Cc: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch adds the missing xcopy_pt_cmd->sense_buffer[] required for
correctly handling CHECK_CONDITION exceptions within the locally
generated XCOPY I/O path.
Also update target_xcopy_read_source() + target_xcopy_setup_pt_cmd()
to pass this buffer into transport_init_se_cmd() to correctly setup
se_cmd->sense_buffer.
Reported-by: Thomas Glanzmann <thomas@glanzmann.de> Reported-by: Douglas Gilbert <dgilbert@interlog.com> Cc: Thomas Glanzmann <thomas@glanzmann.de> Cc: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Linus Torvalds [Thu, 24 Oct 2013 06:45:34 +0000 (07:45 +0100)]
Merge tag 'md/3.12-fixes' of git://neil.brown.name/md
Pull md bugfixes from Neil Brown:
"Assorted md bug-fixes for 3.12.
All tagged for -stable releases too"
* tag 'md/3.12-fixes' of git://neil.brown.name/md:
raid5: avoid finding "discard" stripe
raid5: set bio bi_vcnt 0 for discard request
md: avoid deadlock when md_set_badblocks.
md: Fix skipping recovery for read-only arrays.
Linus Torvalds [Thu, 24 Oct 2013 06:44:47 +0000 (07:44 +0100)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of two fixes which cause oopses (Buslogic, qla2xxx) and
one fix which may cause a hang because of request miscounting (sd)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
[SCSI] sd: call blk_pm_runtime_init before add_disk
[SCSI] qla2xxx: Fix request queue null dereference.
[SCSI] BusLogic: Fix an oops when intializing multimaster adapter
Vu Pham [Mon, 21 Oct 2013 21:48:54 +0000 (00:48 +0300)]
iser-target: check device before dereferencing its variable
This patch changes isert_connect_release() to correctly check for
the existence struct isert_device *device before checking for
isert_device->use_frwr.
Signed-off-by: Vu Pham <vu@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Shaohua Li [Sat, 19 Oct 2013 06:51:42 +0000 (14:51 +0800)]
raid5: avoid finding "discard" stripe
SCSI discard will damage discard stripe bio setting, eg, some fields are
changed. If the stripe is reused very soon, we have wrong bios setting. We
remove discard stripe from hash list, so next time the strip will be fully
initialized.
Suitable for backport to 3.7+.
Cc: <stable@vger.kernel.org> (3.7+) Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
Shaohua Li [Sat, 19 Oct 2013 06:50:28 +0000 (14:50 +0800)]
raid5: set bio bi_vcnt 0 for discard request
SCSI layer will add new payload for discard request. If two bios are merged
to one, the second bio has bi_vcnt 1 which is set in raid5. This will confuse
SCSI and cause oops.
Suitable for backport to 3.7+
Cc: stable@vger.kernel.org (v3.7+) Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Bian Yu [Sat, 12 Oct 2013 05:10:03 +0000 (01:10 -0400)]
md: avoid deadlock when md_set_badblocks.
When operate harddisk and hit errors, md_set_badblocks is called after
scsi_restart_operations which already disabled the irq. but md_set_badblocks
will call write_sequnlock_irq and enable irq. so softirq can preempt the
current thread and that may cause a deadlock. I think this situation should
use write_sequnlock_irqsave/irqrestore instead.
This bug was introduce in commit 2e8ac30312973dd20e68073653
(the first time rdev_set_badblock was call from interrupt context),
so this patch is appropriate for 3.5 and subsequent kernels.
spares are activated on a read-only array. In case of raid1 and raid10
personalities it causes that not-in-sync devices are marked in-sync
without checking if recovery has been finished.
If a read-only array is degraded and one of its devices is not in-sync
(because the array has been only partially recovered) recovery will be skipped.
This patch adds checking if recovery has been finished before marking a device
in-sync for raid1 and raid10 personalities. In case of raid5 personality
such condition is already present (at raid5.c:6029).
Bug was introduced in 3.10 and causes data corruption.
Let's say the disk_events_workfn() calls sd_check_events() which tries
to send test_unit_ready() and because of sd_revalidate_disk() trying to
send another commands the test_unit_ready() might be re-queued as the
tagged command queuing is disabled.
The problem is, the test_unit_ready request doesn't get counted the
first time it is queued, so the later decrement of q->nr_pending in
blk_pm_requeue_request makes it unbalanced.
Fix this by calling blk_pm_runtime_init before add_disk so that all
requests initiated there will all be counted.
Signed-off-by: Aaron Lu <aaron.lu@intel.com> Reported-and-tested-by: Sujit Reddy Thumma <sthumma@codeaurora.org> Cc: stable@vger.kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>
If an invalid IOCB is returned on the response queue then the index into the
request queue map could be invalid and could return to us a bogus value. This
could cause us to try to deference an invalid pointer and cause an exception.
If we encounter this condition, simply return as no context can be established
for this response.
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Thomas Gleixner [Tue, 24 Sep 2013 19:50:23 +0000 (21:50 +0200)]
clockevents: Sanitize ticks to nsec conversion
Marc Kleine-Budde pointed out, that commit 77cc982 "clocksource: use
clockevents_config_and_register() where possible" caused a regression
for some of the converted subarchs.
The reason is, that the clockevents core code converts the minimal
hardware tick delta to a nanosecond value for core internal
usage. This conversion is affected by integer math rounding loss, so
the backwards conversion to hardware ticks will likely result in a
value which is less than the configured hardware limitation. The
affected subarchs used their own workaround (SIGH!) which got lost in
the conversion.
The solution for the issue at hand is simple: adding evt->mult - 1 to
the shifted value before the integer divison in the core conversion
function takes care of it. But this only works for the case where for
the scaled math mult/shift pair "mult <= 1 << shift" is true. For the
case where "mult > 1 << shift" we can apply the rounding add only for
the minimum delta value to make sure that the backward conversion is
not less than the given hardware limit. For the upper bound we need to
omit the rounding add, because the backwards conversion is always
larger than the original latch value. That would violate the upper
bound of the hardware device.
Though looking closer at the details of that function reveals another
bogosity: The upper bounds check is broken as well. Checking for a
resulting "clc" value greater than KTIME_MAX after the conversion is
pointless. The conversion does:
u64 clc = (latch << evt->shift) / evt->mult;
So there is no sanity check for (latch << evt->shift) exceeding the
64bit boundary. The latch argument is "unsigned long", so on a 64bit
arch the handed in argument could easily lead to an unnoticed shift
overflow. With the above rounding fix applied the calculation before
the divison is:
u64 clc = (latch << evt->shift) + evt->mult - 1;
So we need to make sure, that neither the shift nor the rounding add
is overflowing the u64 boundary.
[ukl: move assignment to rnd after eventually changing mult, fix build
issue and correct comment with the right math]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: nicolas.ferre@atmel.com Cc: Marc Pignat <marc.pignat@hevs.ch> Cc: john.stultz@linaro.org Cc: kernel@pengutronix.de Cc: Ronald Wahl <ronald.wahl@raritan.com> Cc: LAK <linux-arm-kernel@lists.infradead.org> Cc: Ludovic Desroches <ludovic.desroches@atmel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1380052223-24139-1-git-send-email-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Linus Torvalds [Wed, 23 Oct 2013 07:10:25 +0000 (08:10 +0100)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Several last minute bug fixes.
Two of them are on the larger side for rc7, the dasd format patch for
older storage devices and the store-clock-fast patch where we have
been to optimistic with an optimization"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/time: correct use of store clock fast
s390/vmlogrdr: fix array access in vmlogrdr_open()
s390/compat,signal: fix return value of copy_siginfo_(to|from)_user32()
s390/dasd: check for availability of prefix command during format
s390/mm,kvm: fix software dirty bits vs. kvm for old machines
Linus Torvalds [Wed, 23 Oct 2013 06:58:22 +0000 (07:58 +0100)]
Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management fixes from Zhang Rui:
"These includes several commits that are necessary to properly fix
regression for TMU test MUX address setting after reset, for exynos
thermal driver.
Specifics:
- fix a regression that the removal of setting a certain field at TMU
configuration setting results in immediately shutdown after reset
on Exynos4412 SoC.
- revert a patch which tries to link the thermal_zone device and its
hwmon node but breaks libsensors.
- fix a deadlock/lockdep warning issue in x86_pkg_temp thermal
driver, which can be reproduced on a buggy platform only.
- fix ti-soc-thermal driver to fall back on bandgap reading when
reading from PCB temperature sensor fails"
* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
Revert "drivers: thermal: parent virtual hwmon with thermal zone"
drivers: thermal: allow ti-soc-thermal run without pcb zone
thermal: exynos: Provide initial setting for TMU's test MUX address at Exynos4412
thermal: exynos: Provide separate TMU data for Exynos4412
thermal: exynos: Remove check for thermal device pointer at exynos_report_trigger()
Thermal: x86_pkg_temp: change spin lock
Linus Torvalds [Wed, 23 Oct 2013 06:52:36 +0000 (07:52 +0100)]
Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- Compilation fixes for GCC < 4.4.6
- one Kbuild dependency select fix (selecting videobuf on msi3101)
- driver fixes on tda10071, e4000, msi3101, soc_camera, s5p-jpeg,
saa7134 and adv7511
- some device quirks needed to make them work properly
- some videobuf2 core regression fixes for some features used only on
embedded drivers
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] saa7134: Fix crash when device is closed before streamoff
[media] adv7511: fix error return code in adv7511_probe()
[media] ths8200: fix compilation with GCC < 4.4.6
[media] ad9389b: fix compilation with GCC < 4.4.6
[media] adv7511: fix compilation with GCC < 4.4.6
[media] adv7842: fix compilation with GCC < 4.4.6
[media] s5p-jpeg: Initialize vfd_decoder->vfl_dir field
[media] videobuf2-dc: Fix support for mappings without struct page in userptr mode
[media] vb2: Allow queuing OUTPUT buffers with zeroed 'bytesused'
[media] mx3-camera: locking cleanup in mx3_videobuf_queue()
[media] sh_vou: almost forever loop in sh_vou_try_fmt_vid_out()
[media] tda10071: change firmware download condition
[media] msi3101: correct max videobuf2 alloc
[media] Add HCL T12Rg-H to STK webcam upside-down table
[media] msi3101: Kconfig select VIDEOBUF2_VMALLOC
[media] msi3101: msi3101_ioctl_ops can be static
[media] e4000: fix PLL calc bug on 32-bit arch
[media] uvcvideo: quirk PROBE_DEF for Microsoft Lifecam NX-3000
[media] uvcvideo: quirk PROBE_DEF for Dell SP2008WFP monitor
Linus Torvalds [Wed, 23 Oct 2013 06:51:25 +0000 (07:51 +0100)]
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband bugfix from Roland Dreier:
"Disable not-quite-ready userspace ABI for IB flow steering"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/core: Temporarily disable create_flow/destroy_flow uverbs
Pull networking fixes from David Miller:
"Sorry I let so much accumulate, I was in Buffalo and wanted a few
things to cook in my tree for a while before sending to you. Anyways,
it's a lot of little things as usual at this stage in the game"
1) Make bonding MAINTAINERS entry reflect reality, from Andy
Gospodarek.
2) Fix accidental sock_put() on timewait mini sockets, from Eric
Dumazet.
3) Fix crashes in l2tp due to mis-handling of ipv4 mapped ipv6
addresses, from François CACHEREUL.
4) Fix heap overflow in __audit_sockaddr(), from the eagle eyed Dan
Carpenter.
5) tcp_shifted_skb() doesn't take handle FINs properly, from Eric
Dumazet.
6) SFC driver bug fixes from Ben Hutchings.
7) Fix TX packet scheduling wedge after channel change in ath9k driver,
from Felix Fietkau.
8) Fix user after free in BPF JIT code, from Alexei Starovoitov.
9) Source address selection test is reversed in
__ip_route_output_key(), fix from Jiri Benc.
10) VLAN and CAN layer mis-size netlink attributes, from Marc
Kleine-Budde.
11) Fix permission checks in sysctls to use current_euid() instead of
current_uid(). From Eric W Biederman.
12) IPSEC policies can go away while a timer is still pending for them,
add appropriate ref-counting to fix, from Steffen Klassert.
13) Fix mis-programming of FDR and RMCR registers on R8A7740 sh_eth
chips, from Nguyen Hong Ky and Simon Horman.
14) MLX4 forgets to DMA unmap pages on RX, fix from Amir Vadai.
15) IPV6 GRE tunnel MTU upper limit is miscalculated, from Oussama
Ghorbel.
16) Fix typo in fq_change(), we were assigning "initial quantum" to
"quantum". From Eric Dumazet.
17) Set a more appropriate sk_pacing_rate for non-TCP sockets, otherwise
FQ packet scheduler does not pace those flows properly. Also from
Eric Dumazet.
18) rtlwifi miscalculates packet pointers, from Mark Cave-Ayland.
19) l2tp_xmit_skb() can be called from process context, not just softirq
context, so we must always make sure to BH disable around it. From
Eric Dumazet.
20) On qdisc reset, we forget to purge the RB tree of SKBs in netem
packet scheduler. From Stephen Hemminger.
21) Fix info leak in farsync WAN driver ioctl() handler, from Dan
Carpenter and Salva Peiró.
22) Fix PHY reset and other issues in dm9000 driver, from Nikita
Kiryanov and Michael Abbott.
23) When hardware can do SCTP crc32 checksums, we accidently don't
disable the csum offload when IPSEC transformations have been
applied. From Fan Du and Vlad Yasevich.
24) Tail loss probing in TCP leaves the socket in the wrong congestion
avoidance state. From Yuchung Cheng.
25) In CPSW driver, enable NAPI before interrupts are turned on, from
Markus Pargmann.
26) Integer underflow and dual-assignment in YAM hamradio driver, from
Dan Carpenter.
27) If we are going to mangle a packet in tcp_set_skb_tso_segs() we must
unclone it. This fixes various hard to track down crashes in
drivers where the SKBs ->gso_segs was changing right from underneath
the driver during TX queueing. From Eric Dumazet.
28) Fix the handling of VLAN IDs, and in particular the special IDs 0
and 4095, in the bridging layer. From Toshiaki Makita.
29) Another info leak, this time in wanxl WAN driver, from Salva Peiró.
30) Fix race in socket credential passing, from Daniel Borkmann.
31) WHen NETLABEL is disabled, we don't validate CIPSO packets properly,
from Seif Mazareeb.
32) Fix identification of fragmented frames in ipv4/ipv6 UDP
Fragmentation Offload output paths, from Jiri Pirko.
33) Virtual Function fixes in bnx2x driver from Yuval Mintz and Ariel
Elior.
34) When we removed the explicit neighbour pointer from ipv6 routes a
slight regression was introduced for users such as IPVS, xt_TEE, and
raw sockets. We mix up the users requested destination address with
the routes assigned nexthop/gateway. From Julian Anastasov and
Simon Horman.
35) Fix stack overruns in rt6_probe(), the issue is that can end up
doing two full packet xmit paths at the same time when emitting
neighbour discovery messages. From Hannes Frederic Sowa.
36) davinci_emac driver doesn't handle IFF_ALLMULTI correctly, from
Mariusz Ceier.
37) Make sure to set TCP sk_pacing_rate after the first legitimate RTT
sample, from Neal Cardwell.
38) Wrong netlink attribute passed to xfrm_replay_verify_len(), from
Steffen Klassert.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (152 commits)
ax88179_178a: Add VID:DID for Samsung USB Ethernet Adapter
ax88179_178a: Correct the RX error definition in RX header
Revert "bridge: only expire the mdb entry when query is received"
tcp: initialize passive-side sk_pacing_rate after 3WHS
davinci_emac.c: Fix IFF_ALLMULTI setup
mac802154: correct a typo in ieee802154_alloc_device() prototype
ipv6: probe routes asynchronous in rt6_probe
netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper
ipv6: fill rt6i_gateway with nexthop address
ipv6: always prefer rt6i_gateway if present
bnx2x: Set NETIF_F_HIGHDMA unconditionally
bnx2x: Don't pretend during register dump
bnx2x: Lock DMAE when used by statistic flow
bnx2x: Prevent null pointer dereference on error flow
bnx2x: Fix config when SR-IOV and iSCSI are enabled
bnx2x: Fix Coalescing configuration
bnx2x: Unlock VF-PF channel on MAC/VLAN config error
bnx2x: Prevent an illegal pointer dereference during panic
bnx2x: Fix Maximum CoS estimation for VFs
drivers: net: cpsw: fix kernel warn during iperf test with interrupt pacing
...
Linus Lüssing [Sat, 19 Oct 2013 22:58:57 +0000 (00:58 +0200)]
Revert "bridge: only expire the mdb entry when query is received"
While this commit was a good attempt to fix issues occuring when no
multicast querier is present, this commit still has two more issues:
1) There are cases where mdb entries do not expire even if there is a
querier present. The bridge will unnecessarily continue flooding
multicast packets on the according ports.
2) Never removing an mdb entry could be exploited for a Denial of
Service by an attacker on the local link, slowly, but steadily eating up
all memory.
Actually, this commit became obsolete with
"bridge: disable snooping if there is no querier" (b00589af3b)
which included fixes for a few more cases.
Therefore reverting the following commits (the commit stated in the
commit message plus three of its follow up fixes):
CC: Cong Wang <amwang@redhat.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Reviewed-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Do not touch keyboard backlight unless explicitly passed a module
parameter. In this way we won't make wrong assumptions about what are
good default values since they actually are different from model to
model.
The only side effect is that we won't know what is the current value
until set via the sysfs attributes.
Randy Dunlap [Sat, 19 Oct 2013 21:57:07 +0000 (14:57 -0700)]
vfs: fix new kernel-doc warnings
Move kernel-doc notation to immediately before its function to eliminate
kernel-doc warnings introduced by commit db14fc3abcd5 ("vfs: add
d_walk()")
Warning(fs/dcache.c:1343): No description found for parameter 'data'
Warning(fs/dcache.c:1343): No description found for parameter 'dentry'
Warning(fs/dcache.c:1343): Excess function parameter 'parent' description in 'check_mount'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 22 Oct 2013 09:24:29 +0000 (10:24 +0100)]
Merge tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The pending last-minute ASoC fixes, all of which are driver-local
(tlv320aic3x, rcar, pcm1681, pcm1792a, omap, fsl) and should be pretty
safe to apply"
* tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: Add MAINTAINERS entry for dmaengine helpers
ASoC: pcm1792a: Fix max_register setting
ASoC: pcm1681: Fix max_register setting
ASoC: pcm1681: Fix max_register setting
ASoC: rcar: fixup generation checker
ASoC: tlv320aic3x: Connect 'Left Line1R Mux' and 'Right Line1L Mux'
ASoC: fsl: imx-ssi: fix probe on imx31
ASoC: omap: Fix incorrect ARM dependency
ASoC: fsl: Fix sound on mx31moboard
ASoC: fsl_ssi: Fix irq_of_parse_and_map() return value check
Linus Torvalds [Tue, 22 Oct 2013 07:23:41 +0000 (08:23 +0100)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Travelling slowed down getting these out.
Two vmwgfx fixes, a radeon revert to avoid a regression, i915 fixes,
and some ioctl sizing issues fixed with 32 on 64"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon/audio: don't set speaker allocation on DCE4+
drm/radeon: rework audio option
drm/radeon/audio: don't set speaker allocation on DCE3.2
drm/radeon: make missing smc ucode non-fatal (CI)
drm/radeon: make missing smc ucode non-fatal (r7xx-SI)
drm/radeon/uvd: revert lower msg&fb buffer requirements on UVD3
drm/radeon: stop the leaks in cik_ib_test
drm/radeon/atom: workaround vbios bug in transmitter table on rs780
drm/i915: Disable GGTT PTEs on GEN6+ suspend
drm/i915: Make PTE valid encoding optional
drm: Pad drm_mode_get_connector to 64-bit boundary
drm: Prevent overwriting from userspace underallocating core ioctl structs
drm/vmwgfx: Don't kill clients on VT switch
drm/vmwgfx: Don't put resources with invalid id's on lru list
drm/i915: disable LVDS clock gating on CPT v2
Linus Torvalds [Tue, 22 Oct 2013 07:22:40 +0000 (08:22 +0100)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
- a partial revert of exponent parsing changes to make "Unit" exponent
item work properly again, by Nikolai Kondrashov
- a few new device IDs additions piggy-backing, by AceLan Kao and David
Herrmann
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: wiimote: add LEGO-wiimote VID
HID: Fix unit exponent parsing again
HID: usbhid: quirk for SiS Touchscreen
HID: usbhid: quirk for Synaptics Large Touchccreen
Linus Torvalds [Tue, 22 Oct 2013 07:21:34 +0000 (08:21 +0100)]
Merge branch 'for-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"The only interesting bit is ata_eh_qc_retry() update which fixes a
problem where a SG_IO command may fail across suspend/resume cycle
without the command actually being at fault.
Other changes are low level driver specific and fairly low impact"
* 'for-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libahci: fix turning on LEDs in ahci_start_port()
libata: make ata_eh_qc_retry() bump scmd->allowed on bogus failures
ahci_platform: use dev_info() instead of printk()
ahci: use dev_info() instead of printk()
pata_isapnp: Don't use invalid I/O ports
Linus Torvalds [Tue, 22 Oct 2013 07:20:34 +0000 (08:20 +0100)]
Merge branch 'for-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"Two late fixes for cgroup.
One fixes descendant walk introduced during this rc1 cycle. The other
fixes a post 3.9 bug during task attach which can lead to hang. Both
fixes are critical and the fixes are relatively straight-forward"
* 'for-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix to break the while loop in cgroup_attach_task() correctly
cgroup: fix cgroup post-order descendant walk of empty subtree
The result of the store-clock-fast (STCKF) instruction is a bit fuzzy.
It can happen that the value stored on one CPU is smaller than the value
stored on another CPU, although the order of the stores is the other
way around. This can cause deltas of get_tod_clock() values to become
negative when they should not be.
We need to be more careful with store-clock-fast, this patch partially
reverts git commit e4b7b4238e666682555461fa52eecd74652f36bb "time:
always use stckf instead of stck if available". The get_tod_clock()
function now uses the store-clock-extended (STCKE) instruction.
get_tod_clock_fast() can be used if the fuzziness of store-clock-fast
is acceptable e.g. for wait loops local to a CPU.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Dave Airlie [Tue, 22 Oct 2013 06:35:17 +0000 (07:35 +0100)]
Merge branch 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Most just regression fixes for audio, dpm, and uvd, plus
a resource leak fix for cik.
* 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon/audio: don't set speaker allocation on DCE4+
drm/radeon: rework audio option
drm/radeon/audio: don't set speaker allocation on DCE3.2
drm/radeon: make missing smc ucode non-fatal (CI)
drm/radeon: make missing smc ucode non-fatal (r7xx-SI)
drm/radeon/uvd: revert lower msg&fb buffer requirements on UVD3
drm/radeon: stop the leaks in cik_ib_test
drm/radeon/atom: workaround vbios bug in transmitter table on rs780
Dave Airlie [Tue, 22 Oct 2013 06:32:40 +0000 (07:32 +0100)]
Merge tag 'drm-intel-fixes-2013-10-21' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
Just an lvds clock gating fix and a pte clearing hack for hsw to avoid
memory corruption when hibernating - something doesn't seem to switch off
properly, we're still investigating.
* tag 'drm-intel-fixes-2013-10-21' of git://people.freedesktop.org/~danvet/drm-intel: (96 commits)
drm/i915: Disable GGTT PTEs on GEN6+ suspend
drm/i915: Make PTE valid encoding optional
drm/i915: disable LVDS clock gating on CPT v2
Dirk Brandewie [Mon, 21 Oct 2013 16:20:33 +0000 (09:20 -0700)]
intel_pstate: Correct calculation of min pstate value
The minimum pstate is supposed to be a percentage of the maximum P
state available. Calculate min using max pstate and not the
current max which may have been limited by the user
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
intel_pstate: Improve accuracy by not truncating until final result
This patch addresses Bug 60727
(https://bugzilla.kernel.org/show_bug.cgi?id=60727)
which was due to the truncation of intermediate values in the
calculations, which causes the code to consistently underestimate the
current cpu frequency, specifically 100% cpu utilization was truncated
down to the setpoint of 97%. This patch fixes the problem by keeping
the results of all intermediate calculations as fixed point numbers
rather scaling them back and forth between integers and fixed point.
References: https://bugzilla.kernel.org/show_bug.cgi?id=60727 Signed-off-by: Brennan Shacklett <bpshacklett@gmail.com> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Neal Cardwell [Mon, 21 Oct 2013 19:40:19 +0000 (15:40 -0400)]
tcp: initialize passive-side sk_pacing_rate after 3WHS
For passive TCP connections, upon receiving the ACK that completes the
3WHS, make sure we set our pacing rate after we get our first RTT
sample.
On passive TCP connections, when we receive the ACK completing the
3WHS we do not take an RTT sample in tcp_ack(), but rather in
tcp_synack_rtt_meas(). So upon receiving the ACK that completes the
3WHS, tcp_ack() leaves sk_pacing_rate at its initial value.
Originally the initial sk_pacing_rate value was 0, so passive-side
connections defaulted to sysctl_tcp_min_tso_segs (2 segs) in skbuffs
made in the first RTT. With a default initial cwnd of 10 packets, this
happened to be correct for RTTs 5ms or bigger, so it was hard to
see problems in WAN or emulated WAN testing.
Since 7eec4174ff ("pkt_sched: fq: fix non TCP flows pacing"), the
initial sk_pacing_rate is 0xffffffff. So after that change, passive
TCP connections were keeping this value (and using large numbers of
segments per skbuff) until receiving an ACK for data.
Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mariusz Ceier [Mon, 21 Oct 2013 17:45:04 +0000 (19:45 +0200)]
davinci_emac.c: Fix IFF_ALLMULTI setup
When IFF_ALLMULTI flag is set on interface and IFF_PROMISC isn't,
emac_dev_mcast_set should only enable RX of multicasts and reset
MACHASH registers.
It does this, but afterwards it either sets up multicast MACs
filtering or disables RX of multicasts and resets MACHASH registers
again, rendering IFF_ALLMULTI flag useless.
This patch fixes emac_dev_mcast_set, so that multicast MACs filtering and
disabling of RX of multicasts are skipped when IFF_ALLMULTI flag is set.
Tested with kernel 2.6.37.
Signed-off-by: Mariusz Ceier <mceier+kernel@gmail.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Routes need to be probed asynchronous otherwise the call stack gets
exhausted when the kernel attemps to deliver another skb inline, like
e.g. xt_TEE does, and we probe at the same time.
We update neigh->updated still at once, otherwise we would send to
many probes.
Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 21 Oct 2013 22:39:36 +0000 (18:39 -0400)]
Merge branch 'rt6i_gateway'
Julian Anastasov says:
====================
ipv6: use rt6i_gateway as nexthop
The following patchset makes sure that rt6i_gateway
contains valid nexthop information in all cases, so that
we can use different nexthop for sending.
The first patch is a simple fix that makes IPVS, TEE,
RAW(hdrincl) and RTF_DYNAMIC(without RTF_GATEWAY) work as
before 3.9. There is a single corner case not solved by
this patch: RAW(hdrincl) or TEE using local address for
nexthop, a silly feature, I guess. In this case we
see zeroes in rt6i_gateway because we get route that is not
cloned. This is solved only with patch 2.
The second patch is an optimization that makes sure
all resulting routes have rt6i_gateway filled, so that we
can avoid the complex ipv6_addr_any() call added to rt6_nexthop()
by patch 1. And it sets rt6i_gateway for local routes, a case
not handled by patch 1.
The third patch uses the new rt6_nexthop() function to fix
the matching of gateways in the same way as commit bbb5823cf742a7
("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper")
fixes nf_conntrack_h323_main.c for IPv4. Currently, it depends on
the new definition of rt6_nexthop() in patch 2. Actually, if
patch 2 is applied, patch 3 becomes a cosmetic change.
I see the following two alternatives for applying these
patches:
1. Linger patch 2 in net-next to avoid surprises in the upcoming
release. In this case patch 3 can be reworked not to depend on
the new rt6_nexthop() definition in patch 2. I guess this is a
better option, so that patch 2 can be reviewed and tested for
longer time.
2. Include all 3 patches in net tree - more risky because this
is my first attempt to change IPv6.
Here is the situation as handled by patch 2:
In IPv6 the resolved routes are always host routes (/128
with DST_HOST), mostly cloned ones. We allow routes in FIB
to contain rt6i_gateway with zeroes (eg. for local subnets) but
on cloning we can fill the rt6i_gateway field in result.
This works even without this patchset.
There is a single special case where dst is provided as
skb_dst directly without a routing call: icmp6_dst_alloc(). It is a
private dst allocated just for the particular ICMP packet. Patch 2
fills rt6i_gateway in this case, needed for the new rt6_nexthop()
simplification.
The last case is addrconf_dst_alloc(), it can put in
FIB local/anycast routes when addresses are added. Patch 2
needs to fill rt6i_gateway in this case because such routes
are returned without cloning.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Sun, 20 Oct 2013 12:43:05 +0000 (15:43 +0300)]
netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper
Now when rt6_nexthop() can return nexthop address we can use it
for proper nexthop comparison of directly connected destinations.
For more information refer to commit bbb5823cf742a7
("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper").
Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Sun, 20 Oct 2013 12:43:04 +0000 (15:43 +0300)]
ipv6: fill rt6i_gateway with nexthop address
Make sure rt6i_gateway contains nexthop information in
all routes returned from lookup or when routes are directly
attached to skb for generated ICMP packets.
The effect of this patch should be a faster version of
rt6_nexthop() and the consideration of local addresses as
nexthop.
Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Sun, 20 Oct 2013 12:43:03 +0000 (15:43 +0300)]
ipv6: always prefer rt6i_gateway if present
In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
ip6_finish_output2()." changed the behaviour of ip6_finish_output2()
such that the recently introduced rt6_nexthop() is used
instead of an assigned neighbor.
As rt6_nexthop() prefers rt6i_gateway only for gatewayed
routes this causes a problem for users like IPVS, xt_TEE and
RAW(hdrincl) if they want to use different address for routing
compared to the destination address.
Another case is when redirect can create RTF_DYNAMIC
route without RTF_GATEWAY flag, we ignore the rt6i_gateway
in rt6_nexthop().
Fix the above problems by considering the rt6i_gateway if
present, so that traffic routed to address on local subnet is
not wrongly diverted to the destination address.
Thanks to Simon Horman and Phil Oester for spotting the
problematic commit.
Thanks to Hannes Frederic Sowa for his review and help in testing.
Reported-by: Phil Oester <kernel@linuxace.com> Reported-by: Mark Brooks <mark@loadbalancer.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 21 Oct 2013 22:31:53 +0000 (18:31 -0400)]
Merge branch 'bnx2x'
Yuval Mintz says:
====================
bnx2x: Bug fixes patch series
This patch series contains fixes for various flows - several SR-IOV issues
are fixed, ethtool callbacks (coalescing and register dump) are corrected,
null pointer dereference on error flows is prevented, etc.
Changes from V1
---------------
- Patch 2 "bnx2x: Prevent an illegal pointer dereference during panic"
is revised, with improved handling of edge cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 20 Oct 2013 14:51:33 +0000 (16:51 +0200)]
bnx2x: Don't pretend during register dump
As part of a register dump, the interface pretends to have the identity
of other interfaces of the same physical device in order to perform
HW configuration for them - specifically, it needs to prevent attentions
from generating on those functions as the register dump accesses registers
in common blocks which whose reading might generate an attention.
However, such pretension is unsafe - unlike other flows in which the driver
uses pretend, during register dump there is no guarantee no other HW access
will take place (by other flows). If such access will take place, the HW will
be accessed by the wrong interface, and leave both functions in an incorrect
state.
This patch removes all pretensions from the register dump flow. Instead, it
changes initial configuration of attentions such that no fatal attention will
be generated for other functions as a result of the register dump
(notice however, a debug print claiming an attention from other functions IS
possible during the register dump)
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Sun, 20 Oct 2013 14:51:32 +0000 (16:51 +0200)]
bnx2x: Lock DMAE when used by statistic flow
bnx2x has several clients to its DMAE machines - all of them with the exception
of the statistics flow used the same locking mechanisms to synchronize the DMAE
machines' usage.
Since statistics (which are periodically entered) use DMAE without taking the
locks, they may erase the commands which were previously set -
e.g., it may cause a VF to timeout while waiting for a PF answer on the VF-PF
channel as that command header would have been overwritten by the statistics'
header.
This patch makes certain that all flows utilizing DMAE will use the same
API, assuring that the locking scheme will be kept by all said flows.
Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Sun, 20 Oct 2013 14:51:30 +0000 (16:51 +0200)]
bnx2x: Fix config when SR-IOV and iSCSI are enabled
Starting with commit b9871bc "bnx2x: VF RSS support - PF side", if a PF will
have SR-IOV supported in its PCI configuration space, storage drivers will not
work for that interface.
This patch fixes the resource calculation to allow such a configuration to
properly work.
Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 20 Oct 2013 14:51:29 +0000 (16:51 +0200)]
bnx2x: Fix Coalescing configuration
bnx2x drivers configure coalescing incorrectly (e.g., as a result of a call
to 'ethtool -c'). Although this is almost invisible to the user (due to NAPI)
designated tests will show the configuration is incorrect.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Sun, 20 Oct 2013 14:51:28 +0000 (16:51 +0200)]
bnx2x: Unlock VF-PF channel on MAC/VLAN config error
Current code returns upon failure, leaving the VF-PF in an unusable state;
This patch adds the missing release so further commands could pass between
PF and VF.
Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 20 Oct 2013 14:51:27 +0000 (16:51 +0200)]
bnx2x: Prevent an illegal pointer dereference during panic
During a panic, the driver tries to print the Management FW buffer of recent
commands. To do so, the driver reads the address of that buffer from a known
address. If the buffer is unavailable (e.g., PCI reads don't work, MCP is
failing, etc.), the driver will try to access the address it has read, possibly
causing a kernel panic.
This check 'sanitizes' the access, validating the read value is indeed a valid
address inside the management FW's buffers.
The patch also removes a read outside the scope of the buffer, which resulted
in some unrelated chraracters appearing in the log.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Sun, 20 Oct 2013 07:02:26 +0000 (12:32 +0530)]
drivers: net: cpsw: fix kernel warn during iperf test with interrupt pacing
When interrupt pacing is enabled, receive/transmit statistics are not
updated properly by hardware which leads to ISR return with IRQ_NONE
and inturn kernel disables the interrupt. This patch removed the checking
of receive/transmit statistics from ISR.
This patch is verified with AM335x Beagle Bone Black and below is the
kernel warn when interrupt pacing is enabled.
[ 104.298254] irq 58: nobody cared (try booting with the "irqpoll" option)
[ 104.305356] CPU: 0 PID: 1073 Comm: iperf Not tainted 3.12.0-rc3-00342-g77d4015 #3
[ 104.313284] [<c001bb84>] (unwind_backtrace+0x0/0xf0) from [<c0017db0>] (show_stack+0x10/0x14)
[ 104.322282] [<c0017db0>] (show_stack+0x10/0x14) from [<c0507920>] (dump_stack+0x78/0x94)
[ 104.330816] [<c0507920>] (dump_stack+0x78/0x94) from [<c0088c1c>] (__report_bad_irq+0x20/0xc0)
[ 104.339889] [<c0088c1c>] (__report_bad_irq+0x20/0xc0) from [<c008912c>] (note_interrupt+0x1dc/0x23c)
[ 104.349505] [<c008912c>] (note_interrupt+0x1dc/0x23c) from [<c0086d74>] (handle_irq_event_percpu+0xc4/0x238)
[ 104.359851] [<c0086d74>] (handle_irq_event_percpu+0xc4/0x238) from [<c0086f24>] (handle_irq_event+0x3c/0x5c)
[ 104.370198] [<c0086f24>] (handle_irq_event+0x3c/0x5c) from [<c008991c>] (handle_level_irq+0xac/0x10c)
[ 104.379907] [<c008991c>] (handle_level_irq+0xac/0x10c) from [<c00866d8>] (generic_handle_irq+0x20/0x30)
[ 104.389812] [<c00866d8>] (generic_handle_irq+0x20/0x30) from [<c0014ce8>] (handle_IRQ+0x4c/0xb0)
[ 104.399066] [<c0014ce8>] (handle_IRQ+0x4c/0xb0) from [<c000856c>] (omap3_intc_handle_irq+0x60/0x74)
[ 104.408598] [<c000856c>] (omap3_intc_handle_irq+0x60/0x74) from [<c050d8e4>] (__irq_svc+0x44/0x5c)
[ 104.418021] Exception stack(0xde4f7c00 to 0xde4f7c48)
[ 104.423345] 7c00: 000000010000000000000000dd00214060000013de006e540000000200000000
[ 104.431952] 7c20: de34574800000040c11c858800018ee000000000de4f7c48c009dfc8c050d300
[ 104.440553] 7c40: 60000013ffffffff
[ 104.444237] [<c050d8e4>] (__irq_svc+0x44/0x5c) from [<c050d300>] (_raw_spin_unlock_irqrestore+0x34/0x44)
[ 104.454220] [<c050d300>] (_raw_spin_unlock_irqrestore+0x34/0x44) from [<c00868c0>] (__irq_put_desc_unlock+0x14/0x38)
[ 104.465295] [<c00868c0>] (__irq_put_desc_unlock+0x14/0x38) from [<c0088068>] (enable_irq+0x4c/0x74)
[ 104.474829] [<c0088068>] (enable_irq+0x4c/0x74) from [<c03abd24>] (cpsw_poll+0xb8/0xdc)
[ 104.483276] [<c03abd24>] (cpsw_poll+0xb8/0xdc) from [<c044ef68>] (net_rx_action+0xc0/0x1e8)
[ 104.492085] [<c044ef68>] (net_rx_action+0xc0/0x1e8) from [<c0048a90>] (__do_softirq+0x100/0x27c)
[ 104.501338] [<c0048a90>] (__do_softirq+0x100/0x27c) from [<c0048cd0>] (do_softirq+0x68/0x70)
[ 104.510224] [<c0048cd0>] (do_softirq+0x68/0x70) from [<c0048e8c>] (local_bh_enable+0xd0/0xe4)
[ 104.519211] [<c0048e8c>] (local_bh_enable+0xd0/0xe4) from [<c048c774>] (tcp_rcv_established+0x450/0x648)
[ 104.529201] [<c048c774>] (tcp_rcv_established+0x450/0x648) from [<c0494904>] (tcp_v4_do_rcv+0x154/0x474)
[ 104.539195] [<c0494904>] (tcp_v4_do_rcv+0x154/0x474) from [<c043d750>] (release_sock+0xac/0x1ac)
[ 104.548448] [<c043d750>] (release_sock+0xac/0x1ac) from [<c04844e8>] (tcp_recvmsg+0x4d0/0xa8c)
[ 104.557528] [<c04844e8>] (tcp_recvmsg+0x4d0/0xa8c) from [<c04a8720>] (inet_recvmsg+0xcc/0xf0)
[ 104.566507] [<c04a8720>] (inet_recvmsg+0xcc/0xf0) from [<c0439744>] (sock_recvmsg+0x90/0xb0)
[ 104.575394] [<c0439744>] (sock_recvmsg+0x90/0xb0) from [<c043b778>] (SyS_recvfrom+0x88/0xd8)
[ 104.584280] [<c043b778>] (SyS_recvfrom+0x88/0xd8) from [<c043b7e0>] (sys_recv+0x18/0x20)
[ 104.592805] [<c043b7e0>] (sys_recv+0x18/0x20) from [<c0013da0>] (ret_fast_syscall+0x0/0x48)
[ 104.601587] handlers:
[ 104.603992] [<c03acd94>] cpsw_interrupt
[ 104.608040] Disabling IRQ #58
Cc: Sebastian Siewior <bigeasy@linutronix.de> Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The create_flow/destroy_flow uverbs and the associated extensions to
the user-kernel verbs ABI are under review and are too experimental to
freeze at this point.
So userspace is not exposed to experimental features and an uinstable
ABI, temporarily disable this for v3.12 (with a Kconfig option behind
staging to reenable it if desired).
The feature will be enabled after proper cleanup for v3.13.
Zhang Rui [Mon, 21 Oct 2013 02:28:34 +0000 (10:28 +0800)]
Revert "drivers: thermal: parent virtual hwmon with thermal zone"
Commit b82715fdd4a5407f56853b24d387d484dd9c3b5b introduces
a 'device' subdirectory under /sys/class/hwmon/hwmonX/ directory,
for the thermal_zone hwmon devices. And this results in different
handling by libsensors.
The problem is reported and discussed in this thread
http://marc.info/?l=linux-pm&m=138229306109596&w=2
Linus Torvalds [Sun, 20 Oct 2013 22:20:45 +0000 (23:20 +0100)]
Merge branch 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parsic fixes from Helge Deller:
"There are just two small fixes in here:
- Revert a commit which exported the flush_cache_page function. This
was noticed by Christoph Hellwig.
- Enable the DEVTMPFS, DEVTMPFS_MOUNT and BLK_DEV_INITRD config
options in the parisc defconfigs so that latest udev/initrd finds
the root disk at boot"
* 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: enable DEVTMPFS, DEVTMPFS_MOUNT and BLK_DEV_INITRD in defconfigs
Revert "parisc: Export flush_cache_page() (needed by lustre)"
Al Viro [Sun, 20 Oct 2013 12:44:39 +0000 (08:44 -0400)]
nfsd regression since delayed fput()
Background: nfsd v[23] had throughput regression since delayed fput
went in; every read or write ends up doing fput() and we get a pair
of extra context switches out of that (plus quite a bit of work
in queue_work itselfi, apparently). Use of schedule_delayed_work()
gives it a chance to accumulate a bit before we do __fput() on all
of them. I'm not too happy about that solution, but... on at least
one real-world setup it reverts about 10% throughput loss we got from
switch to delayed fput.
Jiri Pirko [Sat, 19 Oct 2013 10:29:17 +0000 (12:29 +0200)]
ip_output: do skb ufo init for peeked non ufo skb as well
Now, if user application does:
sendto len<mtu flag MSG_MORE
sendto len>mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sat, 19 Oct 2013 10:29:16 +0000 (12:29 +0200)]
ip6_output: do skb ufo init for peeked non ufo skb as well
Now, if user application does:
sendto len<mtu flag MSG_MORE
sendto len>mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sat, 19 Oct 2013 10:29:15 +0000 (12:29 +0200)]
udp6: respect IPV6_DONTFRAG sockopt in case there are pending frames
if up->pending != 0 dontfrag is left with default value -1. That
causes that application that do:
sendto len>mtu flag MSG_MORE
sendto len>mtu flag 0
will receive EMSGSIZE errno as the result of the second sendto.
This patch fixes it by respecting IPV6_DONTFRAG socket option.
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Seif Mazareeb [Fri, 18 Oct 2013 03:33:21 +0000 (20:33 -0700)]
net: fix cipso packet validation when !NETLABEL
When CONFIG_NETLABEL is disabled, the cipso_v4_validate() function could loop
forever in the main loop if opt[opt_iter +1] == 0, this will causing a kernel
crash in an SMP system, since the CPU executing this function will
stall /not respond to IPIs.
This problem can be reproduced by running the IP Stack Integrity Checker
(http://isic.sourceforge.net) using the following command on a Linux machine
connected to DUT:
Daniel Borkmann [Thu, 17 Oct 2013 20:51:31 +0000 (22:51 +0200)]
net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race
In the case of credentials passing in unix stream sockets (dgram
sockets seem not affected), we get a rather sparse race after
commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default").
We have a stream server on receiver side that requests credential
passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED
on each spawned/accepted socket on server side to 1 first (as it's
not inherited), it can happen that in the time between accept() and
setsockopt() we get interrupted, the sender is being scheduled and
continues with passing data to our receiver. At that time SO_PASSCRED
is neither set on sender nor receiver side, hence in cmsg's
SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534
(== overflow{u,g}id) instead of what we actually would like to see.
On the sender side, here nc -U, the tests in maybe_add_creds()
invoked through unix_stream_sendmsg() would fail, as at that exact
time, as mentioned, the sender has neither SO_PASSCRED on his side
nor sees it on the server side, and we have a valid 'other' socket
in place. Thus, sender believes it would just look like a normal
connection, not needing/requesting SO_PASSCRED at that time.
As reverting 16e5726 would not be an option due to the significant
performance regression reported when having creds always passed,
one way/trade-off to prevent that would be to set SO_PASSCRED on
the listener socket and allow inheriting these flags to the spawned
socket on server side in accept(). It seems also logical to do so
if we'd tell the listener socket to pass those flags onwards, and
would fix the race.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>