git.karo-electronics.de Git - karo-tx-linux.git/log

Use the recently-added bio front_pad field to allocate struct dm_target_io.

Prior to this patch, dm_target_io was allocated from a mempool. For each
dm_target_io, there is exactly one bio allocated from a bioset.

This patch merges these two allocations into one allocation: we create a
bioset with front_pad equal to the size of dm_target_io so that every
bio allocated from the bioset has sizeof(struct dm_target_io) bytes
before it. We allocate a bio and use the bytes before the bio as
dm_target_io.

This idea was introduced by Kent Overstreet.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: tj@kernel.org
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Mikulas Patocka [Tue, 18 Sep 2012 00:15:17 +0000 (10:15 +1000)]

Use the ACCESS_ONCE macro in dm-bufio and dm-verity where a variable
can be modified asynchronously (through sysfs) and we want to prevent
compiler optimizations that assume that the variable hasn't changed.
(See Documentation/atomic_ops.txt.)

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Wei Yongjun [Tue, 18 Sep 2012 00:15:17 +0000 (10:15 +1000)]

Use list_move() instead of list_del() + list_add().

spatch with a semantic match was used to find this.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Mike Snitzer [Tue, 18 Sep 2012 00:15:16 +0000 (10:15 +1000)]

A little thin discard code refactoring in preparation for the next
patch. Use bools instead of unsigned for features.

No functional changes.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Mike Snitzer [Tue, 18 Sep 2012 00:15:16 +0000 (10:15 +1000)]

The dm thin pool target claims to support the zeroing of discarded
data areas.  This turns out to be incorrect when processing discards
that do not exactly cover a complete number of blocks, so the target
must always set discard_zeroes_data_unsupported.

The thin pool target will zero blocks when they are allocated if the
skip_block_zeroing feature is not specified.  The block layer
may send a discard that only partly covers a block.  If a thin pool
block is partially discarded then there is no guarantee that the
discarded data will get zeroed before it is accessed again.
Due to this, thin devices cannot claim discards will always zero data.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

commit | commitdiff | tree

Paolo Bonzini [Mon, 17 Sep 2012 23:36:11 +0000 (16:36 -0700)]

target: go through normal processing for all zero-length commands

Yay, all users of transport_kmap_data_sg now check for a zero-length
request and/or a too-small parameter list length.  We can thus go through
the normal emulation path even for such commands.

This means that out-of-bounds reads and writes are now reported correctly
even if they transfer 0 blocks.  Other errors are also reported correctly.

Testcase: sg_raw /dev/sdb 28 00 80 00 00 00 00 00 00 00
    should fail with ILLEGAL REQUEST / LBA OUT OF RANGE sense
    does not fail without the patch
    (still wrong with the patch, but better: the ASC is INVALID FIELD IN CDB)

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Paolo Bonzini [Fri, 7 Sep 2012 15:30:41 +0000 (17:30 +0200)]

target: do not submit a zero-bio I/O request

scsi_setup_fs_cmnd does not like to receive requests with no
bios attached to it.  Special-case zero-length reads and writes,
by not submitting any bio.

Testcase: sg_raw /dev/sdb 28 00 00 00 00 00 00 00 00 00
    should not fail
    panics with the rest of the series but not this patch
    behaves correctly without or with this series

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Paolo Bonzini [Fri, 7 Sep 2012 15:30:40 +0000 (17:30 +0200)]

target: support zero allocation length in SBC commands

READ CAPACITY must be subject to the same treatment as INQUIRY,
REQUEST SENSE, and MODE SENSE, but there are no pre-existing bugs
to fix here. Just use an on-stack buffer, and copy to it after
checking the return value of transport_kmap_data_sg.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Paolo Bonzini [Fri, 7 Sep 2012 15:30:39 +0000 (17:30 +0200)]

target: fix truncation of mode data, support zero allocation length

The offset was not bumped back to the full size after writing the
header of the MODE SENSE response, so the last 1 or 2 bytes were
not copied.

On top of this, support zero-length requests by checking for the
return value of transport_kmap_data_sg.

Testcase: sg_raw -r20 /dev/sdb 5a 00 0a 00 00 00 00 00 14 00
    last byte should be 0x1e
    it is 0x00 without the patch
    it is correct with the patch

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Paolo Bonzini [Fri, 7 Sep 2012 15:30:38 +0000 (17:30 +0200)]

target: support zero allocation length in INQUIRY

INQUIRY processing already uses an on-heap bounce buffer for loopback,
but not for other fabrics.  Switch this to a cheaper on-stack bounce
buffer, similar to the one used by MODE SENSE and REQUEST SENSE, and
use it unconditionally.  With this in place, zero allocation length is
handled simply by checking the return address of transport_kmap_data_sg.

Testcase: sg_raw /dev/sdb 12 00 83 00 00 00
    should fail with ILLEGAL REQUEST / INVALID FIELD IN CDB sense
    does not fail without the patch
    fails correctly with the series

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Wei Yongjun [Wed, 5 Sep 2012 06:42:48 +0000 (14:42 +0800)]

target: use list_move_tail instead of list_del/list_add_tail

Using list_move_tail() instead of list_del() + list_add_tail().

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Roland Dreier [Thu, 30 Aug 2012 23:28:30 +0000 (16:28 -0700)]

target/iscsi: Don't log "iSCSI Login negotiation failed." twice

There's no need for iscsi_target_init_negotiation() to print

iSCSI Login negotiation failed.

on failure, since its only caller (__iscsi_target_login_thread())
prints exactly the same message if it gets an error return back.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Nicholas Bellinger [Sun, 26 Aug 2012 20:35:58 +0000 (13:35 -0700)]

target: Drop se_subsystem_api->[write_cache,fua_write]_emulated flags

This patch drops se_subsystem_api->[write_cache,fua_write]_emulated flags
set by viritual FILEIO/IBLOCK/RD_MCP backend drivers in favor of explict
TRANSPORT_PLUGIN_PHBA_PDEV checks to know when to fail if userspace is
attempting to set virtual emulation bits for an pSCSI (passthrough)
backend device.

Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Wei Yongjun [Sun, 26 Aug 2012 00:38:22 +0000 (08:38 +0800)]

target: remove unused including <generated/utsrelease.h>

Remove including <generated/utsrelease.h> that don't need it.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Wei Yongjun [Sun, 26 Aug 2012 00:37:50 +0000 (08:37 +0800)]

tcm_fc: remove unused including <generated/utsrelease.h>

Remove including <generated/utsrelease.h> that don't need it.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Nicholas Bellinger [Thu, 23 Aug 2012 01:53:12 +0000 (18:53 -0700)]

target/rd: Allow WriteCacheEnabled=1 operation with rd_mcp backends

This patch adds the missing rd_mcp_template->write_cache_emulated=1 bit to
optionally allow WriteCacheEnabled=1 (WCE) to be enabled for the built-in
TCM/rd_mcp backend driver.

Tested on v3.6-rc[0,2] code with loopback+tcm_vhost fabric ports.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Nicholas Bellinger [Thu, 23 Aug 2012 01:45:11 +0000 (18:45 -0700)]

target/iblock: Use match_strlcpy for Opt_udev_path string assignment

Following commit dbc6e0222 from Al Viro for fileio, go ahead and make
Opt_udev_path within iblock_set_configfs_dev_params use match_strlcpy
instead of the match_strdup -> snprintf -> kfree equivalent.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Masanari Iida [Thu, 16 Aug 2012 13:43:13 +0000 (22:43 +0900)]

target: Fix minor spelling typos in drivers/target

Correct spelling typo in printk and comment within drivers/target.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Roland Dreier [Wed, 15 Aug 2012 21:35:25 +0000 (14:35 -0700)]

target: Simplify fabric sense data length handling

Every fabric driver has to supply a se_tfo->set_fabric_sense_len()
method, just so iSCSI can return an offset of 2.  However, every fabric
driver is already allocating a sense buffer and passing it into the
target core, either via transport_init_se_cmd() or target_submit_cmd().

So instead of having iSCSI pass the start of its sense buffer into the
core and then later tell the core to skip the first 2 bytes, it seems
easier for iSCSI just to do the offset of 2 when it passes the sense
buffer into the core.  Then we can drop the se_tfo->set_fabric_sense_len()
everywhere, and just add a couple of lines of code to iSCSI to set the
sense data length to the beginning of the buffer right before it sends
it over the network.

(nab: Remove .set_fabric_sense_len usage from tcm_qla2xxx_npiv_ops +
      change transport_get_sense_buffer to follow v3.6-rc6 code w/o
      ->set_fabric_sense_len usage)

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Roland Dreier [Wed, 15 Aug 2012 21:35:24 +0000 (14:35 -0700)]

target: Remove unused target_core_fabric_ops.get_fabric_sense_len method

There are no callers of se_tfo->get_fabric_sense_len(), so we should
stop having every fabric driver implement it.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Andy Grover [Mon, 30 Jul 2012 22:54:18 +0000 (15:54 -0700)]

target/sbp: Remove strict param from sbp_parse_wwn

It's always set, and controls whether uppercase A-F are allowed hex values.
I don't see a reason not to accept these.

Signed-off-by: Andy Grover <agrover@redhat.com>
Cc: Chris Boot <bootc@bootc.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Andy Grover [Mon, 30 Jul 2012 22:54:17 +0000 (15:54 -0700)]

target: Cleanup transport_subsystem_check_init

Move static into function body from file scope.

Remove extraneous return statement

Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Andy Grover [Mon, 30 Jul 2012 22:54:16 +0000 (15:54 -0700)]

target: Remove request_module for target_core_stgt

It is no longer a supported module.

Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

commit | commitdiff | tree

Linus Torvalds [Mon, 17 Sep 2012 23:05:23 +0000 (16:05 -0700)]

Merge branch 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull another workqueue fix from Tejun Heo:
"Unfortunately, yet another late fix.  This too is discovered and fixed
  by Lai.  This bug was introduced during this merge window by commit
  25511a477657 ("workqueue: reimplement CPU online rebinding to handle
  idle workers") which started using WORKER_REBIND flag for idle rebind
  too.

  The bug is relatively easy to trigger if the CPU rapidly goes through
  off, on and then off (and stay off).  The fix is on the safer side.
  This hasn't been on linux-next yet but I'm pushing early so that it
  can get more exposure before v3.6 release."

* 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()

commit | commitdiff | tree

Lai Jiangshan [Mon, 17 Sep 2012 22:42:31 +0000 (15:42 -0700)]

workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()

busy_worker_rebind_fn() didn't clear WORKER_REBIND if rebinding failed
(CPU is down again).  This used to be okay because the flag wasn't
used for anything else.

However, after 25511a477 "workqueue: reimplement CPU online rebinding
to handle idle workers", WORKER_REBIND is also used to command idle
workers to rebind.  If not cleared, the worker may confuse the next
CPU_UP cycle by having REBIND spuriously set or oops / get stuck by
prematurely calling idle_worker_rebind().

  WARNING: at /work/os/wq/kernel/workqueue.c:1323 worker_thread+0x4cd/0x5
00()
  Hardware name: Bochs
  Modules linked in: test_wq(O-)
  Pid: 33, comm: kworker/1:1 Tainted: G           O 3.6.0-rc1-work+ #3
  Call Trace:
   [<ffffffff8109039f>] warn_slowpath_common+0x7f/0xc0
   [<ffffffff810903fa>] warn_slowpath_null+0x1a/0x20
   [<ffffffff810b3f1d>] worker_thread+0x4cd/0x500
   [<ffffffff810bc16e>] kthread+0xbe/0xd0
   [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
  ---[ end trace e977cf20f4661968 ]---
  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffff810b3db0>] worker_thread+0x360/0x500
  PGD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: test_wq(O-)
  CPU 0
  Pid: 33, comm: kworker/1:1 Tainted: G        W  O 3.6.0-rc1-work+ #3 Bochs Bochs
  RIP: 0010:[<ffffffff810b3db0>]  [<ffffffff810b3db0>] worker_thread+0x360/0x500
  RSP: 0018:ffff88001e1c9de0  EFLAGS: 00010086
  RAX: 0000000000000000 RBX: ffff88001e633e00 RCX: 0000000000004140
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
  RBP: ffff88001e1c9ea0 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001fc8d580
  R13: ffff88001fc8d590 R14: ffff88001e633e20 R15: ffff88001e1c6900
  FS:  0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000000 CR3: 00000000130e8000 CR4: 00000000000006f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/1:1 (pid: 33, threadinfo ffff88001e1c8000, task ffff88001e1c6900)
  Stack:
   ffff880000000000 ffff88001e1c9e40 0000000000000001 ffff88001e1c8010
   ffff88001e519c78 ffff88001e1c9e58 ffff88001e1c6900 ffff88001e1c6900
   ffff88001e1c6900 ffff88001e1c6900 ffff88001fc8d340 ffff88001fc8d340
  Call Trace:
   [<ffffffff810bc16e>] kthread+0xbe/0xd0
   [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
  Code: b1 00 f6 43 48 02 0f 85 91 01 00 00 48 8b 43 38 48 89 df 48 8b 00 48 89 45 90 e8 ac f0 ff ff 3c 01 0f 85 60 01 00 00 48 8b 53 50 <8b> 02 83 e8 01 85 c0 89 02 0f 84 3b 01 00 00 48 8b 43 38 48 8b
  RIP  [<ffffffff810b3db0>] worker_thread+0x360/0x500
   RSP <ffff88001e1c9de0>
  CR2: 0000000000000000

There was no reason to keep WORKER_REBIND on failure in the first
place - WORKER_UNBOUND is guaranteed to be set in such cases
preventing incorrectly activating concurrency management.  Always
clear WORKER_REBIND.

tj: Updated comment and description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

commit | commitdiff | tree

Linus Torvalds [Mon, 17 Sep 2012 22:01:14 +0000 (15:01 -0700)]

Merge branch 'akpm' (Andrew's patch-bomb)

Merge fixes from Andrew Morton:
"13 patches.  12 are fixes and one is a little preparatory thing for
  Andi."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (13 commits)
  memory hotplug: fix section info double registration bug
  mm/page_alloc: fix the page address of higher page's buddy calculation
  drivers/rtc/rtc-twl.c: ensure all interrupts are disabled during probe
  compiler.h: add __visible
  pid-namespace: limit value of ns_last_pid to (0, max_pid)
  include/net/sock.h: squelch compiler warning in sk_rmem_schedule()
  slub: consider pfmemalloc_match() in get_partial_node()
  slab: fix starting index for finding another object
  slab: do ClearSlabPfmemalloc() for all pages of slab
  nbd: clear waiting_queue on shutdown
  MAINTAINERS: fix TXT maintainer list and source repo path
  mm/ia64: fix a memory block size bug
  memory hotplug: reset pgdat->kswapd to NULL if creating kernel thread fails

commit | commitdiff | tree

qiuxishi [Mon, 17 Sep 2012 21:09:24 +0000 (14:09 -0700)]

memory hotplug: fix section info double registration bug

There may be a bug when registering section info.  For example, on my
Itanium platform, the pfn range of node0 includes the other nodes, so
other nodes' section info will be double registered, and memmap's page
count will equal to 3.

  node0: start_pfn=0x100,    spanned_pfn=0x20fb00, present_pfn=0x7f8a3, => 0x000100-0x20fc00
  node1: start_pfn=0x80000,  spanned_pfn=0x80000,  present_pfn=0x80000, => 0x080000-0x100000
  node2: start_pfn=0x100000, spanned_pfn=0x80000,  present_pfn=0x80000, => 0x100000-0x180000
  node3: start_pfn=0x180000, spanned_pfn=0x80000,  present_pfn=0x80000, => 0x180000-0x200000

  free_all_bootmem_node()
register_page_bootmem_info_node()
register_page_bootmem_info_section()

When hot remove memory, we can't free the memmap's page because
page_count() is 2 after put_page_bootmem().

  sparse_remove_one_section()
free_section_usemap()
free_map_bootmem()
put_page_bootmem()

[akpm@linux-foundation.org: add code comment]
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Li Haifeng [Mon, 17 Sep 2012 21:09:21 +0000 (14:09 -0700)]

mm/page_alloc: fix the page address of higher page's buddy calculation

The heuristic method for buddy has been introduced since commit
43506fad21ca ("mm/page_alloc.c: simplify calculation of combined index
of adjacent buddy lists"). But the page address of higher page's buddy
was wrongly calculated, which will lead page_is_buddy to fail for ever.
IOW, the heuristic method would be disabled with the wrong page address
of higher page's buddy.

Calculating the page address of higher page's buddy should be based
higher_page with the offset between index of higher page and index of
higher page's buddy.

Signed-off-by: Haifeng Li <omycle@gmail.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: <stable@vger.kernel.org> [2.6.38+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Kevin Hilman [Mon, 17 Sep 2012 21:09:17 +0000 (14:09 -0700)]

drivers/rtc/rtc-twl.c: ensure all interrupts are disabled during probe

On some platforms, bootloaders are known to do some interesting RTC
programming.  Without going into the obscurities as to why this may be
the case, suffice it to say the the driver should not make any
assumptions about the state of the RTC when the driver loads.  In
particular, the driver probe should be sure that all interrupts are
disabled until otherwise programmed.

This was discovered when finding bursty I2C traffic every second on
Overo platforms.  This I2C overhead was keeping the SoC from hitting
deep power states.  The cause was found to be the RTC firing every
second on the I2C-connected TWL PMIC.

Special thanks to Felipe Balbi for suggesting to look for a rogue driver
as the source of the I2C traffic rather than the I2C driver itself.

Special thanks to Steve Sakoman for helping track down the source of the
continuous RTC interrups on the Overo boards.

Signed-off-by: Kevin Hilman <khilman@ti.com>
Cc: Felipe Balbi <balbi@ti.com>
Tested-by: Steve Sakoman <steve@sakoman.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Tested-by: Shubhrajyoti Datta <omaplinuxkernel@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Andi Kleen [Mon, 17 Sep 2012 21:09:15 +0000 (14:09 -0700)]

compiler.h: add __visible

gcc 4.6+ has support for a externally_visible attribute that prevents the
optimizer from optimizing unused symbols away. Add a __visible macro to
use it with that compiler version or later.

This is used (at least) by the "Link Time Optimization" patchset.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Andrew Vagin [Mon, 17 Sep 2012 21:09:12 +0000 (14:09 -0700)]

pid-namespace: limit value of ns_last_pid to (0, max_pid)

The kernel doesn't check the pid for negative values, so if you try to
write -2 to /proc/sys/kernel/ns_last_pid, you will get a kernel panic.

The crash happens because the next pid is -1, and alloc_pidmap() will
try to access to a nonexistent pidmap.

map = &pid_ns->pidmap[pid/BITS_PER_PAGE];

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Chuck Lever [Mon, 17 Sep 2012 21:09:11 +0000 (14:09 -0700)]

include/net/sock.h: squelch compiler warning in sk_rmem_schedule()

This warning:

  In file included from linux/include/linux/tcp.h:227:0,
                   from linux/include/linux/ipv6.h:221,
                   from linux/include/net/ipv6.h:16,
                   from linux/include/linux/sunrpc/clnt.h:26,
                   from linux/net/sunrpc/stats.c:22:
  linux/include/net/sock.h: In function `sk_rmem_schedule':
  linux/nfs-2.6/include/net/sock.h:1339:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

is seen with gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2) using the
-Wextra option.

Commit c76562b6709f ("netvm: prevent a stream-specific deadlock")
accidentally replaced the "size" parameter of sk_rmem_schedule() with an
unsigned int.  This changes the semantics of the comparison in the
return statement.

In sk_wmem_schedule we have syntactically the same comparison, but
"size" is a signed integer.  In addition, __sk_mem_schedule() takes a
signed integer for its "size" parameter, so there is an implicit type
conversion in sk_rmem_schedule() anyway.

Revert the "size" parameter back to a signed integer so that the
semantics of the expressions in both sk_[rw]mem_schedule() are exactly
the same.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Joonsoo Kim [Mon, 17 Sep 2012 21:09:09 +0000 (14:09 -0700)]

slub: consider pfmemalloc_match() in get_partial_node()

get_partial() is currently not checking pfmemalloc_match() meaning that
it is possible for pfmemalloc pages to leak to non-pfmemalloc users.
This is a problem in the following situation.  Assume that there is a
request from normal allocation and there are no objects in the per-cpu
cache and no node-partial slab.

In this case, slab_alloc enters the slow path and new_slab_objects() is
called which may return a PFMEMALLOC page.  As the current user is not
allowed to access PFMEMALLOC page, deactivate_slab() is called
([5091b74a: mm: slub: optimise the SLUB fast path to avoid pfmemalloc
checks]) and returns an object from PFMEMALLOC page.

Next time, when we get another request from normal allocation,
slab_alloc() enters the slow-path and calls new_slab_objects().  In
new_slab_objects(), we call get_partial() and get a partial slab which
was just deactivated but is a pfmemalloc page.  We extract one object
from it and re-deactivate.

  "deactivate -> re-get in get_partial -> re-deactivate" occures repeatedly.

As a result, access to PFMEMALLOC page is not properly restricted and it
can cause a performance degradation due to frequent deactivation.
deactivation frequently.

This patch changes get_partial_node() to take pfmemalloc_match() into
account and prevents the "deactivate -> re-get in get_partial()
scenario.  Instead, new_slab() is called.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Linux kernel for KaRo TX COM modules

RSS Atom