ocfs2: avoid access invalid address when read o2dlm debug messages
The following case will lead to a lockres is freed but is still in use.
cat /sys/kernel/debug/o2dlm/locking_state dlm_thread
lockres_seq_start
-> lock dlm->track_lock
-> get resA
resA->refs decrease to 0,
call dlm_lockres_release,
and wait for "cat" unlock.
Although resA->refs is already set to 0,
increase resA->refs, and then unlock
lock dlm->track_lock
-> list_del_init()
-> unlock
-> free resA
In such a race case, invalid address access may occurs. So we should
delete list res->tracking before resA->refs decrease to 0.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ocfs2: call ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()
After we call ocfs2_journal_access_di() in ocfs2_write_begin(),
jbd2_journal_restart() may also be called, in this function transaction
A's t_updates-- and obtains a new transaction B. If
jbd2_journal_commit_transaction() is happened to commit transaction A,
when t_updates==0, it will continue to complete commit and unfile buffer.
So when jbd2_journal_dirty_metadata(), the handle is pointed a new
transaction B, and the buffer head's journal head is already freed,
jh->b_transaction == NULL, jh->b_next_transaction == NULL, it returns
EINVAL, So it triggers the BUG_ON(status).
So I think we should put ocfs2_journal_access_di before
ocfs2_journal_dirty in the ocfs2_write_end. and it works well after my
modification.
Signed-off-by: vicky <vicky.yangwenfang@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tina Ruchandani [Tue, 7 Apr 2015 23:43:58 +0000 (09:43 +1000)]
ocfs2: use 64bit variables to track heartbeat time
o2hb_elapsed_msecs computes the time taken for a disk heartbeat. 'struct
timeval' variables are used to store start and end times. On 32-bit
systems, the 'tv_sec' component of 'struct timeval' will overflow in year
2038 and beyond.
This patch solves the overflow with the following:
1. Replace o2hb_elapsed_msecs using 'ktime_t' values to measure start
and end time, and built-in function 'ktime_ms_delta' to compute the
elapsed time. ktime_get_real() is used since the code prints out the
wallclock time.
2. Changes format string to print time as a single 64-bit nanoseconds
value ("%lld") instead of seconds and microseconds. This simplifies
the code since converting ktime_t to that format would need expensive
computation. However, the debug log string is less readable than the
previous format.
Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
Suggested by: Arnd Bergmann <arnd@arndb.de> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ocfs2: fix a tiny case that inode can not removed.
When running dirop_fileop_racer we found a case that inode
can not removed.
2 nodes, say Node A and Node B, mount the same ocfs2 volume. Create
two dirs /race/1/ and /race/2/ in the filesystem.
Node A Node B
rm -r /race/2/
mv /race/1/ /race/2/
call ocfs2_unlink(), get
the EX mode of /race/2/
wait for B unlock /race/2/
decrease i_nlink of /race/2/ to 0,
and add inode of /race/2/ into
orphan dir, unlock /race/2/
got EX mode of /race/2/. because
/race/1/ is dir, so inc i_nlink
of /race/2/ and update into disk,
unlock /race/2/
because i_nlink of /race/2/
is not zero, this inode will
always remain in orphan dir
This patch fixes this case by test whether i_nlink of new dir is zero.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Caveat: This may return -EROFS for a read case, which seems wrong. This
is happening even without this patch series though. Should we convert
EROFS to EIO?
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
OCFS2 is often used in high-availaibility systems. However, ocfs2
converts the filesystem to read-only at the drop of the hat. This may not
be necessary, since turning the filesystem read-only would affect other
running processes as well, decreasing availability.
This attempt is to add errors=continue, which would return the EIO to the
calling process and terminate furhter processing so that the filesystem is
not corrupted further. However, the filesystem is not converted to
read-only.
As a future plan, I intend to create a small utility or extend fsck.ocfs2
to fix small errors such as in the inode. The input to the utility such
as the inode can come from the kernel logs so we don't have to schedule a
downtime for fixing small-enough errors.
The patch changes the ocfs2_error to return an error. The error returned
depends on the mount option set. If none is set, the default is to turn
the filesystem read-only.
Perhaps errors=continue is not the best option name. Historically it is
used for making an attempt to progress in the current process itself.
Should we call it errors=eio? or errors=killproc? Suggestions/Comments
welcome.
Sources are available at:
https://github.com/goldwynr/linux/tree/error-cont
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ocfs2: flush inode data to disk and free inode when i_count becomes zero
Disk inode deletion may be heavily delayed when one node unlink a file
after the same dentry is freed on another node(say N1) because of memory
shrink but inode is left in memory. This inode can only be freed while N1
doing the orphan scan work.
However, N1 may skip orphan scan for several times because other nodes may
do the work earlier. In our tests, it may take 1 hour on 4 nodes cluster
and it hurts the user experience. So we think the inode should be freed
after the data flushed to disk when i_count becomes zero to avoid such
circumstances.
Signed-off-by: Joyce.xue <xuejiufei@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The trusted extended attributes are only visible to the process which hvae
CAP_SYS_ADMIN capability but the check is missing in ocfs2 xattr_handler
trusted list. The check is important because this will be used for
implementing mechanisms in the userspace for which other ordinary
processes should not have access to.
Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Taesoo kim <taesoo@gatech.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dan Carpenter [Tue, 7 Apr 2015 23:43:56 +0000 (09:43 +1000)]
ocfs2: double evaluation concerns in mlog_errno()
It won't happen in real life, but for consistency etc then we should
only evaluate the "st" parameter once.
Also, since not all callers use the new return, it causes at static
checker warning:
fs/ocfs2/export.c:138 ocfs2_get_dentry() warn: unchecked 'PTR_ERR'
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: alex chen <alex.chen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Tue, 7 Apr 2015 23:43:56 +0000 (09:43 +1000)]
ocfs2: make mlog_errno return the errno
ocfs2 does
mlog_errno(v);
return v;
in many places. Change mlog_errno() so we can do
return mlog_errno(v);
For some weird reason this patch reduces the size of ocfs2 by 6k:
akpm3:/usr/src/25> size fs/ocfs2/ocfs2.ko
text data bss dec hex filename 1146613 82767 832192 2061572 1f7504 fs/ocfs2/ocfs2.ko-before 1140857 82767 832192 2055816 1f5e88 fs/ocfs2/ocfs2.ko-after
Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: alex chen <alex.chen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
alex chen [Tue, 7 Apr 2015 23:43:56 +0000 (09:43 +1000)]
ocfs2: check if the ocfs2 lock resource has been initialized before calling ocfs2_dlm_lock
If ocfs2 lockres has not been initialized before calling ocfs2_dlm_lock,
the lock won't be dropped and then will lead umount hung. The case is
described below:
ocfs2_mknod
ocfs2_mknod_locked
__ocfs2_mknod_locked
ocfs2_journal_access_di
Failed because of -ENOMEM or other reasons, the inode lockres
has not been initialized yet.
iput(inode)
ocfs2_evict_inode
ocfs2_delete_inode
ocfs2_inode_lock
ocfs2_inode_lock_full_nested
__ocfs2_cluster_lock
Succeeds and allocates a new dlm lockres.
ocfs2_clear_inode
ocfs2_open_unlock
ocfs2_drop_inode_locks
ocfs2_drop_lock
Since lockres has not been initialized, the lock
can't be dropped and the lockres can't be
migrated, thus umount will hang forever.
Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Tue, 7 Apr 2015 23:43:56 +0000 (09:43 +1000)]
ocfs2: logging: remove static buffer, use vsprintf extension %pV
Use the vsprintf %pV extension to avoid using a static buffer and remove
the now unnecessary buffer.
Signed-off-by: Joe Perches <joe@perches.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Chengyu Song [Tue, 7 Apr 2015 23:43:55 +0000 (09:43 +1000)]
ocfs2: incorrect check for debugfs returns
debugfs_create_dir and debugfs_create_file may return -ENODEV when debugfs
is not configured, so the return value should be checked against
ERROR_VALUE as well, otherwise the later dereference of the dentry pointer
would crash the kernel.
This patch tries to solve this problem by fixing certain checks. However,
I have that found other call sites are protected by #ifdef CONFIG_DEBUG_FS.
In current implementation, if CONFIG_DEBUG_FS is defined, then the above
two functions will never return any ERROR_VALUE. So another possibility
to fix this is to surround all the buggy checks/functions with the same
#ifdef CONFIG_DEBUG_FS. But I'm not sure if this would break any functionality,
as only OCFS2_FS_STATS declares dependency on DEBUG_FS.
Signed-off-by: Chengyu Song <csong84@gatech.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joseph Qi [Tue, 7 Apr 2015 23:43:55 +0000 (09:43 +1000)]
ocfs2: fix possible uninitialized variable access
In ocfs2_local_alloc_find_clear_bits and ocfs2_get_dentry, variable
numfound and set may be uninitialized and then used in tracepoint. In
ocfs2_xattr_block_get and ocfs2_delete_xattr_in_bucket, variable block_off
and xv may be uninitialized and then used in the following logic due to
unchecked return value.
This patch fixes these possible issues.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ocfs2: remove goto statement in ocfs2_check_dir_for_entry()
Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joseph Qi [Tue, 7 Apr 2015 23:43:55 +0000 (09:43 +1000)]
ocfs2: rollback the cleared bits if error occurs after ocfs2_block_group_clear_bits
ocfs2_block_group_clear_bits will clear bits in block group bitmap.
Once it succeeds but fails in the following step, it will cause block
group bitmap mismatch the corresponding count recorded in dinode.
So rollback the cleared bits if error occurs.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joseph Qi [Tue, 7 Apr 2015 23:43:54 +0000 (09:43 +1000)]
ocfs2: use ENOENT instead of EEXIST when get system file fails
When ocfs2_get_system_file_inode fails, it is obscure to set the return
value to -EEXIST. So change it to -ENOENT.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joseph Qi [Tue, 7 Apr 2015 23:43:54 +0000 (09:43 +1000)]
ocfs2: use actual name length when find entry in ocfs2_orphan_del()
If the namelen is 20 and name only has actual length 16, it will fail in
ocfs2_find_entry because of mismatch. So use actual name length when find
entry.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dan Carpenter [Tue, 7 Apr 2015 23:43:54 +0000 (09:43 +1000)]
ocfs2: dereferencing freed pointers in ocfs2_reflink()
The code at the "out" label assumes that "default_acl" and "acl" are NULL,
but actually the pointers can be NULL, unitialized, or freed.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joseph Qi [Tue, 7 Apr 2015 23:43:54 +0000 (09:43 +1000)]
ocfs2: fix typo in ocfs2_reserve_local_alloc_bits
In ocfs2_reserve_local_alloc_bits, it calls ocfs2_error if local alloc
inode bitmap used bits mismatch, but the log mistakes it as free bits.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joseph Qi [Tue, 7 Apr 2015 23:43:53 +0000 (09:43 +1000)]
ocfs2: do not use ocfs2_zero_extend during direct IO
In ocfs2_direct_IO_write, we use ocfs2_zero_extend to zero allocated
clusters in case of cluster not aligned. But ocfs2_zero_extend uses page
cache, this may happen that it clears the data which blockdev_direct_IO
has already written.
We should use blkdev_issue_zeroout instead of ocfs2_zero_extend during
direct IO.
So fix this issue by introducing ocfs2_direct_IO_zero_extend and
ocfs2_direct_IO_extend_no_holes.
Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Tested-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joseph Qi [Tue, 7 Apr 2015 23:43:53 +0000 (09:43 +1000)]
ocfs2: take inode lock when get clusters
We need take inode lock when calling ocfs2_get_clusters.
And use GFP_NOFS instead of GFP_KERNEL.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joseph Qi [Tue, 7 Apr 2015 23:43:53 +0000 (09:43 +1000)]
ocfs2: no need get dinode bh when zeroing extend
Since di_bh won't be used when zeroing extend, set it to NULL.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joseph Qi [Tue, 7 Apr 2015 23:43:52 +0000 (09:43 +1000)]
ocfs2: fix a typing error in ocfs2_direct_IO_write
Only when direct IO succeeds we need consider zeroing out in case of
cluster not aligned.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Markus Elfring [Tue, 7 Apr 2015 23:43:52 +0000 (09:43 +1000)]
ocfs2: one function call less in user_cluster_connect() after error detection
kfree() was called by user_cluster_connect() even if a previous call of
the kzalloc() function failed.
Return from this implementation directly after failure detection.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Markus Elfring [Tue, 7 Apr 2015 23:43:52 +0000 (09:43 +1000)]
ocfs2: one function call less in ocfs2_init_slot_info() after error detection
__ocfs2_free_slot_info() was called by ocfs2_init_slot_info() even if a
call of the kzalloc() function failed.
Return from this implementation directly after corresponding
exception handling.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Markus Elfring [Tue, 7 Apr 2015 23:43:52 +0000 (09:43 +1000)]
ocfs2: one function call less in ocfs2_merge_rec_right() after error detection
ocfs2_free_path() was called by ocfs2_merge_rec_right() even if a call of
the ocfs2_get_right_path() function failed.
Return from this implementation directly after corresponding
exception handling.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Markus Elfring [Tue, 7 Apr 2015 23:43:51 +0000 (09:43 +1000)]
ocfs2: one function call less in ocfs2_merge_rec_left() after error detection
ocfs2_free_path() was called by ocfs2_merge_rec_left() even if a call of
the ocfs2_get_left_path() function failed.
Return from this implementation directly after corresponding
exception handling.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Markus Elfring [Tue, 7 Apr 2015 23:43:51 +0000 (09:43 +1000)]
ocfs2: less function calls in ocfs2_figure_merge_contig_type() after error detection
ocfs2_free_path() was called in some cases by
ocfs2_figure_merge_contig_type() during error handling even if the passed
variables "left_path" and "right_path" contained still a null pointer.
Corresponding implementation details could be improved by adjustments for
jump labels according to the current Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Markus Elfring [Tue, 7 Apr 2015 23:43:51 +0000 (09:43 +1000)]
ocfs2: less function calls in ocfs2_convert_inline_data_to_extents() after error detection
kfree() was called in a few cases by
ocfs2_convert_inline_data_to_extents() during error handling even if the
passed variable "pages" contained a null pointer.
* Return from this implementation directly after failure detection for
the function call "kcalloc".
* Corresponding details could be improved by the introduction of another
jump label.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Markus Elfring [Tue, 7 Apr 2015 23:43:51 +0000 (09:43 +1000)]
ocfs2: delete unnecessary checks before three function calls
kfree(), ocfs2_free_path() and __ocfs2_free_slot_info() test whether their
argument is NULL and then return immediately. Thus the test around their
calls is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Michal Hocko [Tue, 7 Apr 2015 23:43:51 +0000 (09:43 +1000)]
jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL
This basically reverts 47def82672b3 ("jbd2: Remove __GFP_NOFAIL from jbd2
layer"). The deprecation of __GFP_NOFAIL was a bad choice because it led
to open coding the endless loop around the allocator rather than removing
the dependency on the non failing allocation. So the deprecation was a
clear failure and the reality tells us that __GFP_NOFAIL is not even close
to go away.
It is still true that __GFP_NOFAIL allocations are generally discouraged
and new uses should be evaluated and an alternative (pre-allocations or
reservations) should be considered but it doesn't make any sense to lie
the allocator about the requirements. Allocator can take steps to help
making a progress if it knows the requirements.
Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <david@fromorbit.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mel Gorman <mgorman@suse.de> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: "David S. Miller" <davem@davemloft.net> Cc: Vipul Pandya <vipul@chelsio.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fs/ext4/fsync.c: generic_file_fsync call based on barrier flag
generic_file_fsync has been updated to issue a flush for older
filesystems.
This patch tests for barrier flag in ext4 mount flags and calls the right
function.
Lukas said:
: Note that the actual generic_file_fsync change fixes a real bug in ext4
: where we would _not_ send a flush on sync if we have file system
: without journal.
:
: Ted, it would be useful to mention that in the commit description
: along with the commit id:
:
: ac13a829f6adb674015ab399594c089990104af7 fs/libfs.c: add generic
: data flush to fsync
Signed-off-by: Fabian Frederick <fabf@skynet.be> Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Michal Hocko [Tue, 7 Apr 2015 23:43:50 +0000 (09:43 +1000)]
cxgb4: drop __GFP_NOFAIL allocation
set_filter_wr is requesting __GFP_NOFAIL allocation although it can return
ENOMEM without any problems obviously (t4_l2t_set_switching does that
already). So the non-failing requirement is too strong without any
obvious reason. Drop __GFP_NOFAIL and reorganize the code to have the
failure paths easier.
The same applies to _c4iw_write_mem_dma_aligned which uses __GFP_NOFAIL
and then checks the return value and returns -ENOMEM on failure. This
doesn't make any sense what so ever. Either the allocation cannot fail or
it can.
del_filter_wr seems to be safe as well because the filter entry is not
marked as pending and the return value is propagated up the stack up to
c4iw_destroy_listen.
Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <david@fromorbit.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mel Gorman <mgorman@suse.de> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hariprasad S <hariprasad@chelsio.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ptrace/x86: fix the TIF_FORCED_TF logic in handle_signal()
When the TIF_SINGLESTEP tracee dequeues a signal, handle_signal() clears
TIF_FORCED_TF and X86_EFLAGS_TF but leaves TIF_SINGLESTEP set.
If the tracer does PTRACE_SINGLESTEP again, enable_single_step() sets
X86_EFLAGS_TF but not TIF_FORCED_TF. This means that the subsequent
PTRACE_CONT doesn't not clear X86_EFLAGS_TF, and the tracee gets the wrong
SIGTRAP.
Test-case (needs -O2 to avoid prologue insns in signal handler):
The last assert() fails because PTRACE_CONT wrongly triggers another
single-step and X86_EFLAGS_TF can't be cleared by debugger until the
tracee does sys_rt_sigreturn().
Change handle_signal() to do user_disable_single_step() if stepping, we do
not need to preserve TIF_SINGLESTEP because we are going to do
ptrace_notify(), and it is simply wrong to leak this bit.
While at it, change the comment to explain why we also need to clear TF
unconditionally after setup_rt_frame().
Note: in the longer term we should probably change setup_sigcontext() to
use get_flags() and then just remove this user_disable_single_step().
And, the state of TIF_FORCED_TF can be wrong after restore_sigcontext()
which can set/clear TF, this needs another fix.
Reported-by: Evan Teran <eteran@alum.rit.edu> Reported-by: Pedro Alves <palves@redhat.com> Tested-by: Andres Freund <andres@anarazel.de> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: numa: disable change protection for vma(VM_HUGETLB)
Currently when a process accesses a hugetlb range protected with PROTNONE,
unexpected COWs are triggered, which finally puts the hugetlb subsystem
into a broken/uncontrollable state, where for example h->resv_huge_pages
is subtracted too much and wraps around to a very large number, and the
free hugepage pool is no longer maintainable.
This patch simply stops changing protection for vma(VM_HUGETLB) to fix the
problem. And this also allows us to avoid useless overhead of minor
faults.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Suggested-by: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: move zone lock to a different cache line than order-0 free page lists
Huang Ying reported the following problem due to commit 3484b2de9499 ("mm:
rearrange zone fields into read-only, page alloc, statistics and page
reclaim lines") from the Intel performance tests
The problem is specific to very large machines under stress. It was not
reproducible with the machines I had used to justify the original patch
because large numbers of CPUs are required. When pressure is high enough,
the cache line is bouncing between CPUs trying to acquire the lock and the
holder of the lock adjusting free lists. The intention was that the
acquirer of the lock would automatically have the cache line holding the
free lists but according to Huang, this is not a universal win.
One possibility is to move the zone lock to its own cache line but it
increases the size of the zone. This patch moves the lock to the other
end of the free lists where they do not contend under high pressure. It
does mean the page allocator paths now require more cache lines but Huang
reports that it restores performance to previous levels on large machines
1) In TCP, don't register an FRTO for cumulatively ACK'd data that was
previously SACK'd, from Neal Cardwell.
2) Need to hold RNL mutex in ipv4 multicast code namespace cleanup,
from Cong WANG.
3) Similarly we have to hold RNL mutex for fib_rules_unregister(), also
from Cong WANG.
4) Revert and rework netns nsid allocation fix, from Nicolas Dichtel.
5) When we encapsulate for a tunnel device, skb->sk still points to the
user socket. So this leads to cases where we retraverse the
ipv4/ipv6 output path with skb->sk being of some other address
family (f.e. AF_PACKET). This can cause things to crash since the
ipv4 output path is dereferencing an AF_PACKET socket as if it were
an ipv4 one.
The short term fix for 'net' and -stable is to elide these socket
checks once we've entered an encapsulation sequence by testing
xmit_recursion.
Longer term we have a better solution wherein we pass the tunnel's
socket down through the output paths, but that is way too invasive
for 'net' and -stable.
From Hannes Frederic Sowa.
6) l2tp_init() failure path forgets to unregister per-net ops, from
Cong WANG.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net/mlx4_core: Fix error message deprecation for ConnectX-2 cards
net: dsa: fix filling routing table from OF description
l2tp: unregister l2tp_net_ops on failure path
mvneta: dont call mvneta_adjust_link() manually
ipv6: protect skb->sk accesses from recursive dereference inside the stack
netns: don't allocate an id for dead netns
Revert "netns: don't clear nsid too early on removal"
ip6mr: call del_timer_sync() in ip6mr_free_table()
net: move fib_rules_unregister() under rtnl lock
ipv4: take rtnl_lock and mark mrt table as freed on namespace cleanup
tcp: fix FRTO undo on cumulative ACK of SACKed range
xen-netfront: transmit fully GSO-sized packets
net/mlx4_core: Fix error message deprecation for ConnectX-2 cards
Commit 1daa4303b4ca ("net/mlx4_core: Deprecate error message at
ConnectX-2 cards startup to debug") did the deprecation only for port 1
of the card. Need to deprecate for port 2 as well.
Fixes: 1daa4303b4ca ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: fix filling routing table from OF description
According to description in 'include/net/dsa.h', in cascade switches
configurations where there are more than one interconnected devices,
'rtable' array in 'dsa_chip_data' structure is used to indicate which
port on this switch should be used to send packets to that are destined
for corresponding switch.
However, dsa_of_setup_routing_table() fills 'rtable' with port numbers
of the _target_ switch, but not current one.
This commit removes redundant devicetree parsing and adds needed port
number as a function argument. So dsa_of_setup_routing_table() now just
looks for target switch number by parsing parent of 'link' device node.
To remove possible misunderstandings with the way of determining target
switch number, a corresponding comment was added to the source code and
to the DSA device tree bindings documentation file.
This was tested on a custom board with two Marvell 88E6095 switches with
following corresponding routing tables: { -1, 10 } and { 8, -1 }.
Signed-off-by: Pavel Nakonechny <pavel.nakonechny@skitlab.ru> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"Updates for the input subsystem - two more tweaks for ALPS driver to
work out kinks after splitting the touchpad, trackstick, and potential
external PS/2 mouse into separate input devices.
Changes to support ALPS SS4 devices (protocol V8) will be coming in
4.1..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: alps - document stick behavior for protocol V2
Input: alps - report V2 Dualpoint Stick events via the right evdev node
Input: alps - report interleaved bare PS/2 packets via dev3
ipv6: protect skb->sk accesses from recursive dereference inside the stack
We should not consult skb->sk for output decisions in xmit recursion
levels > 0 in the stack. Otherwise local socket settings could influence
the result of e.g. tunnel encapsulation process.
ipv6 does not conform with this in three places:
1) ip6_fragment: we do consult ipv6_npinfo for frag_size
2) sk_mc_loop in ipv6 uses skb->sk and checks if we should
loop the packet back to the local socket
3) ip6_skb_dst_mtu could query the settings from the user socket and
force a wrong MTU
Furthermore:
In sk_mc_loop we could potentially land in WARN_ON(1) if we use a
PF_PACKET socket ontop of an IPv6-backed vxlan device.
Reuse xmit_recursion as we are currently only interested in protecting
tunnel devices.
Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Hans de Goede [Sat, 4 Apr 2015 00:20:05 +0000 (17:20 -0700)]
Input: alps - report V2 Dualpoint Stick events via the right evdev node
On V2 devices the DualPoint Stick reports bare packets, these should be
reported via the "AlpsPS/2 ALPS DualPoint Stick" dev2 evdev node, which also
has the INPUT_PROP_POINTING_STICK propbit set.
Note that since there is no way to distinguish these packets from an external
PS/2 mouse (insofar as these laptops have an external PS/2 port) this means
that we will be reporting PS/2 mouse events via this evdev node too, as we've
been doing in kernel 3.19 and older.
This has been tested on a Dell Latitude D620 and a Dell Latitude E6400,
which both have a V2 touchpad + a DualPoint Stick which reports bare packets.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Hans de Goede [Sat, 4 Apr 2015 00:14:40 +0000 (17:14 -0700)]
Input: alps - report interleaved bare PS/2 packets via dev3
Bare packets should be reported via the same evdev device independent on
whether they are detected on the beginning of a packet or in the middle
of a packet.
This has been tested on a Dell Latitude E6400, where the DualPoint Stick
reports bare packets, which get reported via dev3 when the touchpad is
idle, and via dev2 when the touchpad and stick are used simultaneously.
This commit fixes this inconsistency by always reporting bare packets via
dev3. Note that since the come from a DualPoint Stick they really should be
reported via dev2, this gets fixed in a later commit.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Merge tag 'usb-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes and new device ids for 4.0-rc6. Nothing
major, some xhci fixes for reported problems, and some usb-serial
device ids.
All have been in linux-next for a while"
* tag 'usb-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: ftdi_sio: Use jtag quirk for SNAP Connect E10
usb: isp1760: fix spin unlock in the error path of isp1760_udc_start
usb: xhci: apply XHCI_AVOID_BEI quirk to all Intel xHCI controllers
usb: xhci: handle Config Error Change (CEC) in xhci driver
USB: keyspan_pda: add new device id
USB: ftdi_sio: Added custom PID for Synapse Wireless product
Merge tag 'staging-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some staging driver fixes, well, really all just IIO driver
fixes, for 4.0-rc6. They fix issues that have been reported with
these drivers.
All of these patches have been in linux-next for a while"
* tag 'staging-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: imu: Use iio_trigger_get for indio_dev->trig assignment
iio: adc: vf610: use ADC clock within specification
iio/adc/cc10001_adc.c: Fix !HAS_IOMEM build
iio: core: Fix double free.
iio:inv-mpu6050: Fix inconsistency for the scale channel
staging: iio: dummy: Fix undefined symbol build error
iio: inv_mpu6050: Clear timestamps fifo while resetting hardware fifo
staging: iio: hmc5843: Set iio name property in sysfs
iio: bmc150: change sampling frequency
iio: fix drivers that check buffer->scan_mask
Merge tag 'tty-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are 3 serial driver fixes for 4.0-rc6. They fix some reported
issues with the samsung and fsl_lpuart drivers.
All have been in linux-next for a while"
* tag 'tty-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: serial: fsl_lpuart: clear receive flag on FIFO flush
tty: serial: fsl_lpuart: specify transmit FIFO size
serial: samsung: Clear operation mode on UART shutdown
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem fixes from Dmitry Torokhov:
"A fix for ALPS driver for issue introduced in the latest update and a
tweak for yet another Lenovo box in Synaptics.
There will be more ALPS tweaks coming.."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: define INPUT_PROP_ACCELEROMETER behavior
Input: synaptics - fix min-max quirk value for E440
Input: synaptics - add quirk for Thinkpad E440
Input: ALPS - fix max coordinates for v5 and v7 protocols
Input: add MT_TOOL_PALM
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes: a SYSRET single-stepping fix, a dmi-scan robustization
fix, a reboot quirk and a kgdb fixlet"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kgdb/x86: Fix reporting of 'si' in kgdb on x86_64
x86/asm/entry/64: Disable opportunistic SYSRET if regs->flags has TF set
x86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk
MAINTAINERS: Change the x86 microcode loader maintainer
firmware: dmi_scan: Prevent dmi_num integer overflow
Nicolas Dichtel [Fri, 3 Apr 2015 10:02:37 +0000 (12:02 +0200)]
netns: don't allocate an id for dead netns
First, let's explain the problem.
Suppose you have an ipip interface that stands in the netns foo and its link
part in the netns bar (so the netns bar has an nsid into the netns foo).
Now, you remove the netns bar:
- the bar nsid into the netns foo is removed
- the netns exit method of ipip is called, thus our ipip iface is removed:
=> a netlink message is built in the netns foo to advertise this deletion
=> this netlink message requests an nsid for bar, thus a new nsid is
allocated for bar and never removed.
This patch adds a check in peernet2id() so that an id cannot be allocated for
a netns which is currently destroyed.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Tue, 31 Mar 2015 18:01:47 +0000 (11:01 -0700)]
ip6mr: call del_timer_sync() in ip6mr_free_table()
We need to wait for the flying timers, since we
are going to free the mrtable right after it.
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Note, net->rules_mod_lock is actually not needed at all,
either upper layer netns code or rtnl lock guarantees
we are safe.
Cc: Alexander Duyck <alexander.h.duyck@redhat.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Tue, 31 Mar 2015 18:01:45 +0000 (11:01 -0700)]
ipv4: take rtnl_lock and mark mrt table as freed on namespace cleanup
This is the IPv4 part for commit 905a6f96a1b1
(ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanup).
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"One drm core fix, one exynos regression fix, two sets of radeon fixes
(Alex was a bit behind last week), and two i915 fixes.
Nothing too serious we seem to have calmed down i915 since last week"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: fix wait in radeon_mn_invalidate_range_start
drm/radeon: add extra check in radeon_ttm_tt_unpin_userptr
drm: Exynos: Respect framebuffer pitch for FIMD/Mixer
drm/i915: Reject the colorkey ioctls for primary and cursor planes
drm/i915: Skip allocating shadow batch for 0-length batches
drm/radeon: programm the VCE fw BAR as well
drm/radeon: always dump the ring content if it's available
radeon: Do not directly dereference pointers to BIOS area.
drm/radeon/dpm: fix 120hz handling harder
drm/edid: set ELD for firmware and debugfs override EDIDs
Merge tag 'irqchip-fixes-4.0-2' of git://git.infradead.org/users/jcooper/linux
Pull irqchip fixes from Jason Cooper:
"This is the second round of fixes for irqchip. It contains some fixes
found while the arm64 guys were writing the kvm gicv3 its emulation.
GICv3 ITS:
- Small batch of fixes discovered while writing the kvm ITS emulation"
* tag 'irqchip-fixes-4.0-2' of git://git.infradead.org/users/jcooper/linux:
irqchip: gicv3-its: Use non-cacheable accesses when no shareability
irqchip: gicv3-its: Fix PROP/PEND and BASE/CBASE confusion
irqchip: gicv3-its: Fix device ID encoding
irqchip: gicv3-its: Fix encoding of collection's target redistributor
Dave Airlie [Thu, 2 Apr 2015 23:28:55 +0000 (09:28 +1000)]
Merge branch 'drm-fixes-4.0' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Just two small fixes for radeon, both destined for stable.
* 'drm-fixes-4.0' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: fix wait in radeon_mn_invalidate_range_start
drm/radeon: add extra check in radeon_ttm_tt_unpin_userptr
Dave Airlie [Thu, 2 Apr 2015 23:27:48 +0000 (09:27 +1000)]
Merge tag 'drm-intel-fixes-2015-04-02' of git://anongit.freedesktop.org/drm-intel into drm-fixes
one oops fixes and a 0-length allocation fix from next backported.
* tag 'drm-intel-fixes-2015-04-02' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Reject the colorkey ioctls for primary and cursor planes
drm/i915: Skip allocating shadow batch for 0-length batches
Merge tag 'stable/for-linus-4.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen regression fixes from David Vrabel:
"Fix two regressions in the balloon driver's use of memory hotplug when
used in a PV guest"
* tag 'stable/for-linus-4.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: before adding hotplugged memory, set frames to invalid
x86/xen: prepare p2m list for memory hotplug
tcp: fix FRTO undo on cumulative ACK of SACKed range
On processing cumulative ACKs, the FRTO code was not checking the
SACKed bit, meaning that there could be a spurious FRTO undo on a
cumulative ACK of a previously SACKed skb.
The FRTO code should only consider a cumulative ACK to indicate that
an original/unretransmitted skb is newly ACKed if the skb was not yet
SACKed.
The effect of the spurious FRTO undo would typically be to make the
connection think that all previously-sent packets were in flight when
they really weren't, leading to a stall and an RTO.
Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Fixes: e33099f96d99c ("tcp: implement RFC5682 F-RTO") Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband/rdma fix from Roland Dreier:
"Fix for exploitable integer overflow in uverbs interface"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic
Jonathan Davies [Tue, 31 Mar 2015 10:05:15 +0000 (11:05 +0100)]
xen-netfront: transmit fully GSO-sized packets
xen-netfront limits transmitted skbs to be at most 44 segments in size. However,
GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448
bytes each. This slight reduction in the size of packets means a slight loss in
efficiency.
Since c/s 9ecd1a75d, xen-netfront sets gso_max_size to
XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER,
where XEN_NETIF_MAX_TX_SIZE is 65535 bytes.
The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s 6c09fa09d) in determining when to split an skb into two is
sk->sk_gso_max_size - 1 - MAX_TCP_HEADER.
So the maximum permitted size of an skb is calculated to be
(XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER.
Intuitively, this looks like the wrong formula -- we don't need two TCP headers.
Instead, there is no need to deviate from the default gso_max_size of 65536 as
this already accommodates the size of the header.
Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments
of 1448 bytes each), as observed via tcpdump. This patch makes netfront send
skbs of up to 65160 bytes (45 segments of 1448 bytes each).
Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER as
it relates to the size of the whole packet, including the header.
Fixes: 9ecd1a75d977 ("xen-netfront: reduce gso_max_size to account for max TCP header") Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"This time we have addition of caps for jz4740 which fixes intentional
warning at boot. Then we have memory leak issues in drivers using
virt-dma by Peter on few drive"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: moxart-dma: Fix memory leak when stopping a running transfer
dmaengine: bcm2835-dma: Fix memory leak when stopping a running transfer
dmaengine: omap-dma: Fix memory leak when terminating running transfer
dmaengine: edma: fix memory leak when terminating running transfers
dmaengine: jz4740: Define capabilities
1) Fix use-after-free with mac80211 RX A-MPDU reorder timer, from
Johannes Berg.
2) iwlwifi leaks memory every module load/unload cycles, fix from Larry
Finger.
3) Need to use for_each_netdev_safe() in rtnl_group_changelink()
otherwise we can crash, from WANG Cong.
4) mlx4 driver does register_netdev() too early in the probe sequence,
from Ido Shamay.
5) Don't allow router discovery hop limit to decrease the interface's
hop limit, from D.S. Ljungmark.
6) tx_packets and tx_bytes improperly accounted for certain classes of
USB network devices, fix from Ben Hutchings.
7) ip{6}mr_rules_init() mistakenly use plain kfree to release the ipmr
tables in the error path, they must instead use ip{6}mr_free_table().
Fix from WANG Cong.
8) cxgb4 doesn't properly quiesce all RX activity before unregistering
the netdevice. Fix from Hariprasad Shenai.
9) Fix hash corruptions in ipvlan driver, from Jiri Benc.
10) nla_memcpy(), like a real memcpy, should fully initialize the
destination buffer, even if the source attribute is smaller. Fix
from Jiri Benc.
11) Fix wrong error code returned from iucv_sock_sendmsg(). We should
use whatever sock_alloc_send_skb() put into 'err'. From Eugene
Crosser.
12) Fix slab object leak on module unload in TIPC, from Ying Xue.
13) Need a READ_ONCE() when reading the cached RX socket route in
tcp_v{4,6}_early_demux(). From Michal Kubecek.
14) Still too many problems with TPC support in the ath9k driver, so
disable it for now. From Felix Fietkau.
15) When in AP mode the rtlwifi driver can leak DMA mappings, fix from
Larry Finger.
16) Missing kzalloc() failure check in gs_usb CAN driver, from Colin Ian
King.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
cxgb4: Fix to dump devlog, even if FW is crashed
cxgb4: Firmware macro changes for fw verison 1.13.32.0
bnx2x: Fix kdump when iommu=on
bnx2x: Fix kdump on 4-port device
mac80211: fix RX A-MPDU session reorder timer deletion
MAINTAINERS: Update Intel Wired Ethernet Driver info
tipc: fix a slab object leak
net/usb/r8152: add device id for Lenovo TP USB 3.0 Ethernet
af_iucv: fix AF_IUCV sendmsg() errno
openvswitch: Return vport module ref before destruction
netlink: pad nla_memcpy dest buffer with zeroes
bonding: Bonding Overriding Configuration logic restored.
ipvlan: fix check for IP addresses in control path
ipvlan: do not use rcu operations for address list
ipvlan: protect against concurrent link removal
ipvlan: fix addr hash list corruption
net: fec: setup right value for mdio hold time
net: tcp6: fix double call of tcp_v6_fill_cb()
cxgb4vf: Fix sparse warnings
netns: don't clear nsid too early on removal
...
Shachar Raindel [Wed, 18 Mar 2015 17:39:08 +0000 (17:39 +0000)]
IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic
Properly verify that the resulting page aligned end address is larger
than both the start address and the length of the memory area requested.
Both the start and length arguments for ib_umem_get are controlled by
the user. A misbehaving user can provide values which will cause an
integer overflow when calculating the page aligned end address.
This overflow can cause also miscalculation of the number of pages
mapped, and additional logic issues.
Addresses: CVE-2014-8159 Cc: <stable@vger.kernel.org> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Some of the CYCLE_ACTIVITY.* events can only be scheduled on
counter 2. Due to a typo Haswell matched those with
INTEL_EVENT_CONSTRAINT, which lead to the events never
matching as the comparison does not expect anything
in the umask too. Fix the typo.
Kan Liang [Fri, 27 Mar 2015 14:38:25 +0000 (10:38 -0400)]
perf/x86/intel: Filter branches for PEBS event
For supporting Intel LBR branches filtering, Intel LBR sharing logic
mechanism is introduced from commit b36817e88630 ("perf/x86: Add Intel
LBR sharing logic"). It modifies __intel_shared_reg_get_constraints() to
config lbr_sel, which is finally used to set LBR_SELECT.
However, the intel_shared_regs_constraints() function is called after
intel_pebs_constraints(). The PEBS event will return immediately after
intel_pebs_constraints(). So it's impossible to filter branches for PEBS
events.
This patch moves intel_shared_regs_constraints() ahead of
intel_pebs_constraints().
We can safely do that because the intel_shared_regs_constraints() function
only returns empty constraint if its rejecting the event, otherwise it
returns NULL such that we continue calling intel_pebs_constraints() and
x86_get_event_constraint().
Daniel Stone [Tue, 17 Mar 2015 13:24:58 +0000 (13:24 +0000)]
drm: Exynos: Respect framebuffer pitch for FIMD/Mixer
When performing a modeset, use the framebuffer pitch value to set FIMD
IMG_SIZE and Mixer SPAN registers. These are both defined as pitch - the
distance between contiguous lines (bytes for FIMD, pixels for mixer).
Fixes display on Snow (1366x768).
Signed-off-by: Daniel Stone <daniels@collabora.com> Tested-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Inki Dae <inki.dae@samsung.com>
Andy Lutomirski [Wed, 1 Apr 2015 21:26:34 +0000 (14:26 -0700)]
x86/asm/entry/64: Disable opportunistic SYSRET if regs->flags has TF set
When I wrote the opportunistic SYSRET code, I missed an important difference
between SYSRET and IRET.
Both instructions are capable of setting EFLAGS.TF, but they behave differently
when doing so:
- IRET will not issue a #DB trap after execution when it sets TF.
This is critical -- otherwise you'd never be able to make forward progress when
returning to userspace.
- SYSRET, on the other hand, will trap with #DB immediately after
returning to CPL3, and the next instruction will never execute.
This breaks anything that opportunistically SYSRETs to a user
context with TF set. For example, running this code with TF set
and a SIGTRAP handler loaded never gets past 'post_nop':
Ville Syrjälä [Fri, 27 Mar 2015 17:59:40 +0000 (19:59 +0200)]
drm/i915: Reject the colorkey ioctls for primary and cursor planes
The legcy colorkey ioctls are only implemented for sprite planes, so
reject the ioctl for primary/cursor planes. If we want to support
colorkeying with these planes (assuming we have hw support of course)
we should just move ahead with the colorkey property conversion.
Testcase: kms_legacy_colorkey Cc: Tommi Rantala <tt.rantala@gmail.com> Cc: stable@vger.kernel.org
Reference: http://mid.gmane.org/CA+ydwtr+bCo7LJ44JFmUkVRx144UDFgOS+aJTfK6KHtvBDVuAw@mail.gmail.com Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
David S. Miller [Wed, 1 Apr 2015 18:47:21 +0000 (14:47 -0400)]
Merge branch 'cxgb4-net'
Hariprasad Shenai says:
====================
cxgb4 FW macro changes for new FW
Fix to dump device log even in the case of firmware crash. Also
incorporates changes for new FW.
This patch series has been created against net tree and includes patches on
cxgb4 driver.
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new Common Code routines to retrieve Firmware Device Log
parameters from PCIE_FW_PF[7]. The firmware initializes its Device Log very
early on and stores the parameters for its location/size in that register.
Using the parameters from the register allows us to access the Firmware
Device Log even when the firmware crashes very early on or we're not
attached to the firmware
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 1 Apr 2015 18:19:22 +0000 (14:19 -0400)]
Merge tag 'mac80211-for-davem-2015-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
This contains just a single fix for a crash I happened to randomly
run into today during testing. It's clearly been around for a while,
but is pretty hard to trigger, even when I tried explicitly (and
modified the code to make it more likely) it rarely did.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'lazytime_fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull lazytime fixes from Ted Ts'o:
"This fixes a problem in the lazy time patches, which can cause
frequently updated inods to never have their timestamps updated.
These changes guarantee that no timestamp on disk will be stale by
more than 24 hours"
* tag 'lazytime_fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
fs: add dirtytime_expire_seconds sysctl
fs: make sure the timestamps for lazytime inodes eventually get written
Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
"Two main issues:
- We found that turning on pNFS by default (when it's configured at
build time) was too aggressive, so we want to switch the default
before the 4.0 release.
- Recent client changes to increase open parallelism uncovered a
serious bug lurking in the server's open code.
Also fix a krb5/selinux regression.
The rest is mainly smaller pNFS fixes"
* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
sunrpc: make debugfs file creation failure non-fatal
nfsd: require an explicit option to enable pNFS
NFSD: Fix bad update of layout in nfsd4_return_file_layout
NFSD: Take care the return value from nfsd4_encode_stateid
NFSD: Printk blocklayout length and offset as format 0x%llx
nfsd: return correct lockowner when there is a race on hash insert
nfsd: return correct openowner when there is a race to put one in the hash
NFSD: Put exports after nfsd4_layout_verify fail
NFSD: Error out when register_shrinker() fail
NFSD: Take care the return value from nfsd4_decode_stateid
NFSD: Check layout type when returning client layouts
NFSD: restore trace event lost in mismerge