get_event_name uses sprintf to fill a buffer declared on the stack. It fills
the buffer 2 bytes at a time. What the code doesn't take into account is that
sprintf(buf, "%02x", data) actually writes 3 bytes. 2 bytes for the data and
then it nul terminates the string. Since we declare buf to be 40 characters
long and then we write 40 bytes of data into buf sprintf is going to write 41
characters. The fix is to leave room in buf for the nul terminator.
Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This situation can occur when attempting to attach a block device whose
backend is an empty physical CD-ROM driver. The backend in this case
will go directly from the Initialising state to Closing->Closed.
Previously this would result in a NULL pointer deref on info->gd
(xenbus_dev_fatal does not return as a1a15ac5 seems to expect)
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The futex code installs a read only mapping via get_user_pages_fast()
even if the futex op function has to modify user space data. The
eventual fault was fixed up by futex_handle_fault() which walked the
VMA with mmap_sem held.
After the cleanup patches which removed the mmap_sem dependency of the
futex code commit 4dc5b7a36a49eff97050894cf1b3a9a02523717 (futex:
clean up fault logic) removed the private VMA walk logic from the
futex code. This change results in a stale RO mapping which is not
fixed up.
Instead of reintroducing the previous fault logic we set up the
mapping in get_user_pages_fast() read/write for all operations which
modify user space data. Also handle private futexes in the same way
and make the current unconditional access_ok(VERIFY_WRITE) depend on
the futex op.
Reported-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The problem is that permission checking is skipped if atomic open is
possible, but when exec opens a file, it just opens it O_READONLY which
means EXEC permission will not be checked at that time.
This problem is observed by the following sequence (executed as root):
mount -t nfs4 server:/ /mnt4
echo "ls" >/mnt4/foo
chmod 744 /mnt4/foo
su guest -c "mnt4/foo"
When sending a message to user space using wimax_msg(), if nla_put()
fails, correctly interpret the return code from wimax_msg_alloc() as
an err ptr and return the error code instead of crashing (as it is
assuming than non-NULL means the pointer is ok).
Commit c45d6320 ("fix reference counting of ftdi_private") stopped
ftdi_sio_port_remove() from directly freeing the port-private data, with
the intention if the port was still open, it would be freed when
ftdi_close() is eventually called and releases the last refcount on the
structure.
That's all very well, but ftdi_sio_port_remove() still contains a call
to usb_set_serial_port_data(port, NULL) -- so by the time we get to
ftdi_close() for the port which was unplugged, it _still_ oopses on
dereferencing that NULL pointer, as it did before (and does in 2.6.29).
The fix is just not to clear the private data in ftdi_sio_port_remove().
Then the refcount is properly reduced to zero when the final kref_put()
happens in ftdi_close().
Remove a bogus comment too, while we're at it. And stop doing things
inside "if (priv)" -- it must _always_ be there.
Based loosely on an earlier patch by Daniel Mack, and suggestions by
Alan Stern.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Tested-by: Daniel Mack <daniel@caiaq.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If there is a dummy "espdma" or "ledma" parent device above ESP scsi
or LE ethernet device nodes, we have to match the bus as SBUS.
Otherwise the address and size cell counts are wrong and we don't
calculate the final physical device resource values correctly at all.
Commit 5280267c1dddb8d413595b87dc406624bb497946 ("sparc: Fix handling
of LANCE and ESP parent nodes in of_device.c") was meant to fix this
problem, but that only influences the inner loop of
build_device_resources(). We need this logic to also kick in at the
beginning of build_device_resources() as well, when we make the first
attempt to determine the device's immediate parent bus type for 'reg'
property element extraction.
Based almost entirely upon a patch by Friedrich Oslage.
Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The 8169 chip only generates MSI interrupts when all enabled event
sources are quiescent and one or more sources transition to active. If
not all of the active events are acknowledged, or a new event becomes
active while the existing ones are cleared in the handler, we will not
see a new interrupt.
The current interrupt handler masks off the Rx and Tx events once the
NAPI handler has been scheduled, which opens a race window in which we
can get another Rx or Tx event and never ACK'ing it, stopping all
activity until the link is reset (ifconfig down/up). Fix this by always
ACK'ing all event sources, and loop in the handler until we have all
sources quiescent.
Signed-off-by: David Dillow <dave@thedillows.org> Tested-by: Michael Buesch <mb@bu3sch.de> Tested-by: Michael Riepe <michael.riepe@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix locking issue in alb MAC address management; removed
incorrect locking and replaced with correct locking. This bug was
introduced in commit 059fe7a578fba5bbb0fdc0365bfcf6218fa25eb0
("bonding: Convert locks to _bh, rework alb locking for new locking")
Bug reported by Paul Smith <paul@mad-scientist.net>, who also
tested the fix.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Changeset ca17584bf2ad1b1e37a5c0e4386728cc5fc9dabc ("mac8390: update
to net_device_ops") broke mac8390 by adding 8390.o to the link. That
meant that lib8390.c was included twice, once in mac8390.c and once in
8390.c, subject to different macros. This patch reverts that by
avoiding the wrappers in 8390.c. They seem to be of no value since
COMPAT_NET_DEV_OPS is going away soon.
Tested with a Kinetics EtherPort card.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Check whether the underlying device provides a set of ethtool ops before
checking for individual handlers to avoid NULL pointer dereferences.
Reported-by: Art van Breemen <ard@telegraafnet.nl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
typo -- pkt_dev->nflows is for stats only, the number of concurrent
flows is stored in cflows.
Reported-By: Vladimir Ivashchenko <hazard@francoudi.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
Quoted here because its a perfect one :
begin_of_quotation
2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
patch has at least one critical flaw, and another problem.
rt_intern_hash calculates rthi pointer, which is later used for new entry
insertion. The same loop calculates cand pointer which is used to clean the
list. If the pointers are the same, rtable leak occurs, as first the cand is
removed then the new entry is appended to it.
This leak leads to unregister_netdevice problem (usage count > 0).
Another problem of the patch is that it tries to insert the entries in certain
order, to facilitate counting of entries distinct by all but QoS parameters.
Unfortunately, referencing an existing rtable entry moves it to list beginning,
to speed up further lookups, so the carefully built order is destroyed.
For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
it will also destroy the ordering.
end_of_quotation
Trying to keep dst_entries ordered is too complex and breaks the fact that
order should depend on the frequency of use for garbage collection.
A possible fix is to make rt_intern_hash() simpler, and only makes
rt_check_expire() a litle bit smarter, being able to cope with an arbitrary
entries order. The added loop is running on cache hot data, while cpu
is prefetching next object, so should be unnoticied.
Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
rt_check_expire() computes average and standard deviation of chain lengths,
but not correclty reset length to 0 at beginning of each chain.
This probably gives overflows for sum2 (and sum) on loaded machines instead
of meaningful results.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When called with a consumed value that is less than skb_headlen(skb)
bytes into a page frag, skb_seq_read() incorrectly returns an
offset/length relative to skb->data. Ensure that data which should come
from a page frag does.
Signed-off-by: Thomas Chenault <thomas_chenault@dell.com> Tested-by: Shyam Iyer <shyam_iyer@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A long-standing feature in tcp_init_metrics() is such that
any of its goto reset prevents call to tcp_init_cwnd().
Signed-off-by: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 518a09ef11 (tcp: Fix recvmsg MSG_PEEK influence of
blocking behavior) lets the loop run longer than the race check
did previously expect, so we need to be more careful with this
check and consider the work we have been doing.
I tried my best to deal with urg hole madness too which happens
here:
if (!sock_flag(sk, SOCK_URGINLINE)) {
++*seq;
...
by using additional offset by one but I certainly have very
little interest in testing that part.
Signed-off-by: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi> Tested-by: Frans Pop <elendil@planet.nl> Tested-by: Ian Zimmermann <itz@buug.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When kernel inserts a temporary SA for IKE, it uses the wrong hash
value for dst list. Two hash values were calcultated before: one with
source address and one with a wildcard source address.
Bug hinted by Junwei Zhang <junwei.zhang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The MPC5200 PSC device is wired up to a dedicated interrupt line
which is never shared. This patch removes the IRQF_SHARED flag
from the request_irq() call which eliminates the "IRQF_DISABLED
is not guaranteed on shared IRQs" warning message from the console
output.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Reviewed-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes an invalid pointer access in case the receive queue
holds no pointer to the next skb when the queue is empty.
Signed-off-by: Hannes Hering <hering2@de.ibm.com> Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Rearrange locking of i_mutex on destination and call to
ocfs2_rw_lock() so locks are only held while buffers are copied with
the pipe_to_file() actor, and not while waiting for more data on the
pipe.
Rearrange locking of i_mutex on destination so it's only held while
buffers are copied with the pipe_to_file() actor, and not while
waiting for more data on the pipe.
splice_from_pipe_next() will wait (if necessary) for more buffers to
be added to the pipe. splice_from_pipe_feed() will feed the buffers
to the supplied actor and return when there's no more data available
(or if all of the requested data has been copied).
This is necessary so that implementations can do locking around the
non-waiting splice_from_pipe_feed().
This patch should not cause any change in behavior.
This was an omission from commit 26c3679101dbccc054dcf370143941844ba70531
"fuse: destroy bdi on umount", which moved the bdi_destroy() call from
fuse_conn_put() to fuse_put_super().
KVM optimizes guest port 80 accesses by passthing them through to the host.
Some AMD machines die on port 80 writes, allowing the guest to hard-lock the
host.
Remove the port passthrough to avoid the problem.
Reported-by: Piotr Jaroszyński <p.jaroszynski@gmail.com> Tested-by: Piotr Jaroszyński <p.jaroszynski@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1240) adds the NOGET quirk for three devices from CH
Products: the Pro pedals, the Combatstick joystick, and the Flight-Sim
yoke. Without these quirks, the devices haven't worked for many
kernel releases. Sometimes replugging them after boot-up would get
them to work and sometimes they wouldn't work at all.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Sean Hildebrand <silverwraithii@gmail.com> Reported-by: Sid Boyce <sboyce@blueyonder.co.uk> Tested-by: Sean Hildebrand <silverwraithii@gmail.com> Tested-by: Sid Boyce <sboyce@blueyonder.co.uk> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The check for reaching max_channels is short circuited by 'continuing'
after successfully adding a channel.
[ Impact: make the 'max_channels' module parameter actually have an effect ]
Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Reported-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After 2f9092e1020246168b1309b35e085ecd7ff9ff72 "Fix i_mutex vs. readdir
handling in nfsd" (and 14f7dd63 "Copy XFS readdir hack into nfsd code"),
an entry may be removed between the first mutex_unlock and the second
mutex_lock. In this case, lookup_one_len() will return a negative
dentry. Check for this case to avoid a NULL dereference.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Reviewed-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix a size check WRT the manual pages. This was inadvertently broken by
commit 9fe5ad9c8cef9ad5873d8ee55d1cf00d9b607df0 ("flag parameters
add-on: remove epoll_create size param").
When multiply mounting from the same client to the same server, with
different userids, we create a vcnum which should be unique if
possible (this is not the same as the smb uid, which is the handle
to the security context). We were not endian converting additional
(beyond the first which is zero) vcnum properly.
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit c2ec175c39f62949438354f603f4aa170846aabb ("mm: page_mkwrite
change prototype to match fault") exposed a bug in the NFS
implementation of page_mkwrite. We should be returning 0 on success...
Change page_mkwrite to allow implementations to return with the page
locked, and also change it's callers (in page fault paths) to hold the
lock until the page is marked dirty. This allows the filesystem to have
full control of page dirtying events coming from the VM.
Rather than simply hold the page locked over the page_mkwrite call, we
call page_mkwrite with the page unlocked and allow callers to return with
it locked, so filesystems can avoid LOR conditions with page lock.
The problem with the current scheme is this: a filesystem that wants to
associate some metadata with a page as long as the page is dirty, will
perform this manipulation in its ->page_mkwrite. It currently then must
return with the page unlocked and may not hold any other locks (according
to existing page_mkwrite convention).
In this window, the VM could write out the page, clearing page-dirty. The
filesystem has no good way to detect that a dirty pte is about to be
attached, so it will happily write out the page, at which point, the
filesystem may manipulate the metadata to reflect that the page is no
longer dirty.
It is not always possible to perform the required metadata manipulation in
->set_page_dirty, because that function cannot block or fail. The
filesystem may need to allocate some data structure, for example.
And the VM cannot mark the pte dirty before page_mkwrite, because
page_mkwrite is allowed to fail, so we must not allow any window where the
page could be written to if page_mkwrite does fail.
This solution of holding the page locked over the 3 critical operations
(page_mkwrite, setting the pte dirty, and finally setting the page dirty)
closes out races nicely, preventing page cleaning for writeout being
initiated in that window. This provides the filesystem with a strong
synchronisation against the VM here.
- Sage needs this race closed for ceph filesystem.
- Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
- I need it for fsblock.
- I suspect other filesystems may need it too (eg. btrfs).
- I have converted buffer.c to the new locking. Even simple block allocation
under dirty pages might be susceptible to i_size changing under partial page
at the end of file (we also have a buffer.c-side problem here, but it cannot
be fixed properly without this patch).
- Other filesystems (eg. NFS, maybe btrfs) will need to change their
page_mkwrite functions themselves.
[ This also moves page_mkwrite another step closer to fault, which should
eventually allow page_mkwrite to be moved into ->fault, and thus avoiding a
filesystem calldown and page lock/unlock cycle in __do_fault. ]
[akpm@linux-foundation.org: fix derefs of NULL ->mapping] Cc: Sage Weil <sage@newdream.net> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
page_mkwrite is called with neither the page lock nor the ptl held. This
means a page can be concurrently truncated or invalidated out from
underneath it. Callers are supposed to prevent truncate races themselves,
however previously the only thing they can do in case they hit one is to
raise a SIGBUS. A sigbus is wrong for the case that the page has been
invalidated or truncated within i_size (eg. hole punched). Callers may
also have to perform memory allocations in this path, where again, SIGBUS
would be wrong.
The previous patch ("mm: page_mkwrite change prototype to match fault")
made it possible to properly specify errors. Convert the generic buffer.c
code and btrfs to return sane error values (in the case of page removed
from pagecache, VM_FAULT_NOPAGE will cause the fault handler to exit
without doing anything, and the fault will be retried properly).
This fixes core code, and converts btrfs as a template/example. All other
filesystems defining their own page_mkwrite should be fixed in a similar
manner.
Acked-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change the page_mkwrite prototype to take a struct vm_fault, and return
VM_FAULT_xxx flags. There should be no functional change.
This makes it possible to return much more detailed error information to
the VM (and also can provide more information eg. virtual_address to the
driver, which might be important in some special cases).
This is required for a subsequent fix. And will also make it easier to
merge page_mkwrite() with fault() in future.
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Felix Blyakher <felixb@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
cifs: fix unicode string area word alignment in session setup
The handling of unicode string area alignment is wrong.
decode_unicode_ssetup improperly assumes that it will always be preceded
by a pad byte. This isn't the case if the string area is already
word-aligned.
This problem, combined with the bad buffer sizing for the serverDomain
string can cause memory corruption. The bad alignment can make it so
that the alignment of the characters is off. This can make them
translate to characters that are greater than 2 bytes each. If this
happens we can overflow the allocation.
Fix this by fixing the alignment in CIFS_SessSetup instead so we can
verify it against the head of the response. Also, clean up the
workaround for improperly terminated strings by checking for a
odd-length unicode buffers and then forcibly terminating them.
Finally, resize the buffer for serverDomain. Now that we've fixed
the alignment, it's probably fine, but a malicious server could
overflow it.
A better solution for handling these strings is still needed, but
this should be a suitable bandaid.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Acked-by: Steve French <sfrench@us.ibm.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
cifs: Rename cifs_strncpy_to_host and fix buffer size
There is a possibility for the path_name and node_name buffers to
overflow if they contain charcters that are >2 bytes in the local
charset. Resize the buffer allocation so to avoid this possibility.
Also, as pointed out by Jeff Layton, it would be appropriate to
rename the function to cifs_strlcpy_to_host to reflect the fact
that the copied string is always NULL terminated.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
cifs: Increase size of tmp_buf in cifs_readdir to avoid potential overflows
Increase size of tmp_buf to possible maximum to avoid potential
overflows. Also moved UNICODE_NAME_MAX definition so that it can be used
elsewhere.
Pointed-out-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
cifs: fix buffer size for tcon->nativeFileSystem field
The buffer for this was resized recently to fix a bug. It's still
possible however that a malicious server could overflow this field
by sending characters in it that are >2 bytes in the local charset.
Double the size of the buffer to account for this possibility.
Also get rid of some really strange and seemingly pointless NULL
termination. It's NULL terminating the string in the source buffer,
but by the time that happens, we've already copied the string.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch ensures the correct labeling of incoming connection requests
responses via NetLabel by enabling the recent changes to NetLabel and the
SELinux/Netlabel glue code.
Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds the netlbl_req_setattr() and netlbl_req_delattr() functions
which can be used by LSMs to set and remove the NetLabel security attributes
from request_sock objects used in incoming connection requests.
Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add the cipso_v4_req_setattr() and cipso_v4_req_delattr() functions to set and
delete the CIPSO security attributes on a request_sock used during a incoming
connection request.
Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The current placement of the security_inet_conn_request() hooks do not allow
individual LSMs to override the IP options of the connection's request_sock.
This is a problem as both SELinux and Smack have the ability to use labeled
networking protocols which make use of IP options to carry security attributes
and the inability to set the IP options at the start of the TCP handshake is
problematic.
This patch moves the IPv4 security_inet_conn_request() hooks past the code
where the request_sock's IP options are set/reset so that the LSM can safely
manipulate the IP options as needed. This patch intentionally does not change
the related IPv6 hooks as IPv6 based labeling protocols which use IPv6 options
are not currently implemented, once they are we will have a better idea of
the correct placement for the IPv6 hooks.
Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Doing it in reverse order causes uevent to be sent before
we have a MAC address, which confuses udev.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The return value of dup2 when oldfd == newfd and the fd isn't valid is
not getting properly sign extended. We end up with 4294967287 instead
of -EBADF.
I've reproduced this on SLE11 (2.6.27.21), openSUSE Factory
(2.6.29-rc5), and Ubuntu 9.04 (2.6.28).
This patch uses a signed int for the error value so it is properly
extended.
Currently, the i2c-algo-pca driver does nothing if the chip enters state
0x30 (Data byte in I2CDAT has been transmitted; NOT ACK has been
received). Thus, the i2c bus connected to the controller gets stuck
afterwards.
I have seen this kind of error on a custom board in certain load
situations most probably caused by interference or noise.
A possible reaction is to let the controller generate a STOP condition.
This is documented in the PCA9564 data sheet (2006-09-01) and the same
is done for other NACK states as well.
Further, state 0x38 isn't handled completely, either. Try to do another
START in this case like the data sheet says. As this couldn't be tested,
I've added a comment to try to reset the chip if the START doesn't help
as suggested by Wolfram Sang.
Signed-off-by: Enrik Berkhan <Enrik.Berkhan@ge.com> Reviewed-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When fetching DDC using i2c algo bit, we were often seeing timeouts
before getting valid EDID on a retry. The VESA spec states 2ms is the
DDC timeout, so when this translates into 1 jiffie and we are close
to the end of the time period, it could return with a timeout less than
2ms.
Change this code to use time_after instead of time_after_eq.
Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
One of the changes between kernels 2.6.28 and 2.6.29 is that a branch profiler
has been added for if() statements. Unfortunately this patch makes the sparse
output unusable with CONFIG_TRACE_BRANCH_PROFILING=y: when branch profiling is
enabled, sparse prints so much false positives that the real issues are no
longer visible. This behavior can be reproduced as follows:
* enable CONFIG_TRACE_BRANCH_PROFILING, e.g. by running make allyesconfig or
make allmodconfig.
* run make C=2
Result: a huge number of the following sparse warnings.
...
include/linux/cpumask.h:547:2: warning: symbol '______r' shadows an earlier one
include/linux/cpumask.h:547:2: originally declared here
...
The patch below fixes this by disabling branch profiling while analyzing the
kernel code with sparse.
This patch is already included in 2.6.30-rc1 -- see also
http://lkml.org/lkml/2009/4/5/120.
Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <200904051620.02311.bart.vanassche@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 360782dde00a2e6e7d9fd57535f90934707ab8a8 (hwmon: (w83781d) Stop
abusing struct i2c_client for ISA devices) broke W83782D support for
devices connected on the ISA bus. You will hit a NULL pointer
dereference as soon as you read any device attribute. Other devices,
and W83782D devices on the SMBus, aren't affected.
Reported-by: Michel Abraham Signed-off-by: Jean Delvare <khali@linux-fr.org> Tested-by: Michel Abraham Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
a recent fix to e1000 (commit 15b2bee2) caused KVM/QEMU/VMware based
virtualized e1000 interfaces to begin failing when resetting.
This is because the driver in a virtual environment doesn't
get to run instructions *AT ALL* when an interrupt is asserted.
The interrupt code runs immediately and this recent bug fix
allows an interrupt to be possible when the interrupt handler
will reject it (due to the new code), when being called from
any path in the driver that holds the E1000_RESETTING flag.
the driver should use the __E1000_DOWN flag instead of the
__E1000_RESETTING flag to prevent interrupt execution
while reconfiguring the hardware.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The mis-typing exist in dapm controller definitions and dapm route definitions,
so happen mis-matched error when snd_soc_dapm_add_routes().
Signed-off-by: Jinyoung Park <parkjy@mtekvision.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
BIOS on Mac Mini Core2 Duo sets both INPUT and OUTPUT pinctl bits to
the line-in jack, and it confuses the driver as if it's a valid input.
This patch adds the check of OUTPUT bit so that the driver fixes the
invalid pin setup.
This patch (as1234) fixes a bug in the UTF8 -> UTF-16 conversion
routine in the gadget/usbstring library. In a UTF-8 multi-byte
sequence, all bytes after the first should have their high-order
two bits set to 10, not 11.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1238) adds proper reference counting for ftdi_sio's
private data structure. Without it, the driver will free the
structure while it is still in use if the user unplugs the serial
device before closing the device file.
The patch also replaces a slightly dangerous
cancel_delayed_work/flush_scheduled_work pair with
cancel_delayed_work_sync, which is always safer.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Daniel Mack <daniel@caiaq.de> Tested-by: Daniel Mack <daniel@caiaq.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When md is loading a bitmap which it knows is out of date, it fills
each page with 1s and writes it back out again. However the
write_page call makes used of bitmap->file_pages and
bitmap->last_page_size which haven't been set correctly yet. So this
can sometimes fail.
Move the setting of file_pages and last_page_size to before the call
to write_page.
This bug can cause the assembly on an array to fail, thus making the
data inaccessible. Hence I think it is a suitable candidate for
-stable.
If we have a raid10 with multiple missing devices, and we recover just
one of these to a spare, then we risk (depending on the bitmap and
array chunk size) clearing bits of the bitmap for which recovery isn't
complete (because a device is still missing).
This can lead to a subsequent "re-add" being recovered without
any IO happening, which would result in loss of data.
This patch takes the safe approach of not clearing bitmap bits
if the array will still be degraded.
This patch is suitable for all active -stable kernels.
If a write intent bitmap covers more than 2TB, we sometimes work with
values beyond 32bit, so these need to be sector_t. This patches
add the required casts to some unsigned longs that are being shifted
up.
This will affect any raid10 larger than 2TB, or any raid1/4/5/6 with
member devices that are larger than 2TB.
Being able to write 'clean' to an 'array_state' of an inactive array
to activate it in 'clean' mode is both unnecessary and inconvenient.
It is unnecessary because the same can be achieved by writing
'active'. This activates and array, but it still remains 'clean'
until the first write.
It is inconvenient because writing 'clean' is more often used to
cause an 'active' array to revert to 'clean' mode (thus blocking
any writes until a 'write-pending' is promoted to 'active').
Allowing 'clean' to both activate an array and mark an active array as
clean can lead to races: One program writes 'clean' to mark the
active array as clean at the same time as another program writes
'inactive' to deactivate (stop) and active array. Depending on which
writes first, the array could be deactivated and immediately
reactivated which isn't what was desired.
So just disable the use of 'clean' to activate an array.
This avoids a race that can be triggered with mdadm-3.0 and external
metadata, so it suitable for -stable.
Reported-by: Rafal Marszewski <rafal.marszewski@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix a problem where the generic block based fiemap stuff would not
properly set FIEMAP_EXTENT_LAST on the last extent. I've reworked things
to keep track if we go past the EOF, and mark the last extent properly.
The problem was reported by and tested by Eric Sandeen.
Tested-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@redhat.com> Cc: <linux-ext4@vger.kernel.org> Cc: <xfs-masters@oss.sgi.com> Cc: <linux-btrfs@vger.kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <Joel.Becker@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Without this after scanning your device will set
the association ID to something bogus and what is
being reported is multicast/broadcast frame are not
being received. For details see this bug report:
So that a new created IBSS network
doesn't break on the first scan.
It seems to Sujith and me that this
stupid code unnecessary, too.
So remove it...
Reported-by: David Woodhouse <dwmw2@infradead.org> Tested-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Alina Friedrichsen <x-alina@gmx.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Jouni Malinen <Jouni.Malinen@atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Andrew Gallatin reported that IRQ and SOFTIRQ times were
sometime not reported correctly on recent kernels, and even
bisected to commit 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4
([PATCH] fix scaled & unscaled cputime accounting) as the first
bad commit.
account_process_tick() was not taking into account timer IRQ
interrupting the idle task servicing a hard or soft irq.
On mostly idle cpu, irqs were thus not accounted and top or
mpstat could tell user/admin that cpu was 100 % idle, 0.00 %
irq, 0.00 % softirq, while it was not.
[ Impact: fix occasionally incorrect CPU statistics in top/mpstat ]
Reported-by: Andrew Gallatin <gallatin@myri.com> Re-reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: rick.jones2@hp.com Cc: brice@myri.com Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
LKML-Reference: <49F84BC1.7080602@cosmosbay.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
rndis_wext_link_change() might be called from rndis_command() at
initialization stage and priv->workqueue/priv->work have not been
initialized yet. This causes invalid opcode at rndis_wext_bind on
some brands of bcm4320.
Fix by initializing workqueue/workers in rndis_wext_bind() before
rndis_command is used.
This bug has existed since 2.6.25, reported at:
http://bugzilla.kernel.org/show_bug.cgi?id=12794
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Because NR_CPUS can be greater than 1000 and meminfo_proc_show() does
not check for underflow.
But NR_CPUS proportional isn't good calculation. In general,
possibility of lock contention is proportional to the number of online
cpus, not theorical maximum cpus (NR_CPUS).
The current kernel has generic percpu-counter stuff. using it is right
way. it makes code simplify and percpu_counter_read_positive() don't
make underflow issue.
Reported-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
madvise(MADV_WILLNEED) forces page cache readahead on a range of memory
backed by a file. The assumption is made that the page required is
order-0 and "normal" page cache.
On hugetlbfs, this assumption is not true and order-0 pages are
allocated and inserted into the hugetlbfs page cache. This leaks
hugetlbfs page reservations and can cause BUGs to trigger related to
corrupted page tables.
This patch causes MADV_WILLNEED to be ignored for hugetlbfs-backed
regions.
tick_handle_periodic() can lock up hard when a one shot clock event
device is used in combination with jiffies clocksource.
Avoid an endless loop issue by requiring that a highres valid
clocksource be installed before we call tick_periodic() in a loop when
using ONESHOT mode. The result is we will only increment jiffies once
per interrupt until a continuous hardware clocksource is available.
Without this, we can run into a endless loop, where each cycle through
the loop, jiffies is updated which increments time by tick_period or
more (due to clock steering), which can cause the event programming to
think the next event was before the newly incremented time and fail
causing tick_periodic() to be called again and the whole process loops
forever.
[ Impact: prevent hard lock up ]
Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If the BIOS does something obviously stupid, like claiming that the
registers for the IOMMU are at physical address zero, then print a nasty
message and abort, rather than trying to set up the IOMMU and then later
panicking.
It's becoming more and more obvious that trusting this stuff to the BIOS
was a mistake.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the DMAR table identifies that a PCI-PCI bridge belongs to a given
IOMMU, that means that the bridge and all devices behind it should be
associated with the IOMMU. Not just the bridge itself.
This fixes the device_to_iommu() function accordingly.
(It's broken if you have the same PCI bus numbers in multiple domains,
but this function was always broken in that way; I'll be dealing with
that later).
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
By using the same test as is used for /proc/pid/maps and /proc/pid/smaps,
only allow processes that can ptrace() a given process to see information
that might be used to bypass address space layout randomization (ASLR).
These include eip, esp, wchan, and start_stack in /proc/pid/stat as well
as the non-symbolic output from /proc/pid/wchan.
ASLR can be bypassed by sampling eip as shown by the proof-of-concept
code at http://code.google.com/p/fuzzyaslr/ As part of a presentation
(http://www.cr0.org/paper/to-jt-linux-alsr-leak.pdf) esp and wchan were
also noted as possibly usable information leaks as well. The
start_stack address also leaks potentially useful information.
Cc: Stable Team <stable@kernel.org> Signed-off-by: Jake Edge <jake@lwn.net> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
char bname[5] is too small for the string "X GHz" when the null
terminator is taken into account. Thus, turning on rate debugging
can crash unless we have lucky stack alignment.
Cc: stable@kernel.org Reported-by: Paride Legovini <legovini@spiro.fisica.unipd.it> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently, when OOM occurs during rx ring refill, mv643xx_eth will get
into an infinite loop, due to the refill function setting the OOM bit
but not clearing the 'rx refill needed' bit for this queue, while the
calling function (the NAPI poll handler) will call the refill function
in a loop until the 'rx refill needed' bit goes off, without checking
the OOM bit.
This patch fixes this by checking the OOM bit in the NAPI poll handler
before attempting to do rx refill. This means that once OOM occurs,
we won't try to do any memory allocations again until the next invocation
of the poll handler.
While we're at it, change the OOM flag to be a single bit instead of
one bit per receive queue since OOM is a system state rather than a
per-queue state, and cancel the OOM timer on entry to the NAPI poll
handler if it's running to prevent it from firing when we've already
come out of OOM.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On several mv643xx_eth hardware versions, the two 64bit mib counters
for 'good octets received' and 'good octets sent' are actually 32bit
counters, and reading from the upper half of the register has the same
effect as reading from the lower half of the register: an atomic
read-and-clear of the entire 32bit counter value. This can under heavy
traffic occasionally lead to small numbers being added to the upper
half of the 64bit mib counter even though no 32bit wrap has occured.
Since we poll the mib counters at least every 30 seconds anyway, we
might as well just skip the reads of the upper halves of the hardware
counters without breaking the stats, which this patch does.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
write_lock(¤t->fs->lock) guarantees we can't wrongly miss
LSM_UNSAFE_SHARE, this is what we care about. Use rcu_read_lock()
instead of ->siglock to iterate over the sub-threads. We must see
all CLONE_THREAD|CLONE_FS threads which didn't pass exit_fs(), it
takes fs->lock too.
With or without this patch we can miss the freshly cloned thread
and set LSM_UNSAFE_SHARE, we don't care.
If do_execve() fails after check_unsafe_exec(), it clears fs->in_exec
unconditionally. This is wrong if we race with our sub-thread which
also does do_execve:
Two threads T1 and T2 and another process P, all share the same
->fs.
T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(), since
->fs is shared, we set LSM_UNSAFE but not ->in_exec.
P exits and decrements fs->users.
T2 starts do_execve(), calls check_unsafe_exec(), now ->fs is not
shared, we set fs->in_exec.
T1 continues, open_exec(BAD_FILE) fails, we clear ->in_exec and
return to the user-space.
T1 does clone(CLONE_FS /* without CLONE_THREAD */).
T2 continues without LSM_UNSAFE_SHARE while ->fs is shared with
another process.
Change check_unsafe_exec() to return res = 1 if we set ->in_exec, and change
do_execve() to clear ->in_exec depending on res.
When do_execve() suceeds, it is safe to clear ->in_exec unconditionally.
It can be set only if we don't share ->fs with another process, and since
we already killed all sub-threads either ->in_exec == 0 or we are the
only user of this ->fs.
Also, we do not need fs->lock to clear fs->in_exec.
* all changes of current->fs are done under task_lock and write_lock of
old fs->lock
* refcount is not atomic anymore (same protection)
* its decrements are done when removing reference from current; at the
same time we decide whether to free it.
* put_fs_struct() is gone
* new field - ->in_exec. Set by check_unsafe_exec() if we are trying to do
execve() and only subthreads share fs_struct. Cleared when finishing exec
(success and failure alike). Makes CLONE_FS fail with -EAGAIN if set.
* check_unsafe_exec() may fail with -EAGAIN if another execve() from subthread
is in progress.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Pure code move; two new helper functions for nfsd and daemonize
(unshare_fs_struct() and daemonize_fs_struct() resp.; for now -
the same code as used to be in callers). unshare_fs_struct()
exported (for nfsd, as copy_fs_struct()/exit_fs() used to be),
copy_fs_struct() and exit_fs() don't need exports anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Not because execve races with _that_ are serious - we really
need a situation when final drop of fs_struct refcount is
done by something that used to have it as current->fs.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Annotate struct fs_struct's usage count to indicate the restrictions upon it.
It may not be incremented, except by clone(CLONE_FS), as this affects the
check in check_unsafe_exec() in fs/exec.c.
check_unsafe_exec() also notes whether the fs_struct is being
shared by more threads than will get killed by the exec, and if so
sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid.
But /proc/<pid>/cwd and /proc/<pid>/root lookups make transient
use of get_fs_struct(), which also raises that sharing count.
This might occasionally cause a setuid program not to change euid,
in the same way as happened with files->count (check_unsafe_exec
also looks at sighand->count, but /proc doesn't raise that one).
We'd prefer exec not to unshare fs_struct: so fix this in procfs,
replacing get_fs_struct() by get_fs_path(), which does path_get
while still holding task_lock, instead of raising fs->count.