Eric Paris [Wed, 28 Jul 2010 14:18:38 +0000 (10:18 -0400)]
vfs/fsnotify: fsnotify_close can delay the final work in fput
fanotify almost works like so:
user context calls fsnotify_* function with a struct file.
fsnotify takes a reference on the struct path
user context goes about it's buissiness
at some later point in time the fsnotify listener gets the struct path
fanotify listener calls dentry_open() to create a file which userspace can deal with
listener drops the reference on the struct path
at some later point the listener calls close() on it's new file
With the switch from struct path to struct file this presents a problem for
fput() and fsnotify_close(). fsnotify_close() is called when the filp has
already reached 0 and __fput() wants to do it's cleanup.
The solution presented here is a bit odd. If an event is created from a
struct file we take a reference on the file. We check however if the f_count
was already 0 and if so we take an EXTRA reference EVEN THOUGH IT WAS ZERO.
In __fput() (where we know the f_count hit 0 once) we check if the f_count is
non-zero and if so we drop that 'extra' ref and return without destroying the
file.
Eric Paris [Wed, 28 Jul 2010 14:18:37 +0000 (10:18 -0400)]
fsnotify: store struct file not struct path
Al explains that calling dentry_open() with a mnt/dentry pair is only
garunteed to be safe if they are already used in an open struct file. To
make sure this is the case don't store and use a struct path in fsnotify,
always use a struct file.
Eric Paris [Wed, 28 Jul 2010 14:18:37 +0000 (10:18 -0400)]
fsnotify: fsnotify_add_notify_event should return an event
Rather than the horrific void ** argument and such just to pass the
fanotify_merge event back to the caller of fsnotify_add_notify_event() have
those things return an event if it was different than the event suggusted to
be added.
Eric Paris [Wed, 28 Jul 2010 14:18:37 +0000 (10:18 -0400)]
fanotify: groups can specify their f_flags for new fd
Currently fanotify fds opened for thier listeners are done with f_flags
equal to O_RDONLY | O_LARGEFILE. This patch instead takes f_flags from the
fanotify_init syscall and uses those when opening files in the context of
the listener.
Eric Paris [Wed, 28 Jul 2010 14:18:37 +0000 (10:18 -0400)]
fsnotify: check to make sure all fsnotify bits are unique
This patch adds a check to make sure that all fsnotify bits are unique and we
cannot accidentally use the same bit for 2 different fsnotify event types.
The mask checks in inotify_update_existing_watch() and
inotify_new_watch() are useless because inotify_arg_to_mask() sets
FS_IN_IGNORED and FS_EVENT_ON_CHILD bits anyway.
Eric Paris [Wed, 28 Jul 2010 14:18:37 +0000 (10:18 -0400)]
inotify: force inotify and fsnotify use same bits
inotify uses bits called IN_* and fsnotify uses bits called FS_*. These
need to line up. This patch adds build time checks to make sure noone can
change these bits so they are not the same.
Eric Paris [Wed, 28 Jul 2010 14:18:37 +0000 (10:18 -0400)]
inotify: allow users to request not to recieve events on unlinked children
An inotify watch on a directory will send events for children even if those
children have been unlinked. This patch add a new inotify flag IN_EXCL_UNLINK
which allows a watch to specificy they don't care about unlinked children.
This should fix performance problems seen by tasks which add a watch to
/tmp and then are overrun with events when other processes are reading and
writing to unlinked files they created in /tmp.
https://bugzilla.kernel.org/show_bug.cgi?id=16296
Requested-by: Matthias Clasen <mclasen@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
Tejun Heo [Wed, 19 May 2010 15:36:28 +0000 (01:36 +1000)]
fsnotify: update gfp/slab.h includes
Implicit slab.h inclusion via percpu.h is about to go away. Make sure
gfp.h or slab.h is included as necessary.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
The symbol inotify_max_user_watches is not used outside this
file and should be static.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Wed, 12 May 2010 15:42:29 +0000 (11:42 -0400)]
fsnotify: initialize mask in fsnotify_perm
akpm got a warning the fsnotify_mask could be used uninitialized in
fsnotify_perm(). It's not actually possible but his compiler complained
about it. This patch just initializes it to 0 to shut up the compiler.
Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Wed, 21 Apr 2010 20:49:38 +0000 (16:49 -0400)]
fsnotify: call iput on inodes when no longer marked
fsnotify takes an igrab on an inode when it adds a mark. The code was
supposed to drop the reference when the mark was removed but didn't.
This caused problems when an fs was unmounted because those inodes would
clearly not be gone. Thus resulting in the most devistating of messages:
VFS: Busy inodes after unmount of loop0. Self-destruct in 5 seconds.
>>> Have a nice day...
Jiri Slaby bisected the problem to a patch in the fsnotify tree. The
code snippets below show my stupidity quite clearly.
Obviously the intent was to capture the inode before it was set to NULL in
fsnotify_destory_inode_mark() so we wouldn't be leaking inodes forever.
Instead we leaked them (and exploded on umount)
Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:34 +0000 (21:24 -0500)]
fanotify: userspace interface for permission responses
fanotify groups need to respond to events which include permissions types.
To do so groups will send a response using write() on the fanotify_fd they
have open.
Eric Paris [Fri, 18 Dec 2009 02:24:34 +0000 (21:24 -0500)]
fanotify: permissions and blocking
This is the backend work needed for fanotify to support the new
FS_OPEN_PERM and FS_ACCESS_PERM fsnotify events. This is done using the
new fsnotify secondary queue. No userspace interface is provided actually
respond to or request these events.
Eric Paris [Fri, 18 Dec 2009 02:24:34 +0000 (21:24 -0500)]
fsnotify: new fsnotify hooks and events types for access decisions
introduce a new fsnotify hook, fsnotify_perm(), which is called from the
security code. This hook is used to allow fsnotify groups to make access
control decisions about events on the system. We also must change the
generic fsnotify function to return an error code if we intend these hooks
to be in any way useful.
Dave Young [Fri, 26 Feb 2010 01:28:57 +0000 (20:28 -0500)]
sysctl extern cleanup: inotify
Extern declarations in sysctl.c should be move to their own head file, and
then include them in relavant .c files.
Move inotify_table extern declaration to linux/inotify.h
Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Mon, 8 Feb 2010 17:53:52 +0000 (12:53 -0500)]
fsnotify: use unsigned char * for dentry->d_name.name
fsnotify was using char * when it passed around the d_name.name string
internally but it is actually an unsigned char *. This patch switches
fsnotify to use unsigned and should silence some pointer signess warnings
which have popped out of xfs. I do not add -Wpointer-sign to the fsnotify
code as there are still issues with kstrdup and strlen which would pop
out needless warnings.
Eric Paris [Fri, 18 Dec 2009 02:24:34 +0000 (21:24 -0500)]
fanotify: use merge argument to determine actual event added to queue
fanotify needs to know the actual event added to queues so it can be
correctly checked for return values from userspace. To do this we need to
pass that information from the merger code back to the main even handling
routine. Currently that information is unused, but it will be.
Eric Paris [Fri, 18 Dec 2009 02:24:34 +0000 (21:24 -0500)]
fsnotify: intoduce a notification merge argument
Each group can define their own notification (and secondary_q) merge
function. Inotify does tail drop, fanotify does matching and drop which
can actually allocate a completely new event. But for fanotify to properly
deal with permissions events it needs to know the new event which was
ultimately added to the notification queue. This patch just implements a
void ** argument which is passed to the merge function. fanotify can use
this field to pass the new event back to higher layers.
Signed-off-by: Eric Paris <eparis@redhat.com>
for fanotify to properly deal with permissions events
Eric Paris [Fri, 18 Dec 2009 02:24:34 +0000 (21:24 -0500)]
fsnotify: add group priorities
This introduces an ordering to fsnotify groups. With purely asynchronous
notification based "things" implementing fsnotify (inotify, dnotify) ordering
isn't particularly important. But if people want to use fsnotify for the
basis of sycronous notification or blocking notification ordering becomes
important.
eg. A Hierarchical Storage Management listener would need to get its event
before an AV scanner could get its event (since the HSM would need to
bring the data in for the AV scanner to scan.) Typically asynchronous notification
would want to run after the AV scanner made any relevant access decisions
so as to not send notification about an event that was denied.
Eric Paris [Fri, 18 Dec 2009 02:24:34 +0000 (21:24 -0500)]
fanotify: clear all fanotify marks
fanotify listeners may want to clear all marks. They may want to do this
to destroy all of their inode marks which have nothing but ignores.
Realistically this is useful for av vendors who update policy and want to
clear all of their cached allows.
Eric Paris [Fri, 18 Dec 2009 02:24:33 +0000 (21:24 -0500)]
fanotify: allow ignored_masks to survive modify
Some users may want to truely ignore an inode even if it has been modified.
Say you are wanting a mount which contains a log file and you really don't
want any notification about that file. This patch allows the listener to
do that.
Eric Paris [Fri, 18 Dec 2009 02:24:33 +0000 (21:24 -0500)]
fsnotify: allow ignored_mask to survive modification
Some inodes a group may want to never hear about a set of events even if
the inode is modified. We add a new mark flag which indicates that these
marks should not have their ignored_mask cleared on modification.
Eric Paris [Fri, 18 Dec 2009 02:24:33 +0000 (21:24 -0500)]
fsnotify: clear ignored mask on modify
On inode modification we clear the ignored mask for all of the marks on the
inode. This allows userspace to ignore accesses to inodes until there is
something different.
Eric Paris [Fri, 18 Dec 2009 02:24:33 +0000 (21:24 -0500)]
fanotify: allow users to set an ignored_mask
Change the sys_fanotify_mark() system call so users can set ignored_masks
on inodes. Remember, if a user new sets a real mask, and only sets ignored
masks, the ignore will never be pinned in memory. Thus ignored_masks can
be lost under memory pressure and the user may again get events they
previously thought were ignored.
Eric Paris [Fri, 18 Dec 2009 02:24:33 +0000 (21:24 -0500)]
fsnotify: ignored_mask - excluding notification
The ignored_mask is a new mask which is part of fsnotify marks. A group's
should_send_event() function can use the ignored mask to determine that
certain events are not of interest. In particular if a group registers a
mask including FS_OPEN on a vfsmount they could add FS_OPEN to the
ignored_mask for individual inodes and not send open events for those
inodes.
Eric Paris [Fri, 18 Dec 2009 02:24:33 +0000 (21:24 -0500)]
fsnotify: allow marks to not pin inodes in core
inotify marks must pin inodes in core. dnotify doesn't technically need to
since they are closed when the directory is closed. fanotify also need to
pin inodes in core as it works today. But the next step is to introduce
the concept of 'ignored masks' which is actually a mask of events for an
inode of no interest. I claim that these should be liberally sent to the
kernel and should not pin the inode in core. If the inode is brought back
in the listener will get an event it may have thought excluded, but this is
not a serious situation and one any listener should deal with.
This patch lays the ground work for non-pinning inode marks by using lazy
inode pinning. We do not pin a mark until it has a non-zero mask entry. If a
listener new sets a mask we never pin the inode.
fanotify: remove outgoing function checks in fanotify.h
A number of validity checks on outgoing data are done in static inlines but
are only used in one place. Instead just do them where they are used for
readability.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify_mark_validate functions are all needlessly declared in headers as
static inlines. Instead just do the checks where they are needed for code
readability.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:29 +0000 (21:24 -0500)]
fanotify: hooks the fanotify_mark syscall to the vfsmount code
Create a new fanotify_mark flag which indicates we should attach the mark
to the vfsmount holding the object referenced by dfd and pathname rather
than the inode itself.
Eric Paris [Fri, 18 Dec 2009 02:24:28 +0000 (21:24 -0500)]
fanotify: should_send_event needs to handle vfsmounts
currently should_send_event in fanotify only cares about marks on inodes.
This patch extends that interface to indicate that it cares about events
that happened on vfsmounts.
Per-mount watches allow groups to listen to fsnotify events on an entire
mount. This patch simply adds and initializes the fields needed in the
vfsmount struct to make this happen.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:27 +0000 (21:24 -0500)]
fsnotify: vfsmount marks generic functions
Much like inode-mark.c has all of the code dealing with marks on inodes
this patch adds a vfsmount-mark.c which has similar code but is intended
for marks on vfsmounts.
fsnotify/vfsmount: add fsnotify fields to struct vfsmount
This patch adds the list and mask fields needed to support vfsmount marks.
These are the same fields fsnotify needs on an inode. They are not used,
just declared and we note where the cleanup hook should be (the function is
not yet defined)
Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:27 +0000 (21:24 -0500)]
fsnotify: clear marks to 0 in fsnotify_init_mark
Currently fsnotify_init_mark sets some fields to 0/NULL. Some users
already used some sorts of zalloc, some didn't. This patch uses memset to
explicitly zero everything in the fsnotify_mark when it is initialized so we
don't have to be careful if fields are later added to marks.
Eric Paris [Fri, 18 Dec 2009 02:24:27 +0000 (21:24 -0500)]
fsnotify: split generic and inode specific mark code
currently all marking is done by functions in inode-mark.c. Some of this
is pretty generic and should be instead done in a generic function and we
should only put the inode specific code in inode-mark.c
Heiko Carstens [Fri, 18 Dec 2009 02:24:26 +0000 (21:24 -0500)]
fanotify: CONFIG_HAVE_SYSCALL_WRAPPERS for sys_fanotify_mark
Please note that you need the patch below in addition, otherwise the
syscall wrapper stuff won't work on those 32 bit architectures which enable
the wrappers.
When enabled the syscall wrapper defines always take long parameters and then
cast them to whatever is needed. This approach doesn't work for the 32 bit
case where the original syscall takes a long long parameter, since we would
lose the upper 32 bits.
So syscalls with 64 bit arguments are special cases wrt to syscall wrappers
and enp up in the ugliness below (see also sys_fallocate). In addition these
special cased syscall wrappers have the drawback that ftrace syscall tracing
doesn't work on them, since they don't get defined by using the usual macros.
Eric Paris [Fri, 18 Dec 2009 02:24:26 +0000 (21:24 -0500)]
fanotify: fanotify_mark syscall implementation
NAME
fanotify_mark - add, remove, or modify an fanotify mark on a
filesystem object
SYNOPSIS
int fanotify_mark(int fanotify_fd, unsigned int flags, u64 mask,
int dfd, const char *pathname)
DESCRIPTION
fanotify_mark() is used to add remove or modify a mark on a filesystem
object. Marks are used to indicate that the fanotify group is
interested in events which occur on that object. At this point in
time marks may only be added to files and directories.
fanotify_fd must be a file descriptor returned by fanotify_init()
The flags field must contain exactly one of the following:
FAN_MARK_ADD - or the bits in mask and ignored mask into the mark
FAN_MARK_REMOVE - bitwise remove the bits in mask and ignored mark
from the mark
The following values can be OR'd into the flags field:
FAN_MARK_DONT_FOLLOW - same meaning as O_NOFOLLOW as described in open(2)
FAN_MARK_ONLYDIR - same meaning as O_DIRECTORY as described in open(2)
dfd may be any of the following:
AT_FDCWD: the object will be lookup up based on pathname similar
to open(2)
file descriptor of a directory: if pathname is not NULL the
object to modify will be lookup up similar to openat(2)
file descriptor of the final object: if pathname is NULL the
object to modify will be the object referenced by dfd
The mask is the bitwise OR of the set of events of interest such as:
FAN_ACCESS - object was accessed (read)
FAN_MODIFY - object was modified (write)
FAN_CLOSE_WRITE - object was writable and was closed
FAN_CLOSE_NOWRITE - object was read only and was closed
FAN_OPEN - object was opened
FAN_EVENT_ON_CHILD - interested in objected that happen to
children. Only relavent when the object
is a directory
FAN_Q_OVERFLOW - event queue overflowed (not implemented)
RETURN VALUE
On success, this system call returns 0. On error, -1 is
returned, and errno is set to indicate the error.
ERRORS
EINVAL An invalid value was specified in flags.
EINVAL An invalid value was specified in mask.
EINVAL An invalid value was specified in ignored_mask.
EINVAL fanotify_fd is not a file descriptor as returned by
fanotify_init()
EBADF fanotify_fd is not a valid file descriptor
EBADF dfd is not a valid file descriptor and path is NULL.
ENOTDIR dfd is not a directory and path is not NULL
EACCESS no search permissions on some part of the path
ENENT file not found
ENOMEM Insufficient kernel memory is available.
CONFORMING TO
These system calls are Linux-specific.
Eric Paris [Fri, 18 Dec 2009 02:24:26 +0000 (21:24 -0500)]
fanotify: fanotify_init syscall implementation
NAME
fanotify_init - initialize an fanotify group
SYNOPSIS
int fanotify_init(unsigned int flags, unsigned int event_f_flags, int priority);
DESCRIPTION
fanotify_init() initializes a new fanotify instance and returns a file
descriptor associated with the new fanotify event queue.
The following values can be OR'd into the flags field:
FAN_NONBLOCK Set the O_NONBLOCK file status flag on the new open file description.
Using this flag saves extra calls to fcntl(2) to achieve the same
result.
FAN_CLOEXEC Set the close-on-exec (FD_CLOEXEC) flag on the new file descriptor.
See the description of the O_CLOEXEC flag in open(2) for reasons why
this may be useful.
The event_f_flags argument is unused and must be set to 0
The priority argument is unused and must be set to 0
RETURN VALUE
On success, this system call return a new file descriptor. On error, -1 is
returned, and errno is set to indicate the error.
ERRORS
EINVAL An invalid value was specified in flags.
EINVAL A non-zero valid was passed in event_f_flags or in priority
ENFILE The system limit on the total number of file descriptors has been reached.
ENOMEM Insufficient kernel memory is available.
CONFORMING TO
These system calls are Linux-specific.
Eric Paris [Fri, 18 Dec 2009 02:24:25 +0000 (21:24 -0500)]
fanotify: do not clone on merge unless needed
Currently if 2 events are going to be merged on the notication queue with
different masks the second event will be cloned and will replace the first
event. However if this notification queue is the only place referencing
the event in question there is no reason not to just update the event in
place. We can tell this if the event->refcnt == 1. Since we hold a
reference for each queue this event is on we know that when refcnt == 1
this is the only queue. The other concern is that it might be about to be
added to a new queue, but this can't be the case since fsnotify holds a
reference on the event until it is finished adding it to queues.
Eric Paris [Fri, 18 Dec 2009 02:24:25 +0000 (21:24 -0500)]
fanotify: merge notification events with different masks
Instead of just merging fanotify events if they are exactly the same, merge
notification events with different masks. To do this we have to clone the
old event, update the mask in the new event with the new merged mask, and
put the new event in place of the old event.
Eric Paris [Fri, 18 Dec 2009 02:24:25 +0000 (21:24 -0500)]
fanotify:drop notification if they exist in the outgoing queue
fanotify listeners get an open file descriptor to the object in question so
the ordering of operations is not as important as in other notification
systems. inotify will drop events if the last event in the event FIFO is
the same as the current event. This patch will drop fanotify events if
they are the same as another event anywhere in the event FIFO.
Eric Paris [Fri, 18 Dec 2009 02:24:25 +0000 (21:24 -0500)]
fanotify: fscking all notification system
fanotify is a novel file notification system which bases notification on
giving userspace both an event type (open, close, read, write) and an open
file descriptor to the object in question. This should address a number of
races and problems with other notification systems like inotify and dnotify
and should allow the future implementation of blocking or access controlled
notification. These are useful for on access scanners or hierachical storage
management schemes.
This patch just implements the basics of the fsnotify functions.
Eric Paris [Fri, 18 Dec 2009 02:24:25 +0000 (21:24 -0500)]
vfs: introduce FMODE_NONOTIFY
This is a new f_mode which can only be set by the kernel. It indicates
that the fd was opened by fanotify and should not cause future fanotify
events. This is needed to prevent fanotify livelock. An example of
obvious livelock is from fanotify close events.
Process A closes file1
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
notice a pattern?
The fix is to add the FMODE_NONOTIFY bit to the open filp done by the kernel
for fanotify. Thus when that file is used it will not generate future
events.
fsnotify: take inode->i_lock inside fsnotify_find_mark_entry()
All callers to fsnotify_find_mark_entry() except one take and
release inode->i_lock around the call. Take the lock inside
fsnotify_find_mark_entry() instead.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
Some fsnotify operations send a struct file. This is more information than
we technically need. We instead send a struct path in all cases instead of
sometimes a path and sometimes a file.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Fri, 18 Dec 2009 02:24:24 +0000 (21:24 -0500)]
fsnotify: add flags to fsnotify_mark_entries
To differentiate between inode and vfsmount (or other future) types of
marks we add a flags field and set the inode bit on inode marks (the only
currently supported type of mark)
Eric Paris [Fri, 18 Dec 2009 02:24:23 +0000 (21:24 -0500)]
fsnotify: add vfsmount specific fields to the fsnotify_mark_entry union
vfsmount marks need mostly the same data as inode specific fields, but for
consistency and understandability we put that data in a vfsmount specific
struct inside a union with inode specific data.
Eric Paris [Fri, 18 Dec 2009 02:24:23 +0000 (21:24 -0500)]
fsnotify: put inode specific fields in an fsnotify_mark in a union
The addition of marks on vfs mounts will be simplified if the inode
specific parts of a mark and the vfsmnt specific parts of a mark are
actually in a union so naming can be easy. This patch just implements the
inode struct and the union.
Eric Paris [Fri, 18 Dec 2009 02:24:23 +0000 (21:24 -0500)]
fsnotify: include vfsmount in should_send_event when appropriate
To ensure that a group will not duplicate events when it receives it based
on the vfsmount and the inode should_send_event test we should distinguish
those two cases. We pass a vfsmount to this function so groups can make
their own determinations.
Eric Paris [Fri, 18 Dec 2009 02:24:23 +0000 (21:24 -0500)]
fsnotify: mount point listeners list and global mask
currently all of the notification systems implemented select which inodes
they care about and receive messages only about those inodes (or the
children of those inodes.) This patch begins to flesh out fsnotify support
for the concept of listeners that want to hear notification for an inode
accessed below a given monut point. This patch implements a second list
of fsnotify groups to hold these types of groups and a second global mask
to hold the events of interest for this type of group.
The reason we want a second group list and mask is because the inode based
notification should_send_event support which makes each group look for a mark
on the given inode. With one nfsmount listener that means that every group would
have to take the inode->i_lock, look for their mark, not find one, and return
for every operation. By seperating vfsmount from inode listeners only when
there is a inode listener will the inode groups have to look for their
mark and take the inode lock. vfsmount listeners will have to grab the lock and
look for a mark but there should be fewer of them, and one vfsmount listener
won't cause the i_lock to be grabbed and released for every fsnotify group
on every io operation.
Eric Paris [Fri, 18 Dec 2009 02:24:23 +0000 (21:24 -0500)]
fsnotify: add groups to fsnotify_inode_groups when registering inode watch
Currently all fsnotify groups are added immediately to the
fsnotify_inode_groups list upon creation. This means, even groups with no
watches (common for audit) will be on the global tracking list and will
get checked for every event. This patch adds groups to the global list on
when the first inode mark is added to the group.
Eric Paris [Fri, 18 Dec 2009 02:24:23 +0000 (21:24 -0500)]
fsnotify: initialize the group->num_marks in a better place
Currently the comments say that group->num_marks is held because the group
is on the fsnotify_group list. This isn't strictly the case, we really
just hold the num_marks for the life of the group (any time group->refcnt
is != 0) This patch moves the initialization stuff and makes it clear when
it is really being held.
Eric Paris [Fri, 18 Dec 2009 02:24:23 +0000 (21:24 -0500)]
fsnotify: rename fsnotify_groups to fsnotify_inode_groups
Simple renaming patch. fsnotify is about to support mount point listeners
so I am renaming fsnotify_groups and fsnotify_mask to indicate these are lists
used only for groups which have watches on inodes.
Eric Paris [Fri, 18 Dec 2009 02:24:22 +0000 (21:24 -0500)]
Audit: only set group mask when something is being watched
Currently the audit watch group always sets a mask equal to all events it
might care about. We instead should only set the group mask if we are
actually watching inodes. This should be a perf win when audit watches are
compiled in.
Eric Paris [Fri, 18 Dec 2009 02:24:22 +0000 (21:24 -0500)]
fsnotify: fsnotify_obtain_group should be fsnotify_alloc_group
fsnotify_obtain_group was intended to be able to find an already existing
group. Nothing uses that functionality. This just renames it to
fsnotify_alloc_group so it is clear what it is doing.
Eric Paris [Fri, 18 Dec 2009 02:24:22 +0000 (21:24 -0500)]
fsnotify: remove group_num altogether
The original fsnotify interface has a group-num which was intended to be
able to find a group after it was added. I no longer think this is a
necessary thing to do and so we remove the group_num.
Eric Paris [Fri, 18 Dec 2009 02:24:22 +0000 (21:24 -0500)]
fsnotify: lock annotation for event replacement
fsnotify_replace_event need to lock both the old and the new event. This
causes lockdep to get all pissed off since it dosn't know this is safe.
It's safe in this case since the new event is impossible to be reached from
other places in the kernel.
Eric Paris [Fri, 18 Dec 2009 02:24:22 +0000 (21:24 -0500)]
fsnotify: replace an event on a list
fanotify would like to clone events already on its notification list, make
changes to the new event, and then replace the old event on the list with
the new event. This patch implements the replace functionality of that
process.
Eric Paris [Fri, 18 Dec 2009 02:24:21 +0000 (21:24 -0500)]
fsnotify: clone existing events
fsnotify_clone_event will take an event, clone it, and return the cloned
event to the caller. Since events may be in use by multiple fsnotify
groups simultaneously certain event entries (such as the mask) cannot be
changed after the event was created. Since fanotify would like to merge
events happening on the same file it needs a new clean event to work with
so it can change any fields it wishes.
Eric Paris [Fri, 18 Dec 2009 02:24:21 +0000 (21:24 -0500)]
fsnotify: per group notification queue merge types
inotify only wishes to merge a new event with the last event on the
notification fifo. fanotify is willing to merge any events including by
means of bitwise OR masks of multiple events together. This patch moves
the inotify event merging logic out of the generic fsnotify notification.c
and into the inotify code. This allows each use of fsnotify to provide
their own merge functionality.
Eric Paris [Fri, 18 Dec 2009 02:24:21 +0000 (21:24 -0500)]
fsnotify: send struct file when sending events to parents when possible
fanotify needs a path in order to open an fd to the object which changed.
Currently notifications to inode's parents are done using only the inode.
For some parental notification we have the entire file, send that so
fanotify can use it.
Eric Paris [Fri, 18 Dec 2009 02:24:21 +0000 (21:24 -0500)]
fsnotify: pass a file instead of an inode to open, read, and write
fanotify, the upcoming notification system actually needs a struct path so it can
do opens in the context of listeners, and it needs a file so it can get f_flags
from the original process. Close was the only operation that already was passing
a struct file to the notification hook. This patch passes a file for access,
modify, and open as well as they are easily available to these hooks.
Eric Paris [Fri, 18 Dec 2009 02:24:21 +0000 (21:24 -0500)]
fsnotify: include data in should_send calls
fanotify is going to need to look at file->private_data to know if an event
should be sent or not. This passes the data (which might be a file,
dentry, inode, or none) to the should_send function calls so fanotify can
get that information when available
Eric Paris [Fri, 18 Dec 2009 02:24:21 +0000 (21:24 -0500)]
fsnotify: provide the data type to should_send_event
fanotify is only interested in event types which contain enough information
to open the original file in the context of the fanotify listener. Since
fanotify may not want to send events if that data isn't present we pass
the data type to the should_send_event function call so fanotify can express
its lack of interest.
Eric Paris [Wed, 23 Dec 2009 04:16:33 +0000 (23:16 -0500)]
inotify: do not spam console without limit
inotify was supposed to have a dmesg printk ratelimitor which would cause
inotify to only emit one message per boot. The static bool was never set
so it kept firing messages. This patch correctly limits warnings in multiple
places.
Eric Paris [Fri, 18 Dec 2009 01:27:10 +0000 (20:27 -0500)]
inotify: do not reuse watch descriptors
Prior to 2.6.31 inotify would not reuse watch descriptors until all of
them had been used at least once. After the rewrite inotify would reuse
watch descriptors. The selinux utility 'restorecond' was found to have
problems when watch descriptors were reused. This patch reverts to the
pre inotify rewrite behavior to not reuse watch descriptors.
Eric Paris [Fri, 18 Dec 2009 01:12:07 +0000 (20:12 -0500)]
fsnotify: use kmem_cache_zalloc to simplify event initialization
fsnotify event initialization is done entry by entry with almost everything
set to either 0 or NULL. Use kmem_cache_zalloc and only initialize things
that need non-zero initialization. Also means we don't have to change
initialization entries based on the config options.