Filesystem Modification Events


This feature was introduced in BeeGFS v7.

The modification event logging facility of BeeGFS uses the metadata servers to collect information
about modified files and directories in the file system. This information is forwarded as event
messages to external applications through a UNIX socket. As an example of such an application, we
provide the 'beegfs-event-listener' as part of the beegfs-utils package. It collects the event
information from a metadata server and prints it as JSON-formatted text to STDOUT.

When configured for modification event logging, each metadata server checks for a socket at the
log target path specified in its configuration file and tries to deliver modification event packets there.
Each metadata server only collects information about the files it manages itself, so one event listener
is needed per metadata server. With metadata mirroring, events are only emitted by the primary
server. The secondary should also be equipped with an event listener, since it becomes the primary
in case of a failover.

The provided tool is just an example. You can develop your own tools on top of the event stream,
for example adapters to backup systems. When developing your own software using BeeGFS modification
event logging, you can use the beegfs_file_event_log.hpp header provided as part of the beegfs-utils-devel package.
It allows you to read the file modification event stream provided by the metadata server.
As a reference, the source code of the beegfs-event-listener is also provided as part of that package.

Configuration


Client


To complete the modification event messages, the metadata server relies on the
client to forward additional information for some operations.
The event types of interest are selected in the client configuration file:

 sysFileEventLogMask = flush,trunc,setattr,close,link-op


For complete coverage of all possible events, enable all event types, as shown above.
If you only need a subset, the other event types can be removed from the list to
reduce the performance overhead. This is usually not worthwhile, since the overhead is very small.
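
Before rolling out a reduced mask, it can help to check it for typos. The following Ruby sketch is only an illustration: the helper name check_event_mask is hypothetical, the list of known names is taken from the example mask above, and it assumes the client configuration file consists of "key = value" lines with "#" comments.

```ruby
# Event names taken from the example mask above.
KNOWN_EVENTS = %w[flush trunc setattr close link-op].freeze

# Hypothetical helper: extracts the sysFileEventLogMask value from the text
# of a client configuration file and returns two lists of names:
# [recognized, unrecognized].
def check_event_mask(config_text)
  line = config_text.each_line.map(&:strip)
                    .reject { |l| l.start_with?("#") }
                    .find { |l| l =~ /\AsysFileEventLogMask\s*=/ }
  return [[], []] unless line

  names = line.split("=", 2).last.split(",").map(&:strip).reject(&:empty?)
  [names & KNOWN_EVENTS, names - KNOWN_EVENTS]
end
```

Any name in the second list is not part of the example mask and is worth double-checking against the client configuration documentation.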

Metadata server


To enable the event stream, specify the path for the UNIX socket to use in the
configuration file of the metadata server. For example:
  sysFileEventLogTarget = unix:///tmp/beegfslog

If this variable is set, the server will try to write to this socket every time
a filesystem event occurs that is related to this metadata server.

The receiving application has to open the socket at that path.
It is recommended to start the receiving application before the metadata server,
since undeliverable event messages will be discarded. In this case the dropped events
counter included in each message is increased to inform the receiver.

To capture all events of the file system and to get the full picture,
the event output has to be activated on all metadata servers, each with their own
local UNIX socket and receiving application instance. The merging of the multiple
streams is left as a task for the receiving application.
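
How the streams are merged depends on the application. As a minimal Ruby sketch, assuming the JSON output of each listener is already available as an IO object on one host (how it gets there is outside the scope of this example), the hypothetical helper merge_event_streams below reads all streams concurrently and tags every message with a label for the metadata server it came from. The labels are arbitrary names chosen by the caller.

```ruby
require "json"

# Hypothetical helper: reads JSON lines from several listener outputs
# (a Hash of label => IO) concurrently and yields [label, message] pairs
# in the order in which they arrive.
def merge_event_streams(streams)
  queue = Queue.new
  readers = streams.map do |label, io|
    Thread.new do
      io.each_line do |line|
        line = line.strip
        next if line.empty?
        begin
          queue << [label, JSON.parse(line)]
        rescue JSON::ParserError
          next # skip lines that are not valid JSON
        end
      end
    end
  end

  # Close the queue once every reader has reached the end of its input.
  closer = Thread.new do
    readers.each(&:join)
    queue.close
  end

  # Queue#pop returns nil once the queue is closed and empty.
  while (item = queue.pop)
    yield item
  end
  closer.join
end
```

A local listener could be attached with IO.popen(["/opt/beegfs/sbin/beegfs-event-listener", "/tmp/beegfslog"]), for example. The sketch does not order events across servers. It only interleaves them as they arrive.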

beegfs-event-listener


The beegfs-event-listener program is included in the beegfs-utils package.
It opens a UNIX socket at the specified path and listens for incoming messages.
For example:
$ /opt/beegfs/sbin/beegfs-event-listener /tmp/beegfslog

Every message is printed as one line of JSON formatted output.

Example

  $ mv /mnt/beegfs0/a /mnt/beegfs0/b

will result in, for example:
{ "VersionMajor": 1, "VersionMinor": 0, "PacketSize": 77, "Dropped": 0, "Missed": 0, "Event": { "Type": "Rename", "Path": "\/a", "EntryId": "0-5A9EB0A7-1", "ParentEntryId": "root", "TargetPath": "\/b", "TargetParentId": "root" } }


The output can easily be parsed by scripts. For example, this simple Ruby program
prints the event type and the file path of each event:

read-event-log.rb:
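#!/usr/bin/env ruby
# Prints the type and path of every event read from STDIN.

require "json"

# Print one event. Messages without an "Event" object are skipped.
def print_event(event)
	puts "Event: #{event['Type']} #{event['Path']}" if event
end

# Each input line is one JSON message from beegfs-event-listener.
while (line = gets)
	message = JSON.parse(line)
	print_event(message["Event"])
end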


Use it like this:
/opt/beegfs/sbin/beegfs-event-listener /tmp/beegfslog | ./read-event-log.rb


Messages


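Every event message consists of the following fields, named here as they appear in the listener output shown above:

VersionMajor, VersionMinor: Version of the message format
PacketSize: Size of the event packet
Dropped: Counter of dropped (undeliverable) messages
Missed: Counter of missed events
Event: The event itself, with the fields Type, Path, EntryId, ParentEntryId, TargetPath, and TargetParentId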


For details, see /usr/include/beegfs/beegfs_file_event_log.hpp and the example code at
/usr/share/doc/beegfs-utils-devel/examples/beegfs-event-listener/,
both included in the beegfs-utils-devel package.

Event Types


For most events, the target path and target EntryID fields are empty.
Path, EntryId, and ParentEntryId always contain information about the file/directory being worked on:

Path: Full path of the file/directory, relative to the mount point
EntryId: The EntryID (similar to an inode number on other systems) of the file/directory
ParentEntryId: The EntryID of the parent directory

The following event types exist:

File contents were flushed. File size might have changed.
File was truncated. File size might have changed.
File attributes changed.
File was closed and possibly modified.
New file was created.
New directory was created.
A block or character special file was created.
Directory was removed.

File was removed (unlinked).
  Note: Multiple paths can reference the same EntryID (file content). Disk space is only freed when the last link has been removed.

A symbolic link was created.
  Path, EntryId, and ParentEntryId: Of the newly created link.
  TargetPath and TargetParentEntryID: Of the referenced file/directory.
  Note: Since this is a symbolic link, the target may contain relative and/or non-existing paths.

A hardlink was created.
  Path: Path of the new link, relative to the mount point.
  EntryId: The EntryID of the referenced file.
  ParentEntryId: The EntryID of the parent directory containing the new link.
  Since hardlinks are only supported within the same directory, this is identical to the parent directory of the source.
  TargetPath, TargetEntryId: Of the link target.

A file or directory was renamed or moved (reported with the type "Rename" in the example output above).
  Path: Path of the file/directory being moved.
  EntryId: Its EntryID.
  ParentEntryId: The EntryID of its parent directory.
  TargetPath, TargetParentId: The path/name moved to and the EntryID of the new parent directory.
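
Because the target fields are only filled for some event types, a consumer usually has to handle both cases. The Ruby sketch below is one way to do this. The helper name describe_event is hypothetical, and since this document does not state whether unused target fields are absent or empty strings, the sketch accepts both.

```ruby
require "json"

# Hypothetical helper: returns a one-line description of an event and
# appends the target path only if the event carries one.
def describe_event(event)
  target = event["TargetPath"]
  if target.nil? || target.empty?
    "#{event['Type']}: #{event['Path']}"
  else
    "#{event['Type']}: #{event['Path']} -> #{target}"
  end
end

# The rename message from the example above.
line = '{ "VersionMajor": 1, "VersionMinor": 0, "PacketSize": 77, "Dropped": 0, "Missed": 0, "Event": { "Type": "Rename", "Path": "\/a", "EntryId": "0-5A9EB0A7-1", "ParentEntryId": "root", "TargetPath": "\/b", "TargetParentId": "root" } }'
puts describe_event(JSON.parse(line)["Event"]) # prints "Rename: /a -> /b"
```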

Each message contains a "Dropped" and a "Missed" counter.
The dropped counter is incremented for each message that could not be delivered.
The missed counter counts events that can refer to multiple paths at the same time, e.g. hardlinks.
Based on the values of these counters, the receiving application can decide when a full scan of the file system is needed.
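
As an illustration, the Ruby sketch below flags a message whenever one of the counters has grown since the previous message of the same stream. The class name CounterWatch is hypothetical, and the sketch assumes that both counters are cumulative values. It is not a statement about how the metadata server maintains them, for example across restarts.

```ruby
# Hypothetical tracker: remembers the counters last seen on one metadata
# server's event stream and reports whether either of them has grown.
class CounterWatch
  def initialize
    @dropped = 0
    @missed = 0
  end

  # message is one parsed JSON message (a Hash). Returns true if the
  # message shows more dropped or missed events than the previous one.
  def rescan_needed?(message)
    dropped = message.fetch("Dropped", 0)
    missed = message.fetch("Missed", 0)
    grown = dropped > @dropped || missed > @missed
    # Remember the current values for the next comparison.
    @dropped = dropped
    @missed = missed
    grown
  end
end
```

One CounterWatch instance per metadata server stream would be needed, since each server emits its own messages.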

EntryIDs


BeeGFS uses EntryIDs to identify files and directories, similar to inode numbers on other UNIX file systems.
An EntryID is a string of the following form:
 root|disposal|mdisposal|[0-9A-F]{1,8}-[0-9A-F]{1,8}-[0-9A-F]{1,8}

The three hex numbers can be represented as positive, non-zero integers.
The special names root, disposal, and mdisposal do not appear for normal files and
are used for internal bookkeeping only. If an integer triple is needed for them as
well, triples that include zeros can be used.
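
The form given above translates directly into a regular expression. The following Ruby sketch validates an EntryID string and converts the three hex numbers to integers. The helper name parse_entry_id and the choice of return values (a Symbol for the special names, nil for malformed input) are assumptions made for this example.

```ruby
# Pattern taken from the EntryID form given above.
ENTRY_ID_PATTERN =
  /\A(?:root|disposal|mdisposal|([0-9A-F]{1,8})-([0-9A-F]{1,8})-([0-9A-F]{1,8}))\z/

# Hypothetical helper: returns the special name as a Symbol, an Array of
# three Integers for a regular EntryID, or nil if the string is malformed.
def parse_entry_id(str)
  match = ENTRY_ID_PATTERN.match(str)
  return nil unless match
  return str.to_sym if match[1].nil? # one of the special names matched

  match.captures.map { |hex| hex.to_i(16) }
end

p parse_entry_id("0-5A9EB0A7-1") # EntryId from the rename example
p parse_entry_id("root")         # prints :root
```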
