Quota Support



Table of Contents

  1. Introduction
  2. Quota tracking
    1. Requirements and general notes
    2. Enabling quota during a new BeeGFS installation
    3. Enabling quota for an existing BeeGFS installation
    4. Querying quota information
  3. Quota enforcement
    1. Requirements
    2. Enable quota enforcement
    3. Setting quota limits
  4. Project directory quota tracking
 


Introduction


BeeGFS allows the definition of system-wide quotas for disk space and number of chunk files, on a per-user or per-group basis. Quotas can be used to organize users into different access layers with different levels of restriction, and to prevent individual users from consuming all of the file system's resources.

The BeeGFS quota management mechanism consists of two features: quota tracking and quota enforcement. Quota tracking lets you query the amount of data and the number of chunk files that users and groups are using in the system, without imposing any restrictions.

Quota enforcement allows the definition and application of quota limits in the whole system. When this feature is enabled, the BeeGFS management daemon collects quota reports from all storage targets at regular intervals, checks for exceeded quota limits, and informs the rest of the system about which users are no longer allowed to consume more resources.

BeeGFS quota management relies on quota data provided by the underlying file systems of the storage targets. Therefore, the capabilities of these file systems determine which types of quota BeeGFS is able to manage. For example, if the storage targets use a version of ZFS prior to 0.7.4, BeeGFS allows quotas to be defined only for used space, not for number of files, because old ZFS releases do not support the latter. If you use ZFS 0.7.4 or later, the latest version of BeeGFS allows you to define both types of quota.
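To apply the ZFS version rule above on a storage server, a check along the following lines may help. This is a minimal sketch under assumptions: it relies on GNU `sort -V` for version ordering and on the ZFS on Linux kernel module exposing its version in `/sys/module/zfs/version`, which may differ between releases and distributions. The function names `version_ge` and `zfs_quota_types` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): decide which quota types a ZFS-backed storage
# target can provide, using the 0.7.4 threshold described in the text.

# Returns 0 if version $1 >= version $2 (GNU sort -V ordering assumed).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the quota types available for a given ZFS version.
zfs_quota_types() {
  if version_ge "$1" "0.7.4"; then
    echo "space,files"
  else
    echo "space"
  fi
}

# Assumption: the loaded ZFS module reports a version like "2.1.5-1" here;
# only the part before the first "-" is used.
if [ -r /sys/module/zfs/version ]; then
  zfs_version=$(cut -d- -f1 /sys/module/zfs/version)
  echo "ZFS $zfs_version: $(zfs_quota_types "$zfs_version")"
fi
```

`sort -V` is used instead of plain string comparison so that, for example, 0.7.10 correctly sorts after 0.7.4.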

Quota limits can be configured globally or separately for each storage pool.
The creation of new files is prohibited when either the global or the per-pool limit is reached.

The following sections explain in more detail how these features work and how they can be configured.


Quota tracking


This section provides information on how to enable tracking of used disk space and number of chunk files on the storage targets.

Requirements and general notes


To enable quota tracking, all BeeGFS server and client services must be running release 2012.10-r11 or higher. Quota tracking is designed to work with any underlying local file system on the storage servers that supports user and group quota (reported through the quotactl() syscall), but it has only been fully tested with ext4, XFS and ZFS. If ZFS is used as the underlying file system of the storage targets, the BeeGFS storage services must be at least release 2015.03-r10.

Make sure that all nodes are correctly configured to query the passwd and group databases by running the commands below. The first command should print the complete list of users, including their user IDs. The second one should print the complete list of groups, including their group IDs.
$ getent passwd
$ getent group
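Comparing long getent listings by eye across many nodes is error-prone. One possible shortcut, sketched here as an assumption rather than a BeeGFS tool, is to condense each listing into a count and a checksum of the sorted numeric IDs and compare those lines between nodes. The helper name `id_fingerprint` is hypothetical. Local system accounts that differ between nodes will also change the result, and a node whose name service does not enumerate all accounts will show a lower count.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): condense a passwd- or group-format listing read
# from stdin into one line: number of IDs plus a SHA-256 checksum of the sorted
# numeric IDs (field 3). Identical lines on two nodes suggest they see the
# same set of IDs.
id_fingerprint() {
  local ids count sum
  ids=$(cut -d: -f3 | sort -n)
  count=$(printf '%s\n' "$ids" | grep -c . || true)
  sum=$(printf '%s\n' "$ids" | sha256sum | cut -d' ' -f1)
  printf '%s IDs, sha256 %s\n' "$count" "$sum"
}

# Run on every node and compare the output lines:
#   getent passwd | id_fingerprint
#   getent group  | id_fingerprint
```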

If the commands above do not list all users and groups, you will not be able to use the command "beegfs-ctl --getquota --all" to query used space for all users at once, and you will not be able to use "quotaQueryType = system" in the file beegfs-mgmtd.conf for quota enforcement. However, there are alternatives to both, which are described in the following sections.

If you also create files on the storage targets outside of the BeeGFS storage directory, note that the blocks and inodes occupied by those files are also counted as resources used by the owning user. The reports are also distorted if multiple storage targets reside within the same local file system instance.
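To detect the second situation, one heuristic is to compare the device numbers of the storage target directories: paths on the same local file system instance report the same device number. The following is a minimal sketch assuming GNU `stat`; the helper name `check_distinct_fs` and the example paths are hypothetical, and device numbers are only an approximation of "same file system instance".

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): report storage target directories that share a
# local file system instance, judged by their device number (stat %d, GNU
# coreutils). Returns 1 if any sharing is found, 2 if a path cannot be read.
check_distinct_fs() {
  local dir dev status=0
  local -A seen=()
  for dir in "$@"; do
    dev=$(stat -c %d -- "$dir") || return 2
    if [ -n "${seen[$dev]:-}" ]; then
      echo "WARNING: $dir shares a file system with ${seen[$dev]}"
      status=1
    else
      seen[$dev]=$dir
    fi
  done
  return "$status"
}

# Usage with example target paths (assumed):
#   check_distinct_fs /data/beegfs_storage /data2/beegfs_storage
```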

Files stored in the disposal directory (which do not appear under the BeeGFS client mountpoint) also count toward the space used by their owners. Therefore, try clearing the disposal directory if you think that the shown used space differs from the actually used disk space.

Quota tracking has no requirement concerning metadata targets.

It is important to note that quota limits on the number of files apply to data chunk files created on the storage targets, not to files created by end users under the BeeGFS mount point. Such limits also do not apply to the number of directories created in the system.


Enabling quota during a new BeeGFS installation


Walk through these steps if you are about to set up a new BeeGFS instance that should support quota.

In this example, we assume that /dev/sdb is the underlying disk or RAID array of a storage target, which is mounted to the directory /data.

1) Start by enabling quota support for the underlying file system on the storage targets, as described below for ext4, XFS, and ZFS.

ext4: Enable quota support for ext4:
# Mount device with quota support for users and groups
$ mount /dev/sdb /data -t ext4 -orw,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv1,...
	
# Create quota database files
$ quotacheck -cug /data
	
# Calculate current quota values
$ quotacheck -vug /data
	
# Enable quota counting
$ quotaon -vug /data

XFS: Enable quota support for XFS:
# Mount device with quota support for users and groups
$ mount /dev/sdb /data -t xfs -orw,uqnoenforce,gqnoenforce,...

ZFS: Enable quota support for ZFS:
Make sure that the package libzfs2-devel is installed on your system.
On Debian/Ubuntu systems install libzfslinux-dev.
Nothing else needs to be done, because quota tracking is supported automatically based on libzfs.

2) Perform the BeeGFS installation as usual. Before you start the client services, apply the setting below in the configuration file /etc/beegfs/beegfs-client.conf of all client nodes.
quotaEnabled = true

This setting causes the client to transfer extra user data to the servers, namely the uid and gid of the user making each I/O syscall. This extra data allows BeeGFS to correctly compute the disk space used and the number of files created by each user. If this setting is not applied on a client node, all syscalls performed on that node count toward the quota consumption of the root user instead of the actual caller.
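When many client nodes must be configured, a small script can apply the setting consistently. The sketch below is a hypothetical helper, `enable_client_quota`, not part of BeeGFS: it rewrites an existing quotaEnabled line or appends one, and keeps a `.bak` copy. It assumes GNU sed (for `-i`) and that options are written as "name = value" lines, as in the setting shown above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): ensure "quotaEnabled = true" is set in a BeeGFS
# client config file. An existing quotaEnabled line is rewritten; otherwise the
# setting is appended. A backup is kept as <file>.bak. Assumes GNU sed (-i).
enable_client_quota() {
  local conf=${1:-/etc/beegfs/beegfs-client.conf}
  cp -p -- "$conf" "$conf.bak" || return 1
  if grep -qE '^[[:space:]]*quotaEnabled[[:space:]]*=' "$conf"; then
    # Replace every active (uncommented) quotaEnabled line.
    sed -i -E 's/^[[:space:]]*quotaEnabled[[:space:]]*=.*/quotaEnabled = true/' "$conf"
  else
    # Make sure the file ends with a newline before appending.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 -- "$conf")" ]; then
      echo >> "$conf"
    fi
    echo "quotaEnabled = true" >> "$conf"
  fi
}

# Usage on each client node, before the client service is started:
#   enable_client_quota /etc/beegfs/beegfs-client.conf
```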


Enabling quota for an existing BeeGFS installation


Take these steps if you want to enable quota support for an existing BeeGFS instance that was previously used without quota support.

In this example, we assume that /dev/sdb is the underlying disk or RAID array of a storage target, which is mounted to the directory /data.

1) Stop all BeeGFS server and client services.

2) Enable quota support for the underlying file system on the storage targets, as described below for ext4, XFS and ZFS.

ext4: Enable quota support for ext4:
# Mount device with quota support for users and groups
$ mount /dev/sdb /data -t ext4 -orw,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv1,...
	
# Create quota database files
$ quotacheck -cug /data
	
# Calculate current quota values
$ quotacheck -vug /data
	
# Enable quota counting
$ quotaon -vug /data

XFS: Enable quota support for XFS:
# Mount device with quota support for users and groups
$ mount /dev/sdb /data -t xfs -orw,uqnoenforce,gqnoenforce,...

ZFS: Enable quota support for ZFS:
Make sure that the package libzfs2-devel is installed on your system.
On Debian/Ubuntu systems install libzfslinux-dev.
Nothing else needs to be done, because quota tracking is supported automatically based on libzfs.

3) Apply the setting below in the configuration file /etc/beegfs/beegfs-client.conf of all client nodes.
quotaEnabled = true

This setting causes the client to transfer extra user data to the servers, namely the uid and gid of the user making each I/O syscall. This extra data allows BeeGFS to correctly compute the disk space used and the number of files created by each user. If this setting is not applied on a client node, all syscalls performed on that node count toward the quota consumption of the root user instead of the actual caller.

4) Start all BeeGFS services.

5) Run the following command on one of the client nodes to update the ownership information of the existing data chunk files on the storage servers for quota tracking. This command can take a while to complete, but it is executed only once and the system can be online while the chunk files are being updated.
$ beegfs-fsck --enablequota

This command can be re-run if you discover later that some clients did not have the option quotaEnabled set to true, and you want to update the ownership information of the data chunk files created in the meantime.
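In both procedures above, it can be useful to confirm that quota accounting is actually active on each storage target's local file system before relying on BeeGFS reports. The sketch below only prints a suitable status command for the file system type; it assumes the standard quota tools (quotaon), xfsprogs (xfs_quota) and the ZFS utilities (zfs userspace) are installed, and `quota_status_cmd` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print a command that shows whether quota
# accounting is active on a storage target. Target is the mount point for
# ext4/XFS; for ZFS, passing the dataset name is the safest choice.
quota_status_cmd() {
  local fstype=$1 target=$2
  case "$fstype" in
    ext4) echo "quotaon -pug $target" ;;          # print user/group quota state
    xfs)  echo "xfs_quota -x -c state $target" ;; # report quota accounting state
    zfs)  echo "zfs userspace $target" ;;         # list per-user space usage
    *)    echo "unsupported file system type: $fstype" >&2; return 1 ;;
  esac
}

# Example: detect the type of the file system mounted at /data.
#   fstype=$(findmnt -n -o FSTYPE --target /data)
#   quota_status_cmd "$fstype" /data
```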


Querying quota information


Quota information can be queried with "beegfs-ctl --getquota". The command directly collects quota reports from all storage servers and quota limits (if defined) from the management service, and aggregates all the quota information. A table is printed for each storage pool.


If the storage targets use a ZFS release prior to 0.7.4, which does not support quota for number of files, the values in the column for used files/inodes are marked with a dash ("-").
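If getent cannot enumerate all accounts, "--all" does not cover every user (see the requirements above). A possible workaround, sketched here under assumptions, is to keep the relevant IDs in a text file and query them one at a time. The helper name `getquota_from_file` and the `BEEGFS_CTL` override (useful for a dry run with `echo`) are hypothetical, and the per-ID form "beegfs-ctl --getquota --uid <id>" is assumed here; confirm the exact syntax with beegfs-ctl --help on your release.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): query quota for each user or group ID listed in
# a text file (one ID per line). BEEGFS_CTL can be overridden for a dry run.
# Assumed invocation per ID: beegfs-ctl --getquota --uid <id> (or --gid <id>).
BEEGFS_CTL=${BEEGFS_CTL:-beegfs-ctl}

getquota_from_file() {
  local kind=$1 file=$2 id
  case "$kind" in
    uid|gid) ;;
    *) echo "kind must be uid or gid" >&2; return 1 ;;
  esac
  while IFS= read -r id || [ -n "$id" ]; do
    id=${id//[[:space:]]/}        # tolerate stray whitespace
    [ -z "$id" ] && continue      # skip empty lines
    "$BEEGFS_CTL" --getquota "--$kind" "$id"
  done < "$file"
}

# Usage (file path is an example):
#   getquota_from_file uid /etc/beegfs/userIDs
```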


Quota enforcement


This section provides information on how to activate quota enforcement in a BeeGFS system.

Requirements


Quota enforcement requires quota tracking to be enabled, as described above. In addition, all server and client services must be running BeeGFS release 2014.01-r10 or higher. Since release 2015.03-r20, all BeeGFS server daemons get the quota enforcement configuration from the management daemon, making this configuration much simpler. Therefore, 2015.03-r20 is the minimal recommended release.


Enable quota enforcement


Take the steps below on each service to enable quota enforcement in the whole system.


Storage Service Setting

If you are using a BeeGFS release prior to 2015.03-r20, take the following steps to activate quota enforcement on the storage servers. From release 2015.03-r20 on, this configuration needs to be done only on the management service, as explained below.

1) Set the option below to true in the storage configuration file /etc/beegfs/beegfs-storage.conf.
quotaEnableEnforcement = true

2) Restart the storage service daemon.


Management Service Settings

Take the following steps to enable quota enforcement in the system. All options presented in this section are found in the file /etc/beegfs/beegfs-mgmtd.conf.

1) The management service collects quota reports from the storage targets and checks quota limits at regular intervals. This interval is set in minutes by the option quotaUpdateIntervalMin (default: 10 min). A shorter interval reduces the time until an exceeded limit is noticed and the quota is enforced. Thus, to reduce the possibility of users momentarily exceeding their limits, this interval should be kept as low as possible. On the other hand, frequent queries cause some workload overhead on the system, possibly reducing performance, so change this option with caution. If you reduce this interval, also consider changing the type of quota query, as discussed below.
quotaUpdateIntervalMin = 10

2) Configure the type of query performed by the management daemon to get the user and group IDs. The default type of query is "system", in which uids and gids are retrieved from the same source used by commands getent passwd and getent group. This source could be a central LDAP database or another user management system. When the user database system is slow, "system" might not be the best query type.
quotaQueryType = system

The second valid value for quotaQueryType is "range", which allows you to specify intervals of uids and gids in options quotaQueryUIDRange and quotaQueryGIDRange. In this case, all IDs of the user ID range and the group ID range will be queried. Do not define unnecessarily large ranges, as this could decrease query performance. This query type may help increase performance in cases where only a small range of IDs should be queried, instead of all IDs available in the system.
quotaQueryType = range
quotaQueryUIDRange = 1200,2000
quotaQueryGIDRange = 15000,20000

The third valid value for quotaQueryType is "file", which allows you to specify the uids and gids in two text files (one ID per line). The path to the file with the uids is set in the option quotaQueryUIDFile, and the path to the file with the gids in the option quotaQueryGIDFile. In this case, all uids and gids from the files will be queried. This query type is suitable for cases where the IDs are not sequential.
quotaQueryType = file
quotaQueryGIDFile = /etc/beegfs/groupIDs
quotaQueryUIDFile = /etc/beegfs/userIDs

3) Set the following option to true to activate quota enforcement on the system.
quotaEnableEnforcement = true

4) Restart the management service daemon.

5) These changes are not noticed by the other server services until they are restarted. Therefore, restart the storage service daemons and the metadata service daemons.
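Two small helpers for steps 1 and 2 above are sketched below; neither is part of BeeGFS. The first is rough arithmetic for the trade-off in step 1: assuming a user writes at a constant rate, and ignoring the time needed to inform the other services, the data written beyond a limit before the next check is at most rate × interval. The second builds input for quotaQueryType = file or range from getent output, keeping only IDs at or above a cut-off to skip system accounts. The function names, the 1000 cut-off and the 500 MiB/s rate are hypothetical examples, not BeeGFS defaults.

```shell
#!/usr/bin/env bash
# --- Step 1: rough overshoot estimate for quotaUpdateIntervalMin ---
# Upper bound (MiB) on data written past a limit before the next check,
# assuming a constant write rate and ignoring propagation delays:
#   overshoot = rate_MiB_per_s * interval_min * 60
max_overshoot_mib() {
  local rate_mib_s=$1 interval_min=$2
  echo $(( rate_mib_s * interval_min * 60 ))
}

# --- Step 2: inputs for quotaQueryType = file / range (hypothetical helpers) ---
# Reads passwd- or group-format lines on stdin (e.g. "getent passwd") and
# prints the unique numeric IDs (field 3) >= min_id, one per line, sorted.
ids_from_db() {
  local min_id=${1:-1000}
  awk -F: -v min="$min_id" '$3 ~ /^[0-9]+$/ && $3 + 0 >= min + 0 { print $3 }' |
    sort -n -u
}

# Prints "lowest,highest" of the selected IDs, in the format used by
# quotaQueryUIDRange / quotaQueryGIDRange.
id_range() {
  ids_from_db "$1" |
    awk 'NR == 1 { lo = $1 } { hi = $1 } END { if (NR > 0) print lo "," hi }'
}

# Example usage (hypothetical 500 MiB/s writer, cut-off ID 1000):
#   max_overshoot_mib 500 10                  # 300000 MiB, roughly 293 GiB
#   getent passwd | ids_from_db 1000 > /etc/beegfs/userIDs
#   getent group  | ids_from_db 1000 > /etc/beegfs/groupIDs
#   getent passwd | id_range 1000             # value for quotaQueryUIDRange
```

Accounts such as nobody (commonly ID 65534) pass the cut-off and can widen a computed range considerably, so review the output before putting it into beegfs-mgmtd.conf.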


Setting quota limits


Quota limits can be set with the command "beegfs-ctl --setquota". Here are some usage examples.

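Example file contents for user quota limits (e.g. located at /tmp/user_quota_limits.txt). Each line holds a user ID or user name, a disk space limit, and a limit on the number of chunk files, separated by commas: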
2345,1T,500
8999,5G,20
dbadmin,20G,5000

To load the example user quota limits file and apply the user quota limits:
$ beegfs-ctl --setquota --uid --file=/tmp/user_quota_limits.txt
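A typo in a large limits file is easy to miss, so a quick format check before loading it may be worthwhile. The sketch below is a hypothetical helper, `validate_limits_file`: the expected layout (name or ID, size, count) is inferred from the example file above, and the accepted unit suffixes K/M/G/T/P are an assumption, not a documented BeeGFS rule.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): sanity-check a quota limits file before passing
# it to "beegfs-ctl --setquota ... --file=...". Expected layout, inferred from
# the example: name-or-ID,size,count; unit suffixes K/M/G/T/P are assumed.
validate_limits_file() {
  awk -F, '
    NF != 3 || $1 == "" || $2 !~ /^[0-9]+[KMGTP]?$/ || $3 !~ /^[0-9]+[KMGTP]?$/ {
      printf "line %d: unexpected format: %s\n", NR, $0
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Usage:
#   validate_limits_file /tmp/user_quota_limits.txt &&
#     beegfs-ctl --setquota --uid --file=/tmp/user_quota_limits.txt
```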




Project directory quota tracking


The BeeGFS quota management mechanism is based on user and group quota. Group quota can be used for project directories by using the setgid flag on a directory ("chmod g+s /mnt/beegfs/project01"). If this flag is set, all files created in the directory will automatically have the group of the directory instead of the primary group of the user who created the file.

With this approach, it is useful to also create a separate group for the project, e.g. a group project01 and apply it to the project directory ("chown root:project01 /mnt/beegfs/project01"). To avoid conflicts with per-user quota limits, the same approach can be used not only for shared project directories, but also for user directories, in which case each user has its own group.
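The setgid approach above can be scripted per project. The sketch below uses a hypothetical helper, `setup_project_dir`, which creates the directory, assigns the group and sets the setgid bit. It uses chgrp rather than the "chown root:project01" shown above, so the directory owner is left unchanged, and it assumes the group already exists (for example, created with groupadd).

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): prepare a project directory so that new files
# inherit the project group via the setgid bit. The group must already exist.
setup_project_dir() {
  local dir=$1 group=$2
  mkdir -p -- "$dir" &&
    chgrp -- "$group" "$dir" &&
    chmod g+s "$dir"
}

# Example (paths from the text above, run as root):
#   setup_project_dir /mnt/beegfs/project01 project01
```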

Alternatively, if you want to track used space or number of files based on subdirectory trees, you might want to look at the Robinhood Policy Engine.

Robinhood can run parallel scans of the file system at regular intervals and store the discovered file and directory information in a SQL database. This makes it possible to run various queries against the database with fast results. In addition, Robinhood can trigger automatic actions for certain events, e.g. when the defined used space limit for a certain subdirectory tree is exceeded.

As BeeGFS keeps all the metadata for such scans readily available on the metadata servers (usually flash storage), crawling a file system in parallel is fast. To make sure that the SQL database of Robinhood does not reduce the scan speed, it is recommended to have the Robinhood database also on flash storage.


