Management of Mirror Buddy Groups

Table of Contents

  1. General Notes
  2. Define Buddy Groups automatically
  3. Define Buddy Groups manually
  4. List defined Mirror Buddy Groups
  5. Define Stripe Pattern
  6. Caveats of Storage Mirroring
  7. Activating Metadata Mirroring

General Notes

Starting with the 2015.03 release series, BeeGFS supports mirroring of data across buddy groups. If you are not familiar with that feature, please read "About BeeGFS Mirroring" first.

Metadata mirroring was introduced in BeeGFS 6.0. Version 2015.03 only supports storage mirroring; if you are using that release series, please refer to its documentation.

Mirror buddy groups are identified by numeric IDs, just like storage targets. Note that buddy group IDs do not conflict with target IDs, i.e. they don't need to be distinct from storage target IDs.

There are basically two ways to define buddy groups: you can define them manually, or you can tell BeeGFS to create them automatically.

Of course, defining groups manually gives you greater control and allows a more detailed configuration. For example, the automatic mode won't consider targets that are not equally sized. It also knows nothing about the topology of your system, so if you want to make sure that the members of a buddy group are placed in different physical locations, you have to define the groups manually.

Define Buddy Groups automatically

Automatic creation of buddy groups can be done with beegfs-ctl, separately for metadata and for storage servers:
 $ beegfs-ctl --addmirrorgroup --automatic --nodetype=meta

 $ beegfs-ctl --addmirrorgroup --automatic --nodetype=storage

Please see the help of beegfs-ctl for more information on available parameters:
 $ beegfs-ctl --addmirrorgroup --help 

Define Buddy Groups manually

Manual definition of mirror buddy groups is useful if you want to set custom group IDs or if you want to make sure that the buddies are in different failure domains (e.g. different racks). It is done with the beegfs-ctl tool. The following command creates a buddy group with ID 100, consisting of targets 1 and 2:
 $ beegfs-ctl --addmirrorgroup --nodetype=storage --primary=1 --secondary=2 --groupid=100 
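The same command can be used to define metadata buddy groups manually; only the node type differs, and primary and secondary then refer to metadata node IDs. The IDs in the following sketch are illustrative:
 $ beegfs-ctl --addmirrorgroup --nodetype=meta --primary=1 --secondary=2 --groupid=1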

Please see the help of beegfs-ctl for more information on available parameters:
 $ beegfs-ctl --addmirrorgroup --help 

When creating mirror buddy groups for metadata manually, and one of the members holds the root directory, that member must be set as the primary of its group.

List defined Mirror Buddy Groups

Configured mirror buddy groups can be listed with beegfs-ctl (don't forget to specify the node type):
 $ beegfs-ctl --listmirrorgroups --nodetype=storage

 $ beegfs-ctl --listmirrorgroups --nodetype=meta

It's also possible to list mirror buddy groups alongside other target information:
 $ beegfs-ctl --listtargets --mirrorgroups 

Please see the help of beegfs-ctl for more information on available parameters:
 $ beegfs-ctl --listtargets --help 

Define Stripe Pattern

After defining storage buddy mirror groups in your system, you have to define a data stripe pattern that uses them.
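For example, buddy mirroring can be enabled for a directory (affecting files created in it afterwards) with beegfs-ctl; the mount point, chunk size, and number of targets below are illustrative values:
 $ beegfs-ctl --setpattern --pattern=buddymirror --chunksize=512k --numtargets=4 /mnt/beegfs/mydir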

Caveats of Storage Mirroring

Storage buddy mirroring protects against many failure modes of a distributed system, such as failing drives, failing servers, and unstable or failing networks. It does not provide full protection while the system is degraded: if any storage buddy group is in a degraded state, a further failure may cause data loss. Administrative actions can also cause data loss or corruption if the system is in an unstable or degraded state. Such actions should be avoided if at all possible, for example by ensuring that no access to the system is possible while the actions are performed.

Setting states of active storage targets
When the state of a storage target is manually changed from GOOD to NEEDS_RESYNC, the new state is not propagated to all clients synchronously. During the propagation period, clients accessing files therefore "see" different versions of the global state, which influences data and file locks. Propagation happens every 30 seconds, so the period will not last longer than a minute. The asynchronous propagation makes the following sequence of events possible:
  1. An administrator sets the state of an active storage target which is the secondary of a buddy group to NEEDS_RESYNC with beegfs-ctl --startresync.
  2. The state is propagated to the primary of the buddy group. The primary will no longer forward written data to the secondary.
  3. A client writes data to a file residing on the buddy group. The data is not forwarded to the secondary.
  4. A different client reads data from the file. If the client attempts to read from the primary, no data loss occurs. If the client attempts to read from the secondary, which is possible without problems in a stable system, the client will receive stale data.

If the two clients in this example used the file system to communicate, e.g. by calling flock for the file they share, the second client will not see the expected data. Accesses to the file will only stop considering the secondary as a source once all clients have received the updated state information, which may take up to 30 seconds.

Setting the state of a primary storage target may exhibit the same effects. Setting states of targets that are currently GOOD, and thereby triggering a switchover, must be avoided while clients can still access data on the target. Propagation of the switchover takes some time, during which clients may attempt to access data on the target that was set to non-GOOD. If such an access was a write, the write may be lost.
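Before performing such administrative actions, the current reachability and consistency states of all targets can be inspected with beegfs-ctl, for example:
 $ beegfs-ctl --listtargets --nodetype=storage --state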

Fsync may fail without setting targets to NEEDS_RESYNC
When fsync is configured to propagate to the storage servers and trigger an fsync there, an error during fsync may leave the system in an unpredictable state if the error occurs on the secondary of a buddy group. If the fsync operation fails on the secondary due to a disk error, the error may only be detected during the next operation on the secondary. If a failover happens before the error is detected, the automatic resync from the new primary (the old secondary, which has failed) to the new secondary (the old primary) may cause data loss.

Activating Metadata Mirroring

After defining metadata mirror buddy groups, you have to activate metadata mirroring.
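In BeeGFS 6.0 this is done with beegfs-ctl; note that all clients must be unmounted while activating it, and the metadata mirroring documentation describes the exact procedure:
 $ beegfs-ctl --mirrormd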

Back to Buddy Mirroring Overview