Cache API


Overview

The Cache API was developed during the European Commission funded exascale research project DEEP-ER, which is why it is also known as the BeeGFS DEEP-ER Cache API. One goal of the project was to develop a scalable storage environment for Exascale clusters. Our approach consists of a storage design and a new API that helps application developers optimize the I/O of their applications.

Architecture

A scalable parallel file system for Exascale clusters needs to reduce communication with central system components (such as a global storage or metadata server that would be accessed by all compute nodes) as much as possible and instead switch to a design that can scale linearly with the number of compute nodes in the system.

The following picture illustrates the new storage system design for Exascale clusters.
Figure: Architecture of the DEEP-ER cache.

The bottom layer in the picture is the centralized layer of globally shared storage servers, which use traditional hard drives to provide high capacity for globally shared data. To avoid as much communication as possible with this layer, a cache layer of per-subdomain storage servers based on storage technologies like SSD or NVMe is added (shown in the middle layer of the picture). Using newer storage technologies such as NVMe on the cache layer allows significantly faster access, especially for non-sequential I/O, although traditional hard drives also work. The cache layer allows applications to write data that is not shared with other subdomains (i.e. non-coherent) and thus provides the ability to scale I/O performance linearly with the number of compute nodes. Both of these layers will be managed through BeeGFS.

Access to data on the global storage layer will be seamless to the applications, while access to the cache layer will require the applications to provide hints to the file system. These hints are required because the file system cannot know which files need to be coherent across the cluster, at least not without managing this information at a centralized instance, which would be counterproductive to the design.

The questionnaire results from the DEEP-ER project make clear that most of the data generated by the applications does not need to be coherent, so the cache layer will be very effective. This is especially true for temporary data, such as checkpoints, which will typically reside only on the cache layer and eventually be deleted without ever being flushed to the global storage layer.


Installation and Configuration

A working BeeGFS cache API requires a global and a cache file system. The BeeGFS cache library and the BeeGFS cache daemon must be installed.

Prepare Global and Cache File System

The global file system should be a regular BeeGFS installation because it must be persistent. More details about BeeGFS are available on the System Architecture wiki page. Detailed installation instructions are provided on the Installation and Setup Guide wiki page.
The cache BeeGFS can be either a regular BeeGFS configuration like the global BeeGFS or a dynamic BeeGFS configuration that is set up by the job queuing system of the cluster. BeeOND should be used for a dynamic BeeGFS. More details about BeeOND and its installation are provided on the BeeOND wiki page.

Installation and Configuration of the Cache API

The package (beegfs-deeper-lib-*) for the cache API is available on our website. The package contains the BeeGFS cache library, the BeeGFS cache daemon, configuration files and an example file. Currently only rhel6 and rhel7 packages are available. We provide packages for other Linux versions on request. The versions of the cache API packages are independent of the BeeGFS version in use, so we recommend using the newest version of the cache API packages.

Install the package using the package manager of your Linux system. The include file is installed in /usr/include/deeper/deeper_cache.h. The shared library is installed to /opt/beegfs/lib/libbeegfs-deeper.so.
The cache API must be configured in the configuration file /etc/beegfs/beegfs-deeper-lib.conf:
# The mount point of the DEEP-ER BeeGFS cache.
sysMountPointCache            = /scratch_local/

# The mount point of the global BeeGFS storage.
sysMountPointGlobal           = /scratch/

# The ID of the cache FS. ID must be identical on all nodes of the same cache domain but unique to all other cache domains.
sysCacheID                    = 1
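
The two mount points suggest how a path on the global file system relates to its counterpart on the cache layer. The following is only a conceptual sketch, assuming that the cache path is obtained by re-rooting a global path below sysMountPointCache. The helper toCachePath() is hypothetical and not part of the cache API, and the mapping used by the library itself may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (NOT part of the cache API): re-root a path that lies
// below the global mount point under the cache mount point.
// Returns an empty string if the path is not below the global mount point.
std::string toCachePath(const std::string& globalMount,
                        const std::string& cacheMount,
                        const std::string& path)
{
   // this sketch only handles absolute paths
   assert(!path.empty() && path[0] == '/');

   // normalize a mount point so that it ends with exactly one '/'
   auto withSlash = [](std::string s)
   {
      while (!s.empty() && s.back() == '/')
         s.pop_back();
      return s + "/";
   };

   const std::string global = withSlash(globalMount);
   const std::string cache = withSlash(cacheMount);

   // the path must start with the global mount point
   if (path.compare(0, global.size(), global) != 0)
      return std::string();

   return cache + path.substr(global.size());
}
```

With the example configuration above, toCachePath("/scratch/", "/scratch_local/", "/scratch/dir/file") yields "/scratch_local/dir/file" under this assumption.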


Start the cache daemon on a system that uses the systemd init system:
$ systemctl start beegfs-deeper-cached


Start the cache daemon on a system that uses the sysv init system:
$ /etc/init.d/beegfs-deeper-cached start


API Specification

The API specification:
// valid return values for cache functions
#define DEEPER_RETVAL_SUCCESS            0 // return value for success
#define DEEPER_RETVAL_ERROR             -1 // return value for an error

// additional return values for deeper_cache_[flush|prefetch]_is_finished functions
#define DEEPER_IS_NOT_FINISHED           0 // flush/prefetch is ongoing
#define DEEPER_IS_FINISHED               1 // flush/prefetch is finished

// valid deeper_open_flags
#define DEEPER_OPEN_NONE                 0 // no deeper open flags needed
#define DEEPER_OPEN_FLUSHWAIT            1 // wait until flush is finished on close (synchronous)
#define DEEPER_OPEN_FLUSHONCLOSE         2 // flush the deeper cache to global storage on close
#define DEEPER_OPEN_DISCARD              4 // delete file from deeper cache on close
#define DEEPER_OPEN_FLUSHFOLLOWSYMLINKS  8 // do not create symlinks, copy the destination file/dir

// valid deeper_prefetch_flags
#define DEEPER_PREFETCH_NONE             0 // no deeper prefetch flags needed
#define DEEPER_PREFETCH_WAIT             1 // wait until prefetch is finished (synchronous prefetch)
#define DEEPER_PREFETCH_SUBDIRS          2 // also prefetch all subdirectories
#define DEEPER_PREFETCH_FOLLOWSYMLINKS   4 // do not create symlinks, copy the destination file/dir

// valid deeper_flush_flags
#define DEEPER_FLUSH_NONE                0 // no deeper flush flags needed
#define DEEPER_FLUSH_WAIT                1 // wait until flush is finished (synchronous flush)
#define DEEPER_FLUSH_SUBDIRS             2 // also flush all subdirectories
#define DEEPER_FLUSH_DISCARD             4 // delete file from deeper cache after flush
#define DEEPER_FLUSH_FOLLOWSYMLINKS      8 // do not create symlinks, copy the destination file/dir



///////////////////////////////////////////////////////////////////////////////////////////////////
		  /* BeeGFS extensions to access the partitioned DEEP-ER cache layer */
///////////////////////////////////////////////////////////////////////////////////////////////////

/**
 * Create a new directory on the cache layer of the current cache domain.
 *
 * @param path Path and name of new directory.
 * @param mode Permission bits, similar to POSIX mkdir (S_IRWXU etc.).
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_mkdir(const char *path, mode_t mode);

/**
 * Remove a directory from the cache layer.
 *
 * @param path Path and name of directory, which should be removed.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_rmdir(const char *path);

/**
 * Open a directory stream for a directory on the cache layer of the current cache domain and
 * position the stream at the first directory entry.
 *
 * @param name Path and name of directory on cache layer.
 * @return Pointer to directory stream, NULL and errno set in case of error.
 */
DIR *deeper_cache_opendir(const char *name);

/**
 * Close directory stream.
 *
 * @param dirp Directory stream, which should be closed.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_closedir(DIR *dirp);

///////////////////////////////////////////////////////////////////////////////////////////////////

/**
 * Open (and possibly create) a file on the cache layer.
 *
 * Any following write()/read() on the returned file descriptor will write/read data to/from the
 * cache layer. Data on the cache layer will be visible only to those nodes that are part of the
 * same cache domain.
 *
 * @param path Path and name of file, which should be opened.
 * @param oflag Access mode flags, similar to POSIX open (O_WRONLY, O_CREAT, etc).
 * @param mode The permissions of the file, similar to POSIX open. When oflag contains a file
 *        creation flag (O_CREAT, O_EXCL, O_NOCTTY or O_TRUNC), the mode argument is required; in
 *        other cases it is ignored.
 * @param deeper_open_flags DEEPER_OPEN_NONE or a combination of the following flags:
 *        DEEPER_OPEN_FLUSHONCLOSE to automatically flush written data to global
 *           storage when the file is closed, asynchronously.
 *        DEEPER_OPEN_FLUSHWAIT to make DEEPER_OPEN_FLUSHONCLOSE a synchronous operation, which
 *        means the close operation will only return after flushing is complete.
 *        DEEPER_OPEN_DISCARD to remove the file from the cache layer after it has been closed.
 * @return File descriptor as non-negative integer on success, DEEPER_RETVAL_ERROR and errno set
 *         in case of error.
 */
int deeper_cache_open(const char* path, int oflag, mode_t mode, int deeper_open_flags);

/**
 * Close a file.
 *
 * @param fildes File descriptor of open file.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_close(int fildes);

///////////////////////////////////////////////////////////////////////////////////////////////////

/**
 * Prefetch a file or directory (including contained files) from global storage to the current
 * cache domain of the cache layer, asynchronously.
 * Contents of existing files with the same name on the cache layer will be overwritten.
 *
 * @param path Path to file or directory, which should be prefetched.
 * @param deeper_prefetch_flags DEEPER_PREFETCH_NONE or a combination of the following flags:
 *        DEEPER_PREFETCH_SUBDIRS to recursively copy all subdirs, if given path leads to a
 *           directory.
 *        DEEPER_PREFETCH_WAIT To make this a synchronous operation.
 *        DEEPER_PREFETCH_FOLLOWSYMLINKS To copy the destination file or directory and do not
 *           create symbolic links when a symbolic link was found.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_prefetch(const char* path, int deeper_prefetch_flags);

/**
 * Prefetch a file from global storage to the current cache domain of the cache layer,
 * asynchronously. A CRC checksum of the given file is calculated.
 * Contents of existing files with the same name on the cache layer will be overwritten.
 *
 * @param path Path to file or directory, which should be prefetched.
 * @param deeper_prefetch_flags DEEPER_PREFETCH_NONE or a combination of the following flags:
 *        DEEPER_PREFETCH_WAIT To make this a synchronous operation.
 *        DEEPER_PREFETCH_FOLLOWSYMLINKS To copy the destination file or directory and do not
 *           create symbolic links when a symbolic link was found.
 * @param outChecksum The checksum of the file, it is only used in the synchronous case.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_prefetch_crc(const char* path, int deeper_prefetch_flags, unsigned long* outChecksum);

/**
 * Prefetch a file similar to deeper_cache_prefetch(), but prefetch only a certain range, not the
 * whole file.
 *
 * @param path Path to file, which should be prefetched.
 * @param start_pos Start position (offset) of the byte range that should be prefetched.
 * @param num_bytes Number of bytes from start_pos that should be prefetched.
 * @param deeper_prefetch_flags DEEPER_PREFETCH_NONE or the following flag:
 *        DEEPER_PREFETCH_WAIT to make this a synchronous operation.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_prefetch_range(const char* path, off_t start_pos, size_t num_bytes, int deeper_prefetch_flags);

/**
 * Wait for an ongoing prefetch operation from deeper_cache_prefetch[_range]() to complete.
 *
 * @param path Path to file, which has been submitted for prefetch.
 * @param deeper_prefetch_flags DEEPER_PREFETCH_NONE or a combination of the following flags:
 *        DEEPER_PREFETCH_SUBDIRS To recursively wait for the contents of all subdirs, if given
 *           path leads to a directory.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_prefetch_wait(const char* path, int deeper_prefetch_flags);

/**
 * Wait for an ongoing prefetch operation from deeper_cache_prefetch_crc() to complete.
 *
 * @param path Path to file, which has been submitted for prefetch.
 * @param deeper_prefetch_flags DEEPER_PREFETCH_NONE. Currently ignored, but flags may be
 *           required in a later version.
 * @param outChecksum The checksum of the file.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_prefetch_wait_crc(const char* path, int deeper_prefetch_flags, unsigned long* outChecksum);

/**
 * Checks if the prefetch of the given path is finished. Checks the prefetch operation from
 * deeper_cache_prefetch[_range | _crc]().
 *
 * NOTE: The function doesn't report errors from the prefetch. An additional wait is required to
 *       get the error.
 *
 * @param path Path to file, which has been submitted for prefetch.
 * @param deeper_prefetch_flags The flags which were used for the prefetch.
 * @return DEEPER_IS_NOT_FINISHED If prefetch is ongoing or DEEPER_IS_FINISHED if prefetch is
 *         finished or DEEPER_RETVAL_ERROR in case of error.
 */
int deeper_cache_prefetch_is_finished(const char* path, int deeper_prefetch_flags);

/**
 * Stops an ongoing prefetch operation started by deeper_cache_prefetch[_range | _crc]().
 *
 * @param path Path to file, which has been submitted for prefetch.
 * @param deeper_prefetch_flags The flags which were used for the prefetch.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_prefetch_stop(const char* path, int deeper_prefetch_flags);

/**
 * Flush a file from the current cache domain to global storage, asynchronously. Contents of an
 * existing file with the same name on global storage will be overwritten.
 *
 * @param path Path to file, which should be flushed.
 * @param deeper_flush_flags DEEPER_FLUSH_NONE or a combination of the following flags:
 *        DEEPER_FLUSH_WAIT To make this a synchronous operation, which means return only after
 *           flushing is complete.
 *        DEEPER_FLUSH_SUBDIRS To recursively copy all subdirs, if given path leads to a
 *           directory.
 *        DEEPER_FLUSH_DISCARD To remove the file from the cache layer after it has been flushed.
 *        DEEPER_FLUSH_FOLLOWSYMLINKS To copy the destination file or directory and do not create
 *           symbolic links when a symbolic link was found.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_flush(const char* path, int deeper_flush_flags);

/**
 * Flush a file from the current cache domain to global storage, asynchronously. A CRC checksum of
 * the given file is calculated.
 * Contents of an existing file with the same name on global storage will be overwritten.
 *
 * @param path Path to file, which should be flushed.
 * @param deeper_flush_flags DEEPER_FLUSH_NONE or a combination of the following flags:
 *        DEEPER_FLUSH_WAIT To make this a synchronous operation, which means return only after
 *           flushing is complete.
 *        DEEPER_FLUSH_DISCARD To remove the file from the cache layer after it has been flushed.
 *        DEEPER_FLUSH_FOLLOWSYMLINKS To copy the destination file or directory and do not create
 *           symbolic links when a symbolic link was found.
 * @param outChecksum The checksum of the file, it is only used in the synchronous case.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_flush_crc(const char* path, int deeper_flush_flags, unsigned long* outChecksum);

/**
 * Flush a file similar to deeper_cache_flush(), but flush only a certain range, not the whole file.
 *
 * @param path Path to file, which should be flushed.
 * @param start_pos Start position (offset) of the byte range that should be flushed.
 * @param num_bytes Number of bytes from start_pos that should be flushed.
 * @param deeper_flush_flags DEEPER_FLUSH_NONE or a combination of the following flags:
 *        DEEPER_FLUSH_WAIT to make this a synchronous operation.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_flush_range(const char* path, off_t start_pos, size_t num_bytes, int deeper_flush_flags);

/**
 * Wait for an ongoing flush operation from deeper_cache_flush[_range]() to complete.
 *
 * @param path Path to file, which has been submitted for flush.
 * @param deeper_flush_flags DEEPER_FLUSH_NONE or a combination of the following flags:
 *        DEEPER_FLUSH_SUBDIRS To recursively wait for the contents of all subdirs, if given
 *           path leads to a directory.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_flush_wait(const char* path, int deeper_flush_flags);

/**
 * Wait for an ongoing flush operation from deeper_cache_flush_crc() to complete.
 *
 * @param path Path to file, which has been submitted for flush.
 * @param deeper_flush_flags DEEPER_FLUSH_NONE. Currently unused, but flags may be required in a
 *           later version.
 * @param outChecksum The checksum of the file.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_flush_wait_crc(const char* path, int deeper_flush_flags,
   unsigned long* outChecksum);

/**
 * Checks if the flush of the given path is finished. Checks the flush operation from
 * deeper_cache_flush[_range | _crc]().
 *
 * NOTE: The function doesn't report errors from the flush. An additional wait is required to
 *       get the error.
 *
 * @param path Path to file, which has been submitted for flush.
 * @param deeper_flush_flags The flags which were used for the flush.
 * @return DEEPER_IS_NOT_FINISHED If flush is ongoing or DEEPER_IS_FINISHED if flush is finished or
 *         DEEPER_RETVAL_ERROR in case of error.
 */
int deeper_cache_flush_is_finished(const char* path, int deeper_flush_flags);

/**
 * Stops an ongoing flush operation started by deeper_cache_flush[_range | _crc]().
 *
 * @param path Path to file, which has been submitted for flush.
 * @param deeper_flush_flags The flags which were used for the flush.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_flush_stop(const char* path, int deeper_flush_flags);
///////////////////////////////////////////////////////////////////////////////////////////////////

/**
 * Return the stat information of a file or directory of the cache domain.
 *
 * @param path Path to a file or directory on the global file system.
 * @param out_stat_data The stat information of the file or directory.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_stat(const char *path, struct stat *out_stat_data);

///////////////////////////////////////////////////////////////////////////////////////////////////

/**
 * Return a unique identifier for the current cache domain.
 *
 * @param out_cache_id Pointer to a buffer in which the ID of the current cache domain will be
 *        stored on success.
 * @return DEEPER_RETVAL_SUCCESS on success, DEEPER_RETVAL_ERROR and errno set in case of error.
 */
int deeper_cache_id(const char* path, uint64_t* out_cache_id);

/**
 * Checks if the required API version of the application is compatible with the current API version.
 *
 * @param required_major_version The required major API version of the user application.
 * @param required_minor_version The minimal required minor API version of the user application.
 * @return DEEPER_RETVAL_SUCCESS if the required version and the API version are compatible, if not
 *         DEEPER_RETVAL_ERROR is returned.
 */
int deeper_cache_check_api_version(const unsigned required_major_version, const unsigned required_minor_version);
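
The flag values in the specification above are distinct powers of two, so several flags can be combined with a bitwise OR, as the parameter descriptions ("a combination of the following flags") indicate. The following minimal sketch illustrates this for the flush flags. The helpers hasFlag() and describeFlushFlags() are hypothetical and not part of the cache API, and the flag values are repeated only so that the sketch compiles without the header.

```cpp
#include <cassert>
#include <string>

// Values repeated from the specification so that the sketch compiles without
// the cache API header. In a real program include <deeper/deeper_cache.h>.
#define DEEPER_FLUSH_NONE            0
#define DEEPER_FLUSH_WAIT            1
#define DEEPER_FLUSH_SUBDIRS         2
#define DEEPER_FLUSH_DISCARD         4
#define DEEPER_FLUSH_FOLLOWSYMLINKS  8

// Hypothetical helper (NOT part of the cache API): true if the given flag is
// contained in the combined flag value.
bool hasFlag(int flags, int flag)
{
   return flag != 0 && (flags & flag) == flag;
}

// Hypothetical helper: readable form of a flush flag combination, e.g. for logs.
std::string describeFlushFlags(int flags)
{
   assert(flags >= 0); // all defined flags are non-negative bit values

   if (flags == DEEPER_FLUSH_NONE)
      return "NONE";

   std::string out;
   auto append = [&](int flag, const char* name)
   {
      if (hasFlag(flags, flag))
         out += (out.empty() ? "" : "|") + std::string(name);
   };

   append(DEEPER_FLUSH_WAIT, "WAIT");
   append(DEEPER_FLUSH_SUBDIRS, "SUBDIRS");
   append(DEEPER_FLUSH_DISCARD, "DISCARD");
   append(DEEPER_FLUSH_FOLLOWSYMLINKS, "FOLLOWSYMLINKS");
   return out;
}
```

For example, passing DEEPER_FLUSH_WAIT | DEEPER_FLUSH_DISCARD to deeper_cache_flush() requests a synchronous flush that also removes the file from the cache layer afterwards, according to the flag descriptions.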



Code Example

The header file of the cache API is located in the default system include path and will be found automatically during the build of your program. The shared library of the cache API is added to the library lookup path of your Linux system so it should be found by your linker and at runtime.

Example build command:
$ g++ $SOURCE_FILE -o $BINARY_NAME -I /usr/include/ -L /opt/beegfs/lib/ -lbeegfs-deeper -ldl -lz -rdynamic
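
Because the cache API packages are versioned independently of BeeGFS, an application may want to call deeper_cache_check_api_version() at startup before using any other cache function. The specification does not spell out what "compatible" means. The sketch below models one plausible rule, ASSUMING that the major version must match exactly and that the installed minor version must be at least the required one. The function checkApiVersionSketch() is a hypothetical stand-in, not the library implementation.

```cpp
#include <cassert>

// Same values as in the API specification.
#define DEEPER_RETVAL_SUCCESS  0
#define DEEPER_RETVAL_ERROR   -1

// Hypothetical stand-in for deeper_cache_check_api_version().
// ASSUMPTION: compatible means identical major version and an installed minor
// version that is not older than the required one. The real library may
// decide differently.
int checkApiVersionSketch(unsigned apiMajor, unsigned apiMinor,
                          unsigned requiredMajor, unsigned requiredMinor)
{
   if (apiMajor != requiredMajor)
      return DEEPER_RETVAL_ERROR; // incompatible interface generation

   if (apiMinor < requiredMinor)
      return DEEPER_RETVAL_ERROR; // installed library is too old

   return DEEPER_RETVAL_SUCCESS;
}

// Small self-check of the assumed rule, using made-up version numbers.
void selfCheck()
{
   assert(checkApiVersionSketch(1, 3, 1, 2) == DEEPER_RETVAL_SUCCESS);
   assert(checkApiVersionSketch(1, 1, 1, 2) == DEEPER_RETVAL_ERROR);
}
```

In a real program only the call deeper_cache_check_api_version(required_major_version, required_minor_version) is needed, and the application should stop using the cache API if it returns DEEPER_RETVAL_ERROR.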


A code example with an asynchronous prefetch and a synchronous flush:
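#include <deeper/deeper_cache.h>

#include <cerrno>      // errno
#include <cstdlib>     // exit
#include <iostream>
#include <string>

#include <fcntl.h>     // open, O_* flags
#include <sys/stat.h>  // mkdir, S_I* mode bits
#include <unistd.h>    // close

// Example values (assumptions, not defined by the cache API); adjust as needed.
static const mode_t MODE_FLAG = S_IRWXU | S_IRGRP | S_IXGRP; // permissions for new directory and file
static const int OPEN_FLAGS = O_CREAT | O_RDWR;              // create the file if missing, read/write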

int main(int argc, char** argv)
{
   if(argc != 2)
   {
	  std::cout << "Usage: " << argv[0] << " $MOUNT_GLOBAL_BEEGFS" << std::endl;
	  exit(-1);
   }

   std::string globalFS(argv[1]);
   std::string dir(globalFS + "/dir");
   std::string file(dir + "/file");


   // create a directory on global FS with POSIX syscalls
   int funcError = mkdir(dir.c_str(), MODE_FLAG);
   if(funcError)
   {
	  std::cout << "mkdir: can not create directory: " << dir << " errno: " << errno
		 << std::endl;
	  exit(-1);
   }

   // create a file on global FS with POSIX syscalls
   int globalFD = open(file.c_str(),OPEN_FLAGS , MODE_FLAG);
   if(globalFD == -1)
   {
	  std::cout << "open: can not create file: " << file << " errno: " << errno << std::endl;
	  exit(-1);
   }


   // do something with the file in the global FS
   // ...

   close(globalFD);


   // create the directory on cache FS
   funcError = deeper_cache_mkdir(dir.c_str(), MODE_FLAG);
   if(funcError == DEEPER_RETVAL_ERROR)
   {
	  std::cout << "deeper_cache_mkdir: can not create directory: " << dir << " errno: " << errno
		 << std::endl;
	  exit(-1);
   }

   // prefetch the file from global FS to the cache FS
   funcError = deeper_cache_prefetch(file.c_str(), DEEPER_PREFETCH_NONE);
   if(funcError == DEEPER_RETVAL_ERROR)
   {
	  std::cout << "deeper_cache_prefetch: can not prefetch file: " << file << " errno: " << errno
		 << std::endl;
	  exit(-1);
   }

   // wait until the prefetch is finished
   funcError = deeper_cache_prefetch_wait(file.c_str(), DEEPER_PREFETCH_NONE);
   if(funcError == DEEPER_RETVAL_ERROR)
   {
	  std::cout << "deeper_cache_prefetch_wait: can not wait for prefetch of file: " << file
		 << " errno: " << errno << std::endl;
	  exit(-1);
   }

   // open file in the cache FS
   int cacheFD = deeper_cache_open(file.c_str(), O_RDWR, MODE_FLAG, DEEPER_OPEN_NONE);
   if(cacheFD == -1)
   {
	  std::cout << "deeper_cache_open: can not open file: " << file << " errno: " << errno
		 << std::endl;
	  exit(-1);
   }

   // do something with the file in the cache FS
   // ...

   // close file in the cache FS
   funcError = deeper_cache_close(cacheFD);
   if(funcError == DEEPER_RETVAL_ERROR)
   {
	  std::cout << "deeper_cache_close: can not close file: " << file << " errno: " << errno
		 << std::endl;
	  exit(-1);
   }

   // flush the file from the cache FS to the global FS and wait until the flush is finished
   funcError = deeper_cache_flush(file.c_str(), DEEPER_FLUSH_WAIT);
   if(funcError == DEEPER_RETVAL_ERROR)
   {
	  std::cout << "deeper_cache_flush: can not flush (DEEPER_FLUSH_WAIT) file: " << file
		 << " errno: " << errno << std::endl;
	  exit(-1);
   }
}
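
The example above waits for the prefetch immediately. To overlap I/O with computation, the deeper_cache_[flush|prefetch]_is_finished() functions can be polled instead. The following minimal sketch shows such a polling loop. The helper pollUntilFinished() and its parameters are hypothetical and not part of the cache API; the status check is passed in as a callable so that the sketch compiles without the library.

```cpp
#include <cassert>
#include <functional>

// Values repeated from the API specification.
#define DEEPER_RETVAL_ERROR     -1
#define DEEPER_IS_NOT_FINISHED   0
#define DEEPER_IS_FINISHED       1

// Hypothetical helper (NOT part of the cache API): poll isFinished() and run
// one unit of application work between two polls. Stops when the operation is
// finished, when the check reports an error, or after maxPolls checks.
// Returns the result of the last check.
int pollUntilFinished(const std::function<int()>& isFinished,
                      const std::function<void()>& doWork,
                      unsigned maxPolls)
{
   assert(maxPolls > 0);

   int status = DEEPER_IS_NOT_FINISHED;
   for (unsigned i = 0; i < maxPolls; ++i)
   {
      status = isFinished();
      if (status != DEEPER_IS_NOT_FINISHED)
         break; // finished or error

      doWork(); // useful computation while the transfer is ongoing
   }

   return status;
}
```

In a real program isFinished would wrap a call such as deeper_cache_flush_is_finished(file.c_str(), DEEPER_FLUSH_NONE). Since the is_finished functions do not report errors of the transfer itself, a final deeper_cache_flush_wait() or deeper_cache_prefetch_wait() is still required to obtain the result.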


