BeeGFS 8: Data Management and Beyond

BeeGFS 8 is our most ambitious release yet, built with three goals in mind:

Expand data management capabilities.
Modernize the file system and lay the foundation for future enhancements.
Simplify system administration and third-party integration.

Rather than rehash the release notes, this post shares the thinking behind BeeGFS 8, from the technical decisions we made to the principles that guided them.

Establishing the Ground Rules

Every ambitious release risks becoming bloated and brittle. To keep BeeGFS 8 focused, we established three ground rules that guided our decision making. These rules helped us stay grounded and user-centric, even as separate teams of developers worked on building out different capabilities.

First, at ThinkParQ we don’t just build the buzziest parallel file system in the world; we obsess over finding the simplest way to manage data at scale. Some storage systems try to be everything to everyone and end up delivering the holy grail of mediocrity. Our preference for simple solutions to complex problems led to our first rule:

Integration over reinvention.

Second, major version updates are exciting for developers: they are a chance to challenge assumptions about how our software fundamentally works. But as file system maintainers, we never have the luxury of moving fast and breaking things. The second ground rule was obvious:

Don’t break the core file system.

Lastly, we’re proud BeeGFS has a reputation for being the easiest-to-use parallel file system in the world. But a major version comes with the expectation of shiny new functionality, which tends to add complexity. So we established a final rule:

Keep making the user’s life easier.

Integrate, Don’t Reinvent, Data Management

While there are plenty of existing tools for cataloguing, moving, and otherwise managing data, we hear users want more of these capabilities built into their storage system. At the same time, we recognize there are excellent hierarchical storage management (HSM) solutions out there. The world doesn’t need yet another one. That’s why, instead of trying to replace those systems, we focused on making BeeGFS easier to integrate with existing solutions.

We started by doing what we do best: moving bits around, fast. For simple staging of data and establishing “rsync-like” relationships between any POSIX-compliant file systems, we’ve introduced our new Copy tool. For more advanced use cases, Remote Storage Targets allow directories in BeeGFS to be linked to external storage providers (like an S3 bucket), with support for bidirectional sync and tracking of which files have been offloaded or restored. Both features are managed through the BeeGFS control tool (CTL) and use a pool of worker machines to distribute data transfers, parallelizing the copying of both multiple files and individual large files across multiple machines.
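
To make the idea of parallelizing both across files and within a single large file more concrete, here is a minimal Python sketch. It is not how BeeGFS schedules transfers; the names `Segment`, `plan_segments`, and `assign_round_robin`, the fixed segment size, and the round-robin policy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    path: str    # file the segment belongs to
    offset: int  # byte offset where the segment starts
    length: int  # number of bytes in the segment


def plan_segments(files, segment_size):
    """Split each (path, size) pair into byte ranges of at most segment_size bytes."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    segments = []
    for path, size in files:
        if size == 0:
            # Empty files still need one unit of work so they get created.
            segments.append(Segment(path, 0, 0))
            continue
        for offset in range(0, size, segment_size):
            segments.append(Segment(path, offset, min(segment_size, size - offset)))
    return segments


def assign_round_robin(segments, workers):
    """Hand segments to workers in turn (a hypothetical, deliberately simple policy)."""
    if not workers:
        raise ValueError("at least one worker is required")
    plan = {worker: [] for worker in workers}
    for index, segment in enumerate(segments):
        plan[workers[index % len(workers)]].append(segment)
    return plan
```

With this sketch, a small file becomes a single unit of work, while a large file is split into several byte ranges that can land on different workers, which is the general shape of the behavior described above.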

We’re also building out what we call our “data management API”. In the inaugural 8.0 release, we started by revamping our file system modification events: we added new event types and fields and exposed the event stream over gRPC, leaving room to add other protocols in the future. Throughout the 8 release series we will continue to expand this API and provide richer integration with existing HSM software.

Modernize Without Breaking the Core File System

Not breaking the core may seem like an obvious goal, but establishing it as an explicit rule made it a guiding light for how new functionality was designed, implemented, and tested.

Minimizing Changes While Maximizing Impact

Take Remote Storage Targets. We could’ve implemented them directly inside the metadata and storage services. But when core stability is top of mind, you start thinking creatively about how to make the smallest possible changes to the most critical parts of the system.

This led to a design where most new functionality was implemented as standalone services that scale independently of the core file system. Only minimal changes were made to the metadata service, all of them extensions of what it already does: associating entries with remote targets, tracking whether file contents are locally available or offloaded, and adding a persistent on-disk modification event buffer.
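
For readers unfamiliar with the idea of a persistent event buffer, the following Python sketch shows the general pattern: events are appended to disk with increasing sequence numbers so that a consumer can resume after a restart. This is only an illustration under assumed details. The `EventBuffer` class, the JSON-lines file format, and the field names are hypothetical and do not describe the on-disk format used by the BeeGFS metadata service.

```python
import json
import os


class EventBuffer:
    """Append-only, file-backed buffer of modification events (illustrative only)."""

    def __init__(self, path):
        self.path = path
        self.next_seq = 1
        # Recover the next sequence number from any events already on disk.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                for line in f:
                    self.next_seq = json.loads(line)["seq"] + 1

    def append(self, event_type, entry_path):
        """Persist one event and return its sequence number."""
        record = {"seq": self.next_seq, "type": event_type, "path": entry_path}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make sure the event survives a crash
        self.next_seq += 1
        return record["seq"]

    def read_from(self, seq):
        """Return all events with a sequence number >= seq."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, "r", encoding="utf-8") as f:
            return [r for r in map(json.loads, f) if r["seq"] >= seq]
```

A consumer that remembers the last sequence number it processed can call `read_from` with the next number after a restart and pick up where it left off.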

Rewriting with Guard Rails

Of course, some parts of BeeGFS did get a rewrite from the ground up. The management service (now in Rust) and the CTL tool (now in Go) were completely rewritten and modernized from the inside out. Both now use protocol buffers and gRPC for network communication, and the management service stores its on-disk data in SQLite instead of a proprietary data format.

But with any major rewrite comes the risk of regressions. To minimize that risk, we used our existing integration test suite as part of our acceptance tests, ensuring behavior remained consistent between BeeGFS 7 and 8. We only modified or removed tests when changes were intentional, and we held the new components to the same (or higher) standards as their predecessors.

Naturally, we didn’t stop at functional correctness. We also cooked up some wild new tests, like spinning up 10,000 clients, to sanity-check that the new system components scale the way our users expect.

Keep Calm and Patch On

There are three things certain in life: death, taxes, and the fact that all software has a few bugs lurking in the corner. As we open the floodgates to our broader user base, we know a few corner cases are bound to surface. With BeeGFS 8, we’re embracing semantic versioning to patch issues quickly and predictably without compromising stability.
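
For anyone new to semantic versioning, the short Python sketch below shows how a MAJOR.MINOR.PATCH scheme lets tooling tell a bug-fix release apart from a feature release. The helper names `parse_version` and `is_patch_update` and the version strings used with them are illustrative assumptions, not part of any BeeGFS tool.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text):
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release or build suffixes)."""
    parts = text.split(".")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(*(int(p) for p in parts))


def is_patch_update(current, candidate):
    """True if candidate only raises the patch number relative to current."""
    cur, cand = parse_version(current), parse_version(candidate)
    return cur[:2] == cand[:2] and cand.patch > cur.patch
```

Because versions are parsed into integer tuples, they also compare numerically, so a patch number of 10 sorts after 9 rather than before it as it would in a plain string comparison.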

It Can Always Bee Easier (to use)

The old beegfs-ctl is dead, long live the new beegfs tool! This is the most noticeable change to our user experience, and we hope you love it as much as we do. The new command line tool provides a modern CLI with a structured command/sub-command hierarchy and sane defaults so, in most cases, running commands just works. In addition to the original commands from beegfs-ctl, we’ve added new commands to simplify health checks, investigate network latency, collect support bundles, integrate data management functionality, and more.

Outside CTL, you’ll find a number of subtle changes that improve the BeeGFS experience. Most notably, static string IDs have been replaced with dynamically reconfigurable aliases, making it easy to associate nodes and targets with physical machines and devices based on user-defined naming conventions. We also removed the beegfs-helperd service, streamlining client setup and removing extra steps.
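
The benefit of reconfigurable aliases over static IDs can be illustrated with a small Python sketch. It assumes each node or target keeps some stable internal identifier that aliases point to. That assumption, the `AliasRegistry` class, and the example alias names are hypothetical and are not taken from the BeeGFS implementation.

```python
class AliasRegistry:
    """Maps user-chosen aliases to stable internal IDs (illustrative only)."""

    def __init__(self):
        self._by_alias = {}  # alias -> entity ID
        self._by_id = {}     # entity ID -> current alias

    def set_alias(self, entity_id, alias):
        """Assign or change the alias of an entity; aliases must be unique."""
        owner = self._by_alias.get(alias)
        if owner is not None and owner != entity_id:
            raise ValueError(f"alias {alias!r} is already in use")
        old_alias = self._by_id.get(entity_id)
        if old_alias is not None:
            del self._by_alias[old_alias]  # the previous alias stops resolving
        self._by_alias[alias] = entity_id
        self._by_id[entity_id] = alias

    def resolve(self, alias):
        """Return the entity ID for an alias, raising KeyError if unknown."""
        return self._by_alias[alias]
```

In this sketch, renaming an entity to match a new naming convention changes only the lookup key. Anything that refers to the entity by its internal ID is unaffected.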

Summary

BeeGFS 8 introduces major new capabilities while establishing the design philosophy that will guide the rest of the 8.x series. This release expands BeeGFS’s reach without bloating the core, modernizes critical components without compromising stability, and makes the world’s easiest parallel file system even easier to use.

Please kick the tires and let us know what you think. Whether you’re trying out BeeGFS for the first time or upgrading from 7.x, we’d love your feedback, bug reports, or feature ideas. Start a discussion or open an issue on GitHub!

Author
Joe McCormick, Senior Software Engineer, ThinkParQ