The Globus Replica Management API

Replica management is an important issue for a number of scientific applications. Consider a data set that contains one petabyte (one thousand million megabytes) of experimental results for a particle physics application. While the complete data set may exist in one or possibly several physical locations, few universities, research laboratories or individual researchers are likely to have sufficient storage to hold a complete copy. Instead, they will store copies of the most relevant portions of the data set on local storage for faster access. Replica management is the process of keeping track of where portions of the data set can be found.

A working document of the Global Grid Forum, "An Architecture for Replica Management in Grid Computing Environments," describes this concept in detail.

Globus Replica Management integrates the Globus Replica Catalog (for keeping track of replicated files) and GridFTP (for moving data) to provide replica management capabilities for data grids.

Implementation

Our replica management implementation consists of a software API (globus_replica_management), an associated library, and a command-line tool providing the same functionality.

The globus_replica_management library provides client functions that allow files to be registered with the replica management service, published to replica locations, and moved among multiple locations. The library uses the Globus Replica Catalog and GridFTP technologies to accomplish this work.
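The three operations described above can be illustrated with a small in-memory model. This is only a sketch of the idea, not the globus_replica_management API: the class ReplicaManager, its method names, the transfer callback (standing in for a GridFTP transfer) and the example.org URLs are all hypothetical, and the exact semantics of each operation are assumed from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical stand-in for a GridFTP transfer: (source_url, dest_url) -> None.
Transfer = Callable[[str, str], None]


@dataclass
class ReplicaManager:
    """Toy model: a catalog of file locations plus a data mover."""

    transfer: Transfer
    # Catalog: logical file name -> set of location URLs holding a copy.
    catalog: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, name: str, location: str) -> None:
        """Record that a copy of `name` already exists at `location`."""
        self.catalog.setdefault(name, set()).add(location)

    def publish(self, name: str, source: str, location: str) -> None:
        """Move an uncatalogued file to a replica location, then record it."""
        self.transfer(f"{source}/{name}", f"{location}/{name}")
        self.register(name, location)

    def copy(self, name: str, src: str, dst: str) -> None:
        """Replicate a catalogued file from one location to another."""
        if src not in self.catalog.get(name, set()):
            raise KeyError(f"{name} is not registered at {src}")
        self.transfer(f"{src}/{name}", f"{dst}/{name}")
        self.register(name, dst)

    def locations(self, name: str) -> List[str]:
        """Return every known location of `name`, sorted for stable output."""
        return sorted(self.catalog.get(name, set()))


# Usage: the transfer callback only logs requests instead of moving data.
transfers = []
manager = ReplicaManager(transfer=lambda s, d: transfers.append((s, d)))

site_a = "gsiftp://site-a.example.org/data"  # hypothetical locations
site_b = "gsiftp://site-b.example.org/data"

manager.register("run42.dat", site_a)            # file already at site A
manager.copy("run42.dat", site_a, site_b)        # replicate to site B
manager.publish("run43.dat", "file:///scratch", site_a)

print(manager.locations("run42.dat"))
print(transfers)
```

Keeping the catalog update and the transfer in the same call reflects the point of the integration: a copy that is moved but never recorded cannot be found later by other users of the data grid.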

The Globus Replica Management API and library are further described in our working documentation, "A Replica Management Service for High-Performance Data Grids."

Availability

Our data grid software is currently available to the public as components of the Globus Toolkit 2.0 release. Prior to this release, the software was tested and evaluated for more than a year by several external project teams who are using our technologies to build data grids for their own use.