The Globus Replica Catalog

Replica management is an important issue for a number of scientific applications. Consider a data set that contains one petabyte (one thousand million megabytes) of experimental results for a particle physics application. While the complete data set may exist in one or possibly several physical locations, few universities, research laboratories or individual researchers are likely to have sufficient storage to hold a complete copy. Instead, they will store copies of the most relevant portions of the data set on local storage for faster access. Replica management is the process of keeping track of where portions of the data set can be found.

A working document of the Global Grid Forum, An Architecture for Replica Management in Grid Computing Environments, describes this concept in detail.

The Globus Replica Catalog supports replica management by providing mappings between logical names for files and one or more copies of the files on physical storage systems.
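
As a rough illustration of the mapping idea only, the following Python sketch models a catalog as a dictionary from logical file names to the set of physical locations that hold a copy. It is not the Globus API: the class name, the method names, and the example host names and paths are all hypothetical.

```python
from collections import defaultdict


class ToyReplicaCatalog:
    """Hypothetical in-memory model of a replica catalog (not the Globus API)."""

    def __init__(self):
        # logical file name -> set of physical locations that hold a copy
        self._locations = defaultdict(set)

    def register(self, logical_name, physical_url):
        """Record that a copy of logical_name exists at physical_url."""
        self._locations[logical_name].add(physical_url)

    def unregister(self, logical_name, physical_url):
        """Forget one copy; drop the logical name when no copies remain."""
        copies = self._locations.get(logical_name)
        if copies is None:
            return
        copies.discard(physical_url)
        if not copies:
            del self._locations[logical_name]

    def lookup(self, logical_name):
        """Return the known physical locations, sorted for stable output."""
        return sorted(self._locations.get(logical_name, ()))


catalog = ToyReplicaCatalog()
# Made-up host names: one logical file replicated at two sites.
catalog.register("run42/events.dat", "gsiftp://storage.site-a.example/data/run42/events.dat")
catalog.register("run42/events.dat", "gsiftp://storage.site-b.example/cache/events.dat")
print(catalog.lookup("run42/events.dat"))
```

A client that needs run42/events.dat would ask the catalog for its locations and then pick whichever copy is cheapest to reach; how that choice is made is outside the scope of this sketch.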


Our implementation of the Globus Replica Catalog consists of a software API, a library that implements it, and a command-line tool that provides the same functionality.

The globus_replica_catalog library provides client functions for manipulating the data in a replica catalog. In this implementation, the library runs against a standard LDAP directory server. (Numerous widely available commercial and open source LDAP servers can currently be used with this library.)

Although the current implementation is LDAP-based, the API has been designed to be independent of LDAP, so future implementations could use other access protocols or storage mechanisms (e.g., SQL).
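
The general design principle, a client-facing interface that hides the storage mechanism, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the globus_replica_catalog API: the interface CatalogBackend, both backend classes, the helper replica_report, and the table layout are hypothetical, and SQLite stands in for "some SQL store" only because it ships with Python.

```python
import sqlite3
from abc import ABC, abstractmethod


class CatalogBackend(ABC):
    """Hypothetical storage interface; callers never see how data is stored."""

    @abstractmethod
    def add(self, logical_name, physical_url):
        """Record one physical copy of a logical file."""

    @abstractmethod
    def locations(self, logical_name):
        """Return the sorted list of physical copies of a logical file."""


class MemoryBackend(CatalogBackend):
    """Keeps the mappings in a plain dictionary."""

    def __init__(self):
        self._data = {}

    def add(self, logical_name, physical_url):
        self._data.setdefault(logical_name, set()).add(physical_url)

    def locations(self, logical_name):
        return sorted(self._data.get(logical_name, ()))


class SqliteBackend(CatalogBackend):
    """Keeps the same mappings in a SQL table (assumed schema)."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS replica ("
            "logical TEXT NOT NULL, physical TEXT NOT NULL, "
            "PRIMARY KEY (logical, physical))"
        )

    def add(self, logical_name, physical_url):
        # The primary key plus OR IGNORE makes repeated registration harmless.
        self._db.execute(
            "INSERT OR IGNORE INTO replica (logical, physical) VALUES (?, ?)",
            (logical_name, physical_url),
        )
        self._db.commit()

    def locations(self, logical_name):
        rows = self._db.execute(
            "SELECT physical FROM replica WHERE logical = ? ORDER BY physical",
            (logical_name,),
        )
        return [row[0] for row in rows]


def replica_report(backend, logical_name):
    """Client code written only against the interface, not a backend."""
    return f"{logical_name}: {len(backend.locations(logical_name))} copies"
```

Because replica_report touches only the two interface methods, it behaves the same whichever backend it is handed, which is the property an implementation-independent API is meant to give its callers.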

The Globus Replica Catalog is further described in our working documentation, Getting Started with the Globus Replica Catalog.


Our data grid software is currently available to the public as components of the Globus Toolkit 2.0 release. Prior to this release, the software was tested and evaluated for more than a year by several external project teams who are using our technologies to build data grids for their own use.