Components for Grid Data Management


  1. Basic Data Management Mechanisms
  2. Components for Moving and Transferring Data
  3. Components for Optimizing Data Access
  4. Components for Virtual Data

The deployment of sensor nets and satellite imaging systems, together with increases in the resolution of those imaging systems, has resulted in the capture of an enormous amount of raw data. This data, combined with the increasing availability of computing power, leads in turn to mountains of derived data produced by analysis. The demand for data storage and management systems has never been greater. At the same time, it isn't enough to simply store data and retrieve it: it must be made available to partners in collaborative projects, optimized for speedy access in different geographic locations, cataloged with descriptive information for easy retrieval, and made available to computation jobs running on the Grid.

Building on the availability of high-capacity storage systems and networks, the Grid community has produced a set of components for working with and managing data on the Grid.

Related solutions: The Solutions section of this website provides examples of these components being used in scientific projects. See especially the "Moving Data Fast on the TeraGrid" and "Large-scale Data Replication for LIGO" solutions.

Basic Data Management Mechanisms

Several components in the Grid space are aimed specifically at providing uniform Grid interfaces to various types of data.

  • GridFTP - A uniform, secure, high-performance interface to file-based storage systems on the Grid
  • OGSA-DAI - An OGSA interface for accessing XML and relational data stores
  • Metadata Catalog Service (MCS) - A stand-alone metadata catalog service with an OGSA service interface

Components for Moving and Transferring Data

These tools specialize in moving and transferring data between Grid systems. Each tool meets specialized application or user needs, and some also provide interfaces to specialized storage systems.

  • globus-url-copy - A command-line tool for requesting GridFTP transfers
  • Reliable File Transfer (RFT) Service - An OGSA service that allows clients to request data transfers and then "disconnect" while the transfer takes place
  • UberFTP - A text-based interactive client for GridFTP
  • GSI-SCP/SFTP - Popular OpenSSH tools that support Grid authentication
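
As a rough illustration of how a script might drive a command-line transfer tool such as globus-url-copy, the Python sketch below assembles an invocation as an argument list. The host names, paths, and the helper name build_transfer_command are hypothetical. The block only builds and prints the command rather than running it, since an actual transfer needs a reachable GridFTP server and valid Grid credentials. It assumes the installed version accepts -p (parallel data streams) and -vb (performance reporting).

```python
import shlex


def build_transfer_command(source_url, dest_url, parallel_streams=4, verbose=True):
    """Return the argument list for a globus-url-copy transfer.

    Nothing is executed here; the caller decides whether to run it.
    """
    if parallel_streams < 1:
        raise ValueError("parallel_streams must be at least 1")
    command = ["globus-url-copy"]
    if verbose:
        command.append("-vb")  # assumed flag: report bytes moved and throughput
    command += ["-p", str(parallel_streams)]  # assumed flag: parallel data streams
    command += [source_url, dest_url]
    return command


# Hypothetical endpoints, for illustration only.
cmd = build_transfer_command(
    "gsiftp://source.example.org/data/run42.dat",
    "gsiftp://dest.example.org/archive/run42.dat",
)
print(shlex.join(cmd))
```

Building the command as a list, instead of a single string, keeps URLs with unusual characters from being misread by a shell if the list is later passed to a process launcher.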

Components for Optimizing Data Access

These tools help optimize the use of storage systems for specialized user communities.

  • Replica Location Service (RLS) - A distributed mechanism for keeping track of the locations of replicated data on a Grid
  • NeST - A "storage appliance" that provides remote access to local data when computation jobs are running
  • DataCutter - A system that uses data filters and streams to segment datasets in efficient ways on a Grid
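
The idea behind replica location tracking can be sketched as a mapping from a logical file name to the physical locations that hold copies of it. The Python block below is a toy, in-memory illustration of that idea only. It is not the RLS interface, and the ReplicaCatalog class, the lfn:// naming scheme, and the site URLs are all hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy in-memory map from logical file names to physical replica URLs."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_url):
        """Record that a copy of logical_name exists at physical_url."""
        self._replicas[logical_name].add(physical_url)

    def unregister(self, logical_name, physical_url):
        """Forget one replica; drop the logical name when no copies remain."""
        self._replicas[logical_name].discard(physical_url)
        if not self._replicas[logical_name]:
            del self._replicas[logical_name]

    def lookup(self, logical_name):
        """Return the known replica URLs in sorted order (empty if unknown)."""
        return sorted(self._replicas.get(logical_name, ()))


# Hypothetical logical name and site URLs, for illustration only.
catalog = ReplicaCatalog()
catalog.register("lfn://experiment/run42.dat", "gsiftp://site-a.example.org/data/run42.dat")
catalog.register("lfn://experiment/run42.dat", "gsiftp://site-b.example.org/store/run42.dat")
print(catalog.lookup("lfn://experiment/run42.dat"))
```

A client holding only the logical name can then pick whichever returned location is closest or least loaded. A distributed service would spread this mapping across several sites rather than keep it in one process.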

Note: The Chimera virtual data catalog described in the Computation section is another component related to this area.