A Monitoring System for the Earth System Grid*

The Earth System Grid (ESG) provides climate studies scientists with access to large datasets that are important for their work. These datasets are generated by computational models of the Earth's climate, and they require massive computational power to produce. They are highly valuable community resources. Most scientists work with portions ("subsets") of this data at any given time, and their analysis work requires them to obtain local copies of subsets drawn from the massive storage systems that contain the complete datasets.

The ESG data is housed in data centers operated by national organizations. ESG users register with a Web portal (www.earthsystemgrid.org) and can then use that portal to discover, browse, and request subsets of datasets. The requested data is retrieved from appropriate storage systems and downloaded to the user's workstation.

The ESG infrastructure is a distributed system made up of physical devices and software services, including:

  • Archival storage systems and disk storage systems at several sites
  • Storage resource managers (SRMs) and GridFTP servers that provide access to the storage systems
  • Metadata catalog services that contain descriptive information about the data
  • Replica location services that keep track of copies of datasets when they're made
  • The Web portal that provides the user interface to the system

These components are integrated to allow ESG users to find and download the data they need based on queries that use terms familiar to them.

The ESG team needed a way to monitor the status of their system components in order to detect failures and notify interested parties. This need was met using Grid technology, specifically the Globus Toolkit's Monitoring and Discovery System (MDS), including its Index Service and Trigger Service.

The resulting monitoring system provides the ESG team with the following benefits.

  • The system is scalable and it adapts when services are added or removed.
  • Status and performance information is made available for use by other system components via a Web service interface.
  • Test results are returned in XML format and can be accessed and used by other software programs.
    • The current system provides a binary UP/DOWN status value and a free-form comment.
    • The free-form comment can contain, for example, service statistics noting the recent performance or usage history of the system components.
    • Complex XPath queries can be run against these results to support the triggering capability.
  • Web-based visualizers format the status information for review by human operators.
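The XPath-based querying of XML test results described above can be sketched in Python. This is only an illustration: the XML layout (the `monitor`, `service`, `status`, and `comment` element names), the `example.org` host names, and the `services_with_status` helper are all assumptions made for this sketch, not the actual MDS schema or Trigger Service API. Python's standard `xml.etree.ElementTree` module supports only a subset of XPath, so genuinely complex queries would need a fuller XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical test results: one element per monitored service, each with
# a binary UP/DOWN status and a free-form comment. The element names and
# the example.org hosts are invented for this sketch.
SAMPLE_RESULTS = """\
<monitor>
  <service name="Web Portal">
    <url>http://www.earthsystemgrid.org</url>
    <status>UP</status>
    <comment>responded normally</comment>
  </service>
  <service name="Replica Location Service A">
    <url>rls://rls-a.example.org</url>
    <status>DOWN</status>
    <comment>connection refused</comment>
  </service>
  <service name="Replica Location Service B">
    <url>rls://rls-b.example.org</url>
    <status>UP</status>
    <comment>responded normally</comment>
  </service>
</monitor>
"""


def services_with_status(xml_text, status):
    """Return (name, comment) pairs for services whose status matches."""
    root = ET.fromstring(xml_text)
    # The predicate [status='DOWN'] selects <service> elements that have a
    # <status> child whose text equals the given value.
    matches = root.findall(f".//service[status='{status}']")
    return [(s.get("name"), s.findtext("comment", default="")) for s in matches]


if __name__ == "__main__":
    # A trigger could act on any non-empty result, e.g. by sending a notice.
    for name, comment in services_with_status(SAMPLE_RESULTS, "DOWN"):
        print(f"ALERT: {name} is DOWN ({comment})")
```

In a setup like this, a trigger condition amounts to "the query returned at least one match", and the free-form comment travels with the alert so the operator sees why the test failed.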

ESG Visualizer

This is the entry page for checking the state of the monitored services. It shows each service's status (UP or DOWN) and links to more information in the Details and Archive visualizers. View the ESG Visualizer at http://dc-user.isi.edu:40080/monitor.html. A snapshot of the ESG Visualizer is provided below in Figure 1.


Figure 1. Screenshot of the ESG Visualizer

The first column of the ESG Visualizer shows the names of the services that are being monitored system-wide. "Web Portal" is the www.earthsystemgrid.org web server interface. The RLS Server entries are replica location services that catalog the files located in each of the ESG data storage systems. The OGSA-DAI entry is a Web service interface to ESG metadata and data access services. The second column is the URL of the service (network location information) with a hyperlink to status details. The third column is a simple UP/DOWN status indicator, and the fourth column is a link to historical data on the service's status.

Details Visualizer

From the ESG Visualizer, the URL link takes you to the Details Visualizer, which provides detailed results from the test script used to determine the UP/DOWN status.


Figure 2. Screenshot of the Details Visualizer
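As a rough illustration of the kind of test script whose results the Details Visualizer displays, the sketch below probes a service URL and reports a binary status plus a free-form comment. It is not the ESG team's actual test script: the `probe_service` function, its result fields, and the choice of a plain TCP connection as the health check are assumptions made for this example. A real test would more likely exercise the service's own protocol.

```python
import socket
import time
from urllib.parse import urlsplit

# Default ports for schemes this sketch knows about (an assumption; other
# schemes must carry an explicit port in the URL).
DEFAULT_PORTS = {"http": 80, "https": 443}


def probe_service(url, timeout=5.0):
    """Try a TCP connection to the service and report UP or DOWN."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if host is None or port is None:
        return {"url": url, "status": "DOWN",
                "comment": "cannot determine host and port from URL"}

    started = time.monotonic()
    try:
        # A successful connect only shows the port is accepting connections.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.monotonic() - started) * 1000
            return {"url": url, "status": "UP",
                    "comment": f"TCP connect succeeded in {elapsed_ms:.1f} ms"}
    except OSError as exc:
        # Refused connections, timeouts, and DNS failures all land here.
        return {"url": url, "status": "DOWN",
                "comment": f"TCP connect failed: {exc}"}


if __name__ == "__main__":
    result = probe_service("http://www.earthsystemgrid.org")
    print(result["status"], "-", result["comment"])
```

The free-form comment is where a script like this can carry extra detail, such as the connection time, without changing the simple UP/DOWN value that the other visualizers rely on.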

Archive Visualizer

Also from the ESG Visualizer, the History link takes you to the Archive Visualizer, which displays the historical data that ESG specified: the percentage of uptime, the timestamp of each reading, the status of the server, the reporter (test script), and the URL.


Figure 3. Screenshot of the Archive Visualizer

Each line of the table in the Archive Visualizer is an archive entry showing the status of the service at a given point in time. The first column is the time at which the archive entry was recorded. The second column is a simple UP/DOWN indicator. The third and fourth columns are the name of the test script that was run to produce the status and the UserID that ran the test script.
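One plausible way to derive an uptime percentage from archive entries like these is sketched below. The source does not say how the Archive Visualizer computes the figure, so this assumes the simplest definition (the share of recorded readings that were UP). The `ArchiveEntry` type, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArchiveEntry:
    """One row of the archive table (field names are assumptions)."""
    timestamp: str   # time at which the entry was recorded
    status: str      # "UP" or "DOWN"
    reporter: str    # name of the test script that produced the status
    user_id: str     # user that ran the test script


def uptime_percentage(entries: List[ArchiveEntry]) -> Optional[float]:
    """Share of readings that were UP, or None when there are no readings."""
    if not entries:
        return None
    up_count = sum(1 for entry in entries if entry.status == "UP")
    return 100.0 * up_count / len(entries)


if __name__ == "__main__":
    # Invented sample readings for illustration only.
    history = [
        ArchiveEntry("2004-09-01T00:00:00Z", "UP", "check_service", "monitor"),
        ArchiveEntry("2004-09-01T00:10:00Z", "UP", "check_service", "monitor"),
        ArchiveEntry("2004-09-01T00:20:00Z", "DOWN", "check_service", "monitor"),
        ArchiveEntry("2004-09-01T00:30:00Z", "UP", "check_service", "monitor"),
    ]
    print(f"Uptime: {uptime_percentage(history):.1f}%")
```

Counting readings treats every entry equally. If tests run at irregular intervals, weighting each reading by the time until the next one would give a more faithful figure.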

Detailed Information

The following links provide more detail about the ESG Monitoring System.

* This website is based on a white paper written in September 2004 by Ben Clifford and Shishir S. Bharathi.