GT 4.2.1 Glossary


Aggregator Framework

A software framework used to build services that collect and aggregate data. WS MDS Services (such as the Index and Trigger services) are built on the Aggregator Framework, and are sometimes called Aggregator Services.

aggregator services

Services that are built on the Aggregator Framework, such as the WS MDS Index Service and Trigger Service.

aggregator source

A Java class that implements an interface (defined as part of the Aggregator Framework) to collect XML-formatted data. WS MDS contains three aggregator sources: the query aggregator source, the subscription aggregator source, and the execution aggregator source.


Apache Open Source Java based build tool.

Apache Axis

The SOAP engine implementation used within the Globus Toolkit. See the Apache Axis website for details.

Apache Commons

See for more details.


GNU Open Source package used to automatically configure the source code package.


GNU Open Source used to automatically generate Makefile in files.

Axis C++

Open Source SOAP implementation used by C-Hosting. See for more details.


batch scheduler

See the definition for scheduler

Bloom filter

Compression scheme used by the Replica Location Service (RLS) that is intended to reduce the size of soft state updates between Local Replica Catalogs (LRCs) and Replica Location Index (RLI) servers. A Bloom filter is a bit map that summarizes the contents of a Local Replica Catalog (LRC). An LRC constructs the bit map by applying a series of hash functions to each logical name registered in the LRC and setting the corresponding bits.
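
The bit-map construction described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the RLS implementation: the class name, the bit-map size, the number of hash functions, and the use of salted SHA-256 are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit map summarizing a set of logical names (illustrative)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary example values, not RLS defaults.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, name):
        # Derive several bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, name):
        # Set the bit for every hash of the logical name.
        for pos in self._positions(name):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, name):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(name))


# Hypothetical logical names registered in one LRC.
lrc_summary = BloomFilter()
for logical_name in ["lfn://example/run1.dat", "lfn://example/run2.dat"]:
    lrc_summary.add(logical_name)
```

The bit map stays the same size however many names are added, which is what makes it attractive for soft state updates; the trade-off is that a lookup can report a name that was never registered (a false positive).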

Bouncy Castle Crypto APIs (what is this for?)


Certificate Authority ( CA )

An entity that issues certificates. [fixme - flesh out]

CA Certificate

The CA's certificate. This certificate is used to verify the signature on certificates issued by the CA. GSI typically stores a given CA certificate in /etc/grid-security/certificates/<hash>.0, where <hash> is the hash code of the CA identity.

CA Signing Policy

The CA signing policy is used to place constraints on the information you trust a given CA to bind to public keys. Specifically it constrains the identities a CA is trusted to assert in a certificate. In GSI the signing policy for a given CA can typically be found in /etc/grid-security/certificates/<hash>.signing_policy, where <hash> is the hash code of the CA identity.


A public key plus information about the certificate owner bound together by the digital signature of a CA. In the case of a CA certificate, the certificate is self signed, i.e. it was signed using its own private key.

Certificate Revocation List (CRL)

A list of revoked certificates generated by the CA that originally issued them. When using GSI, this list is typically found in /etc/grid-security/certificates/<hash>.r0, where <hash> is the hash code of the CA identity.

certificate subject

An identifier for the certificate owner, e.g. "/DC=org/DC=doegrids/OU=People/CN=John Doe 123456". The subject is part of the information the CA binds to a public key when creating a certificate.
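
As a small illustration, a subject written in the slash-separated form shown above can be split into its attribute/value components. The function name is hypothetical, and the sketch assumes that no value itself contains a "/" character, which is not guaranteed for every certificate.

```python
def parse_subject(subject):
    """Split a slash-separated subject string into (attribute, value) pairs.

    Assumes values contain no '/' characters (a simplification).
    """
    pairs = []
    for component in subject.strip("/").split("/"):
        attribute, _, value = component.partition("=")
        pairs.append((attribute, value))
    return pairs


subject = "/DC=org/DC=doegrids/OU=People/CN=John Doe 123456"
components = parse_subject(subject)
```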


A process that sends commands and receives responses. Note that in GridFTP, the client may or may not take part in the actual movement of data.


Axis client-side WSDD configuration file. It contains information about the type mappings, the transport and other handlers.

client/server transfer

In a client/server transfer, there are only two entities involved in the transfer, the client entity and the server entity. We use the term entity here rather than process because in the implementation provided in GT4, the server entity may actually run as two or more separate processes.

The client moves data either to or from its local host. The client decides whether it will connect to the server to establish the data channel or the server will connect to it (MODE E dictates who must connect).

If the client wishes to connect to the server, it sends the PASV (passive) command. The server starts listening on an ephemeral (random, non-privileged) port and returns the IP and port as a response to the command. The client then connects to that IP/port.

If the client wishes to have the server connect to it, the client starts listening on an ephemeral port and then sends the PORT command, which includes the IP/port as part of the command, to the server. The server then initiates the TCP connection. Note that this decision has an impact on traversing firewalls. For instance, the client's host may be behind a firewall and the server may not be able to connect.

Finally, now that the data channel is established, the client will send either the RETR “filename” command to transfer a file from the server to the client (GET), or the STOR “filename” command to transfer a file from the client to the server (PUT).
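
The address exchange in the PASV and PORT steps can be sketched as follows. The reply and argument encoding used here (four address bytes followed by two port bytes, comma separated) comes from RFC 959 rather than from this glossary, and the function names and sample address are made up for the example.

```python
import re


def parse_pasv_reply(reply):
    """Extract (ip, port) from a reply such as
    '227 Entering Passive Mode (192,0,2,10,195,80)'."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("not a PASV reply: %r" % reply)
    h1, h2, h3, h4, p1, p2 = (int(g) for g in match.groups())
    # The port is sent as two bytes: high byte first, then low byte.
    return "%d.%d.%d.%d" % (h1, h2, h3, h4), p1 * 256 + p2


def format_port_command(ip, port):
    """Build the PORT command a listening client would send to the server."""
    return "PORT %s,%d,%d" % (ip.replace(".", ","), port // 256, port % 256)


server_addr = parse_pasv_reply("227 Entering Passive Mode (192,0,2,10,195,80)")
port_command = format_port_command("192.0.2.10", 50000)
```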

command line interface (CLI)

A mechanism for interacting with a computer operating system or software by typing commands to run programs, as opposed to using a mouse pointer on a graphical user interface (GUI).


Both FTP and GridFTP are command/response protocols. This means that once a client sends a command to the server, it can only accept responses from the server until it receives a response indicating that the server is finished with that command. For most commands this is not a problem. For instance, setting the type of the file transfer to binary (called "I" for "image" in the protocol) simply consists of the client sending TYPE I and the server responding with 200 Type set to I. However, the STOR and RETR commands (which actually initiate the movement of data) can run for a long time. Once the command is sent, the client’s only options are to wait until it receives the completion reply, or kill the transfer.


When speaking of GridFTP transfers, concurrency refers to having multiple files in transit at the same time. They may all be on the same host or across multiple hosts. This is equivalent to starting up “n” different clients for “n” different files, and having them all running at the same time. This can be effective if you have many small files to move. The Reliable File Transfer (RFT) service utilizes concurrency to improve its performance.
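
A minimal sketch of the idea, assuming a placeholder transfer function: several files are kept in flight at once by a pool of workers. Nothing here is RFT or GridFTP client code; `transfer_file` is a hypothetical stand-in for whatever actually moves one file.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer_file(name):
    # Hypothetical placeholder: a real client would move the file here.
    return name


def transfer_concurrently(names, max_in_flight=4):
    """Run up to max_in_flight transfers at the same time."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # pool.map returns results in the order the names were given.
        return list(pool.map(transfer_file, names))


done = transfer_concurrently(["a.dat", "b.dat", "c.dat"])
```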


A job scheduler mechanism supported by GRAM. See for more information.


Also referred to as the "hosting environment." Provides a common runtime environment for web services. It manages the execution of services and resources, and manages their lifecycles. Provides security and data persistence infrastructure, and other functionality such as managed threading and registry.

A default "standalone" container is provided with a default GT installation.

control channel

The communication link (TCP) over which commands and responses flow.

Low bandwidth; encrypted and integrity protected by default.


The combination of a certificate and the matching private key.


Concurrent Version System (CVS)

Source code repository used by the Globus Toolkit.


data channel

Communication link(s) over which the actual data of interest flows.

High Bandwidth; authenticated by default; encryption and integrity protection optional.

dual channel protocol

[fixme - this term does not appear anywhere in the gridftp docs] GridFTP uses two channels:

  • One of the channels, called the control channel, is used for sending commands and responses. It is low bandwidth and is encrypted for security reasons.
  • The second channel is known as the data channel. Its sole purpose is to transfer the data. It is high bandwidth and uses an efficient protocol.

By default, the data channel is authenticated at connection time, but no integrity checking or encryption is performed due to performance reasons. Integrity checking and encryption are both available via the client and libraries.

Note that in GridFTP (not FTP) the data channel may actually consist of several TCP streams from multiple hosts.


End Entity Certificate (EEC)

A certificate belonging to a non-CA entity, e.g. you, me or the computer on your desk.

execution aggregator source

An Aggregator Source (included in WS MDS) that executes an administrator-supplied program to collect information and make it available to an Aggregator Service such as the Index Service.

Exolab Core

extended block mode (MODE E)

MODE E is a critical GridFTP component because it allows for out of order reception of data. This, in turn, means we can send the data down multiple paths and do not need to worry if one of the paths is slower than the others and the data arrives out of order. This enables parallelism and striping within GridFTP. In MODE E, a series of “blocks” are sent over the data channel. Each block consists of:

  • an 8 bit flag field,
  • a 64 bit field indicating the offset in the transfer,
  • and a 64 bit field indicating the length of the payload,
  • followed by length bytes of payload.

Note that since the offset and length are included in the block, out of order reception is possible, as long as the receiving side can handle it, either via something like a seek on a file, or via some application level buffering and ordering logic that will wait for the out of order blocks.
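
The following sketch shows why carrying the offset and length in every block makes out-of-order reception workable. It packs the header fields in the order listed above and assumes big-endian byte order; both are assumptions for illustration, and GFD.020 remains the authority on the actual wire layout. The function names are hypothetical.

```python
import struct

# Assumed layout: 1-byte flags, 8-byte offset, 8-byte length, big-endian.
HEADER = struct.Struct(">BQQ")


def pack_block(flags, offset, payload):
    """Prefix a payload with its flags, transfer offset, and length."""
    return HEADER.pack(flags, offset, len(payload)) + payload


def reassemble(blocks, total_size):
    """Place each payload at its offset, whatever order the blocks arrive in."""
    buffer = bytearray(total_size)
    for block in blocks:
        flags, offset, length = HEADER.unpack_from(block)
        payload = block[HEADER.size:HEADER.size + length]
        buffer[offset:offset + length] = payload
    return bytes(buffer)


data = b"hello, gridftp"
# Deliberately deliver the second half of the data first.
blocks = [pack_block(0, 7, data[7:]), pack_block(0, 0, data[:7])]
result = reassemble(blocks, len(data))
```

Writing into a pre-sized buffer plays the role of the "seek on a file" mentioned above; a receiver that cannot seek would instead have to hold early blocks until the missing ones arrive.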



Pre-OGSI Globus description term that uniquely encompasses Machine Architecture, OS, Compiler and other attributes into a single term, for example: gcc32dbgpthr for a threaded Linux debug distribution.


Terms used to refer to a Unix fork, supported MJS scheduler mechanism.


GAA configuration file

A file that configures the Generic Authorization and Access control GAA libraries. When using GSI, this file is typically found in /etc/grid-security/gsi-gaa.conf.


A cluster monitoring tool (re: WS MDS). See


The GAR (Grid ARchive) file is a single file which contains all the files and information that the container needs to deploy a service. See the Java WS Core Developer's Guide for details.


A command line program used to submit jobs to a GRAM4 service. See the GRAM4 Commandline page.

Grid Resource Allocation and Management (GRAM)

Comprises a set of WSRF-compliant Web services to locate, submit, monitor, and cancel jobs on Grid computing resources.

grid map file

A file containing entries mapping certificate subjects to local user names. This file can also serve as an access control list for GSI enabled services and is typically found in /etc/grid-security/grid-mapfile. For more information see the Gridmap section here.
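
A rough sketch of reading one such entry, assuming the common layout of a double-quoted certificate subject followed by one or more comma-separated local user names. That layout, the comment convention, and the function name are assumptions for the example; check the Gridmap documentation for the exact syntax.

```python
import shlex


def parse_gridmap_line(line):
    """Return (subject, [local_user, ...]), or None for blanks and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # shlex honours the double quotes around the subject.
    parts = shlex.split(line)
    subject, users = parts[0], parts[1]
    return subject, users.split(",")


entry = parse_gridmap_line(
    '"/DC=org/DC=doegrids/OU=People/CN=John Doe 123456" jdoe'
)
```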

grid security directory

The directory containing GSI configuration files such as the GSI authorization callout configuration and GAA configuration files. Typically this directory is /etc/grid-security. For more information see this.

Grid Security Infrastructure (GSI)

GSI is the original infrastructure of GT security, comprising SSL, PKI, and proxy certificates.

GSI authorization callout configuration file

A file that configures authorization callouts to be used for mapping and authorization in GSI enabled services. When using GSI this file is typically found in /etc/grid-security/gsi-authz.conf.



A monitoring service for Condor Pools. See

host certificate

An EEC belonging to a host. When using GSI this certificate is typically stored in /etc/grid-security/hostcert.pem. For more information on possible host certificate locations see the GSI C Developer's Guide.

host credentials

The combination of a host certificate and its corresponding private key.

hosting environment

See container.


Index Service

An aggregator service in WS MDS that serves as a registry similar to UDDI, but much more flexible. Indexes collect information and publish that information as WSRF resource properties.

information provider

A "helper" software component that collects or formats resource information, for use in WS MDS by an aggregator source or by a WSRF service when creating resource properties.



Java-generated documentation files. The Globus Toolkit distribution uses this tool to automatically generate API documentation from the code itself.

JAX-RPC (re: what in GT?)


The Java Naming and Directory Interface (JNDI) API is used to access a central transient container registry. The registry is mainly used for discovery of the ResourceHome implementations. However, the registry can also be used to store and retrieve arbitrary information. The jndi-config.xml files are used to populate the registry. See the JNDI Tutorial for details.


It is an XML-based configuration file used to populate the container registry accessible via the JNDI API. See the Java WS Core Developer's Guide for details.

job description

Term used to describe a GRAM4 job for GT4.

job scheduler

See the term scheduler.


Java testing framework.

Java Virtual Machine (JVM)

The Java Runtime under which OGSI modules run.



Libtool is a GNU library support script that abstracts the shared library interface. Used by GSI/Sysconfig. Libtool hides the complexity of using shared libraries behind a portable interface.

For more information, see

See also GSI.

Local Replica Catalog (LRC)

Stores mappings between logical names for data items and the target names (often the physical locations) of replicas of those items. Clients query the LRC to discover replicas associated with a logical name. Also may associate attributes with logical or target names. Each LRC periodically sends information about its logical name mappings to one or more RLIs.

See also RLI.

Log4J (re: what in gt4?)

logical file name

A unique identifier for the contents of a file.

logical name

A unique identifier for the contents of a data item.


A job scheduler mechanism supported by GRAM.

For more information, see



A program that typically operates at a higher level than a job scheduler (typically, above the GRAM level). It schedules and submits jobs to GRAM services.

Managed Executable Job Service (MEJS)


Managed Job Factory Service (MJFS)


Managed Multi Job Service (MMJS)


MMJS subjob

One of the executable jobs in a multijob rendezvous.

Message Passing Interface (MPI)

The Message Passing Interface (MPI) is a library specification for message-passing, proposed as a standard by a broadly based committee of vendors, implementors, and users.

For more information, see

MODE command

In reality, GridFTP is not one protocol, but a collection of several protocols. There is a protocol used on the control channel, but there is a range of protocols available for use on the data channel. Which protocol is used is selected by the MODE command. Four modes are defined: STREAM (S), BLOCK (B), COMPRESSED (C) in RFC 959 for FTP, and EXTENDED BLOCK (E) in GFD.020 for GridFTP. There is also a new data channel protocol, or mode, being defined in the GGF GridFTP Working group which, for lack of a better name at this point, is called MODE X.

See also extended block mode (MODE E).

See also stream mode (MODE S).


A job that is itself composed of several executable jobs; these are processed by the MMJS subjob.

See also MMJS subjob.

multijob rendezvous

A mechanism used by GRAM to synchronize between job processes in a multiprocess job and between.


network end points

A network endpoint is generally something that has an IP address (a network interface card). It is a point of access to the network for transmission or reception of data. Note that a single host could have multiple network end points if it has multiple NICs installed (multi-homed). This definition is necessary to differentiate between parallelism and striping.

Network File System (NFS)

The Network File System (NFS) provides remote access to shared file systems across networks.

For more information, see


OpenJMS (re: what in gt4?)

OpenLDAP Open Source Lightweight Directory Access Protocol in the C Language.


SSL implementation used by GSI. Stands for Open Source Secure Sockets Layer. Distribution in the C Language. For more information, see

operation provider

A reusable Java component that implements one or more web service functions. A web service can be composed of one or more operation providers. See the Java WS Core Developer's Guide for details.



When speaking about GridFTP transfers, parallelism refers to having multiple TCP connections between a single pair of network endpoints. This is used to improve performance of transfers on connections with light to moderate packet loss.

Portable Batch System (PBS)

A job scheduler mechanism supported by GRAM. For more information, see

physical file name

The address or the location of a copy of a file on a storage system.

private key

The private part of a key pair. Depending on the type of certificate the key corresponds to it may typically be found in $HOME/.globus/userkey.pem (for user certificates), /etc/grid-security/hostkey.pem (for host certificates) or /etc/grid-security/<service>/<service>key.pem (for service certificates).

For more information on possible private key locations see this.

proxy certificate

A short lived certificate issued using an EEC. A proxy certificate typically has the same effective subject as the EEC that issued it and can thus be used in its place. GSI uses proxy certificates for single sign on and delegation of rights to other entities.

For more information about types of proxy certificates and their compatibility in different versions of GT, see

proxy credentials

The combination of a proxy certificate and its corresponding private key. GSI typically stores proxy credentials in /tmp/x509up_u<uid> , where <uid> is the user id of the proxy owner.
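
For illustration, the typical location can be computed from a numeric user id. The helper name is hypothetical, and the sketch covers only the typical path stated above, not any overrides a given installation may use.

```python
def default_proxy_path(uid):
    """Return the typical proxy credential path for a numeric user id."""
    return "/tmp/x509up_u%d" % uid


path = default_proxy_path(1000)
```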

public key

The public part of a key pair used for cryptographic operations (e.g. signing, encrypting).



query aggregator source

An aggregator source (included in WS MDS) that polls a WSRF service for resource property information.


Replica Location Index (RLI)

Collects information about the logical name mappings stored in one or more Local Replica Catalogs (LRCs) and answers queries about those mappings. Each RLI periodically receives updates from one or more LRCs that summarize their contents.

Replica Location Service (RLS)

A distributed registry that keeps track of where replicas exist on physical storage systems. The job of the RLS is to maintain associations, or mappings, between logical names for data objects and one or more target or physical names for replicas. Users or services register data items in the RLS and query RLS servers to find replicas.
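
The relationship between LRCs and an RLI can be sketched with two small in-memory classes. This is a toy model of the mapping and update flow described in this glossary, not the RLS API; all class, method, and data names are invented for the example, and the update sends a plain set of logical names rather than a compressed summary.

```python
from collections import defaultdict


class LocalReplicaCatalog:
    """Maps logical names to the target names of their replicas."""

    def __init__(self, name):
        self.name = name
        self.mappings = defaultdict(set)

    def register(self, logical_name, target_name):
        self.mappings[logical_name].add(target_name)

    def lookup(self, logical_name):
        return sorted(self.mappings.get(logical_name, ()))

    def summary(self):
        # What a periodic update to an RLI would carry: logical names only.
        return set(self.mappings)


class ReplicaLocationIndex:
    """Records which LRCs know about each logical name."""

    def __init__(self):
        self.index = defaultdict(set)

    def update(self, lrc):
        for logical_name in lrc.summary():
            self.index[logical_name].add(lrc.name)

    def which_lrcs(self, logical_name):
        return sorted(self.index.get(logical_name, ()))


lrc = LocalReplicaCatalog("lrc-a")
lrc.register("lfn://example/run1.dat", "gsiftp://host1.example.org/data/run1.dat")
rli = ReplicaLocationIndex()
rli.update(lrc)
```

A client would first ask the RLI which LRCs hold a logical name, then query those LRCs for the target names.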


As part of the WSRF strategy of keeping the web service and the state information separate from each other, the state information is kept in a separate entity called a resource.

resource properties

A resource is composed of zero or more resource properties which describe the resource. For example, a resource can have the following three resource properties: Filename, Size, and Descriptors. The resource properties are defined in the web service's WSDL interface description.

Resource Specification Language (RSL)

Term used to describe a GRAM job for GT2 and GT3. (Note: This is not the same as RLS - the Replica Location Service)


In Java WS Core, resources are managed and discovered via ResourceHome implementations. The ResourceHome implementations can also be responsible for creating new resources, performing operations on a set of resources at a time, etc. ResourceHomes are configured in JNDI and are associated with a particular web service.

RLS attribute

Descriptive information that may be associated with a logical or target name mapping registered in a Local Replica Catalog (LRC). Clients can query the LRC to discover logical names or target names that have specified RLS attributes.



For more information, see


Utilized by GSI. Open Source Simple Authentication and Security Layer in the C Language. For more information, see


Term used to describe a job scheduler mechanism to which GRAM interfaces. It is a networked system for submitting, controlling, and monitoring the workload of batch jobs in one or more computers. The jobs or tasks are scheduled for execution at a time chosen by the subsystem according to an available policy and availability of resources. Popular job schedulers include Portable Batch System (PBS), Platform LSF, and IBM LoadLeveler.

scheduler adapter

The interface used by GRAM to communicate/interact with a job scheduler mechanism. In GT 4.x, this is both the perl submission scripts and the SEG program.

Scheduler Event Generator (SEG)

The Scheduler Event Generator (SEG) is a program which uses scheduler-specific monitoring modules to generate job state change events. Depending on scheduler-specific requirements, the SEG may need to run with privileges to enable it to obtain scheduler event notifications. As such, one SEG runs per scheduler resource. For example, on a host which provides access to both PBS and fork jobs, two SEGs, running at (potentially) different privilege levels will be running. One SEG instance exists for any particular scheduled resource instance (one for all homogeneous PBS queues, one for all fork jobs, etc). The SEG is implemented in an executable called the globus-scheduler-event-generator, located in the Globus Toolkit's libexec directory.


A process that receives commands and sends responses to those commands. Since it is a server or service, and it receives commands, it must be listening on a port somewhere to receive the commands. Both FTP and GridFTP have IANA registered ports. For FTP it is port 21, for GridFTP it is port 2811. This is normally handled via inetd or xinetd on Unix variants. However, it is also possible to implement a daemon that listens on the specified port. This is described more fully in the Architecture section of the GridFTP Developer's Guide.


Axis server-side WSDD configuration file. It contains information about the services, the type mappings and various handlers.


See web service.

service certificate

An EEC for a specific service (e.g. FTP or LDAP). When using GSI this certificate is typically stored in /etc/grid-security/<service>/<service>cert.pem. For more information on possible service certificate locations, see this.

service credentials

The combination of a service certificate and its corresponding private key.


For more information, see


SOAP provides a standard, extensible, composable framework for packaging and exchanging XML messages between a service provider and a service requester. SOAP is independent of the underlying transport protocol, but is most commonly carried on HTTP. See the SOAP specifications for details.

standalone container

A simple HTTP server that passes requests to the SOAP engine; apart from that, it can serve only .wsdl and .xsd files. Included with a standard installation of the Globus Toolkit for use as the build hosting environment for testing or extremely basic deployments.

stream mode (MODE S)

The only mode normally implemented for FTP is MODE S. This is simply sending each byte, one after another over the socket in order, with no application level framing of any kind. This is the default and is what a standard FTP server will use. This is also the default for GridFTP.


When speaking about GridFTP transfers, striping refers to having multiple network endpoints at the source, destination, or both participating in the transfer of the same file. This is normally accomplished by having a cluster with a parallel shared file system. Each node in the cluster reads a section of the file and sends it over the network. This mode of transfer is necessary if you wish to transfer a single file faster than a single host is capable of. This also tends to only be effective for large files, though how large depends on how many hosts and how fast the end-to-end transfer is. Note that while it is theoretically possible to use NFS for the shared file system, your performance will be poor, and would make using striping pointless.
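
As a simplified illustration of each node handling "a section of the file", the sketch below divides a file into one contiguous byte range per node. Real striped transfers may interleave smaller blocks across nodes instead; the contiguous split and the function name are assumptions made to keep the example short.

```python
def partition(file_size, num_nodes):
    """Return one (offset, length) byte range per node covering the file."""
    base, extra = divmod(file_size, num_nodes)
    ranges = []
    offset = 0
    for node in range(num_nodes):
        # Spread any remainder over the first few nodes.
        length = base + (1 if node < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


sections = partition(10, 3)
```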

subscription aggregator source

An aggregator source (included in WS MDS) that collects data from a WSRF service via WSRF subscription/notification.

superuser do (sudo)

Allows a system administrator to give certain users (or groups of users) the ability to run some (or all) commands as root or another user while logging the commands and arguments. See for more information.


target name

The address or location of a copy of a data item on a storage system.

third party transfers

In the simplest terms, a third party transfer moves a file between two GridFTP servers.

The following is a more detailed, programmatic description.

In a third party transfer, there are three entities involved: the client, which only orchestrates the transfer and does not take part in the data movement, and two servers, one of which sends data to the other. This scenario is common in Grid applications where you may wish to stage data from a data store somewhere to a supercomputer you have reserved. The commands are quite similar to the client/server transfer. However, now the client must establish two control channels, one to each server. It then chooses one server to listen, and sends it the PASV command. When that server responds with the IP/port it is listening on, the client sends that IP/port as part of the PORT command to the other server. This causes the second server to connect to the first server, rather than to the client. To initiate the actual movement of the data, the client then sends the RETR “filename” command to the server that will read from disk and write to the network (the “sending” server) and sends the STOR “filename” command to the other server, which will read from the network and write to the disk (the “receiving” server).
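
The command sequence described above can be written out as an ordered plan. The sketch assumes the receiving server is the one chosen to listen (the description leaves that choice to the client), uses the RFC 959 comma-separated encoding for the PORT argument, and lists RETR and STOR in the order given above; the function name, file names, and address are hypothetical.

```python
def third_party_plan(listen_addr, src_file, dst_file):
    """Return (server, command) pairs the client sends, in order.

    listen_addr is the (ip, port) the receiving server reported after PASV.
    """
    ip, port = listen_addr
    port_arg = "%s,%d,%d" % (ip.replace(".", ","), port // 256, port % 256)
    return [
        ("receiver", "PASV"),               # receiver starts listening
        ("sender", "PORT " + port_arg),     # sender told where to connect
        ("sender", "RETR " + src_file),     # sender reads disk, writes network
        ("receiver", "STOR " + dst_file),   # receiver reads network, writes disk
    ]


plan = third_party_plan(("192.0.2.20", 50000), "input.dat", "staged.dat")
```

Each command travels over the control channel to the named server; the data itself flows directly between the two servers.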

See Also client/server transfer.

transport-level security

Uses transport-level security (TLS) mechanisms.

Trigger Service

An aggregator service (in WS MDS) that collects information and compares that data against a set of conditions defined in a configuration file. When a condition is met, or triggered, the specified action takes place (for example, an email is sent to a system administrator when the disk space on a server reaches a threshold).

trusted CAs directory

The directory containing the CA certificates and signing policy files of the CAs trusted by GSI. Typically this directory is /etc/grid-security/certificates. For more information see this.


Universally Unique Identifier (UUID)

Identifier that is immutable and unique across time and space.
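
For example, Python's standard library can generate such identifiers. This sketch uses the random (version 4) variant; nothing in this glossary says which UUID version the toolkit itself uses.

```python
import uuid

# Generate a random (version 4) UUID and render it in canonical text form.
identifier = uuid.uuid4()
text_form = str(identifier)
```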

user certificate

An EEC belonging to a user. When using GSI, this certificate is typically stored in $HOME/.globus/usercert.pem. For more information on possible user certificate locations, see this.

user credentials

The combination of a user certificate and its corresponding private key.


web service



In WS MDS, WebMDS is a web-based interface to WSRF resource property information that can be used as a user-friendly front-end to the Index Service or other WSRF services.

Web Services Addressing (WSA)

The WS-Addressing specification defines transport-neutral mechanisms to address web services and messages. Specifically, it defines XML elements to identify web service endpoints and to secure end-to-end endpoint identification in messages. See the W3C WS Addressing Working Group for details.

Web Services Deployment Descriptor (WSDD)

An Axis XML-based configuration file.

Web Services Description Language (WSDL)

WSDL is an XML document for describing Web services. Standardized binding conventions define how to use WSDL in conjunction with SOAP and other messaging substrates. WSDL interfaces can be compiled to generate proxy code that constructs messages and manages communications on behalf of the client application. The proxy automatically maps the XML message structures into native language objects that can be directly manipulated by the application. The proxy frees the developer from having to understand and manipulate XML. See the WSDL 1.1 specification for details.

Web Services Description Language for Java Toolkit (WSDL4j)

WSDL4J allows the creation, representation, and manipulation of WSDL documents. For more information, see

Web Services Interoperability Basic Profile (WS-I Basic Profile)

The WS-I Basic Profile specification is a set of recommendations on how to use the different web services specifications such as SOAP, WSDL, etc. to maximize interoperability.

Web Services Invocation Framework (WSIF)

For more information, see

Web Services Notification (WSN)

The WS-Notification family of specifications define a pattern-based approach to allowing Web services to disseminate information to one another. This framework comprises mechanisms for basic notification (WS-Notification), topic-based notification (WS-Topics), and brokered notification (WS-BrokeredNotification). See the OASIS Web Services Notification (WSN) TC for details.

Web Services Resource Framework (WSRF)

Web Services Resource Framework (WSRF) is a specification that extends web services for grid applications by giving them the ability to retain state information while at the same time retaining statelessness (using resources). The combination of a web service and a resource is referred to as a WS-Resource. WSRF is a collection of different specifications that manage WS-Resources.

This framework comprises mechanisms to describe views on the state (WS-ResourceProperties), to support management of the state through properties associated with the Web service (WS-ResourceLifetime), to describe how these mechanisms are extensible to groups of Web services (WS-ServiceGroup), and to deal with faults (WS-BaseFaults).

For more information, go to: and OASIS Web Services Resource Framework (WSRF) TC.



XSLT stylesheet processor. For more information, see


Advanced parsing. For more information, see


Native XML database. For more information, see


Extensible Markup Language (XML) is a standard, flexible, and extensible data format used for web services. See the W3C XML site for details.


Provides an XML persistence layer using the PostgreSQL database. For more information, see


For more information, see


For more information, see


For more information, see

XML Path Language (XPath)

XPath is a language for finding information in an XML document. XPath is used to navigate through elements and attributes in an XML document. See the XPath specification for details.
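
A brief illustration using Python's standard library, which supports a limited subset of XPath. The XML document and its element names are invented for the example (loosely echoing the Filename/Size resource property example elsewhere in this glossary) and do not reflect any actual WS MDS schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are made up.
doc = ET.fromstring(
    "<resources>"
    "<resource name='a'><Size>10</Size></resource>"
    "<resource name='b'><Size>20</Size></resource>"
    "</resources>"
)

# Navigate through elements: every Size under every resource.
sizes = [element.text for element in doc.findall("./resource/Size")]

# Select by attribute: the Size of the resource named 'b'.
size_of_b = doc.find("./resource[@name='b']/Size").text
```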