Chapter 11. WS GRAM Admin Guide

Table of Contents

1. Building and Installing
1.1. Installation Requirements
1.1.1. Transport Level Security (TLS)
1.1.2. Functioning sudo
1.1.3. Local Scheduler
1.1.4. Scheduler Adapter
1.1.5. GridFTP
1.1.6. Reliable File Transfer Service (RFT)
2. Configuring
2.1. Typical Configuration
2.1.1. Configuring sudo
2.1.2. Configuring Scheduler Adapters
2.2. Non-default Configuration
2.2.1. Non-default Credentials
2.2.2. Non-default GridFTP server
2.2.3. Non-default container port
2.2.4. Non-default gridmap
2.2.5. Non-default RFT deployment
2.3. Locating configuration files
2.4. Web service deployment configuration
2.5. JNDI application configuration
2.5.1. Common job factory configuration
2.5.2. Local resource manager configuration
2.6. Security descriptor
2.7. GRAM and GridFTP file system mapping
2.8. Scheduler-Specific Configuration Files
2.9. Disabling an already installed scheduler adapter
2.10. WS GRAM auto-registration with default WS MDS Index Service
2.10.1. Configuring resource properties
2.11. Registering WS GRAM manually with default WS MDS Index Service via Third Party
2.12. Job Description Document Substitution Variables (updates for 4.0.5+)
2.12.1. Changes in WS GRAM beginning with GT version 4.0.5
2.13. Configuring and submitting jobs to WS-GRAM using Condor-G
3. Configuring New Features for 4.0.5+
3.1. Audit Logging (4.0.5+ only)
3.2. SoftEnv Support (4.0.5+ only)
3.3. Job Description Extensions Support (4.0.5+, update pkg available)
3.4. Local RFT Invocations (4.0.5+ only)
4. Deploying
4.1. Deploying in Tomcat
5. Testing
6. Security Considerations
7. Troubleshooting
8. Usage statistics collection by the Globus Alliance

1. Building and Installing

WS GRAM is built and installed as part of a default GT 4.0 installation. For basic installation instructions, see the GT 4.0 System Administrator's Guide.

As part of the WS GRAM service setup, please review our guide for optimal scalability and performance recommendations.

1.1. Installation Requirements

1.1.1. Transport Level Security (TLS)

In order to use WS GRAM, the container must be started with Transport Level Security (the default); the -nosec option should not be used with globus-start-container.

1.1.2. Functioning sudo

WS GRAM requires that the sudo command is installed and functioning on the service host where WS GRAM software will execute.

Authorization rules will need to be added to the sudoers file to allow the WS GRAM service account to execute (without a password) the scheduler adapter in the accounts of authorized GRAM users. For configuration details, see the Configuring sudo section.

Platform Note: On AIX, sudo is not installed by default, but it is available as source and rpm here: AIX 5L Toolbox for Linux Applications.

1.1.3. Local Scheduler

WS GRAM depends on a local mechanism for starting and controlling jobs. Included in the WS GRAM software is a Fork scheduler, which requires no additional software to execute jobs on the local host. However, to enable WS GRAM to submit and manage jobs on a batch scheduler, the scheduler software must be installed and configured before WS GRAM is configured.

1.1.4. Scheduler Adapter

WS GRAM depends on scheduler adapters to translate the WS GRAM job description document into commands understood by the local scheduler, as well as to monitor the jobs.

Scheduler adapters included in the GT 4.0 release are PBS, Condor, and LSF.

Additional third-party scheduler adapters are available separately for GT 4.0.x releases.

For configuration details, see the Configuring scheduler adapters section.

1.1.5. GridFTP

Though staging directives are processed by RFT (see the next section), RFT uses GridFTP servers underneath to do the actual data movement. As a result, there must be at least one GridFTP server that shares a file system with the execution nodes; there is no separate mechanism for getting staged files onto the execution nodes before the job executable runs. See the Non-default GridFTP server section of this admin guide for details on how to configure WS GRAM for the GridFTP servers used in your execution environment.

1.1.6. Reliable File Transfer Service (RFT)

WS GRAM depends on RFT to perform file staging and cleanup directives in a job description. For configuration details, see the RFT admin guide.

[Important]Important

Jobs requesting these functions will fail if RFT is not properly set up.

2. Configuring

2.1. Typical Configuration

2.1.1. Configuring sudo

When the credentials of the service account and the job submitter differ (multi-user mode), GRAM prepends a call to sudo to the local adapter callout command.

[Important]Important

If sudo is not configured properly, the command, and thus job, will fail.

As root, add the following two entries to the /etc/sudoers file for each GLOBUS_LOCATION installation, where /opt/globus/GT4.0.5 should be replaced with the GLOBUS_LOCATION for your installation:

# Globus GRAM entries
globus  ALL=(username1,username2) \
   NOPASSWD: /opt/globus/GT4.0.5/libexec/globus-gridmap-and-execute \
   -g /etc/grid-security/grid-mapfile /opt/globus/GT4.0.5/libexec/globus-job-manager-script.pl *
globus  ALL=(username1,username2) \
   NOPASSWD: /opt/globus/GT4.0.5/libexec/globus-gridmap-and-execute \
   -g /etc/grid-security/grid-mapfile /opt/globus/GT4.0.5/libexec/globus-gram-local-proxy-tool *

The globus-gridmap-and-execute program is used to ensure that GRAM only runs programs under accounts that are in the grid-mapfile. In the sudo configuration, it is the first program called. It looks up the account in the grid-mapfile and then runs the requested command. It is redundant if sudo is properly locked down. This tool could be replaced with your own authorization program.
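If you do replace it, the replacement must accept the same arguments that the sudoers entries above pass to globus-gridmap-and-execute and must perform an equivalent check before running the requested command. Below is a minimal sketch of such a wrapper; the script name, the grid-mapfile parsing, and the exact checks are illustrative assumptions, not part of the toolkit:

    #!/bin/sh
    # Hypothetical stand-in for globus-gridmap-and-execute (illustration only).
    # Invoked via sudo as the target user:
    #   my-authz-and-execute -g <grid-mapfile> <command> [args...]
    if [ "$1" != "-g" ] || [ $# -lt 3 ]; then
        echo "usage: $0 -g <grid-mapfile> <command> [args...]" >&2
        exit 1
    fi
    gridmap="$2"; shift 2
    account=$(id -un)   # the account sudo switched to
    # Authorize only if this account appears as a mapped local user in the
    # grid-mapfile (each line: "<subject DN>" user1,user2,...).
    if ! awk -v u="$account" '{ n = split($NF, a, ","); for (i = 1; i <= n; i++) if (a[i] == u) found = 1 } END { exit !found }' "$gridmap"; then
        echo "account $account is not authorized by $gridmap" >&2
        exit 1
    fi
    exec "$@"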

2.1.2. Configuring Scheduler Adapters

The WS GRAM scheduler adapters included in the release tarball are: PBS, Condor and LSF. To install, follow these steps (using PBS as our example):

    
    % cd $GLOBUS_LOCATION/gt4.0.0-all-source-installer
    
    % make gt4-gram-pbs
    
    % make install
            

Make sure the scheduler commands are in your path (qsub, qstat, pbsnodes).
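For example, the following should print the full path of each command for the account that runs the container; if any path is missing, adjust PATH before proceeding:

    % which qsub qstat pbsnodes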

For PBS, another setup step is required to configure the remote shell for rsh access:

    % cd $GLOBUS_LOCATION/setup/globus
    
    % ./setup-globus-job-manager-pbs --remote-shell=rsh
            

The last step is to define the GRAM and GridFTP file system mapping for PBS. A default mapping in this file is created to allow simple jobs to run. However, the actual file system mappings for your compute resource should be entered to ensure that:

  • file staging is performed correctly

  • jobs with erroneous file path directives are rejected

Done! You have added the PBS scheduler adapters to your GT installation.

[Note]Note

For future GT builds, scheduler adapters can be enabled by adding --enable-wsgram-pbs to the configure line when building the entire toolkit. For example:

    % configure --prefix=$GLOBUS_LOCATION --enable-wsgram-pbs ...
    % make
    % make install
            

2.2. Non-default Configuration

2.2.1. Non-default Credentials

To run the container using a user proxy instead of host credentials, edit the $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml file and either comment out the credentials section...

    <?xml version="1.0" encoding="UTF-8"?>
    <securityConfig xmlns="http://www.globus.org">
    <!--
    <credential>
    <key-file value="/etc/grid-security/containerkey.pem"/>
    <cert-file value="/etc/grid-security/containercert.pem"/>
    </credential>
    -->
    <gridmap value="/etc/grid-security/grid-mapfile"/>
    </securityConfig>
            

or replace the credentials section with a proxy file location...

    <?xml version="1.0" encoding="UTF-8"?>
    <securityConfig xmlns="http://www.globus.org">
    <proxy-file value="<PATH TO PROXY FILE>"/>
    <gridmap value="/etc/grid-security/grid-mapfile"/>
    </securityConfig>
            

When running in personal mode (user proxy), one additional GRAM configuration step is required: for GRAM to authorize the RFT service when performing staging functions, it needs to know the subject DN to verify against. Here are the steps:

    % cd $GLOBUS_LOCATION/setup/globus
    % ./setup-gram-service-common --staging-subject=
    "/DC=org/DC=doegrids/OU=People/CN=Stuart Martin 564720"
            

You can get your subject DN by running this command:

    % grid-cert-info -subject
            

2.2.2. Non-default GridFTP server

By default, the GridFTP server is assumed to run as root on localhost:2811. If this is not true for your site, then change it by editing the GridFTP host and/or port in the GRAM and GridFTP file system mapping config file:

$GLOBUS_LOCATION/etc/gram-service/globus_gram_fs_map_config.xml
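The element to edit is the <ftpServer> block inside the relevant <map> entry (see the GRAM and GridFTP file system mapping section below). For example, to point a mapping at a GridFTP server on gridftp.example.org port 2812 (hostname and port here are placeholders):

    <ftpServer>
       <protocol>gsiftp</protocol>
       <host>gridftp.example.org</host>
       <port>2812</port>
    </ftpServer>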

2.2.3. Non-default container port

By default, the Globus services assume that the container is using port 8443. However, the container can be run on a non-standard port, for example:

    % globus-start-container -p 4321
            
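If RFT is hosted in that same container, the staging endpoints that GRAM constructs presumably need to use the non-standard port as well; this can be adjusted with the setup-gram-service-common options described in the Non-default RFT deployment section below, for example:

    % cd $GLOBUS_LOCATION/setup/globus
    % ./setup-gram-service-common --staging-port=4321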

2.2.4. Non-default gridmap

If you wish to specify a non-standard gridmap file in a multi-user installation, two basic configurations need to be changed:

  • $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml

    • As specified in the gridmap config instructions, add a <gridmap value="..."/> element to the file as appropriate.

  • /etc/sudoers

    • Change the file path after all -g options:

      -g /path/to/grid-mapfile

Example config for global_security_descriptor.xml:

    ...
    
    <gridmap value="/opt/grid-mapfile"/>
    
    ...
            

Example config for sudoers:

    ...
    
    # Globus GRAM entries
    globus  ALL=(username1,username2) \
       NOPASSWD: /opt/globus/GT4.0.0/libexec/globus-gridmap-and-execute \
       -g /opt/grid-mapfile \
       /opt/globus/GT4.0.0/libexec/globus-job-manager-script.pl *
    globus  ALL=(username1,username2) \
       NOPASSWD: /opt/globus/GT4.0.0/libexec/globus-gridmap-and-execute \
       -g /opt/grid-mapfile \
       /opt/globus/GT4.0.0/libexec/globus-gram-local-proxy-tool *
    
    ...
            

2.2.5. Non-default RFT deployment

RFT is used by GRAM to stage files in and out of the job execution environment. In the default configuration, RFT is hosted in the same container as GRAM and is assumed to have the same service path and standard service names. This need not be the case. For example, the most likely alternative scenario is that RFT would be hosted separately in a container on a different machine. In any case, both the RFT and the Delegation Service endpoints need to be adjustable to allow this flexibility. The following options can be passed to the setup-gram-service-common script to affect these settings:

    --staging-protocol=<protocol>
    --staging-host=<host>
    --staging-port=<port>
    --staging-service-path=<RFT and Delegation factory service path>
    --staging-factory-name=<RFT factory service name>
    --staging-delegation-factory-name=<name of Delegation factory service used by RFT>
        

For example:

    % setup-gram-service-common \
    --staging-protocol=https \
    --staging-host=machine.domain.net \
    --staging-delegation-factory-name=machine.domain.net
        

will internally cause the GRAM service code to construct the following EPR addresses.

    https://machine.domain.net:8444/wsrf/services/ReliableFileTransferFactoryService
    https://machine.domain.net:8444/wsrf/services/DelegationFactoryService
        

[Important]Important

Jobs submitted by Condor-G will fail if RFT is not located in the same container as GRAM4, because Condor-G assumes that RFT and GRAM4 are co-located.

2.3. Locating configuration files

All the GRAM service configuration files are located in subdirectories of the $GLOBUS_LOCATION/etc directory. The names of the GRAM configuration directories all start with gram-service. For example, with a default GRAM installation, the command line:

    % ls etc | grep gram-service

gives the following output:

    gram-service
    gram-service-Fork
    gram-service-Multi

2.4. Web service deployment configuration

The file $GLOBUS_LOCATION/etc/gram-service/server-config.wsdd contains information necessary to deploy and instantiate the GRAM services in the Globus container.

Three GRAM services are deployed:

  • ManagedExecutableJobService: service invoked when querying or managing an executable job

  • ManagedMultiJobService: service invoked when querying or managing a multijob

  • ManagedJobFactoryService: service invoked when submitting a job

The deployment information for each service includes:

  • name of the Java service implementation class

  • path to the WSDL service file

  • name of the operation providers that the service reuses for its implementation of WSDL-defined operations

  • etc...

More information about the service deployment configuration can be found here.

2.5. JNDI application configuration

The configuration of WSRF resources and application-level services not related to service deployment is contained in JNDI files. The JNDI-based GRAM configuration is of two kinds: common job factory and local resource manager.

2.5.1. Common job factory configuration

The file $GLOBUS_LOCATION/etc/gram-service/jndi-config.xml contains configuration information that is common to every local resource manager.

More precisely, the configuration data it contains pertains to the implementation of the GRAM WSRF resources (factory resources and job resources), as well as initial values of WSRF resource properties that are always published by any Managed Job Factory WSRF resource.

The data is categorized by service because, although WSRF separates the concerns of service and resource, a given service uses only one XML Schema type of resource. In practice it is clearer to categorize the resource-implementation configuration by service, even though, in theory, a given resource implementation could be used by several services. For more information, refer to the Java WS Core documentation.

Here is the breakdown, in JNDI objects, of the common configuration data categorized by service. Each XYZHome object contains the same Globus Core-defined information for the implementation of the WSRF resource, such as the Java implementation class for the resource (resourceClass datum), the Java class for the resource key (resourceKeyType datum), etc.

  • ManagedExecutableJobService

    • ManagedExecutableJobHome: configures the implementation of resources for the service.

  • ManagedMultiJobService

    • ManagedMultiJobHome: configures the implementation of resources for the service

  • ManagedJobFactoryService

    • FactoryServiceConfiguration: encapsulates configuration information used by the factory service. Currently it identifies the service to associate with a newly created job resource in order to create an endpoint reference and return it.

    • ManagedJobFactoryHome: configures the implementation of resources for the service.

    • FactoryHomeConfiguration: contains GRAM application-level configuration data, i.e., values for resource properties common to all factory resources. For example, the path to the Globus installation, host information (such as CPU type), manufacturer, operating system name and version, etc.

2.5.2. Local resource manager configuration

When a SOAP call is made to a GRAM factory service in order to submit a job, the call is actually made to a GRAM service-resource pair, where the factory resource represents the local resource manager to be used to execute the job.

There is one directory, gram-service-<manager>/, for each local resource manager supported by the GRAM installation.

For example, let us assume that the command line:

    % ls etc | grep gram-service-

gives the following output:

    gram-service-Fork
    gram-service-LSF
    gram-service-Multi

In this example, the Multi, Fork and LSF job factory resources have been installed. Multi is a special kind of local resource manager which enables the GRAM services to support multijobs.

The JNDI configuration file located under each manager directory contains configuration information for the GRAM support of the given local resource manager, such as the name that GRAM uses to designate the given resource manager. This is referred to as the GRAM name of the local resource manager.

For example, $GLOBUS_LOCATION/etc/gram-service-Fork/jndi-config.xml contains the following XML element structure:

    <service name="ManagedJobFactoryService">
        <!-- LRM configuration:  Fork -->
        <resource
            name="ForkResourceConfiguration"
            type="org.globus.exec.service.factory.FactoryResourceConfiguration">
            <resourceParams>
                [...]
                <parameter>
                    <name>
                        localResourceManagerName
                    </name>
                    <value>
                        Fork
                    </value>
                </parameter>           
                <!-- Site-specific scratchDir
                     Default: ${GLOBUS_USER_HOME}/.globus/scratch
                <parameter>
                    <name>
                        scratchDirectory
                    </name>
                    <value>
                        ${GLOBUS_USER_HOME}/.globus/scratch
                    </value>
                </parameter>           
                -->
            </resourceParams>
        </resource>        
    </service>

In the example above, the name of the local resource manager is Fork. This value can be used with the GRAM command line client in order to specify which factory resource to use when submitting a job. Similarly, it is used to create an endpoint reference to the chosen factory WS-Resource when using the GRAM client API.

In the example above, the scratchDirectory is set to ${GLOBUS_USER_HOME}/.globus/scratch, which is the default. It can be configured to point to an alternate file system path that is shared across the compute cluster and typically offers more disk space, though it may be less reliable (for example, purged automatically; hence "scratch").
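For example, to point it at a hypothetical cluster-wide scratch file system, uncomment the scratchDirectory parameter shown above and change its value (the path below is purely illustrative; ${GLOBUS_USER_NAME} is one of the standard substitution variables described later in this guide):

    <parameter>
        <name>
            scratchDirectory
        </name>
        <value>
            /scratch/${GLOBUS_USER_NAME}
        </value>
    </parameter>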

2.6. Security descriptor

The file $GLOBUS_LOCATION/etc/gram-service/managed-job-factory-security-config.xml contains the Core security configuration for the GRAM ManagedJobFactory service, which includes:

  • default security information for all remote invocations, such as:

    • the authorization method, based on a gridmap file (in order to resolve user credentials to local user names)

    • limited proxy credentials will be rejected

  • security information for the createManagedJob operation

The file $GLOBUS_LOCATION/etc/gram-service/managed-job-security-config.xml contains the Core security configuration for the GRAM job resources:

  • The default is to only allow the identity that called the createManagedJob operation to access the resource.

Note that, by default, two gridmap checks are done during an invocation of WS-GRAM:

  1. One gridmap check is done by the container as configured by the gridmap element in $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml

  2. Another check is done by WS-GRAM when it calls the Perl modules that are used for job submission to the underlying local resource manager. This is configured by the authz element, which is set to gridmap by default in $GLOBUS_LOCATION/etc/gram-service/managed-job-factory-security-config.xml and $GLOBUS_LOCATION/etc/gram-service/managed-job-security-config.xml. This check is done for additional security, to ensure that a potentially compromised globus service account can still only act on behalf of the users defined in a grid-mapfile.

The second gridmap check can be avoided by removing the authz element from both WS-GRAM security descriptors. This does not mean, however, that no authorization check is done: the container still checks whether the client is authorized as defined in $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml, but there is no further authorization check before the Perl modules are called. It is up to the GT4 container administrator to decide whether to keep that additional check. Note that a change in the sudo configuration is required in that case, because globus-gridmap-and-execute will no longer be executed. As before, /opt/globus/GT4.0.5 should be replaced with the GLOBUS_LOCATION for your installation:

    # Globus GRAM entries
    globus  ALL=(username1,username2) \
       NOPASSWD: /opt/globus/GT4.0.5/libexec/globus-job-manager-script.pl *
    globus  ALL=(username1,username2) \
       NOPASSWD: /opt/globus/GT4.0.5/libexec/globus-gram-local-proxy-tool *

[Note]Note

GRAM does not override the container security credentials defined in $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml. These are the credentials used to authenticate all service requests.

2.7. GRAM and GridFTP file system mapping

The file $GLOBUS_LOCATION/etc/gram-service/globus_gram_fs_map_config.xml contains information to associate local resource managers with GridFTP servers. GRAM uses the GridFTP server (via RFT) to perform all file staging directives. Since the GridFTP server and the Globus service container can be run on separate hosts, a mapping is needed between the common file system paths of these two hosts. This enables the GRAM services to resolve file:/// staging directives to the local GridFTP URLs.

Below is the default Fork entry. Mapping a jobPath of / to an ftpPath of / will allow any file staging directive to be attempted.

    <map>
        <scheduler>Fork</scheduler>
        <ftpServer>
           <protocol>gsiftp</protocol>
           <host>myhost.org</host>
           <port>2811</port>
        </ftpServer>
        <mapping>
           <jobPath>/</jobPath>
           <ftpPath>/</ftpPath>
        </mapping>
    </map>

For a batch scheduler, where jobs typically run on a compute node, a default entry is not provided. This means staging directives will fail until a mapping is entered. Here is an example for a compute cluster with PBS installed that has two common mount points between the front-end host and the GridFTP server host.

    <map>
        <scheduler>PBS</scheduler>
        <ftpServer>
           <protocol>gsiftp</protocol>
           <host>myhost.org</host>
           <port>2811</port>
        </ftpServer>
        <mapping>
           <jobPath>/pvfs/mount1/users</jobPath>
           <ftpPath>/pvfs/mount2/users</ftpPath>
        </mapping>
        <mapping>
           <jobPath>/pvfs/jobhome</jobPath>
           <ftpPath>/pvfs/ftphome</ftpPath>
        </mapping>
    </map>

The file system mapping schema doc is here.

2.8. Scheduler-Specific Configuration Files

In addition to the service configuration described above, there are scheduler-specific configuration files for the Scheduler Event Generator modules. These files consist of name=value pairs separated by newlines. These files are:

Table 11.1. Scheduler-Specific Configuration Files

$GLOBUS_LOCATION/etc/globus-fork.conf

Configuration for the Fork SEG module implementation. The attribute names for this file are:

log_path

Path to the SEG Fork log (used by globus-fork-starter and the SEG). The value should be the path to a world-writable file. The default value, created by the Fork setup package, is $GLOBUS_LOCATION/var/globus-fork.log. This file must be readable by the account that the SEG runs as.
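For example, a typical globus-fork.conf for an installation under /opt/globus/GT4.0.5 contains a single line of the form:

    log_path=/opt/globus/GT4.0.5/var/globus-fork.log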

$GLOBUS_LOCATION/etc/globus-condor.conf

Configuration for the Condor SEG module implementation. The attribute names for this file are:

log_path

Path to the SEG Condor log (used by the Globus::GRAM::JobManager::condor perl module and the Condor SEG module). The value should be the path to a world-readable and world-writable file. The default value, created at setup time, is $GLOBUS_LOCATION/var/globus-condor.log.

$GLOBUS_LOCATION/etc/globus-pbs.conf

Configuration for the PBS SEG module implementation. The attribute names for this file are:

log_path

Path to the SEG PBS logs (used by the Globus::GRAM::JobManager::pbs perl module and the PBS SEG module). The value should be the path to the directory containing the server logs generated by PBS. For the SEG to operate, these files must have permissions such that they can be read by the user the SEG runs as.
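For example, if the PBS server writes its logs below /var/spool/pbs/server_logs (a common but site-specific location; check your PBS installation), globus-pbs.conf would contain:

    log_path=/var/spool/pbs/server_logs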

$GLOBUS_LOCATION/etc/globus-lsf.conf

Configuration for the LSF SEG module implementation. The attribute names for this file are:

log_path

Path to the SEG LSF log directory (used by the LSF SEG module). The value should be the path to the directory containing the server logs generated by LSF. For the SEG to operate, these files must have permissions such that they can be read by the user the SEG runs as.

2.9. Disabling an already installed scheduler adapter

When WS-GRAM is initialized during startup of the GT container, the JNDI configuration is checked for configured scheduler adapters. If you want to disable an already installed scheduler adapter, you have to make sure that it is removed from the JNDI configuration. The following explains one way to do this:

Say you installed support for PBS and now want to disable it. A listing of the WS-GRAM related directories in $GLOBUS_LOCATION/etc will look like this:

[martin@osg-test1 ~]$ cd $GLOBUS_LOCATION/etc && ls | grep gram-service
gram-service
gram-service-Fork
gram-service-Multi
gram-service-PBS

All you have to do is remove gram-service-PBS, or better, archive it before removing it in case you want to re-enable PBS support later (a sketch of the commands appears at the end of this section). After doing that, the output of the above command could look like this:

[martin@osg-test1 ~]$ cd $GLOBUS_LOCATION/etc && ls | grep gram-service
gram-service
gram-service-Fork
gram-service-Multi
gram-service-PBS.tar.gz

After restarting the GT container, users will no longer be able to submit jobs to PBS.
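A minimal sketch of the archive-and-remove step described above:

    % cd $GLOBUS_LOCATION/etc
    % tar czf gram-service-PBS.tar.gz gram-service-PBS
    % rm -rf gram-service-PBS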

2.10. WS GRAM auto-registration with default WS MDS Index Service

With a default GT 4.0.1+ installation, the WS GRAM service is automatically registered with the default WS MDS Index Service running in the same container for monitoring and discovery purposes.

[Note]Note

If you are still using GT 4.0.0, we strongly recommend upgrading to the latest version to take advantage of this capability.

However, if you must use GT 4.0.0, or if this registration was turned off and you want to turn it back on, this is how it is configured:

There is a JNDI resource defined in $GLOBUS_LOCATION/etc/gram-service/jndi-config.xml as follows:

 
    <resource name="mdsConfiguration" 
    
    type="org.globus.wsrf.impl.servicegroup.client.MDSConfiguration">
        <resourceParams>
            <parameter> 
            <name>reg</name>
            <value>true</value>
            </parameter>
            <parameter> 
            <name>factory</name>
            <value>org.globus.wsrf.jndi.BeanFactory</value>
            </parameter>
        </resourceParams>
    </resource>
    

To configure the automatic registration of WS GRAM to the default WS MDS Index Service, change the value of the parameter <reg> as follows:

  • true turns on auto-registration; this is the default in GT 4.0.1+.

  • false turns off auto-registration; this is the default in GT 4.0.0.

2.10.1. Configuring resource properties

By default, the GLUECE resource property (which contains GLUE data) is sent to the default Index Service.

You can configure which resource properties are sent in WS GRAM's registration.xml file, $GLOBUS_LOCATION/etc/gram-service/registration.xml. The following is the relevant section of the file (as it is set by default):

    <Content xsi:type="agg:AggregatorContent"
    xmlns:agg="http://mds.globus.org/aggregator/types">
    
        <agg:AggregatorConfig xsi:type="agg:AggregatorConfig">
        
        <agg:GetResourcePropertyPollType
            xmlns:glue="http://mds.globus.org/glue/ce/1.1">
        <!-- Specifies that the index should refresh information
        every 60000 milliseconds (once per minute) -->
        <agg:PollIntervalMillis>60000</agg:PollIntervalMillis>
        
        <!-- specifies the resource property that should be
        aggregated, which in this case is the GLUE cluster
        and scheduler information RP -->
        
        <agg:ResourcePropertyName>glue:GLUECE</agg:ResourcePropertyName>
        
        </agg:GetResourcePropertyPollType>
        </agg:AggregatorConfig> 
        <agg:AggregatorData/>
    </Content>
        

2.11. Registering WS GRAM manually with default WS MDS Index Service via Third Party

If a third party needs to register an WS GRAM service manually, see Registering with mds-servicegroup-add in the WS MDS Aggregator Framework documentation.

2.12. Job Description Document Substitution Variables (updates for 4.0.5+)

By default, only four variables can be used in the job description document; they are resolved to values in the service. These are:

  • GLOBUS_USER_HOME

  • GLOBUS_USER_NAME

  • GLOBUS_SCRATCH_DIR

  • GLOBUS_LOCATION

2.12.1. Changes in WS GRAM beginning with GT version 4.0.5

To enable communities to define their own system-wide variables and let their users use them in job descriptions, a generic variable/value configuration file was added in which these variables can be defined. If a job description document contains one of these variables, that file is used to resolve it.

A new service parameter in the JNDI container registry defines the path to the variable mapping file; the mapping is configured per scheduler. The file is checked periodically (the frequency is configurable) to see whether it has changed; if so, it is reread and the new content replaces the old.

For example, the Fork scheduler has the following entries in $GLOBUS_LOCATION/etc/gram-service-Fork/jndi-config.xml which can be configured to determine the location and the refresh period of the variable mapping file:

<parameter>
  <name>
    substitutionDefinitionsFile
  </name>
  <value>
    /root/vdt-stuff/globus/etc/gram-service-Condor/substitution definition.properties
  </value>
</parameter>
<parameter>
  <name>
    substitutionDefinitionsRefreshPeriod
  </name>
  <value>
    <!-- MINUTES -->
    480
  </value>
</parameter>
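The mapping file itself is a plain variable=value properties file. A hypothetical example (variable names and paths are purely illustrative):

    # Site-wide substitution variables
    SITE_SCRATCH=/gpfs/scratch
    SITE_APPS=/gpfs/apps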
[Important]Important

If you use variables in the job description document that are not defined in the variable mapping file, the following error occurs during job submission: 'No value found for RSL substitution variable <variableName>'

2.13. Configuring and submitting jobs to WS-GRAM using Condor-G

Condor-G provides command-line tools for submitting large numbers of jobs to WS-GRAM and for monitoring and destroying them. The following link gives a good introduction to configuring Condor-G and submitting jobs to WS-GRAM:

https://bi.offis.de/wisent/tiki-index.php?page=Condor-GT4
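For orientation, a minimal Condor-G submit description for a WS-GRAM (gt4) job typically looks like the following; the host name and factory type are placeholders, and the page above should be treated as the authoritative reference:

    universe      = grid
    grid_resource = gt4 https://myhost.org:8443/wsrf/services/ManagedJobFactoryService Fork
    executable    = /bin/hostname
    output        = job.out
    error         = job.err
    log           = job.log
    queue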

3. Configuring New Features for 4.0.5+

3.1. Audit Logging (4.0.5+ only)

You can find information about audit logging in the WS GRAM Audit Logging section (available only with GT versions 4.0.5+).

3.2. SoftEnv Support (4.0.5+ only)

You can find information about SoftEnv support in the WS GRAM SoftEnv Support section (available only with GT versions 4.0.5+).

3.3. Job Description Extensions Support (4.0.5+, update pkg available)

You can find information about job description extensions support in the WS GRAM Job Description Extensions Support section (available with GT versions 4.0.5+; an update package is available for previous versions).

3.4. Local RFT Invocations (4.0.5+ only)

A new option has been added to WS GRAM to make "local" invocations to RFT instead of Web Service calls. This has been shown to improve WS GRAM service performance when calling RFT for file staging and cleanup directives. The default configuration for WS GRAM remains Web Service calls to RFT.

To configure local method calls from GRAM to RFT, make the following configuration change to $GLOBUS_LOCATION/etc/gram-service/jndi-config.xml:

        
      <parameter>
      <name>
      enableLocalInvocations
      </name>
      <value>
      true
      </value>
      </parameter>
    

More can be read about local invocations here.

4. Deploying

WS GRAM is deployed as part of a standard toolkit installation. Please refer to the GT 4.0 System Administrator's Guide for details.

4.1. Deploying in Tomcat

WS GRAM has been tested to work without any additional setup steps when deployed into Tomcat. Please see the Java WS Core admin guide section on deploying GT4 services into Tomcat for instructions. Also, for details on tested containers, see the WS GRAM release notes.

[Note]Note

Currently only a single deployment is supported because of a limitation in the execution of the Scheduler Event Generator. One must set GLOBUS_LOCATION before starting Tomcat.
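For example, with a Bourne-style shell (both paths below are placeholders for your installation):

    % export GLOBUS_LOCATION=/opt/globus/GT4.0.5
    % export CATALINA_HOME=/opt/tomcat
    % $CATALINA_HOME/bin/startup.sh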

5. Testing

See the WS GRAM User's Guide for information about submitting a test job.

6. Security Considerations

No special security considerations exist at this time.

7. Troubleshooting

When I submit a streaming or staging job, I get the following error: ERROR service.TransferWork Terminal transfer error: [Caused by: Authentication failed[Caused by: Operation unauthorized(Mechanism level: Authorization failed. Expected"/CN=host/localhost.localdomain" target but received "/O=Grid/OU=GlobusTest/OU=simpleCA-my.machine.com/CN=host/my.machine.com")

  • Check $GLOBUS_LOCATION/etc/gram-service/globus_gram_fs_map_config.xml to see if it uses localhost or 127.0.0.1 instead of the public hostname (in the example above, my.machine.com). Change these uses of the loopback hostname or IP to the public hostname as necessary.

Fork jobs work fine, but submitting PBS jobs with globusrun-ws hangs at "Current job state: Unsubmitted"

  1. Make sure the log_path in $GLOBUS_LOCATION/etc/globus-pbs.conf points to locally accessible scheduler logs that are readable by the user running the container. The Scheduler Event Generator (SEG) will not work without local scheduler logs to monitor. This can also apply to other resource managers, but is most commonly seen with PBS.

  2. If the SEG configuration looks sane, try running the SEG tests. They are located in $GLOBUS_LOCATION/test/globus_scheduler_event_generator_*_test/. If Fork jobs work, you only need to run the PBS test. Run each test by going to the associated directory and run ./TESTS.pl. If any tests fail, report this to the gram-dev@globus.org mailing list.

  3. If the SEG tests succeed, the next step is to figure out the ID assigned by PBS to the queued job. Enable GRAM debug logging by uncommenting the appropriate line in the $GLOBUS_LOCATION/container-log4j.properties configuration file. Restart the container, run a PBS job, and search the container log for a line that contains "Received local job ID" to obtain the local job ID.

  4. Once you have the local job ID, you can find out if the PBS status is being logged by checking the latest PBS logs pointed to by the value of "log_path" in $GLOBUS_LOCATION/etc/globus-pbs.conf.

    If the status is not being logged, check the documentation for your flavor of PBS to see if there's any further configuration that needs to be done to enable job status logging. For example, PBS Pro requires a sufficient -e <bitmask> option added to the pbs_server command line to enable enough logging to satisfy the SEG.

  5. If the correct status is being logged, try running the SEG manually to see if it is reading the log file properly. The general form of the SEG command line is as follows:

        $GLOBUS_LOCATION/libexec/globus-scheduler-event-generator -s pbs -t <timestamp>
        

    The timestamp is in seconds since the epoch and dictates how far back in the log history the SEG should scan for job status events. The command should hang after dumping some status data to stdout.

    If no data appears, change the timestamp to an earlier time.

    If nothing ever appears, report this to the gram-user@globus.org mailing list.

  6. If running the SEG manually succeeds, try running another job and make sure the job process actually finishes and PBS has logged the correct status before giving up and cancelling globusrun-ws. If things are still not working, report your problem and exactly what you have tried to remedy the situation to the gram-user@globus.org mailing list.

The job manager detected an invalid script response

  • Check for a restrictive umask. When the service writes the native scheduler job description to a file, an overly restrictive umask will cause the permissions on the file to be such that the submission script run through sudo as the user cannot read the file (bug #2655).

When restarting the container, I get the following error: Error getting delegation resource

  • Most likely this is simply a case of the delegated credential expiring. Either refresh it for the affected job or destroy the job resource. For more information, see delegation command-line clients.

The user's home directory has not been determined correctly

  • This occurs when the administrator changed the location of the users' home directories and did not restart the GT4 container afterwards. Beginning with version 4.0.3, WS-GRAM determines a user's home directory only once in the lifetime of a container (when the user submits the first job). Subsequently submitted jobs use the cached home directory during job execution.

8. Usage statistics collection by the Globus Alliance

The following usage statistics are sent by default in a UDP packet (in addition to the GRAM component code, packet version, timestamp, and source IP address) at the end of each job, i.e., when the Done or Failed state is entered.

  • job creation timestamp (helps determine the rate at which jobs are submitted)
  • scheduler type (Fork, PBS, LSF, Condor, etc...)
  • jobCredentialEndpoint present in RSL flag (to determine if server-side user proxies are being used)
  • fileStageIn present in RSL flag (to determine if the staging in of files is used)
  • fileStageOut present in RSL flag (to determine if the staging out of files is used)
  • fileCleanUp present in RSL flag (to determine if the cleaning up of files is used)
  • CleanUp-Hold requested flag (to determine if streaming is being used)
  • job type (Single, Multiple, MPI, or Condor)
  • gt2 error code if job failed (to determine common scheduler script errors users experience)
  • fault class name if job failed (to determine general classes of common faults users experience)

If you wish to disable this feature, please see the Java WS Core System Administrator's Guide section on Usage Statistics Configuration for instructions.

Also, please see our policy statement on the collection of usage statistics.