GT 2.4: GRAM RSL Parameters

Below is the list of RSL parameters that the GRAM jobmanager looks for in the RSL specification, with a definition of each.  Parameters marked with a trailing asterisk (*) are new since version 2.0.

(directory=value)
Specifies the path of the directory the jobmanager will use as the default directory for the requested job.

Default: Current working directory (as set by the gatekeeper).
 
(executable=value) *GASS enabled *required parameter
The name of the executable file to run on the remote machine.  If the value is a GASS URL, the file is transferred to the remote gass cache before executing the job and removed after the job has terminated.

Default: None
 
(arguments=value [value] [value] ...)
The command line arguments for the executable.  Use quotes if a single argument must contain a space.

Example:  ( arguments= "a and b" ccc d )
              argv[1]="a and b"
              argv[2]="ccc"
              argv[3]="d"

Default: NULL
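
The quoting rule above can be sketched in Python. This is a minimal illustration, not part of GRAM: the helper name rsl_arguments is hypothetical, it only handles the whitespace case described here, and it rejects embedded double quotes because this section does not document how to escape them.

```python
def rsl_arguments(args):
    """Build an RSL (arguments=...) clause from a list of strings.

    Hypothetical helper: only the whitespace quoting rule from the
    text above is implemented.
    """
    parts = []
    for arg in args:
        if '"' in arg:
            # Escaping of embedded quotes is not covered by this sketch.
            raise ValueError("embedded double quotes are not handled")
        if any(ch.isspace() for ch in arg):
            # An argument containing a space must be quoted to stay one argv entry.
            parts.append('"' + arg + '"')
        else:
            parts.append(arg)
    return "(arguments=" + " ".join(parts) + ")"


print(rsl_arguments(["a and b", "ccc", "d"]))
```

Called with the three values from the example above, the helper reproduces the same clause, with "a and b" kept together as argv[1].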

(stdin=value) *GASS enabled
The name of the file to be used as standard input for the executable on the remote machine.  If the value is a GASS URL, the file is transferred to the remote gass cache before executing the job and removed after the job has terminated.

Default: /dev/null
 
(stdout=value) *GASS enabled
The name of the remote file to store the standard output from the job.  If the value is a GASS URL, the standard output from the job is transferred dynamically during the execution of the job.
 
Default: /dev/null
 
(stderr=value) *GASS enabled
The name of the remote file to store the standard error from the job.  If the value is a GASS URL, the standard error from the job is transferred dynamically during the execution of the job.

Default: /dev/null
 
(count=value)
The number of executions of the executable.

Default: 1
 
(environment=(var value) [(var value)] ...)
The environment variables that will be defined for the executable, in addition to the default set that the jobmanager gives to the job.

Examples:  ( environment= (VAR_A value_a) )
               ( environment= (JOE mama)(PI 3.1415) )
The C-shell equivalent of the above examples:
    setenv VAR_A value_a
    setenv JOE mama
    setenv PI 3.1415
The Bourne shell equivalent of the above examples:
    VAR_A=value_a; export VAR_A
    JOE=mama; export JOE
    PI=3.1415; export PI

Default: NULL
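
As a rough sketch, the environment clause and its Bourne shell equivalent can be generated from one mapping. The helper names rsl_environment and bourne_exports are hypothetical. Names and values containing whitespace or double quotes are rejected, since this section does not say how they should be written.

```python
def rsl_environment(variables):
    """Build an RSL (environment=...) clause from a name -> value mapping."""
    pairs = []
    for name, value in variables.items():
        text = str(value)
        if any(ch.isspace() or ch == '"' for ch in name + text):
            # Quoting rules for such values are outside this sketch.
            raise ValueError("unsupported character in {!r}".format(name))
        pairs.append("({} {})".format(name, text))
    return "(environment=" + "".join(pairs) + ")"


def bourne_exports(variables):
    """Return the Bourne shell lines equivalent to the same mapping."""
    return ["{0}={1}; export {0}".format(name, value)
            for name, value in variables.items()]


env = {"JOE": "mama", "PI": "3.1415"}
print(rsl_environment(env))
for line in bourne_exports(env):
    print(line)
```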

(maxTime=value)
The maximum walltime or cputime for a single execution of the executable.  Whether the limit applies to walltime or cputime is determined by the scheduler that GRAM interfaces with.  The value is in minutes and is converted to an integer with atoi().

Default: None.  Accepts local scheduler default if any.
 
(maxWallTime=value)
Explicitly sets the maximum walltime for a single execution of the executable.  The value is in minutes and is converted to an integer with atoi().  If the GRAM scheduler cannot set walltime, an error is returned.

Default: None.  Accepts local scheduler default if any.
 
(maxCpuTime=value)
Explicitly sets the maximum cputime for a single execution of the executable.  The value is in minutes and is converted to an integer with atoi().  If the GRAM scheduler cannot set cputime, an error is returned.

Default: None.  Accepts local scheduler default if any.
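
Because maxTime, maxWallTime and maxCpuTime all pass through atoi(), anything after the leading integer is silently dropped. The Python function below approximates the standard C atoi() behaviour (skip leading whitespace, read an optional sign and decimal digits, return 0 if no digits are found). It is an illustration only, not the jobmanager's code, and the name atoi_like is hypothetical. Overflow, which is undefined in C, is not modelled.

```python
import re

# Leading C-locale whitespace, then an optional sign and decimal digits.
_ATOI_PREFIX = re.compile(r"[ \t\n\v\f\r]*([+-]?[0-9]+)")


def atoi_like(text):
    """Approximate C atoi(): parse the leading integer, or return 0."""
    match = _ATOI_PREFIX.match(text)
    return int(match.group(1)) if match else 0


for value in ["90", " 45", "1:30", "2.5", "90min", "ninety"]:
    print(repr(value), "->", atoi_like(value))
```

In particular, a value written as 1:30 or 2.5 is read as 1 or 2 minutes, and a value with no leading digits is read as 0, so these limits are best given as plain whole numbers of minutes.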
 
(jobType=single|multiple|mpi|condor)
This specifies how the jobmanager should start the job.
  single -  Even if the count > 1, only start 1 process or thread
  multiple -  start count processes or threads
  mpi -  use the appropriate method (e.g. mpirun on SGI Origin or POE on IBM SP) to start a program compiled with a vendor-provided MPI library.   The program is started on count nodes.
  condor -  starts condor jobs in the "condor" universe.  (default is vanilla)

Default: multiple
 
(gramMyJob=independent|collective)
This specifies how the gram myjob interface will behave in the started processes.
  independent -  gram_myjob_count() will return 1 and gram_myjob_rank() will return 0 for each of the processes.
  collective -  gram_myjob_count() will return count for each of the processes.  gram_myjob_rank() will return a unique value between 0 and count-1 for each of the processes.

Default: collective
 
(queue=value)
Target the job to a queue (class) name as defined by the scheduler at the defined (remote) resource.

Default: None
 
(project=value)
Target the job to be allocated to a project account as defined by the scheduler at the defined (remote) resource.

Default: None
 
(hostCount=value)
Only applies to clusters of SMP computers, such as newer IBM SP systems. Defines the number of nodes ("pizza boxes") to distribute the "count" processes across.

Default: None
 
(dryRun=yes|no)
If dryRun=yes, the jobmanager will not submit the job for execution and will return success.

Default: no
 
(minMemory=value)
Specify the minimum amount of memory required for this job.  Units are in Megabytes.

Default: None
 
(maxMemory=value)
Specify the maximum amount of memory required for this job.  Units are in Megabytes.

Default: None
 
(save_state=yes|no) *
Causes the jobmanager to save job state/information to a persistent file on disk. If the jobmanager crashes, the client can later start up a new jobmanager that can take over watching of the job.

Default: no
 
(two_phase=<int>) *
Implement a two-phase commit for job submission and completion.

For job submission, the jobmanager will respond to the initial job request with a WAITING_FOR_COMMIT error. It will then wait for a signal from the client before doing the actual job submission. The integer supplied is the number of seconds the jobmanager should wait before timing out. If the jobmanager times out before receiving a commit signal (or the client issues a cancel), the jobmanager will clean up the job's files and exit (after sending a FAILED callback).

For job completion, after the jobmanager sends a DONE or FAILED callback (the final callback), it will wait for a commit signal from the client. If it receives one, it cleans up and exits as usual. If it times out and save_state was enabled, it will leave all the job's files in place and exit (assuming the client is down and will attempt a job restart later). The timeout value can be extended via a signal.

When one of the errors below occurs, the jobmanager does not delete the job state file when it exits. Since the job can be restarted in these cases, the jobmanager does not wait for the commit signal after sending the FAILED callback.

GLOBUS_GRAM_PROTOCOL_ERROR_COMMIT_TIMED_OUT
GLOBUS_GRAM_PROTOCOL_ERROR_TTL_EXPIRED
GLOBUS_GRAM_PROTOCOL_ERROR_JM_STOPPED
GLOBUS_GRAM_PROTOCOL_ERROR_USER_PROXY_EXPIRED

Default: None
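
The decisions described for the two phases can be summarised as a small Python model. This is a sketch of the behaviour stated above, not the jobmanager's implementation: the function names, the signal strings and the returned descriptions are all hypothetical. Treating a signal that arrives exactly at the timeout as still in time is an assumption, and timeout extension is not modelled.

```python
def submission_outcome(timeout_seconds, signal=None, signal_at=None):
    """Model what happens after the jobmanager replies WAITING_FOR_COMMIT.

    signal is "commit", "cancel", or None (the client never responds);
    signal_at is how many seconds after the reply the signal arrives.
    """
    if signal not in ("commit", "cancel", None):
        raise ValueError("unknown signal: {!r}".format(signal))
    if signal is not None and signal_at is None:
        raise ValueError("signal_at is required when a signal is given")
    # Assumption: a signal arriving exactly at the timeout still counts.
    timed_out = signal is None or signal_at > timeout_seconds
    if timed_out or signal == "cancel":
        return "clean up job files, send FAILED callback, exit"
    return "submit job"


def completion_outcome(commit_received, save_state):
    """Model what happens after the final DONE or FAILED callback."""
    if commit_received:
        return "clean up and exit"
    if save_state:
        return "leave job files in place and exit"
    # The text above does not say what happens in this case.
    return "not specified in this section"


print(submission_outcome(60, "commit", signal_at=5))
print(submission_outcome(60))
print(completion_outcome(commit_received=False, save_state=True))
```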
 
(restart=<old JM contact>) *
Start a new jobmanager but, instead of submitting a new job, start watching over an existing job. The jobmanager will search for the job state file created by the original jobmanager (requires that save_state was enabled in the original submission). If it finds the file and successfully reads it, it will become the new watcher of the job, sending callbacks on status and streaming stdout/err if appropriate. If the old jobmanager is still alive (as determined by a timestamp in the state file), the request fails with an error code indicating this. If stdout/err was being streamed over the network, new stdout and stderr attributes can be specified in the restart RSL and the jobmanager will stream to the new locations (useful when output is going to a GASS server started by the client that's listening on a dynamic port, and the client was restarted). The new jobmanager will return a new contact string that should be used to communicate with it. If a jobmanager is restarted multiple times, any of the previous contact strings can be given for the restart attribute.

Default: None
 
(stdout_position=<int>) *
(stderr_position=<int>) *
Can be specified as part of a job restart RSL. Specifies where in the file streaming should be restarted from for streamed output.

Default: None
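
A restart RSL combining restart with the optional stdout, stderr, stdout_position and stderr_position attributes could be assembled as follows. This is a sketch: the helper name restart_rsl is hypothetical, the contact string and GASS URL in the example are made-up placeholders, and the leading & is the general RSL conjunction syntax, which this section does not itself show.

```python
def restart_rsl(old_contact, stdout=None, stderr=None,
                stdout_position=None, stderr_position=None):
    """Build a restart RSL string; optional attributes are added only if given."""
    clauses = [("restart", old_contact)]
    optional = [
        ("stdout", stdout),
        ("stderr", stderr),
        ("stdout_position", stdout_position),
        ("stderr_position", stderr_position),
    ]
    clauses += [(name, value) for name, value in optional if value is not None]
    return "&" + "".join("({}={})".format(name, value) for name, value in clauses)


# Placeholder contact string and GASS URL, for illustration only.
print(restart_rsl(
    "https://jm.example.org:39374/1234/5678/",
    stdout="https://client.example.org:40001/dev/stdout",
    stdout_position=2048,
))
```

Here the client has restarted its GASS server on a new port, so it passes the new stdout URL and asks for streaming to resume at byte 2048 of the output file.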
 
(remote_io_url=<url base>) *
Writes the given url to a file and puts GLOBUS_REMOTE_IO_URL=<path to file> in the job's environment. If specified as part of a job restart RSL, updates the contents of the file. This is intended for jobs that want to access files from the client via GASS, but the port the GASS server is listening on can change if the client crashes and recovers.
 
Using the GLOBUS_REMOTE_IO_URL environment variable, the job can read the contents of the file to get the url base.  The url base will look something like this:
https://ept.mcs.anl.gov:43744
The job can then fetch any number of files via GASS or globus-url-copy using the url base.  If the job is a script, it can use the url base to transfer a program and then run it.
For example:
  globus-url-copy https://ept.mcs.anl.gov:43744/bin/ls \
                        file:/some/dir/transferred_ls
  /some/dir/transferred_ls

Default: None
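
A job written in Python could obtain the url base along the lines sketched below. The function names are hypothetical, and the sketch assumes the file holds nothing but the url base, possibly followed by a newline. The file is read on every call because its contents can be updated by a restart RSL.

```python
import os


def remote_io_url_base(environ=os.environ):
    """Read the url base from the file named by GLOBUS_REMOTE_IO_URL."""
    path = environ["GLOBUS_REMOTE_IO_URL"]
    with open(path) as handle:
        # Assumption: the file contains only the url base.
        return handle.read().strip()


def remote_file_url(url_base, remote_path):
    """Join the url base and a path on the client into a full URL."""
    return url_base.rstrip("/") + "/" + remote_path.lstrip("/")


if "GLOBUS_REMOTE_IO_URL" in os.environ:
    base = remote_io_url_base()
    print(remote_file_url(base, "/bin/ls"))
```

The resulting URL can then be handed to globus-url-copy as the source, as in the example above.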