GridFTP: User's Guide


Using globus-url-copy

This client is a basic URL-to-URL copy.

Before you begin

YOU MUST HAVE A CERTIFICATE TO USE globus-url-copy!

1. First, as with all things Grid, you must have a valid proxy certificate to run globus-url-copy.

If you do not have a certificate, you must obtain one. 

If you are doing this for testing in your own environment, the Simple CA provided with the Globus Toolkit should suffice.

If not, you must contact the Virtual Organization (VO) with which you are associated to see from whom you should request a certificate. 

One common source is the DOE Science Grid CA, although you must confirm whether the resources you wish to access will accept its certificates.

Instructions for installing the certificate properly should be provided by the issuer of the certificate.

2. Now that you have a certificate, you must generate a temporary proxy. Do this by running:

grid-proxy-init 

For further details, see the documentation for grid-proxy-init.

3. You are now ready to use globus-url-copy! See the following sections for syntax and command line options.

Syntax

The basic syntax for globus-url-copy is:

globus-url-copy [optional command line switches] <sourceURL> <destURL>

where:

[optional command line switches]
See Command line options below for a list of available options.
<sourceURL>

Specifies the original URL of the file(s) to be copied.

If this is a directory, all files within that directory will be copied.

<destURL>

Specifies the URL where you want to copy the files.

If you want to copy multiple files, this must be a directory.

Note: Any URL that specifies a directory must end with a trailing slash (/).
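
The two rules above (a directory URL ends with /, and copying a directory requires a directory as the destination) can be checked before a transfer is launched. The following Python sketch is only an illustration: the helper names and the example.org hosts are hypothetical, and the check covers only the trailing-slash convention described here.

```python
def is_directory_url(url):
    """A URL names a directory when it ends with a trailing slash."""
    return url.endswith("/")


def build_copy_command(source_url, dest_url):
    """Return the argument list for a copy, or raise if the
    directory rule described above would be violated."""
    if is_directory_url(source_url) and not is_directory_url(dest_url):
        raise ValueError("copying a directory requires a destination URL ending with '/'")
    return ["globus-url-copy", source_url, dest_url]


# Hypothetical hosts and paths, for illustration only.
command = build_copy_command("gsiftp://source.example.org/data/", "file:///tmp/data/")
print(" ".join(command))
```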

URL prefixes

As of GT 3.2, we support the following URL prefixes:

  • file:// (on a local machine only)
  • ftp://
  • gsiftp://
  • http://
  • https://

By default, globus-url-copy expects the same kind of host certificates that globusrun expects from gatekeepers.

Note: We do not provide an interactive client similar to the generic FTP client provided with Linux. See Interactive Client for information on an interactive client developed by NCSA / NMI / TeraGrid.

URL formats

Any valid URL as defined by RFC 1738 is accepted, provided it uses a protocol we support. In general, URLs have the following format:

protocol://[host][:port]/path

For example:

gsiftp://myhost.mydomain.com:2812/data/foo.dat
Fully specified.
http://myhost.mydomain.com/mywebpage/default.html
Port not specified, so the protocol default is used (80 in this case).
file:///foo.dat
Host not specified, so your local host is used; the port is likewise not specified.
file:/foo.dat
This is also valid, but is not recommended.
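
To see how these examples break down into protocol, host, port and path, the Python sketch below splits them with the standard urllib.parse module. It is an illustration rather than a description of how globus-url-copy parses URLs. The default-port table is an assumption added for the example (the guide itself only states the HTTP default of 80), and "localhost" simply stands in for "your local host".

```python
from urllib.parse import urlsplit

# Assumed protocol defaults, for illustration only; the guide itself
# only mentions port 80 for http.
DEFAULT_PORTS = {"ftp": 21, "gsiftp": 2811, "http": 80, "https": 443}


def describe(url):
    """Return (protocol, host, port, path) for a URL, filling in the
    local host and the protocol's default port when they are omitted."""
    parts = urlsplit(url)
    host = parts.hostname or "localhost"
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, host, port, parts.path


for example in (
    "gsiftp://myhost.mydomain.com:2812/data/foo.dat",
    "http://myhost.mydomain.com/mywebpage/default.html",
    "file:///foo.dat",
    "file:/foo.dat",
):
    print(example, "->", describe(example))
```

Both file forms resolve to the same path on the local host; file URLs have no port, so None is reported for them.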

Note: For FTP URLs, it is legal to specify a user name and password in the URL as follows:

ftp://myname:mypassword@myhost.mydomain.com/foo.dat 

This is highly discouraged, as you will be sending your user name and password in plain text over the network. The servers provided in the Globus Toolkit do not permit user name and password authentication, so this format will result in an error. The exception to this is anonymous FTP access.

Command line options

(** denotes new feature in GT 3.2)

Informational Options
-help | -usage 

Prints help.

-version 

Prints the version of this program.

-versions 

Prints the versions of all modules that this program uses.

** -q | -quiet 

Suppresses all output for successful operation.

-vb | -verbose 

During the transfer, displays:

  • number of bytes transferred
  • performance since the last update (currently every 5 seconds)
  • average performance for the whole transfer.

-dbg | -debugftp 

Debugs FTP connections and prints the entire control channel protocol exchange to STDERR. 

This output is very useful for debugging. Please include it whenever you request assistance with a globus-url-copy problem.

Utility / Ease of Use Options
-a | -ascii 

Converts the file between ASCII format and the local file format.

-b | -binary

Does not apply any conversion to the files. This option is turned on by default.

** -f <filename>

Reads a list of URL pairs from the named file.

Each line should contain:

<sourceURL> <destURL> 

Enclose URLs with spaces in double quotes ("). Blank lines and lines beginning with # will be ignored.
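
The file format described above is simple enough to check before handing a list to globus-url-copy. The Python sketch below parses text in that format, assuming shell-style double quoting (handled by shlex). The function name and the example.org hosts are hypothetical, and the parser is an illustration rather than the one globus-url-copy uses.

```python
import shlex


def parse_url_pairs(text):
    """Return a list of (sourceURL, destURL) tuples from the text of a
    URL-pair file, skipping blank lines and lines beginning with #."""
    pairs = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # shlex keeps a double-quoted URL containing spaces as one field.
        fields = shlex.split(line)
        if len(fields) != 2:
            raise ValueError(f"expected two URLs, got: {line!r}")
        pairs.append((fields[0], fields[1]))
    return pairs


# Hypothetical hosts and paths, for illustration only.
example = '''\
# transfers for one run
gsiftp://source.example.org/data/run1.dat gsiftp://dest.example.org/archive/run1.dat

"gsiftp://source.example.org/data/my file.dat" "file:///tmp/my file.dat"
'''

pairs = parse_url_pairs(example)
for source_url, dest_url in pairs:
    print(source_url, "->", dest_url)
```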

** -r | -recurse

Copies files in subdirectories as well.

-notpt | -no-third-party-transfers

Turns third-party transfers off (on by default). 

Site firewall and/or software configuration may prevent a direct connection between the two servers (a third-party transfer). With this option, globus-url-copy "relays" the data itself: it does a GET from the source and a PUT to the destination.

This incurs a performance penalty, but allows you to complete a transfer that you otherwise could not.

Reliability Options
** -rst | -restart 

Restarts failed FTP operations.

** -rst-retries <retries>

Specifies the maximum number of times to retry the operation before giving up on the transfer.

Use 0 for infinite.

The default value is 5.

** -rst-interval <seconds>

Specifies the interval in seconds to wait after a failure before retrying the transfer.

Use 0 for an exponential backoff. 

The default value is 0.

** -rst-timeout <seconds>

Specifies the maximum time after a failure to keep retrying. 

Use 0 for no timeout.

The default value is 0.
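
To show how the three -rst-* values interact, the Python sketch below generates the sequence of waits between retries. It is a simplified model and not the client's actual implementation: the 1-second starting point and the doubling used for the exponential backoff are assumptions, and the timeout here counts only the time spent waiting.

```python
from itertools import count, islice


def retry_delays(retries=5, interval=0, timeout=0, base=1):
    """Yield the wait in seconds before each retry.

    retries  -- maximum number of retries, 0 for infinite
    interval -- fixed wait in seconds, 0 for exponential backoff
    timeout  -- stop once the total wait would exceed this, 0 for none
    base     -- first backoff wait (an assumption made for this sketch)
    """
    attempts = count(1) if retries == 0 else range(1, retries + 1)
    elapsed = 0
    for attempt in attempts:
        delay = interval if interval > 0 else base * 2 ** (attempt - 1)
        if timeout > 0 and elapsed + delay > timeout:
            return
        elapsed += delay
        yield delay


# Defaults: 5 retries with exponential backoff and no timeout.
print(list(retry_delays()))
# Infinite retries every 30 seconds, giving up after 100 seconds.
print(list(retry_delays(retries=0, interval=30, timeout=100)))
```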

Performance Options
-tcp-bs <size> | -tcp-buffer-size <size>

Specifies the size (in bytes) of the TCP buffer to be used by the underlying FTP data channels.

This is critical to good performance over the WAN.  Use the bandwidth-delay product as your buffer size.
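
As a rough illustration of the bandwidth-delay product, the Python sketch below converts a link bandwidth and a round-trip time into a buffer size in bytes. The function name and the sample figures (1 Gbit/s and 50 ms) are assumptions chosen for the example; measure your own path before settling on a value.

```python
def tcp_buffer_bytes(bandwidth_mbps, rtt_ms):
    """Bandwidth-delay product in bytes for a link of bandwidth_mbps
    megabits per second with a round-trip time of rtt_ms milliseconds."""
    bits_in_flight = bandwidth_mbps * 1_000_000 * rtt_ms // 1000
    return bits_in_flight // 8


# Assumed example path: 1 Gbit/s with a 50 ms round-trip time.
size = tcp_buffer_bytes(1000, 50)
print(f"-tcp-bs {size}")
```

The printed value can be passed directly as the <size> argument of -tcp-bs.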

-p <parallelism> | -parallel <parallelism>

Specifies the number of parallel data connections that should be used. 

This is one of the most commonly used options.

-bs <block size> | -block-size <block size>

Specifies the size (in bytes) of the buffer to be used by the underlying transfer methods.

Security Related Options
-s <subject> | -subject <subject>

Specifies a subject to match with both the source and destination servers.

-ss <subject> | -source-subject <subject>

Specifies a subject to match with the source server.

-ds <subject> | -dest-subject <subject>

Specifies a subject to match with the destination server.

-nodcau | -no-data-channel-authentication

Turns off data channel authentication for FTP transfers (the default is to authenticate the data channel). 

We do not recommend this option as it is a security risk.

** -dcsafe | -data-channel-safe

Sets data channel protection mode to SAFE.

Otherwise known as integrity or checksumming.

Guarantees that the data channel has not been altered, though a malicious party may have observed the data. 

Rarely used as there is a substantial performance penalty.

** -dcpriv | -data-channel-private

Sets data channel protection mode to PRIVATE. 

The data channel is encrypted and checksummed. 

Guarantees that the data channel has not been altered and, if observed, it won't be understandable. 

VERY rarely used due to the VERY substantial performance penalty.

Notes about globus-url-copy

  1. A globus-url-copy using the gsiftp protocol with no options (that is, using all the defaults) performs a binary, stream mode transfer (which implies no parallelism) with the host's default TCP buffer size, an encrypted and checksummed control channel, and an authenticated data channel.
  2. GridFTP (as well as normal FTP) defines multiple wire protocols, or MODES, for the data channel. 

    Most normal FTP servers only implement stream mode, i.e. the bytes flow in order over a single TCP connection.  GridFTP defaults to this mode so that it is compatible with normal FTP servers. 

    However, GridFTP has another MODE, called Extended Block Mode, or MODE E. This mode sends the data over the data channel in blocks. Each block consists of 8 bits of flags, a 64-bit integer indicating the offset from the start of the transfer, and a 64-bit integer indicating the length of the block in bytes, followed by a payload of that length. Because the offset and length are provided, out-of-order arrival is acceptable, i.e., the 10th block could arrive before the 9th because you know explicitly where it belongs. This allows us to use multiple TCP channels. If you use the -p | -parallel option, globus-url-copy automatically puts the servers into MODE E.

    Note: Specifying -p 1 is not the same as omitting -p altogether. Both use a single stream, but the default uses stream mode, whereas -p 1 uses MODE E.
  3. For more information on TCP buffer sizes and related information, try <here>.
  4. If you run a GridFTP server by hand, you will need to explicitly specify the subject name to expect. You can use the -ss flag to set the sourceURL subject, and -ds to set the destURL subject. If you use -s alone, it will set both to be the same. You can see an example of this usage under the Verification section of this guide. Note that this is an unusual way of using this client; most of the time you only need to specify the two URLs.
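
Note 2 explains that MODE E blocks carry their own offset, which is why they may arrive in any order. The Python sketch below illustrates that idea only: it models each block as an (offset, payload) pair rather than the real wire format, shuffles the blocks to mimic arrival over several parallel connections, and rebuilds the original data. The function names and the 100-byte block size are assumptions made for the example.

```python
import random


def split_into_blocks(data, block_size):
    """Cut data into (offset, payload) blocks."""
    return [
        (offset, data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


def reassemble(blocks, total_length):
    """Place each payload at its stated offset, whatever order it arrived in."""
    buffer = bytearray(total_length)
    for offset, payload in blocks:
        buffer[offset:offset + len(payload)] = payload
    return bytes(buffer)


data = bytes(range(256)) * 4           # 1024 bytes of sample data
blocks = split_into_blocks(data, 100)  # assumed block size of 100 bytes
random.Random(0).shuffle(blocks)       # simulate out-of-order arrival
restored = reassemble(blocks, len(data))
print(restored == data)
```

Stream mode has no such offsets, so its bytes must travel in order over a single connection.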