Please note that these documents are for an OBSOLETE version of the Globus Toolkit. For more information see 5.2 End of Life

GT 5.2.2 Release Notes: GridFTP


1. Component Overview

GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks. The GridFTP protocol is based on FTP, the highly popular Internet file transfer protocol. We have selected a set of protocol features and extensions already defined in IETF RFCs and added a few additional features to meet requirements from current data grid projects.

2. Feature Summary

Features new in GT 5.2.2:

  • New support for default disk and network stacks: globus-gridftp-server options -fs-default (disk) and -dc-default (network).
  • Added server support for setting file modification time.
  • Added the MLSC command, which streams directory listings over the control channel.
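Clients that consume MLSC output need to split each listing entry into its facts and its pathname. The sketch below assumes that MLSC entries use the same "fact=value;...; pathname" layout as MLSD/MLST entries, as defined in the IETF MLST draft listed under Based on Standards. The function name parse_mlsx_entry is hypothetical and is not part of any Globus API.

```python
def parse_mlsx_entry(line: str):
    """Split one MLSx listing entry into (pathname, facts).

    Assumed layout: 'fact=value;fact=value; pathname'. Fact names are
    lower-cased because MLST fact names are case-insensitive.
    """
    line = line.rstrip("\r\n")
    if line.startswith(" "):  # no facts at all: a single space, then the pathname
        return line[1:], {}
    # Fact values cannot contain ';', so the first "; " marks the end of the facts.
    end = line.find("; ")
    if end == -1:
        raise ValueError(f"malformed MLSx entry: {line!r}")
    facts = {}
    for fact in line[:end].split(";"):
        key, _, value = fact.partition("=")
        facts[key.lower()] = value
    # The pathname may itself contain spaces; keep everything after "; ".
    return line[end + 2:], facts


if __name__ == "__main__":
    name, facts = parse_mlsx_entry("type=file;size=1024;modify=20120101120000; my file.txt")
    print(name, facts["size"])
```

Because the pathname is taken verbatim after the first "; ", names containing spaces survive intact.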
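Setting a file's modification time is exposed through the MFMT command, which is named in GT-152 below. The sketch below assumes the syntax from the Internet-Draft that defines MFMT: "MFMT YYYYMMDDHHMMSS pathname", with the timestamp expressed in UTC. The helper name mfmt_command is hypothetical.

```python
from datetime import datetime, timezone


def mfmt_command(path: str, mtime: float) -> str:
    """Build an MFMT command for a POSIX timestamp (assumed UTC time-val format)."""
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"MFMT {stamp} {path}"


if __name__ == "__main__":
    print(mfmt_command("data.bin", 90061))  # 1970-01-02 01:01:01 UTC
```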

Features that continue to be supported from previous versions

  • Chrooting GridFTP server
  • Synchronize datasets
  • Improved failure restart capability in globus-url-copy
  • Stall detection
  • Load balancing in globus-url-copy
  • GridFTP over UDT
  • SSH security for GridFTP control channel
  • Running the GridFTP server with GFork
  • Multicasting / Network overlays (EXPERIMENTAL)
  • NetLogger bottleneck detection for GridFTP transfers (EXPERIMENTAL)
  • GSI security: GSI is the PKI-based, de facto standard security system used in Grid applications. Kerberos can also be used, but it is not supported and can be difficult to use because GSI and Kerberos differ in their capabilities.
  • Third-party transfers: Very common in Grid applications, a third-party transfer is one in which a client mediates a transfer between two servers (both likely at remote sites) rather than between a server and itself (a client/server transfer).
  • Cluster-to-cluster data movement or Striping: GridFTP can coordinate a single data transfer across multiple computer nodes at the source and destination.
  • Partial file access: Regions of a file may be accessed by specifying an offset into the file and the length of the block desired.
  • Reliability/restart: The receiving server periodically sends “restart markers” to the client (every 5 seconds by default; the interval can be changed). Each marker is a message specifying which bytes have been successfully written to disk. If the transfer fails, the client may restart it and supply these markers (or an equivalent aggregated marker), and the transfer will pick up where it left off. The written ranges need not be contiguous, so the file may contain “holes” that the restarted transfer fills in.
  • Large file support: All file sizes, lengths, and offsets are 64 bits in length.
  • Data channel reuse: The data channel can be held open and reused if the next transfer has the same source, destination, and credentials. This saves the time spent on connection establishment, authentication, and delegation, which can make a large performance difference when moving many small files.
  • Integrated instrumentation (Performance Markers).
  • Logging/audit trail (Extensive Logging in the server).
  • Parallel transfers (Multiple TCP streams between a pair of hosts).
  • TCP buffer size control (the protocol supports both manual and automatic buffer sizing; only manual is implemented).
  • Server-side computation (Extended Retrieve (ERET) / Extended Store (ESTO) commands).
  • Based on Standards: RFC 959, RFC 2228, RFC 2389, IETF Draft MLST-16, GGF GFD.020.
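The restart behavior described above depends on combining many restart markers into one aggregated marker and finding the holes that still need to be sent. The sketch below illustrates that bookkeeping. It assumes that each marker is comma-separated "start-end" text and treats each range as half-open [start, end) in byte offsets; both conventions are assumptions for illustration. The helper names are hypothetical and are not part of the Globus client API.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end), end exclusive (assumed convention)


def parse_ranges(marker: str) -> List[Range]:
    """Parse marker text such as '0-100,200-300' into (start, end) pairs."""
    ranges = []
    for part in marker.split(","):
        part = part.strip()
        if part:
            start, end = part.split("-")
            ranges.append((int(start), int(end)))
    return ranges


def aggregate(ranges: List[Range]) -> List[Range]:
    """Merge overlapping or touching ranges into one sorted, minimal list."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def holes(ranges: List[Range], file_size: int) -> List[Range]:
    """Return the byte ranges not yet written, i.e. what a restart must resend."""
    missing: List[Range] = []
    pos = 0
    for start, end in aggregate(ranges):
        if start > pos:
            missing.append((pos, start))
        pos = max(pos, end)
    if pos < file_size:
        missing.append((pos, file_size))
    return missing


if __name__ == "__main__":
    written = parse_ranges("0-100,200-300") + parse_ranges("100-150")
    print(aggregate(written))    # [(0, 150), (200, 300)]
    print(holes(written, 400))   # [(150, 200), (300, 400)]
```

Two markers arrived in this usage example. Merging them gives the aggregated marker. The holes are the only regions a restarted transfer would need to resend.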
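The data channel reuse rule amounts to keeping an open channel in a cache keyed on (source, destination, credential). The pool below is a hypothetical sketch of that idea and is not the Globus client API. Its connect callable stands in for connection establishment, authentication, and delegation, and it counts how often that expensive step actually runs.

```python
from typing import Any, Callable, Dict, Tuple


class DataChannelPool:
    """Hypothetical pool that reuses a channel when source, destination,
    and credential all match a previous transfer."""

    def __init__(self, connect: Callable[[str, str, str], Any]):
        self._connect = connect  # stands in for connect + authenticate + delegate
        self._channels: Dict[Tuple[str, str, str], Any] = {}
        self.setups = 0  # number of times the expensive setup actually ran

    def get(self, source: str, destination: str, credential: str) -> Any:
        key = (source, destination, credential)
        if key not in self._channels:
            self._channels[key] = self._connect(source, destination, credential)
            self.setups += 1
        return self._channels[key]


if __name__ == "__main__":
    pool = DataChannelPool(lambda s, d, c: object())
    for _ in range(1000):  # many small files between the same pair of endpoints
        pool.get("gsiftp://a", "gsiftp://b", "cred1")
    print(pool.setups)  # 1
```

Changing any element of the key, such as the credential, forces a fresh setup. This mirrors the condition stated in the feature list.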
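Large file support and parallel transfers meet in GridFTP's extended block mode (MODE E). Each block carries its own 64-bit byte count and 64-bit file offset, so blocks that arrive over different TCP streams can be written in any order. The sketch below assumes the 17-byte block header layout from GFD.020 (8-bit descriptor, 64-bit count, 64-bit offset). For simplicity it always sends descriptor 0 and omits real descriptor flags such as EOF and EOD. The make_blocks and reassemble helpers are hypothetical.

```python
import struct
from typing import List

# Assumed MODE E header: descriptor (8 bits), byte count (64 bits), offset (64 bits).
HEADER = struct.Struct(">BQQ")


def make_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut data into header + payload blocks, each tagged with its file offset."""
    blocks = []
    for offset in range(0, len(data), block_size):
        payload = data[offset:offset + block_size]
        # Descriptor 0 (no flags) is an illustrative simplification.
        blocks.append(HEADER.pack(0, len(payload), offset) + payload)
    return blocks


def reassemble(blocks: List[bytes], total_size: int) -> bytes:
    """Write each block at its offset; arrival order does not matter."""
    out = bytearray(total_size)
    for block in blocks:
        _descriptor, count, offset = HEADER.unpack_from(block)
        out[offset:offset + count] = block[HEADER.size:HEADER.size + count]
    return bytes(out)


if __name__ == "__main__":
    data = bytes(range(256)) * 10
    blocks = make_blocks(data, 100)
    streams = [blocks[i::4] for i in range(4)]          # round-robin over 4 streams
    received = [b for s in reversed(streams) for b in s]  # streams drain out of order
    print(reassemble(received, len(data)) == data)      # True
```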
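Manual TCP buffer sizing needs a target value. A common rule of thumb, which is general TCP practice and not specific to GridFTP, is to size the buffer to the bandwidth-delay product. The function below computes that product with integer arithmetic. Splitting it evenly across parallel streams is a simplifying assumption on our part. The result could then be supplied to a client such as globus-url-copy, for example through its -tcp-bs option.

```python
def tcp_buffer_bytes(bandwidth_bits_per_s: int, rtt_ms: int, streams: int = 1) -> int:
    """Bandwidth-delay product in bytes, divided (rounding up) across parallel streams."""
    if streams < 1:
        raise ValueError("streams must be >= 1")
    bdp = bandwidth_bits_per_s * rtt_ms // 8000  # bits/s * ms -> bytes
    return -(-bdp // streams)                    # ceiling division


if __name__ == "__main__":
    print(tcp_buffer_bytes(1_000_000_000, 50))     # 1 Gbit/s, 50 ms RTT: 6250000
    print(tcp_buffer_bytes(1_000_000_000, 50, 4))  # per stream with 4 streams: 1562500
```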

Other Supported Features

  • On the client side we provide a scriptable tool called globus-url-copy. This tool can take advantage of all the GridFTP protocol features and can also do protocol translation between FTP, HTTP, HTTPS, and POSIX file IO on the client machine.
  • We also provide a set of development libraries and APIs for developers wishing to add GridFTP functionality to their application.

Deprecated Features

  • None

3. Summary of Changes in GridFTP

3.1. New Features: GridFTP

  • GT-15: Add explicit CWD command to client API
  • GT-164: add a hybrid split/single mode which only creates backend connections if client requests stripes.
  • GT-172: Extend DSI interface to allow DSI-defined ftp commands

3.2. Improvements: GridFTP

None.

4. Fixed Bugs for GridFTP

  • GT-3: gridftp server incorrectly handles relative path configuration values
  • GT-9: Failure in globus_ftp_client_operationattr_set_authorization() results in using freed memory
  • GT-152: MFMT / SITE UTIME not working properly on my mac. gt 5.0.5.
  • GT-165: Threaded server has a race condition with parallel data channels and loading crls
  • GT-166: Threaded server data channel connection error
  • GT-167: UMD Criterion: EGI_GENERIC_SEC_1 (Writable files)
  • GT-182: gridftp truncates pathnames over 4096 chars, misleading errors
  • GT-195: GridFTP acts as wrong user when user doesn't exist
  • GT-230: hybrid mode leaks memory for each transfer after switching to striped
  • GT-241: wrong SIGINT handling in globus-url-copy
  • GT-243: Split or striped mode frontends needlessly disconnect and reconnect to backends
  • GT-244: GridFTP server memory leaks
  • GT-254: Gridftp server uses dynamic string as sprintf argument

5. Known Problems in GridFTP

None.

6. Technology dependencies

GridFTP depends on the following GT components:

  • Non-WS (General) Authentication & Authorization
  • C Common Libraries
  • XIO

GridFTP depends on the following 3rd party software:

  • OpenSSL (version is included in release)

7. Tested platforms

  • Linux

    • CentOS 5, 6 i386, x86_64
    • Debian 6, 7 (testing) i386, x86_64
    • Fedora 15, 16 i386, x86_64
    • Red Hat Enterprise Server 5, 6 i386, x86_64
    • Scientific Linux 5, 6 i386, x86_64
    • Ubuntu 10.04 LTS, 10.10, 11.04, 11.10, 12.04 (testing) i386, x86_64

  • Mac OS X

    • Mac OS X 10.7 (Lion)

  • Solaris

    • Solaris 11 x86_64

While the above list includes platforms on which we have tested GridFTP, it does not imply support for any specific platform. However, we are interested in hearing reports of success or bug reports on any platform.

8. Backward compatibility summary

Protocol changes since GT 5.2.1

  • None

API changes since GT 5.2.1

  • None

Exception changes since GT 5.2.1

  • Not Applicable (GridFTP is not Java-based)

Schema changes since GT 5.2.1

  • Not Applicable (GridFTP is not SOAP-based)

9. Associated Standards

Associated standards for GridFTP:

10. For More Information

See GridFTP for more information about this component.

Glossary

C

client

A process that sends commands and receives responses. Note that in GridFTP, the client may or may not take part in the actual movement of data.

client/server transfer

In a client/server transfer, only two entities are involved: the client entity and the server entity. We use the term entity here rather than process because, in the implementation provided in GT5, the server entity may actually run as two or more separate processes.

The client moves data either from or to its local host. The client decides whether it will connect to the server to establish the data channel or the server should connect to it (MODE E dictates who must connect).

If the client wishes to connect to the server, it sends the PASV (passive) command. The server starts listening on an ephemeral (random, non-privileged) port and returns the IP address and port in its response to the command. The client then connects to that IP address and port.
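RFC 959 encodes the address in the PASV reply as six comma-separated decimal numbers h1,h2,h3,h4,p1,p2, where the port is p1 * 256 + p2. The parser below is a minimal sketch that assumes an IPv4 reply in that format. The function name is hypothetical.

```python
import re
from typing import Tuple

_PASV_ADDR = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Extract (ip, port) from a reply like '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)'."""
    match = _PASV_ADDR.search(reply)
    if match is None:
        raise ValueError(f"no address in PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(x) for x in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


if __name__ == "__main__":
    print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,10,195,80)"))
    # ('192.168.1.10', 50000): the client now connects to this IP and port
```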

If the client wishes the server to connect to it, the client starts listening on an ephemeral port and then sends the PORT command, which carries that IP address and port, and the server initiates the TCP connection. Note that this decision affects firewall traversal: for instance, the client's host may be behind a firewall, and the server may then be unable to connect.
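The PORT argument uses the same six-number encoding as the PASV reply. The sketch below opens a listener on an ephemeral port and formats the PORT command for it. Binding to port 0 lets the operating system choose a free port. The sketch handles IPv4 only, and both helper names are hypothetical.

```python
import ipaddress
import socket


def port_command(host: str, port: int) -> str:
    """Format an RFC 959 PORT command (h1,h2,h3,h4,p1,p2) for an IPv4 address."""
    addr = ipaddress.IPv4Address(host)  # raises ValueError for non-IPv4 input
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return "PORT {},{},{}".format(str(addr).replace(".", ","), port // 256, port % 256)


def listen_ephemeral(host: str = "127.0.0.1") -> socket.socket:
    """Listen on an OS-chosen ephemeral port; read it back with getsockname()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0: let the OS pick a free port
    sock.listen(1)
    return sock


if __name__ == "__main__":
    listener = listen_ephemeral()
    host, port = listener.getsockname()
    print(port_command(host, port))  # send this on the control channel, then accept()
    listener.close()
```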

Finally, once the data channel is established, the client sends either the RETR “filename” command to transfer a file from the server to the client (GET), or the STOR “filename” command to transfer a file from the client to the server (PUT).