Frequently Asked Questions about the Data Grid

Below are the questions we receive most often about our Data Grid work.

  1. Why does the Globus Project recommend ncftp and wu-ftpd and why did you select these two as the ones to which you added GSI security?
  2. Is it your plan to use ncftp and wu-ftpd as the basis of your data transfer services, or are you developing new services from scratch? If we use ncftp and wu-ftpd now, will it be easier to port to your enhanced tools in the future?
  3. Is there any code available for your data transfer services that we can use now, with or without documentation?

Why does the Globus Project recommend ncftp and wu-ftpd and why did you select these two as the ones to which you added GSI security?

A universal data transfer protocol is a key piece of our Data Grid strategy.  One of the first goals for our Data Grid work was to convince our collaborators to begin using a common protocol in their applications and services.

There is currently no reference implementation of our GridFTP protocol, so we needed an interim set of tools to serve our collaborators until one is available. Those tools had to be high-quality, robust, and production-oriented. Because the GridFTP protocol is a superset of a GSI-enabled FTP protocol, GSI-enabled FTP tools were the natural interim choice.

When considering which FTP client and server to use for this purpose, our first requirement was access to the source code and permission to redistribute modified versions. This eliminated many commercial and some non-commercial options, but left a few others on the table.

So we asked several system administrators, "If we were to add GSI to some existing FTP clients and servers, which ones should we pick?" Their answer was the ncftp client (www.ncftp.com) and the wu-ftpd server (www.wu-ftpd.org), because both are widely used and trusted. This was particularly true of wu-ftpd: the system administrators we talked to said it was really the only FTP server that they trusted and liked.

By selecting ncftp and wu-ftpd, we met our need for a high-quality, production-oriented FTP client and server. That choice supports our goal of convincing our collaborators (and their system administrators) to deploy tools that speak the "gsiftp" protocol as widely as possible.

Is it your plan to use ncftp and wu-ftpd as the basis of your data transfer services, or are you developing new services from scratch? If we use ncftp and wu-ftpd now, will it be easier to port to your enhanced tools in the future?

The wu-ftpd server and ncftp client are file transfer applications. We've enhanced them to use some (but not all) of the protocol extensions described in our white paper. We do not intend to add the full set of protocol extensions to ncftp or wu-ftpd.

The focus of our work is not on developing stand-alone GridFTP tools and applications. Our focus is on delivering a data transfer capability that can be used in Grid applications that require high-performance data transfer, so that application developers won’t have to develop their own code--or worse, their own protocols! We have proposed a specific protocol (GridFTP) for high-performance data transfer on the Grid, and now we are developing a suite of programs, tools, and libraries that will allow and encourage Grid application developers to use the protocol in their applications.

We are developing our GridFTP libraries from scratch, and we are developing some custom client and server programs that use these libraries. Why?

  • Our data transfer libraries and programs are not intended to replace gsi-ncftp or gsi-wuftpd. Rather, they are additions to a family of libraries and programs which all speak the common GridFTP protocol. This family currently includes: gsi-ncftp, gsi-wuftpd, a GSI-enabled HPSS ftpd, a GSI-enabled Unitree ftpd, the Globus GridFTP libraries, GridFTP-enabled GASS tools and libraries, and custom client and server programs built on the Globus libraries. Each of these items uses a core of the GridFTP protocol, and some of them implement additional features of the protocol in a way that is backward compatible.
  • While the ncftp and wu-ftpd programs are high-quality, well-known programs, they were designed for general-purpose use, not for the specific needs of Grid applications. Adding all of the protocol extensions that we require would be difficult to do with the existing base of code.
  • Many of our target applications require libraries, as opposed to fully packaged programs. Libraries are needed in order to implement highly customized behavior. As an obvious example, some applications need to transfer data without storing it in a filesystem. The ncftp and wu-ftpd programs always read their data from files and write it to files.
  • Our goal is to obtain stunning data transfer performance. This will undoubtedly require a great deal of tuning. We will need complete control over a carefully architected code base to obtain that performance.

Again, these new libraries and tools will all be part of a family of tools that implement a common protocol. As a result, you will be able to pick clients and servers based on exactly the behaviors and performance that you require.
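
To make the library-versus-program point above more concrete, here is a minimal sketch of an application receiving transferred data directly in memory instead of in a file. It uses the plain (non-GSI) FTP client in Python's standard library purely as a stand-in; it is not the Globus GridFTP API. The function names fetch_to_memory and checksum_remote are hypothetical and exist only for this illustration.

```python
import hashlib
import io


def fetch_to_memory(ftp, remote_path, blocksize=8192):
    """Retrieve a remote file into a bytes object, never touching local disk.

    `ftp` is assumed to be a connected, logged-in ftplib.FTP instance
    (or any object with a compatible retrbinary method).
    """
    buffer = io.BytesIO()
    # retrbinary calls buffer.write once for each block received.
    ftp.retrbinary(f"RETR {remote_path}", buffer.write, blocksize)
    return buffer.getvalue()


def checksum_remote(ftp, remote_path, blocksize=8192):
    """Compute a SHA-256 digest of a remote file while it streams in.

    Each block is consumed by the application as it arrives and then
    discarded, so the file is never stored locally, on disk or in memory.
    """
    digest = hashlib.sha256()
    ftp.retrbinary(f"RETR {remote_path}", digest.update, blocksize)
    return digest.hexdigest()
```

With a real server, an application would open a connection with ftplib.FTP, log in, and pass the connection object to either function. The point of the sketch is the callback: a library hands each block of data to application code, whereas a packaged client program decides for you that the data goes into a file.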

You can use the ncftp and wu-ftpd programs with GSI security now and continue using them in the future. If you want to be able to use other GridFTP features that we've explained in our white paper (striped data transfers, for example), you will need to update your applications to use tools and libraries that implement those capabilities.

Is there any code available for your data transfer services that we can use now, with or without documentation?

Our Data Grid deliverables page (http://www.globus.org/datagrid/deliverables/) shows the availability of all of our expected programs, tools, libraries, and documentation. The gsi-ncftp and gsi-wuftpd programs are available now.

The API documentation for the Globus GridFTP libraries is also available in draft form on the Globus website, so you can begin coding to the API now. Libraries that implement the APIs (and a fuller range of protocol features) are currently being alpha tested by a small group of external users in preparation for a public beta release.

Please bear in mind that it will take some time to fully optimize the transfer code. For example, obtaining maximal striped transfer performance may require us to add multi-threading support to our basic socket I/O libraries.

Early adopters are currently using the alpha code in real applications, and their experience is informing our work as we complete the code. As a result of this collaboration, we will be able to deliver stunning performance in the not-too-distant future.