Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing
An emerging class of data-intensive applications involves the geographically dispersed extraction of complex scientific information from very large collections of measured or computed data. Such applications arise, for example, in experimental physics, where the data in question is generated by accelerators, and in simulation science, where the data is generated by supercomputers. So-called Data Grids provide essential infrastructure for such applications, much as the Internet provides essential services for applications such as e-mail and the Web. We describe here two services that we believe are fundamental to any Data Grid: reliable, high-speed transport and replica management. Our high-speed transport service, GridFTP, extends the popular FTP protocol with new features required for Data Grid applications, such as striping and partial file access. Our replica management service integrates a replica catalog with GridFTP transfers to provide for the creation, registration, location, and management of dataset replicas. We present the design of both services and also preliminary performance results. Our implementations exploit security and other services provided by the Globus Toolkit.