Indiana University Bloomington
IUMSC   Indiana University Molecular Structure Center


  • Michael McRobbie, Craig Stewart, John C. Huffman, Randall Bramley

For much of its history, the primary focus of the computational sciences has been the speed of numerical computation and the hardware, algorithms, and software required to maximize that speed. The needs of data-intensive science received secondary consideration. The steady proliferation of scientific instruments that generate vast amounts of data now demands a new generation of facilities built specifically to support modern instrument-driven, data-intensive science. Such facilities must address the full data life cycle: data capture and remote data reduction; high-speed data transfer; real-time data analysis and processing; data storage; data retrieval; data analysis and postprocessing; data visualization; and the use of remote data stores.

Thanks to a successful NSF Major Research Instrumentation grant, Indiana University will create such a facility: the AVIDD facility, a distributed facility for managing, Analyzing, and Visualizing Instrument-Driven Data flows. It will be distributed among Indiana University's campuses and linked at very high bandwidth by IU's new Optical Fiber Infrastructure (OFI). AVIDD will leverage and complement IU's existing high-performance computing, storage, and networking facilities while providing a new facility devoted principally to data-intensive science. AVIDD will consist of three main components:

  • Data analysis cluster. Data analysis, reduction, and processing will be supported by a geographically distributed cluster built on commodity components. This cluster will have an aggregate 6 TB of disk, 192 processors, at least 192 GB of RAM, and a theoretical maximum of 265 GFLOPS. One portion of the cluster will be located in Bloomington (IUB); the other in Indianapolis (IUPUI). A third cluster, an existing Linux cluster now located at IUB, will be relocated to Gary (IUN).
  • Data storage facility. This will include a 6 TB disk cache, integrated with IU's massive data storage system, to provide seamless transfer of data between disk and tape.
  • Visualization and data presentation environments. The AVIDD facility will create a network of Display Walls: three 8'x6' Display Walls suitable for conferencing and group instruction, and numerous smaller visualization systems that will support high quality 3D visualization, collaborative research, and distance education.
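To put the aggregate cluster figures in the first item above into per-processor terms, the following is a minimal arithmetic sketch. It uses only the numbers quoted there (192 processors, 265 GFLOPS theoretical maximum, at least 192 GB of RAM, 6 TB of disk). The even split across processors and the decimal conversion of 1 TB = 1000 GB are assumptions made for illustration, and the function name is hypothetical; the source does not describe how resources are actually divided among nodes or campuses.

```python
# Aggregate figures quoted in the cluster description above.
PROCESSORS = 192
PEAK_GFLOPS = 265.0   # theoretical maximum for the whole cluster
MIN_RAM_GB = 192.0    # stated minimum RAM
DISK_TB = 6.0         # aggregate disk

GB_PER_TB = 1000.0    # assumption: decimal units


def per_processor_shares(processors, peak_gflops, ram_gb, disk_tb):
    """Return the average per-processor share of each aggregate figure.

    Assumes an even split, which the source does not state.
    """
    return {
        "gflops": peak_gflops / processors,
        "ram_gb": ram_gb / processors,
        "disk_gb": disk_tb * GB_PER_TB / processors,
    }


shares = per_processor_shares(PROCESSORS, PEAK_GFLOPS, MIN_RAM_GB, DISK_TB)
for name, value in shares.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the averages come to roughly 1.38 GFLOPS of theoretical peak, 1 GB of RAM, and 31.25 GB of disk per processor.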

AVIDD will be run as a unified system under a single scheduler, with a single file namespace and transparent movement of files within AVIDD. This distributed facility will provide a local resource close to researchers on each campus, while also offering economies of scale for attacking very large problems and the resources to meet dynamic real-time analysis and interactive visualization needs. Use of AVIDD in the curriculum at IU and in presentations at national conferences will disseminate knowledge about the new and important techniques developed for handling large data. The main thrust of AVIDD is closely aligned with the recently released NSF Distributed Terascale Facility (DTF) solicitation. AVIDD will complement the DTF in its emphasis on data sources, real-time end-user interactions with data, and visualization. The creation of AVIDD will result in advances in computer science at IU in data-centric and grid computing; new techniques for effective use of high-output instruments; and new knowledge in the many disciplines represented in this proposal. AVIDD will provide researchers with the advanced IT infrastructure to perform analyses more effectively, and to perform research and analyses not now possible.

The recent proliferation, in number and across disciplines, of instruments capable of producing data at very high rates requires a fresh approach to computational science and engineering applications. For example, X-ray diffractometers (XRDs) are among the most important tools available for understanding the three-dimensional structure of molecules, and are essential in chemistry, molecular biology, physics, and other disciplines. The single-crystal XRDs commonly found in university research facilities are currently capable of collecting nearly 10 gigabytes (GB) of data each day. However, the majority of the instrument time is actually devoted to data transfer, not data collection. Highly valuable, nationally shared synchrotron sites such as the Advanced Photon Source (APS) at Argonne National Laboratory produce data at extremely high rates and create an even greater imbalance between the capability to produce data and the capability to manage and understand the data produced. The challenges of handling data from XRDs are not unique.
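The "nearly 10 GB each day" figure can be restated as an average sustained rate, as in the rough sketch below. It treats 10 GB/day as an upper bound for one hypothetical single-crystal XRD, assumes decimal units (1 GB = 10^9 bytes, 1 TB = 1000 GB), and uses illustrative function names that are not from the source. The 6 TB value is the disk cache size quoted in the component list. A daily average says nothing about burst rates or the transfer overheads described above.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_GB = 10**9   # assumption: decimal gigabytes
GB_PER_TB = 1000       # assumption: decimal terabytes


def sustained_rate(gb_per_day):
    """Convert a daily data volume into average kB/s and Mbit/s."""
    bytes_per_second = gb_per_day * BYTES_PER_GB / SECONDS_PER_DAY
    return {
        "kB_per_s": bytes_per_second / 1e3,
        "Mbit_per_s": bytes_per_second * 8 / 1e6,
    }


def days_to_fill(capacity_tb, gb_per_day):
    """Days for one instrument at a constant daily volume to fill a store."""
    return capacity_tb * GB_PER_TB / gb_per_day


rate = sustained_rate(10)       # upper bound quoted for a single-crystal XRD
print(f"{rate['kB_per_s']:.1f} kB/s, {rate['Mbit_per_s']:.3f} Mbit/s")
print(f"{days_to_fill(6, 10):.0f} days to fill a 6 TB cache")
```

With these assumptions a single instrument averages about 116 kB/s (just under 1 Mbit/s) and would take 600 days to fill 6 TB on its own. The average is modest for one instrument; it is the aggregate across many instruments, and across far faster sources such as synchrotrons, that the text identifies as the challenge.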

Indiana University Molecular Structure Center, Chemistry A421, Indiana University, 800 E. Kirkwood Ave., Bloomington, IN 47405-7102, 812.855.6821