Data-Intensive Computing Initiative (DICI)
Hybrid Algorithms for Networked System Analysis
Technical Contacts: Daniel Chavarría-Miranda; Keqi Tang; Andrés Márquez
Executive Summary
This project addresses the processing of real-time scientific streaming data on hybrid high-performance computing (HPC) systems that combine microprocessors with field-programmable gate arrays (FPGAs). It is developing FPGA-based and real-time software techniques to meet the real-time constraints imposed by a streaming data source. The techniques will be prototyped for an application in mass spectrometry (multiplexed time-of-flight mass spectrometry [MX-TOF-MS]). The project will also explore the feasibility of integrating non-real-time, higher-level processing of streaming data, such as online database lookups, to add value to the experimental workflow.
Accomplishments / Highlights
- The project is targeting a PNNL multiplexing technique, developed by Mikhail Belov at the Environmental Molecular Sciences Laboratory (EMSL), that addresses drawbacks of the Hadamard Transform Ion Mobility MS (HT-IM-MS) multiplexing technique.
- The project is constructing software and FPGA design prototypes on the Cray XD-1 platform to test ideas regarding data streaming:
  - A simple FPGA design that generates 64-bit numerical data was constructed to measure the effective bandwidth achieved between an attached FPGA and the SMP memory subsystem on a Cray XD-1 node.
  - With large transfer sizes, which are expected under streaming conditions, this VHDL design achieved an effective bandwidth of 1.3 GBps.
- The project is developing an FPGA design from a software prototype for MX-TOF-MS (developed by Belov at EMSL). The FPGA design will compute the Inverse Transform required to demultiplex a composite signal into its individual components.
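The demultiplexing step in the last item can be illustrated with a minimal software sketch. The inverse transform actually used for MX-TOF-MS is not specified in this summary, so the example below substitutes the classical simplex (Hadamard-type) inverse purely for illustration. The 7-element gating sequence, the function names, and the sample signal are all assumptions made for this sketch, and the closed-form inverse only holds for sequences that form an S-matrix, as this one does.

```python
# Illustrative only: a classical simplex (Hadamard-type) demultiplexer.
# SEQUENCE is a hypothetical length-7 pseudo-random gating sequence,
# not the sequence used by the instrument described in this report.
SEQUENCE = [1, 1, 1, 0, 1, 0, 0]


def circulant(seq):
    """Build the n x n matrix whose row i is seq cyclically shifted by i."""
    n = len(seq)
    return [[seq[(j - i) % n] for j in range(n)] for i in range(n)]


def multiplex(signal, seq=SEQUENCE):
    """Simulate the composite signal y = S x seen by the detector."""
    n = len(seq)
    s = circulant(seq)
    return [sum(s[i][j] * signal[j] for j in range(n)) for i in range(n)]


def demultiplex(composite, seq=SEQUENCE):
    """Recover x from y using S^-1 = 2/(n+1) * (2*S^T - J).

    J is the all-ones matrix. This identity is valid only when the
    circulant matrix built from seq is an S-matrix.
    """
    n = len(seq)
    s = circulant(seq)
    scale = 2.0 / (n + 1)
    return [
        scale * sum((2 * s[j][i] - 1) * composite[j] for j in range(n))
        for i in range(n)
    ]


if __name__ == "__main__":
    original = [0, 5, 0, 0, 2, 0, 1]      # hypothetical individual components
    composite = multiplex(original)        # what a multiplexed acquisition records
    print(demultiplex(composite))          # should reproduce `original`
```

Because this inverse needs only additions, subtractions, and a single scaling, the same structure is the kind of computation that maps naturally onto FPGA logic, although the project's actual design may differ.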
Collaboration
- High-speed data streaming between Cray XD-1 nodes will be used to simulate data capture on a mass spectrometer and the processing of that data. To accomplish this, the project will leverage a high-speed event transport interface, based on one-sided communication, from research conducted for the SciDAC Center for Technology for Advanced Scientific Component Software (TASCS) Common Component Architecture (CCA). With this technology, a data rate of 1 GBps between nodes has been achieved.
- We are collaborating with EMSL personnel to ensure that our FPGA designs are compatible with the instrumentation board technology to be used in conjunction with TOF mass spectrometers.
Demonstration
- Our project comprises the first phase of the Scientific Discovery and Insight demonstration.
- Prototyping on a hybrid HPC cluster will enable us to determine the computation and data rate requirements for processing streaming TOF MS data.
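As a rough illustration of the data rate side of this question, the sketch below compares the raw output rate of a digitizer with the two bandwidth figures reported above (1.3 GBps FPGA-to-memory, 1 GBps node-to-node). The digitizer parameters and function names are hypothetical and are not taken from the instrument. The sketch also assumes that GBps means 10^9 bytes per second.

```python
# Bandwidth figures quoted in this report, assuming 1 GBps = 1e9 bytes/s.
FPGA_TO_MEMORY_BPS = 1.3e9
NODE_TO_NODE_BPS = 1.0e9


def required_rate(sample_rate_hz, bytes_per_sample, channels=1):
    """Raw streaming rate in bytes per second produced by a digitizer."""
    return sample_rate_hz * bytes_per_sample * channels


def headroom(link_rate_bps, required_bps):
    """Ratio of available to required bandwidth; below 1.0 the link saturates."""
    return link_rate_bps / required_bps


if __name__ == "__main__":
    # Hypothetical digitizer: 500 MS/s, 2 bytes per sample, one channel.
    need = required_rate(500e6, 2)
    print(f"required: {need / 1e9:.2f} GBps")
    print(f"FPGA-to-memory headroom: {headroom(FPGA_TO_MEMORY_BPS, need):.2f}x")
    print(f"node-to-node headroom:   {headroom(NODE_TO_NODE_BPS, need):.2f}x")
```

A headroom close to or below 1.0 would suggest that data reduction, such as demultiplexing or filtering on the FPGA, has to happen before the stream crosses that link.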
Impacts
The use of HPC systems for streaming data processing and analysis is a novel application with the potential to substantially improve experimental feedback. To cope with increasing data rates and the sophisticated analysis requirements of domain scientists, streaming processing must migrate from the realm of embedded systems to powerful HPC systems with parallel processing and high-bandwidth capabilities. This project will apply the large processing capabilities of HPC systems, as well as their abundant internal bandwidth, to real-time streaming applications.


