Data-Intensive Computing Initiative (DICI)
Adaptive Workflow in Data-Intensive Environments
Technical Contacts: Alan Chappell; George Chin, Jr.
Executive Summary
Researchers and analysts need help building complex applications that make use of available data and computational resources. This research addresses that need by building workflow description and management capabilities that interact with the MeDICI architecture. These capabilities will let users create applications easily by composing existing analytic components. The unique capabilities developed by this project will:
- include workflow description tools that support the graphical creation of complex workflows comprising pipelines that connect heterogeneous computation and data elements.
- manage workflow state information efficiently to support fault handling and "what if" or robustness experiments.
- enable complete workflows to be reconstructed so that their results can be validated.
Accomplishments / Highlights
- Assembly of a suite of scientific workflows to serve as technology-assessment use cases.
- Adoption of the Business Process Execution Language (BPEL), a modern standard with rapidly expanding industry acceptance, as the workflow description language.
- Demonstration that scientific workflow use cases are expressible in BPEL.
- Selection of the open-source ActiveBPEL workflow engine.
Collaboration
- The Adaptive Workflow project is strongly coupled with the Data-Intensive Computing Initiative (DICI) MeDICI middleware and Data Virtualization projects.
- Example use cases have been drawn from physical science projects in the Initiative and other projects at PNNL.
Demonstration
The MeDICI platform will tightly integrate the MeDICI middleware, Data Virtualization, and Adaptive Workflow DICI projects. Together, these projects will form the underlying platform for all of the Initiative's demonstrations.
Impacts
The adaptive workflow system resulting from this work will enable researchers, analysts, and decision makers in data-intensive environments to make more effective use of available massive data sets, high-performance and special purpose computing facilities, and collaborative inputs from other complex systems. By enabling the cost-effective creation of composite applications, the system also addresses the growing analytic needs of researchers and agencies.
