Data Intensive Computing
A Data Virtualization Architecture
Principal Investigators: Eric Stephan, Karen Schuchardt, Ian Gorton
Challenge
Scientists working in high-performance computing and large-scale complex science need data reliability, verifiability, repeatability, and data-sharing capability.
Approach
- Define provenance architecture
- Define provenance model
- Identify technologies to support science
- Test and refine architecture using
- Collaborate with science community through real-world use cases
Impact
This research will result in data virtualization components, services, and application programming interfaces (APIs) that together create a data access and integration platform for MeDICi and adaptive workflow. The platform will enable research teams to use system science approaches to address critical scientific and national security challenges.
Collaborations
- MeDICi
- CS&M Scientific Metadata Services
- Computational and Experimental Large-Scale Biology
- SciDAC II
- Computational biologists, bioinformaticists
Accomplishments
- Provided SPARQL extensions to support more robust query capabilities.
- Further investigating provenance storage strategies against real-world problems that must scale in throughput while handling large data sets.
- Defining methods to tailor provenance for each application.
- Defining a provenance browser tool for MeDICi (May 2008).


