Data Intensive Computing

A Data Virtualization Architecture

Principal Investigators: Eric Stephan, Karen Schuchardt, Ian Gorton

Challenge

Scientists working in high performance computing and large-scale complex science need data reliability, verifiability, repeatability, and data-sharing capability.

Approach

  • Define provenance architecture
  • Define provenance model
  • Identify technologies to support science
  • Test and refine architecture
  • Collaborate with science community through real-world use cases

Impact

This research will result in data virtualization components, services, and application programming interfaces (APIs) that together create a data access and integration platform for MeDICi and adaptive workflow. The platform will enable research teams to use system science approaches to address critical scientific and national security challenges.

Collaborations

  • MeDICi
  • CS&M Scientific Metadata Services
  • Computational and Experimental Large-Scale Biology
  • SciDAC II
  • Computational biologists, bioinformaticists

Accomplishments

  • Provided SPARQL extensions to support more robust query capabilities.
  • Further investigating provenance storage strategies against real-world problems that demand scalable throughput on large data sets.
  • Defining methods to tailor provenance for each application.
  • Defining a provenance browser tool for MeDICi (May 2008).

Figure: Provenance Architecture

Figure: Provenance Model
