Data Intensive Computing

Interactive Hypothesis Identification and Evaluation with Data Intensive Visual Analytics and Algorithms

Challenge:

Understanding genomic data from community samples (i.e., metagenomes) is a driving need for revolutionary advances in biotechnology. But making sense of the data avalanche that accompanies metagenomes is generally intractable, because most analytical tools are geared toward spreadsheet-style analysis of small datasets. Multi-genome and metagenome analysis will need more powerful tools that leverage high performance computing (HPC) and visual analytics approaches.

Approach:

We demonstrate a human-in-the-loop workflow that combines high performance computing (ScalaBLAST and SHOT), efficient post-processing, and advanced visualization capabilities (Starlight) within a single integration framework (MeDICi). The workflow uses a two-pass method: the first pass provides a visual representation of multigenome information from well-characterized species, allowing a user to formulate a specific hypothesis, and the second pass tests that hypothesis on metagenome sequence data. At each stage, the HPC applications are guided by the user's query, rapidly calculating the underlying sequence similarities needed as the basis for more sophisticated analysis.
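
The two-pass idea can be illustrated with a minimal Python sketch. It is not the ScalaBLAST, SHOT, Starlight, or MeDICi interface: the function names (`kmers`, `similarity`, `first_pass`, `second_pass`), the toy sequences, and the threshold are all hypothetical, and a simple k-mer Jaccard score stands in for the alignment-based similarity that the HPC tools would compute.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of k-mer sets (toy stand-in for an alignment score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def first_pass(references: Dict[str, str]) -> List[Tuple[str, str, float]]:
    """Pass 1: all-vs-all similarity among well-characterized sequences.

    In the workflow described above, a result like this would feed the
    visualization from which the user formulates a hypothesis.
    """
    names = sorted(references)
    return [
        (x, y, similarity(references[x], references[y]))
        for i, x in enumerate(names)
        for y in names[i + 1:]
    ]


def second_pass(query: str, metagenome: Dict[str, str],
                threshold: float) -> List[str]:
    """Pass 2: IDs of metagenome sequences that resemble the chosen query."""
    return sorted(
        read_id for read_id, seq in metagenome.items()
        if similarity(query, seq) >= threshold
    )


# Hypothetical toy data; real inputs would be full genomes and metagenome reads.
references = {"geneA": "MKTAYIAKQR", "geneB": "MKTAYIAKQL", "geneC": "GGSWPLLVNE"}
metagenome = {"read1": "MKTAYIAKQR", "read2": "PPPPPPPPPP"}

pairs = first_pass(references)  # the user inspects these and picks a query
hits = second_pass(references["geneA"], metagenome, threshold=0.5)
```

The user's choice between the two calls is the human-in-the-loop step: only the sequences selected after inspecting the first-pass results are searched against the metagenome, which keeps the second, more expensive computation focused on a specific hypothesis.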

The goal is to provide a framework for characterizing the space of functions that can be performed by a microbial community by relating its collective molecular components to those from well-characterized microbial isolates.

Impact:

This capability can simultaneously advance the state-of-the-art in several fields: biology, visual analytics, high performance computing, and scientific workflows.

