NIAID Systems Biology for Infectious Diseases Research Program Processed Data

Reusable Digital Data Lifecycle Downloads for Omics-Lethal Human Virus Systems Biology

The Omics-LHV Systems Virology project, one of four projects funded by the NIAID Systems Biology for Infectious Diseases Research Program (funded 2008-2013), was established to develop and validate predictive models of infectious disease initiation, progression, and anticipated outcomes. Research models derived from the experimental study datasets describe systems-wide host/pathogen molecular interaction networks during infection, built from integrated datasets generated by a combination of "omics" technologies. These models support a deeper understanding of viral infection complexity, of the biological, biochemical, and biophysical molecular processes within microbial organisms, and of their interactions with the host.

Research Highlights

"Virologist’s Coronavirus Paper in Top 50" - Sims’s research among most-downloaded articles on SARS-CoV-2

Reusable FAIRsharing DOI Repository Standards

Primary Data Archive

Mass Spectrometry - proteomics, metabolomics, lipidomics (reusable formats: raw, mzML, xml)

Sequencing - transcriptomics, microarray (reusable formats: txt, txt.gz, xml)

Secondary Data Archive

Integrated Multi-Omics - transcriptomics, proteomics, metabolomics, lipidomics + experimental sample metadata

  • PNNL DataHub: 10.17616/R31NJN56 (reusable formats: csv, txt, xlsx, psf, xml, json)

Accessible Secondary DOI Digital Data Downloads

    Each secondary host-associated viral dataset download contains one or more qualitative and/or quantitative expression analysis files that have been statistically processed (normalized and transformed). These files were generated from primary viral experimental study designs that used high-resolution omics capabilities. Each proteomic, metabolomic, lipidomic, or transcriptomic dataset download maps directly to a primary sample data submission from a Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) infection study with comprehensive time-course sample collections.


    1. SM001 DataHub DOI: 10.25584/LHVSM001/1661974
    2. SM003 DataHub DOI: 10.25584/LHVSM003/1661975
    3. SM012 DataHub DOI: 10.25584/LHVSM012/1661983
    4. SCL005 DataHub DOI: 10.25584/LHVSCL005/1661964
    5. SCL006 DataHub DOI: 10.25584/LHVSCL006/1661965
    6. SCL008 DataHub DOI: 10.25584/LHVSCL008/1661966
    7. SCL009 DataHub DOI: 10.25584/LHVSCL009/1661967
    8. SCL012 DataHub DOI: 10.25584/LHVSCL012/1661970
    9. SHAE002 DataHub DOI: 10.25584/LHVSHAE002/1661971
    10. SHAE003 DataHub DOI: 10.25584/LHVSHAE003/1661972
    11. SHAE004 DataHub DOI: 10.25584/LHVSHAE004/1661973
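    Each DataHub DOI listed above can be resolved through the standard DOI proxy at https://doi.org/. The minimal Python sketch below, using only the standard library, turns these DOIs into resolver URLs and pulls out the dataset code embedded in each suffix. The helper names are hypothetical, and the assumption that every suffix follows the `LHV<code>/<number>` pattern is inferred from this list, not taken from DataHub documentation.

```python
from urllib.parse import quote

# DataHub DOIs as listed on this page.
DATAHUB_DOIS = [
    "10.25584/LHVSM001/1661974",
    "10.25584/LHVSM003/1661975",
    "10.25584/LHVSM012/1661983",
    "10.25584/LHVSCL005/1661964",
    "10.25584/LHVSCL006/1661965",
    "10.25584/LHVSCL008/1661966",
    "10.25584/LHVSCL009/1661967",
    "10.25584/LHVSCL012/1661970",
    "10.25584/LHVSHAE002/1661971",
    "10.25584/LHVSHAE003/1661972",
    "10.25584/LHVSHAE004/1661973",
]

DOI_RESOLVER = "https://doi.org/"  # standard DOI proxy


def resolver_url(doi: str) -> str:
    """Build a doi.org URL, percent-encoding URL-unsafe characters but keeping '/'."""
    return DOI_RESOLVER + quote(doi, safe="/")


def dataset_code(doi: str) -> str:
    """Extract the dataset code (e.g. 'SM001') from a DOI.

    Assumes the hypothetical layout '<prefix>/LHV<code>/<number>' seen in this list.
    """
    parts = doi.split("/")
    if len(parts) != 3 or not parts[1].startswith("LHV"):
        raise ValueError(f"unexpected DOI layout: {doi}")
    return parts[1][len("LHV"):]


if __name__ == "__main__":
    for doi in DATAHUB_DOIS:
        print(f"{dataset_code(doi)}\t{resolver_url(doi)}")
```

    Opening each URL in a browser, or requesting it with any HTTP client that follows redirects, should lead to the registered landing page for that dataset.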

    Linked Open Primary Data Downloads

    Primary Sequencing Data

    The Gene Expression Omnibus (GEO) is a public community data repository, hosted by NCBI at the NIH, that promotes the free exchange of MIAME-compliant gene expression profiles and array-based data. 

    Primary Agilent and Affymetrix microarray transcriptome data collections (txt and txt.gz) and associated metadata have been submitted to NCBI under BioProject (PRJNA) accessions, are openly available, and link to primary publication data accessions where possible. 
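    The txt.gz files are gzip-compressed text. As a minimal sketch, assuming a tab-delimited table with a header row and optional metadata lines that begin with "!" (a layout you should confirm for each downloaded file), the standard library alone can load one into rows. The function name, the "!" prefix default, and the demo file contents below are hypothetical.

```python
import csv
import gzip
import os
import tempfile
from typing import Dict, List


def read_tab_gz(path: str, comment_prefix: str = "!") -> List[Dict[str, str]]:
    """Load a gzip-compressed, tab-delimited text file as a list of row dicts.

    Lines starting with `comment_prefix` are treated as metadata and skipped
    (an assumption; adjust to the layout of the file you download). The first
    remaining line is used as the header row.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        data_lines = (line for line in handle if not line.startswith(comment_prefix))
        # Materialize the rows before the file is closed.
        return list(csv.DictReader(data_lines, delimiter="\t"))


if __name__ == "__main__":
    # Write a tiny hypothetical file so the example runs without any download.
    demo_path = os.path.join(tempfile.mkdtemp(), "example_expression.txt.gz")
    with gzip.open(demo_path, mode="wt", newline="") as out:
        out.write("!Series_title\tdemo\n")
        out.write("ID_REF\tsample_1\tsample_2\n")
        out.write("probe_a\t5.1\t6.3\n")
    rows = read_tab_gz(demo_path)
    print(rows[0]["ID_REF"], rows[0]["sample_2"])  # probe_a 6.3
```

    Values come back as strings, so numeric columns still need converting before analysis.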

    Primary Mass Spectrometry Data

    The Mass Spectrometry Interactive Virtual Environment (MassIVE), the PRoteomics IDEntifications database (PRIDE), and PeptideAtlas are public community data repositories that promote the free exchange of mass spectrometry data. 

    Primary mass spectrometry proteome, metabolome, and lipidome data (raw and mzML) and corresponding parameter files (xml), including those used for accurate mass and time (AMT) tag database generation, are openly available and have been deposited under PeptideAtlas (PASS), PRIDE (PRD), and MassIVE (MSV) repository accessions, linking to primary publication data accessions where possible.

    Primary Experimental Study Design Metadata

    The Virus Pathogen Database and Analysis Resource (ViPR) is a comprehensive and highly curated repository of genome and protein sequence records and annotations for multiple virus families, supported by the National Institute of Allergy and Infectious Diseases (NIAID) Bioinformatics Resource Centers (BRC) program. 

    Primary structured experimental methods, protocols, and metadata supporting the derived quantification data collections, including fold change values and p-values (.xls, .txt), are available from the ViPR repository under the assigned digital object identifiers.
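    Once a fold change and p-value table has been exported as tab-delimited text (.xls files would need converting first), it can be screened with a few lines of standard-library Python. This is a hypothetical helper: the column name `p_value` and the 0.05 cutoff are illustrative assumptions rather than properties of the deposited ViPR files, and no multiple-testing correction is applied.

```python
import csv
import io
from typing import Dict, List


def significant_rows(
    table_text: str, p_col: str = "p_value", alpha: float = 0.05
) -> List[Dict[str, str]]:
    """Return rows whose p-value is below `alpha`, sorted from smallest p-value.

    `p_col` and `alpha` are illustrative defaults, not values taken from ViPR.
    Rows with missing or non-numeric p-values (e.g. "NA") are skipped.
    No multiple-testing correction is applied.
    """
    kept = []
    for row in csv.DictReader(io.StringIO(table_text), delimiter="\t"):
        try:
            p_value = float((row.get(p_col) or "").strip())
        except ValueError:
            continue  # missing column, empty cell, or text such as "NA"
        if p_value < alpha:  # NaN compares False, so "nan" entries are dropped too
            kept.append((p_value, row))
    kept.sort(key=lambda pair: pair[0])
    return [row for _, row in kept]


if __name__ == "__main__":
    demo = (
        "gene\tfold_change\tp_value\n"
        "A\t2.5\t0.01\n"
        "B\t0.8\t0.2\n"
        "C\t3.1\t0.001\n"
        "D\t1.2\tNA\n"
    )
    print([row["gene"] for row in significant_rows(demo)])  # ['C', 'A']
```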

    Acknowledgment of Federal Funding

    Generation of the data described here was funded in whole or in part by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services under Contract No. HHSN272200800060C. Omics analyses by the Systems Virology Proteomics, Metabolomics, and Lipidomics Core were performed at Pacific Northwest National Laboratory (PNNL) in the Environmental Molecular Sciences Laboratory, a national scientific user facility sponsored by the Department of Energy’s (DOE) Office of Biological and Environmental Research and located in Richland, WA. PNNL is operated by the Battelle Memorial Institute for the DOE under contract number DE-AC05-76RL01830. 

    Project Page Citation

    Anderson LN, McDermott J, Waters K, Sims A, Baric R. (2021). Omics Lethal Human Viruses Project, Modeling Host Responses to Severe Acute Respiratory Syndrome (SARS) Infection Post-Processed Data Package DOIs. United States. DOI:


    Datasets (11)
    Publications (4)
    People (4)

    Tom Metz is a Principal Investigator within the Integrative Omics group at PNNL and the Metabolomics Team Lead for a group of scientists that focuses on development and applications of high throughput metabolomics and lipidomics methods to various biological questions. He has worked to develop state...

    Dr. Jason McDermott, senior research scientist, has extensive research experience in molecular and structural virology and data resource design, data integration and prediction of biological networks, bridging experimental and computational biology. Currently, his research interests include data...

    Lindsey Anderson’s research has been dedicated to the identification and characterization of novel, targeted and non-targeted, functional metabolic interactions using a high-throughput systems biology and computational biology approach. Her expertise in functional metabolism and multidisciplinary...

    Bobbie-Jo Webb-Robertson has 20 years of experience in the statistics and data science fields. She currently serves as the chief scientist of computational biology in the Biological Sciences Division at PNNL. Her research portfolio is largely related to the biomedical field and primarily...

    Data Sources (3)
    Institutions (1)