Reusable Digital Data Lifecycle Downloads for Omics-Lethal Human Virus Systems Biology
The Omics-LHV Systems Virology project was one of four projects funded by the NIAID Systems Biology for Infectious Diseases Research Program (funded from 2008 to 2013) and was established to develop and validate predictive models of infectious disease initiation, progression, and anticipated outcomes. Research models derived from the experimental study datasets describe systems-wide host/pathogen molecular interaction networks during infection, using integrated datasets generated from a combination of "omics" technologies. They support a deeper understanding of the complexity of viral infection, of the biological, biochemical, and biophysical molecular processes within microbial organisms, and of their interaction with the host.
"Virologist’s Coronavirus Paper in Top 50" - Sims’s research among most-downloaded articles on SARS-CoV-2
Reusable FAIRsharing DOI Repository Standards
Primary Data Archive
Mass Spectrometry - proteomics, metabolomics, lipidomics (reusable formats: raw, mzML, xml)
- PeptideAtlas: 10.25504/FAIRsharing.dvyrsz
- MassIVE: 10.25504/FAIRsharing.LYsiMd
- PRIDE: 10.25504/FAIRsharing.e1byny
Sequencing - transcriptomics, microarray (reusable formats: txt, txt.gz, xml)
Secondary Data Archive
Integrated Multi-Omics - transcriptomics, proteomics, metabolomics, lipidomics + experimental sample metadata
- PNNL DataHub: 10.17616/R31NJN56 (reusable formats: csv, txt, xlsx, psf, xml, json)
Accessible Secondary DOI Digital Data Downloads
Secondary host-associated viral dataset downloads contain one or more qualitative and/or quantitative, statistically processed (normalization, data transformation) expression analysis files resulting from primary viral experimental study designs that leverage unique high-resolution omics capabilities. Each proteomic, metabolomic, lipidomic, and/or transcriptomic dataset download relates directly to a primary sample data submission for a Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) infection study composed of comprehensive time-course experimental sample collections.
- SM001 DataHub DOI: 10.25584/LHVSM001/1661974
- SM003 DataHub DOI: 10.25584/LHVSM003/1661975
- SM012 DataHub DOI: 10.25584/LHVSM012/1661983
- SCL005 DataHub DOI: 10.25584/LHVSCL005/1661964
- SCL006 DataHub DOI: 10.25584/LHVSCL006/1661965
- SCL008 DataHub DOI: 10.25584/LHVSCL008/1661966
- SCL009 DataHub DOI: 10.25584/LHVSCL009/1661967
- SCL012 DataHub DOI: 10.25584/LHVSCL012/1661970
- SHAE002 DataHub DOI: 10.25584/LHVSHAE002/1661971
- SHAE003 DataHub DOI: 10.25584/LHVSHAE003/1661972
- SHAE004 DataHub DOI: 10.25584/LHVSHAE004/1661973
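The DOIs listed above can be turned into resolvable links by prefixing them with the standard `https://doi.org/` resolver. The following is a minimal sketch, not part of the project's own tooling: the function names `doi_url` and `doi_matches_study` are hypothetical, and the consistency check assumes the pattern visible in this list, where the middle DOI segment is `LHV` followed by the study identifier.

```python
from urllib.parse import quote

# Study identifier -> DataHub DOI, copied from the list above.
DATAHUB_DOIS = {
    "SM001": "10.25584/LHVSM001/1661974",
    "SM003": "10.25584/LHVSM003/1661975",
    "SM012": "10.25584/LHVSM012/1661983",
    "SCL005": "10.25584/LHVSCL005/1661964",
    "SCL006": "10.25584/LHVSCL006/1661965",
    "SCL008": "10.25584/LHVSCL008/1661966",
    "SCL009": "10.25584/LHVSCL009/1661967",
    "SCL012": "10.25584/LHVSCL012/1661970",
    "SHAE002": "10.25584/LHVSHAE002/1661971",
    "SHAE003": "10.25584/LHVSHAE003/1661972",
    "SHAE004": "10.25584/LHVSHAE004/1661973",
}


def doi_url(doi: str) -> str:
    """Return the doi.org resolver URL for a bare DOI string."""
    # Percent-encode anything unusual but keep the "/" separators intact.
    return "https://doi.org/" + quote(doi, safe="/")


def doi_matches_study(study_id: str, doi: str) -> bool:
    """Check the (assumed) convention that the DOI embeds 'LHV' + study id."""
    parts = doi.split("/")
    return len(parts) == 3 and parts[1] == "LHV" + study_id


if __name__ == "__main__":
    for study_id, doi in DATAHUB_DOIS.items():
        status = "ok" if doi_matches_study(study_id, doi) else "CHECK"
        print(f"{study_id:8s} {doi_url(doi)} [{status}]")
```

Running the script prints one resolver link per study and flags any entry whose DOI does not follow the assumed naming pattern, which is a cheap way to catch copy-and-paste errors before citing a dataset.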
Linked Open Primary Data Downloads
Primary Sequencing Data
Primary transcriptome Agilent and Affymetrix microarray experimental data collections (txt and txt.gz) and associated metadata are openly available. They have been submitted to NCBI under BioProject (PRJNA) accessions, which link to primary publication data accessions where possible.
- SM001 BioProject Accession: PRJNA149057 | GEO Accession: GSE33266 | Platform: GPL4134
- SM003 BioProject Accession: PRJNA215773 | GEO Accession: GSE50000 | Platform: GPL7202
- SM012 BioProject Accession: PRJNA213462 | GEO Accession: GSE49262 | Platform: GPL7202
- SCL005 BioProject Accession: PRJNA149059 | GEO Accession: GSE33267 | Platform: GPL4133
- SCL006 BioProject Accession: PRJNA163617 | GEO Accession: GSE37827 | Platform: GPL6480
- SCL008 BioProject Accession: PRJNA208996 | GEO Accession: GSE48142 | Platform: GPL6480
- SHAE002 BioProject Accession: PRJNA208498 (PRJNA208495) | GEO Accession: GSE47960 (GSE47963) | Platform: GPL6480
- SHAE003 BioProject Accession: PRJNA208497 (PRJNA208495) | GEO Accession: GSE47961 (GSE47963) | Platform: GPL6480
- SHAE004 BioProject Accession: PRJNA208499 (PRJNA208495) | GEO Accession: GSE47962 (GSE47963) | Platform: GPL6480
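Each entry above follows the same `BioProject | GEO | Platform` layout, so the list can be parsed mechanically. The sketch below is illustrative only: the function names are hypothetical, the field names `related_bioproject` and `related_geo` are labels chosen here for the parenthesized accessions (the page does not state their relationship to the main accession), and the URL templates are assumptions about NCBI's public web layout rather than something specified by this project.

```python
import re

# One list entry: study id, BioProject, GEO series, and platform accession.
# The parenthesized accessions are optional (only the SHAE entries have them).
RECORD = re.compile(
    r"^-\s*(?P<study>\S+) BioProject Accession: (?P<bioproject>PRJNA\d+)"
    r"(?: \((?P<related_bioproject>PRJNA\d+)\))?"
    r" \| GEO Accession: (?P<geo>GSE\d+)"
    r"(?: \((?P<related_geo>GSE\d+)\))?"
    r" \| Platform: (?P<platform>GPL\d+)$"
)


def parse_record(line: str) -> dict:
    """Split one list entry into named fields; missing optional fields are None."""
    match = RECORD.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    return match.groupdict()


def geo_url(accession: str) -> str:
    """Assumed URL template for a GEO series (GSE) or platform (GPL) record."""
    return f"https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={accession}"


def bioproject_url(accession: str) -> str:
    """Assumed URL template for an NCBI BioProject record."""
    return f"https://www.ncbi.nlm.nih.gov/bioproject/{accession}"


if __name__ == "__main__":
    entry = ("- SHAE002 BioProject Accession: PRJNA208498 (PRJNA208495) | "
             "GEO Accession: GSE47960 (GSE47963) | Platform: GPL6480")
    record = parse_record(entry)
    print(record["study"], geo_url(record["geo"]), bioproject_url(record["bioproject"]))
```

Keeping the parser strict (it raises on any entry that deviates from the layout) is a deliberate choice here, since a silently skipped accession is harder to notice than an error.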
Primary Mass Spectrometry Data
The Mass Spectrometry Interactive Virtual Environment (MassIVE), the PRoteomics IDEntifications (PRIDE) database, and the PeptideAtlas are all public-domain community data repositories promoting the free exchange of mass spectrometry data.
Primary mass spectrometry proteome, metabolome, and lipidome data (raw and mzML) and corresponding parameter files (xml), including those used for accurate mass and time (AMT) tag database generation, are openly available and have been uploaded to the PeptideAtlas (PASS), PRIDE (PRD), and MassIVE (MSV) data repositories, with links to primary publication data accessions where possible.
- SM001 PeptideAtlas Accession: PASS00433 (proteome)
- SCL005 PeptideAtlas Accession: PASS00430 (proteome)
- SCL006 PeptideAtlas Accession: PASS00431 (proteome)
- SCL008 PeptideAtlas Accession: PASS00432 (proteome)
- SCL009 PRIDE Accession: PRD000594 (metabolome)
- SCL012 MassIVE Accession: MSV000078780, MSV000078781 (lipidome)
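Because each repository uses its own accession prefix (PASS, PRD, and MSV, as noted above), the hosting repository can be inferred from an accession string alone. A minimal sketch, assuming only those three prefixes; the function name `repository_for` is hypothetical.

```python
# Accession prefix -> repository name, as given in the text above.
PREFIX_TO_REPOSITORY = {
    "PASS": "PeptideAtlas",
    "PRD": "PRIDE",
    "MSV": "MassIVE",
}

# Study identifier -> mass spectrometry accessions, copied from the list above.
MS_ACCESSIONS = {
    "SM001": ["PASS00433"],
    "SCL005": ["PASS00430"],
    "SCL006": ["PASS00431"],
    "SCL008": ["PASS00432"],
    "SCL009": ["PRD000594"],
    "SCL012": ["MSV000078780", "MSV000078781"],
}


def repository_for(accession: str) -> str:
    """Return the repository name implied by the accession's prefix."""
    for prefix, repository in PREFIX_TO_REPOSITORY.items():
        if accession.startswith(prefix):
            return repository
    raise ValueError(f"unknown accession prefix: {accession!r}")


if __name__ == "__main__":
    for study_id, accessions in MS_ACCESSIONS.items():
        for accession in accessions:
            print(study_id, accession, repository_for(accession))
```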
Primary Experimental Study Design Metadata
The Virus Pathogen Database and Analysis Resource (ViPR) is a comprehensive and highly curated repository of genome and protein sequence records and annotations for multiple virus families, supported by the National Institute of Allergy and Infectious Diseases (NIAID) Bioinformatics Resource Centers (BRC) program.
Primary structured experimental methods, protocols, and metadata supporting the quantification data derived from the raw data collections, including fold-change values and p-values (.xls, .txt), are available at the ViPR repository under the assigned digital object identifiers.
- SM001 ViPR DOI: 10.35083/SYNF-Z480 (transcriptome); 10.35083/0DXW-4N36 (proteome)
- SM003 ViPR DOI: 10.35083/HPPS-E642 (transcriptome)
- SM012 ViPR DOI: 10.35083/TGNN-6R57 (transcriptome)
- SCL005 ViPR DOI: 10.35083/HX08-H242 (transcriptome); 10.35083/MJPG-M062 (proteome)
- SCL006 ViPR DOI: 10.35083/BEBN-5D77 (transcriptome); 10.35083/W27D-AH13 (proteome)
- SCL008 ViPR DOI: 10.35083/0NSE-XM32 (transcriptome); 10.35083/40RE-T825 (proteome)
- SHAE002 ViPR DOI: 10.35083/FHRG-FQ84 (transcriptome)
- SHAE003 ViPR DOI: 10.35083/NJ24-TX75 (transcriptome)
- SHAE004 ViPR DOI: 10.35083/V4PB-NP34 (transcriptome)
Acknowledgment of Federal Funding
The data described here were generated with funding, in whole or in part, from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services under Contract No. HHSN272200800060C. Omics analyses by the Systems Virology Proteomics, Metabolomics, and Lipidomics Core were performed at Pacific Northwest National Laboratory (PNNL) in the Environmental Molecular Sciences Laboratory, a national scientific user facility sponsored by the Department of Energy’s (DOE) Office of Biological and Environmental Research and located in Richland, WA. PNNL is operated by the Battelle Memorial Institute for the DOE under contract number DE-AC05-76RLO1830.
Project Page Citation
Anderson LN, McDermott J, Waters K, Sims A, Baric R. (2021). Omics Lethal Human Viruses Project, Modeling Host Responses to Severe Acute Respiratory Syndrome (SARS) Infection Post-Processed Data Package DOIs. United States. https://data.pnnl.gov/group/nodes/project/13308. DOI: https://doi.org/10.25584/LHVSARS/1813912.