Description
This data package contains processed LC-MS proteomics results and analysis scripts associated with the paper "Assessing Degenerate Peptide Resolution Methods using a Ground Truth Dataset". In this study, we designed an artificial microbial community to create a simulated metaproteomic dataset with a known ground truth. The dataset intentionally includes degenerate peptides (peptides shared by more than one protein) so that different protein inference methods can be evaluated. This data product includes peptide identifications and confidence scores from the ground truth and validation datasets, as well as the code used for the analysis.
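To make the notion of a degenerate peptide concrete, here is a minimal Python sketch. It is not the method used in the paper. It digests a few toy protein sequences in silico with a simplified trypsin rule (cut after K or R unless the next residue is P, no missed cleavages) and reports peptides that occur in more than one protein. The protein IDs, sequences, and the `min_length` cutoff are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy sequences for illustration only; not taken from this package's FASTAs.
TOY_PROTEINS = {
    "protA": "MAGTLLEPKLSGEVWNEKAAPDFTR",
    "protB": "LSGEVWNEKYYDSMPR",
    "protC": "GGHWQVDKPTLLNIR",
}


def tryptic_digest(sequence, min_length=6):
    """Simplified trypsin rule: cut after K or R unless the next residue is P.

    No missed cleavages are allowed; peptides shorter than min_length
    (including the empty string left by a cut at the C-terminus) are dropped.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if len(p) >= min_length]


def degenerate_peptides(proteins, min_length=6):
    """Return {peptide: set of protein IDs} for peptides found in more than one protein."""
    peptide_to_proteins = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_digest(sequence, min_length):
            peptide_to_proteins[peptide].add(protein_id)
    return {pep: ids for pep, ids in peptide_to_proteins.items() if len(ids) > 1}


if __name__ == "__main__":
    for peptide, ids in degenerate_peptides(TOY_PROTEINS).items():
        print(peptide, sorted(ids))
```

With these toy sequences, only LSGEVWNEK is reported, shared by protA and protB. The K-P bond in protC is not cleaved, so protC yields a single unique peptide. Real digestion tools typically also allow missed cleavages and length or mass limits, which this sketch omits.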
Contents (1.22 GB)
Data
- 4998_Ground_Truth_Dataset
  - FASTAs: Organism FASTA files that were digested in silico to create the reference library.
  - e_data_4998.csv: Peptide expression data for each sample.
  - e_meta_4998.csv: Peptide metadata, including peptide-to-protein mappings.
  - f_data_4998.csv: Experiment metadata, including sample groupings.
  - scores_4998.csv: Peptide identification confidence scores from the MS-GF+ database search tool.
  - msnid_4998.RDS: Peptide identifications stored as an MSnID object (from the Bioconductor package MSnID) and saved as an RDS file.
- 5765_Validation_Dataset
  - (the same files as above, for the validation dataset)
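As a starting point for working with the peptide metadata, the following Python sketch (standard library only) collects the proteins each peptide maps to and separates unique from degenerate peptides. It assumes, hypothetically, that e_meta_4998.csv is in long format with one row per peptide-protein pair and columns named `Peptide` and `Protein`. The actual headers and layout may differ (for example, multiple proteins in one delimited field), so check the file header before using it. The inline CSV is toy input, not data from this package.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; replace with the actual headers in e_meta_4998.csv.
PEPTIDE_COL = "Peptide"
PROTEIN_COL = "Protein"


def proteins_per_peptide(handle, peptide_col=PEPTIDE_COL, protein_col=PROTEIN_COL):
    """Map each peptide to the set of proteins it is assigned to.

    Assumes one CSV row per peptide-protein pair (long format).
    """
    mapping = defaultdict(set)
    for row in csv.DictReader(handle):
        mapping[row[peptide_col]].add(row[protein_col])
    return dict(mapping)


def split_unique_and_degenerate(mapping):
    """Return (unique, degenerate) peptide lists, each sorted alphabetically."""
    unique = sorted(p for p, prots in mapping.items() if len(prots) == 1)
    degenerate = sorted(p for p, prots in mapping.items() if len(prots) > 1)
    return unique, degenerate


if __name__ == "__main__":
    # Toy stand-in for e_meta_4998.csv; not real data from this package.
    toy_e_meta = io.StringIO(
        "Peptide,Protein\n"
        "LSGEVWNEK,protA\n"
        "LSGEVWNEK,protB\n"
        "AAPDFTR,protA\n"
    )
    unique, degenerate = split_unique_and_degenerate(proteins_per_peptide(toy_e_meta))
    print("unique:", unique)
    print("degenerate:", degenerate)
```

To run it on the real file, the toy `StringIO` would be replaced by something like `open("4998_Ground_Truth_Dataset/e_meta_4998.csv", newline="")`, assuming the folder layout matches the listing above.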
Code
- peptide_analysis_figures_all_BJM_Oct2025.Rmd: R Markdown script used for the analysis and figure generation.
- R_session_info.txt: Environment information, including software package versions.
- Rmarkdown_complete.Rdata: R workspace containing all objects present after running the analysis script. Included so that code chunks that take several hours to run do not need to be rerun.
License
This data product is licensed under a Creative Commons Zero (“CC0”) Public Domain Dedication Waiver (https://creativecommons.org/publicdomain/zero/1.0/) in accordance with PNNL DataHub policy (https://data.pnnl.gov/policy).
Last updated by BJM on 02 November 2025.