Knowledge Graph of RB-Tnseq Data from Fitness Browser (KP-DP1)




Dataset Description

This dataset is a Knowledge Graph (KG) projection of a subset of RB-Tnseq experimental data and associated biological and experimental metadata from the Fitness Browser. The KG is hosted as a Neo4j graph database and contains three trained neural network models:

  • A multi-layer perceptron (MLP) with learned gene embeddings that predicts fitness from a gene–experiment–media triplet.
  • An MLP + Graph Attention Transformer (GAT) model that injects graph structure into the MLP by operating over gene–protein–function and media–chemical subgraphs, integrated via a gated residual mechanism. Its objective is link regression: predicting the fitness effect of a gene–experiment–media relationship.
  • An MLP + GraphSAGE encoder model with an architecture similar to the MLP + GAT model; its objective is also link regression.
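To make the triplet-plus-gated-residual idea concrete, here is a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the class and function names, embedding sizes, the two-layer head, and the exact gate form (h = e + g * z with g = sigmoid(W_g [e; z])) are assumptions chosen for illustration. The optional `gene_z` and `media_z` vectors stand in for outputs of a graph encoder (GAT or GraphSAGE) over the gene–protein–function and media–chemical subgraphs, and MSE is assumed as the link-regression loss.

```python
from __future__ import annotations

import math
import random


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class TripletFitnessModel:
    """Hypothetical sketch of a gene-experiment-media fitness regressor."""

    def __init__(self, n_genes, n_experiments, n_media, dim=8, hidden=16, seed=0):
        rng = random.Random(seed)
        # Learned embedding tables, one row per gene / experiment / media.
        self.gene_emb = rand_matrix(n_genes, dim, rng)
        self.exp_emb = rand_matrix(n_experiments, dim, rng)
        self.media_emb = rand_matrix(n_media, dim, rng)
        # Gate weights map [embedding; graph vector] (2*dim) to a dim-sized gate.
        self.w_gate_gene = rand_matrix(dim, 2 * dim, rng)
        self.w_gate_media = rand_matrix(dim, 2 * dim, rng)
        # Two-layer MLP head over the concatenated triplet (3*dim -> hidden -> 1).
        self.w1 = rand_matrix(hidden, 3 * dim, rng)
        self.w2 = rand_matrix(1, hidden, rng)

    @staticmethod
    def gated_residual(e, z, w_gate):
        """h = e + g * z with g = sigmoid(W_g [e; z]), applied elementwise."""
        gate = [sigmoid(v) for v in matvec(w_gate, e + z)]  # e + z concatenates lists
        return [ei + gi * zi for ei, gi, zi in zip(e, gate, z)]

    def predict(self, gene, experiment, media, gene_z=None, media_z=None):
        """Predict a scalar fitness value for one (gene, experiment, media) link."""
        g = self.gene_emb[gene]
        m = self.media_emb[media]
        if gene_z is not None:  # e.g. graph-encoder output for this gene
            g = self.gated_residual(g, gene_z, self.w_gate_gene)
        if media_z is not None:  # e.g. graph-encoder output for this media
            m = self.gated_residual(m, media_z, self.w_gate_media)
        x = g + self.exp_emb[experiment] + m  # concatenation, length 3*dim
        h = [max(0.0, v) for v in matvec(self.w1, x)]  # ReLU hidden layer
        return matvec(self.w2, h)[0]


def mse(preds, targets):
    """Mean squared error, one plausible link-regression loss (assumed)."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


if __name__ == "__main__":
    model = TripletFitnessModel(n_genes=3, n_experiments=2, n_media=2)
    print(model.predict(0, 1, 0))
```

One reason for a gated residual in this kind of design: with a zero graph vector the plain MLP embedding passes through unchanged, so the graph signal can only add information on top of the triplet baseline.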

The resulting graph supports two complementary inference modes: (1) symbolic graph traversal to surface candidate gene–environment and gene–chemical associations, and (2) learned inference using heterogeneous graph neural networks that propagate information across biological and environmental neighborhoods. 
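The first mode, symbolic traversal, amounts to walking typed paths such as gene → experiment → media → chemical and keeping paths whose fitness link is strong. Against the Neo4j database this would be a Cypher path pattern, but the KG's node labels and relationship types are not listed here, so the sketch below uses a hypothetical in-memory stand-in. All node IDs, relationship names (`HAS_FITNESS`, `USES_MEDIA`, `CONTAINS`), scores, and the threshold are placeholders, not data from the Fitness Browser.

```python
from collections import defaultdict

# Hypothetical toy graph. Node IDs, relationship names, and scores are placeholders,
# not data from the Fitness Browser or the KG.
HAS_FITNESS = {  # (gene, experiment) -> fitness score on the gene-experiment link
    ("gene_1", "exp_1"): -2.5,
    ("gene_1", "exp_2"): 0.1,
    ("gene_2", "exp_2"): -1.8,
}
USES_MEDIA = {"exp_1": "media_1", "exp_2": "media_2"}  # experiment -> media
CONTAINS = {"media_1": ["chem_1"], "media_2": ["chem_2", "chem_3"]}  # media -> chemicals


def candidate_gene_chemicals(has_fitness, uses_media, contains, threshold=1.0):
    """Walk gene -> experiment -> media -> chemical, keeping only strong fitness links."""
    hits = defaultdict(set)
    for (gene, experiment), score in has_fitness.items():
        if abs(score) < threshold:  # skip weak phenotypes
            continue
        media = uses_media.get(experiment)
        for chemical in contains.get(media, []):
            hits[gene].add(chemical)
    return dict(hits)


if __name__ == "__main__":
    print(candidate_gene_chemicals(HAS_FITNESS, USES_MEDIA, CONTAINS))
```

The output is a set of candidate associations, not evidence of a direct gene–chemical interaction; the second, learned inference mode is what scores such links quantitatively.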

Data Download Reference Citation:

Winston, Anthony; Donald, Sam; Purohit, Sumit; Patel, Kaizad; Egbert, Robert; & Waters, Katrina M (2026). Knowledge Graph of RB-Tnseq Data from Fitness Browser (KP-DP1).

Accessible Digital Data Downloads

The repository contains the following files:

  • neo4j.dump: a Neo4j database dump containing a subset of the full knowledge graph, built from the 10 target organisms
  • Readme: a walkthrough of the installation process
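The Readme is the authoritative install guide. As a hedged illustration only, assuming Neo4j 5.x (the dataset page does not state which Neo4j version produced the dump), a `.dump` file is restored with `neo4j-admin database load`, which reads `<database>.dump` from the directory given by `--from-path`; the shipped `neo4j.dump` therefore maps to the default database `neo4j`. Neo4j 4.x uses a different form (`neo4j-admin load --from=... --database=...`). The helper names below are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def neo4j_load_command(dump_dir: str, database: str = "neo4j",
                       overwrite: bool = False,
                       neo4j_admin: str = "neo4j-admin") -> list[str]:
    """Build argv for `neo4j-admin database load` (Neo4j 5.x syntax, assumed).

    Neo4j 5 reads `<database>.dump` from --from-path, so neo4j.dump
    loads into the database named `neo4j`.
    """
    argv = [neo4j_admin, "database", "load", f"--from-path={dump_dir}"]
    if overwrite:
        argv.append("--overwrite-destination=true")  # replace an existing database
    argv.append(database)
    return argv


def load_dump(dump_dir: str, database: str = "neo4j", overwrite: bool = False) -> None:
    """Run the load. Stop the Neo4j server first; the target database must not be running."""
    expected = Path(dump_dir) / f"{database}.dump"
    if not expected.is_file():
        raise FileNotFoundError(f"expected dump file at {expected}")
    subprocess.run(neo4j_load_command(dump_dir, database, overwrite), check=True)


if __name__ == "__main__":
    # Print the command instead of running it; "/path/to/download" is a placeholder.
    print(shlex.join(neo4j_load_command("/path/to/download")))
```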

Total Download Size: <to do>

Linked Primary Data

The Fitness Browser dataset can be found here: https://fit.genomics.lbl.gov/cgi-bin/myFrontPage.cgi

Funding Acknowledgments

The research data described here was funded in whole or in part by the Predictive Phenomics Initiative (PPI) at Pacific Northwest National Laboratory (PNNL). This work was conducted under the Laboratory Directed Research and Development Program at PNNL. PNNL is a multiprogram national laboratory operated by Battelle for the DOE under Contract No. DE-AC05-76RL01830.

Citation Policy

To enable discovery, reproducibility, and reuse of PPI-funded project datasets, and in accordance with data citation best practices (as outlined by the FORCE11 Data Citation Principles), we ask that all reuse of project data and metadata download materials acknowledge all primary and secondary dataset citations and corresponding journal articles where applicable.

Data Licensing

Creative Commons Attribution 4.0 International (CC BY 4.0)
