This repository contains data and code for the experiments run in the paper "Understanding Generative AI Content with Embedding Models" (https://arxiv.org/abs/2408.10437). DataBase POC: Max Vargas (max.vargas@pnnl.gov). The data is separated by experiment: A. The `stack_exchange` dataset contains a...
This repository contains data for the experiments run in the paper "Understanding Generative AI Content with Embedding Models" (https://arxiv.org/abs/2408.10437). DataBase POC: Max Vargas (max.vargas@pnnl.gov). The data is separated by experiment: A. The `stack_exchange` dataset contains a...
Extreme weather events, including fires, heatwaves (HWs), and droughts, have significant impacts on earth, environmental, and power and energy systems. Mechanistic and predictive understanding, as well as probabilistic risk assessment of these extreme weather events, are crucial for detecting, planning...
This dataset presents land surface parameters designed explicitly for global kilometer-scale Earth system modeling and has significant implications for enhancing our understanding of water, carbon, and energy cycles in the context of global change. Specifically, it includes four categories of...
HDF5 file containing 10,000 hydraulic transmissivity inputs and the corresponding hydraulic pressure field outputs for a two-dimensional saturated flow model of the Hanford Site. The inputs are generated by sampling a 1,000-dimensional Kosambi-Karhunen-Loève (KKL) model of the transmissivity field...
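The entry does not say which covariance kernel underlies the Hanford transmissivity model, so what follows is only a minimal sketch of how a truncated Kosambi-Karhunen-Loève expansion turns independent standard-normal coefficients into a correlated random field, f(x) = mean + Σ_k sqrt(λ_k) φ_k(x) ξ_k. As a stand-in (an assumption, not the dataset's model), it uses the Brownian-motion covariance C(s, t) = min(s, t) on [0, 1], whose eigenpairs λ_k = 1/((k - 1/2)²π²) and φ_k(x) = √2 sin((k - 1/2)πx) are known in closed form. Drawing 1,000 coefficients ξ_k mirrors the 1,000 dimensions mentioned in the entry; all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def kkl_eigenpair(k: int, x: float) -> Tuple[float, float]:
    """Return (lambda_k, phi_k(x)) for mode k >= 1 of the STAND-IN covariance
    C(s, t) = min(s, t) on [0, 1]; not the Hanford model's kernel."""
    w = (k - 0.5) * math.pi
    return 1.0 / (w * w), math.sqrt(2.0) * math.sin(w * x)


def kkl_sample(xs: Sequence[float], xi: Sequence[float], mean: float = 0.0) -> List[float]:
    """Evaluate mean + sum_k sqrt(lambda_k) * phi_k(x) * xi_k at each point x.

    The number of retained modes (the model's dimension) is len(xi)."""
    field = []
    for x in xs:
        value = mean
        for k, z in enumerate(xi, start=1):
            lam, phi = kkl_eigenpair(k, x)
            value += math.sqrt(lam) * phi * z
        field.append(value)
    return field


def kkl_pointwise_variance(x: float, n_modes: int) -> float:
    """Variance of the truncated expansion at x: sum_k lambda_k * phi_k(x)**2.
    By Mercer's theorem this approaches C(x, x) = x as n_modes grows."""
    total = 0.0
    for k in range(1, n_modes + 1):
        lam, phi = kkl_eigenpair(k, x)
        total += lam * phi * phi
    return total


if __name__ == "__main__":
    rng = random.Random(0)                            # fixed seed for reproducibility
    xi = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # 1,000 standard-normal coefficients
    xs = [i / 100 for i in range(101)]                # 101 grid points on [0, 1]
    field = kkl_sample(xs, xi)
    print(len(field), kkl_pointwise_variance(0.5, 1000))
```

Each new draw of `xi` yields a new input field; a real 2-D transmissivity model would replace `kkl_eigenpair` with eigenpairs computed for its own covariance and grid.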
Please cite as: McClure R.S., Y. Farris, R.E. Danczak, W.C. Nelson, H. Song, A. Kessler, and J. Lee, et al. 2022. Metatranscriptomic data from MSC-2. [Data Set] PNNL DataHub. https://data.pnnl.gov/group/nodes/dataset/33232
Metatranscriptomic data from MSC-2: 12 FASTQ files (6 forward read, 6 reverse...
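The entry lists 12 FASTQ files, 6 holding forward reads and 6 holding reverse reads. A minimal sketch for handling such paired-end data in Python follows: it reads standard four-line FASTQ records and groups file names into forward/reverse pairs. The `_R1`/`_R2` naming convention, the function names, and the example file names are assumptions for illustration; the entry does not state how its files are named.

```python
import io
import re
from typing import Dict, Iterable, Iterator, List, NamedTuple, TextIO, Tuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:                 # end of file
            return
        if not header.strip():         # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


# HYPOTHETICAL naming convention: <sample>_R1<suffix> (forward), <sample>_R2<suffix> (reverse).
_MATE = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])(?P<suffix>.*)$")


def pair_read_files(filenames: Iterable[str]) -> List[Tuple[str, str]]:
    """Group file names into (forward, reverse) pairs, sorted by sample key."""
    groups: Dict[str, Dict[str, str]] = {}
    for name in filenames:
        match = _MATE.match(name)
        if match is None:
            raise ValueError(f"cannot tell forward from reverse: {name}")
        key = match["sample"] + match["suffix"]
        groups.setdefault(key, {})[match["mate"]] = name
    pairs = []
    for key in sorted(groups):
        mates = groups[key]
        if set(mates) != {"1", "2"}:
            raise ValueError(f"unpaired read file(s) for {key}: {sorted(mates.values())}")
        pairs.append((mates["1"], mates["2"]))
    return pairs


if __name__ == "__main__":
    example = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2/1\nGGCA\n+\nHHHH\n")
    print([record.name for record in read_fastq(example)])
    print(pair_read_files(["s2_R1.fastq", "s1_R2.fastq", "s1_R1.fastq", "s2_R2.fastq"]))
```

Pairing by file name before parsing lets a mismatch (for example, a missing reverse file) surface before any reads are processed.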
ProxyTSPRD profiles are collected using NVIDIA Nsight Systems version 2020.3.2.6-87e152c and capture computational patterns from training deep learning-based time-series proxy-applications on four different levels: models (Long Short-Term Memory and Convolutional Neural Network), DL frameworks...
This year’s VAST Challenge focuses on visual analytics applications for both large-scale situation analysis and cyber security. We have two mini-challenges to test your analytical skills and confound your visual analytics applications. In the first mini-challenge, (the imaginary) BankWorld's largest...
The VAST 2010 Challenge consisted of three mini-challenges (MC) and one Grand Challenge (GC). Each MC had a data set, instructions and a number of questions to be answered. The GC required participants to pull together information from all three data sets and write a debrief summarizing the...
The VAST 2009 Challenge scenario concerned a fictitious cyber security event. An employee leaked important information from an embassy to a criminal organization. Participants were asked to discover the identity of the employee and the structure of the criminal organization. Participants were...
Mini Challenge 1: Wiki Editors
The Paraiso movement is controversial and is having considerable social impact in a specific area of the world. We have extracted a segment of the Paraiso (the movement) Wikipedia edits page. Please note this is not the Paraiso Manifesto Wiki page which is part of the...
It is Fall of 2004 and one of your analyst colleagues has been called away from her current tasks to an emergency. The boss has given you the assignment of picking up her investigation and completing her task. She has been asked to pursue a line of investigation into some unexpected activities...
Dataset
The dataset will consist of:
- About 1200 news stories from the Alderwood Daily News plus a few other items collected by the previous investigators
- A few photos
- A few maps of Alderwood and vicinity (in bitmap image form)
- A few files with other mixed materials, e.g. a spreadsheet with voter...
This data was generated by the organization IvySys. Activities can be phone calls, transactions, or any other type of communications. Most of the files are of type .edges, .rdf, or .csv, but all can be opened in a text editor. A good introduction to this data can be found in \Tutorial1\MAA...
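Since the IvySys files are plain text and each activity links two parties, one natural way to explore them is as a directed graph. The sketch below assumes a hypothetical .edges layout of one activity per line with whitespace-separated source and target identifiers (extra columns ignored, `#` lines treated as comments); the real column layout is described in the tutorial the entry points to and may differ. Function names are illustrative only.

```python
from typing import Dict, Iterable, Set


def read_edges(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Build a directed adjacency map {source: {targets}} from edge-list lines.

    HYPOTHETICAL format: "<source> <target> [extra fields...]".
    Blank lines and lines starting with '#' are skipped.
    """
    adjacency: Dict[str, Set[str]] = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"line {number}: expected at least two fields, got {line!r}")
        source, target = fields[0], fields[1]
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())  # keep nodes that only ever receive contacts
    return adjacency


def out_degrees(adjacency: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of distinct parties each node contacted."""
    return {node: len(targets) for node, targets in adjacency.items()}


if __name__ == "__main__":
    sample = ["# source target", "alice bob 2004-09-01", "alice carol", "", "carol alice"]
    graph = read_edges(sample)
    print(out_degrees(graph))
```

Because `read_edges` accepts any iterable of lines, an open file handle can be passed to it directly.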
Current methods for life cycle inventory (LCI) data often require significant time and effort from manufacturers to compile. Additionally, without a uniform methodology these can result in inconsistent datasets that are difficult for practitioners to evaluate and compare, even within similar product...