The following items will be uploaded to the DataHub.

#####################
./npy_blocks/

Our image dataset is stored as FP32 numpy arrays. I have pre-sorted the arrays into static training-validation-testing splits. The npy_blocks data represent the main training input to the neural network: images of galaxies taken from the GALEX, PanSTARRS, and WISE surveys. Each subset has been further split into 16 chunks, one for each of the 16 workers running training. The *_indices.npy files are integer numpy arrays that record, for each subset and chunk, the mantis_shrimp_index of each object. This allows us to query our spectroscopic data for targets.

./npy_blocks/
----train_indices.npy
----train/
    ----mantis_shrimp_*.npy [* in {0,...,15}]
----val_indices.npy
----val/
    ----mantis_shrimp_*.npy [* in {0,...,15}]
----test_indices.npy
----test/
    ----mantis_shrimp_*.npy [* in {0,...,15}]

######################
./spectroscopy/

The spectroscopy directory contains comma-separated-values (.csv) files and binary Python pickle (.pkl) files of our learning targets plus additional metadata about each source. Each file has an index representing the mantis_shrimp_index, along with additional metadata. The metadata fields are among the following:

a) redshift: a number describing the cosmological redshift of the galaxy
b) ra: right ascension, describing the position of the galaxy on the sky
c) dec: declination, describing the position of the galaxy on the sky
d) extinction values (ebv): describing the reddening due to dust along the line of sight between us and the galaxy
e) photosurvey_ID: if the original spectroscopic survey had an objID, I include it here.
   This can be either a big-int or text, depending on the original survey.
f) survey: text describing the original source of the target
g) photoz: a number representing another project's predicted photometric redshift of the object
h) photoz_err: a number representing the uncertainty of another project's predicted photometric redshift of the object
i) prob_gal: BeckPZ also reports the probability of an object being a galaxy (rather than a star or QSO).
j) a value representing whether the WISExPS1 photometry is well represented in the original training dataset used in Beck et al 2023

Spectroscopy is taken from public astronomical spectroscopic surveys: the Sloan Digital Sky Survey (SDSS), the Dark Energy Spectroscopic Instrument survey (DESI), DEEP2, GAMA, VVDS, VIPERS, 6dF, and WiggleZ. Citations for these sources are provided in Table 2 of our public report, https://arxiv.org/pdf/2402.03535.pdf.

./spectroscopy/
----redshifts.pkl
----redshifts_withextinction.pkl
----./redshifts_broken_beck/
    ----chunk*.csv [* in {1,...,8}]

#####################
./masks/

The masks directory contains an "exists_mask", a numpy boolean array indicating whether an object exists given our queried cutouts. Essentially, I chose to represent objects that do not have available photometry as all zeros rather than remove them from the dataset entirely. As a result, the npy_blocks/* files contain some images that are all zeros. A natural thing to do is to simply remove these all-zero images in preprocessing.

./masks/
----exists_mask.npy

#####################
./model_weights/

Contains a directory of model weights to be trained. Currently empty. It will contain both the SSL pre-trained weights and the fine-tuned weights of our models.

#####################
./etc/

Contains information queried from online, public astronomical surveys, to be matched onto our mantis_shrimp_index. For example, it contains the photon flux received by the telescope from an object that exists in our image dataset.
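
#####################
Example: removing all-zero images

The preprocessing step suggested under ./masks/ could be sketched roughly as below. This is a minimal illustration, not the code used in the project. It assumes that an image chunk has shape (n_objects, ...) and that the matching index array lines up with it row by row; neither is stated above. The function name drop_empty_images is hypothetical, and the demo uses synthetic data rather than the real files.

```python
import numpy as np


def drop_empty_images(images, indices):
    """Drop all-zero placeholder images and the matching index entries.

    Assumption (not confirmed by the README): images has shape
    (n_objects, ...) and indices has shape (n_objects,), aligned row by row.
    Returns the filtered images, the filtered indices, and the boolean
    keep-mask that was applied.
    """
    images = np.asarray(images)
    indices = np.asarray(indices)
    if len(images) != len(indices):
        raise ValueError("images and indices must have the same length")
    # Objects without available photometry are stored as all zeros,
    # so keep only the rows that contain at least one non-zero pixel.
    pixel_axes = tuple(range(1, images.ndim))
    keep = np.any(images != 0, axis=pixel_axes)
    return images[keep], indices[keep], keep


if __name__ == "__main__":
    # Synthetic stand-in for one chunk: 4 objects, 2 channels, 3x3 pixels.
    rng = np.random.default_rng(0)
    demo_images = rng.random((4, 2, 3, 3), dtype=np.float32)
    demo_images[1] = 0.0  # simulate an object with no available photometry
    demo_indices = np.array([10, 11, 12, 13])  # made-up mantis_shrimp_index values

    kept_images, kept_indices, keep = drop_empty_images(demo_images, demo_indices)
    print(kept_images.shape, kept_indices)
```

With the real data, the inputs would presumably come from np.load on a chunk such as ./npy_blocks/train/mantis_shrimp_0.npy (optionally with mmap_mode="r" to avoid reading the whole file into memory) and on the corresponding slice of train_indices.npy. How the index file is laid out per chunk should be checked before relying on the row-by-row alignment assumed here.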