Mission Overview
Mock Image Training Sets for DeepMerge ("DEEPMERGE")
Primary Investigator: Aleksandra Ćiprijanović
HLSP Authors: Aleksandra Ćiprijanović, Gregory Snyder
Released: 2022-03-03
Updated: 2022-03-03
Primary Reference(s): Ćiprijanović et al. 2020
Citations: See ADS Metrics
Overview
To investigate the use of Convolutional Neural Networks (CNNs) for distinguishing between simulated merging and non-merging galaxies at z=2, the authors created two versions of mock data mimicking Hubble Space Telescope and James Webb Space Telescope observations: pristine (simulated galaxy images with PSF blurring) and noisy (simulated galaxy images with PSF blurring and observational noise). The CNN model reaches a test-set accuracy of 79% on the pristine mock data and 76% on the noisy mock data. The CNN outperforms a Random Forest classifier (Snyder et al. 2019), which was itself shown to be superior to the conventional one- or two-dimensional statistical methods commonly used to classify merging galaxies (for example, the Concentration, Asymmetry, Gini, and M20 statistics).
These data were derived from the z=2 snapshot of the Illustris-1 simulation from the Illustris Project. Initial mock images and merger labels were described in detail by Torrey et al. (2015) and Snyder et al. (2019). The merger labels were created by analyzing merger trees based on Rodriguez-Gomez et al. (2015).
Data Products
This HLSP contains two copies of the DEEPMERGE training set data, one copy stored in multi-extension FITS files, and one copy stored as simple .npy files for compatibility with the DEEPMERGE example notebook(s).
Four sets of training data are provided: pristine mock images and noisy mock images, each with a two-filter dataset and a three-filter dataset. Image data are stored in .npy files ending with "-x.npy", where "x" indicates feature data. Merger labels are stored in .npy files ending with "labels-y.npy". The FITS files combine the image/feature data and the merger labels into two-extension FITS files, where the image data are in the primary HDU ("Images") and the merger labels are stored in a binary table extension ("MergerLabels").
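As an illustration of working with the .npy copies, the following is a minimal sketch of loading an image file together with its label file and checking that the two line up. The helper name `load_training_arrays`, the demo file names, and the 0/1 label encoding are assumptions made for this example and are not taken from the HLSP itself. The demonstration uses small synthetic arrays rather than the real files.

```python
import os
import tempfile

import numpy as np


def load_training_arrays(x_path, y_path):
    """Load an image array and its merger labels, and check they line up.

    Hypothetical helper: the paths are supplied by the caller.
    """
    images = np.load(x_path)
    labels = np.load(y_path)
    # One label is expected per image, so the leading axes must agree.
    if labels.shape[0] != images.shape[0]:
        raise ValueError(
            f"{images.shape[0]} images but {labels.shape[0]} labels"
        )
    return images, labels


# Demonstration with small synthetic stand-ins, not the real HLSP files.
with tempfile.TemporaryDirectory() as tmp:
    x_path = os.path.join(tmp, "demo-x.npy")
    y_path = os.path.join(tmp, "demo-y.npy")
    np.save(x_path, np.zeros((4, 2, 8, 8)))  # (images, filters, y, x)
    np.save(y_path, np.array([0, 1, 0, 1]))  # assumed 0/1 encoding
    images, labels = load_training_arrays(x_path, y_path)

print(images.shape, labels.shape)
```

For the real files, the same call would take the paths of a downloaded "-x.npy" file and its matching label file.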
In all files, the dimensionality of the image data arrays is given by (N_images = 15,426, N_filters = 2 or 3, N_pixels = 75, N_pixels = 75), where N_images corresponds to the number of images generated from the Illustris z=2 snapshot (snapshot number 068) including multiple viewing angles and augmentation, N_filters corresponds to the number of imaging filters (2 or 3 depending on file), and N_pixels corresponds to the number of pixels in each image dimension (75). All mock images have been convolved with model point-spread functions (PSFs) appropriate for the associated filter. Noisy mock images have had noise added such that the limiting surface brightness is approximately 25 magnitudes per square arcsecond in each filter. Finally, all images were re-binned to 75 pixels per side. Images span 120 physical kpc on a side, and each pixel spans approximately 0.1875 arcsec, assuming the source is at z=2 with the Illustris-1 cosmology. Image units are microjanskies per square arcsecond.
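The image geometry quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable names and the helper `per_pixel_flux_ujy` are illustrative assumptions. The kpc-per-arcsec value is derived from the quoted figures rather than from a cosmology calculation.

```python
# Values quoted in the text above.
IMAGE_SIDE_KPC = 120.0       # physical extent of each image side
PIXELS_PER_SIDE = 75         # after re-binning
PIXEL_SCALE_ARCSEC = 0.1875  # approximate angular size of one pixel at z=2

# Physical size of one pixel.
kpc_per_pixel = IMAGE_SIDE_KPC / PIXELS_PER_SIDE

# Angular extent of a full image side.
image_side_arcsec = PIXELS_PER_SIDE * PIXEL_SCALE_ARCSEC

# Physical scale implied by the two quoted numbers (not computed from
# the Illustris-1 cosmology directly).
implied_kpc_per_arcsec = kpc_per_pixel / PIXEL_SCALE_ARCSEC

# Solid angle of one pixel, useful because image units are
# microjanskies per square arcsecond.
pixel_area_arcsec2 = PIXEL_SCALE_ARCSEC ** 2


def per_pixel_flux_ujy(surface_brightness_ujy_per_arcsec2):
    """Convert a surface brightness to the flux falling in one pixel."""
    return surface_brightness_ujy_per_arcsec2 * pixel_area_arcsec2


print(f"{kpc_per_pixel:.2f} kpc per pixel")
print(f"{image_side_arcsec:.4f} arcsec per image side")
print(f"{implied_kpc_per_arcsec:.2f} kpc per arcsec (implied)")
```

Multiplying a pixel value by the pixel area in this way gives a flux in microjanskies, which may be the more convenient quantity when summing over an aperture.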
The FITS files have the following naming convention:
hlsp_deepmerge_hst_<instrument>_illustris-z2_<filters>_v1_sim-<type>.fits
where:
- <instrument> = the set of instruments for the two ("acs-wfc3") and three ("acs-wfc3-nircam") filter sets.
- <filters> = the set of filters for the two ("f814w-f160w") and three ("f814w-f160w-f356w") filter sets.
- <type> = the version of the mock images, either "pristine" or "noisy".
The .npy files have the following naming convention:
hlsp_deepmerge_hst_<instrument>_illustris-z2_<filters>_v1_sim-<type>-<set>.npy
where:
- <instrument> = the set of instruments for the two ("acs-wfc3") and three ("acs-wfc3-nircam") filter sets.
- <filters> = the set of filters for the two ("f814w-f160w") and three ("f814w-f160w-f356w") filter sets.
- <type> = the version of the mock images, either "pristine" or "noisy".
- <set> = the images ("x") or the merger labels ("y").
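The two naming conventions above can be expressed as a small helper that assembles a file name from its parts. This is a sketch that follows the templates exactly as written here; the function name `deepmerge_filename` and its arguments are hypothetical, and the result should be checked against the actual file listing before use.

```python
# Instrument and filter strings for each filter set, as listed above.
FILTER_SETS = {
    2: ("acs-wfc3", "f814w-f160w"),
    3: ("acs-wfc3-nircam", "f814w-f160w-f356w"),
}
IMAGE_TYPES = ("pristine", "noisy")
NPY_SETS = ("x", "y")


def deepmerge_filename(n_filters, image_type, npy_set=None):
    """Build a file name from the templates above (hypothetical helper).

    With npy_set=None the FITS name is returned; with "x" or "y" the
    corresponding .npy name is returned.
    """
    if n_filters not in FILTER_SETS:
        raise ValueError("n_filters must be 2 or 3")
    if image_type not in IMAGE_TYPES:
        raise ValueError("image_type must be 'pristine' or 'noisy'")
    instrument, filters = FILTER_SETS[n_filters]
    stem = (
        f"hlsp_deepmerge_hst_{instrument}_illustris-z2_"
        f"{filters}_v1_sim-{image_type}"
    )
    if npy_set is None:
        return stem + ".fits"
    if npy_set not in NPY_SETS:
        raise ValueError("npy_set must be 'x', 'y', or None")
    return f"{stem}-{npy_set}.npy"


print(deepmerge_filename(2, "noisy"))
print(deepmerge_filename(3, "pristine", "x"))
```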
Data file types:
File suffix | Description |
---|---|
_.fits | Two- and three-filter mock images and merger labels. |
_.npy | Mock images ("x") and merger labels ("y") in .npy format. |
Data Access
Files can be downloaded directly from https://archive.stsci.edu/hlsps/deepmerge. The two-filter noisy and pristine image sets are each ~1.4 GB in size, and the three-filter noisy and pristine image sets are each ~2.0 GB in size. Links for downloading the individual datasets are included in the following table:
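The quoted file sizes can be compared against the array dimensions given under Data Products. The estimate below is a rough sketch that assumes 8 bytes per pixel value (64-bit floats) and ignores headers and label data. The data type is an assumption, not something stated on this page, and the function name `image_array_bytes` is illustrative.

```python
def image_array_bytes(n_filters, n_images=15426, n_pixels=75,
                      bytes_per_value=8):
    """Estimate the raw size of one image array in bytes.

    bytes_per_value=8 assumes 64-bit floats; this is an assumption.
    """
    return n_images * n_filters * n_pixels * n_pixels * bytes_per_value


two_filter_bytes = image_array_bytes(2)
three_filter_bytes = image_array_bytes(3)

print(f"two-filter:   {two_filter_bytes / 1e9:.2f} GB")
print(f"three-filter: {three_filter_bytes / 1e9:.2f} GB")
```

Under that assumption the estimates land close to the quoted ~1.4 GB and ~2.0 GB, which also gives a sense of the memory needed to hold one full image set at once.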
Datasets | Instruments | Filters | FITS Files | Numpy Files |
---|---|---|---|---|
Two-filter | HST/ACS, HST/WFC3 | F814W, F160W | | |
Three-filter | HST/ACS, HST/WFC3, JWST/NIRCam | F814W, F160W, F356W | | |