Mission Overview

Mock Image Training Sets for DeepMerge ("DEEPMERGE")


Primary Investigator: Aleksandra Ćiprijanović

HLSP Authors: Aleksandra Ćiprijanović, Gregory Snyder

Released: 2022-03-03

Updated: 2022-03-03

Primary Reference(s): Ćiprijanović et al. 2020

DOI: 10.17909/t9-vqk6-pc80

Citations: See ADS Metrics


Top panel: Galaxy image examples drawn from the pristine test dataset. Middle panel: The same galaxy images as in the top panel, but from the "noisy" test sample. Bottom panel: The same galaxy images as in the top panel, but drawn with logarithmic colormap normalization. The top and middle panels also include the output values of the convolutional neural network (CNN). Non-mergers ("negative class", N) are identified when the CNN outputs a value below 0.5, while mergers ("positive class", P) receive values above 0.5. The authors present examples of true positive (TP), false positive (FP), true negative (TN), and false negative (FN) classifications.
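The thresholding rule in the caption can be sketched in a few lines of Python. This is only an illustration, not the authors' evaluation code: the function names and the scores are made up, and treating a score of exactly 0.5 as a non-merger is an assumption, since the text only specifies values below and above the threshold.

```python
import numpy as np


def confusion_counts(scores, labels, threshold=0.5):
    """Count TP, FP, TN, FN for CNN output scores and true merger labels.

    Assumption: a score strictly above `threshold` is a predicted merger;
    a score exactly at the threshold is treated as a non-merger.
    """
    scores = np.asarray(scores, dtype=float)
    is_merger = np.asarray(labels).astype(bool)
    predicted = scores > threshold
    return {
        "TP": int(np.sum(predicted & is_merger)),
        "FP": int(np.sum(predicted & ~is_merger)),
        "TN": int(np.sum(~predicted & ~is_merger)),
        "FN": int(np.sum(~predicted & is_merger)),
    }


def accuracy(counts):
    """Fraction of images classified correctly."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Made-up scores and labels (1 = merger, 0 = non-merger), one per outcome.
scores = [0.91, 0.62, 0.08, 0.37]
labels = [1, 0, 0, 1]
counts = confusion_counts(scores, labels)
print(counts)
print(accuracy(counts))
```

With these four invented examples, each of the four outcomes occurs once, so the accuracy is 0.5.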


To investigate the use of Convolutional Neural Networks (CNNs) for distinguishing between simulated merging and non-merging galaxies at z=2, the authors created two versions of mock data mimicking Hubble Space Telescope and James Webb Space Telescope observations: pristine (simulated galaxy images with PSF blurring) and noisy (simulated galaxy images with PSF blurring and observational noise). The accuracy of the CNN model on the test set is 79% on the pristine mock data and 76% on the noisy mock data. The CNN outperforms a Random Forest classifier (Snyder et al. 2019). That classifier was in turn shown to be superior to the conventional one- or two-dimensional statistical methods commonly used to classify merging galaxies (concentration, asymmetry, Gini, and M20 statistics, among others).

These data were derived from the z=2 snapshot of the Illustris-1 simulation from the Illustris Project. Initial mock images and merger labels were described in detail by Torrey et al. (2015) and Snyder et al. (2019). The merger labels were created by analyzing merger trees based on Rodriguez-Gomez et al. (2015).

Data Products

This HLSP contains two copies of the DEEPMERGE training set data, one copy stored in multi-extension FITS files, and one copy stored as simple .npy files for compatibility with the DEEPMERGE example notebook(s).

Four sets of training data are provided: pristine mock images and noisy mock images, each with a two-filter dataset and a three-filter dataset. Image data is stored in .npy files ending with "-x.npy", where x indicates feature data. Merger labels are stored in an .npy file ending with "labels-y.npy". The FITS files combine the image/feature data and the merger label data into two-extension FITS files, where the image data is the primary HDU ("Images") and the merger labels are stored as a binary table extension ("MergerLabels").
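As a rough sketch of how an image file and its label file might be read together, the Python example below loads a pair of .npy files and checks that they line up. The file names are hypothetical stand-ins written to a temporary directory, not the real HLSP file names. The helper `load_image_label_pair` is illustrative, and the check that the labels have one entry per image along the first axis is an assumption about the label layout.

```python
import os
import tempfile

import numpy as np


def load_image_label_pair(x_path, y_path):
    """Load an image array and its merger labels, checking that they match."""
    # mmap_mode="r" maps the file instead of reading a multi-gigabyte
    # array into memory all at once.
    images = np.load(x_path, mmap_mode="r")
    labels = np.load(y_path)
    if images.ndim != 4:
        raise ValueError(f"expected a 4-D image array, got shape {images.shape}")
    # Assumption: the label array has one entry per image along axis 0.
    if images.shape[0] != labels.shape[0]:
        raise ValueError("number of images and number of labels differ")
    return images, labels


# Demonstration on small synthetic stand-ins with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    x_path = os.path.join(tmp, "example-x.npy")
    y_path = os.path.join(tmp, "example-labels-y.npy")
    np.save(x_path, np.zeros((10, 3, 75, 75)))  # 10 three-filter images
    np.save(y_path, np.arange(10) % 2)          # alternating 0/1 labels
    images, labels = load_image_label_pair(x_path, y_path)
    demo_shape = images.shape
    demo_labels = np.array(labels)
    del images  # release the memory map before the directory is removed

print(demo_shape)
print(int(demo_labels.sum()), "of", demo_labels.size, "labelled as mergers")
```

Memory mapping is a design choice for the full-size files, which are large enough that loading them eagerly may be inconvenient on a laptop.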

In all files, the dimensionality of the image data arrays is given by (N_images = 15,426, N_filters = 2 or 3, N_pixels = 75, N_pixels = 75), where N_images corresponds to the number of images generated from the Illustris z=2 snapshot (snapshot number 068) including multiple viewing angles and augmentation, N_filters corresponds to the number of imaging filters (2 or 3 depending on file), and N_pixels corresponds to the number of pixels in each image dimension (75). All mock images have been convolved with model point-spread functions (PSFs) appropriate for the associated filter. Noisy mock images have had noise added such that the limiting surface brightness is approximately 25 magnitudes per square arcsecond in each filter. Finally, all images were re-binned to 75 pixels per side. Images span 120 physical kpc on a side, and each pixel spans approximately 0.1875 arcsec, assuming the source is at z=2 with the Illustris-1 cosmology. Image units are microjanskies per square arcsecond.
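The image geometry quoted above can be cross-checked with a little arithmetic. The sketch below uses only the numbers given in the text (75 pixels per side, 120 physical kpc per side, 0.1875 arcsec per pixel). The helper `pixel_flux_microjy` is a hypothetical name; it simply multiplies a surface brightness by the pixel area, which follows from the stated image units.

```python
N_PIXELS = 75          # pixels per image side
SIDE_KPC = 120.0       # physical size of an image side, in kpc
PIXEL_ARCSEC = 0.1875  # approximate angular size of one pixel, in arcsec

# Physical size of one pixel.
kpc_per_pixel = SIDE_KPC / N_PIXELS

# Angular size of a full image side.
side_arcsec = N_PIXELS * PIXEL_ARCSEC

# Implied angular scale at the source redshift.
kpc_per_arcsec = SIDE_KPC / side_arcsec

# Solid angle covered by one pixel.
pixel_area_arcsec2 = PIXEL_ARCSEC ** 2


def pixel_flux_microjy(surface_brightness):
    """Convert a pixel value in microjanskies per square arcsecond
    into the flux (in microjanskies) falling in that pixel."""
    return surface_brightness * pixel_area_arcsec2


print(f"{kpc_per_pixel:.2f} kpc per pixel")
print(f"{side_arcsec:.4f} arcsec per image side")
print(f"{kpc_per_arcsec:.2f} kpc per arcsec")
print(f"{pixel_area_arcsec2:.8f} arcsec^2 per pixel")
```

Each pixel therefore spans 1.6 kpc, each image side spans about 14.06 arcsec, and the implied scale is roughly 8.5 kpc per arcsecond.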

The FITS files have the following naming convention:



  • <instrument> = the set of instruments for the two ("acs-wfc3") and three ("acs-wfc3-nircam") filter sets.
  • <filters> = the set of filters for the two ("f814w-f160w") and three ("f814w-f160w-f356w") filter sets.
  • <type> = the version of the mock images, either "pristine" or "noisy".

The .npy files have the following naming convention:



  • <instrument> = the set of instruments for the two ("acs-wfc3") and three ("acs-wfc3-nircam") filter sets.
  • <filters> = the set of filters for the two ("f814w-f160w") and three ("f814w-f160w-f356w") filter sets.
  • <type> = the version of the mock images, either "pristine" or "noisy".
  • <set> = the images ("x") or the merger labels ("y").

Data file types:

_.fits Two- and three-filter mock images and merger labels.
_.npy Mock images ("x") and merger labels ("y") in .npy format.

Data Access

Files can be downloaded directly from https://archive.stsci.edu/hlsps/deepmerge. The two-filter noisy and pristine image sets are each ~1.4 GB in size, and the three-filter noisy and pristine image sets are each ~2.0 GB in size. Links for downloading the individual datasets are included in the following table:
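The quoted file sizes can be roughly reproduced from the array dimensions. The sketch below assumes that each pixel value occupies 8 bytes (double precision). That assumption is not stated in the source; it is chosen because it gives sizes close to the quoted ones, and it ignores file headers and the label data.

```python
N_IMAGES = 15426     # images per dataset
N_PIXELS = 75        # pixels per image side
BYTES_PER_VALUE = 8  # assumption: 64-bit floats; not stated in the source


def image_array_bytes(n_filters, bytes_per_value=BYTES_PER_VALUE):
    """Approximate size in bytes of the raw image array for one dataset."""
    return N_IMAGES * n_filters * N_PIXELS * N_PIXELS * bytes_per_value


for n_filters in (2, 3):
    size_gb = image_array_bytes(n_filters) / 1e9
    print(f"{n_filters}-filter set: about {size_gb:.2f} GB")
```

Under this assumption the two-filter arrays come to about 1.39 GB and the three-filter arrays to about 2.08 GB, in line with the approximate sizes given above.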

Datasets | Instruments | Filters | FITS Files | Numpy Files
Two-filter | HST/ACS, HST/WFC3 | F814W, F160W | | Noisy images, labels (same for all); Pristine images
Three-filter | HST/ACS, HST/WFC3, JWST/NIRCam | F814W, F160W, F356W | | Noisy images; Pristine images


Please remember to cite the appropriate paper(s) below and the DOI if you use these data in a published work. 

Note: These HLSP data products are licensed for use under CC BY 4.0.