A framework for testing and benchmarking machine learning methods on astronomical data
Hello Universe is a new project at MAST (the Mikulski Archive for Space Telescopes) designed to help astronomers develop machine learning (ML) methods for astronomical discovery. ML is expected to be an essential tool for analyzing the rich data sets of the coming decade, and Hello Universe provides a framework for testing and benchmarking ML algorithms and new techniques. Each entry in the Hello Universe collection includes:
- Data: a high-level science product (HLSP) data set for testing and benchmarking ML algorithms
- Code: a tutorial Jupyter notebook that provides step-by-step examples of how to apply an ML technique to the data
Although these data sets were designed with novice data science learners in mind, they are suitable for a wide range of tasks. Hello Universe entries include examples of:
- analyzing 2D (image) and 1D (vector or light curve) data sets.
- applying techniques for regression and for classification.
- developing supervised and unsupervised learning models.
- using best practices for training and optimizing models.
- selecting metrics for assessing model performance.
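As a small illustration of the last point, the sketch below computes a confusion matrix, accuracy, precision, and recall in plain Python. It is not taken from any Hello Universe notebook. The function names and the `merger`/`nonmerger` labels are hypothetical, chosen only for illustration.

```python
def confusion_matrix(y_true, y_pred, labels):
    """counts[i][j] = number of samples whose true label is labels[i]
    and whose predicted label is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        counts[index[truth]][index[pred]] += 1
    return counts


def accuracy(cm):
    """Fraction of all samples that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def precision_recall(cm, k):
    """Precision and recall for class index k, treated as the 'positive' class."""
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(len(cm))) - tp  # rest of column k
    fn = sum(cm[k]) - tp                              # rest of row k
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    labels = ["merger", "nonmerger"]  # hypothetical class names
    y_true = ["merger", "merger", "nonmerger", "nonmerger", "merger", "nonmerger"]
    y_pred = ["merger", "nonmerger", "nonmerger", "merger", "merger", "nonmerger"]
    cm = confusion_matrix(y_true, y_pred, labels)
    print(cm)  # [[2, 1], [1, 2]]
    print(accuracy(cm))
    print(precision_recall(cm, 0))
```

Reading the matrix row by row shows which classes are confused with which. This is often more informative than a single accuracy number when the classes are imbalanced.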
Entries
Classifying JWST/HST galaxy mergers with CNNs
neural networks | 2D data | classification | overfitting | confusion matrix
Predicting 3D-HST redshift with decision trees
decision trees | 1D data | regression | cross-validation
Classifying Pan-STARRS with (un)supervised learning
classification | 1D data | PCA | t-SNE | k-means | SGD | unsupervised | supervised
Interpreting Convolutional Neural Networks
interpretation | 2D data | unsupervised | neural networks | CNNs
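The 3D-HST redshift entry lists cross-validation among its techniques. The sketch below is a generic illustration of k-fold cross-validation in plain Python, not the notebook's code. The function names, the mean-predictor baseline (standing in for a decision tree), and the mean-squared-error score are all assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k folds whose
    sizes differ by at most one. Assumes 1 <= k <= n_samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds, start = [], 0
    for i in range(k):
        # The first (n_samples % k) folds each take one extra sample.
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def cross_validate(x, y, fit, predict, k=5, seed=0):
    """Hold out each fold in turn, train on the rest, and return the
    mean squared error on every held-out fold.

    fit(x_train, y_train) -> model
    predict(model, x_test) -> list of predictions
    """
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [x[i] for i in test_idx])
        errors = [(p - y[i]) ** 2 for p, i in zip(preds, test_idx)]
        scores.append(sum(errors) / len(errors))
    return scores


# Hypothetical baseline regressor: always predict the training-set mean.
def fit_mean(x_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x_test):
    return [model] * len(x_test)
```

A call such as `cross_validate(x, y, fit_mean, predict_mean, k=5)` returns one error per fold. The spread of those errors indicates how sensitive a model is to the particular train/test split. Any regressor can be swapped in by supplying its own `fit` and `predict` functions.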
Get Involved!
Contribute to Hello Universe
Have an idea for a data set and notebook pair? We welcome your contributions to Hello Universe! Please contact archive@stsci.edu to get started.
Run Hello Universe on TIKE
Want to interact with the Hello Universe notebooks or create your own? Edit and run the existing notebooks, or develop new ones, on TIKE.