New Data Science Mission Office Head arrives at STScI

Dr. Arfon Smith will be leading a newly created mission office at STScI focused on Big Data and archival data science initiatives.

Jonathan Hargis

In early November, Dr. Arfon Smith arrived at STScI to lead the newly created Data Science Mission Office. The Data Science Mission Head is responsible for maximizing the scientific returns from a huge archive containing astronomical observations from 21 space astronomy missions and ground-based observatories. The new mission head will work closely with STScI staff to optimize the Institute's ability to help the scientific community address the big challenges of accessing and working with large, complex astronomical observations.

Since 2013, Smith has been a project scientist and program manager at GitHub, Inc., the world's largest platform for open source software. His duties included developing innovative strategies for sharing data and software in academia. Smith also helped to define GitHub's business strategy for public data products, and he played a key role in establishing the company's first data science and data engineering teams. In addition, Smith is a co-founder of the Zooniverse, a citizen science platform for research in the arts and sciences. He received his doctorate in astrochemistry in 2006 from the University of Nottingham in Nottinghamshire, U.K.

Programmatic Access to the STScI Archive

MAST provides a number of command line and programmatic options for querying the archive and retrieving data without a browser.

Randy Thompson

Most MAST classic search requests can be submitted by entering the URL for a search interface and appending a list of desired search parameters. For example, entering (on a single line)

will return the top 10 entries from HST for a 10 arcmin cone search, with the default output columns displayed as comma-separated values (CSV format). This request method can be used from a programming language such as Python, PHP, or Perl to search the MAST archive without a browser. Although details differ between programming languages and operating systems, the method has several possible advantages: complicated queries are easier to resubmit, online data can be downloaded without a browser (using the curl or wget output formats), results in non-HTML formats are usually returned in less time, and queries can be run in the background or scheduled to run off-hours (e.g., using cron). For example, this command will search for archived Kepler data for targets with effective temperatures between 8025 and 8050 K, and output the results as a wget script to download the lightcurves:

wget -O out.txt -v -a logfile ''
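The same pattern, a search-interface URL with a list of parameters appended, can be sketched in Python using only the standard library. This is a minimal illustration and not an official MAST client: the endpoint `https://archive.example.org/hst/search.php` is a placeholder, and the parameter names (`RA`, `DEC`, `radius`, `max_records`, `outputformat`) are assumptions chosen for the example. The MAST Services webpage lists the parameters each search interface actually accepts.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url, params):
    """Append URL-encoded search parameters to a search-interface URL."""
    return base_url + "?" + urlencode(params)


def parse_csv(text):
    """Turn a CSV response body into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def run_search(base_url, params, timeout=60):
    """Submit the request and return the parsed rows (needs network access)."""
    with urlopen(build_search_url(base_url, params), timeout=timeout) as resp:
        return parse_csv(resp.read().decode("utf-8"))


# Placeholder endpoint and assumed parameter names, for illustration only.
EXAMPLE_BASE = "https://archive.example.org/hst/search.php"
EXAMPLE_PARAMS = {
    "RA": 53.084,           # degrees
    "DEC": -27.873,         # degrees
    "radius": 10,           # cone-search radius in arcmin
    "max_records": 10,      # return only the first 10 entries
    "outputformat": "CSV",  # ask for comma-separated values
}

if __name__ == "__main__":
    # Print the URL that would be requested; call run_search() to fetch it.
    print(build_search_url(EXAMPLE_BASE, EXAMPLE_PARAMS))
```

Keeping URL construction separate from the network call makes it easy to inspect or log a query before submitting it, and the same script can be scheduled with cron to run off-hours.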

There are countless variations possible with this basic technique. The MAST Services webpage provides more detail, explains the available search parameters, and gives more program examples. More information on programmatic access is expected to be available in the near future; in the meantime, please contact us or post on the MAST Forum if you have any questions.

The Hubble Spectroscopic Legacy Archive

The HSLA is designed to maximize the scientific impact of the data produced by the HST UV spectrographs by providing uniformly processed data packaged in “smart archives” by target type and scientific theme.

Figure 1: HSLA spectrum of NGC 5548, composed of 764 coadded exposures.

With an uncertain future ahead for space ultraviolet astronomy, the data from the UV spectrographs aboard the Hubble Space Telescope have a legacy value beyond their initial science goals. The Hubble Spectroscopic Legacy Archive (HSLA) provides to the community new combined spectra for COS far-ultraviolet (FUV) data publicly available as of February 2016. COS/NUV data and STIS UV spectra will be made available in future releases. These data are packaged into "smart archives" according to target type and scientific themes (such as "solar system," "early type stars," "white dwarfs," and "starburst galaxies") to facilitate the construction of archival samples for common science uses. A new "quick look" capability makes the data easy for users to quickly access and download.

One of the key concepts behind the HSLA is combining spectra across exposures, visits, and programs to give a single coadded spectrum per target. This is particularly useful for non-time-variable sources that have been observed on multiple occasions with a variety of gratings. Figure 1 shows the HSLA coadded spectrum for NGC 5548. Details about how the data are combined can be found in the HSLA documentation on the project website.
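As a rough illustration of what combining exposures into one spectrum per target involves, the sketch below computes an inverse-variance weighted mean of several spectra. It assumes the exposures have already been resampled onto one shared wavelength grid and that each pixel has a known 1-sigma error. This is a generic textbook scheme chosen for the example; it is not taken from the HSLA pipeline, whose actual procedure is described in the HSLA documentation. The function name `coadd` is hypothetical.

```python
def coadd(fluxes, errors):
    """Inverse-variance weighted mean of spectra on a shared wavelength grid.

    fluxes, errors: one equal-length list of pixel values per exposure.
    Returns (coadded_flux, coadded_error) as two lists.
    """
    n_pix = len(fluxes[0])
    out_flux, out_err = [], []
    for i in range(n_pix):
        # Weight each exposure by 1/sigma^2 at this pixel.
        weights = [1.0 / err[i] ** 2 for err in errors]
        wsum = sum(weights)
        out_flux.append(sum(w * f[i] for w, f in zip(weights, fluxes)) / wsum)
        # Error of the weighted mean.
        out_err.append(wsum ** -0.5)
    return out_flux, out_err
```

With equal errors this reduces to a plain average, and the combined error falls as the square root of the number of exposures, which is why coadding many visits improves the signal-to-noise ratio for non-variable sources.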

This initial release of the Legacy database was intended to inform Cycle 24 AR and GO proposals. The content and format of the data are designed to enable users to assess the quantity and quality of data that already exist in the archive to support their science goals. Programs that are accepted and proceed to a full analysis should be sure to use the latest releases posted on the HSLA website, which will be updated to include improvements to the COS FUV wavelength solution and updated reference files.

Comments and questions about the HSLA are welcome and can be directed to