
HLSP File Types, Formats, and Organization

HLSP collections in MAST must be prepared so that the data files can be discovered with community-standard queries (including the Data Discovery Portal), are described fully enough that users understand how the data were prepared, and are organized and tagged with metadata so that they are compatible with widely accepted data standards (such as FITS) and community software applications.

HLSP Collection Contents

An HLSP collection must include, at a minimum:

  1. One or more science files. If the collection includes observational data, the files should be one of the types tabulated below.
  2. A manifest that lists the files in the collection
    • The format should be structured to enable processing with scripts (see example linked from the table below).
    • For each file, specify:
  3. A README file that must include:
    • contact information for the HLSP contributor
    • the contribution date
    • if applicable: the version identifier and a description of how this version differs from the previously delivered version of the HLSP collection. We recommend a (lightly) structured format, but plain ASCII text will do.
    • bibliographic information for one or more references to papers that describe the collection
  4. A brief project summary that describes the contributing team's science aims for the collection, its coverage (spatial, temporal, etc.), and other information that may help another user understand the collection. This information may be folded into the README file for very small HLSP collections.
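
As an illustration of a manifest that scripts can process, the following minimal sketch walks a collection directory and writes one record per file as JSON. The per-file fields used here (`path`, `size_bytes`, `sha256`), the function names, and the output name `manifest.json` are assumptions made for the example; they are not the fields MAST requires, so consult the example manifest linked from the table below for the actual structure.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(root, exclude=("manifest.json",)):
    """Return one record per file under `root`, sorted by path.

    The record fields are hypothetical, chosen only for illustration.
    """
    root = Path(root)
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if rel in exclude:
            continue  # do not list the manifest inside itself
        records.append({
            "path": rel,
            "size_bytes": path.stat().st_size,
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return records


def write_manifest(root, out_name="manifest.json"):
    """Write the manifest as JSON at the top of the collection directory."""
    out = Path(root) / out_name
    payload = {"files": build_manifest(root, exclude=(out_name,))}
    out.write_text(json.dumps(payload, indent=2))
    return out
```

Calling `write_manifest("my_hlsp")` on a hypothetical collection directory would produce `my_hlsp/manifest.json`. Sorting the paths keeps the output stable between runs, which makes differences between delivered versions easier to review.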

An HLSP collection should, but is not required to, also include:

  • Concomitant pixel data (variance arrays, bad pixel masks, exposure maps, etc.) where appropriate
  • If the data originate from a NASA mission, a mapping between each HLSP data product and the mission source files from which it was derived
  • One graphic per science data product to be used as a preview; or a sky map for catalogs
  • If the HLSP contributors do not provide previews, MAST will generate preview graphics for products such as images, spectra, and light curves using an automated process; the results may be sub-optimal.

Accepted File Types, Formats, and Content

The types and formats of products that are accepted for HLSP collections are given in the following table.

Product | Formats | Data Organization | Notes
Science Data Product Types
Image | FITS | Simple, or MEF with one or more IMAGE extensions | One MEF file should include science and concomitant pixel data, in separate extensions (variance, data quality, exposure map, etc.), where appropriate.
Spectra: 2-D or higher | FITS | Simple, or MEF with one or more IMAGE extensions | One MEF file should include science and concomitant pixel data, in separate extensions (variance, data quality, etc.), where appropriate. One-dimensional spectra contained in images must express the WCS using standard header keywords.
Spectra: 1-D | FITS | Simple, or MEF with one or more IMAGE or BINTABLE extensions |
Catalog | FITS, csv, DB dump | BINTABLE (FITS), or ASCII (csv, DB dump) | CSV files should be parsable with common software. Database dumps from e.g. MySQL or PostgreSQL may be acceptable; SQLite files are certainly acceptable.
Light curve | FITS | BINTABLE | Time coordinate information may be represented in keywords or as an explicit table column.
Model or Simulation | Consult with MAST staff. | | Simulations of observational data would ordinarily be represented in the format appropriate for the product being simulated.
Ancillary Products
README | ASCII, HTML, markdown, sphinx | Single flat file, or file with markup, e.g. README.rst, README.html | Document that describes the collection contents. It must provide contact information for the originator of the archival material, a description of the semantic content and organization of data in the files, and bibliographic information for the published paper(s) that describe the creation and use of the collection. Put details of the science goals, data processing methodology, etc. in the Project Summary file.
Manifest | json/yaml, csv | Structured; see example file in yaml format | Document that describes the product delivery manifest. It must provide contact information for the originator of the archival material, and the organization of data in the files. Details of the science goals, data processing methodology, etc. must be included in the Project Summary file.
Graphics | PDF, PostScript, gif, jpeg, png | | Plots and illustrations to be used as previews for images, light curves, spectra, atlases, and (possibly) catalogs. Also appropriate for elements of the project description.
Animation | MP4, WebM | | Appropriate for some data product previews, particularly for models or simulations.
Project Summary | Outline-oriented structure: ASCII, HTML, sphinx | | One- or two-paragraph description of the science aims of the data collection, for incorporation into a web site for your collection. May also include a description of the observing program (including sky coverage). Must include literature reference(s) to the methodology used to create the HLSPs. For small HLSP collections this content may instead be folded into the README file.

For all data files be sure to:

For science data files be sure to:

  • include the required HLSP metadata (i.e., keywords) in the headers.

Non-Accepted File and Content Types

Proprietary Formats and Certain Content Types

Content that cannot be archived includes: publications, project tar files, or any files in a proprietary format (e.g., Microsoft Office). MAST also does not currently support ASDF files (whether stand-alone or in a FITS extension, other than as a pipeline end-product), though this may change in the future.
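
Before delivery, a contributor could screen file names against these rules with a short script. The sketch below is only a first pass under stated assumptions: the two extension sets are inferred from the formats named on this page and are not an official MAST list, the function name `screen` is hypothetical, and a file extension cannot reveal content-based exclusions such as a publication saved as PDF.

```python
from pathlib import Path

# Assumed extension lists, inferred from the formats named in the text above.
ACCEPTED = {
    ".fits", ".csv", ".json", ".yaml", ".yml", ".txt", ".md", ".rst",
    ".html", ".pdf", ".ps", ".gif", ".jpg", ".jpeg", ".png", ".mp4", ".webm",
}
NOT_ARCHIVABLE = {
    ".tar", ".tgz",                                       # project tar files
    ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",    # Microsoft Office
    ".asdf",                                              # not currently supported
}


def screen(names):
    """Sort file names into three buckets based on their extensions."""
    result = {"accepted": [], "not_archivable": [], "ask_mast": []}
    for name in names:
        suffixes = [s.lower() for s in Path(name).suffixes]
        if any(s in NOT_ARCHIVABLE for s in suffixes):
            # Checking every suffix catches names such as project.tar.gz
            result["not_archivable"].append(name)
        elif suffixes and suffixes[-1] in ACCEPTED:
            result["accepted"].append(name)
        else:
            # Unknown or missing extension: needs a human decision
            result["ask_mast"].append(name)
    return result
```

Anything that lands in the `ask_mast` bucket, for example database dumps or model output in another format, is a prompt to consult MAST staff rather than a verdict.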

Software

It can be useful for contributors to associate software with their data products. Reasons include:

  • Documenting the processing software
  • Visualizing the data products
  • Providing an analysis tool that is tuned to the HLSP collection
  • Providing back-end services that operate on the collection data

MAST embraces the idea of associating software with data, but does not itself accept or support software. Instead, MAST encourages contributors to use third-party software repositories such as GitHub, and to register their software with the Astrophysics Source Code Library (ASCL). A link to the contributor's software repository can be included on the MAST website, and the link will be added to MAST's master list of astronomical software that is tied to HLSP collections.

Only very limited support is presently available for back-end (analytical) services connected to HLSP collections, but more support is expected in the future.

Certain FITS File Organization

FITS is a fairly general format, and allows for multiple ways to organize data within a single file. However, some potential choices of data organization within FITS files have not seen wide use and are presently not supported for HLSP collections, including:

  • Binary tables containing images (any dimensionality) within table cells
  • ASCII table extensions (however, BINTABLE extensions are supported)
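
A file can be screened for the two organizations listed above by walking its headers. The following sketch uses only the Python standard library and relies on the basic FITS layout: 80-character header cards in 2880-byte blocks, with the data size computed from BITPIX, NAXISn, PCOUNT, and GCOUNT. Two points are assumptions of this example: treating the presence of a TDIMn keyword as the sign of an image stored in a table cell, and the simplified card parsing, which ignores continued strings and random-groups files. The function names are hypothetical.

```python
import math

BLOCK = 2880  # FITS headers and data are padded to 2880-byte blocks
CARD = 80     # each header card is 80 ASCII characters


def read_header(buf, offset):
    """Parse one header; return (keyword dict, offset of the data that follows)."""
    cards, pos = {}, offset
    while pos + CARD <= len(buf):
        card = buf[pos:pos + CARD].decode("ascii")
        pos += CARD
        key = card[:8].strip()
        if key == "END":
            padded = math.ceil((pos - offset) / BLOCK) * BLOCK
            return cards, offset + padded
        if card[8:10] == "= ":
            raw = card[10:]
            if raw.lstrip().startswith("'"):
                value = raw.split("'")[1].strip()   # quoted string value
            else:
                value = raw.split("/")[0].strip()   # drop trailing comment
            cards[key] = value
    raise ValueError("header has no END card")


def data_size(cards):
    """Size in bytes of the data unit, padded to a whole number of blocks."""
    naxis = int(cards.get("NAXIS", 0))
    if naxis == 0:
        return 0
    npix = math.prod(int(cards[f"NAXIS{i}"]) for i in range(1, naxis + 1))
    nbytes = (abs(int(cards["BITPIX"])) // 8
              * int(cards.get("GCOUNT", 1))
              * (int(cards.get("PCOUNT", 0)) + npix))
    return math.ceil(nbytes / BLOCK) * BLOCK


def scan_fits(buf):
    """Return (HDU index, reason) for each HDU that uses an unsupported layout."""
    problems, offset, index = [], 0, 0
    while offset < len(buf):
        cards, offset = read_header(buf, offset)
        xtension = cards.get("XTENSION", "")
        if xtension == "TABLE":
            problems.append((index, "ASCII table extension"))
        elif xtension == "BINTABLE" and any(k.startswith("TDIM") for k in cards):
            problems.append((index, "binary table with multidimensional cells"))
        offset += data_size(cards)
        index += 1
    return problems
```

For a file on disk, `scan_fits(Path("example.fits").read_bytes())` (with `pathlib.Path` imported and a hypothetical file name) returns an empty list when neither layout is found. Reading the whole file into memory keeps the sketch short; for large files, a FITS library that reads headers lazily is the more practical choice.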