

Data Archiving Guidelines

Introduction

The information below was written for projects intending to archive mission data sets within the Mikulski Archive for Space Telescopes (MAST). Recommendations are included regarding:

  • data set formats for archival data (i.e., using FITS),
  • the creation of database tables to allow online catalog searches,
  • the preservation of project-specific software and documentation.
The usefulness of mission data sets to future users will depend on how well this information is preserved. MAST staff members can work with projects to help them develop data products that will remain useful to the general astronomical community for the long term. For example, we can review FITS headers for compliance with the standard and confirm that the information most frequently used for searching is present.

1. Data Set Formats

The astronomical community has adopted the Flexible Image Transport System (FITS) format as the default standard for the exchange of data between institutions. The FITS file format is platform independent, supported by many institutions, and endorsed by both NASA and the IAU. For these reasons, FITS is the recommended file format for archiving data at STScI. A description of the FITS data format recommendations can be found in the MAST Data Format Guidelines document. Online versions of the FITS Standard Document and the FITS User's Guide are also available.
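For readers unfamiliar with the layout of a FITS header, the following is a minimal sketch, using only the Python standard library, of how header cards can be read. It relies on two facts from the FITS Standard: a header is a sequence of 80-character ASCII cards, and files are organized in 2880-byte blocks. The function names (`make_card`, `parse_value`, `parse_header`) and the sample keywords `TARGNAME` and `EXPTIME` are assumptions made for illustration. The parser reads only the first header and ignores commentary and continuation cards, so an established FITS library is the better choice for real work.

```python
CARD_LEN = 80      # each header card is 80 ASCII characters
BLOCK_LEN = 2880   # FITS files are organized in 2880-byte blocks


def make_card(keyword, value_text, comment=""):
    """Format one card; value_text must already be FITS-formatted text."""
    if value_text.startswith("'"):
        field = value_text.ljust(20)   # strings start in column 11
    else:
        field = value_text.rjust(20)   # numbers and logicals end in column 30
    card = "{:<8}= {}".format(keyword, field)
    if comment:
        card += " / " + comment
    return card.ljust(CARD_LEN)


def parse_value(field):
    """Convert the value field of a card to a Python value."""
    field = field.strip()
    if field.startswith("'"):
        # A string ends at the first quote that is not doubled ('').
        chars = []
        i = 1
        while i < len(field):
            if field[i] == "'":
                if i + 1 < len(field) and field[i + 1] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(field[i])
            i += 1
        return "".join(chars).rstrip()
    value = field.split("/", 1)[0].strip()   # drop the trailing comment
    if value == "T":
        return True
    if value == "F":
        return False
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value.replace("D", "E"))   # Fortran-style exponent
    except ValueError:
        return value


def parse_header(raw):
    """Return {keyword: value} for the first header found in raw bytes."""
    header = {}
    for start in range(0, len(raw), CARD_LEN):
        card = raw[start:start + CARD_LEN].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue   # COMMENT, HISTORY, and blank cards carry no value
        header[keyword] = parse_value(card[10:])
    return header


# Build a synthetic header in memory and pad it to a full block.
cards = [
    make_card("SIMPLE", "T", "conforms to FITS standard"),
    make_card("BITPIX", "8"),
    make_card("NAXIS", "0"),
    make_card("TARGNAME", "'TARGET-A'", "hypothetical target name"),
    make_card("EXPTIME", "120.5", "[s] hypothetical exposure time"),
    "END".ljust(CARD_LEN),
]
text = "".join(cards)
sample = (text + " " * (-len(text) % BLOCK_LEN)).encode("ascii")

header = parse_header(sample)
print(header)
```

Because the header is plain ASCII in fixed-size records, it can be inspected on any platform without special software, which is part of what makes the format suitable for long-term archiving.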

It is recognized, however, that some archival data may be stored in other formats, particularly for projects that preceded the recent developments in FITS. One example is the earlier-processed IUE data, which are stored in a VICAR-based IUE "GO" format. In other cases, projects have distributed data as ASCII text files, or created auxiliary data sets as ASCII text or PostScript files. In these cases, no attempt will be made to reformat the data sets before they are archived within MAST.

2. Catalogs

MAST, like other data retrieval systems, uses online database tables to search for requested data sets. In many cases, projects store the same information in both the catalog or database table and the FITS keywords. To simplify adding new data sets into MAST, it is helpful if either

  1. the project provides a target list or catalog of observations, or
  2. the FITS headers are constructed such that MAST staff members can create a catalog from the FITS keywords.
Project-delivered catalogs would greatly simplify the archiving of mission data sets. In the absence of catalogs or target lists, we would appreciate guidance from the project staff concerning which of the FITS header keywords would be most useful for searching. We do not want to do a wholesale ingest of all header keywords.

Catalogs should contain those fields which would most help users locate the desired observation(s). Coordinates, observation date, exposure time, and target name are essential for most data sets (depending on the method of observation), while parameters needed for analyzing or interpreting archived data are highly desirable. A target classification entry has been very useful for users interested in particular types of objects.
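As a rough illustration of creating a catalog from FITS keywords, the sketch below maps a small set of search fields onto header keywords and reports which keywords are absent. It assumes each header has already been read into a Python dictionary. The column names and the keyword names in `SEARCH_FIELDS` are hypothetical examples only; actual keyword names vary by mission and should be taken from the project's own documentation.

```python
# Hypothetical mapping from catalog column to FITS keyword.
SEARCH_FIELDS = {
    "target_name": "TARGNAME",
    "ra_deg": "RA_TARG",
    "dec_deg": "DEC_TARG",
    "obs_date": "DATE-OBS",
    "exposure_s": "EXPTIME",
}


def catalog_row(header, fields=SEARCH_FIELDS):
    """Return (row, missing): the catalog row and any absent keywords."""
    row = {}
    missing = []
    for column, keyword in fields.items():
        if keyword in header:
            row[column] = header[keyword]
        else:
            row[column] = None        # leave the catalog entry empty
            missing.append(keyword)   # report it for follow-up
    return row, missing


# Two made-up headers: the second lacks an exposure time.
complete = {
    "TARGNAME": "TARGET-A",
    "RA_TARG": 162.0979,
    "DEC_TARG": 37.5703,
    "DATE-OBS": "1995-03-14",
    "EXPTIME": 120.5,
    "FILTER": "F555W",   # not a search field, so it is not ingested
}
incomplete = {k: v for k, v in complete.items() if k != "EXPTIME"}

row, missing = catalog_row(complete)
row2, missing2 = catalog_row(incomplete)
print(row)
print(missing2)
```

Restricting the mapping to a short, agreed list of fields reflects the preference stated above for selected search keywords over a wholesale ingest of every header keyword.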

Although MAST uses the Sybase database management system, tables can be exchanged between most database systems by copying them to ASCII table files (be sure to include a sufficient number of significant figures for representing floating point values). A web page containing an observation list may be an adequate replacement for a database table. In either case, a description of the individual fields within the table or list should be provided as well. The description should also define the source of the entries. For example, it should state whether the coordinates were supplied by the observer or obtained from an existing catalog.
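The point about significant figures can be made concrete with a short sketch. It writes a table to a delimited ASCII file, preceded by comment lines describing each field, and formats floating point values with Python's `repr`, which produces the shortest string that converts back to the same double-precision value. The pipe delimiter, the `#` comment convention, the function names, and the sample values are all assumptions chosen for illustration, not a MAST delivery format.

```python
import csv
import io


def format_value(value):
    """Render one table entry as text without losing float precision."""
    if isinstance(value, float):
        return repr(value)   # shortest text that round-trips exactly
    return "" if value is None else str(value)


def write_ascii_table(rows, columns, descriptions, out):
    """Write field descriptions as '#' lines, then a delimited table."""
    for name in columns:
        out.write("# {}: {}\n".format(name, descriptions[name]))
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([format_value(row[name]) for name in columns])


columns = ["target_name", "ra_deg", "dec_deg"]
descriptions = {
    "target_name": "target name as supplied by the observer",
    "ra_deg": "right ascension in degrees, from the observer's proposal",
    "dec_deg": "declination in degrees, from the observer's proposal",
}
# Made-up coordinates for a made-up target.
rows = [{"target_name": "TARGET-A",
         "ra_deg": 162.09795833333334,
         "dec_deg": 37.57030277777778}]

buf = io.StringIO()
write_ascii_table(rows, columns, descriptions, buf)

# Read the table back, skipping the description lines.
data_lines = [ln for ln in buf.getvalue().splitlines()
              if not ln.startswith("#")]
back = list(csv.reader(data_lines, delimiter="|"))

exact = float(back[1][1]) == rows[0]["ra_deg"]
truncated = float("%.3f" % rows[0]["ra_deg"]) == rows[0]["ra_deg"]
print(exact, truncated)
```

A fixed format with too few decimal places, such as the `%.3f` case in the last lines, silently changes the stored coordinate, whereas the round-trip representation does not.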

3. Documentation

Project-supplied documentation in the following categories should be made available for archive users:

  1. Project Description - General descriptions of the mission and instrumentation,
  2. Data Processing - A description of how the data were reduced and calibrated,
  3. Data Description - Documentation on data characteristics (e.g., instrumental resolution, field of view, wavelength coverage, etc.), anomalies (e.g., cosmic ray hits, bad pixels, scratches, etc.), measurement uncertainties, and database field descriptions,
  4. Data Format - A general description of the contents of the archived mission data sets including, for example, documentation on the FITS keyword entries. (Note that the FITS keyword comment field alone is generally insufficient to define many keywords properly. Without additional documentation, many of these keywords will be of little or no use to future users.)
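One simple way to check the Data Format item is to compare the keywords present in a header against a project-maintained table of keyword descriptions. The sketch below is one possible approach, not a MAST tool; the function name, the keywords, and the descriptions are hypothetical.

```python
def undocumented_keywords(header, keyword_docs):
    """Return header keywords that lack a non-empty description."""
    return sorted(
        keyword for keyword in header
        if not keyword_docs.get(keyword, "").strip()
    )


# Hypothetical keyword documentation maintained by the project.
keyword_docs = {
    "EXPTIME": "Total exposure time in seconds, excluding readout.",
    "TARGNAME": "Target name as supplied by the observer.",
    "APERTURE": "",   # listed, but not yet described
}

# Hypothetical header contents.
header = {
    "EXPTIME": 120.5,
    "TARGNAME": "TARGET-A",
    "APERTURE": "LARGE",
    "GAINMODE": 2,
}

todo = undocumented_keywords(header, keyword_docs)
print(todo)
```

Here the keywords with an empty or missing description are the ones that still need documentation beyond the header comment field.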

Since MAST documentation will be accessed primarily from the web, documentation is most useful if it exists online. Most text processing formats, such as LaTeX or Microsoft Word (or standard ASCII files), can be converted to HTML fairly easily by staff members. Large documents such as user manuals or data analysis guides should be made available to users in several formats, such as HTML for online access and PostScript and/or PDF for downloading.

4. Software

Some projects have written software to analyze and interpret raw and/or processed data. These programs should be archived for future users. MAST will make project-supplied software available to requesters, although support for the software itself cannot be provided. MAST currently maintains, for example, the IUEDAC IDL software libraries, the UIT BDR software written in C and Fortran-77, and the EUVE EUV1.8 IRAF software.

A list of FITS software packages is available from HEASARC. The list contains links to sites supporting general FITS readers and writers written in a variety of programming languages.