The HST Pipeline: What Processing Was Done to My Data?
The calibration of raw HST observations involves a number of data processing steps. Here is an insider’s look at creating calibrated data products for the MAST archive.
The HST Calibration Pipeline
Before data are archived in MAST, they pass through the data calibration pipeline. The pipeline is implemented as workflows in HTCondor, with the in-house Open Workflow Layer (OWL) managing the jobs that are submitted for processing. A typical workflow for a new science association is shown in Figure 1. The process names and descriptions are detailed in Table 1.
Process Name | Description |
---|---|
dan_receipt | Science data receipt |
INGEST | Archive raw exposure |
2FITS | Convert raw data format to FITS |
RF | Determine best reference files |
BC | Before-Calibration checks |
CA | Calibration step |
MD | MultiDrizzle/AstroDrizzle |
AC | After-Calibration checks |
INGEST_SCI | Archive calibrated science data |
PVW | Create preview files |
INGEST_PVW | Archive preview files |
CL | Cleanup |

Table 1: New science calibration pipeline steps
Record of Pipeline Steps Taken
The steps taken by the calibration pipeline are recorded in several of the FITS files delivered by MAST. As an example, take the ACS association J9CV58020 made from the exposures J9CV58TZQ and J9CV58U1Q. In the primary header for the raw data files (j9cv58tzq_raw.fits and j9cv58u1q_raw.fits), there is a section of keywords with the header "/ CALIBRATION SWITCHES: PERFORM, OMIT, COMPLETE". A series of keywords/values follows:
DQICORR = 'PERFORM ' / data quality initialization
ATODCORR= 'OMIT ' / correct for A to D conversion errors
BLEVCORR= 'PERFORM ' / subtract bias level computed from overscan img
BIASCORR= 'PERFORM ' / Subtract bias image
FLSHCORR= 'OMIT ' / post flash correction
CRCORR = 'OMIT ' / combine observations to reject cosmic rays
EXPSCORR= 'PERFORM ' / process individual observations after cr-reject
SHADCORR= 'OMIT ' / apply shutter shading correction
PCTECORR= 'PERFORM ' / cte correction
DARKCORR= 'PERFORM ' / Subtract dark image
FLATCORR= 'PERFORM ' / flat field data
PHOTCORR= 'PERFORM ' / populate photometric header keywords
RPTCORR = 'OMIT ' / add individual repeat observations
DRIZCORR= 'PERFORM ' / drizzle processing
Each switch records whether a calibration step is to be performed (PERFORM), is skipped (OMIT), or has already been carried out (COMPLETE). No step is marked COMPLETE in a raw file. The final associated products (j9cv58020_drz.fits and j9cv58020_drc.fits) have the following for the same series of keywords:
DQICORR = 'COMPLETE' / data quality initialization
ATODCORR= 'OMIT ' / correct for A to D conversion errors
BLEVCORR= 'COMPLETE' / subtract bias level computed from overscan img
BIASCORR= 'COMPLETE' / Subtract bias image
FLSHCORR= 'OMIT ' / post flash correction
CRCORR = 'OMIT ' / combine observations to reject cosmic rays
EXPSCORR= 'COMPLETE' / process individual observations after cr-reject
SHADCORR= 'OMIT ' / apply shutter shading correction
DARKCORR= 'COMPLETE' / Subtract dark image
FLATCORR= 'COMPLETE' / flat field data
PHOTCORR= 'COMPLETE' / populate photometric header keywords
DRIZCORR= 'COMPLETE' / drizzle processing
Now the steps that had been marked as "PERFORM" are marked as "COMPLETE". Note that the lists of pipeline steps in the raw and drz/drc files are not identical, because the calibration steps for an association and for its member exposures may not be the same.
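The comparison between the raw and product switches can be automated. The following is a minimal sketch that parses a few of the header cards quoted above as plain text and reports how each raw switch appears in the product. The function names `parse_switches` and `compare_switches` are hypothetical, and only a subset of the cards is included. With the FITS files at hand, the same dictionaries could presumably be built from the primary headers directly (for example with `astropy.io.fits.getheader`) instead of from text.

```python
import re

# Matches a string-valued header card as printed in the text:
# KEYWORD = 'VALUE' / comment
CARD_RE = re.compile(r"^\s*([A-Z0-9_-]{1,8})\s*=\s*'([^']*)'")


def parse_switches(card_lines):
    """Return {keyword: value} for string-valued header cards given as text."""
    switches = {}
    for line in card_lines:
        match = CARD_RE.match(line)
        if match:
            switches[match.group(1)] = match.group(2).strip()
    return switches


def compare_switches(raw, product):
    """Map each raw keyword to (raw value, product value or None if absent)."""
    return {key: (value, product.get(key)) for key, value in raw.items()}


# A subset of the cards quoted above (hypothetical selection for illustration).
raw_cards = [
    "DQICORR = 'PERFORM ' / data quality initialization",
    "ATODCORR= 'OMIT ' / correct for A to D conversion errors",
    "PCTECORR= 'PERFORM ' / cte correction",
    "DRIZCORR= 'PERFORM ' / drizzle processing",
]
drz_cards = [
    "DQICORR = 'COMPLETE' / data quality initialization",
    "ATODCORR= 'OMIT ' / correct for A to D conversion errors",
    "DRIZCORR= 'COMPLETE' / drizzle processing",
]

report = compare_switches(parse_switches(raw_cards), parse_switches(drz_cards))
for key, (before, after) in report.items():
    shown = after if after is not None else "(absent)"
    print(f"{key:8s} {before:8s} -> {shown}")
```

Keywords reported as absent, such as PCTECORR here, are those that appear in the raw header excerpt but not in the product excerpt.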
The full processing log for each data file is found in the corresponding trl (trailer) file (e.g., j9cv58020_trl.fits, j9cv58tzq_trl.fits, and j9cv58u1q_trl.fits). The log is stored as a binary table in the first data extension of the FITS file. These log files are somewhat more difficult to read, as seen in the excerpt below:
('>>>>>>>>>>>>>>>>>>>> /ifs/archive/ops/hst/store/HSTDP-2015_3-160126//bin/exposure_times.py
j9cv58020 <<<<<<<<<<<<<<<<<<<<')
('2016074184159-I-INFO-Start ------ Exposure Times Updater for j9cv58020 ------')
('2016074184159-I-INFO-exposure_times-UPDATE_EXPOSURE_TIMES is False. No update necessary.')
('2016074184159-I-INFO- End ------ Exposure Times Updater Nothing to do for j9cv58020 ------')
('FYI: exit( 0 )')
('2016074183916-I-INFO DP_open_newobs: -------------- Data Partitioning Started: j9cv58tzq
------------ (46927151842912)')
('2016074183916-I-INFO DP_open_newobs: Partitioning from POD file:
lz_bdc5_067_0000102262_j9cv58tzq (46927151842912)')
('2016074183916-I-INFO Search osf is ????????-
p???????????????????????.j9cv58tzq_______________________________-acs-???-????')
('(46927151842912)')
('2016074183916-I-INFO Search osf is ????????-
????????????????????????.j9cv58tzq_______________________________-acs-???-????')
They do, however, provide more detailed information, including timestamps in the form "YYYYDDDHHMMSS", where YYYY is the year, DDD is the day of the year, HH is the hour, MM is the minute, and SS is the second.
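Because the day-of-year form is not easy to read at a glance, it can help to convert these timestamps to calendar dates. The following is a minimal sketch using only the Python standard library. The function name `parse_trl_timestamp` is hypothetical, and the sketch assumes the timestamp is the first isolated run of exactly 13 digits in a log line.

```python
import re
from datetime import datetime, timedelta

# YYYY, DDD, HH, MM, SS as one run of exactly 13 digits (no digit on either side).
STAMP_RE = re.compile(r"(?<!\d)(\d{4})(\d{3})(\d{2})(\d{2})(\d{2})(?!\d)")


def parse_trl_timestamp(line):
    """Return the datetime for the first YYYYDDDHHMMSS stamp in line, or None."""
    match = STAMP_RE.search(line)
    if match is None:
        return None
    year, doy, hour, minute, second = (int(g) for g in match.groups())
    # Day 001 is 1 January, so add (doy - 1) days to the start of the year.
    return datetime(year, 1, 1) + timedelta(
        days=doy - 1, hours=hour, minutes=minute, seconds=second
    )


line = "2016074184159-I-INFO-Start ------ Exposure Times Updater for j9cv58020 ------"
print(parse_trl_timestamp(line))
```

For the log line used here, the stamp 2016074184159 corresponds to 14 March 2016 at 18:41:59.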
The version of the pipeline software used to process the data is given in keywords in the primary header of the FITS files. The section of keywords with the header "/ DIAGNOSTIC KEYWORDS" contains the following keywords and values:
OPUS_VER= 'COMMON 2017_2 ' / data processing software system version
CAL_VER = '3.4.1 (20-April-2017)' / CALSTIS code version
PROCTIME= 5.793008144676E+04 / Pipeline processing time (MJD)
CSYS_VER= 'hstdp-2017.2' / Calibration software system version id
"OPUS_VER" and "CSYS_VER" give the Data Management System (DMS) build version used for the processing, and "CAL_VER" gives the version of CALSTIS used to process this dataset (STIS association od0m070b0). The keyword "PROCTIME" gives the Modified Julian Date (MJD) of the most recent processing (26 June 2017, 01:57:17 UT for this dataset). Information regarding the current DMS build can be found at https://archive.stsci.edu/hst/processing_status/. Note that these keywords are standard for the current DMS build but may differ in older archival data.
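The PROCTIME value can be converted to a calendar date by counting days from the MJD zero point, 17 November 1858 at 00:00. The following is a minimal sketch using only the Python standard library. The function name `mjd_to_datetime` is hypothetical, and the conversion ignores leap seconds and any distinction between time scales, which is adequate for reading a processing date.

```python
from datetime import datetime, timedelta

# MJD 0 corresponds to 1858-11-17 00:00.
MJD_EPOCH = datetime(1858, 11, 17)


def mjd_to_datetime(mjd):
    """Convert a Modified Julian Date to a naive datetime (leap seconds ignored)."""
    return MJD_EPOCH + timedelta(days=mjd)


proctime = 5.793008144676e04  # PROCTIME value from the header excerpt above
processed = mjd_to_datetime(proctime)

# Truncate to whole seconds for display.
print(processed.isoformat(sep=" ", timespec="seconds"))
```

For this dataset the result is 2017-06-26 01:57:17, in agreement with the date quoted above.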
Are you interested in more details or do you have additional questions? Please send an email to the MAST Helpdesk at archive@stsci.edu.