WFPC2 Pipeline Products

The WFPC2 Associations Science Products Pipeline

This initial release (November 8, 2002) of the WFPC2 Associations Science Products is a collaborative effort between CADC, ST-ECF and STScI. We cannot and do not guarantee science quality in all of the products we are releasing. The present release should be viewed as a "shared-risk" product release, and users should consult the data quality assessment information available on these pages. A second release, planned for 2003, will provide more extensive documentation, including a refereed publication describing the processing methods and results. The goal of the second release is a fully "science-ready" product that is reliable and documented. The partners in this endeavor believe that the value of these WFPC2 products is high enough to warrant this early release.

The WFPC2 Associations Science Products Pipeline (WASPP) has been developed by the Canadian Astronomy Data Centre (CADC) and the Space Telescope-European Coordinating Facility (ST-ECF) with the goal of producing science-quality images and catalogs. Association information allows 74,787 individual images to be processed into 21,260 combined images, or "stacks". This procedure greatly improves data quality by removing cosmic rays and increasing sensitivity. Source detection and photometry are performed on the stacks to produce catalogs. The WFPC2 astrometry is corrected by reference to the USNO2 catalogue. The result will ultimately be ready-to-use science content. This project has consumed approximately 3 CPU-years of processing time and has yielded a massive dataset of 25 million source detections (which we intend to release as time permits) that represents a new class of archival science products for the Hubble Space Telescope.

Motivation for associations and WFPC2 pipeline 

The accumulated archive of nearly a decade of WFPC2 observations is vast and varied, a rich resource for astronomy. As of May 2002 there are 21,260 associations after calibration observations are removed (see the article by Micol & Durand, ST-ECF Newsletter, January 2002), with a total exposure time of 42.7 million seconds. Figure 1 shows the distribution of associations across the 10 most popular filters only. Roughly 85% of the associations have 4 or fewer members (58% have two members). Applying the WASPP procedure enhances the value delivered to users of the WFPC2 archive: it obviates the need for labor-intensive procedures to transform raw or calibrated data into a science-ready product. The normal sequence of steps needed to use archive data (query, download, determine the dither pattern by hand, combine images) is a reasonable investment for a few datasets, but it is impractical if a large number of datasets need to be sifted through and evaluated to find a few of interest.


We recognize that it is not possible to design a single pipeline that will be optimal for all science applications. Nevertheless, we believe that it is possible to save a great deal of effort for many users, and that many science applications can use our pixel and catalog data directly.

How are the associations constructed?

Individual exposures are grouped into associations in a variety of ways (see references or …). The association process identifies logical sets of images and determines their offsets from one another via one of three routes: cross-correlation, jitter information, or World Coordinate System (WCS) information. Dither patterns are known reliably for 93% of the potential "stacks". The association file includes all ancillary information, such as gain and exposure time, for each member of the association. The associations used here are based on a common proposal identification, filter, roll angle, and position on the sky. Offsets are limited to no more than 100 WFC pixels to avoid problems with geometric distortion. This scheme can be extended to associate data from different telescopes at different wavelengths, or even to associate data from widely divergent sources that address a common science theme.
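The grouping criteria above (common proposal, filter, roll angle, and sky position, with offsets of at most 100 WFC pixels) can be sketched as a simple greedy grouping. This is a minimal illustration, not the association software actually used: the `Exposure` record, the roll-angle tolerance, and the WF pixel scale of roughly 0.1 arcsec/pixel are assumptions, and the small-angle separation formula is only valid for the short distances involved.

```python
import math
from dataclasses import dataclass

WF_PIXEL_SCALE = 0.0996  # assumed WF camera scale, arcsec per pixel
MAX_OFFSET_PIX = 100     # offset limit quoted in the text


@dataclass
class Exposure:  # hypothetical per-exposure record
    name: str
    proposal_id: str
    filter_name: str
    roll_deg: float
    ra_deg: float
    dec_deg: float


def roll_difference(a: float, b: float) -> float:
    """Absolute roll-angle difference in degrees, handling the 0/360 wrap."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def separation_arcsec(a: Exposure, b: Exposure) -> float:
    """Small-angle separation between two pointings, in arcseconds."""
    mean_dec = math.radians(0.5 * (a.dec_deg + b.dec_deg))
    dra = ((a.ra_deg - b.ra_deg + 180.0) % 360.0 - 180.0) * math.cos(mean_dec)
    ddec = a.dec_deg - b.dec_deg
    return math.hypot(dra, ddec) * 3600.0


def group_associations(exposures, roll_tol_deg=0.1):
    """Greedily place each exposure in the first compatible group, else start a new one."""
    max_sep = MAX_OFFSET_PIX * WF_PIXEL_SCALE
    groups = []
    for exp in exposures:
        for group in groups:
            ref = group[0]  # compare against the group's first member
            if (exp.proposal_id == ref.proposal_id
                    and exp.filter_name == ref.filter_name
                    and roll_difference(exp.roll_deg, ref.roll_deg) <= roll_tol_deg
                    and separation_arcsec(exp, ref) <= max_sep):
                group.append(exp)
                break
        else:
            groups.append([exp])
    return groups
```

Comparing against the first member keeps every member within the offset limit of a single reference frame, which matches the idea of shifting all members onto one grid; a production system would also need the cross-correlation, jitter, or WCS offsets described above.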

How does the production pipeline work?

The starting point for WASPP is the association file which contains all of the information necessary to do the image combination. A series of scripts accesses this file and executes the WASPP procedures in sequence.
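The script-driven structure described above, one association file driving a fixed sequence of procedures, can be sketched as a list of stage functions that share a context. The stage names below are hypothetical placeholders that follow the order of the steps described in this section; each one only records that it ran.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Stage = Callable[[Context], Context]


def make_stage(name: str) -> Stage:
    """Hypothetical placeholder stage that only logs its own name."""
    def stage(ctx: Context) -> Context:
        ctx["log"].append(name)
        return ctx
    return stage


# Assumed stage order, mirroring the steps described in the text.
STAGES: List[Stage] = [
    make_stage(n)
    for n in ("recalibrate", "stack", "weight_map", "astrometry", "classify", "catalog")
]


def run_pipeline(association: Dict[str, Any], stages: List[Stage]) -> Context:
    """Run every stage in order on a context seeded from the association record."""
    ctx: Context = {"association": association, "log": []}
    for stage in stages:
        ctx = stage(ctx)
    return ctx
```

Keeping all per-association state in one context object means each stage reads what it needs (member list, gains, exposure times) from the association record and passes its outputs forward, so stages can be rerun or replaced independently.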

Recalibration on-the-fly is the first step of the process. Members of the association and their calibration data are retrieved from the archive and placed in a work directory where standard recalibration is performed.

Image shifting, scaling, zeropoint: We adopted a shift-and-add approach for combining the individual members of the association. Images are shifted to a common reference frame using fractional pixel shifts normally accurate to 0.015 arcseconds or better and then combined using a weighted average. The frames are scaled to the average exposure time and a zeropoint offset is added to correct for background differences between the images. 
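The shift, scale, and zeropoint steps can be sketched with NumPy. The text does not specify the interpolation kernel, the weights, or how the background differences are measured; this sketch assumes bilinear interpolation for the fractional shifts, caller-supplied fixed per-frame weights, and median backgrounds matched to the first frame. WASPP itself combines the frames with Artificial Skepticism weighting rather than fixed weights.

```python
import numpy as np


def bilinear_shift(img, dy, dx):
    """Shift an image by a fractional (dy, dx) using bilinear interpolation.

    output[y, x] = img[y - dy, x - dx]; pixels mapping outside the input become NaN.
    """
    ny, nx = img.shape
    yy, xx = np.mgrid[0:ny, 0:nx].astype(float)
    ys, xs = yy - dy, xx - dx
    valid = (ys >= 0) & (ys <= ny - 1) & (xs >= 0) & (xs <= nx - 1)
    y0 = np.clip(np.floor(ys).astype(int), 0, ny - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, nx - 1)
    y1 = np.minimum(y0 + 1, ny - 1)
    x1 = np.minimum(x0 + 1, nx - 1)
    fy, fx = ys - y0, xs - x0
    out = ((1 - fy) * (1 - fx) * img[y0, x0] + (1 - fy) * fx * img[y0, x1]
           + fy * (1 - fx) * img[y1, x0] + fy * fx * img[y1, x1])
    out[~valid] = np.nan
    return out


def shift_and_add(images, shifts, exptimes, weights=None):
    """Shift to a common frame, scale to the mean exposure time,
    add background zeropoint offsets, then take a weighted average."""
    exptimes = np.asarray(exptimes, dtype=float)
    t_mean = exptimes.mean()
    if weights is None:
        weights = np.ones(len(images))
    weights = np.asarray(weights, dtype=float)
    aligned = np.array([
        bilinear_shift(np.asarray(img, dtype=float), dy, dx) * (t_mean / t)
        for img, (dy, dx), t in zip(images, shifts, exptimes)
    ])
    # Additive offsets bring each frame's median background to the first frame's.
    ref_bg = np.nanmedian(aligned[0])
    aligned += (ref_bg - np.nanmedian(aligned, axis=(1, 2)))[:, None, None]
    # Zero weight wherever a shifted frame has no data (NaN).
    w = weights[:, None, None] * np.isfinite(aligned)
    return np.nansum(w * aligned, axis=0) / np.sum(w, axis=0)
```

For example, frames of 10 DN in 100 s and 20 DN in 200 s both scale to 15 DN at the mean exposure time of 150 s, so their stack is 15 DN everywhere.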

Artificial Skepticism (AS) (Stetson 1989, V Advanced School of Astrophysics [Universidade de Sao Paulo], p. 1) is a method of computing a robust average image using a continuous weighting scheme derived from the data themselves.
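The idea of a continuous, data-derived weighting can be sketched as an iteratively reweighted mean. The exact function and constants used in WASPP are not given here; this sketch assumes a down-weighting factor of the form 1 / (1 + (|r| / (a σ))^b) applied to the usual inverse-variance weight, with a = b = 2 as placeholder values.

```python
import numpy as np


def skeptical_mean(values, sigmas, a=2.0, b=2.0, n_iter=10):
    """Robust mean along axis 0 using continuous, data-derived weights.

    Each value starts with inverse-variance weight 1/sigma**2, which is then reduced
    by 1 / (1 + (|residual| / (a * sigma))**b), so discrepant values (e.g. cosmic
    rays) fade out smoothly instead of being clipped at a hard threshold.
    """
    values = np.asarray(values, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    base_w = 1.0 / sigmas**2
    mean = np.median(values, axis=0)  # robust starting guess
    for _ in range(n_iter):
        resid = np.abs(values - mean) / sigmas  # residuals in units of sigma
        w = base_w / (1.0 + (resid / a) ** b)
        mean = np.sum(w * values, axis=0) / np.sum(w, axis=0)
    return mean, w
```

Because the function works along axis 0, the same call handles a single pixel's values or a whole stack of aligned frames of shape (n_frames, ny, nx). With values [10, 10.1, 9.9, 10, 100] and unit sigmas, the outlier's weight drops well below 1% and the mean stays close to 10.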


Weight Maps: After the first pass of AS stacking is complete, the resulting stack is used to back-predict the variance in each pixel, yielding an improved, more robust variance estimate that is free of cosmic rays. The AS stacking is then repeated with this improved weight map. An output weight map is produced for the stack by propagating the AS weights for the final image.
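The two-pass scheme can be sketched for frames that are already aligned. The noise model (read noise plus Poisson noise of the signal, in DN²), the skepticism-style weighting function, and the choice of summing the final weights as the output weight map are assumptions for illustration; the real pipeline's variance model may include further terms.

```python
import numpy as np


def predicted_variance(signal_dn, gain, read_noise_e):
    """Assumed CCD noise model in DN^2: read noise plus Poisson noise of the signal."""
    return (read_noise_e / gain) ** 2 + np.clip(signal_dn, 0.0, None) / gain


def as_weights(frames, center, var, a=2.0, b=2.0):
    """Inverse-variance weights reduced for large normalized residuals."""
    resid = np.abs(frames - center) / np.sqrt(var)
    return (1.0 / var) / (1.0 + (resid / a) ** b)


def two_pass_stack(frames, gain, read_noise_e, n_iter=5):
    """Return (stack, weight_map) for aligned frames of shape (n, ny, nx)."""
    frames = np.asarray(frames, dtype=float)
    # Pass 1: variance from each frame's own pixels (inflated where cosmic rays hit).
    var = predicted_variance(frames, gain, read_noise_e)
    stack = np.median(frames, axis=0)
    for _ in range(n_iter):
        w = as_weights(frames, stack, var)
        stack = np.sum(w * frames, axis=0) / np.sum(w, axis=0)
    # Pass 2: back-predict the variance from the cosmic-ray-free first-pass stack.
    var = np.broadcast_to(predicted_variance(stack, gain, read_noise_e), frames.shape)
    for _ in range(n_iter):
        w = as_weights(frames, stack, var)
        stack = np.sum(w * frames, axis=0) / np.sum(w, axis=0)
    weight_map = np.sum(w, axis=0)  # propagated weights of the final combination
    return stack, weight_map
```

In a usage example with four 100 DN frames, an illustrative gain of 7 e-/DN and read noise of 5 e-, and a 5000 DN cosmic ray in one frame, the stacked pixel stays within 0.1 DN of 100. Its weight-map value is lower than its neighbours' because the hit frame contributes almost nothing there.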

Astrometric corrections: The WFPC2 World Coordinate System has good internal, or relative, precision, but individual frames exhibit a dispersion of 1.6 arcseconds (a systematic offset) relative to the USNO2 reference frame. This is not adequate for cross-identifying sources. For each WFPC2 image we build a mosaic (using stsdas.hst_calib.wfpc2.wmosaic) and retrieve from the VizieR clone at CADC all USNO2 catalogue stars that might fall on the frame, given the original WCS information. We search for correspondences between the catalogue stars and bright observed sources, and calculate offsets in RA and Dec, together with errors in those offsets wherever more than one star was matched. The uncertainty of the USNO2 positions is quoted as 0.25 arcseconds (root-mean-square, or r.m.s.), but at an epoch in the 1950s. Accumulated proper motions bring the current r.m.s. into the range 0.3-0.4 arcseconds, depending on the field (Stetson, private communication). Figure 2 shows the offsets and the errors in those offsets. This correction reduces the systematic errors to less than 0.25 arcseconds for 50% of the WFPC2 pointings and to less than 0.34 arcseconds for 85%.
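Once catalogue stars have been paired with observed sources, the offset and its error follow from simple statistics. This sketch assumes the matching has already been done (it is not shown) and that the offset is the plain mean of the catalogue-minus-observed differences, with the standard error of the mean as its uncertainty. The text does not state the exact estimator used.

```python
import math
import statistics


def astrometric_offset(matches):
    """Mean catalogue-minus-observed offset from matched star pairs.

    matches: list of ((ra_obs, dec_obs), (ra_ref, dec_ref)) pairs in degrees.
    Returns (dra_arcsec, ddec_arcsec, err_ra, err_dec). RA offsets include the
    cos(dec) factor; errors are standard errors of the mean, or None for one star.
    """
    d_ra, d_dec = [], []
    for (ra_o, dec_o), (ra_r, dec_r) in matches:
        dra_deg = (ra_r - ra_o + 180.0) % 360.0 - 180.0  # handle the 0/360 wrap
        d_ra.append(dra_deg * math.cos(math.radians(dec_r)) * 3600.0)
        d_dec.append((dec_r - dec_o) * 3600.0)
    mean_ra, mean_dec = statistics.mean(d_ra), statistics.mean(d_dec)
    n = len(d_ra)
    if n < 2:
        return mean_ra, mean_dec, None, None
    return (mean_ra, mean_dec,
            statistics.stdev(d_ra) / math.sqrt(n),
            statistics.stdev(d_dec) / math.sqrt(n))
```

With a single matched star the offset can still be applied, but no error estimate is available, which mirrors the text's note that errors are computed only where more than one star was found.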

Image Content classification

[Note that the image content classification is not available for all stacks in the initial release and the quality of those classifications is not always optimal.] 

There are many types of observations in the WFPC2 archive collection. There are fields that are dominated by a single bright star and fields that are dominated by numerous faint stars or galaxies. There are pointings with 20,000 detected sources and there are blank fields. Some fields have extended objects that are larger than the field-of-view of WFPC2. It is very useful to determine an "image content" classifier to allow querying for specific types of pointings and so that automated processing pipelines can "tune" the choice of processing parameters to match the image content. We have implemented a simple scheme to do this.

Each WFPC2 chip is analyzed to determine the total flux per second and the ratios of the fluxes in stars, small galaxies, and extended objects to the total flux in the chip. If a single chip differs dramatically from the other chips in these ratios, this may indicate that a bright target is centered on that chip (often the PC).
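The per-chip comparison can be sketched as follows. The input format (a dict of flux-per-second totals per object class for each chip) and the 0.3 threshold on the ratio difference are hypothetical; the text only says that a dramatic difference between chips is the signal of interest.

```python
import statistics

CLASSES = ("stars", "small_galaxies", "extended")


def flux_ratios(chip):
    """Fraction of a chip's total flux per second in each object class."""
    return [chip[c] / chip["total"] for c in CLASSES]


def outlier_chips(chips, threshold=0.3):
    """Flag chips whose flux ratios differ strongly from those of the other chips.

    chips maps a chip name (e.g. "PC1") to a dict of fluxes per second;
    threshold is a hypothetical cut on the absolute ratio difference.
    """
    ratios = {name: flux_ratios(chip) for name, chip in chips.items()}
    flagged = []
    for name, mine in ratios.items():
        others = [r for n, r in ratios.items() if n != name]
        medians = [statistics.median([o[i] for o in others]) for i in range(len(CLASSES))]
        if any(abs(m - med) > threshold for m, med in zip(mine, medians)):
            flagged.append(name)
    return flagged
```

Comparing each chip with the median of the other three keeps a single unusual chip from dragging the reference values toward itself.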

Source catalogs were produced using the widely used Sextractor software (Bertin, E. & Arnouts, S. 1996, A&AS, 117, 393). Sextractor was run on a variety of image types (for example, rich galaxy clusters, deep extragalactic pointings, globular clusters, and Cepheid fields in galaxies) to determine a good general set of configuration parameters that would yield robust detections and object deblending. The results were inspected to decide which detection and deblending parameter values worked best.

Our source catalogs for the 6 reddest filters contain approximately 18.4 million detections, and the full dataset covering all filters will contain roughly 25 million sources (not all of them distinct objects). [Source catalogues are available in the initial release.]

Products: In addition to the combined image we provide a weight map, an image showing the Sextractor apertures, the segmentation image (showing which pixel belongs to which source), and the background image. The catalogs themselves are ingested into a database and delivered through a query interface.
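The extra products listed above correspond to standard Sextractor check images, and the weight map can be passed to Sextractor directly. The helper below is a minimal sketch that writes such a configuration file. The keywords (`DETECT_MINAREA`, `DETECT_THRESH`, `DEBLEND_NTHRESH`, `DEBLEND_MINCONT`, `WEIGHT_TYPE`, `WEIGHT_IMAGE`, `CHECKIMAGE_TYPE`, `CHECKIMAGE_NAME`) are real Sextractor parameters. The numerical values are placeholders, not the settings chosen for WASPP, and the file-naming scheme is hypothetical. A complete configuration also needs other keywords, such as `PARAMETERS_NAME`, which are omitted here.

```python
import os


def sextractor_config(stack_path, overrides=None):
    """Build the text of a Sextractor configuration file for one stack."""
    base = os.path.splitext(stack_path)[0]
    cfg = {
        "CATALOG_NAME": f"{base}.cat",
        # Placeholder detection/deblending values, not the WASPP settings.
        "DETECT_MINAREA": 5,
        "DETECT_THRESH": 1.5,
        "DEBLEND_NTHRESH": 32,
        "DEBLEND_MINCONT": 0.005,
        # Use the stack's weight map (hypothetical *_wht.fits naming).
        "WEIGHT_TYPE": "MAP_WEIGHT",
        "WEIGHT_IMAGE": f"{base}_wht.fits",
        # Check images matching the delivered products.
        "CHECKIMAGE_TYPE": "APERTURES,SEGMENTATION,BACKGROUND",
        "CHECKIMAGE_NAME": f"{base}_aper.fits,{base}_seg.fits,{base}_back.fits",
    }
    cfg.update(overrides or {})
    # One "KEYWORD value" line per parameter, keyword left-padded for readability.
    return "".join(f"{key:<16}{value}\n" for key, value in cfg.items())
```

Per-field tuning of the kind described for different image types can then be expressed as an `overrides` dict, for example `{"DETECT_THRESH": 3.0}` for a crowded field.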

Data verification 

A verification process has been designed, and a preliminary data quality assessment has been performed, to demonstrate the quality of the products of the WFPC2 Associations Science Products Pipeline. Science users need to be able to trust the data and to point to proof that their trust is justified. Elements of the verification process are:

* Photometric accuracy
o zero points have been set correctly
o header information (filter, exposure time, gain) is correct
* Astrometric accuracy
o zero point adjustments are determined and implemented correctly
* Basic calibration (flat-fielding etc.) is reliable
* Weight maps are correct
* Stacking has been implemented correctly 
o Scaling, shifting, zero offsets
* Sextractor results are photometrically and astrometrically accurate
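The photometric header checks in the list above lend themselves to automation. This sketch checks that the required keywords are present and sane, and recomputes the STMAG zero point as -2.5 log10(PHOTFLAM) + PHOTZPT from the WFPC2 header keywords `PHOTFLAM` and `PHOTZPT`, comparing it with a stored value. How WASPP records its zero point is not described here, so the `stored_zp` argument, the keyword list, and the 0.01 mag tolerance are assumptions. Headers are plain dicts, so no FITS library is needed.

```python
import math

# Assumed set of WFPC2 header keywords that must be present.
REQUIRED = ("FILTNAM1", "EXPTIME", "ATODGAIN", "PHOTFLAM", "PHOTZPT")


def stmag_zeropoint(photflam, photzpt=-21.10):
    """STMAG zero point from the inverse sensitivity PHOTFLAM."""
    return -2.5 * math.log10(photflam) + photzpt


def verify_header(header, stored_zp, tol=0.01):
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"missing {key}" for key in REQUIRED if key not in header]
    if problems:
        return problems
    if header["EXPTIME"] <= 0:
        problems.append("non-positive EXPTIME")
    if header["ATODGAIN"] <= 0:
        problems.append("non-positive ATODGAIN")
    zp = stmag_zeropoint(header["PHOTFLAM"], header["PHOTZPT"])
    if abs(zp - stored_zp) > tol:
        problems.append(f"zero point mismatch: header gives {zp:.3f}, stored {stored_zp:.3f}")
    return problems
```

Returning a list of problems rather than raising on the first failure lets a verification run over thousands of stacks and tally which checks fail most often.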