The WFPC2 Associations Science Products Pipeline
This initial release (November 8, 2002) of the WFPC2 Associations
Science Products is a collaborative effort between CADC, ST-ECF and STScI.
We cannot and do not guarantee science quality in all of the products we
are releasing. The present release should be viewed as a "shared-risk"
product release, and users should consult the Data Quality assessment information
that is available on these pages. A second release is planned for 2003,
with more extensive documentation, including a refereed
publication describing the processing methods and results. The goal of
the second release is a fully "science-ready" product that is reliable
and documented. The partners in this endeavor believe that the value of
these WFPC2 products is high enough to warrant this early release.
The WFPC2 Associations Science Products Pipeline (WASPP) has been developed
by the Canadian Astronomy Data Centre (CADC) and the Space Telescope-European
Coordinating Facility (ST-ECF) with the goal of producing science-quality
images and catalogs. Association information allows 74,787 individual images
to be processed into 21,260 combined images, or "stacks". Data quality
is greatly improved by this procedure, which removes cosmic rays and increases
sensitivity. Source detection and photometry are done on the stacks to produce
catalogs. Corrections are made to the WFPC2 astrometry by reference to
the USNO2 catalogue. The result will ultimately be ready-to-use science
content. This project has consumed approximately 3 CPU-years of processing
time and has yielded a massive dataset with 25 million source detections
(which we intend to release as time permits) that represents a new class
of archival science products for Hubble Space Telescope.
Motivation for associations and WFPC2 pipeline
The accumulated archive of nearly a decade of WFPC2 observations is
vast and varied, a rich resource for astronomy. As of May 2002 there are
21,260 associations after calibration observations are removed (see
article by Micol & Durand, ST-ECF Newsletter January 2002) with a total
exposure time of 42.7 million seconds. Figure 1 shows the distribution
of associations across only the 10 most popular filters. Roughly 85% of
the associations have 4 or fewer members (58% have two members). The value
delivered to users of the WFPC2 archive is enhanced by applying the WASPP
procedure. It obviates the need for labor-intensive procedures to transform
raw or calibrated data into a science-ready product. The normal sequence
of steps needed to use archive data (query, download, determine "dither"
pattern by hand, combine images) is a reasonable investment for a few datasets
but it is impractical if a large number of datasets need to be sifted through
and evaluated to find a few of interest.
We recognize that it is not possible to design a single pipeline that
will be optimal for all science applications. Nevertheless, we believe
that it can save many users considerable effort,
and that there are many science applications that can use our pixel
and catalog data directly.
How are the associations constructed?
Individual exposures are grouped into associations in a variety of ways
(see references or http://archive.eso.org/archive/hst/wfpc2_asn). The association
process identifies logical sets of images and determines their offsets
from one another via one of three routes: cross-correlation, jitter information,
or World Coordinate System (WCS) information. Dither patterns are known
reliably for 93% of the potential "stacks". The association file
includes all ancillary information like gain and exposure time for each
member of the association. The associations used here are based on a common
proposal identification, filter, roll angle, and position on the sky. Offsets
are limited to no more than 100 WFC pixels to avoid geometric distortion
problems. This scheme can be extended to associate data from different
telescopes at different wavelengths or even to associate data from widely
divergent sources that address a common science theme.
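The grouping criteria above can be illustrated with a minimal sketch. It is not the code used to build the WFPC2 associations; the `Exposure` record, the roll-angle tolerance, and the approximate WFC pixel scale of 0.1 arcseconds are assumptions made for this example, and RA wrap-around at 0/360 degrees is ignored.

```python
import math
from collections import defaultdict, namedtuple

WFC_PIXEL_ARCSEC = 0.1   # approximate WFC pixel scale (assumption for this sketch)
MAX_OFFSET_PIXELS = 100  # offset limit quoted in the text

# Hypothetical exposure record; the field names are illustrative only.
Exposure = namedtuple(
    "Exposure", "name proposal_id filter_name roll_deg ra_deg dec_deg")

def offset_in_wfc_pixels(a, b):
    """Separation of two pointings in WFC pixels (small-angle approximation)."""
    mean_dec = math.radians(0.5 * (a.dec_deg + b.dec_deg))
    dra = (a.ra_deg - b.ra_deg) * math.cos(mean_dec) * 3600.0
    ddec = (a.dec_deg - b.dec_deg) * 3600.0
    return math.hypot(dra, ddec) / WFC_PIXEL_ARCSEC

def build_associations(exposures, roll_tol_deg=0.1):
    """Group exposures by proposal and filter, then by roll angle and position."""
    by_key = defaultdict(list)
    for exp in exposures:
        by_key[(exp.proposal_id, exp.filter_name)].append(exp)

    associations = []
    for members in by_key.values():
        local = []  # associations found within this proposal/filter pair
        for exp in members:
            for assoc in local:
                ref = assoc[0]  # first member acts as the reference pointing
                if (abs(exp.roll_deg - ref.roll_deg) <= roll_tol_deg
                        and offset_in_wfc_pixels(exp, ref) <= MAX_OFFSET_PIXELS):
                    assoc.append(exp)
                    break
            else:
                local.append([exp])  # no compatible association: start a new one
        associations.extend(local)
    return associations
```

Each returned list is one candidate association; an exposure that matches no existing group starts a group of its own.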
How does the production pipeline work?
The starting point for WASPP is the association file which contains
all of the information necessary to do the image combination. A series
of scripts accesses this file and executes the WASPP procedures in sequence.
Recalibration on-the-fly is the first step of the process. Members of
the association and their calibration data are retrieved from the archive
and placed in a work directory where standard recalibration is performed.
Image shifting, scaling, zeropoint: We adopted a shift-and-add approach
for combining the individual members of the association. Images are shifted
to a common reference frame using fractional pixel shifts normally accurate
to 0.015 arcseconds or better and then combined using a weighted average.
The frames are scaled to the average exposure time and a zeropoint offset
is added to correct for background differences between the images.
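As an illustration of the shift, scale, and zeropoint steps just described, the following sketch uses NumPy only. It is not the WASPP implementation: bilinear interpolation for the fractional shift, the median as the background estimate, and the function names are all assumptions of this example.

```python
import numpy as np

def shift_bilinear(image, dx, dy):
    """Shift an image by (dx, dy) pixels using bilinear interpolation.

    Pixels whose source position falls outside the input are set to NaN.
    """
    ny, nx = image.shape
    yy, xx = np.mgrid[0:ny, 0:nx].astype(float)
    xs, ys = xx - dx, yy - dy                      # source coordinates
    valid = (xs >= 0) & (xs <= nx - 1) & (ys >= 0) & (ys <= ny - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, nx - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, ny - 2)
    fx, fy = xs - x0, ys - y0                      # fractional parts
    out = (image[y0, x0] * (1 - fx) * (1 - fy)
           + image[y0, x0 + 1] * fx * (1 - fy)
           + image[y0 + 1, x0] * (1 - fx) * fy
           + image[y0 + 1, x0 + 1] * fx * fy)
    return np.where(valid, out, np.nan)

def shift_and_add(images, shifts, exptimes, weights=None):
    """Combine frames: scale to the mean exposure time, shift, match backgrounds, average."""
    exptimes = np.asarray(exptimes, dtype=float)
    mean_exptime = exptimes.mean()
    aligned = np.array([
        shift_bilinear(np.asarray(im, dtype=float) * (mean_exptime / t), dx, dy)
        for im, (dx, dy), t in zip(images, shifts, exptimes)
    ])
    # Zeropoint offsets: bring every frame to the mean background level
    # (median used as the background estimate; an assumption of this sketch).
    backgrounds = np.nanmedian(aligned, axis=(1, 2))
    aligned += (backgrounds.mean() - backgrounds)[:, None, None]

    w = np.ones(len(aligned)) if weights is None else np.asarray(weights, float)
    wmap = np.where(np.isnan(aligned), 0.0, w[:, None, None])
    total = np.sum(np.nan_to_num(aligned) * wmap, axis=0)
    norm = wmap.sum(axis=0)
    return np.divide(total, norm, out=np.full(total.shape, np.nan), where=norm > 0)
```

Pixels covered by no input frame remain NaN in the output, which keeps the edges of a dithered stack from being silently filled with zeros.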
Artificial Skepticism (AS) (Stetson 1989, V Advanced School of Astrophysics
[Universidade de Sao Paulo], p.1.) is a method of computing a robust average
image using a continuous weighting scheme that is derived from the data.
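A minimal sketch of a robust average with continuous, data-derived weights is given below. It assumes the commonly quoted form of the down-weighting function, 1 / (1 + (|residual| / (alpha * sigma))^beta); the values of `alpha` and `beta`, the number of iterations, and the function name are illustrative and are not necessarily those used in WASPP.

```python
import numpy as np

def artificial_skepticism_mean(values, sigmas, alpha=2.0, beta=2.0, n_iter=10):
    """Iteratively reweighted mean along axis 0.

    values : array of shape (n_frames, ...) holding the aligned frames
    sigmas : per-pixel standard errors, broadcastable to values.shape
    Returns the robust average and the summed final weights.
    """
    values = np.asarray(values, dtype=float)
    sigmas = np.broadcast_to(np.asarray(sigmas, dtype=float), values.shape)
    base_weight = 1.0 / sigmas**2

    estimate = np.median(values, axis=0)           # robust starting point
    for _ in range(n_iter):
        resid = np.abs(values - estimate) / sigmas
        # Continuous down-weighting: large residuals are never rejected
        # outright, they simply contribute less and less.
        weight = base_weight / (1.0 + (resid / alpha) ** beta)
        estimate = np.sum(weight * values, axis=0) / np.sum(weight, axis=0)
    return estimate, np.sum(weight, axis=0)
```

Because the weights vary smoothly with the residual, a pixel hit by a cosmic ray in one frame is strongly suppressed without a hard rejection threshold.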
Weight Maps: After the first pass through the AS stacking is complete
the resulting stack is used to back-predict the variance in each pixel
yielding an improved, more robust estimate that is free of cosmic rays.
The AS stacking is then repeated with this improved weight map. An output
weight map is produced for the stack by propagating the AS weights for
the final image.
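The back-prediction of the per-pixel variance can be sketched as follows. The source does not state the noise model, so this example assumes a standard Poisson plus read-noise CCD model; the gain and read-noise arguments, the function names, and the neglect of the background zeropoint offsets are assumptions of the sketch.

```python
import numpy as np

def predicted_variance(stack, exptime, mean_exptime, gain, read_noise_e):
    """Variance expected for one frame, in the units of the stack.

    stack        : clean combined image, scaled to mean_exptime (DN)
    exptime      : exposure time of the frame being modelled
    gain         : electrons per DN
    read_noise_e : read noise in electrons
    """
    scale = exptime / mean_exptime
    # Counts the clean stack predicts for this frame (negative values clipped).
    counts = np.clip(np.asarray(stack, dtype=float) * scale, 0.0, None)
    var_frame = counts / gain + (read_noise_e / gain) ** 2   # DN^2 in the frame
    return var_frame / scale**2       # rescaled to the stack's exposure time

def weight_maps(stack, exptimes, gain, read_noise_e):
    """Inverse-variance weight map for every member of an association."""
    mean_exptime = float(np.mean(exptimes))
    return np.array([
        1.0 / predicted_variance(stack, t, mean_exptime, gain, read_noise_e)
        for t in exptimes
    ])
```

Since the variance is predicted from the cosmic-ray-free stack rather than from each raw frame, a cosmic-ray hit does not inflate its own variance estimate.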
Astrometric corrections: The WFPC2 World Coordinate System has good
internal, or relative, precision but individual frames exhibit a dispersion
of 1.6 arcseconds (systematic offset) relative to the USNO2 reference frame.
This is not adequate to cross-identify sources. For each WFPC2 image we
build a mosaic (using stsdas.hst_calib.wfpc2.wmosaic) and retrieve all
USNO2 catalogue stars from the Vizier clone at CADC that might be on the
frame given the original WCS information. We search for correspondence
between the catalogue and observed bright sources and calculate offsets
in RA and DEC and errors in the offset where more than 1 star was found.
The uncertainty of the USNO2 data is claimed to be 0.25 arcseconds (root-mean-square
or r.m.s.) but that was at an epoch in the 1950s. Accumulated proper motions
bring the current r.m.s. into the range 0.3-0.4 arcseconds depending on
the field (Stetson, private communication). Figure 2 shows the offsets and
the errors in those offsets. Applying the offsets reduces the systematic errors to less
than 0.25 arcseconds for 50% of the WFPC2 pointings and less than 0.34
arcseconds for 85%.
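The offset calculation can be sketched as below, assuming the catalogue stars and observed sources have already been paired (the matching step itself is not shown). Using the standard error of the mean as the error in the offset is an assumption of this example, as is the function name.

```python
import numpy as np

def astrometric_offset(ra_obs, dec_obs, ra_cat, dec_cat):
    """Mean catalogue-minus-observed offset in arcseconds.

    All inputs are matched arrays in degrees. Returns ((dRA, dDec), (err_RA, err_Dec));
    the errors are NaN when only one star is available.
    """
    ra_obs, dec_obs, ra_cat, dec_cat = (
        np.atleast_1d(np.asarray(a, dtype=float))
        for a in (ra_obs, dec_obs, ra_cat, dec_cat)
    )
    # The RA difference is scaled by cos(Dec) to give an angle on the sky.
    dra = (ra_cat - ra_obs) * np.cos(np.radians(dec_cat)) * 3600.0
    ddec = (dec_cat - dec_obs) * 3600.0

    n = dra.size
    offset = (dra.mean(), ddec.mean())
    if n > 1:
        # Standard error of the mean offset (assumed error estimate).
        error = (dra.std(ddof=1) / np.sqrt(n), ddec.std(ddof=1) / np.sqrt(n))
    else:
        error = (np.nan, np.nan)
    return offset, error
```

The returned offset would be added to the image WCS reference position; with a single matched star an offset is still obtained, but no error can be attached to it.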
Image Content classification
[Note that the image content classification is not available for
all stacks in the initial release and the quality of those classifications
is not always optimal.]
There are many types of observations in the WFPC2 archive collection.
There are fields that are dominated by a single bright star and fields
that are dominated by numerous faint stars or galaxies. There are pointings
with 20,000 detected sources and there are blank fields. Some fields have
extended objects that are larger than the field-of-view of WFPC2. It is
very useful to determine an "image content" classifier to allow querying
for specific types of pointings and so that automated processing pipelines
can "tune" the choice of processing parameters to match the image content.
We have implemented a simple scheme to do this.
Each WFPC2 chip is analyzed to determine the total flux
per second and the ratios of the fluxes in stars, small galaxies, and extended
objects to the total flux in the chip. If a single chip differs dramatically
in these ratios from the other chips this might indicate that a bright
target is centered on that chip (often the PC).
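A minimal sketch of such a per-chip comparison follows. The dictionary layout, the class names, and the threshold of 0.5 on the ratio difference are hypothetical; the actual classifier may use different quantities and cuts.

```python
import numpy as np

CLASSES = ("stars", "small_galaxies", "extended")  # illustrative class names

def flux_ratios(chip):
    """Fraction of a chip's total flux contributed by each object class."""
    return np.array([chip[c] / chip["total_flux"] for c in CLASSES])

def discrepant_chip(chips, threshold=0.5):
    """Return the name of a chip whose flux ratios differ strongly from the rest.

    chips : dict mapping chip name to a dict with the per-class fluxes
            and "total_flux". Returns None if no chip stands out.
    """
    names = list(chips)
    ratios = np.array([flux_ratios(chips[name]) for name in names])
    for i, name in enumerate(names):
        others = np.delete(ratios, i, axis=0)
        # Compare against the median of the other chips, which is not
        # pulled far by a single unusual chip among them.
        if np.max(np.abs(ratios[i] - np.median(others, axis=0))) > threshold:
            return name
    return None
```

For example, a pointing with a bright star centered on the PC would give a star-flux ratio near 1 on that chip and much lower values on the three WF chips, so the PC would be returned.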
Source catalogs were produced using the widely used Sextractor software
(Bertin, E. & Arnouts, S. 1996, A&AS, 117, 393). Sextractor was run
on a variety of image types (for example, rich galaxy clusters, deep extragalactic
pointings, globular clusters, and Cepheid fields in galaxies) in order
to determine a good general set of configuration parameters that would
yield robust detections and object deblending. The results were inspected
to decide which detection and deblending parameter values worked best.
Our source catalogs for the 6 reddest filters contain approximately
18.4 million detections, and the full dataset containing all filters
will contain roughly 25 million sources (not all distinct objects). [Source
catalogues are available in the initial release.]
Products: In addition to the combined image we provide a weight map,
an image showing the Sextractor apertures, the segmentation image (showing
which pixel belongs to which source), and the background image. The catalogs
themselves are ingested into a database and delivered through a query interface.
A verification process has been designed and a preliminary data
quality assessment has been
performed to demonstrate the quality of the products of the WFPC2 Associations
Science Products Pipeline. Science users need to be able to trust the data
and to point to proof that their trust is justified. Elements of the verification:
* Photometric accuracy
  o zero points have been set correctly
  o header information (filter, exposure time, gain) is correct
* Astrometric accuracy
  o zero point adjustments are determined and implemented correctly
* Basic calibration (flat-fielding etc.) is reliable
* Weight maps are correct
* Stacking has been implemented correctly
  o Scaling, shifting, zero offsets
* Sextractor results are accurate photometrically and astrometrically