The Nancy Grace Roman Space Telescope Project Science Team
NASA Goddard Space Flight Center
20 November 2022 (Rev. 3.0)
This notebook's purpose is to provide experienced astronomers with a short, simple introduction to the Nance Grace Roman Space Telescope ("Roman") Wide Field Instrument (WFI) sample data. We assume that the reader is familiar with Flexible Image Transport System (FITS) files. We furthermore assume that the reader has worked with IR array detector data, understands how IR arrays are operated for space astronomy, and is familar with common calibration pipeline steps.
Readers who would like a more thorough introduction may wish to see the JWST user documentation at https://jwst-docs.stsci.edu/
. See in particular the sections describing the Science Calibration Pipeline. Although some details may differ for Roman, it is envisioned that the WFI pipeline will eventually resemble the JWST NIRCam pipeline. Additionally, it may be valuable to read Roman-specific documents.
In this notebook, we show how to do the following simple tasks. Because our aim is to make things as clear as possible, we do not necessarily do them in the most efficient ways. Once these operations are understood, we envision that many users will be able to start doing more complex things with the data.
This document is an executable python-3.10 notebook. We begin by loading required packages and showing the versions. Our development environment was as follows.
# Show the python version
import sys
sys.version
'3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:39:04) [GCC 10.3.0]'
# Load required packages and show versions
import astropy # Community astronomy tools
from astropy.io import fits # FITS file I/O
import numpy as np # Numerical libraries
from glob import glob # Filename globbing
import matplotlib # Plotting
import matplotlib.pyplot as plt
import os
# By default, save results in home directory. Feel free to change this.
outdir = os.getenv('HOME')
print('astropy version = ', astropy.__version__)
print('numpy version = ', np.__version__)
print('matplotlib version = ', matplotlib.__version__)
astropy version = 5.0.4 numpy version = 1.22.3 matplotlib version = 3.5.1
We will occasionally need to robustly compute means and standard deviations. The following does this.
def madstat(a, axis=None, keepdims=False, std=False):
"""
MADSTAT - Robust statistics using median and median absolute deviation
Parameters: a, array_like
Calculate the median absolute deviation of these values.
axis, tuple
Axis or axes along which the median absolute deviation is
computed.
keepdims, bool (optional)
If this is set to True, the axes which are
reduced are left in the result as dimensions with
size one. With this option, the result will
broadcast correctly against the input array.
std, bool (optional)
If set True, then the MAD is multiplied by 1.4826 to
estimate the standard deviation of normal deviates.
Returns: Median and median absolute deviation. Optionally returns median
and estimated standard deviation.
"""
# Compute the median and median absolute deviation
median = np.median(a, axis=axis, keepdims=keepdims)
mad = np.median(np.abs(a-median), axis=axis, keepdims=keepdims)
# Estimate std() if requested
if std is True:
mad *= 1.4826
# Done
return(median, mad)
The sample data are stored much like uncalibrated JWST data. Each "exposure" is packaged into a FITS file. An exposure can contain multiple sampled up-the-ramp "integrations".
The science data are packaged into a $4^\textrm{th}$ rank tensor. The dimensions are: (1) integration, (2) up-the-ramp group, (3) slow pixel scan, and (4) fast pixel scan. Roman is not using multiple integrations yet, so the left-most dimension is always singleton. The others colloquially refer to $z$, $y$, and $x$ in the data cube.
Examine one dark file's header to get our bearings.
# This should point to the data directory
datdir = '/path-to-data/fr03/20210907230637_noiw' # Change datdir to point to fr03/20210907230637_noiw
Make a list of the available files and pick one. Any will do for now. Use triplet number 21816.
triplet_sn = '21816' # Triplet serial number
files = sorted(glob(datdir + '/' + triplet_sn + '*.fits')) # Make a sorted list of the available files
file = files[0] # Use the first for now
Open the file and get the primary header. Also get the science data, which are stored in header data unit 1.
with fits.open(file) as hdul:
H0 = hdul[0].header # Get primary header
H = hdul[1].header # The header that goes with the data
D = hdul[1].data # Get the data. They are stored as a 4th rank tensor.
To get our bearings, take a look at the first few lines of the header. The primary header describes the overall exposures.
H0[:8]
SIMPLE = T / conforms to FITS standard BITPIX = 8 / array data type NAXIS = 0 / number of array dimensions EXTEND = T CHAN = '2 ' FPAPOS = 'CH2 ' DETECTOR= '21816 ' CABLESN = '3 '
The data header tells us about the 4th rank tensor that contains the actual samples up-the-ramp. It is shorter than the primary header, so we show all of it.
H
XTENSION= 'IMAGE ' / Image extension BITPIX = 16 / array data type NAXIS = 4 / number of array dimensions NAXIS1 = 4224 NAXIS2 = 4096 NAXIS3 = 56 NAXIS4 = 1 PCOUNT = 0 / number of parameters GCOUNT = 1 / number of groups BSCALE = 1 BZERO = 32768 EXTNAME = 'SCI ' / extension name EXTVER = 1
We see that NAXIS=4, indicating that it is indeed a 4$^\textrm{th}$ rank tensor. The concept is for NAXIS=4 to contain possibly many "integrations" packaged into one "exposure" (although at the moment there is only ever one). NAXIS1, NAXIS2, and NAXIS3 are the familar data dimensions: fast scan dimension, slow scan dimension, and groups/frames sampling up-the-ramp. We see BITPIX=16, indicating that the data are unsigned 16-bit integer.
In practice, we usually operate on the floating point numbers and find that 32-bits are adequate for most things. 32-bit floating point also happens to be a native format for most graphics processing units (GPU). Although we don't use GPUs in this example, we often do, and therefore work with 32-bits here.
To complete fetching the data, we therefore: (1) drop the singleton integration number dimension and (2) convert to 32-bit floating point.
# Fetch the data
D = np.float32(D[0]) # Drop singleton dimension and convert to float32
It is informative to see what remains.
print('D.shape = ', D.shape)
D.shape = (56, 4096, 4224)
From the above, we see that there are 56 non-destructive samples up-the-ramp. When displayed in a FITS viewer like SAOImage DS9 (https://sites.google.com/cfa.harvard.edu/saoimageds9
), there are 4096 image rows and 4224 image columns in each frame of data. We are being careful to specify "image rows" and "image columns" because the sense is the inverse of what python does internally as a row-major computing language.
Roman's WFI returns 33 "outputs" of data, each output measuring $4096\times4224~\textrm{pixels}$. From left-to-right in DS9, these are the 32 "regular outputs" containing scientific data and a special 33$^\textrm{rd}$ "reference output" that is common to all the others. When displayed in DS9, the horizontal (fast scan) readout directions alternate. Even numbered outputs read from left to right and then from bottom to top. Odd numbered ones read from right to left and then from bottom to top.
When these data were acquired, the reference output was subtracted on-the-fly from all 32 regular outputs with unity gain using differential amplifiers and subsequently digitized. We do not believe that this is statistically optimal, but it is a common choice in IR array cameras for astronomy. For purposes of this example, we will crop it off.
# Crop off the reference output
D = D[:,:,:4096]
print('D.shape = ', D.shape)
D.shape = (56, 4096, 4096)
# Save the dimensions for later
nz,ny,nx = D.shape
Reference correction is one of the first steps in most pipelines. Here we show the simplest possible reference correction using only the reference pixels in rows. Its purpose is to take out constant offsets between the 32 outputs.
We find that the last 4 image rows read out are usually pretty stable, and within these the 2 inner rows are the most stable. Our simple correction uses only these.
Write a simple reference row correction function that overwrites the input.
def rowcor(D):
"""
rowcor(D)
Reference correction using only the top four rows of reference pixels.
parameters: D, array
The input data cube
Returns: nothing
This overwrites the input data
"""
# Get cube dimensions
nz,ny,nx = D.shape
# Definitions
nout = 32 # The WFI uses 32 outputs for full frame readout
w = nx//nout # Width of each output in pixels
count = 3 # Clip off the best/worse few samples for robust statistics
# Compute first and last image columns for each output
x0 = np.arange(0,nx,w) # First cols
x1 = x0 + w-1 # Last cols
# Apply reference correction working frame-by-frame and output-by-output
for z in np.arange(nz):
for op in np.arange(nout):
refpix = D[z,4093:4095:,x0[op]:x1[op]] # Get ref. pixels
refpix = np.sort(refpix.flatten())[count-1:-count+1] # Trim outliers
mu = np.mean(refpix) # Robust mean
D[z,:,x0[op]:x1[op]] -= mu
Do a simple reference rows correction.
# Make a copy of the data for comparison later
D0 = D.copy()
# Apply reference rows correction
rowcor(D)
Show the result.
plt.plot(D0[5,4094], '.', alpha=.7, label='Raw')
plt.plot(D[5,4094], '.', alpha=.7, label='Ref. cor.')
plt.xlabel('H4RG Column (#)')
plt.ylabel('Signal (DN)')
plt.legend()
plt.show()
The above plot shows that (blue) before reference correction, there were some output-to-output offsets. After reference correction (orange), these are greatly reduced.
With up-the-ramp sampling, it is conventional to least squares fit straight lines to the pixel data. When there is no cosmic ray disturbance, the fitted slope is taken to be a proxy for incident flux.
At this point, one could iterate over the pixels on-by-one and use numpy.polyfit()
to fit straight lines. This is unfortunately very slow in python, so we do something a little more sophisticated here. This builds on numpy's linear algebra libraries, which are highly optimized and parallelized.
The following builds a set of monomial basis vectors, $\mathbf{v}\in\{z^0,z^1,\dots z^\textrm{degree}\}$, and uses the Moore-Penrose inverse to fit polynomials by projecting the data onto them. Here, $z$ is up-the-ramp frame index and "degree" is the fit degree. This is orders of magnitude faster than iterating on numpy.polyfit()
. We show later that the results are the same as could have been obtained using numpy.polyfit()
.
class Polynomial():
def __init__(self, nz, degree):
"""
__init__(self, nz, degree)
Instantiate a polynomial object.
Parameters: nz, int
Number of up-the-ramp samples
degree, int
Desired fit degree. Use
degree=1 for straight lines.
"""
# Definitions
self.nz = nz # Number of up-the-ramp samples
self.degree = degree # Desired fit degree
self.z = np.arange(nz) # Vector of up-the-ramp sample indices
# Build the monomial basis matrix. It has the same number
# of rows as there are up-the-ramp samples. Each column
# contains a monomial basis vector.
self.B = np.empty((nz,degree+1), dtype=np.float64)
for col in np.arange(degree+1):
self.B[:,col] = self.z**col
# The Moore-Penrose inverse of the basis matrix does
# least squares fitting.
self.pinvB = np.linalg.pinv(self.B)
# Fitting method
def polyfit(self, D):
R = np.float32(np.matmul(self.pinvB,
D.reshape(nz,-1)).reshape(self.degree+1,4096,4096))
return(R)
Instantiate a polynomial object.
P = Polynomial(nz,1)
Do the fitting and save.
# Fit
R = P.polyfit(D)
# Save
fits.PrimaryHDU(R).writeto(outdir + '/R.fits', overwrite=True)
Show that we got the same result that we would have gotten using numpy.polyfit()
, only faster.
# Compare a 5x5 pixel box
z = np.arange(nz) # Up-the-ramp sample indices
RShort = np.empty((2,5,5), dtype=np.float32) # numpy.polyfit() result goes here
for y in np.arange(5):
for x in np.arange(5):
RShort[:,y,x] = np.polyfit(z, D[:,y,x], 1)[::-1]
# Look at the absolute value of the difference
delta = R[:,:5,:5] - RShort
# Show result
print('Max. deviate = ', np.abs(delta).max())
Max. deviate = 0.0
We recommend examining R.fits
using a FITS viewer such as DS9. Elements [0,:,:] are the offsets and elements [1,:,:] are the slopes. Here is an example of what these looked like for one triplet.
# Pick off the offset and slope
a = R[0].copy() # Offset
b = R[1].copy() # Slope
# Compute good ranges for display
m,s = madstat(a, std=True)
vmin_a = m-2*s
vmax_a = m+2*s
# Display
ticks = np.arange(0,4097,512)
fig, ax = plt.subplots(1, 2, figsize=(2*6.4,4.8))
ax[0].imshow(a, origin='lower', cmap='gray', vmin=-1130, vmax=5530)
ax[0].set_xticks(ticks)
ax[0].set_yticks(ticks)
ax[0].set_title('Offset')
ax[1].imshow(b, origin='lower', cmap='gray', vmin=-.5, vmax=.5)
ax[1].set_xticks(ticks)
ax[1].set_yticks(ticks)
ax[1].set_title('Slope')
plt.tight_layout()
plt.show()
The above results are typical.
The offset image shows a faint "picture frame" pattern that is often seen in Teledyne HxRG series detectors. The "tree rings" originate in the silicon readout integrated circuit (ROIC). These are an artifact of how the silicon wafer was made and processed. We also see small differences between the outputs, even though the data have been reference rows corrected.
As is usually the case, the slope image is comparatively clean. The white specs are mostly "hot pixels". Hot pixels are a class of inoperable pixel that has anomalously large leakage current.
We hope that you have found this short introduction to be helpful.