EUVE TELEMETRY STORAGE AND RETRIEVAL
Forrest R. Girouard and Allen Hopkins
Center for EUV Astrophysics, 2150 Kittredge St.,
University of California, Berkeley, California 94720, USA
ABSTRACT
The successful permanent archiving of the telemetry and convenient access to
that archive are crucial to the long-term success of the EUVE project. This
paper describes EUVE's system for archiving and retrieval of EUVE telemetry.
The goals of this system's design include archiving with no loss of information
and providing flexibility of access that is based on criteria meaningful to the
user and that is also independent of storage implementation. Clearly defined
interfaces and libraries aid in the functional organization of the system and
allow access to the archive to be built into application software at a high
level. The reader's familiarity with the Unix programming environment is as-
sumed.
1. DESIGN PHILOSOPHY
An underlying philosophical tenet of EUVE operations software design is that
of building the functionality into libraries with strictly defined interfaces.
This approach avoids duplication of code, which simplifies maintenance and
greatly reduces development time for many high-level applications. Strictly
defining these interfaces at the outset allows applications that use the li-
braries to be written even before the libraries themselves have been completely
implemented. The disadvantages of increased size of executables and increased
complexity of program flow are decidedly outweighed by the advantages of im-
proved large-project software development, ease of maintenance, and the avail-
ability of a powerful toolbox of libraries for future work.
All library routines have been carefully kept stateless; that is, no library
routine maintains any information between calls. This enables the routines
to be both re-entrant and sharable and eases debugging. The library
interfaces provide data structures for convenient use by client programs for
maintaining state information.
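The stateless convention can be illustrated with a minimal sketch. It is not
taken from the EUVE libraries: the names read_frame and ReadState and the
8-byte frame size are hypothetical, chosen only to show a routine that keeps
nothing between calls while the client holds all state in a structure supplied
by the interface.

```python
import io
from dataclasses import dataclass

FRAME_SIZE = 8  # hypothetical frame length in bytes, for illustration only


@dataclass
class ReadState:
    """State owned by the client; the library keeps nothing between calls."""
    offset: int = 0        # byte offset of the next unread frame
    frames_read: int = 0   # running count, updated on each successful read


def read_frame(stream, state):
    """Return the next frame, or None at end of data.

    Everything needed to resume is in `state`, so two clients can interleave
    calls on independent states without interfering with each other.
    """
    stream.seek(state.offset)
    frame = stream.read(FRAME_SIZE)
    if len(frame) < FRAME_SIZE:
        return None
    state.offset += FRAME_SIZE
    state.frames_read += 1
    return frame


# Two clients read the same data independently, each with its own state.
data = io.BytesIO(bytes(range(24)))
client_a, client_b = ReadState(), ReadState()
first_for_a = read_frame(data, client_a)
first_for_b = read_frame(data, client_b)
```

Because the routine holds no hidden position, both clients receive the first
frame, and either one can be paused and resumed simply by keeping its state
object.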
Extensibility is another important design factor. While interfaces are of
necessity strictly defined at the outset, it is important, wherever possible,
to avoid assumptions about the future and to keep a certain generality of
function in the design to allow for future extensions.
2. DATA ACQUISITION
Telemetry from the EUVE spacecraft is sent to the Center for EUV Astro-
physics (CEA) in Berkeley, California, from the Packet Processor System (Pacor)
at NASA's Goddard Space Flight Center (GSFC) in Greenbelt, Maryland. It is
received on Unix computers on a small network, the "operations" network, which,
for reasons of security, is isolated from the larger "analysis" network. Trans-
mission takes place over a dedicated 224-kilobit-per-second line, using the
TCP/IP protocol suite with a special Pacor session-layer protocol.
CEA receives two types of telemetry: real-time and production, so named for
the ways in which they come to us from the satellite. While the spacecraft is
in contact with a TDRSS telemetry relay satellite (for about twenty minutes in
each 90-minute orbit), telemetry is received in real time. We refer to this as
real-time data. Regardless of TDRSS access, all telemetry is recorded on mag-
netic tape on board the spacecraft. Once every two orbits, during a TDRSS
contact, it is played back to the ground in bit-reversed order. This telemetry
is referred to as production data, because it goes through a "production" pro-
cess at Pacor, which re-reverses the bit order and checks for errors. An im-
portant difference between the two types is that production data comes to us
with a report about all frames containing errors, whereas the real-time data
has no such information and may contain garbage.
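The re-reversal step can be sketched as follows. This is an illustration only,
not Pacor's code: it assumes that "bit-reversed order" means the whole stream
arrives last bit first, and that bits within a byte are numbered most
significant first.

```python
# Lookup table: REVERSED[b] is byte b with its eight bits in opposite order.
REVERSED = bytes(int(f"{b:08b}"[::-1], 2) for b in range(256))


def reverse_bit_order(data: bytes) -> bytes:
    """Reverse the order of every bit in `data`, end to end.

    Reversing the byte sequence and then the bits inside each byte is
    equivalent to reversing the stream one bit at a time.
    """
    return bytes(REVERSED[b] for b in reversed(data))


# A stream played back in reverse is restored by applying the same operation.
original = b"\x12\x34\x56"
played_back = reverse_bit_order(original)
restored = reverse_bit_order(played_back)
```

The operation is its own inverse, so the same routine serves both to model the
playback and to undo it.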
Both kinds of telemetry are sent to us by Pacor as "messages". Production
messages are about 8 megabytes in length; real-time messages are shorter.
Pacor treats orbital-day data and orbital-night data as coming from separate
sources. Therefore, day and night data are sent to us in separate
Pacor-protocol sessions. During each TDRSS contact, two sessions are opened
simultaneously -- one containing only real-time data from orbital day, and the
other containing only real-time data from orbital night. (Either session may
contain no telemetry data.) Production messages are sent after the TDRSS con-
tact in two consecutive sessions -- night data in the first, and day data in
the second.
Data reception is made fairly straightforward on our Unix workstations by
the Internet services daemon inetd. This program starts up specific programs
in response to connection requests on specific TCP ports. In this case, a
program called recvpacord is started up, which handles the Pacor protocol
handshaking and the creation of files to receive incoming messages. The mes-
sage files are uniquely named with the date and time they are created and the
process identification number of the recvpacord process.
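A naming scheme of this kind might look like the sketch below. The layout
(UTC timestamp, a dot, then the process ID) is an assumption made for
illustration; the actual format used by recvpacord is not given here.

```python
import os
from datetime import datetime, timezone


def message_file_name(created, pid):
    """Build a file name from a creation time and a process ID.

    The layout is hypothetical. With one-second resolution, uniqueness relies
    on a single process not creating two files within the same second.
    """
    return f"{created:%Y%m%d.%H%M%S}.{pid}"


# Typical use: name a file for the current process at the current time.
name = message_file_name(datetime.now(timezone.utc), os.getpid())
```

Combining the time with the process ID keeps names distinct even when two
receiving processes, such as the simultaneous day and night sessions, create
files at the same moment.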
By use of the standard Unix C library routines fork() and exec(), other
programs are spawned from recvpacord at various times in the data reception
cycle. Tasks that are performed in this way include making telemetry available
to multiple clients as it comes in, causing new message files to be added to
the telemetry archive, and running a suite of data analysis programs on the
new data.
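The fork-and-exec pattern is sketched below using Python's os module, which
wraps the same Unix calls. The helper names spawn and wait_for are
hypothetical, and the child program here is only a stand-in for one of the
tasks listed above.

```python
import os
import sys


def spawn(argv):
    """Start `argv` as a child process and return its PID without waiting."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed; never fall back into parent code.
            os._exit(127)
    return pid


def wait_for(pid):
    """Block until the child exits and return its exit code (-1 if killed)."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


# Usage: run a stand-in for a post-reception task and collect its status.
child = spawn([sys.executable, "-c", "raise SystemExit(0)"])
exit_code = wait_for(child)
```

Because spawn returns immediately, the parent can continue receiving data
while the child works, and collect the exit status later.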
3. ARCHIVING OF DATA
Early in the design of the EUVE operations software a conservative approach
was chosen: telemetry is stored exactly as received, with no processing of
any kind before it is archived. This allows storage of all the original
information in its most compact form and allows any amount of reprocessing at
any time in the future.
The raw Pacor message files are initially archived to magnetic tape and
eventually to optical disc. Owing to limitations in the prelaunch budget, the
initial mass storage device was a tape carousel unit (rather than an optical
device) capable of holding 54 8-mm cartridge tapes. The initial design of the
software took into account that the final permanent archive was probably going
to be an optical disc jukebox, but exactly what type was not known. The seam-
less transition from a tape-carousel-only archive to a combination of tape and
optical disc was important in the design. To minimize the risk of human errors,
a high degree of automation was also one of the initial design constraints.
The archive was designed not only with the intention of providing permanent
storage for the telemetry but also as a means of providing online access to
the data.
The archive makes use of a relational database for tracking all the archive-
associated information, and the database also serves as a convenient means of
interprocess communication. Various user applications, as well as applications
that run automatically in response to incoming Pacor messages, add and retrieve
telemetry to and from the archive by creating or updating entries in the data-
base. The archive daemon, archived (pronounced "archive-dee"), handles the
actual reading from and writing to the physical archive. It periodically
checks the database for files that are to be added or retrieved.
Execution of the archive subsystem is triggered by recvpacord when a message
file is closed. At that point recvpacord spawns a process that creates an entry in the
database containing the message file's name and location and an "archive pend-
ing" flag indicating that the message file needs to be physically added to the
archive.
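The use of the database as a communication channel can be sketched as
follows. This is a simplified illustration, not the EUVE schema: SQLite stands
in for the relational database, and the table layout, the status values, and
the function names are all assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified stand-in for the archive's files table.
db.execute(
    "CREATE TABLE files ("
    " id INTEGER PRIMARY KEY,"
    " path TEXT NOT NULL,"
    " status TEXT NOT NULL)"
)


def request_archive(db, path):
    """Producer side: record a closed message file as awaiting archiving."""
    cur = db.execute(
        "INSERT INTO files (path, status) VALUES (?, 'archive pending')", (path,)
    )
    db.commit()
    return cur.lastrowid


def archive_pending(db, write_to_media):
    """Daemon side: one polling pass over the pending entries."""
    rows = db.execute(
        "SELECT id, path FROM files WHERE status = 'archive pending' ORDER BY id"
    ).fetchall()
    for file_id, path in rows:
        write_to_media(path)  # physical write; supplied by the caller
        db.execute("UPDATE files SET status = 'archived' WHERE id = ?", (file_id,))
    db.commit()
    return len(rows)
```

The producer never talks to the daemon directly: it only inserts a row, and
the daemon finds the row on its next periodic check.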
3.1 Archive Database
The database is central to all facets of the archive. It contains over a
dozen tables, of which about half support what is called the Comprehensive
Telemetry Indexing System (CTIS). The CTIS is used primarily to access tele-
metry that has already been written to tape or optical disc and, as such, is
discussed in detail in section 4. Here the focus is on the six tables that
compose the core of the archive itself.
The first two tables, slots and tapes, are used to manage the tape carousel
directly. The next two, files and locations, identify the files themselves and
where they are located on tape and/or disc. The last two, cache and open, are
used in managing the magnetic disk cache.
3.2 Interfaces and Libraries
The jbl.h, tapecarousel.h, archive.h, and transfer.h interfaces and libra-
ries provide the essential support necessary for the foundation of the tele-
metry archival subsystem.
The jbl.h interface and library initially came with the tape carousel unit
when it was purchased. We have since modified the hardware configuration.
Originally the carousel was connected to a single host, with its two tape drives
on a common SCSI bus; now the drives are on separate SCSI buses, one attached to
a host on the operations network and the other to a host on the analysis
network. This
nonstandard configuration allows us to transfer files via the tape carousel
from the operations network (where telemetry is received) to the analysis
network (where the telemetry is transferred to optical storage and made
available for analysis). The host on the operations network commands the
tape carousel's robotics through the carousel's single serial port, using
a simple low-level ASCII protocol. The unmodified jbl.h interface and library
export this simple protocol insofar as it is necessary to communicate with
the tape carousel unit. Our extensions allow the analysis-network host to
make carousel requests, via a secure serial line to the operations-network
host, which controls the carousel. On that host, a daemon we have written,
called rjbl, relays the requests to the carousel device, returns the resulting
responses, and handles contention issues between processes competing for
carousel access.
The tapecarousel.h interface and library export functionality that supports
initializing, writing and reading tapes in the tape carousel at a more abstract
level than that provided by the jbl.h interface, namely, at the tape level.
For example, one routine supports writing a file to a tape specified by its
tape identifier. Tape access is implemented through the use of the tar(1) and
mt(1) commands as well as the above-mentioned jbl.h interface. This library
provides an interface that directly manipulates tapes in the carousel without
knowing what the current network is. This allows applications to be written
that will work whether they run on the host that is directly connected to the
tape carousel unit, or the host that is indirectly connected via the serial
line and the rjbl daemon.
The archive.h interface and library export functionality that supports
writing and reading files from an archive without knowledge of the underlying
type of media. This interface is at the same level as the standard C library
of buffered input/output (I/O) routines, with the additional feature of being
able to archive a file. Most applications need to interact with the archive only
at the level provided by this interface. Only the servers and daemons that
do the actual reading and writing to the various types of media need to make
use of the jbl.h and tapecarousel.h interfaces and libraries.
The transfer.h interface and library export the functionality that supports
transferring files from one network to another without knowledge of the under-
lying hardware. The present implementation of this library is built on top
of the tape carousel library. The library provides the functionality necessary
to move files to be transferred
(1) into the spool location,
(2) from the spool location to the transfer medium, and
(3) from the transfer medium to the spool location.
3.3 Applications
The notifytmarchive, archived, and rjbl applications along with the transfer
applications implement the basic telemetry archival subsystem.
The notifytmarchive application notifies the archive daemon, archived, about
new telemetry data. It makes use of the archive.h interface to submit a re-
quest to archive a file. Such a request is implemented simply by adding an
entry to the files table. This application is automatically invoked whenever
a production message is successfully closed, and whenever a real-time message
is either closed or aborted.
The archive daemon, archived, is the main application of the archive. It
uses most of the above-mentioned interfaces and libraries to accomplish all
physical access to the archive media. Its four main functions are writing new
files to the archive media, extracting files from archive media onto the mag-
netic disk cache, copying files from tape to optical disc storage, and main-
taining free space in the magnetic disk cache. It does all these things in
response to
(1) the contents of database tables and
(2) the space available in the cache.
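The cache-maintenance function might be sketched as below. The eviction
policy is not described here, so this example assumes a least-recently-used
policy, assumes that every cached file already has a copy on permanent media,
and uses a hypothetical function name and record layout.

```python
def files_to_evict(cache, open_ids, free_bytes, target_free_bytes):
    """Pick cached files to delete until enough space would be free.

    cache    -- list of (file_id, size_bytes, last_access_time) tuples
    open_ids -- set of file_ids currently in use; these are never evicted
    """
    victims = []
    # Oldest access first: a least-recently-used policy (an assumption).
    for file_id, size, _last_access in sorted(cache, key=lambda e: e[2]):
        if free_bytes >= target_free_bytes:
            break
        if file_id in open_ids:
            continue
        victims.append(file_id)
        free_bytes += size
    return victims


# Example: file "b" is the oldest but is open, so "c" and then "a" are chosen.
example_cache = [("a", 100, 5), ("b", 200, 1), ("c", 300, 3)]
example_victims = files_to_evict(example_cache, {"b"}, 50, 400)
```

Skipping files that are in use corresponds to the role suggested by the open
table, which records cache files that clients currently hold.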
The rjbl server runs on the host that is directly connected to the tape
carousel command port. It uses the jbl.h and tapecarousel.h interfaces and
libraries to support requests from the analysis network to command the tape
carousel unit.
The gettransferfiles and sendtransferfiles applications implement the in-
ternetwork transfer facility. They are run automatically twice a day by the
Unix cron(8) facility. They move files from the transfer medium to the spool
location and from the spool location to the transfer medium, respectively.
The getarchivedbupdates and putarchivedbupdates applications are also run
by cron. They use gettransferfiles, sendtransferfiles, and the database query
language to update the analysis network's copy of the archive database with
the latest information that has been added on the operations network.
3.4 Discussion
Current problems with the archive revolve around shortcomings in either the
hardware or system software. We continue to have hardware problems with both
the tape carousel unit and the optical disc jukebox.
Also, we continue to have instances where the standard Unix mt and tar com-
mands fail without indicating a failure in their exit status. The tapecarousel
library has been upgraded to be distrustful of the mt command by verifying the
status after each request to reposition the tape. It has been proposed to make
it distrustful of the tar command as well, but this precaution has not yet been
implemented. Initially, independent copies of the telemetry data were made by
the analysis staff, since the dual network support and optical jukebox were not
available at the time of launch. The operations staff continues to make these
independent copies, now as a hedge against multiple catastrophic failures in
the automated system. We have had to use this independent copy of the tele-
metry on several occasions to reconstruct magnetic tapes that either became
unreadable or were completely destroyed by the hardware. To limit access delays
and tape wear, we use only a quarter of the capacity of each 8-mm tape.
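The distrustful handling of mt described above can be sketched in outline.
This is not the tapecarousel library's code. The two callables are
placeholders for whatever issues the positioning command and whatever queries
the drive (for instance, wrappers around mt), and the retry is an added
assumption.

```python
class TapePositionError(Exception):
    """Raised when the drive is not where the command should have left it."""


def reposition_verified(reposition, current_file_number, target, retries=1):
    """Move the tape to file `target`, trusting only an independent check.

    reposition(target)    -- issues the positioning command; its return
                             value is deliberately ignored
    current_file_number() -- asks the drive where it actually is
    """
    for _attempt in range(retries + 1):
        reposition(target)
        if current_file_number() == target:
            return
    raise TapePositionError(
        f"tape not at file {target} after {retries + 1} attempts"
    )
```

The point of the design is that a zero exit status is never taken as proof of
success; only the drive's own report of its position is.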
4. ACCESS TO THE ARCHIVE
In designing a system for accessing the telemetry archive, we chose to allow
the user to request data by a series of contiguous time intervals, specified as
major frame numbers. The index into the telemetry data based on time is called
the Comprehensive Telemetry Indexing System (referred to in section 3.1). It
is implemented as an extension to the archive database.
4.1 CTIS Extensions to the Archive Database
The CTIS is designed to allow random access to the underlying telemetry data
without modifying the storage or contents of any of the message files. In
essence, it provides a high-level index that enables random access to the whole
set of telemetry with duplicate data removed. The CTIS contains all the index
information for each message of telemetry and contains maps for various types
of telemetry. A map of telemetry eliminates overlaps that occur when a section
of telemetry is sent to us more than once. If duplicate frames of telemetry
exist, only one is entered into the map. The map provides time-ordered access
to the telemetry and also differentiates between real-time and production tele-
metry. It is also possible to create maps that have different meanings. For
example, the two maps we currently build select between overlapping data on a
first-entered basis, but it is also possible to build a map of the best-quality
data. A best-quality map is more complex and may be implemented in the future.
A map consists of intervals and gaps. Each interval represents a contiguous,
monotonically increasing sequence of major frames of telemetry contained within
a single message; each gap represents a contiguous, monotonically increasing
sequence of major frames of telemetry that are not currently available. Inter-
vals have associated with them a map identifier, message identifier, interval
number and first and last major frame numbers, while gaps have associated with
them a map identifier and first and last major frame numbers. Although the gap
information can be inferred from the interval information, it is faster to
maintain and look up the gap information than it is to have to infer it from the
interval information.
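The interplay of intervals and gaps can be illustrated with a short sketch.
The record fields follow the description above, but the function name fill,
the in-memory representation, and the idea of starting a map as a single gap
over the frame range of interest are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    map_id: int
    message_id: int
    interval_number: int
    first_frame: int
    last_frame: int


@dataclass(frozen=True)
class Gap:
    map_id: int
    first_frame: int
    last_frame: int


def fill(gaps, interval):
    """Apply a newly received interval to a map's stored gaps.

    Returns (added, remaining): the frame ranges of `interval` that were
    missing until now, and the updated gap list. Frames outside every gap are
    already in the map and are left alone (first-entered wins).
    """
    added, remaining = [], []
    for gap in gaps:
        lo = max(gap.first_frame, interval.first_frame)
        hi = min(gap.last_frame, interval.last_frame)
        if gap.map_id != interval.map_id or lo > hi:
            remaining.append(gap)          # untouched by the new data
            continue
        added.append((lo, hi))
        if gap.first_frame < lo:           # piece of the gap left before
            remaining.append(Gap(gap.map_id, gap.first_frame, lo - 1))
        if hi < gap.last_frame:            # piece of the gap left after
            remaining.append(Gap(gap.map_id, hi + 1, gap.last_frame))
    return added, remaining
```

With the gaps stored explicitly, deciding which part of a new message is
genuinely new requires only a comparison against the gap list, rather than a
scan of every interval already in the map.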
The mapping that allows access to telemetry stored in message files, based
on arbitrary spans of time specified in frame numbers, is provided through five
database tables. They are named ctismaps, messages, intervals, mapintervals
and mapgaps.
The ctismaps table provides an association between a map identifier and the
type of data the map represents. Type is defined, essentially, by the source
of the data and whether it is real-time or production.
The messages table is a list of all the messages that are in the CTIS. Each
entry provides an association between a message identifier, its archive file
identifier, and the type of data the message contains.
The intervals table lists all the telemetry intervals that are in message
files. Each entry provides an association between an interval identifier and
the contiguous span of time the interval contains.
The mapintervals table lists all contiguous time intervals of telemetry that
have a place in a map and come from a single interval. Each entry
provides an association between a contiguous span of time, the interval it is
from, and the map to which it belongs.
The mapgaps table lists all gaps in telemetry that have yet to be filled.
Each entry contains a map identifier and a span of time.
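How the five tables might cooperate in a lookup is sketched below, with SQLite
standing in for the relational database. Only the table names come from the
description above; every column name, the data-type strings, and the lookup
function are illustrative guesses rather than the actual schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- All column names below are illustrative guesses, not the real schema.
    CREATE TABLE ctismaps     (map_id INTEGER PRIMARY KEY, data_type TEXT);
    CREATE TABLE messages     (message_id INTEGER PRIMARY KEY,
                               archive_file_id INTEGER, data_type TEXT);
    CREATE TABLE intervals    (interval_id INTEGER PRIMARY KEY,
                               message_id INTEGER,
                               first_frame INTEGER, last_frame INTEGER);
    CREATE TABLE mapintervals (map_id INTEGER, interval_id INTEGER,
                               first_frame INTEGER, last_frame INTEGER);
    CREATE TABLE mapgaps      (map_id INTEGER,
                               first_frame INTEGER, last_frame INTEGER);
""")


def lookup(db, data_type, first, last):
    """Return (archive_file_id, first_frame, last_frame) rows, in frame order,
    for every mapped span overlapping [first, last], clipped to that range."""
    return db.execute(
        """
        SELECT m.archive_file_id,
               MAX(mi.first_frame, ?), MIN(mi.last_frame, ?)
        FROM ctismaps c
        JOIN mapintervals mi ON mi.map_id = c.map_id
        JOIN intervals i     ON i.interval_id = mi.interval_id
        JOIN messages m      ON m.message_id = i.message_id
        WHERE c.data_type = ? AND mi.first_frame <= ? AND mi.last_frame >= ?
        ORDER BY mi.first_frame
        """,
        (first, last, data_type, last, first),
    ).fetchall()
```

A request for a span of major frames thus resolves to an ordered list of
archive files and the frame ranges to read from each, without the caller
knowing how or where the message files are stored.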
4.2 Interfaces and Libraries
The telemetryarchive.h and ctis.h interfaces and libraries provide access
to the telemetry archive. The telemetryarchive.h interface and library export
routines that read and write telemetry data in various formats and access it
from various sources. Telemetry can be accessed through this interface from
the archive, from plain files in EUVE or Pacor format, and also in real time
as it is received from Pacor on the operations network. The basic routine
exported by this interface reads a single major frame of telemetry at a time.
Most of the routines exported by this interface are used to specify which tele-
metry to read and how to read it.
The ctis.h interface and library export the routines used to update and ac-
cess all the archive database tables associated with the CTIS (see above). The
update routines take linked lists of telemetry interval objects produced by the
tmintervals.h interface and library. The access routines return linked lists
of telemetry interval objects. In the selection process, a map is first se-
lected based on data type constraints; and then intervals are selected from the
map based on starting and ending major frame numbers. This library is an ex-
ample of code compartmentalization. Although there is only a single applica-
tion that updates the CTIS and uses the update routines from this library, all
the routines that modify the CTIS tables in the database are together, so that
when the code is modified the implementor does not have to look throughout the
whole system. In other words, all the code that manipulates the CTIS database
is hidden behind an interface that strictly defines the type of access allowed.
4.3 Applications
The archivestat, archivecat, ctisupdate, ctisintervals, gettm and decommu-
tate applications provide a variety of telemetry access capabilities.
The archivestat and archivecat utilities provide the user with direct in-
formation about archived files and access to their contents. The archivestat
utility returns all the information in the files, locations, and cache tables
that pertain to a specified archive identifier. This utility can also take a
string argument and return all the archive identifiers that contain that string
in their original path names. The archivecat application reads a specified
archived file and writes it to the standard output.
The application that adds a new message to the CTIS database is ctisupdate.
It is automatically triggered by the successful archiving of a new message.
The automated scientific analysis of the telemetry is subsequently triggered
by the successful execution of ctisupdate for the new message.
The ctisintervals application provides access to the information in the
CTIS. Given a message identifier, or a span of time specified as a pair of
major frame numbers, it outputs a list of telemetry intervals that are in the
CTIS from that message or within that span of time. Its primary output is in
binary form, for use in further processing. It can also generate ASCII output,
for human examination.
The gettm application is capable of accessing both formats of the telemetry
from all the available sources. It mainly provides a means by which some of
the older outdated applications can be run with the newer formats and sources.
It writes the desired telemetry data in the EUVE binary telemetry format to the
standard output. This is considered a transitory application that allows the
use of old software with new software until such time as the older applications
can be updated to use the new access mechanisms.
The decommutate application is the single most complex application in the
systems described herein, and possibly in the entire EUVE software system. It
takes telemetry input from a variety of sources in a variety of formats, and it
produces a variety of outputs. The interface to the application is quite in-
volved, and the experienced user can fine-tune the execution for performance
and speed. There are two main uses of this application in operations. First,
it is used on the payload controllers' workstations to decommutate the incoming
real-time telemetry for monitoring instrument health. The real-time telemetry
is accessed via the telemetryarchive.h interface from a server, tmrealtimed.
Second, whenever a production message is received from Pacor and the CTIS is
updated with the new message, decommutate is run on those parts of the new
message that were added to the CTIS -- i.e., those that have not been seen be-
fore. Here the telemetry is also accessed via the telemetryarchive.h interface,
but it is acquired from the archive itself, rather than from the real-time
telemetry server. Typically, the telemetry data message is in the magnetic
disk cache but, if not, it is automatically retrieved from one of the permanent
media.
4.4 Discussion
Presently, the biggest problem with access is that we appear to be I/O bound.
The limitations of the optical jukebox and of network bandwidth are under
investigation, and we hope to be able to improve the performance.
Currently, no quotas or limits are set on the number of processes competing for
access to the telemetry data via the optical jukebox. This has, occasionally,
resulted in slow response times spread across dozens of contending processes.
The tape carousel access is extremely slow on average. It can take up to forty
minutes to retrieve a file from a tape with more than a gigabyte of data.
A graphical user interface to the decommutate application is planned for the
near term. This interface should make decommutate more user-friendly and
intuitive. Presently the typical user does not take full advantage of
several of the key performance-oriented features of the application.
5. CONCLUSION
The EUVE telemetry archive system automatically stores telemetry exactly as
it is received, to leave all processing options open. User access to the ar-
chived telemetry is provided in a stream-oriented fashion based on time. The
(user-transparent) mapping of time intervals to file locations on storage media
is implemented with a relational database. Any programmer can build archive
access directly into his or her program by use of well-defined interfaces and
libraries.
Last modified 10/14/98