IRSA Data Validation Tools

Overview

IRSA has defined standards for the structure and organization of tabular data (source catalogs, metadata tables, ASCII spectra), images, and spectra. Data planned for ingestion into the archive should follow these standards, which are intended to make the data easy for users to understand. This page describes tools that help providers validate the structure, format, and content of tables, images, and spectra before delivery to IRSA. For questions and comments on tools provided by IRSA, please contact the IRSA Help Desk. Please direct comments on third-party tools to the tool provider.

Validation of Tabular Data

Tabular data should be delivered to IRSA in IPAC Table Format, an ASCII format in which columns are aligned in fixed-width records, free of mark-up and escape codes. IRSA's Table Reformat and Validation Service converts ASCII files to IPAC table format; it accepts input tables with common formatting schemes such as pipe or comma delimiters. The tool removes common escape sequences (for example, those that encode tab stops), reformats misaligned records, and validates the consistency of the data in each column. Treat it as an aid to validation: it does not guarantee detection and correction of all faults. The documentation discusses which tables can be validated and used by IRSA. There is a 2 GB file limit (set by Apache) on tables uploaded to the service.

For large tables that will be served through IRSA's query engines, there are additional requirements. Column names must not start with a number: "2MASS_J" would lead to a syntax error, whereas "TWOMASS_J" is acceptable. Coordinates must be equatorial, in columns named "ra" and "dec" (lower case). See Section 2 of the DBMS constraints documentation for details. In addition, a "data dictionary" must accompany the tabular data file. The data dictionary specifies the attributes of the columns in a table, such as units, descriptions, and data types, and defines how null values are represented in the table. The data dictionary (dd) must be in IPAC table format. IRSA offers a web-based tool, DDGEN, that builds a data dictionary using input from the ASCII table header; instructions and examples can be found here.
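
The two naming rules above can be checked locally before delivery. The following Python sketch is illustrative only and is not an IRSA tool: the function names are hypothetical, and it assumes the column names sit on the first header line that begins with "|", as in IPAC table format.

```python
def read_ipac_column_names(lines):
    """Return the column names from the first '|'-delimited header line."""
    for line in lines:
        if line.startswith("|"):
            return [name.strip() for name in line.strip().strip("|").split("|")]
    return []


def check_column_names(names):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    # Rule 1: a column name must not start with a number.
    for name in names:
        if name[:1].isdigit():
            problems.append(f"column '{name}' starts with a number")
    # Rule 2: coordinates must be in lower-case columns "ra" and "dec".
    for required in ("ra", "dec"):
        if required in names:
            continue
        wrong_case = [n for n in names if n.lower() == required]
        if wrong_case:
            problems.append(
                f"column '{wrong_case[0]}' should be lower case '{required}'"
            )
        else:
            problems.append(f"missing coordinate column '{required}'")
    return problems
```

For example, passing the lines of a table file to read_ipac_column_names and the result to check_column_names flags "2MASS_J" and an upper-case "RA", and returns an empty list for "ra", "dec", "TWOMASS_J". This covers only the two rules quoted here, not the full DBMS constraints.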

Validation of Images

Images must comply with the FITS standard and represent the image footprint in the World Coordinate System (WCS). IRSA requires that all FITS files be readable with the CFITSIO library (available from the FITS Support Office at NASA Goddard Space Flight Center) and with WCStools (from the Smithsonian Astrophysical Observatory). IRSA cannot accept files that use custom variants of the standard or that must be read with special software.

IRSA has an on-line image validation tool that validates the syntax and completeness of FITS files and aids in validating the astrometric accuracy of images. Images must be uploaded to this service one at a time, because bandwidth and Apache limitations prevent uploading of large image data sets. When images have been generated in a pipeline, spot checks of individual files generally suffice to reveal the defects present throughout the data set. Alternatively, users may download tools that perform the same functions and can be run on collections of files; see the descriptions below.

FITS Syntax Validation

The fverify tool (developed at HEASARC as part of the FTOOLS bundle) validates the syntax of FITS files. It reports violations of the FITS syntax as errors or warnings. Errors must be corrected, and IRSA advises that warnings be corrected as well. Run "fverify" as follows:
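
For a collection of files, one option is a small Python wrapper around the tool. The sketch below is an assumption-laden illustration, not part of FTOOLS: it assumes "fverify" is installed and on the PATH, that it accepts the FITS file name as its first argument, and that its report is written to standard output. The function names and the "*.fits" pattern are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def fverify_command(fits_path):
    """Build the argument list for one run (assumes: fverify <file>)."""
    return ["fverify", str(fits_path)]


def verify_tree(root, pattern="*.fits"):
    """Run fverify on every matching file under root.

    Returns a dict mapping each file path to the text the tool printed.
    """
    if shutil.which("fverify") is None:
        raise RuntimeError("fverify not found on PATH; install FTOOLS first")
    reports = {}
    for path in sorted(Path(root).rglob(pattern)):
        result = subprocess.run(
            fverify_command(path), capture_output=True, text=True
        )
        # Keep both streams so errors and warnings can be read afterwards.
        reports[str(path)] = result.stdout + result.stderr
    return reports
```

The reports are returned unparsed so that the errors and warnings can be read exactly as the tool printed them.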

WCS Validation

The IRSA tool "mImgtbl" checks the validity and completeness of the WCS header information in FITS images. "mImgtbl" is part of the Montage image mosaic toolkit and must be run on a local machine. Download the Montage distribution, and build it according to these instructions.

"mImgtbl" can be run recursively down directory tree paths, on any number of images; the output is an ASCII IPAC table file reporting the WCS keywords for each image. To run "mImgtbl" recursively in the current directory with an output file called images.tbl, use this command:

If any of the WCS keywords in any of the files is invalid, the STDOUT message will indicate the number of defective files found. If this number is greater than zero, rerun the tool using the same command with the debug flag ('-d') to get more detailed information on each file.
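
The run and rerun described above can be scripted. The Python sketch below is illustrative only: it assumes the "-r" flag requests recursion, that flags precede the directory and output table arguments, and that the STDOUT message is a bracketed list of key=value pairs. The sample message in the usage note, including its field names, is an assumption about that format rather than output copied from the tool; consult the 'mImgtbl' documentation for the authoritative form.

```python
import re


def mimgtbl_command(directory=".", table="images.tbl", debug=False):
    """Build the mImgtbl argument list (assumed flags: -r recursive, -d debug)."""
    cmd = ["mImgtbl", "-r"]
    if debug:
        cmd.append("-d")
    cmd += [directory, table]
    return cmd


def parse_status(message):
    """Parse key=value pairs from a bracketed status message into a dict.

    Quoted values are returned as strings; all-digit values as integers.
    """
    fields = {}
    pattern = r'(\w+)=(?:"([^"]*)"|([^,\]\s]+))'
    for key, quoted, bare in re.findall(pattern, message):
        value = quoted if bare == "" else bare
        fields[key] = int(value) if value.isdigit() else value
    return fields
```

For instance, parse_status('[struct stat="OK", count=3, badfits=0, badwcs=1]') yields a dictionary in which a non-zero count of defective files signals that the command from mimgtbl_command(debug=True) should be run next.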

Complete documentation on 'mImgtbl' and its return messages is available here.

There are several tools available to fix FITS image headers. IRAF has the tool 'hedit'. FTOOLS allows editing of FITS headers within the 'fv' GUI (FITS Viewer). IRSA has two tools available as part of the Montage distribution. 'mGetHdr' returns the WCS keywords in an ASCII file. After this file has been edited in any text editor, it can be input to 'mPutHdr', which writes the keywords back into the FITS file.

Astrometric Validation

The simplest way of assessing the astrometric accuracy in an image file is to compare the positions of point sources in the image with those of sources in a catalog of high astrometric accuracy. IRSA recommends overlaying sources from the 2MASS All-Sky Point Source Catalog on the image with the OASIS visualizer. It can be run as an applet in all common browsers (once the Java plug-in is installed). The OASIS Getting Started Guide describes how to upload the image and overlay the 2MASS point sources.
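
When positions measured in the image and matching catalog positions are available as numbers, the comparison can also be quantified. The Python sketch below is a generic illustration, not an IRSA or OASIS function: it computes the angular separation between two positions given in decimal degrees using the haversine formula. The function name is hypothetical.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcseconds between two positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula; numerically stable for small separations.
    a = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    separation = 2 * math.asin(min(1.0, math.sqrt(a)))
    return math.degrees(separation) * 3600.0
```

A systematic offset of similar size and direction across many sources suggests an error in the WCS keywords rather than in individual source positions.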

The National Virtual Observatory (NVO) has released two tools that will repair FITS files with misaligned astrometry - WCS Fixer, hosted by NOAO, and the WCS Correction Service, hosted by the University of Pittsburgh.

Validation of Spectra

1-D spectra may be served as either FITS or (preferred) ASCII tables (in IPAC table format). Delivery in both formats is also acceptable. The slit center coordinates and position angle should be in the header as keywords (SLT_RA, SLT_DEC and SLT_PA). Coordinates should always be in decimal degrees (J2000), per the WCS standard. For validation purposes, 1-D ASCII spectrum files can be handled as catalogs; please refer to "Validation of Tabular Data" above. To validate 1-D FITS spectra, please use "fverify", discussed above.
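
The slit keywords in a FITS spectrum can be checked with a short script. The Python sketch below is illustrative and is not a replacement for "fverify": it reads the 80-character cards of the primary header from raw bytes using only the standard library, and the function names are hypothetical. It assumes simple numeric values; string values that contain "/" would be truncated by this minimal parser.

```python
def read_primary_header(data):
    """Parse keyword/value cards from the primary header of FITS bytes."""
    cards = {}
    for offset in range(0, len(data), 80):
        card = data[offset:offset + 80].decode("ascii", errors="replace")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":
            # Drop any trailing "/ comment" and surrounding blanks.
            cards[keyword] = card[10:].split("/", 1)[0].strip()
    return cards


def check_slit_keywords(cards):
    """Return problems with SLT_RA, SLT_DEC and SLT_PA; empty if none found."""
    problems = []
    limits = {"SLT_RA": (0.0, 360.0), "SLT_DEC": (-90.0, 90.0), "SLT_PA": None}
    for key, bounds in limits.items():
        if key not in cards:
            problems.append(f"missing keyword {key}")
            continue
        try:
            value = float(cards[key])
        except ValueError:
            problems.append(f"{key} is not a decimal number: {cards[key]}")
            continue
        if bounds and not bounds[0] <= value <= bounds[1]:
            problems.append(f"{key} out of range: {value}")
    return problems
```

A sexagesimal value such as '02:12:00' is reported as not decimal, which catches the most common departure from the decimal-degree requirement.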

2-D spectra are best served as FITS images, and 3-D spectra are best served as FITS Data Cubes. Standard WCS information should be in the header. These files can be validated using "fverify" as discussed above.