IRSA Data Validation Tools


Version 1.3 - 10/23/15

Overview

IRSA has defined standards for the structure and organization of tabular data (source catalogs, metadata tables, spectra), images, and spectra. Data planned for ingestion into the archive should follow these standards, which are intended to make the data easy for users to understand. This page describes tools that help providers validate the structure, format, and content of tables, images, and spectra before delivery to IRSA. For tools provided by IRSA, please contact the IRSA Help Desk with questions and comments. Please direct comments on third-party tools to the tool provider.

Validation of Tabular Data

Tabular data should be delivered to IRSA in IPAC Table Format, an ASCII format in which columns are aligned in fixed-width records, free of mark-up and escape codes. IRSA's Table Validator Service converts ASCII files in common formats, such as comma-delimited, to IPAC table format. The tool removes common escape sequences (for example, those that encode tab stops), reformats misaligned records, and validates the consistency of the data in each column. Treat it as an aid to validation: it does not guarantee detection and correction of all faults. The Table Upload Help discusses which tables can be validated and used by IRSA. There is a 2 GB file limit (set by Apache) on tables uploaded to this service.
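
To illustrate the fixed-width layout, the following Python sketch writes a small table in the general shape of an IPAC table: header lines delimited by vertical bars, with data values aligned between those bars. The function name `to_ipac` and the sample columns are hypothetical, and optional header lines (units, null values) are omitted. It is not a substitute for the Table Validator Service or for the format definition.

```python
def to_ipac(names, types, rows):
    """Return fixed-width text with '|'-delimited header lines (illustrative only)."""
    cells = [[str(value) for value in row] for row in rows]

    # Each column is as wide as its longest name, type, or data value.
    widths = []
    for i, (name, dtype) in enumerate(zip(names, types)):
        longest_value = max((len(row[i]) for row in cells), default=0)
        widths.append(max(len(name), len(dtype), longest_value))

    def header_line(values):
        return "|" + "|".join(v.ljust(w) for v, w in zip(values, widths)) + "|"

    def data_line(values):
        # Blanks sit under the header bars so values stay inside their columns.
        return " " + " ".join(v.rjust(w) for v, w in zip(values, widths)) + " "

    lines = [header_line(names), header_line(types)]
    lines += [data_line(row) for row in cells]
    return "\n".join(lines)


# Made-up sample rows, in decimal degrees.
table = to_ipac(
    ["ra", "dec", "name"],
    ["double", "double", "char"],
    [[10.684708, 41.26875, "M31"], [83.82208, -5.39111, "M42"]],
)
print(table)
```

Every line of the output has the same length, which is the property the validator restores when it reformats misaligned records.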

For tables that will be served through IRSA's catalog search, there are currently some additional restrictions. Coordinates should be equatorial, in J2000 decimal degrees, with lowercase column names "ra" and "dec". Column names must not start with a number: "2MASS_J" would lead to a syntax error, while "TWOMASS_J" is acceptable. Column names must use only letters, numbers, and underscores: "B-V" or "3.6mag" would lead to a syntax error, while "BmV" and "3_6mag" use only permitted characters (note that "3_6mag" would still need a leading letter to satisfy the previous rule). Many names are reserved by the database and cannot be used as column names. Examples include "array", "blob", "cluster", "comment", "complete", "cycle", "datafile", "date", "desc", "double", "group", "identified", "index", "key", "level", "limit", "max", "min", "member", "mode", "object", "order", "primary", "ref", "references", "sample", "size", "structure", "time", "type", "validation", "x", "y", "year", "z", "zone" (full list).
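
The naming rules above can be checked before delivery with a short script. The Python sketch below is a minimal illustration, not an IRSA tool: the function name `column_name_problems` is hypothetical, "letters" is assumed to mean ASCII letters, the reserved-word comparison is assumed to be case-insensitive, and `RESERVED` holds only the examples quoted above rather than the full list.

```python
import re

# Only the example reserved names quoted in the text; the full list is longer.
RESERVED = {
    "array", "blob", "cluster", "comment", "complete", "cycle", "datafile",
    "date", "desc", "double", "group", "identified", "index", "key", "level",
    "limit", "max", "min", "member", "mode", "object", "order", "primary",
    "ref", "references", "sample", "size", "structure", "time", "type",
    "validation", "x", "y", "year", "z", "zone",
}


def column_name_problems(name):
    """Return a list of rule violations for one column name (empty if none)."""
    problems = []
    if name[:1].isdigit():
        problems.append("starts with a number")
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        problems.append("uses characters other than letters, numbers, underscores")
    if name.lower() in RESERVED:  # case-insensitive match is an assumption
        problems.append("is a reserved name")
    return problems


def missing_coordinate_columns(names):
    """Return which of the required lowercase 'ra' and 'dec' columns are absent."""
    return [required for required in ("ra", "dec") if required not in names]


columns = ["ra", "dec", "2MASS_J", "B-V", "BmV", "date"]  # made-up column list
for column in columns:
    for problem in column_name_problems(column):
        print(f"{column}: {problem}")
print("missing coordinate columns:", missing_coordinate_columns(columns))
```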

In addition, a "data dictionary" must accompany the tabular data file. The data dictionary specifies the attributes of the columns in a table, such as units, descriptions, and data types, and defines how null values are represented in the table. The data dictionary (dd) must be in IPAC table format. IRSA offers a web-based tool, DDGEN, that builds a data dictionary using input from the ASCII table header; instructions and examples can be found here.

Validation of Images

Images must comply with the FITS standard and represent the image footprint in the World Coordinate System (WCS). IRSA requires that all FITS files be readable with the CFITSIO library (available from the FITS Support Office at NASA Goddard Space Flight Center) and with WCStools (from the Smithsonian Astrophysical Observatory). IRSA cannot accept files that use custom variants of the standard or that must be read with special software.

IRSA has an on-line image validation tool that validates the syntax and completeness of FITS files and aids in validating the astrometric accuracy of images. Images must be uploaded to this service one at a time, because bandwidth and Apache limitations prevent the upload of large image data sets. When images have been generated in a pipeline, spot checks of individual files generally suffice to reveal the defects present throughout the data set. Alternatively, users may download tools that provide the same functionality and can be run on collections of files; see the descriptions below.

FITS Syntax Validation

The fverify tool (developed at HEASARC as part of the FTOOLS bundle) validates the syntax of FITS files. It reports violations of the FITS syntax as errors or warnings. Errors must be corrected; IRSA advises that warnings be corrected as well. Run "fverify" as follows:
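
The command shown on the original page is not reproduced here. As a hedged sketch, the Python wrapper below runs fverify on a collection of files, assuming that fverify is installed, is on the PATH, and accepts the FITS file name as its only argument. The helper names and the file names are hypothetical.

```python
import shutil
import subprocess


def fverify_command(fits_path):
    """Argument list for one fverify run (file name as the only argument)."""
    return ["fverify", fits_path]


def verify_files(fits_paths):
    """Run fverify on each file.

    Returns {path: report text}, or None if fverify is not installed.
    """
    if shutil.which("fverify") is None:
        return None
    reports = {}
    for path in fits_paths:
        done = subprocess.run(
            fverify_command(path),
            stdin=subprocess.DEVNULL,  # never wait for interactive prompts
            capture_output=True,
            text=True,
        )
        reports[path] = done.stdout + done.stderr
    return reports


reports = verify_files(["image1.fits", "image2.fits"])  # placeholder file names
if reports is None:
    print("fverify was not found on PATH")
else:
    for path, text in reports.items():
        print(path, text, sep="\n")
```

The reports are returned as plain text so that the errors and warnings can be read exactly as fverify prints them.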

WCS Validation

The IRSA tool "mImgtbl" checks the validity and completeness of the WCS header information in FITS images. "mImgtbl" is part of the Montage image mosaic toolkit, and must be run on a local machine. Download the Montage distribution, and build it according to these instructions.

"mImgtbl" can be run recursively down directory tree paths, on any number of images; the output is an ASCII IPAC table file reporting the WCS keywords for each image. To run "mImgtbl" recursively in the current directory with an output file called images.tbl, use this command:

If any of the WCS keywords in any of the files is invalid, the STDOUT message will indicate the number of defective files found. If this number is greater than zero, rerun the tool using the same command with the debug flag ('-d') to get more detailed information on each file.
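
The run-and-recheck cycle described above might be scripted as in the following Python sketch. Several details are assumptions rather than statements from this page: that "-r" is the recursive flag, that the output table name follows the directory argument, and that the status message has the bracketed key=value shape shown in `sample`. Check these against the mImgtbl documentation before relying on them. The helper names are hypothetical.

```python
import re


def mimgtbl_command(directory=".", output="images.tbl", debug=False):
    """Build an mImgtbl argument list; '-r' (recursive) is an assumed flag."""
    flags = ["-r"] + (["-d"] if debug else [])
    return ["mImgtbl", *flags, directory, output]


def defect_counts(message):
    """Extract integer fields (for example, counts of bad files) from a status message."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", message)}


# Assumed shape of the STDOUT status message; real output may differ.
sample = '[struct stat="OK", count=5, badfits=0, badwcs=1]'
counts = defect_counts(sample)
defective = counts.get("badfits", 0) + counts.get("badwcs", 0)

print("first pass:", " ".join(mimgtbl_command()))
if defective > 0:
    # Same command again, with the debug flag for per-file detail.
    print("rerun with:", " ".join(mimgtbl_command(debug=True)))
```

The sketch only prints the command lines; passing the same argument lists to `subprocess.run` would execute them on a machine where Montage is installed.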

Complete documentation on 'mImgtbl' and its return messages is available here.

There are several tools available to fix FITS image headers. IRAF has the tool 'hedit'. FTOOLS allows editing of FITS headers within 'FV/GUIs' (FITS Viewer). IRSA has two tools available as part of the Montage distribution. 'mGetHdr' returns the WCS keywords in an ASCII file. After editing this file in any editor, it can be input to 'mPutHdr', which will write the keywords back into the FITS file.

Astrometric Validation

The simplest way of assessing the astrometric accuracy in an image file is to compare the positions of point sources in the image with those of sources in a catalog of high astrometric accuracy. One quick way to overlay sources from the 2MASS All-Sky Point Source Catalog on the image is with the Image Validator.
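
When matched pairs of image and catalog positions are available, the comparison can also be made numerically. The Python sketch below computes small-angle offsets in arcseconds from positions in decimal degrees. The function name `offset_arcsec` and the sample coordinates are made up for illustration, and the flat-sky approximation is only suitable for offsets that are small compared with the distance to the celestial pole.

```python
import math


def offset_arcsec(ra_img, dec_img, ra_cat, dec_cat):
    """Small-angle (east, north) offset in arcseconds of an image position
    relative to its catalog counterpart; all inputs in decimal degrees."""
    # Wrap the RA difference into [-180, 180) so pairs straddling RA = 0 work.
    d_ra = (ra_img - ra_cat + 180.0) % 360.0 - 180.0
    east = d_ra * math.cos(math.radians(dec_cat)) * 3600.0
    north = (dec_img - dec_cat) * 3600.0
    return east, north


# Made-up matched pairs: (ra_img, dec_img, ra_cat, dec_cat).
pairs = [
    (150.00010, 2.20005, 150.00000, 2.20000),
    (150.10012, 2.30004, 150.10000, 2.30000),
]
offsets = [offset_arcsec(*pair) for pair in pairs]
mean_east = sum(east for east, _ in offsets) / len(offsets)
mean_north = sum(north for _, north in offsets) / len(offsets)
print(f"mean offset: {mean_east:.2f} arcsec east, {mean_north:.2f} arcsec north")
```

A mean offset that is consistently non-zero across many sources suggests a systematic error in the WCS rather than scatter in individual source positions.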

The National Virtual Observatory (NVO) has released a tool that will repair FITS files with misaligned astrometry - WCS Fixer, hosted by NOAO.

Validation of Spectra

1-D spectra may be served as either FITS or (preferred) ASCII tables (in IPAC table format). Delivery in both formats is also acceptable. The slit center coordinate and position angle should be in the header as keywords (SLT_RA, SLT_DEC and SLT_PA). Coordinates should always be in decimal degrees (J2000), per the WCS standard. For validation purposes, 1-D ASCII spectra files can be handled as catalogs - please refer to "Validation of Tabular Data" above. To validate 1-D FITS spectra, please use "fverify" discussed above.

2-D spectra are best served as FITS images, and 3-D spectra are best served as FITS Data Cubes. Standard WCS information should be in the header. These files can be validated using "fverify" as discussed above.