IRSA Data Validation Tools
Overview
IRSA has defined standards for the structure and organization of tabular data (source catalogs, metadata tables, spectra), images and spectra. Data planned for ingestion into the archive should follow these standards, which are intended to allow easy comprehension of the data by users. This page describes tools that aid providers in validating the structure, format and content of tables, images and spectra before delivery to IRSA. For those tools provided by IRSA, please contact the IRSA Help Desk with questions and comments. Please direct comments on third-party tools to the tool provider.
Validation of Tabular Data
Tabular data should be delivered to IRSA in IPAC Table Format, an ASCII format in which columns are aligned in fixed-width records, free of mark-up and escape codes. IRSA's Table Reformat and Validation Service tool converts ASCII files to IPAC table format; the tool accepts input tables with common formatting schemes such as pipe or comma delimiters. The tool removes common escape sequences, such as those that encode tab stops, reformats misaligned records, and validates the consistency of the data in each column. Treat it as an aid to validation: it does not guarantee detection and correction of all faults. The documentation describes which tables can be validated and used by IRSA. There is a 2 GB file limit (set by Apache) on tables uploaded to the service.
For large tables that will be served through IRSA's query engines, there are additional requirements. Column names must not start with a number; for example, "2MASS_J" would lead to a syntax error, whereas "TWOMASS_J" is acceptable. Coordinates must be equatorial and stored in columns named "ra" and "dec", in lower case. See Section 2 of the DBMS constraints documentation for details.
In addition, a "data dictionary" must accompany the tabular data file. The data dictionary (dd) specifies the attributes of the columns in a table, such as units, descriptions, and data types, and defines how null values are represented in the table. The data dictionary must be in IPAC table format. IRSA offers a web-based tool, DDGEN, that builds a data dictionary using input from the ASCII table header; instructions and examples can be found here.
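The two column-name rules quoted above can be checked locally before a table is uploaded. The following is a minimal Python sketch, not an IRSA tool. It assumes that the first line beginning with "|" in an IPAC table is the pipe-delimited row of column names; the function names read_column_names and check_column_names and the sample table are hypothetical. It covers only the rules stated on this page; the DBMS constraints documentation remains the authority.

```python
def read_column_names(lines):
    """Return column names from the first header row (first line starting with '|')."""
    for line in lines:
        if line.startswith("|"):
            return [name.strip() for name in line.strip().strip("|").split("|")]
    return []


def check_column_names(names):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name in names:
        # Rule 1: a column name must not start with a number.
        if name[:1].isdigit():
            problems.append(f"column '{name}' starts with a number")
    # Rule 2: coordinates must be in lower-case columns named 'ra' and 'dec'.
    for required in ("ra", "dec"):
        if required not in names:
            wrong_case = [n for n in names if n.lower() == required]
            if wrong_case:
                problems.append(
                    f"column '{wrong_case[0]}' must be lower case: '{required}'"
                )
            else:
                problems.append(f"required column '{required}' is missing")
    return problems


# Hypothetical table fragment, for illustration only.
sample = [
    "\\ Example table (hypothetical content)",
    "|   ra      |   dec     | 2MASS_J  |",
    "|   double  |   double  | char     |",
    "  10.684710   41.268750   abc      ",
]
for problem in check_column_names(read_column_names(sample)):
    print(problem)
```

A check like this only catches naming problems; alignment and column consistency are still best verified with the Table Reformat and Validation Service.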
Validation of Images
Images must comply with the definition of the FITS standard and represent the image footprint in the World Coordinate System (WCS). IRSA requires that all FITS files be readable with the CFITSIO library (available from the FITS Support Office at NASA Goddard Space Flight Center) and with WCStools (from the Smithsonian Astrophysical Observatory). IRSA cannot accept files that use custom variants of the standard or that must be read with special software. IRSA has an on-line image validation tool that validates the syntax and completeness of FITS files and aids in validating the astrometric accuracy of images. Images must be uploaded to this service one at a time, because bandwidth and Apache limitations prevent uploading of large image data sets. When images have been generated in a pipeline, spot checks of individual files generally suffice to reveal the defects present throughout the data set. Alternatively, users may download tools that provide the same functionality and can be run on collections of files; see the descriptions below.
FITS Syntax Validation
The fverify tool (developed at HEASARC as part of the FTOOLS bundle) validates the syntax of FITS files. It reports violations of the FITS syntax as errors or warnings. Errors must be corrected; IRSA advises that warnings be corrected as well. Run "fverify" in one of the following ways:
- Download and install HEASARC's FTOOLS-Futils distribution, the FITS "tool store"; run "fverify" on the command line
- Upload the FITS file (maximum allowed file size is 50 MB) to an on-line interface to fverify
- Access fverify through Hera, HEASARC's online interface to FTOOLS services
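For a locally installed FTOOLS, the command-line option can be scripted over a whole delivery. The sketch below is one possible approach in Python, under stated assumptions: that "fverify" is on the PATH and accepts the file name as its first argument, and that its report ends with a summary line of the form "**** Verification found N warning(s) and M error(s). ****". Both should be confirmed against the installed version. The names parse_summary and verify_directory are hypothetical.

```python
import re
import subprocess
from pathlib import Path

# Assumed form of fverify's closing summary line, e.g.
# "**** Verification found 1 warning(s) and 0 error(s). ****"
SUMMARY = re.compile(r"Verification found (\d+) warning\(s\) and (\d+) error\(s\)")


def parse_summary(report):
    """Return (warnings, errors) from an fverify report, or None if no summary is found."""
    match = SUMMARY.search(report)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def verify_directory(directory, pattern="*.fits"):
    """Run fverify on every matching file below `directory`.

    Returns a dict mapping each path to (warnings, errors), or to None when
    the summary line could not be found in the tool's output.
    Raises FileNotFoundError if fverify is not installed.
    """
    results = {}
    for path in sorted(Path(directory).rglob(pattern)):
        completed = subprocess.run(
            ["fverify", str(path)], capture_output=True, text=True
        )
        results[str(path)] = parse_summary(completed.stdout + completed.stderr)
    return results
```

Calling verify_directory on a delivery directory and listing every entry whose result is None or has a non-zero error count gives a quick list of files to inspect by hand.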
WCS Validation
The IRSA tool "mImgtbl" checks the validity and completeness of the WCS header information in FITS images. "mImgtbl" is part of the Montage image mosaic toolkit, and must be run on a local machine. Download the Montage distribution, and build it according to these instructions. "mImgtbl" can be run recursively down directory tree paths, on any number of images; the output is an ASCII IPAC table file reporting the WCS keywords for each image. To run "mImgtbl" recursively in the current directory with an output file called images.tbl, use this command:
- mImgtbl -r -c ./ images.tbl
If any of the WCS keywords in any of the files is invalid, the STDOUT message will indicate the number of defective files found. If this number is greater than zero, rerun the tool using the same command with the debug flag ("-d") to get more detailed information on each file. Complete documentation on "mImgtbl" and its return messages is available here.
There are several tools available to fix FITS image headers. IRAF has the tool "hedit". FTOOLS allows editing of FITS headers within 'FV/GUIs' (FITS Viewer). IRSA has two tools available as part of the Montage distribution. "mGetHdr" returns the WCS keywords in an ASCII file. After editing this file in any editor, it can be input to "mPutHdr", which will write the keywords back into the FITS file.
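The mImgtbl check and the debug rerun described above can be combined in a small wrapper. The Python sketch below assumes that "mImgtbl" is on the PATH and that its STDOUT message has the Montage form [struct stat="OK", count=30, badfits=0, badwcs=2]. The field names badfits and badwcs are assumptions to confirm against the mImgtbl documentation, and parse_status and check_wcs are hypothetical names.

```python
import re
import subprocess


def parse_status(message):
    """Parse a '[struct key=value, ...]' message into a dict of strings."""
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,\]\s]+)', message)
    return {key: value.strip('"') for key, value in pairs}


def count_defective(status):
    """Sum the counts of defective files (field names are assumed, see lead-in)."""
    return int(status.get("badfits", 0)) + int(status.get("badwcs", 0))


def check_wcs(directory="./", table="images.tbl"):
    """Run mImgtbl recursively; rerun with the debug flag if defects are reported."""
    command = ["mImgtbl", "-r", "-c", directory, table]
    first = subprocess.run(command, capture_output=True, text=True)
    status = parse_status(first.stdout)
    if count_defective(status) > 0:
        # Same command with the debug flag, to get per-file detail.
        debug = subprocess.run(
            ["mImgtbl", "-d", "-r", "-c", directory, table],
            capture_output=True,
            text=True,
        )
        print(debug.stdout)
    return status
```

The wrapper returns the parsed status so that a calling script can decide whether to stop a delivery or continue to header repair.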