How to Contribute Data

Overview

IRSA archives data provided by NASA missions, as well as by science teams that have created enhanced data products from NASA missions or closely related ancillary data. This page describes the criteria IRSA uses when considering whether to host a data set, the cost of data archiving, and how to format data and metadata to ensure seamless ingestion into IRSA data services.

What types of data can IRSA archive?

NASA Mission Data

IRSA ingests data collected by NASA Astrophysics Flagships, Explorers, and suborbital and suborbital-class investigations that are part of the Astrophysics Research and Analysis program (APRA).

If you are writing an APRA proposal, we encourage you to contact us through the IRSA Help Desk to discuss the possibility of hosting your data at IRSA. We will discuss the data services we can offer users, from program-friendly application programming interfaces (APIs) to user-friendly graphical user interfaces, and we can provide you with a Letter of Support. It is best to contact us well in advance, ideally two months before your proposal deadline, to allow time for these discussions.

Enhanced Contributed Data

IRSA also ingests enhanced or high-level data products that are contributed by researchers. Contributing data to IRSA ensures long-term accessibility through both interactive graphical interfaces and standard application programming interfaces (APIs). IRSA will also provide Digital Object Identifiers (DOIs) for contributed data sets to increase discoverability of the data through the literature. Enhanced contributed data sets tend to be used in publications at higher rates than low-level data products (see, e.g., Scire et al. 2022). Contributed data products must meet the following criteria in order for IRSA to ingest and serve them:

  1. The raw data on which the contributed products are based have been collected by a NASA mission with an archive at IRSA, or directly support the scientific utilization of data sets hosted at IRSA.
  2. The contributed data products are described in a peer-reviewed journal article.

If you plan to deliver data to IRSA, please contact us through the IRSA Help Desk early in the process of preparing your data products. We will guide you through the metadata standards and data formats that are required, which are described below. It is often helpful to provide sample data products to make sure there are no issues with the data, formatting, or documentation. Please be aware that archiving data takes time, so there will be some delay between delivery and the data being publicly accessible at IRSA.

Cost of Data Archiving

IRSA receives funding from NASA to archive the data sets described above. For mission data, there is sometimes a cost to the mission as well. For costing information, please contact the IRSA Help Desk.

Required Documentation of Data Products

Contributed data sets must be documented clearly. This documentation can take the form of a journal article and/or a delivery document. Journal articles referring to the data to be archived at IRSA should say something similar to "The data will soon be available at the NASA/IPAC Infrared Science Archive". In addition, IRSA will provide a Digital Object Identifier (DOI) for each contributed data set, and this should be used to provide a reference and link to the data from any publication.

The documentation should clearly explain everything a potential user needs to know in order to make use of the data.

For catalog data, the documentation should specify the attributes of each column in a table: units, description, data type, data format (precision), and how null values are represented. Any relevant details, such as how certain quantities were calculated or what the various flag values mean, should also be included.

Documentation for images and spectra should describe how the data were taken and processed. If multiple products are present for a given observation, the documentation should explain their differences and use cases. For example, one filtering algorithm may be optimized for point sources and another for extended sources. Any additional products, such as uncertainty maps, masks, or coverage maps, should be clearly identified.

Timeline for Data Releases

The time it takes to process and release contributed data sets can vary from as little as one month to several months, depending on both the volume and complexity of the data set and on what other data sets IRSA is already committed to deliver. If a particular release date is critical, it is best to contact the IRSA Help Desk as early as possible to work out a schedule with IRSA staff.

Tabular Data Formats

Tabular data should be delivered to IRSA in IPAC Table Format, an ASCII format in which columns are aligned in fixed-width records, free of mark-up and escape codes. IRSA's Table Validator Service tool converts ASCII files in common formats, such as comma-delimited, to IPAC table format. The tool removes common escape sequences (for example, those that encode tab stops), reformats misaligned records, and validates the consistency of the data in each column. It should be treated as an aid to validation, since it does not guarantee detection and correction of all faults. The Table Upload Help discusses the tables that can be validated and used by IRSA. There is a 2 GB file size limit (set by Apache) on tables uploaded to this service.
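As an illustration of the fixed-width layout described above, the following is a minimal sketch that writes a small table with the four IPAC header lines (column names, data types, units, null values) and aligned data records. The function name format_ipac_table, the column dictionaries, and the example keyword are hypothetical and are not part of any IRSA tool; the output should still be checked with the Table Validator Service.

```python
def format_ipac_table(columns, rows, keywords=None):
    """Return IPAC-table text for `rows`.

    `columns` is a list of dicts with keys name, type, unit, null and an
    optional Python format spec "fmt" (all hypothetical names for this sketch).
    """
    def cell(value, col):
        if value is None:
            return col["null"]          # represent missing values explicitly
        fmt = col.get("fmt")
        return format(value, fmt) if fmt else str(value)

    text_rows = [[cell(v, c) for v, c in zip(row, columns)] for row in rows]

    # Each column is as wide as its widest header entry or data value.
    widths = []
    for i, col in enumerate(columns):
        cells = [col["name"], col["type"], col["unit"], col["null"]]
        cells += [r[i] for r in text_rows]
        widths.append(max(len(s) for s in cells))

    lines = [f"\\{key} = {value}" for key, value in (keywords or {}).items()]
    for field in ("name", "type", "unit", "null"):
        padded = (col[field].ljust(w) for col, w in zip(columns, widths))
        lines.append("|" + "|".join(padded) + "|")
    for r in text_rows:
        # Spaces sit under every "|" so values stay inside their columns.
        lines.append(" " + " ".join(s.rjust(w) for s, w in zip(r, widths)))
    return "\n".join(lines) + "\n"


columns = [
    {"name": "ra", "type": "double", "unit": "deg", "null": "null", "fmt": ".6f"},
    {"name": "dec", "type": "double", "unit": "deg", "null": "null", "fmt": ".6f"},
    {"name": "flux", "type": "double", "unit": "mJy", "null": "null", "fmt": ".3f"},
]
rows = [(150.123456, 2.2, 1.5), (10.0, -45.25, None)]
print(format_ipac_table(columns, rows, {"origin": "example"}))
```

The sketch does not handle string values that contain spaces or vertical bars; those cases need extra care before delivery.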

For tables that will be served through IRSA's catalog search, there are currently some additional restrictions:

Image Data Formats

Images must comply with the FITS standard and represent the image footprint in the World Coordinate System (WCS). IRSA requires that all FITS files be readable with the CFITSIO library (available from the FITS Support Office at NASA Goddard Space Flight Center) and with WCStools (from the Smithsonian Astrophysical Observatory). IRSA cannot accept files that use custom variants of the standard or that must be read with special software.

To comply with International Virtual Observatory Alliance standards that enable data discovery, as well as to support data visualization and exploration, we ask data contributors to include these keywords in the image headers.

IRSA has an online image validation tool that validates the syntax and completeness of FITS files and aids in validating the astrometric accuracy of images. Users must upload images to this service one at a time, as bandwidth and Apache limitations prevent uploading large image data sets. When images have been generated in a pipeline, spot checks of individual files generally suffice to reveal the defects present throughout the data set. Alternatively, users may download tools that provide the same functionality and can be run on collections of files; see the descriptions below.

FITS Syntax Validation

The fverify tool (developed at HEASARC as part of the FTOOLS bundle) validates the syntax of FITS files. It reports violations of the FITS syntax as errors or warnings. At a minimum, the errors must be corrected; IRSA advises that warnings be corrected as well. Run "fverify" as follows:
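Before running fverify over a large delivery, a quick structural pre-check can catch grossly malformed files. The sketch below is an assumption-laden illustration, not a substitute for fverify: it checks only that the file length is a multiple of the 2880-byte FITS block size, that the first header card is SIMPLE with the value T in column 30, and that the first header block contains only printable ASCII. The function name quick_fits_precheck is hypothetical.

```python
import os

FITS_BLOCK = 2880  # FITS files are built from 2880-byte blocks
CARD = 80          # each header record ("card") is 80 ASCII characters


def quick_fits_precheck(path):
    """Return a list of problems found; an empty list means the pre-check passed."""
    problems = []

    size = os.path.getsize(path)
    if size == 0 or size % FITS_BLOCK != 0:
        problems.append(
            f"file size {size} is not a positive multiple of {FITS_BLOCK}"
        )

    with open(path, "rb") as handle:
        block = handle.read(FITS_BLOCK)

    # The primary header must begin with SIMPLE = T, with the T in column 30.
    first = block[:CARD].decode("ascii", errors="replace")
    if not (first.startswith("SIMPLE  =") and first[29:30] == "T"):
        problems.append("first card is not SIMPLE with value T in column 30")

    # Header cards may contain only printable ASCII (codes 32 through 126).
    if any(not 32 <= byte <= 126 for byte in block):
        problems.append("first header block contains non-printable characters")

    return problems
```

A file that passes this pre-check can still fail fverify, which examines every header and data unit in detail.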

WCS Validation

The IRSA tool "mImgtbl" checks the validity and completeness of the WCS header information in FITS images. "mImgtbl" is part of the Montage image mosaic toolkit and must be run on a local machine. Download the Montage distribution and build it according to these instructions.

"mImgtbl" can be run recursively down directory tree paths, on any number of images; the output is an ASCII IPAC table file reporting the WCS keywords for each image. To run "mImgtbl" recursively in the current directory with an output file called images.tbl, use this command:

If any of the WCS keywords in any of the files is invalid, the STDOUT message will indicate the number of defective files found. If this number is greater than zero, rerun the tool using the same command with the debug flag ('-d') to get more detailed information on each file.
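The workflow above can be scripted. The following is a minimal sketch under stated assumptions: that "mImgtbl" is on the PATH, that the recursive flag is "-r" (taken from the Montage documentation, not from this page), and that the status message has the form [struct stat="OK", count=N, badfits=N, badwcs=N]. The field names count, badfits, and badwcs are assumptions and should be checked against the output of the installed Montage version. The helper names are hypothetical.

```python
import re
import shutil
import subprocess


def mimgtbl_command(directory=".", table="images.tbl", debug=False):
    """Build the argument list for a recursive mImgtbl run."""
    flags = ["-r"] + (["-d"] if debug else [])
    return ["mImgtbl", *flags, directory, table]


# Matches key=value pairs; values are either quoted strings or bare tokens.
STATUS_FIELD = re.compile(r'(\w+)=("[^"]*"|[^,\]\s]+)')


def parse_status(message):
    """Turn an assumed '[struct key=value, ...]' message into a dict."""
    fields = {}
    for key, value in STATUS_FIELD.findall(message):
        value = value.strip('"')
        fields[key] = int(value) if value.lstrip("-").isdigit() else value
    return fields


def check_directory(directory="."):
    """Run mImgtbl and return (status dict, number of defective files)."""
    if shutil.which("mImgtbl") is None:
        raise RuntimeError("mImgtbl not found on PATH")
    result = subprocess.run(
        mimgtbl_command(directory), capture_output=True, text=True
    )
    status = parse_status(result.stdout)
    defective = status.get("badfits", 0) + status.get("badwcs", 0)
    return status, defective
```

If the defective count is greater than zero, calling mimgtbl_command(debug=True) gives the argument list for the follow-up run with the debug flag.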

Complete documentation on 'mImgtbl' and its return messages is available here.

There are several tools available to fix FITS image headers. IRAF has the tool 'hedit'. FTOOLS allows editing of FITS headers within 'FV/GUIs' (FITS Viewer). IRSA has two tools available as part of the Montage distribution. 'mGetHdr' returns the WCS keywords in an ASCII file. After editing this file in any editor, it can be input to 'mPutHdr', which will write the keywords back into the FITS file.

Astrometric Validation

The simplest way of assessing the astrometric accuracy in an image file is to compare the positions of point sources in the image with those of sources in a catalog of high astrometric accuracy. One quick way to overlay sources from the 2MASS All-Sky Point Source Catalog on the image is with the Image Validator.
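For a numerical version of this comparison, the sketch below computes the angular offset between each image source and its nearest catalog source, using the haversine formula for the great-circle separation. It is a minimal illustration, assuming both position lists are already in decimal degrees in the same coordinate frame. The function names and the 2 arcsecond matching radius are hypothetical choices, and the brute-force search is only suitable for small lists.

```python
import math
from statistics import median


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds; inputs in decimal degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def nearest_offsets(image_sources, catalog_sources, max_arcsec=2.0):
    """Offsets (arcsec) from each image source to its nearest catalog source.

    Sources with no catalog counterpart within max_arcsec are skipped.
    """
    if not catalog_sources:
        return []
    offsets = []
    for ra, dec in image_sources:
        best = min(
            separation_arcsec(ra, dec, cat_ra, cat_dec)
            for cat_ra, cat_dec in catalog_sources
        )
        if best <= max_arcsec:
            offsets.append(best)
    return offsets


# Hypothetical positions: (RA, Dec) pairs in decimal degrees.
image = [(150.00010, 2.20000), (150.01000, 2.21005)]
catalog = [(150.00000, 2.20000), (150.01000, 2.21000)]
matched = nearest_offsets(image, catalog)
if matched:
    print(f"median offset: {median(matched):.2f} arcsec")
```

A median offset that is large compared with the catalog's astrometric accuracy suggests a problem with the image WCS.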

The National Virtual Observatory (NVO) has released a tool that repairs FITS files with misaligned astrometry: WCS Fixer, hosted by NOAO.

Spectral Data Formats

1-D spectra may be served as either FITS or (preferred) ASCII tables (in IPAC table format). Delivery in both formats is also acceptable. The slit center coordinate and position angle should be in the header as keywords (SLT_RA, SLT_DEC and SLT_PA). Coordinates should always be in decimal degrees (J2000), per the WCS standard. For validation purposes, 1-D ASCII spectra files can be handled as catalogs; please refer to "Tabular Data Formats" above. To validate 1-D FITS spectra, please use "fverify" as discussed above.
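Slit positions are often recorded in sexagesimal notation, so they may need converting before delivery. The following is a minimal sketch that converts colon-separated sexagesimal strings to decimal degrees and formats the three slit keywords as IPAC table keyword lines. It assumes that keywords in an ASCII spectrum are written in the "\name = value" form of the IPAC table format; the function names and the number of decimal places are hypothetical choices.

```python
def hms_to_degrees(hms):
    """Convert right ascension 'HH:MM:SS.s' to decimal degrees."""
    hours, minutes, seconds = (float(part) for part in hms.strip().split(":"))
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


def dms_to_degrees(dms):
    """Convert declination '[+-]DD:MM:SS.s' to decimal degrees."""
    text = dms.strip()
    # Take the sign from the string so that '-00:30:00' stays negative.
    sign = -1.0 if text.startswith("-") else 1.0
    degrees, minutes, seconds = (
        float(part) for part in text.lstrip("+-").split(":")
    )
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)


def slit_keyword_lines(ra_hms, dec_dms, position_angle_deg):
    """Return IPAC table keyword lines for the slit center and position angle."""
    return [
        f"\\SLT_RA = {hms_to_degrees(ra_hms):.6f}",
        f"\\SLT_DEC = {dms_to_degrees(dec_dms):.6f}",
        f"\\SLT_PA = {position_angle_deg:.2f}",
    ]


for line in slit_keyword_lines("10:00:00", "+02:12:00", 45):
    print(line)
```

The conversion does not change the coordinate epoch or frame; positions must already be J2000.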

2-D spectra are best served as FITS images, and 3-D spectra are best served as FITS Data Cubes. Standard WCS information should be in the header. These files can be validated using "fverify" as discussed above.

Data Transfer

Once validated and documented, the data may be transferred to IRSA using any convenient mechanism, including but not limited to anonymous SFTP, Dropbox, Google Drive, or similar services. Please contact the IRSA Help Desk to coordinate the transfer.

Last updated: 2021-09-15