2MASS All-Sky DVD Data Release README


1. INTRODUCTION

The 2MASS All-Sky Data Release contains highly uniform near-infrared Catalog and Image data covering 99.998% of the sky, derived from observations made by the Survey's 1.3 m telescopes on Mt. Hopkins, Arizona and Cerro Tololo, Chile. The All-Sky Release data products include a Point Source Catalog (PSC) containing positions and photometry for 470,992,970 objects; a Scan Information Table (SCN) containing information common to the 59,731 Survey scans; an Extended Source Catalog (XSC) containing positions, photometry, and basic shape information for 1,647,599 spatially extended sources, most of which are galaxies; and an Image Atlas containing 4,121,439 J, H, and Ks FITS images covering the sky. This DVD set contains the Point Source Catalog, the Scan Information Table, and the Extended Source Catalog. The Atlas Images are not contained on these disks, but are available via the on-line services of the Infrared Science Archive. The construction, contents, and formats of the 2MASS data products are described in the on-line Explanatory Supplement. Format description files extracted from the on-line Explanatory Supplement are included in this DVD set for the convenience of users who do not have on-line access. The hyperlinks in the DVD format description files are not functional.

The data volume of the 2MASS All-Sky Release Catalogs is so large that it is not practical to access the data with screen editors. For this reason the data on these DVDs have been formatted for convenient loading into a database server, which is the appropriate means for accessing large amounts of data. The data format and schemas on these DVDs are suitable without modification for use with Postgres version 7.3.2, an open source database server that is available without charge. The data and schemas on these DVDs can easily be modified for use with other database servers. Approximately 152 GB of disk space is required to load the 2MASS Point Source Catalog into a Postgres table. The point source data files on these DVDs require approximately 43 GB of space if copied to a disk directory. The user may find it convenient to copy the data files from these DVDs to disk before loading the data into a Postgres table.


2. DATA FORMAT

The data format of the catalog and table files is ASCII text. The files contain one line of data for each 2MASS source or scan. Each line consists of data fields delimited by the vertical bar | character. The lines of data have no leading delimiter and no trailing delimiter. Every line in the Point Source Catalog data files has exactly the same number of delimiters. The same is true for data in the Extended Source Catalog and Scan Information Table files. The number of delimiters in the Extended Source files is much larger than the number of delimiters in the Point Source files because many more columns of data are required to characterize an extended source. Nulls are represented by \N. Null means no information. Users of database servers that are incompatible with the delimiter or null characters chosen for this data release may find that piping the files through a stream editor such as sed is a practical means of format conversion.
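
As a minimal sketch of the stream conversion mentioned above, the following shell functions check the field count of a file and convert it to another delimiter convention. The function names field_counts and to_tsv are hypothetical and are not part of the DVD set. The choice of a tab delimiter and empty fields for nulls is only an assumed target format, and awk is used here in place of sed because it can test whole fields.

```shell
# Hypothetical helpers, not part of the DVD set.

# Print each distinct field count found on stdin together with the number
# of lines that have it. A well-formed catalog file yields a single line.
field_counts() {
    awk -F'|' '{ seen[NF]++ } END { for (n in seen) print n, seen[n] }'
}

# Convert lines from stdin to tab-delimited text, replacing any field that
# is exactly \N with an empty field (an assumed target convention).
to_tsv() {
    awk 'BEGIN { FS = "|"; OFS = "\t" }
         {
             for (i = 1; i <= NF; i++) if ($i == "\\N") $i = ""
             $1 = $1      # force the line to be rebuilt with OFS
             print
         }'
}
```

For example, zcat psc_aaa.gz | field_counts should report one field count for the whole file, and zcat psc_aaa.gz | to_tsv writes the converted lines to standard output.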


3. SCHEMAS

A schema is a file that provides a database server with instructions about the names, ordering, and internal representations of columns of data. The schemas on these DVDs can be viewed and modified with a text editor. As the 2MASS data volume is large the user is advised to choose internal server representations for the data that are consistent with the precision of the data. The Postgres conventions for data representation names in the recommended schemas are as follows:

    real                4 byte floating point
    double precision    8 byte floating point
    smallint            2 byte signed integer
    integer             4 byte signed integer
    bigint              8 byte signed integer (only used in verification queries, not in data loading)
    character(n)        character field with exactly n characters

The user is cautioned that dec may be a reserved word and thus not suitable as a column name. For this reason declination is abbreviated as decl in the schemas. Users may need to edit these schemas to make them compatible with servers other than Postgres. For debugging purposes it may be useful to initially load a small sample of data with all columns cast as data type text.
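
The all-text debugging schema suggested above can be generated rather than typed by hand. The following is a sketch only: the function name text_schema and the table name debug_psc are hypothetical, and the three column names in the usage line are simply those mentioned in this README. A real debugging schema must list every column, in the order given in format_psc.html.

```shell
# Hypothetical helper, not part of the DVD set: print a create table
# statement in which every column has data type text.
# Usage: text_schema TABLE COLUMN...
text_schema() (
    table=$1
    shift
    cols=""
    for c in "$@"; do
        # Append "name text", separating entries with a comma.
        cols="${cols:+$cols, }$c text"
    done
    printf 'create table %s (%s);\n' "$table" "$cols"
)
```

For example, text_schema debug_psc pts_key decl j_cmsig prints a three-column statement that can be inspected and then piped to psql wsdb.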

The order of the columns in the schemas is identical to the order of the columns in the files format_psc.html, format_scn.html, and format_xsc.html which describe for the user the meaning of the data. Updated versions of these files with functional links to additional information are available in the online Explanatory Supplement.


4. ORDERING OF THE DATA

The data retrieved from a query do not depend on the order in which lines of data are loaded into a table. The speed with which data are retrieved may be dependent on the order of loading and the nature of the query. For the Point Source Catalog table an ordering that places sources that are within a few minutes (of arc) of each other on the sky close to each other in the data files has been chosen. This choice may enhance the performance of table-table positional correlations and searches. Once a table is loaded a database user can order and index it to optimize performance for the type of query that is performed most frequently.

The Point Source Catalog has been ordered in 0.1 degree declination bins starting at -90.0 degrees. Within each declination bin the sources are in order of increasing right ascension. Sources with declination < 0.0 degrees are contained in 57 gzipped files (psc_aaa.gz to psc_ace.gz). Sources with declination > 0.0 degrees are contained in 35 gzipped files (psc_baa.gz to psc_bbi.gz). The declination bins may span file boundaries except at 0.0 degrees.
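
If the files have been copied to a single disk directory, the file counts given above offer a quick completeness check. The sketch below assumes such a directory; the function name count_psc_files is hypothetical.

```shell
# Hypothetical helper, not part of the DVD set: count the southern (psc_a*)
# and northern (psc_b*) Point Source Catalog files in a directory.
# Usage: count_psc_files [DIRECTORY]
count_psc_files() (
    cd "${1:-.}" || exit 1
    south=0
    north=0
    for f in psc_a*.gz; do
        # The test guards against an unmatched pattern being counted.
        if [ -e "$f" ]; then south=$((south + 1)); fi
    done
    for f in psc_b*.gz; do
        if [ -e "$f" ]; then north=$((north + 1)); fi
    done
    echo "south=$south north=$north"
)
```

With the complete catalog in place the counts should match the 57 southern and 35 northern files listed above.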

The Scan Information Table is in order of scan_key.

The Extended Source Catalog is in order of declination beginning at -90.0 degrees. It has been divided into two files, xsc_aaa which contains sources with declination < 0.0 degrees and xsc_baa which contains sources with declination > 0.0 degrees.

The files are named so that pipelines beginning with zcat psc_* |, zcat scn_* |, or zcat xsc_* | will read the files in a consistent order.


5. PRIMARY KEYS AND UNIQUE OBJECT IDENTIFIERS

A column in a table that has data that are never null and are not duplicated elsewhere in that column in the table may be used to uniquely identify lines. Pts_key, scan_key, and ext_key are integer columns in the Point Source, Scan Information, and Extended Source tables that uniquely identify lines in the respective tables. Note that, although pts_key, scan_key, and ext_key may be present in more than one table, pts_key is only a unique identifier for the Point Source table, scan_key is only a unique identifier in the Scan Information table, and ext_key is only a unique identifier for the Extended Source table. Some database servers assign a hidden unique object identifier (oid) to lines in tables. The user may wish to suppress the generation of oids as pts_key, scan_key, and ext_key can serve to identify lines uniquely. These columns are used to make joins between tables. The declaration of a column as a primary key may slow the loading of the data considerably because of concurrent indexing.
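
The two conditions for a unique identifier (never null, never repeated) can be tested on a data file before it is loaded. The sketch below is intended for the small practice files, since it holds every key value in memory. The function name check_key is hypothetical, and the position of the key column is an assumption that must be looked up in the format description files.

```shell
# Hypothetical helper, not part of the DVD set: report how many lines on
# stdin have a null (\N) or a repeated value in column COL.
# Exits with status 1 if any are found. Usage: check_key COL < file
check_key() {
    awk -F'|' -v col="$1" '
        $col == "\\N" { nulls++ }
        seen[$col]++  { dups++ }    # true from the second occurrence on
        END {
            printf "nulls=%d dups=%d\n", nulls, dups
            exit (nulls + dups > 0)
        }'
}
```

A column that is suitable as a primary key reports nulls=0 dups=0.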


6. PRACTICE FILES

The practice file directory contains practice data files that are sufficiently small to be viewed with a screen editor. It also contains sample data output for checking.


7. LOADING PRACTICE DATA INTO A POSTGRES TABLE

It is assumed that the user has installed Postgres 7.3.2 and set up appropriate permissions and that a database named wsdb has been created with the createdb wsdb command entered at the console (not postgres) prompt. Change directory to the practice directory on DVD 1 side A. Type the command:

    psql -f test_psc_schema wsdb

This command will first drop any existing table named test_psc (if test_psc does not exist an error message which can be disregarded will be issued). To load the data type:

    cat test_psc | psql -c "copy test_psc from stdin with delimiter '|' " wsdb

When the loading has been completed type:

    psql -c "select * from test_psc" wsdb

The result of this query can be compared with the files test_psc_query.txt or test_psc_query.html to determine if the data have been loaded correctly.

To check the loading of nulls type:

    psql -c "select count(*) from test_psc where j_cmsig is null" wsdb

This query should return with a count of 1.
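
The same count can be obtained directly from the data file as an independent cross-check. This is a sketch only: the function name count_nulls is hypothetical, and the column number of j_cmsig is not given in this README and must be taken from format_psc.html.

```shell
# Hypothetical helper, not part of the DVD set: count the lines on stdin
# whose column COL holds the null marker \N.
# Usage: count_nulls COL < file
count_nulls() {
    awk -F'|' -v col="$1" '$col == "\\N" { n++ } END { print n + 0 }'
}
```

Running count_nulls with the column number of j_cmsig on test_psc should agree with the result of the query above.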

The Extended Source Catalog can be loaded in exactly the same manner by substituting xsc for psc in the above instructions.


8. LOADING THE 2MASS CATALOGS

Change directory to the top directory of DVD 1 SIDE A. To load the twomass_psc schema type:

    psql -f twomass_psc_schema wsdb

Any existing table named twomass_psc should be dropped before attempting to execute the above command. The practice schemas include a drop table command, but the production schemas do not.

To load the data type:

    zcat psc_* | psql -c "copy twomass_psc from stdin with delimiter '|' " wsdb

When this command has been executed mount DVD 1 SIDE B in place of DVD 1 SIDE A and repeat the above copy command. Do not load the schema again. Repeat this process for each side of each DVD, in order of DVD number and then side. The user may find it convenient to begin by copying all of the data files into a single disk directory. In this circumstance the copy command need be given only once. Data from only one hemisphere can be loaded if desired. To load only the southern hemisphere (declination < 0.0) type zcat psc_a* in place of zcat psc_* in the above command. To load only the northern hemisphere type zcat psc_b*.
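
Copying the files to a single disk directory can be scripted one DVD side at a time. The following is a minimal sketch, not a supplied tool: the function name stage_side is hypothetical, as are the mount point and destination paths passed to it. Each copy is tested with gzip -t so that a read error is detected before loading begins.

```shell
# Hypothetical helper, not part of the DVD set: copy the psc_*.gz files
# from one mounted DVD side to a disk directory and test each copy.
# Usage: stage_side DVD_DIRECTORY DISK_DIRECTORY
stage_side() (
    src=$1
    dest=$2
    mkdir -p "$dest" || exit 1
    for f in "$src"/psc_*.gz; do
        [ -e "$f" ] || continue      # no matching files on this side
        name=$(basename "$f")
        cp "$f" "$dest/$name" || exit 1
        gzip -t "$dest/$name" || exit 1
        echo "staged $name"
    done
)
```

After stage_side has been run once for each side of each DVD, change to the disk directory and give the copy command above a single time.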

The Scan Information Table can be loaded by replacing psc with scn in the above commands. Since the data are entirely on DVD 1 SIDE A it can be loaded without changing DVDs.

The Extended Source Catalog can be loaded by replacing psc with xsc in the above commands. Since the data are entirely on DVD 1 SIDE A it can be loaded without changing DVDs.


9. VERIFYING THAT THE DATA HAVE BEEN LOADED CORRECTLY

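Three queries that sum integer columns, together with the expected query output, are given in the files verification_query_psc.txt or verification_query_psc.html, verification_query_scn.txt or verification_query_scn.html, and verification_query_xsc.txt or verification_query_xsc.html. If the output obtained from a loaded table matches the output given in the corresponding file, the data have been loaded correctly.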


10. OBTAINING UPDATES TO THE 2MASS ALL-SKY DVD DATA RELEASE

Table, catalog, and documentation DVD release updates are available at dvd_updates.


Revision 6: April 19, 2003, R. Stiening