Duplicate Source Handling

2MASS Catalog Generation: Resolution of Sources in Scan Overlap Regions



I. Algorithm Adopted for Spring 1999 Incremental Data Release (IDR)


Duplicate/multiple sources: Select one apparition of the source using a purely geometric algorithm. The apparition that is closest to the center of its respective scan is selected for inclusion in the catalog.

Non-duplicate sources in overlap regions: Due to an oversight and the lack of a functioning algorithm, all non-duplicate sources were included in the Spring IDR Catalogs.

Unbounded tile edges: Sources all the way to the edge of the safety boundaries were included in the Spring Catalogs. No special filtering was performed for the unbounded edges.
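The Spring 1999 geometric selection described above can be sketched as follows; the dictionary fields and coordinate frame are illustrative assumptions, not the actual 2MASS data structures.

```python
import math

def pick_apparition(apparitions):
    """Return the apparition closest to the center of its own scan.

    apparitions: list of dicts with the source position (x, y) and the
    center (cx, cy) of that apparition's scan, in a common angular frame
    (arcsec, say). Field names are hypothetical.
    """
    def dist_to_center(a):
        return math.hypot(a["x"] - a["cx"], a["y"] - a["cy"])
    return min(apparitions, key=dist_to_center)

# Two apparitions of the same source in two overlapping scans;
# the first sits closer to its scan center, so it is kept.
apps = [
    {"scan": "s1", "x": 40.0, "y": 100.0, "cx": 50.0, "cy": 120.0},
    {"scan": "s2", "x": 10.0, "y": 110.0, "cx": 50.0, "cy": 120.0},
]
print(pick_apparition(apps)["scan"])   # s1
```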

This link contains a discussion of the sources in tile overlaps for the Spring 1999 IDR.


II. Algorithm for Fall 1999 IDR


Adopt the duplicate source resolution algorithm used for the Spring IDR. Implement a non-duplicate source resolver in scan overlap regions.

Multiple Source Algorithm

  1. Safety border around scans: consider only sources >10 arcsec (point sources) or >15 arcsec (extended sources) from scan edges to avoid band coverage problems


  2. Match sources from adjoining scans in overlap region using a pure positional match


    • Consider only objects not identified with artifacts
    • Match radius < 2.0 arcsec (point sources) and < 5.0 arcsec (extended sources)

  3. For all sources detected in more than one scan (duplicate sources):


    • Examine the horizontal (EW) and vertical (NS) distances from the two nearest scan edges, dh and dv, for all apparitions of the source.
    • If the source is not detected in all observed scans that cover its position, then create a "virtual" duplicate of the source in the scans in which it was not detected. Use only virtual duplicates that fall more than the safety border distance from the edges of their respective scans.
    • Determine the minimum of dh or dv for each apparition, real and virtual, in their respective scans.
    • Select the apparition that has the larger min(dh,dv). If a real apparition is selected, then it is included in the Catalog. If a virtual apparition is selected, then no source is included in the catalog. Note that this forms somewhat complex boundaries between scans at the very corners, but it is simple to implement and document.
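The steps above can be sketched as follows, assuming rectangular scans given as (xmin, xmax, ymin, ymax) in arcsec; the tuple layout, scan model, and function names are assumptions for illustration, not the actual 2MASS pipeline code.

```python
# Illustrative sketch of steps 1-3: compute min(dh, dv) for every real
# and virtual apparition, drop virtuals inside the safety border, and
# keep the apparition with the largest min(dh, dv).

SAFETY_BORDER = 10.0   # arcsec for point sources; 15.0 for extended sources

def min_edge_distance(x, y, scan):
    """min(dh, dv): distance to the nearest EW (dh) or NS (dv) scan edge."""
    xmin, xmax, ymin, ymax = scan
    dh = min(x - xmin, xmax - x)   # horizontal (EW) edge distance
    dv = min(y - ymin, ymax - y)   # vertical (NS) edge distance
    return min(dh, dv)

def resolve_duplicates(apparitions, border=SAFETY_BORDER):
    """apparitions: list of (x, y, scan, is_virtual) tuples for one source.

    Returns the winning real apparition, or None when a virtual
    apparition wins (in which case no source enters the catalog).
    """
    candidates = []
    for app in apparitions:
        x, y, scan, is_virtual = app
        d = min_edge_distance(x, y, scan)
        if is_virtual and d <= border:
            continue   # drop virtual duplicates inside the safety border
        candidates.append((d, app))
    _, winner = max(candidates, key=lambda c: c[0])
    return None if winner[3] else winner

# Two scans overlapping in 270 <= x <= 300 arcsec:
scan_a = (0.0, 300.0, 0.0, 1000.0)
scan_b = (270.0, 570.0, 0.0, 1000.0)

# Real detection better placed than its virtual duplicate: it is kept.
kept = resolve_duplicates([(282.0, 500.0, scan_a, False),
                           (282.0, 500.0, scan_b, True)])
print(kept is not None)   # True

# Virtual duplicate better placed: no source is cataloged.
kept = resolve_duplicates([(288.0, 500.0, scan_a, False),
                           (288.0, 500.0, scan_b, True)])
print(kept is None)       # True
```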

This link provides graphical examples of the multiple source resolution algorithm at work (however, the virtual apparition modification is not implemented in these examples).


Non-multiple Source Algorithm

This applies to sources that fall in the overlap region between scans but are detected in only one scan.

Algorithm Proposed by Howard McCallon - Accepted in 6/15/99 Science Team Telecon

  1. Safety border around scans: consider only sources >10 arcsec (point sources) or >15 arcsec (extended sources) from scan edges to avoid band coverage problems


  2. For each non-duplicate in a region covered by more than one scan, create "virtual" duplicates in all other scans which cover the source position (use real reconstructed tile positions, and require that the virtual duplicates fall more than the safety border distance from the edges of their respective scans).


  3. Evaluate the edge distances, dh and dv, for the "virtual" duplicates in their respective scans.


  4. If the real source has a larger min(dh,dv) than any of its virtual duplicates, then the source is accepted for inclusion in the catalog. If not, then the source is not accepted.
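Steps 2-4 can be sketched under the same assumed rectangular-scan model, with scans given as (xmin, xmax, ymin, ymax) in arcsec; the names and data layout are hypothetical, not the pipeline's actual code.

```python
# Sketch of the non-duplicate test: the single real detection is accepted
# only if its min(dh, dv) exceeds that of every surviving virtual duplicate.

def min_edge_distance(x, y, scan):
    """min(dh, dv) for a position in a rectangular scan."""
    xmin, xmax, ymin, ymax = scan
    return min(x - xmin, xmax - x, y - ymin, ymax - y)

def accept_nondup(real, virtuals, border=10.0):
    """real: (x, y, scan) of the single detection; virtuals: the same
    position in every other scan covering it. border: safety border in
    arcsec (10 for point sources, 15 for extended sources)."""
    d_real = min_edge_distance(*real)
    d_virt = [min_edge_distance(*v) for v in virtuals]
    d_virt = [d for d in d_virt if d > border]   # step 2 safety-border cut
    return all(d_real > d for d in d_virt)       # step 4 acceptance test

scan_a = (0.0, 300.0, 0.0, 1000.0)
scan_b = (270.0, 570.0, 0.0, 1000.0)
# Detected only in scan_a but better placed in (undetected) scan_b,
# so the detection is rejected:
print(accept_nondup((288.0, 500.0, scan_a), [(288.0, 500.0, scan_b)]))   # False
```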


Note that the non-multiple source algorithm is now identical to the multiple source algorithm for the case where an object is not detected in all possible scans covering its position. Thus, the algorithm will be less confusing to code and the spatial boundaries of the duplicates and non-duplicates from the same tile will be identical. Although this needs to be verified with simulation and empirical tests, this algorithm should be generalizable to complex overlap regions in tile corners and at the poles.

Unbounded Tile Edges (TBD)

There are several options for dealing with sources that fall close to "open" edges and corners of scans (i.e., edges not bounded by another observed tile in the release database).

Issues:

  1. 6/15/99 Telecon - Team decided to add virtual duplicates to multiple source groups not detected in all possible scans. This effectively means that every object in the overlap regions, multiple and non-multiple alike, is treated in exactly the same way.

    If an object is detected multiple times, but not in all of the tiles in which it could have been (e.g., 2 out of 3 times in tile corners), should the object be treated like both a multiple and a non-multiple source? This would be done by creating virtual duplicates for the object in any tile in which it could have been detected but was not. If one of the virtual duplicates has a more favorable position in its tile, then the real sources would be rejected from the release.

III. Implementation Notes - 10/13/99

Sherry Wheelock writes:

There are several steps to the second phase of processing the dup sources.

  1. Assemble ALL snd (secondary non-dups) into one file and put into database (COMPLETED)


  2. For each night, assemble all pnd (primary non-dups), compare them to the snd list in the database (1. above), and create two files for each night: pndfile, containing all sources NOT found in the snd database, and u0pndfile, containing all sources found in the snd database. (u0pndfile will contain all non-dups that fall outside the primary tile's usable area, based on surrounding tiles and the geometric algorithm.) This step is in progress now and takes 5-15 minutes per night to generate the two output files, depending on the density of primary non-dups. At 10-15 min/night, it will take 4-6 days to complete.


  3. This step will merge the dups and non-dups for each night and pull out the sources that are not from the given night. It will take 3 iterations to get all sources into the appropriate files and to resolve any with changing flags. Time to complete is uncertain; however, no database I/O is performed in this step.


IV. Comments


Tom Chester:

Subject: Re: June 15 telecon - Multiple Source Resolution

re issue 1:

Yes, to be consistent with the other algorithms, we must create the virtual duplicates and accept one of the actual multiple observations only if one of them is the farthest from an edge. This will happen all the time for the fainter sources, simply due to thresholding. If we don't do it this way, we will in effect go deeper in the multiple-coverage areas, which we are striving not to do.


R. Cutri - IPAC
Last Update - 13 October 1999 - 9:00 PDT