2MASS Catalog Generation: Handling of Duplicate Sources in Scan Overlap Regions
I. Options
Select One Apparition
Adopted Option (12/29/98)
- Advantages
- Avoids inhomogeneity resulting from sqrt(2) SNR improvement (see note below)
- Two-scan case relatively simple to implement (with caveats)
- Easier traceability between catalog source and DB entry (e.g. time of observation)
- Better choice if dominated by systematic photometric and/or astrometric errors
- Will allow favoring side of array that has flatter cross-scan photometric response
- Disadvantages
- Discontinuity at boundaries
- Moving source may be double-counted or missed
- Geometry in multi-scan overlap regions (e.g. corners or regions near poles) more complex to implement.
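- Note: the sqrt(2) figure above is the standard error-propagation result for averaging two independent measurements, assuming purely random, uncorrelated errors. For measurements m_1, m_2 with uncertainties sigma_1, sigma_2, the inverse-variance weighted mean and its uncertainty are

      \bar{m} = \frac{m_1/\sigma_1^2 + m_2/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2},
      \qquad
      \sigma_{\bar{m}} = \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right)^{-1/2}
      = \frac{\sigma}{\sqrt{2}} \quad (\sigma_1 = \sigma_2 = \sigma),

  so the SNR improves by sqrt(2) only inside the overlap strips; this localized improvement is the inhomogeneity that selecting one apparition avoids.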
- Algorithm
- Safety border around scans: use only sources > 10 arcsec from scan edges to avoid band coverage problems (Sampler used a 5 arcsec border and still had a few missing bands)
- Match sources from adjoining scans in overlap region
- Consider only objects not identified with artifacts
- Match radius < 3.5 arcseconds (minimum resolution); final value will be set after analysis, but will likely be < 2 arcseconds
- Require source brightness to agree to within TBD mags for a valid match
- The point has been made that with purely geometric selection criteria, a brightness match is not essential; the only requirement is that the measurement of the object from one scan be reported
- For all sources detected in more than one scan:
- Select position and brightness from the scan in which the source lies closest to the scan centerline (U-scan coords) and, for declination-overlap redundancy, closest to the N/S declination midpoint
- Advantage #5 (above) cannot be realized, since full overlaps are needed for extended sources
- For sources near the corners of scans, examine the horizontal (EW) and vertical (NS) distances from the two nearest edges, dh and dv. Take the minimum of dh and dv for each scan, and select the apparition from the scan that has the larger min(dh,dv). Note that this forms somewhat complex boundaries between scans in the very corners, but it is simple to implement and document (a concrete sketch follows this list).
- See Issues below
- For sources detected in only one scan:
- Define a positional boundary between scans (e.g. U-scan or RA midpoint between adjacent scans; more complex at corners and near poles)
- A source makes the catalog if it is detected in the scan on the appropriate side of the boundary
- For "open" edges of scans (not bounded by another scan):
- Define a boundary that is set in from the edge by the same amount that the opposite-edge boundary is set in; that is, make the release boundaries symmetric for this scan. It is essential that this open-edge boundary be bookkept for subsequent releases to make certain that no area is inadvertently skipped.
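To make these rules concrete, the following is a minimal Python sketch of the matching and selection steps; it is illustrative only, not 2MASS pipeline code. The record fields (ra, dec in degrees; dh_arcsec, dv_arcsec for the distances to the nearest EW and NS scan edges), the flat-sky small-angle separation, and the 3.5 arcsec placeholder radius are assumptions made for this sketch.

    # Illustrative sketch of the duplicate-source rules above; NOT 2MASS
    # pipeline code. All field names and the flat-sky separation formula
    # are assumptions made for the example.
    import math

    MATCH_RADIUS_ARCSEC = 3.5    # upper bound from the text; final value TBD
    SAFETY_BORDER_ARCSEC = 10.0  # use only sources > 10" from scan edges

    def separation_arcsec(s1, s2):
        """Small-angle, flat-sky separation between two sources, in arcsec."""
        dra = (s1["ra"] - s2["ra"]) * math.cos(math.radians(0.5 * (s1["dec"] + s2["dec"])))
        ddec = s1["dec"] - s2["dec"]
        return 3600.0 * math.hypot(dra, ddec)

    def inside_safety_border(src):
        """Safety border: reject sources too close to any scan edge."""
        return min(src["dh_arcsec"], src["dv_arcsec"]) > SAFETY_BORDER_ARCSEC

    def is_match(s1, s2):
        """Position-only match, per the 1/5/99 consensus (no brightness test)."""
        return separation_arcsec(s1, s2) <= MATCH_RADIUS_ARCSEC

    def select_apparition(apparitions):
        """Among matched detections of one object, keep the one whose
        min(dh, dv) is largest, i.e. the detection farthest from its scan's
        nearest edge. In a simple two-scan side overlap this reduces to
        taking the source closest to the scan centerline; near corners it
        implements the min(dh, dv) rule above."""
        return max(apparitions, key=lambda s: min(s["dh_arcsec"], s["dv_arcsec"]))

    # Example: best = select_apparition([detection_scan_a, detection_scan_b])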
- Example:
- Algorithm points 1-4 were implemented and run on a set of scans from the Spring 1999 Working Release Point Source DB. This PostScript FIGURE illustrates the duplicate source rectification at the relatively complex boundary between several tiles. The boundaries of each tile are coded in a separate color; the "reference" tile is shown in black. The matched sources between the reference and surrounding tiles are color coded to show the tile from which the apparition will be selected for inclusion in the catalog (the crosses on the figure can be ignored). Zoom in on the interesting corners to see detail of the selection.
- Issues:
- Should relative scan quality be allowed to override the geometric decision of which scan to draw information from? (e.g. one scan has better sensitivity than the other)
Decision on 1/5/99 - No. Use only the geometric decision.
- Should relative source measurement quality be allowed to override the geometric decision of which scan to draw information from? (e.g. a galaxy measurement in one scan is contaminated by a bright star, but not in the other scan)
Decision on 1/5/99 - No. Use only the geometric decision.
- What to do with objects that match positionally, but not in brightness? (e.g. variable stars, marginally resolved multiple stars)
- What to do with relatively bright objects that do not match with an object in adjacent scan(s)? (e.g. variable stars, asteroids)
General consensus on 1/5/99 was to start with a simple, position-only matching scheme and a purely geometric decision for which scan to report, and to keep track of the frequency of occurrence of issues 3 and 4.
Merge Apparitions
This option will not be adopted. Possibly reconsider with full reprocessing at the end of the Survey.
- Advantages
- Improved sensitivity in overlap regions if not dominated by systematic errors
- Can replace discontinuities with smooth transitions
- Easier to implement in multi-scan overlap region
- Disadvantages
- Requires a weighted averaging algorithm to avoid discontinuities
- Inhomogeneity in sensitivity on scan spatial scales (may not be significant in light of seeing, transmission, background variations, etc)
- Not appropriate if dominated by systematic photometric and/or astrometric errors
- Potential problems matching variable, moving, or intermittently resolved sources
- Merging is especially difficult for extended sources, because they can have different-sized "pieces" in adjacent scans
- Requires considerable testing of merge/average algorithms before release
- Sources at edges of blocks of released data may change name/position/brightness when adjacent block is released
- Algorithm
- Match sources from adjoining scans in overlap region (require positional and brightness match)
- Evaluate weighted-average brightness and position values for matched groups (a sketch follows this list)
- Quote combined position and brightness in Catalogs
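Had this option been adopted, the averaging step might look like the minimal Python sketch below; it is illustrative only, not 2MASS pipeline code. Inverse-variance weighting, the record fields (ra, dec, mag and their sigmas), and the direct averaging of magnitudes (rather than fluxes) are all assumptions made for this sketch.

    # Illustrative sketch of the merge/average step for the non-adopted
    # option; NOT 2MASS pipeline code. Field names and inverse-variance
    # weighting are assumptions made for the example.
    import math

    def weighted_mean(values, sigmas):
        """Inverse-variance weighted mean and its formal uncertainty."""
        weights = [1.0 / s ** 2 for s in sigmas]
        total = sum(weights)
        mean = sum(w * v for w, v in zip(weights, values)) / total
        return mean, math.sqrt(1.0 / total)

    def merge_apparitions(apps):
        """Combine matched detections of one source into a single catalog
        entry. Positions and magnitudes are averaged independently; for two
        detections with equal errors the combined uncertainty is
        sigma/sqrt(2), per the note under "Select One Apparition" above."""
        merged = {}
        for val, sig in (("ra", "sig_ra"), ("dec", "sig_dec"), ("mag", "sig_mag")):
            merged[val], merged[sig] = weighted_mean(
                [a[val] for a in apps], [a[sig] for a in apps])
        return merged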
II. Comments
- Tom Chester (12/21/98):
Although I strongly support merging point sources, tj, sschneider and I agree that we cannot merge galaxies. After all, the main reason for the scan overlap was to ensure that we got at least one good rendition of an extended source near a scan boundary. Attempting to merge them could screw the good rendition up as often as it improves things. Besides, unlike point sources, we don't know how to combine elliptical parameters, etc.! Hence for galaxies we will always pick the one farthest from a boundary. I reluctantly conclude that because we cannot merge galaxies, and because of the complexity in the ties between the point and extended sources, for this first release we ought to do to point sources what we have done to galaxies. This means that we cannot "favor the side of the array with flatter response"; we have to use the same algorithm of picking the source farthest from an edge. BTW, note that both advantages 4 and 5 of "select one apparition" go away if you choose the source farthest from an edge, which you have to do for galaxies. I'm not at all convinced you can always obtain those advantages anyway.
- Dave Monet (12/22/98):
The real question is whether you think that the systematic errors are small or large compared to the random errors in these zones. If the random errors dominate, then it is reasonable to take the average. If the systematic errors dominate (or might dominate), then you should choose one instance and omit the rest. My suggestion for the culling algorithm is to take the one closest to the centerline of the scan. Presumably systematic errors grow rapidly near the edges of the scan, so one should not take an object very close to the edge in preference to one just inside the overlap zone. Arguments about variability, motion, or some other physical effect will always arise, and the sophisticated user will want to examine all observations of the same object. Another aspect is the trade-off between producing the "best" catalog you can (i.e., random errors dominate so one takes the mean) and the difficulty most users will have in correctly incorporating the factors of SQRT(2) or more difference between the uncertainty estimators of some catalog entries and the others. My gut feeling is that this first release should choose one observation and omit the rest. This avoids the proof that systematic errors are negligible (tough even under non-stressed timetables), and it makes the uncertainty estimators easier to describe (all entries have similar properties). Save the mean (or mean with sigma chopping, with the associated flag for potential variability or motion) for a later release when there is time to use this algorithm, test it, and see how often weird things happen.
- John Carpenter (12/28/98):
(1) I strongly agree that duplicate sources must be removed in the overlap region, i.e. each source should be reported only once in the catalog. (2) I just want to clarify one aspect of the notion of "going deeper" in the overlap region. This could mean one of two things. One, report more accurate photometry for sources in the overlap region that would normally appear in the database regardless of the fact that they are in the overlap region. Or two, report better photometry AND include additional sources that would meet the selection criteria for appearing in the catalog only after averaging the photometry. This distinction could become rather important once data near the 5 sigma detection threshold or lower start being released. Perhaps I have missed some discussion on this, but it is not clear to me what people are proposing in this regard. (3) In general, I do not favor averaging the photometry in the overlap region until I better understand the implications of the algorithm in terms of (1) the detection statistics in the overlap region, and (2) whether the photometry is actually improved by approximately sqrt(2.0) in the overlap region. (And how do we add together a quality=10 scan with a quality=6 scan, say?) Even then, I am concerned about deliberately adding additional inhomogeneity into the data on an 8' scale. I cannot pinpoint a specific quantitative argument, but it just runs counter to the notion of releasing as homogeneous a dataset as possible. (4) Finally, regardless of whether we average the photometry in the overlap region, will there be a flag in the database indicating whether the source is in the overlap region, and if so, whether it was detected in the other scan? This will be rather valuable for studies that want to empirically estimate the completeness limit in the overlap regions.
- Jay Elias (12/29/98):
A comment on the agenda that we won't re-issue sources from previous incremental releases made me realize that there is an issue concerning the borders of scans in this regard. That is, if we have scans in the current release which are overlapped by scans due for later release, what do we do? If we include sources all the way out to the edge of such scans, then we presumably don't use any of the data from the not-yet-released scans. This leads to far greater inhomogeneities than anything Martin was worrying about. The way to avoid this is presumably to not release any data that overlap with unreleased scans (with some margin for positional errors) if we merge sources, or else to include sources only out to the approximate splitting point (minus a margin) if we don't merge. In either case, though, this requires estimating what to leave out and also requires releasing what's in the safety margins in future releases. Thus, this solution, while ideal, involves extra work, and I don't know whether IPAC has taken this into account. ---- A second comment/question has to do with the statement that merging data will make the survey 40% deeper in the overlap zones. This is not quite true at the limit, since the sources have to be detected reliably in each scan; we are not co-adding the scans but rather merging the sources. Certainly the signal-to-noise of sources will improve, but there will not be added faint sources. I still favor merging sources, but people need to understand that it is not the same as co-added data.
R. Cutri - IPAC
Last Update - 9 March 1999 - 17:00 PST