THE EFFECT OF ACTIVE DEBLENDING ON PROPHOT RUN TIME

K. A. Marsh, IPAC
kam@ipac.caltech.edu

May 5, 2000




SUMMARY

Within the context of the maximum-likelihood estimation procedure used in PROPHOT, the computation time involved in active deblending depends critically on the search technique used to arrive at the initial guesses of the source component positions. As an example, the processing time for an N=2 blend, expressed as a multiple of the single-source processing time, would be approximately 40 in the case of a brute-force search, or as little as 8 if a guided search is used. For a 2MASS scan of high source density, the deblending operation could easily increase the total processing time for the scan by a factor of about 4 (brute-force search) or 1.5 (guided search), assuming that the criterion for an attempted deblend is rchisq > 3.


INTRODUCTION

PROPHOT has already been designed with the inherent capability of simultaneous fits for an arbitrarily large number, N, of point sources. All that is required is to provide the algorithm with initial-guess positions of the N sources, and it will then perform a maximum likelihood estimation of the source parameters and their associated uncertainties. If the sources are sufficiently far apart that they produce separate peaks on the coadded image, the initial positions can be obtained from the locations of those peaks, and this case (referred to as "passive deblending") is dealt with satisfactorily by the current version of PROPHOT. If, however, the sources are blurred into a single peak (the "active deblending" case), the determination of the initial positions is nontrivial. It is, however, a necessary step, since nonlinearities in the measurement model lead to a multi-valley structure for the chi-squared function being minimized, and poor initial guesses for the source locations could prevent the attainment of the global minimum. This will require some further development in PROPHOT, and the purpose of this memo is to outline some of the issues involved, and to provide some estimates of the impact on run time.
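
To make the fitting problem concrete, the quantity being minimized is the chi-squared misfit between the coadded image and a sum of N shifted, scaled PSFs. The following Python sketch is purely illustrative (the psf callable and all variable names are hypothetical, and do not represent PROPHOT's internal interfaces):

    import numpy as np

    def n_source_chisq(image, sigma, psf, positions, fluxes):
        """Chi-squared of an N-source model against a coadded image.

        image, sigma : 2-D arrays of pixel values and 1-sigma uncertainties
        psf          : assumed callable psf(dx, dy), vectorized over offsets
        positions    : list of N (x, y) source positions
        fluxes       : list of N source fluxes
        """
        ny, nx = image.shape
        yy, xx = np.mgrid[0:ny, 0:nx]
        model = np.zeros((ny, nx))
        for (x0, y0), f in zip(positions, fluxes):
            model += f * psf(xx - x0, yy - y0)
        return np.sum(((image - model) / sigma) ** 2)

The model is linear in the fluxes but nonlinear in the positions, which is the origin of the multi-valley structure referred to above.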


OBTAINING THE INITIAL POSITIONS

There are many different possible approaches to this problem, some of which involve steep increases of computation time with increasing N, and some for which the computation time is independent of N, but for which there is a substantial computational overhead (e.g. Richardson and Marsh 1991). In the case of 2MASS, for which PSF errors limit the feasible range of N to values of at most 2 or 3, the most efficient scheme would probably be one which involves an initial search on a coarse grid of source locations, calculating the value of the chi-squared function at each location. The grid would actually be a phase space of dimensionality 2N, corresponding to a pair of coordinates per source. At each grid point, a limited (flux only) maximum likelihood estimation would be performed, and the phase-space point at which the chi-squared function is minimized would then correspond to the set of initial positions for the subsequent full maximum likelihood fit (positions and fluxes). Such a search could be either brute-force (a complete search of a restricted volume of phase space) or a guided search of some kind.
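
As a minimal sketch of such a coarse-grid search (assuming, for illustration, that the flux-only estimation at each phase-space point reduces to a weighted linear least-squares solve; the psf callable and all names are hypothetical):

    import itertools
    import numpy as np

    def coarse_grid_search(image, sigma, psf, grid_points, n_src):
        """Try every placement of n_src sources on the coarse grid,
        fitting fluxes only, and return the lowest-chi-squared positions."""
        ny, nx = image.shape
        yy, xx = np.mgrid[0:ny, 0:nx]
        w = 1.0 / sigma.ravel()
        data = image.ravel() * w
        best_chi2, best_positions = np.inf, None
        for positions in itertools.combinations(grid_points, n_src):
            # One shifted PSF per trial source; the model is linear in the fluxes.
            cols = [psf(xx - x0, yy - y0).ravel() * w for (x0, y0) in positions]
            A = np.column_stack(cols)
            fluxes, *_ = np.linalg.lstsq(A, data, rcond=None)
            chi2 = np.sum((data - A @ fluxes) ** 2)
            if chi2 < best_chi2:
                best_chi2, best_positions = chi2, positions
        return list(best_positions)

The positions returned by such a search would then serve as the initial guesses for the full maximum likelihood fit.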

In the brute-force case, the search grid should cover a range of positions somewhat in excess of the critical source separation at which the two respective response functions combine to produce a single peak. In practical terms this corresponds to the FWHM of the PSF. The sampling interval should be such that the PSF behaves roughly quadratically from one sample to the next. For 2MASS images, these requirements would be satisfied by a 5 x 5 grid with a half-pixel sampling interval. For N sources, the number of phase-space search points is then ^{25}C_N, which can obviously get quite large.
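
Here ^{25}C_N denotes the binomial coefficient 25-choose-N. A two-line check of how rapidly the number of phase-space search points grows:

    from math import comb   # Python 3.8+

    for n in (2, 3, 4):
        print(n, comb(25, n))   # 300, 2300, 12650 search points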

A guided search could dramatically reduce the number of required search points. For example, in the N=2 case, one need only search along the axis of elongation in the coadded image. Asymmetries in the PSF could, however, compromise this procedure, and it would still leave us unprepared for the N>2 cases. Another possibility is to use an iterative procedure such as the one proposed by Elliott (1998), whereby we start with the N=1 case and then successively add extra sources, with the search at each iteration being restricted to the most recently added component. After each single-component search, a full maximum likelihood fit is performed in order to adjust the positions of the previously-found components. This scheme would have the advantage that the computation time increases roughly quadratically with N, thus avoiding the combinatorial explosion associated with the brute-force approach.
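
A minimal sketch of that iterative scheme, assuming two hypothetical helper routines: single_source_search (a coarse-grid search for the best position of one additional source, with the previously-found components held fixed) and full_ml_fit (the existing PROPHOT-style simultaneous fit of all positions and fluxes):

    def guided_deblend(image, sigma, psf, grid_points, n_max,
                       single_source_search, full_ml_fit):
        """Elliott-style guided search: add one source per iteration."""
        positions, fluxes, rchisq = [], [], None
        for _ in range(n_max):
            # Coarse-grid search restricted to the newly added component.
            new_pos = single_source_search(image, sigma, psf, grid_points, positions)
            positions = positions + [new_pos]
            # Full fit to readjust all positions and fluxes found so far.
            positions, fluxes, rchisq = full_ml_fit(image, sigma, psf, positions)
        return positions, fluxes, rchisq

Each iteration requires of the order of 25 flux-only evaluations plus one full fit, so the total work grows far more slowly with N than the combinatorial count of the brute-force search.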


ESTIMATES OF COMPUTATION TIME PER BLEND

The factors involved in assessing the computation time for active deblending are:

(1) The number of initial search points. For a 5 x 5 grid, a brute-force search would require a total of ^{25}C_N evaluations of the chi-squared function, compared with approximately 25N for the guided search procedure.

(2) The number of arithmetic operations involved in an N-source maximum likelihood solution. This is roughly proportional to N^2.

Taking these factors into consideration, the estimated factor by which the computation time is increased for an N-blend over a single source is:


      N      Brute-force search      Guided search (Elliott 1998)
    ---------------------------------------------------------------
      2              40                           8
      3             500                          20



EFFECT ON COMPUTATION TIME FOR 2MASS SCANS

To assess the impact on the computation time for a complete scan, it is necessary to determine the fraction of the sources to be considered as unresolved blends; it would seem reasonable to base this on the value of reduced chi-squared (rchisq).

Based on an analysis of 6 science scans from the night of 990928n, covering a wide range of source densities (specifically, from about 7000 to 37000 sources per scan), the fraction of extracted sources with rchisq exceeding 3.0 was of the order of 1% in the scans of low source density, rising to 7% in the scans of highest source density. Assuming these to be representative, the condition rchisq > 3 would seem a reasonable criterion for attempting active deblending. This being the case, we can use the values in the above table to estimate the increase in computation time for processing a scan. The following predictions are based on N=2:

For a low-density scan (in which the majority of the 1% of blend candidates would probably turn out to be single sources), the scan processing time would increase by a factor of 1.4 for a brute-force search, and 1.07 for the guided search procedure.

For a high-density scan (in which the majority of the 7% of blend candidates would probably turn out to be genuine), the scan processing time would increase by a factor of 3.7 for a brute-force search, and 1.5 for the guided search procedure.
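
These scan-level factors follow from weighting the per-blend factor by the fraction of sources flagged as blend candidates. A minimal sketch of the arithmetic (assuming, conservatively, that every candidate incurs the full N=2 deblend cost):

    def scan_time_factor(blend_fraction, per_blend_factor):
        """Overall processing-time factor for a scan in which a fraction of
        the sources incurs the per-blend cost and the rest cost 1 unit each."""
        return (1.0 - blend_fraction) + blend_fraction * per_blend_factor

    print(scan_time_factor(0.01, 40), scan_time_factor(0.01, 8))   # ~1.4, ~1.07
    print(scan_time_factor(0.07, 40), scan_time_factor(0.07, 8))   # ~3.7, ~1.5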


COMMENT

From considerations of computation time, a guided search is obviously desirable. It will be necessary to do some numerical simulations to make sure that it is robust for the parameters involved in 2MASS source extraction.


REFERENCES

Elliott, D. G. 1998, "Crowded-Field Photometry for WIRE," JPL Report No. D-15360.

Richardson, J. M. and Marsh, K. A. 1991, "Point-Process Theory and the Surveillance of Many Objects," in Maximum Entropy and Bayesian Methods, C. R. Smith et al. (eds.), Kluwer Academic Publishers, pp. 213-220.