Contents | |
Let be the complete set of observed apparitions . The total number of apparitions is equal to .
Each apparition has a position on the sky , specified as a unit vector in (that is, ).
Define the distance between two positions to be
Given , define the radius-R match set for as follows :
Each apparition has a centroid which is defined as follows :
where will be referred to as the density radius.
Each apparition has an integer density :
A group is a set of nearby apparitions likely to be obvservations of the same astronomical source. Define the group to be the set of apparitions within the group radius of the apparition centroid :
Note that the group radius should be set so that if for two apparitions and , then they are very unlikely to be observations of the same astronomical source.
A confused apparition is an apparition that belongs to more than one group.
A confused group is a group that contains at least one confused apparition.
Each apparition has a group count which is equal to the number of groups has been assigned to. Note that:
Each group has a group type which takes the following values :
The seed set is the set of apparitions which are to be considered as the starting points for generating groups.
Each apparition has a seed flag which takes the following values :
Each generated group has an integer group counter that uniquely identifies it within the set of all generated groups.
Each apparition has an integer apparition counter which uniquely identifies in .
The operators , , for apparitions are defined as follows :
Each apparition optionally has a scan key which identifies the scan it was extracted from.
Each group has a scan count, equal to the number of scans containing at least one apparition belonging to the group.
Each group has an apparition count, equal to the number of apparitions belonging to the group.
cntr
, ra
, dec
, and scan_key
. Additional retrieval columns can be specified at compile time - for more information concerning this feature, please see the Writing WAX Plug-Ins documentation page. gcntr
: the group counter, a unique identifier for the group napp
: the group apparition count gtype
: the group type sdet
(optional) : the group scan count gcntr
: the group counter, a unique identifier for the group cntr
: the apparition counter, a unique identifier for the apparition ngrp
: the group count, the number of groups the apparition belongs to
This output table will normally contain a single column named gcntr
, a unique identifier for the single apparition group (see D14). This is the only column necessary since all singletons have an apparition count and scan count equal to 1, and a group type of 0. Note that when a singleton table is in use, singletons are not recorded in the link table since singletons always have group counters equal to their apparition counters.
Finally, there is a compile time option to include all columns retrieved from the input table in the singleton table (just as for the group-apparition link table). For more details, see the Usage Examples documentation section of the srcgen (Source Code Generator) helper application.
This output table will normally contain a single column named cntr
, a unique identifier for the grouped apparition (see D15).
In practice, the algorithm as described would be very hard to implement since the size of can be much greater than the physical memory available to a computer. It is therefore necessary to limit processing to small subsets of . Because the computation of groups and apparition centroids involves finding apparitions in the immediate vicinity of positions on the sky, it is natural to split into subsets corresponding to apparitions from distinct regions of the sky. A simple way of doing this is to partition the sky into a sequence of annuli (or bands) which each cover a range of declinations.
A particular band is fully processed when every group stemming from a centroid with declination falling into the range has been generated (and those with centroids in have been stored). For each apparition belonging to such a group, all groups containing must have been generated so that the group count for is valid prior to storage.
Consider the group generated around the centroid . The seed apparition for the group, , is within distance of . Therefore, if every apparition in the declination range has been removed from the set of seeds, then all groups with centroids inside will have been generated.
Now consider an apparition . All groups containing will have been generated when and there exists no apparition with and . Note that for such an apparition to exist, .
Therefore, if with in the declination range , then has been fully processed. Groups generated from apparitions in that declination range will include apparitions in the range , and, in order to compute densities and centroids for them, the declination range must be considered. For simplicity we consider only the maximum density radius, , arriving at the conclusion that in order to fully process the band , it is necessary to consider apparitions inside the declination range . This condition, looser than strictly necessary, is not sufficient. To see why, reconsider the Basic Algorithm. It is clearly possible for an apparition not in the declination range to be denser than all apparitions in . Furthermore, generating could potentially remove apparitions inside the dec band from the set of seeds. Without considering , it would therefore be impossible to finish processing . In the general case, must be considered in its entirety to fully process .
If this situation is encountered, the parallel swiss cheese algorithm changes the optimal order (as defined by the Basic Algorithm) in which apparitions are considered for grouping.
A primary band is a band where is even.
A secondary band is a band where is odd.
For band , the apparitions in the declination range where are considered (currently, is set to by default). If the condition is enforced, then the regions on the sky being considered by any two non-adjacent bands are disjoint. In particular, no two primary or secondary bands will process regions that overlap. Therefore all primary bands can be processed independently and in parallel. Secondary bands can also be processed independently of eachother and in parallel, but do depend on the results of adjacent primary bands.
By forcing primary bands to save any information which effects adjacent secondary bands, it is possible to guarantee full processing of secondary bands with a significantly tighter declination range.
A primary band will store a list of groups generated around centroids with declination greater than or equal to (the top group list for ), and a list of groups generated around centroids with declination less than (the bottom group list for ). The groups in these lists will be stored according to the order in which they were generated.
Furthermore, group counts for apparitions in the declination range (the top apparition set for ) as well as group counts for apparitions in (the bottom apparition set for ) will be stored.
Before creating any new groups, a secondary band will read the top group list and apparition set generated by as well as the bottom group list and apparition set generated by and generate the indicated groups in the given order. Essentially, the secondary band will blindly copy the work of its neighbours. This behaviour gives each primary band the latitude to deviate from the group generation order imposed by the Basic Algorithm.
What follows is a detailed description of the various steps of the swiss cheese algorithm.
Define the removal flag for apparition as follows :
Finally, define the total seed count to equal the sum of the removal flags of all retrieved apparitions.
Now for each retrieved apparition, set . Also set .
Otherwise, for each apparition in :
If there is no such , then the pass has completed. At this point, if the pass did not generate any new groups and , the algorithm has reached an impasse and continues with a Cheese Crumbling Pass.
If , then has been completely processed and the algorithm continues with the Output Pass.
Otherwise, another swiss cheese pass is performed.
Now, a group must be generated from if there is no other apparition with (a seed) that is "denser" than () which might generate a group containing .
Any group containing must have a centroid within distance of , that is to say, must be generated from an apparition within distance of .
So, if with and , then continue at step 4.3.
Otherwise continue at step 4.1.
Otherwise :
Finally, if is inside :
For each group stored by step 4.4 :
For each apparition in , pass the list of groups containing to a pluggable apparition parameter computation stage which computes attributes for . Finally, insert and its associated attributes into the grouped apparition table.