

Research Abstracts - 2007

Density-Equalizing Euclidean Minimum Spanning Trees for the Detection of All Disease Cluster Shapes

Shannon C. Wieland, John S. Brownstein (Harvard Medical School), Kenneth D. Mandl (Harvard Medical School) & Bonnie Berger


Existing disease cluster detection methods either cannot detect clusters of all shapes and sizes or identify highly irregular sets that overestimate the true extent of the cluster. We introduce a graph-theoretical method for detecting arbitrarily shaped clusters, based on the Euclidean minimum spanning tree of cartogram-transformed case locations, which overcomes these shortcomings. We illustrate the method on several clusters, including historical data sets from West Nile virus and inhalational anthrax outbreaks. Comparisons of sensitivity and accuracy with the prevailing cluster detection method show that the new method performs similarly on approximately circular historical clusters and greatly improves detection of non-circular clusters. This work has been submitted to the Proceedings of the National Academy of Sciences.


Methods for detecting localized spatial clusters of disease are typically variations of the circular scan statistic [1]. They restrict the number of potential clusters by considering only circular [1], rectangular [2], or elliptical [3] regions, and then apply a likelihood ratio test to evaluate the statistical significance of each potential cluster. Because disease outbreaks can have highly variable shapes, there has been recent interest in methods that evaluate irregularly shaped subsets of patients.
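To make the scan idea concrete, here is a minimal sketch of a circular scan over case/control point data. It assumes the Bernoulli (case/control) form of the likelihood ratio, circles centred on data points, and a cap of 50% of all points per window (a common default); the function names `bernoulli_llr` and `circular_scan` and the `max_fraction` parameter are hypothetical, and the Monte Carlo step that turns the maximum into a p-value is not shown.

```python
import math


def _xlogy(x: float, y: float) -> float:
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_llr(c: int, n: int, C: int, N: int) -> float:
    """Log-likelihood ratio for a window containing c cases among n points,
    when the whole study has C cases among N points (Bernoulli model).
    Windows whose inside risk is not higher than the outside risk score 0."""
    if n == 0 or n == N or c * (N - n) <= (C - c) * n:
        return 0.0
    out_c, out_n = C - c, N - n
    inside = _xlogy(c, c / n) + _xlogy(n - c, (n - c) / n)
    outside = _xlogy(out_c, out_c / out_n) + _xlogy(out_n - out_c, (out_n - out_c) / out_n)
    null = _xlogy(C, C / N) + _xlogy(N - C, (N - C) / N)
    return inside + outside - null


def circular_scan(points, is_case, max_fraction=0.5):
    """Evaluate every circle centred on a data point whose radius reaches
    another data point and that holds at most max_fraction of all points.
    Returns (best_llr, centre_index, radius)."""
    N, C = len(points), sum(is_case)
    best = (0.0, None, 0.0)
    for i, centre in enumerate(points):
        # All points sorted by distance from this candidate centre.
        ring = sorted((math.dist(centre, p), j) for j, p in enumerate(points))
        c = 0
        for n, (r, j) in enumerate(ring, start=1):
            if n > max_fraction * N:
                break
            c += is_case[j]
            # Only score complete circles: wait until every point tied
            # at radius r has been added.
            if n < N and ring[n][0] == r:
                continue
            llr = bernoulli_llr(c, n, C, N)
            if llr > best[0]:
                best = (llr, i, r)
    return best
```

Because every window is a circle, an elongated or curved outbreak can only be covered by a circle that also sweeps in many controls, which dilutes the likelihood ratio; this is the limitation the shape-flexible methods below try to address.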

Few methods aim to detect clusters of arbitrary shape. A class of graph-theoretic methods has recently emerged to address this problem, but these methods have several limitations: they are restricted to clusters that fit inside a circular region of fixed size [4], they attempt to examine a set of potential clusters too large to search exhaustively [5], they have poor specificity [6], or they have yet to be implemented or evaluated [7].

EMST Cluster Detection

We consider data sets consisting of the precise spatial coordinates of disease cases and controls. We first transform the study region into a density-equalizing cartogram [8], so that under the null hypothesis of constant relative risk, the transformed case locations are uniformly and independently distributed. We then give a mathematical definition of a potential cluster based only on intercase distances [9], and prove that there is a one-to-one correspondence between these potential clusters and a special class of subsets of the Euclidean minimum spanning tree (EMST) of the case locations. Finally, we develop a test statistic based on the weight of the potential cluster's subgraph.
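The graph side of this pipeline can be sketched as follows, assuming the case coordinates have already been cartogram-transformed (the diffusion cartogram of [8] is not shown). The EMST is built with Prim's algorithm on the complete graph. For illustration only, the candidate subsets are assumed to be the groups formed when EMST edges are added from shortest to longest (single-linkage merges), each paired with its total internal edge length; this is a stand-in for, not a reproduction of, the paper's exact cluster definition and test statistic. The names `emst` and `candidate_clusters` are hypothetical.

```python
import math


def emst(points):
    """Euclidean minimum spanning tree via Prim's algorithm on the complete
    graph (O(n^2) time). Returns a list of (length, i, j) edges."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    dist_to_tree = [math.inf] * n  # shortest known distance to the tree
    parent = [-1] * n
    dist_to_tree[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the outside point closest to the current tree.
        u = min((k for k in range(n) if not in_tree[k]), key=dist_to_tree.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((dist_to_tree[u], parent[u], u))
        # Relax distances from the newly added point.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < dist_to_tree[v]:
                    dist_to_tree[v], parent[v] = d, u
    return edges


def candidate_clusters(n, edges):
    """Hypothetical candidate set: each point group formed while adding EMST
    edges from shortest to longest, paired with the total EMST edge length
    ("weight") inside that group. Assumes distinct edge lengths."""
    root = list(range(n))
    members = {i: {i} for i in range(n)}
    weight = {i: 0.0 for i in range(n)}

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    candidates = []
    for length, i, j in sorted(edges):
        ri, rj = find(i), find(j)  # distinct: tree edges never close a cycle
        root[rj] = ri
        members[ri] |= members.pop(rj)
        weight[ri] += weight.pop(rj) + length
        candidates.append((frozenset(members[ri]), weight[ri]))
    return candidates
```

The appeal of working on the tree is that a set of n cases yields only n - 1 merge events, so the candidate list grows linearly rather than combinatorially; under the uniform null on the cartogram, the weights could then be compared against weights from simulated uniform point sets.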

Comparative Performance

We found that the EMST method is a powerful and accurate alternative to the SaTScan circular scan statistic for non-circular clusters. At a specificity of 95%, the method had sensitivity comparable to that of SaTScan on large synthetic circular clusters and on an approximately circular 1999 outbreak of West Nile virus in New York City. On small circular clusters, synthetic rectangular clusters, and a highly non-circular 1979 outbreak of inhalational anthrax in Sverdlovsk, Russia, the EMST method had greater sensitivity. Although SaTScan was more accurate at detecting large circular clusters, the EMST method had comparable or superior accuracy for all other cluster types.
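The abstract does not say how the 95% specificity operating point was set. One common approach, assumed here, is to take the alarm threshold as the 95th percentile of the statistic over simulated no-outbreak data sets and report the fraction of outbreak data sets that exceed it. The function name `sensitivity_at_specificity` is hypothetical, and larger statistics are assumed to indicate stronger evidence of a cluster.

```python
import math


def sensitivity_at_specificity(null_stats, alt_stats, specificity=0.95):
    """Share of outbreak data sets flagged when the alarm threshold is the
    empirical `specificity` quantile of statistics from no-outbreak data."""
    ordered = sorted(null_stats)
    # Smallest null value with at least `specificity` of null values at or below it.
    k = max(math.ceil(specificity * len(ordered)) - 1, 0)
    threshold = ordered[k]
    return sum(s > threshold for s in alt_stats) / len(alt_stats)
```

Fixing specificity first puts both methods at the same false-alarm rate, so their sensitivities can be compared directly; heavy ties in the null statistics can push the realized specificity above the nominal value.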

Research Support

This project was supported by grant LM007677-03S1 from the National Library of Medicine.


[1] Martin Kulldorff and N. Nagarwalla. Spatial Disease Clusters: Detection and Inference. In Statistics in Medicine, vol. 14, pp. 799--810, 1995.

[2] Daniel B. Neill, Andrew W. Moore and Maheshkumar Sabhnani. Detecting Elongated Disease Clusters. In Morbidity and Mortality Weekly Report, vol. 54S, pp. 197, 2005.

[3] Martin Kulldorff, Lan Huang, Linda Pickle and Luiz Duczmal. An elliptical spatial scan statistic. In Statistics in Medicine, in press.

[4] T. Tango and K. Takahashi. A flexibly shaped spatial scan statistic for detecting clusters. In International Journal of Health Geographics, vol. 4, 2005.

[5] Luiz Duczmal and Renato Assuncao. A Simulated Annealing Strategy for the Detection of Spatial Clusters of Irregular Shape. In Computational Statistics and Data Analysis, vol. 45, pp. 269--286, 2004.

[6] Renato Assuncao, M. Costa, A. Tavares and S. Ferreira. Fast Detection of Arbitrarily Shaped Disease Clusters. In Statistics in Medicine, vol. 25, pp. 723--742, 2006.

[7] G. P. Patil and C. Taillie. Upper Level Set Scan Statistic for Detecting Arbitrarily Shaped Hotspots. In Environmental and Ecological Statistics, vol. 11, pp. 183--197, 2004.

[8] M. Gastner and M. Newman. Diffusion-Based Method for Producing Density-Equalizing Maps. In Proceedings of the National Academy of Sciences, vol. 101, pp. 7499--7504, 2004.

[9] Y. Xu, V. Olman and D. Xu. Clustering Gene Expression Data Using a Graph-Theoretic Approach: an Application of Minimum Spanning Trees. In Bioinformatics, vol. 18, pp. 536--545, 2002.



Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu