CSAIL Research Abstract

Introduction

Architecture, Systems
& Networks

Language, Learning,
Vision & Graphics

Physical, Biological
& Social Systems

Theory

horizontal line

Embeddings into Low-Dimensional Spaces

Mihai Badoi, Julia Chuzhoy, Erik Demaine, MohammadTaghi Hajiaghayi, Piotr Indyk, Javed Samuel & Anastasios Sidiropoulos

The goal of this project is to design efficient methods for embedding metrics into low-dimensional spaces. The input to the problem is an nxn matrix containing pairwise distances between n points; the distance function should be symmetric and satisfy triangle inequality. The output is a mapping of the points into a low-dimensional geometric space, such as the real line or the plane. The mapping should incur small distortion: the distances between the low-dimensional points should approximate the distances in the input matrix.

Why

This problem has many natural applications, notably visualization. There are several popular algorithms designed for this task: Multi-Dimensional Scaling (MDS), Locally-Linear Embedding (LLE) and Isomap. Unfortunately, most of those algorithms have only limited guarantees, since they can "get stuck" in a local minimum.

How

Low-distortion embeddings into low-dimensional spaces are provably impossible to achieve for all input metrics. Therefore, we design algorithms that construct a good embedding of a given input metric, if such an embedding exists for that particular metric. We call this approach the relative embedding problem.

Most of our algorithms aim to minimize the multiplicative distortion. That is, we want to minimize c subject to the constraint that the distance between each pair of points: (a) does not contract and (b) stretches by at most a multiplicative factor of c. This notion of distortion has a few nice properties. In particular, it is "democratic": small distances are treated in the same way as large distances, which means that the embedding preserves both local and global structures. Some of our results hold for the additive distortion as well. In that case the distortion is defined as the maximum, over all pairs of points, of the absolute value of the difference between the original and embedded distances between the points.

Progress

We designed several algorithms for (approximately) minimizing the distortion of embedding the input metric into specific host metrics. Their functionality and guarantees are given in the following table. We use c to denote the minimum distortion of an embedding of the input metric into the host space.

FROM	INTO	DISTORTION TYPE	DISTORTION BOUND	TIME	REF
general metrics	plane	additive	O(c)	quasi-polynomial	[1]
shortest path metrics over unweighted graphs	line	multiplicative	exactly c	exponential in c	[2]
shortest path metrics over unweighted graphs	line	multiplicative	O(c^2)	polynomial	[2]
shortest path metrics over unweighted trees	line	multiplicative	O(c^1.5)	polynomial	[2]
general metrics	ultrametrics	multiplicative	exactly c	quadratic	[3]
general metrics	line	multiplicative	"non-trivial"	polynomial	[4]
shortest path metrics over weighted trees	line	multiplicative	polynomial in c	polynomial	[4]
metrics over sets of points on a sphere	plane	multiplicative	3c	near-linear	[2]
ultrametrics	plane	multiplicative	O(c^3)	linear	[5]
ultrametrics	d-dimensional space	multiplicative	O(c^6)	linear	[5]

We also showed hardness of approximation of some of the problems. That is, we show that approximating the distortion up to a factor equal to or better than the one specified below is "hard" (e.g., NP-hard).

FROM	INTO	DISTORTION TYPE	HARDNESS FACTOR	REF
shortest path metrics over unweighted graphs	line	multiplicative	some a>1	[2]
shortest path metrics over weighted trees	line	multiplicative	n^a, for some a>0	[4]
ultrametrics	plane	multiplicative	1	[5]

In addition, we showed other results for different notions of distortion, where it is only required that the order between the distances is approximately preserved [3]. We also showed improved algorithmic results in situations where the algorithms are provided with some "extra information", e.g., are given approximate values of the angles between triples of points [1].

We implemented the sphere-to-plane algorithm of [2]. See screenshots of the results (170 Northern World is particularly interesting). We are currently implementing the algorithm of [5], with the goal of building a tool for visualising ultrametrics, with applications to biological data.

Research Support

This research is supported by NSF Career Award, Alfred P. Sloan Foundation, Packard Foundation and Alexandros S. Onassis Public Benefit Foundation.

References

[1] Badoiu, M., and E. Demaine and M.T. Hajiaghayi and P. Indyk, "Embedding with Extra Information", Proceedings of the 20th Symposium on Computational Geometry, New York, NY, June 2004, pp. 320-329.

[2] Badoiu, M., and K. Dhamdhere and A. Gupta and Y. Rabinovich and H. Raecke and R. Ravi and A. Sidiropoulos, "Approximation Algorithms for Low-Distortion Embeddings into Low-Dimensional Spaces", Proceedings of the 16th ACM-SIAM Symposium on Discrete Algorithms, Vancouver, Canada, January 2005.

[3] Alon, N., and M. Badoiu and E. Demaine and M. Farach-Colton and M.T. Hajiaghayi, "Ordinal Embeddings of Minimum Relaxation: General Properties, Trees, and Ultrametrics", Proceedings of the 16th ACM-SIAM Symposium on Discrete Algorithms, Vancouver, Canada, January 2005.

[4] Badoiu, M., and J. Chuzhoy and P. Indyk and A. Sidiropoulos, "Low-Distortion Embeddings of General Metrics Into the Line, Proceedings of the 37th ACM Symposium on Theory of Computing, Baltimore, MD, May 2005, to appear.

[5] Badoiu, M., and J. Chuzhoy and P. Indyk and A. Sidiropoulos, "Embedding Ultrametrics into Low-Dimensional Spaces", manuscript.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)