CSAIL Research Abstract

Introduction

Architecture, Systems
& Networks

Language, Learning,
Vision & Graphics

Physical, Biological
& Social Systems

Theory

horizontal line

Discovering Relational Systems

Charles Kemp, Thomas L. Griffiths & Joshua B. Tenenbaum

Introduction

Finding latent structure in data is a challenge faced by humans and machines. Understanding this process may help psychologists explain how people acquire new concepts, and should allow engineers to build low-complexity representations of large and complex datasets. Here we consider the problem of discovering the structure of a relational system. We assume that we are given information about several domains, and about relationships between objects in those domains. Our goal is to organize each domain into latent classes, and specify how those classes relate to each other.

For example, suppose that we have information about friendships between students at a high school. In this case there is a single domain (the domain of students) and a single binary relation (the friendship relation), and our goal is to identify social groups (eg jocks, math nerds, music nerds) and the relationships between them (math nerds are often friends with music nerds). Suppose now that we learn about the books in the school library and the students who borrow those books. Now we have three domains (students, books, and words) and two new binary relations: borrows(S, B) which is true if student S has ever borrowed book B, and appears_in(W, B) which is true if word W appears in book B. We wish to cluster the three domains simultaneously, and might find, for instance, that math nerds tend to borrow a set of books we can call `math books,' and that math books are associated with a cluster of words including `calculus,' `vector,' and `topology.'

Part of the problem is figuring out how many classes should be introduced for each domain. We deal with this problem by taking a non-parametric Bayesian approach. The same technique is used by infinite mixture models [1] (also known as Dirichlet Process mixture models [2]), and our approach is an extension of these models to relational data. Before describing our model, we review the infinite mixture model for binary data.

Generative models for features and relations

Suppose that we have a binary matrix D representing feature values for a set of objects. The infinite mixture model attempts to sort the objects into latent classes as represented in Figure 1(a). Let $f^m_{i}$ indicate the value for object i on feature m. We assume that dependencies between the features can be explained by a set of latent classes, and that z, a partition of the objects into classes is drawn from a Chinese Restaurant process (CRP) with hyperparameter $\alpha$ . The CRP favours partitions that use a small number of classes, but assigns some weight to all possible partitions. We introduce parameters

--> $\theta^m_{a}$ for each (feature, class) pair which specify the probability that an object in latent class has feature .

The generative model for the data is thus:

$\displaystyle z$	$\displaystyle \sim CRP(\alpha)$
$\displaystyle \theta^m_{a}$	$\displaystyle \sim$ Beta $\displaystyle (\beta, \beta)$
$\displaystyle f^m_{i}\vert\theta, z$	$\displaystyle \sim$ Bernoulli $\displaystyle (\theta^m_{z_i})$

where $\alpha$ and $\beta$ are hyperparameters. The posterior distribution over class assignments $p(z\vert D)$ can be computed using a MCMC sampler.

[three clustering problems]

Figure 1: Three clustering problems. All examples include a matrix of people and books they may have borrowed. 1(a) shows a solution found by the infinite mixture model if we treat the people as objects and the books as features. The model clusters the objects but not the features. 1(b) shows a solution found by our approach if we simultaneously cluster the people and the books. 1(c) shows how the approach generalizes to more complex relational systems. Here we have four domains (people, books, words, and demographic features) and we cluster all of them simultaneously.

Suppose we now think of the features as forming a domain in their own right. The matrix D represents a binary relation between objects and features, and we can partition both objects and features into latent classes (see Figure 1(b)). Let and indicate class assignments for the two domains. We assume that the existence of a link between an object and a feature depends only on the class assignments of the object and the feature, and introduce parameters $\eta_{ab}$ which specify the probability that an object in latent class has a feature in latent class :

$\displaystyle z, w$	$\displaystyle \sim CRP(\alpha)$
$\displaystyle \eta_{ab}$	$\displaystyle \sim$ Beta $\displaystyle (\beta, \beta)$
$\displaystyle f^m_{i}\vert\eta, w, z$	$\displaystyle \sim$ Bernoulli $\displaystyle (\eta_{z_i w_m})$

[clustering animals and
features]

Figure 2: Clustering animals and their features. The matrix on the left shows the raw data, and the matrix on the right is sorted according to the clusters found by our algorithm.

Figure 2 shows the outcome when this model is applied to a dataset of animals and their features collected by Osherson and colleagues [3]. Intuitively, the goal of our MCMC sampler is to shuffle the order of the rows and the columns so that the matrix takes on a clean block structure. Figure 2 shows that the algorithm discovers both class assignments and the relations between the classes --- we learn, for example, that the animal cluster including the marine mammals is strongly associated with a cluster of marine features.

In the case when we have two domains and a single binary relation, our approach reduces to an algorithm for biclustering or co-clustering [4]. Yet the approach deals naturally with datasets including many domains and with many kinds of relations (note that the relations can have different arities). The key idea is that there is a single partition for each domain, and that the partition is used to generate all of the relations that the domain participates in. As before, all of the elements in the dataset are assumed to be conditionally independent given class assignments for the domains. Figure 1(c) represents the kind of solution we might find for a dataset with four domains and three binary relations.

It is also possible to introduce relational links as first class objects. Suppose that we have a binary matrix specifying who talked to whom at a party. We can now add the domain of conversations to the domain of people, and associate each conversation with the words it included. The class assignments for the domain of conversations are not generated independently of the other domains, but instead are induced by the class assignments for the people. If there are two classes of people, say, there will be four classes for the conversations (conversations between people in classes 1 and 1, 1 and 2, 2 and 1, and 2 and 2). Apart from this single difference, domains for relational links are treated in exactly the same way as other kinds of domains.

We have assumed that the data provided are binary, and constructed a generative process around the Beta-Bernoulli model. We can deal with continuous data using a Gaussian model, and frequency data using the Gamma-Poisson model. It is straightforward to deal with a case, say, where some of the relations involve binary data and others involve frequency data.

Related Work

As mentioned earlier, our approach extends the infinite mixture model to relational data. It belongs to the same family as the blockmodels developed by sociologists [5][6], and can be viewed as a special case of a Probabilistic Relational Model (PRM) [7]. Compared to previous work on relational clustering using blockmodels and PRMS, the main contribution of our approach is that it automatically determines the number of latent classes in each domain.

Future Work

We are currently working on models where classes can overlap, and where objects are given a probability distribution over classes instead of belonging to just one. We are also working on models where the relationships between classes can be specified by logical rules. For example, a logical rule can specify that every object in class has at least one of the features in class . Our ultimate goal is to use these models to explore how people acquire semantic knowledge, and moving towards richer representations is an important step in this direction.

Research Support

Supported by NTT Communication Sciences Lab, the DARPA CALO program and the Newton Chair (JBT).

References

[1] C. E. Rasmussen. The infinite Gaussian mixture model. In NIPS, 2002.

[2] R. M. Neal. Markov chain sampling methods for Dirichlet process mixture models. In Journal of computational and graphical statistics, pp. 249-265, 2000.

[3] D. N. Osherson and J. Stern and O. Wilkie and M. Stob and E. E. Smith. Default probability. In Cognitive Science, pp. 251-269, 1991.

[4] S. C. Madeira and A. L. Oliveira. Biclustering algorithms for biological data analysis: A Survey. In IEEE Transactions on computational biology and bioinformatics, pp. 24--45, 2004.

[5] H. C. White and S. A. Boorman and R. L. Breiger. Social structure from multiple networks. In American Journal of Sociology, pp. 730--780, 1976.

[6] K. Nowicki and T. A. B. Snijders. Estimation and prediction for stochastic blockstructures. In Journal of the American Statistical Association, pp. 1077-1087, 2001.

[7] B. Taskar and E. Segal and D. Koller. Probabilistic classification and clustering in relational data. In IJCAI, 2001.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)