CSAIL Research Abstracts - 2005

Two-Sided Bandit Problems and Matching Markets

Sanmay Das

The Problem

We are studying the learning and decision problems of agents in matching models. Das and Kamenica have defined a class of problems in multi-agent learning and decision-making called two-sided bandit problems[1]. This class of problems is intended to capture the essence of two-sided matching scenarios in which agents must learn their preferences through experience, rather than knowing them a priori. Two-sided bandit models can be applied to a wide range of markets in which two different types of agents must match with each other. Consider a repeated game in which agents gain an uncertain payoff from being matched with a particular agent on the other side of the market in each time period. A natural example of such a situation is the dating market, in which men and women repeatedly go out on dates and learn about each other. Another example is a spot labor market, in which employers and employees are matched for particular job contracts. A matching mechanism is used to pair the agents. For example, we can consider a mechanism in which all the women decide which man to "ask out," and then each man selects a woman from his set of offers, with the rejected women left unmatched for that period. We are studying the properties of learning and decision-making algorithms in such settings.
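The simultaneous-offer mechanism described above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: the dictionary-based representation and the value estimates are assumptions for the example.

```python
def simultaneous_offer_round(woman_offers, man_values):
    """One period of the offer mechanism sketched above.

    woman_offers: dict mapping each woman to the man she "asks out".
    man_values: dict mapping each man to a dict of his (hypothetical)
                estimated values for the women.
    Returns a matching {man: woman}; unmatched agents are absent.
    """
    # Group offers by the man who receives them.
    offers = {}
    for woman, man in woman_offers.items():
        offers.setdefault(man, []).append(woman)
    # Each man accepts the offer he currently values most;
    # rejected women stay unmatched for this period.
    matches = {}
    for man, suitors in offers.items():
        matches[man] = max(suitors, key=lambda w: man_values[man][w])
    return matches

# Example: two women ask out the same man; the lower-valued one is rejected.
offers = {"w1": "m1", "w2": "m1", "w3": "m2"}
values = {"m1": {"w1": 0.9, "w2": 0.4}, "m2": {"w3": 0.7}}
print(simultaneous_offer_round(offers, values))  # {'m1': 'w1', 'm2': 'w3'}
```

In the repeated game, the interesting part is that both the women's offer choices and the men's value estimates are themselves learned from the payoffs of earlier rounds.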


Standard models of matching in economics[2] almost always assume that each agent knows his or her preferences over the individuals on the other side of the market.  This assumption is too restrictive for many markets, and the model introduced here is driven by the need to relax it.  The existing literature on two-sided search with nontransferable utility (for example, [3]) assumes that matching is exogenous and random.  The problem we are studying is more deeply related to bandit problems[4], in which an agent must choose which arm of an n-armed bandit to pull in order to maximize long-term expected reward, taking into account the tradeoff between exploring (learning more about the reward distribution of each arm) and exploiting (pulling the arm with the maximum expected reward).  Two-sided bandit problems extend the standard multi-armed bandit model by giving the arms of the bandit agency: they can decide whether to be pulled, or whom to be pulled by, and they themselves receive rewards that depend on who the puller is. This adds an enormous degree of complexity to the problem.
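For contrast with the two-sided setting, the standard one-sided bandit with epsilon-greedy exploration looks like this. The Gaussian payoffs, horizon, and epsilon value are assumptions chosen for the example, not taken from the paper.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=10000, seed=0):
    """Standard n-armed bandit with epsilon-greedy exploration:
    with probability epsilon pull a random arm (explore); otherwise
    pull the arm with the highest sample-mean estimate (exploit)."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    estimates = [0.0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                           # explore
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # noisy payoff from the arm
        counts[arm] += 1
        # Incremental sample-mean update of the pulled arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates

print(epsilon_greedy_bandit([0.2, 0.5, 0.8]))
```

In the two-sided version, each "arm" is another agent who may refuse to be pulled, which is what makes the analysis so much harder.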

It is prohibitively difficult to solve for game-theoretic equilibria in all but the simplest two-sided bandit problems. In fact, it is hard even to define what would constitute optimal behavior given unbounded computational resources and a perfect model of the world. The focus of research should therefore be on good algorithms and on what can be shown about these algorithms in general settings.  This approach is also related to the theory of learning in games[5], which considers more generally how individual learning rules affect outcomes in games and whether agents reach static equilibria.

Various choices have to be made in modeling a two-sided matching game with learning. These choices relate to the structure of repetition in the game, the payoffs agents receive and when they receive them, how agent preferences over the other side of the market are defined, how much information agents have about themselves and others, what the space of actions available to agents is, and how agents are matched based on the actions they take.
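One way to see the design space is as a set of independent knobs; the sketch below names one knob per modeling choice listed above. Every field name and default here is illustrative, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class TwoSidedBanditConfig:
    """Illustrative modeling choices for a two-sided bandit game.
    All names and defaults are hypothetical examples."""
    n_men: int = 5
    n_women: int = 5
    periods: int = 10000                 # structure of repetition
    payoff_noise: float = 0.5            # payoffs agents receive
    correlated_preferences: bool = False # how preferences are defined
    agents_know_own_value: bool = False  # information agents have
    mechanism: str = "gale-shapley"      # "gale-shapley" | "simultaneous" | "sequential"

cfg = TwoSidedBanditConfig(mechanism="sequential")
print(cfg.mechanism)
```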


In previous work[1], we report empirical results on the stability of outcomes in dating markets under different matching mechanisms when agents use a simple epsilon-greedy exploration policy and estimate both the values of agents on the other side of the market and the probabilities of offers being made or accepted, using either sample means or a leaky integrator.
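The two estimators differ in one update rule. A minimal sketch, where the leaky-integrator rate `gamma` is an assumed parameter rather than a value from the paper:

```python
def sample_mean_update(estimate, count, reward):
    """Incremental sample mean: all past observations weighted equally."""
    count += 1
    return estimate + (reward - estimate) / count, count

def leaky_integrator_update(estimate, reward, gamma=0.1):
    """Leaky integrator (exponentially weighted average): recent payoffs
    count more, so the estimate can track quantities that drift as the
    other agents learn, e.g. offer or acceptance probabilities.
    gamma is an assumed learning-rate parameter."""
    return estimate + gamma * (reward - estimate)

# Feed both estimators the same reward stream.
est, n = 0.0, 0
for r in [1.0, 0.0, 1.0, 1.0]:
    est, n = sample_mean_update(est, n, r)
print(est)          # the sample mean of the stream

est = 0.0
for r in [1.0, 0.0, 1.0, 1.0]:
    est = leaky_integrator_update(est, r)
print(round(est, 4))
```

The leaky integrator is the natural choice in this setting because the environment is nonstationary: every other agent is learning at the same time.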

We consider three matching mechanisms in the above paper. In Gale-Shapley matching, each agent submits a list of preferences and a centralized matching procedure produces a matching based on these lists. The Gale-Shapley algorithm[6] guarantees a matching that is stable under the submitted preferences. With simultaneous offers, each woman independently chooses one man to make an offer to. Each man selects one of the offers he receives. Women who are rejected are unmatched for the period, as are men who receive no offers. Finally, under the sequential offer mechanism, each woman independently chooses one man to make an offer to. The offers, however, are randomly ordered, and men must decide on these "exploding" offers without knowing what other offers are coming. If an offer is rejected, the woman making the offer is unmatched in that period. A man is unmatched if he rejects all offers he receives. Our results with 5 men and 5 women (see Figure 1) confirm our intuitions that under Gale-Shapley matching agents always converge to a stable matching (even for small epsilon) and that the sequential problem is "harder" than the simultaneous problem.
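The centralized mechanism is the classic deferred-acceptance algorithm of Gale and Shapley[6]. A compact woman-proposing version (the dictionary representation is our own choice for the sketch):

```python
def gale_shapley(woman_prefs, man_prefs):
    """Deferred acceptance with women proposing. Preferences are dicts
    mapping each agent to a list of the other side, best first.
    Returns a matching {man: woman} that is stable under the
    submitted preference lists."""
    free_women = list(woman_prefs)
    next_choice = {w: 0 for w in woman_prefs}  # next man each woman proposes to
    engaged = {}                               # man -> currently held woman
    while free_women:
        w = free_women.pop()
        m = woman_prefs[w][next_choice[w]]
        next_choice[w] += 1
        if m not in engaged:
            engaged[m] = w                     # m was free: hold the offer
        elif man_prefs[m].index(w) < man_prefs[m].index(engaged[m]):
            free_women.append(engaged[m])      # m trades up; old partner is freed
            engaged[m] = w
        else:
            free_women.append(w)               # m rejects w; she proposes again later
    return engaged

women = {"w1": ["m1", "m2"], "w2": ["m1", "m2"]}
men = {"m1": ["w2", "w1"], "m2": ["w1", "w2"]}
print(gale_shapley(women, men))  # {'m1': 'w2', 'm2': 'w1'}
```

In the two-sided bandit setting the submitted lists are not true preferences but the agents' current estimates, which is why stability of the outcome is the property worth measuring.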

Fig. 1:   Probability of a Stable (asymptotic) Matching as a Function of the Initial Value of epsilon

Future Work

We are interested in modeling other markets in addition to the dating market using the two-sided bandit framework. Ideally, we would like to present a model that is analytically tractable under certain learning and decision-making algorithms so that we can prove statements about the convergence and stability properties of algorithms.

Other possible extensions involve exploring learning algorithms that allow agents to perform well across a broad range of environments without having to make assumptions about the decision-making algorithms or learning processes of other agents.  In addition to simpler models, we are also interested in more complex versions of the problem, for example, describing the behavior of simple learning rules with a greater diversity of preferences and a larger number of agents.


This report describes research done at the Center for Biological & Computational Learning, which is in the McGovern Institute for Brain Research at MIT, as well as in the Dept. of Brain & Cognitive Sciences, and which is affiliated with the Computer Science and Artificial Intelligence Laboratory (CSAIL).

This research was sponsored by grants from: Office of Naval Research (DARPA) Contract No. MDA972-04-1-0037, Office of Naval Research (DARPA) Contract No. N00014-02-1-0915, National Science Foundation (ITR/SYS) Contract No. IIS-0112991, National Science Foundation (ITR) Contract No. IIS-0209289, National Science Foundation-NIH (CRCNS) Contract No. EIA-0218693, National Science Foundation-NIH (CRCNS) Contract No. EIA-0218506, and National Institutes of Health (Conte) Contract No. 1 P20 MH66239-01A1.

Additional support was provided by: Central Research Institute of Electric Power Industry (CRIEPI), Daimler-Chrysler AG, Compaq/Digital Equipment Corporation, Eastman Kodak Company, Honda R&D Co., Ltd., Industrial Technology Research Institute (ITRI), Komatsu Ltd., Eugene McDermott Foundation, Merrill-Lynch, NEC Fund, Oxygen, Siemens Corporate Research, Inc., Sony, Sumitomo Metal Industries, and Toyota Motor Corporation.


[1] Sanmay Das and Emir Kamenica. Two-sided bandits and the dating market. To appear in Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, August 2005.

[2] Alvin E. Roth and Marilda Sotomayor. Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis. Econometric Society Monograph Series. Cambridge University Press, Cambridge, UK, 1990.

[3] K. Burdett and R. Wright. Two-sided search with nontransferable utility. Review of Economic Dynamics, 1:220-245, 1998.

[4] D. A. Berry and B. Fristedt. Bandit Problems: Sequential Allocation of Experiments. Monographs on Statistics and Applied Probability. Chapman and Hall, London, UK, 1985.

[5] D. Fudenberg and D. K. Levine. The Theory of Learning in Games. MIT Press, Cambridge, MA, 1998.

[6] D. Gale and L. S. Shapley. College admissions and the stability of marriage. The American Mathematical Monthly, 69(1):9-15, 1962.


Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)