MIT CSAIL Research Abstracts

Research Abstracts - 2006

A Computational Model of Perspective in Narrative Generation

Alice Oh

Problem Statement

One can talk about a baseball game as a come-from-behind win or a blown-save loss. One can describe the winning pitcher's brilliant performance, or one can try to convince the fans of the losing team that they still have a great team capable of winning. Why and how do people talk about a single event in several different ways? In this thesis research, I will build a computational model of perspective in narrative generation and build a system capable of generating narratives from multiple perspectives.

Analysis of Perspective

The first step in building a computational model of perspective is to analyze data to find out just what constitutes perspective on the surface. That is, before trying to delve into the depth of understanding the underlying processes involved in creating a certain perspective, it is important to look at the surface form of perspective in already written text. In the baseball domain, it is readily apparent that a Boston Globe article and a New York Times article would be written with vastly different perspectives on a Red Sox vs. Yankees game. However, to get a more precise idea of perspective, I will need to quantify the difference by carefully analyzing sufficient data.

I will use two types of data for automatically modeling the domain, analyzing perspective, and constructing a computational model of perspective. The first type consists of descriptive logs and quantitative data, and the second consists of newspaper articles. For every Major League Baseball (MLB) game, the pitch-by-pitch logs contain a detailed account of the game, and the boxscores contain statistics on each player's performance. For the second type, I have gathered online articles from the Boston Globe, New York Times, Baltimore Sun, Tampa Tribune, and Toronto Star, as well as AP articles from ESPN and online articles from MLB. I collected five articles for each of 74 Red Sox games during the 2005 season: AP (neutral), MLB-Red Sox, MLB-opponent, Boston Globe, and the opposing team's major local newspaper.
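As a rough illustration of how the article collection described above could be organized, the following sketch indexes five sources for each of 74 games. The class name, field names, and game numbering are hypothetical and not part of the actual data set.

```python
from dataclasses import dataclass

# The five article sources collected per game, as listed in the text.
SOURCES = ("AP", "MLB-Red Sox", "MLB-opponent", "Boston Globe", "opponent local paper")
NUM_GAMES = 74  # Red Sox games from the 2005 season


@dataclass(frozen=True)
class Article:
    game_id: int    # hypothetical running index, 1..74
    source: str     # one of SOURCES
    text: str = ""  # article body, left empty in this sketch


# One placeholder article per (game, source) pair.
corpus = [Article(g, s) for g in range(1, NUM_GAMES + 1) for s in SOURCES]


def articles_for_game(game_id):
    """Return all articles written about a single game."""
    return [a for a in corpus if a.game_id == game_id]


print(len(corpus))                # 74 games x 5 sources = 370
print(len(articles_for_game(1)))  # 5
```

Grouping articles by game keeps the five accounts of the same event side by side, which is the comparison the analysis of perspective relies on.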

A Computational Model of Perspective

The bulk of this research will be in the building of a computational model of perspective. The final product of this research is a system that takes a baseball game and generates stories from multiple perspectives using the computational model. The different perspectives are apparent in the three levels of the story: headlines, subtopics, and individual events. However, the three levels are not independent of each other. There is much interaction among them, and I will attempt to model these interactions as shown in the figure below.

I hypothesize that there are two main factors that contribute to perspective in narrative generation. The first is the overall goal of the writer. The second is the prior knowledge of the writer or the target audience.

Goals

There have been studies in cognitive science that look at how a person's goal-directed learning affects the spatial perspective when recalling a spatial layout [1]. Although it is not clear whether the same effect exists for non-spatial information, I hypothesize that having different goals leads to different stories. For example, after the Boston Red Sox lost a game to the New York Yankees, the writer may create a story with the goal of convincing the Red Sox fans that the Red Sox are still a great team capable of advancing to the playoffs. On the other hand, the New York Times writer may have the goal of delivering the excitement of the victory and describing the great New York performances.

In making the connection between goals and perspective, I intend to study two parts independently. First, I will study how the different perspectives lead to different goals. As the end result, I would like to have a mapping from the list of major events of the game, such as win/loss, good/bad player performance, and changes in important trends, to goals. An independent variable in that mapping would be the perspective. The figure below illustrates how this might look. Second, I will study how the goals influence the content at the three levels: headlines, subtopics, and individual events. This seems relatively straightforward, as it will be an analysis of the distribution of headlines, subtopics, and individual events, given the goals.
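The mapping from major events to goals, with perspective as an independent variable, can be sketched as a lookup table keyed on event and perspective. This is only a minimal illustration under assumed labels: the event names, perspective names, and goal names below are hypothetical, and a real mapping would presumably be derived from the collected articles rather than written by hand.

```python
# Hypothetical labels for illustration only. Each key pairs a major event
# of the game with the perspective from which the story is written.
GOAL_MAP = {
    ("final_result", "loser_side"): "reassure_fans",
    ("final_result", "winner_side"): "convey_excitement",
    ("strong_pitching", "winner_side"): "praise_performance",
}


def goals_for(events, perspective):
    """Look up the goals that a game's major events suggest under one perspective."""
    goals = []
    for event in events:
        goal = GOAL_MAP.get((event, perspective))
        # Skip events with no goal under this perspective, and avoid duplicates.
        if goal is not None and goal not in goals:
            goals.append(goal)
    return goals


events = ["final_result", "strong_pitching"]
print(goals_for(events, "winner_side"))  # ['convey_excitement', 'praise_performance']
print(goals_for(events, "loser_side"))   # ['reassure_fans']
```

The same list of events yields different goals once the perspective changes, which is the relationship the first part of the study sets out to measure.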

Prior Knowledge

I will assume that goals directly affect the top two levels (headlines and subtopics), and prior knowledge affects the bottom two levels (subtopics and individual events). The hypothesis that prior knowledge is related to perspective comes from the generalization that the fans know much more about the home team players than the visiting team players. For the home team, they know that a certain player is the star of the team. They know the history of the team, the history of the individual players, and the history of the rivalry between their team and some other team. For the visiting team, they may not have heard of many of the players and probably know very little about the players' statistics. Related to knowledge is interest. Not only do they know very little about the visiting team, they may have little interest in the players' individual performances. Therefore, it is only natural that journalists leave out detailed information about the other team's performance.

I will model prior knowledge by varying the amount of information for each player. For example, I may model each home player with all the statistics (batting average, number of home runs, number of walks, etc.) for the entire season, and I may model each visiting player with only select statistics (e.g., only batting average for each offensive player, and only ERA for the pitchers). I will experiment with varying degrees of detail for the home and visiting teams and use them to generate the subtopics and the individual events.
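One way to picture this is a filter that exposes every season statistic for a home player but only a select few for a visiting player. The sketch below assumes hypothetical statistic keys and a hypothetical Player class, and the sample numbers are invented placeholders, not real player data.

```python
from dataclasses import dataclass, field

# Assumed statistic keys; the text names batting average for offensive
# players and ERA for pitchers as examples of "select statistics".
VISITOR_BATTER_STATS = {"batting_average"}
VISITOR_PITCHER_STATS = {"era"}


@dataclass
class Player:
    name: str
    is_pitcher: bool
    season_stats: dict = field(default_factory=dict)


def known_stats(player, is_home):
    """Return the statistics the audience is assumed to know about a player."""
    if is_home:
        # Home fans are modeled as knowing every season statistic.
        return dict(player.season_stats)
    allowed = VISITOR_PITCHER_STATS if player.is_pitcher else VISITOR_BATTER_STATS
    return {k: v for k, v in player.season_stats.items() if k in allowed}


# Placeholder player with made-up numbers.
batter = Player("Batter A", False,
                {"batting_average": 0.301, "home_runs": 28, "walks": 60})
print(known_stats(batter, is_home=True))   # all three statistics
print(known_stats(batter, is_home=False))  # {'batting_average': 0.301}
```

Varying the two allowed sets is a simple way to experiment with different degrees of detail for the visiting team.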

Story Generation and Evaluations

Once I have a deeper understanding of perspective, I will use the computational model built in the section "A Computational Model of Perspective" to generate stories from multiple perspectives. In doing so, I will not try to generate the entire document, as is conventionally done in NLG. Instead, I will generate the headline, the subtopics, and the individual events, and I may use simple template-based NLG to generate the individual sentences. Therefore, the end product will not be a complete article as you may see in the New York Times, but more like an outline of the article.
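A simple template-based generator for such an outline might look like the following sketch. The templates, slot names, and the sample game are all hypothetical placeholders; the sketch only shows the three-level structure (headline, subtopics, individual events), not the selection of content by goals or prior knowledge.

```python
from string import Template

# Hypothetical templates, one per outline level.
TEMPLATES = {
    "headline": Template("$winner beat $loser $winner_runs-$loser_runs"),
    "subtopic": Template("$player's performance"),
    "event": Template("$player $action in the $inning inning"),
}


def render(level, **slots):
    """Fill the template for one outline level with the given slot values."""
    return TEMPLATES[level].substitute(**slots)


def build_outline(game):
    """Produce an indented outline: headline, then subtopics, then events."""
    lines = [render("headline", **game["score"])]
    for topic in game["subtopics"]:
        lines.append("  " + render("subtopic", player=topic["player"]))
        for event in topic["events"]:
            lines.append("    " + render("event", player=topic["player"], **event))
    return lines


# Invented sample game used only to exercise the templates.
game = {
    "score": {"winner": "Team A", "loser": "Team B",
              "winner_runs": 5, "loser_runs": 3},
    "subtopics": [
        {"player": "Pitcher X",
         "events": [{"action": "struck out the side", "inning": "7th"}]},
    ],
}
outline = build_outline(game)
print("\n".join(outline))
```

Generating a second perspective would then amount to passing a different selection of subtopics and events, or a different set of templates, to the same function.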

The resulting outlines of the stories will be evaluated to show that they do indeed represent multiple perspectives. Since the main thesis of this research is that different perspectives lead to different stories based on the same event, I propose that the most useful evaluation metric would measure how well the articles reflect the main event, and more importantly, how well the articles present the different perspectives.

References:

[1] H. A. Taylor, S. J. Naylor, and N. A. Chechile. Goal-specific influences on the representation of spatial perspective. Memory and Cognition, 27(2), pp. 309-319, March 1999.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu