CSAIL Research Abstracts - 2005

How Accurate are Experts in Spotting the Right Words in a Dialogue?

Ronilda C. Lacson & William J. Long


Medical dialogue occurs in almost all patient-caregiver interactions. We create and implement a classification algorithm that relies on manually chosen words to assign segments of dialogue within the hemodialysis domain to usable categories. Because identifying words manually is labor-intensive, we analyze only a subset of the training data. We use the UMLS semantic network to augment the base model. The accuracies of the two models, 59% and 61%, did not differ significantly (p=0.89).


Spoken medical dialogue in a home care setting such as a home hemodialysis program is vital to quality patient care. Although substantial research has addressed analyzing written text in the medical domain,[1] these methods are not sufficient for spoken dialogue, largely because spoken dialogue lacks structure and organization of topics. When spoken dialogue is transcribed and manually segmented by topic, certain words appear predictive in classifying segments into four categories: clinical, technical, backchannel or miscellaneous. We develop and implement a classification algorithm that takes manually identified words and combines them to predict the category of a segment of dialogue.


Data Source: We used recorded and transcribed phone conversations between nurses and 25 adult patients treated at a nephrology home hemodialysis program from July to September 2002. All subjects signed consent forms approved by the MIT Committee on the Use of Humans as Experimental Subjects. The transcribed dialogue was manually segmented by two domain experts (RL and E. Lacson MD) according to previously identified topics that were motivated by the application: clinical, technical, backchannel and miscellaneous. Clinical data relate to the patient's health. Technical data consist of discussions about machinery and supplies. Miscellaneous data consist of scheduling and social concerns. Backchannel refers to acknowledgements and greetings. A segment is defined as a sequence of successive dialogue turns that all address one topic. The labeled data were then divided into training and testing sets consisting of 71 segments each. WL, who is familiar with the classification scheme but did not perform annotation, reviewed the training data. He manually chose words that appeared predictive for each of the four categories.

Algorithm Development: For each category, each word identified by the expert is assigned a score corresponding to the weighted count of its occurrences in the training data for that category. Each class then votes using the word scores derived from the training data. When a new data segment is presented, each category tallies the votes from its list of pertinent words, and the class with the highest vote total wins. The algorithm is implemented in Common Lisp.
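
The voting scheme described above can be illustrated with a minimal sketch. This is not the authors' Common Lisp implementation: it is written in Python for illustration, and the tokenizer, the uniform weight of 1.0 per occurrence, the tie-breaking rule, and the example words and segments are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def train_word_scores(training_segments, expert_words, weight=1.0):
    """Score each expert-chosen word by its weighted count within its own category.

    training_segments: list of (text, category) pairs.
    expert_words: dict mapping category -> set of manually chosen words.
    weight: per-occurrence weight; 1.0 is an assumption, the source does not
            specify the weighting.
    """
    scores = {category: Counter() for category in expert_words}
    for text, category in training_segments:
        chosen = expert_words.get(category, set())
        for token in tokenize(text):
            if token in chosen:
                scores[category][token] += weight
    return scores


def classify(text, scores):
    """Each category sums the scores of its words found in the segment; highest total wins.

    Ties go to the category listed first in `scores` (an assumption).
    """
    tokens = tokenize(text)
    votes = {
        category: sum(word_scores[token] for token in tokens)
        for category, word_scores in scores.items()
    }
    return max(votes, key=votes.get), votes


# Hypothetical expert word lists and training segments, invented for illustration.
expert_words = {
    "clinical": {"pressure", "pain"},
    "technical": {"machine", "alarm"},
    "backchannel": {"okay", "hello"},
    "miscellaneous": {"schedule", "tomorrow"},
}
training_segments = [
    ("My blood pressure was high and I had pain", "clinical"),
    ("The pressure is fine today", "clinical"),
    ("The machine alarm went off", "technical"),
    ("Okay hello", "backchannel"),
    ("Can we schedule for tomorrow", "miscellaneous"),
]

scores = train_word_scores(training_segments, expert_words)
label, votes = classify("The alarm on the machine keeps beeping", scores)
print(label, votes)
```

In this sketch a word only contributes to the category whose expert list contains it, so a segment with no listed words receives zero votes everywhere and falls back to the tie-breaking rule.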

Incorporating Semantic Types: To study whether semantic knowledge increases the accuracy of predicting relevant topics in our application, we used the UMLS Semantic Network.[2] For each pertinent word identified by the expert, and for each word in the test set, we used MetaMap to replace the word with its semantic type.[2] For our third model, we performed the same substitution using MetaMap for only the nouns in the same text, because this achieved better predictive accuracy in our preliminary studies. For this model, we used Ratnaparkhi's tagger to identify the nouns in the data.[3] We compared the predictive accuracy of the three models.
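
The substitution step can be sketched as follows. The sketch does not call MetaMap or a part-of-speech tagger: `SEMANTIC_TYPES` is a hypothetical hand-written lookup table standing in for MetaMap output, and `noun_flags` is a hypothetical list of booleans standing in for the tagger's noun decisions. The word-to-type pairs are illustrative and are not taken from the study.

```python
# Hypothetical toy lookup standing in for MetaMap output (illustrative pairs only).
SEMANTIC_TYPES = {
    "pain": "Sign or Symptom",
    "headache": "Sign or Symptom",
    "catheter": "Medical Device",
    "dialyzer": "Medical Device",
}


def substitute_semantic_types(tokens, lookup, noun_flags=None):
    """Replace each token with its semantic type when the lookup has one.

    tokens: list of word strings.
    lookup: dict mapping lowercase word -> semantic type name.
    noun_flags: optional list of booleans, one per token; when given, only
                tokens flagged as nouns are substituted (the noun-only variant).
    """
    result = []
    for index, token in enumerate(tokens):
        eligible = noun_flags is None or noun_flags[index]
        if eligible:
            result.append(lookup.get(token.lower(), token))
        else:
            result.append(token)
    return result


tokens = ["the", "catheter", "causes", "pain"]

# Variant that substitutes every word with a known semantic type.
all_words = substitute_semantic_types(tokens, SEMANTIC_TYPES)

# Noun-only variant; here only "catheter" is flagged as a noun, for illustration.
nouns_only = substitute_semantic_types(
    tokens, SEMANTIC_TYPES, noun_flags=[False, True, False, False]
)
print(all_words)
print(nouns_only)
```

One effect of the substitution is that distinct words such as "pain" and "headache" collapse into a single feature, which lets a word seen only in the test data still match a feature learned from training data.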

Results and Discussion

The accuracies of the three models were 61%, 52% and 59%. A t-test comparing the best UMLS model with the base model showed no significant difference.

Words within an utterance are clearly predictive of the utterance's semantic category. Our classifier, which relies on shallow word representations, outperformed the baseline of labeling every segment with the most frequent class (accuracy=35%). However, testing the intuition that an expert can pick the most appropriate features for the model showed that this is not always the case. Adding semantic knowledge from the UMLS did not provide significant improvement. Increasing the training data, as well as identifying more words and combinations of words, would probably increase the classification accuracy.
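
For reference, a most-frequent-class baseline of the kind mentioned above can be computed as in the sketch below. Taking the most frequent class from the training labels is an assumption (the source does not say which set it was taken from), and the label lists are invented toy data, not the study's data.

```python
from collections import Counter


def majority_class_accuracy(train_labels, test_labels):
    """Label every test segment with the most frequent training class and score it."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for label in test_labels if label == most_common_label)
    return most_common_label, correct / len(test_labels)


# Toy labels for illustration only.
train_labels = ["clinical"] * 3 + ["technical"] * 2 + ["backchannel"]
test_labels = ["clinical", "technical", "clinical", "miscellaneous"]

baseline_label, baseline_accuracy = majority_class_accuracy(train_labels, test_labels)
print(baseline_label, baseline_accuracy)
```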


[1] W Chapman, M Fiszman, JN Dowling, BE Chapman, TC Rindflesch. Identifying respiratory findings in emergency department reports for biosurveillance using MetaMap. In Medinfo, pp. 487-491, 2004.

[2] A Aronson. Effective mapping of biomedical text to the UMLS metathesaurus: The MetaMap program. In Proc AMIA Symposium, pp. 17-21, 2001.

[3] A Ratnaparkhi. A maximum entropy part-of-speech tagger. In EMNLP Conference, pp. 133-142, 1996.


Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)