Voice-to-Voice Translation Machine Perfects Bedside Manner
Three years of work by a large interdisciplinary team at the University of Southern California has created a rudimentary but working two-way voice translation system that allows an English-speaking doctor to talk to a Persian-speaking patient.
The Transonics Spoken Dialog Translator turns a doctor’s spoken English questions into spoken Persian, and translates patients’ spoken Persian replies into spoken English.
Shrikanth Narayanan leads the USC Viterbi School group that developed Transonics. One member of the team presented a report on the system on June 26 at the Association for Computational Linguistics conference in Ann Arbor, Michigan.
“Fluent two-way machine voice translation is one of the holy grails of engineering,” said Narayanan, an associate professor of electrical engineering, computer science and linguistics at the USC Viterbi School of Engineering who directs the Speech Analysis and Interpretation Laboratory (SAIL) in the Viterbi School’s Integrated Media Systems Center.
“We are years away from perfecting it, but we think the choices we have made about how to go about creating such a system are working. We hope to have something that will be useful in emergency rooms or ambulances within two years or so.”
The current system, funded by two DARPA grants totalling $3.8 million, is the result of intensive research in information technology, critically supplemented by careful observation of patient-doctor dynamics in numerous bilingual interaction sessions staged for the project.
Narayanan noted that the Transonics approach relies not just on computer code, but also on the ability of humans to use even imperfect tools. This approach, he added, grows directly out of the extraordinary difficulty of the technical problems involved.
“Two-way voice translation involves combining at least three highly imperfect existing disciplines, with the errors multiplying at every stage,” Narayanan explained. These include:
Text translation. Taking a written text in one language and translating it into another. Machine translation systems developed by researchers Kevin Knight and Daniel Marcu at the Viterbi School’s Information Sciences Institute consistently rank among the world’s best, but still make frequent grammatical and other errors. Marcu and Knight developed a specialized system for use in Transonics.
Spoken word recognition. This is Narayanan’s specialty. Just being able to reliably recognize a large number of different single words, in a variety of regional or foreign accents, is a difficult problem that is far from solved, as anyone who has tried to use existing telephone interfaces knows. Recognizing a wide variety of words informally spoken in a noisy, chaotic environment (emergency room, ambulance) adds another level of difficulty.
Extra-verbal communication. Humans speak not just with words, but also with intonation. A rising tone at the end of a sentence to signal a question is one familiar example, and one that is extraordinarily difficult for a machine to assess. Nonsense syllables (“um, uh, ah, er”), catchphrases (“you know, like”) and exclamations (“Wow!” “Hey!”) are easy for humans to decode or ignore, but major stumbling blocks for machines. Insights from David Traum of the USC Institute for Creative Technologies in dialog management are helping in this area, by narrowing the range of possibilities and by bringing context and previous exchanges into the computer’s decision-making. In addition, teaching computers to detect human emotions in speech is a major focus of work at the USC Speech Analysis and Interpretation Laboratory, under the direction of Narayanan and his colleague Panos Georgiou, a USC research assistant professor.

The Transonics interface stretches the limits of the technology by systematically taking advantage of the fact that doctor-patient discourse is, by its nature, highly structured and uses a narrow set of concepts. “We can take advantage of using essentially pre-fabricated sentences in many cases, by trying to understand and paraphrase what is being communicated instead of doing exact word-for-word translation,” Narayanan says.
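Narayanan’s point that errors multiply from stage to stage can be made concrete with a back-of-envelope calculation. The sketch below assumes that each of the three stages listed above independently handles an utterance correctly with some fixed probability, so the chance that the whole chain succeeds is the product of the stage accuracies. The accuracy figures are made-up illustrative values, not measurements reported for Transonics.

```python
from math import prod

# Illustrative per-stage accuracies: the fraction of utterances each stage
# handles correctly. These numbers are assumptions for the arithmetic only,
# NOT measurements reported for Transonics.
STAGE_ACCURACY = {
    "spoken word recognition": 0.90,
    "text translation": 0.85,
    "extra-verbal interpretation": 0.95,
}


def end_to_end_accuracy(stage_accuracy):
    """Chance that every stage succeeds, assuming stages fail independently."""
    return prod(stage_accuracy.values())


if __name__ == "__main__":
    one_way = end_to_end_accuracy(STAGE_ACCURACY)
    # A question and its answer pass through the pipeline once in each direction.
    round_trip = one_way ** 2
    print(f"one way:    {one_way:.3f}")     # 0.727
    print(f"round trip: {round_trip:.3f}")  # 0.528
```

With these hypothetical figures, roughly one utterance in four goes wrong in a single direction, and nearly half of question-and-answer round trips contain at least one error. Real stages are not fully independent, so this is only an order-of-magnitude illustration of why the system leans on its users to catch mistakes.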
Additionally, the system uses the human ability to read text as a bridge over some of the worst problems of speech recognition and machine translation, by allowing users to select alternate possible messages.
The Transonics system runs on a laptop computer using the Linux operating system. Doctor and patient both wear headphones with attached microphones. A small keypad connected to the computer speeds and simplifies certain routine commands, for example switching from doctor mode to patient mode.
When a doctor asks a question, the speech recognition software captures it, but hedges its bets by displaying not just its best guess about what was said, but a range of options. The doctor chooses the most appropriate one (some of the most frequently used phrases can be placed in a quick-access “ready menu”), and the result is a spoken Persian question in the patient’s earphones.
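The idea of showing a range of options and mapping them onto pre-fabricated sentences can be illustrated with a small sketch. It assumes the recognizer returns a short “n-best” list of text hypotheses and that each ready-menu phrase carries an ID standing for a Persian output. The phrases, the IDs, and the use of Python’s difflib character-level similarity are hypothetical stand-ins chosen for illustration; the article does not describe how Transonics actually matches speech to its phrases.

```python
from difflib import SequenceMatcher

# Hypothetical canned English prompts. Each ID stands for a pre-built Persian
# utterance that a real system would play back; none of these come from Transonics.
READY_PHRASES = {
    "Where does it hurt?": "q_pain_location",
    "How long have you had this pain?": "q_pain_duration",
    "Are you allergic to any medication?": "q_allergy",
    "Do you have a fever?": "q_fever",
}


def similarity(a, b):
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_messages(n_best, phrases=READY_PHRASES, top_k=3):
    """Rank canned phrases against the recognizer's n-best hypotheses.

    Each phrase keeps its best score over all hypotheses; the top_k phrases
    are returned best-first so the doctor can pick the one actually meant.
    """
    scored = []
    for phrase, prompt_id in phrases.items():
        best = max(similarity(hyp, phrase) for hyp in n_best)
        scored.append((best, phrase, prompt_id))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(phrase, prompt_id) for _, phrase, prompt_id in scored[:top_k]]


if __name__ == "__main__":
    # Imperfect recognizer output for the spoken question "Where does it hurt?"
    n_best = ["where does it hurt", "wear does it hurt", "where does a heart"]
    for rank, (phrase, prompt_id) in enumerate(candidate_messages(n_best), 1):
        print(rank, phrase, prompt_id)
```

A fixed phrase list trades coverage for reliability: an utterance with no close canned match still needs free-form translation or a human workaround.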
The same process takes place in the reverse direction.
Narayanan says much of the interface’s success grows directly out of analysis of a large database of some 300 dialogs between English-speaking doctors and Persian-speaking patients, created by USC medical students and by Iranian-heritage USC students and Los Angeles residents. “Rather than imagining what people might say, we analyzed what people did say,” he explained, adding that recordings of the encounters were used to train and tune the system.
USC linguistics Ph.D. candidate Shadi Ganjavi played a vital role in setting up these encounters, said Narayanan. “We are grateful to her and to the large Persian-speaking community in Los Angeles.”
The system contains about 23,000 English and 9,000 Persian words, a disproportion that exists because relatively little has so far been done in machine translation of Persian (a language also called Farsi), either written or spoken. “In addition to our progress in the general problem of the interface,” says Narayanan, “we are also contributing to the specific problems posed by translating between English and Farsi.”
For Narayanan, one of the striking things that has emerged so far is the dependence of the system, in its current state, on the ability of users to recognize its limits and weaknesses, and work within them.
The team has created an elaborate user manual, and, as with any system, reading the manual improves performance a great deal. Common sense is also critical. Narayanan ruefully describes one interaction, rated a failure by both ‘doctor’ and ‘patient’ in follow-up questioning, that foundered because both expected the system to translate the name “Excedrin.”
The drug name wasn’t in the system. It’s the same in both languages, and both sides of the interaction understood it when they heard the other pronounce the word. But rather than just moving on, both stubbornly kept trying to enter it into the system, which kept rejecting it.
“We learn from things like this,” said Narayanan. He and his colleague Georgiou estimate that if the system were given a familiar decimal release number, it would be at “three point something”: it has gone through three radical reconstructions in its three years of development so far.
[Image: The Transonics interface displays one or more possible messages captured from the doctor’s speech. The doctor chooses the intended message, and the machine pronounces a Persian translation. The right-hand column stores heavily used questions.]
The system is continually being upgraded and improved. At the same time as the presentation at the ACL conference, usability testing was under way at a military facility.
In addition to the researchers and institutions already named, HRL Laboratories of Malibu, California, works with USC on the project. HRL personnel involved include USC alumni Dr. Robert Belvin and Howard Neely, as well as Cheryl Hein. Usability testing and interface design contributions were made by Scott Millward, a postdoctoral scientist at IMSC. Five USC electrical engineering graduate students have also made large contributions: Emil Ettellaie, Dagen Wang, Ananthakrishnan Shankar, Murtaza Bulut, and Sudeep Ghande, who presented the paper at ACL.