Program Developed At UF Enables Computer To "Emote" In Speech
February 15, 1999
GAINESVILLE — A recent University of Florida graduate in computer and information science and engineering has created a program that gives a computerized voice synthesizer the unusual ability to convey human emotional states, including anger, sadness and fear.
D’Arcy Truluck’s program is aimed at helping speech-disabled people get across their feelings when using the synthesizer, but it also may help pave the way for computer-generated voices that one day will prove difficult to distinguish from the real thing.
“There’s a lot of potential here,” said Doug Dankel, an expert in artificial intelligence and UF assistant professor of computer and information science and engineering. Dankel is Truluck’s faculty advisor.
Truluck, who graduated with a master’s degree in computer and information science and engineering in December, created the program for her master’s project. She first had to become expert in a topic that would seem to have little to do with computer engineering: how the voice expresses emotion.
“There are quite a few psychological studies that I looked at that tried to figure out what is in speech that makes you hear certain emotions,” she said.
Truluck, 28, found that many vocal variables play a role, including pitch, volume, accent, vowel length and the speed at which a speaker delivers words.
Her program manipulates these and other elements in a commercially available speech synthesizer program, allowing it to project five emotional states: fear, sadness, anger, happiness and neutrality. The program is easy to use: People type in what they want to say, choose how to express it, then press a “translate” button on the screen.
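The article does not describe the program's internals or name the commercial synthesizer it controls, but the mechanism it sketches can be illustrated in code. The Python sketch below is purely hypothetical: the parameter names, preset values and "translate" function are assumptions chosen to mirror the variables named above (pitch, volume, rate, vowel length), not Truluck's actual design.

```python
from dataclasses import dataclass

# Hypothetical prosody preset. None of these parameters or numbers come
# from Truluck's program; they only illustrate the kinds of vocal
# variables the article says the program manipulates.
@dataclass
class ProsodyPreset:
    pitch_shift: float    # multiplier on baseline pitch
    pitch_range: float    # multiplier on pitch variation
    volume: float         # multiplier on baseline loudness
    rate: float           # multiplier on speaking rate
    vowel_length: float   # multiplier on vowel duration

# Illustrative presets for the five states the article names.
PRESETS = {
    "neutrality": ProsodyPreset(1.00, 1.0, 1.0, 1.00, 1.00),
    "happiness":  ProsodyPreset(1.15, 1.3, 1.1, 1.10, 0.95),
    "sadness":    ProsodyPreset(0.85, 0.6, 0.8, 0.80, 1.20),
    "anger":      ProsodyPreset(1.10, 1.4, 1.3, 1.15, 0.90),
    "fear":       ProsodyPreset(1.20, 1.2, 0.9, 1.25, 0.90),
}

def translate(text: str, emotion: str) -> dict:
    """Mimic the article's 'translate' step: pair the typed text with
    prosody settings for the chosen emotion, ready to hand to whatever
    control interface the underlying synthesizer exposes."""
    preset = PRESETS[emotion]
    return {"text": text, "settings": vars(preset)}

if __name__ == "__main__":
    print(translate("I can't believe you did that.", "anger"))
```

A real implementation would map these settings onto the control codes or API calls of the specific commercial synthesizer, which the article does not identify.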
The program conveys some emotions better than others, Truluck’s tests showed.
Of 30 randomly selected volunteers who listened to sentences read by the computer, nearly all identified the sad voice. Many also identified the angry and fearful voices, but the volunteers had trouble distinguishing the happy voice from the neutral voice, Truluck said.
That’s partly because angry and sad voices have distinctive qualities. Angry voices, for example, are characterized by dramatic, rapid decreases in pitch, Truluck said.
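As a toy illustration of that cue, not taken from the program, one can contrast a gentle pitch declination for neutral speech with a steep one for anger. The starting pitches and per-word drops below are arbitrary numbers chosen for the example.

```python
def declination(start_hz: float, drop_per_word: float, words: int) -> list[float]:
    """Pitch at each word under a simple linear declination model."""
    return [round(start_hz - i * drop_per_word, 1) for i in range(words)]

neutral = declination(120.0, 2.0, 6)   # gentle downward drift
angry   = declination(160.0, 15.0, 6)  # starts high, falls off rapidly
print(neutral)  # [120.0, 118.0, 116.0, 114.0, 112.0, 110.0]
print(angry)    # [160.0, 145.0, 130.0, 115.0, 100.0, 85.0]
```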
Truluck said she would need to refine the program considerably before it could be marketed. One possibility is to add more nuanced emotional tones, such as sarcasm or contentment, she said. Another is to let users vary multiple voice parameters so that speech-disabled people could create personalized voices.
William Brown, professor and chairman of UF’s department of communication sciences and disorders, said Truluck’s program has promise for the speech-disabled.
“It’s of great benefit because people do not want to hear the monotonous type of dialogue that is usually associated with computer synthesized speech,” he said. “I think it’s quite fascinating.”
Dankel said computer-generated voices are likely to become much more common in the future as businesses find more and more applications for them. For example, people one day may converse with computers for help with products or services, he said.
People find traditional computer-synthesized monotones hard to focus on, and Truluck’s program could be a building block for more accessible voices, he said.
“When we get into the process of humans and computers communicating with each other, we want to be as natural as possible,” Dankel said.