ModelTalker Personalized Synthetic Speech

Investigators:

  • H. Timothy Bunnell, PhD


Background

A variety of congenital and acquired neurological and neuromuscular impairments can leave a person unable to speak and dependent on an assistive communication device. These conditions include cerebral palsy, autism, developmental apraxia of speech, traumatic brain injury, and spinal cord injury, among others. Most affect children as well as adults. An estimated 8 to 12 individuals per 1,000 in the general population experience communication impairments that require Augmentative and Alternative Communication (AAC). Census Bureau estimates from 1996 suggest that more than half a million children 15 years of age and younger have severe communication disabilities. The Therapeutic Services unit at the Alfred I. duPont Hospital for Children sees approximately 100 children who are AAC users each year, although this estimate includes children with computer-assisted writing needs. The laboratory's ModelTalker speech synthesis system is intended to supply a natural-sounding, personalized voice for these and similar AAC users.

What We’re Doing

The ModelTalker speech synthesis system is a state-of-the-art speech synthesizer designed specifically to meet the needs and interests of AAC users. The system, developed in our laboratory over the past several years, provides a personalized voice with the quality of recorded natural speech for selected utterances, and high-quality, unrestricted text-to-speech synthesis in the same "voice" for novel utterances. ModelTalker presently works with augmentative communication tools such as the Prentke-Romich Wivik program. In the course of the project it will be extended to work with emerging computer-based AAC devices as a "plug-compatible" synthesizer, giving augmented communicators the option of using their own personalized voice in their AAC device.

The ModelTalker system requires the person whose voice is to be emulated to record an inventory of speech. These recordings are stored in a database. Speech is then synthesized from this database in one of two ways: utterances that closely or exactly match recordings in the database are built from longer stretches of recorded speech, while novel utterances are built by concatenating shorter stretches of recorded speech. When longer stretches are used, the intonation and timing patterns of the speech do not need to be altered, so the result is close to the quality of recorded natural speech. When shorter stretches are concatenated, the timing and intonation patterns are generated from a set of rules. The resulting speech sounds like high-quality synthetic speech but still retains the characteristics of the recorded voice in the database. The ModelTalker system has three software components: InvTool, the speech recording software; the BCCdb program, which converts the inventory of recorded speech from InvTool into a form that the ModelTalker program can use for synthesis; and the ModelTalker program itself, which accepts typed text from the user and carries out the actual synthesis. These components are continually updated and improved in our laboratory.
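The choice between longer and shorter stretches of recorded speech can be illustrated with a small sketch. This is not the ModelTalker implementation: the function name `plan_synthesis`, the word-level matching, the greedy longest-match strategy, and the labels `"recorded"` and `"short_units"` are all assumptions made for illustration. The sketch only decides which parts of a typed sentence could be played back from long recordings (keeping their natural timing and intonation) and which parts would have to be assembled from shorter units with rule-generated timing and intonation.

```python
from typing import List, Set, Tuple


def plan_synthesis(text: str, recorded_phrases: Set[str]) -> List[Tuple[str, str]]:
    """Split typed text into segments and label how each would be produced.

    Hypothetical illustration only: real systems match on speech units,
    not on whole words, and this greedy strategy is an assumption.
    """
    words = text.lower().split()
    plan: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        # Try the longest remaining stretch of words first.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in recorded_phrases:
                # A long recording exists: natural prosody can be kept.
                plan.append(("recorded", phrase))
                i = j
                break
        else:
            # No recording covers this word: it would be built from shorter
            # recorded units, with timing and intonation set by rule.
            plan.append(("short_units", words[i]))
            i += 1
    return plan


if __name__ == "__main__":
    inventory = {"good morning", "how are you"}
    for kind, segment in plan_synthesis("Good morning Tim how are you", inventory):
        print(kind, "->", segment)
```

In this toy example the greeting and the question would come from long recordings, while the name, which is absent from the inventory, would fall back to concatenated shorter units.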

Some of What We’ve Found

We are currently conducting a study to evaluate the effectiveness of the ModelTalker system, including end-user satisfaction with both the recording process and the resulting synthetic voice. The participants are people who have been diagnosed with ALS, who are currently speaking, and who anticipate losing their voice at some time in the future as the disease progresses. Each participant, along with a close associate with whom they communicate regularly, registers for the study over the Internet and downloads software from our laboratory that allows them to record an inventory of speech and create a personalized synthetic voice. Once they have created a synthetic voice from their recordings, they complete a series of surveys and test-sentence evaluations over the Internet during the following eight weeks. Survey questions address the intelligibility of the synthesizer, the naturalness of sentence intonation and rhythm, and the sound quality of the voice, among other topics. Through the surveys we also hope to identify talker and voice characteristics that tend to work poorly with this technology, so that we can fix the underlying problems. Where we are unable to fix them, we will at least be able to warn prospective users whose voice characteristics may prove difficult. The surveys are also intended to assess the voice creation process specific to ModelTalker.
