Over the last 20 years, speech recognition and analysis have advanced considerably, but very few of these advances have reached speech disorder research or clinical practice. The speech genetics study currently underway addresses this divide between experimental achievements and clinical methods. By adapting current, widely used speech recognition technologies (based on hidden Markov modeling) to the analysis of speech delay of unknown origin in children, much can be learned about how effective these newer techniques are in a clinical setting. The research project will concentrate its genetic investigation on specific regions of chromosome 3 that have previously been implicated in speech disorders, thereby increasing the potential for meaningful discoveries.
Although current speech research practice is dominated by computer-based analytical aids, much time must still be spent manually calibrating the speech models those aids rely on. Because this calibration demands detailed input from highly experienced speech scientists, speech research is not as efficient or as productive as current technology allows. It is possible, though, to overcome this obstacle by adapting computer models to work in the researchers’ favor, and the current research will explore this potential.
Another goal of the present research is to study possible genetic factors in childhood speech disorders of unknown origin (i.e., delay that cannot be attributed to deafness, cognitive delay, or physical malformation). Previous studies have shown that genetic regions currently linked with dyslexia may also be associated with speech delay. By studying speech production and perception in children with speech delay alongside children without it, it may be possible to identify more accurately which regions of the genome act as reliable markers for speech delay. If a genetic test for speech delay is developed, or even shown to be feasible, the speech therapy needs of future generations of children could be evaluated more quickly and addressed with a potentially higher success rate.
What We're Doing
We’re currently seeking participant families for the speech genetics research project. Each family must include at least two siblings who are between 5 and 9 years old during the course of the study, as well as at least one biological parent. At least one child from each family must have a speech delay without a known cause. The two children and at least one biological parent must take part in the saliva (DNA) collection process and in a genetic interview. For more information on family participation, or to take a participation survey, please visit the Speech Research Lab website.
Soon after a family is confirmed to meet the participation criteria, a parent will be contacted to schedule a visit to the lab, where the project will be explained to the family in detail. A great benefit of meeting lab staff in person is that any outstanding questions or concerns about the focus, content, procedures, and impact of this important research can be fully addressed.
Once the family is confident in their understanding of the project, participation may begin. Each child participant will be administered several tests, including some that measure speech production and speech perception. We will collect saliva samples from all participating family members so that we can analyze DNA. In addition, each participating parent will be interviewed by a genetic counselor so that we may learn more about family history. Testing will require two visits to our lab (lasting about 2 hours each) for each child. Families who participate will receive compensation after completing all of the necessary interviews and testing.
ModelTalker Speech Synthesis System
The second area of investigation presently underway in the Speech Research Lab seeks to evaluate and refine a speech synthesizer developed in the lab over the past several years. The synthesizer, called ModelTalker, is a state-of-the-art, "corpus-based" speech synthesizer capable of capturing personalized voices and producing speech that can range in quality from that of recorded natural speech to high-quality synthetic speech. ModelTalker may be used, for example, by children with neuromuscular disorders, such as cerebral palsy, that render them unable to speak intelligibly. A voice can be customized with an appropriate age, gender and regional dialect for each individual child. Significant components of this work will optimize and evaluate the procedure used to create the individualized voices. The lab will also evaluate satisfaction of end users, clinicians, and families to ensure continuous improvement of the ModelTalker software.
A variety of congenital and acquired neurological and neuromuscular impairments can cause an inability to speak and require patients to use assistive communication devices. These conditions include cerebral palsy, autism, developmental apraxia of speech, traumatic brain injury and spinal cord injury, among others. Most affect children as well as adults. It is estimated that 8 to 12 individuals per 1000 in the general population experience communication impairments that require Augmentative and Alternative Communication (AAC). Census Bureau estimates from 1996 suggest that more than half a million children 15 years of age and younger have severe communication disabilities. The Therapeutic Services unit in the Nemours Children’s Hospital, Delaware sees approximately 100 children who are AAC users each year, although this estimate includes children with computer-assisted writing needs. The lab's ModelTalker speech synthesis system is intended to supply a natural-sounding and personalized voice for these and similar AAC users.
What We're Doing
The ModelTalker speech synthesis system is a state-of-the-art speech synthesizer that was specifically designed to meet the needs and interests of AAC users. The system, developed in our lab over the past several years, can provide a personalized voice with recorded natural speech quality for selected utterances and high-quality, unrestricted text-to-speech synthesis in the same "voice" for novel utterances. The ModelTalker system presently works with augmentative communication tools such as the Prentke-Romich Wivik program. In the course of the project, it will be extended to work with emerging computer-based AAC devices as a "plug compatible" synthesizer, giving augmented communicators the option of using their own personalized voice in their AAC device.
The ModelTalker system requires that a person whose voice we wish to emulate record an inventory of speech. These recordings are stored in a database. Speech is then synthesized from this database using longer stretches of recorded speech for utterances that closely or exactly match speech recordings in the database, and shorter stretches of recorded speech appended together for novel utterances. When speech is synthesized using the longer stretches of recorded speech, the intonation and timing patterns of the speech do not need to be altered. Thus the speech is close to the quality of recorded natural speech. When the speech is synthesized by appending together shorter stretches of speech, the timing and intonation patterns are generated from a set of rules. The resulting speech sounds like high-quality synthetic speech, but will still retain the characteristics of the recorded voice in the database. Software components for the ModelTalker system include the speech recording software, called InvTool; the BCCdb program, which puts the inventory of recorded speech from InvTool into a form that the ModelTalker program can use for speech synthesis; and the ModelTalker program itself, which accepts typed text from the user and carries out the actual synthesis. These components are constantly being updated and improved in our lab.
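The preference for longer recorded stretches described above can be illustrated with a minimal sketch. It is not the ModelTalker implementation: it assumes, purely for illustration, that the inventory is keyed by word sequences (a real corpus-based synthesizer would typically work with smaller phonetic units), and the names `select_units`, `needs_rule_based_prosody`, and the file names are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[str, ...]


def select_units(words: List[str], inventory: Dict[Span, str]) -> List[Tuple[Span, str]]:
    """Cover `words` left to right, always taking the longest recorded stretch available."""
    selected: List[Tuple[Span, str]] = []
    i = 0
    while i < len(words):
        # Try the longest candidate span starting at position i, then shrink it.
        for j in range(len(words), i, -1):
            span = tuple(words[i:j])
            if span in inventory:
                selected.append((span, inventory[span]))
                i = j
                break
        else:
            # No recording starts with this word, so the utterance cannot be covered.
            raise KeyError(f"no recording covers {words[i]!r}")
    return selected


def needs_rule_based_prosody(units: List[Tuple[Span, str]]) -> bool:
    """Simplifying assumption: one unit means a whole-utterance match whose
    recorded timing and intonation can be reused; several units are joined
    and given rule-generated timing and intonation."""
    return len(units) > 1


# Hypothetical inventory mapping recorded stretches to audio files.
inventory = {
    ("good", "morning"): "rec_001.wav",
    ("good",): "rec_002.wav",
    ("morning",): "rec_003.wav",
    ("everyone",): "rec_004.wav",
}

exact = select_units(["good", "morning"], inventory)
novel = select_units(["good", "morning", "everyone"], inventory)
```

In this sketch, "good morning" is served by a single recording and keeps its natural prosody, while "good morning everyone" is assembled from two recordings and would be passed to the rule-based timing and intonation step.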
Some of What We’ve Found
We are currently conducting a study to evaluate the effectiveness of the ModelTalker system, including end-user satisfaction with the recording process and with the resulting synthetic voice. The system is being evaluated with people who have been diagnosed with ALS and are currently speaking, but who anticipate losing their voice at some time in the future as the disease progresses. These subjects, along with a close associate with whom they communicate on a regular basis, register for the study over the internet and download software from our lab that allows them to record an inventory of speech and create a personalized synthetic voice. Once they have created a synthetic voice from their recordings, they participate in a series of surveys and test sentence evaluations over the internet for the next 8 weeks. Survey questions address the intelligibility of the synthesizer, the naturalness of sentence intonation and rhythm, and the sound quality of the voice, among other topics. Through the surveys we also hope to identify talker and voice characteristics that tend to work poorly with this technology, so that we can fix the underlying problems. Where we are unable to fix them, we will at least be able to warn prospective users whose voice characteristics may prove difficult. The surveys are also intended to assess the voice creation process specific to ModelTalker.
STAR: Speech Training, Assessment and Remediation
The lab's third area of investigation is the study of computer speech processing and speech recognition to support clinical speech assessment and speech training. This work has grown out of earlier studies of a specific type of speech disorder (dysarthria) and now encompasses the speech of children with a variety of other, often curable, speech disorders. In collaboration with clinical staff, the lab has developed the Speech Training, Assessment and Remediation (STAR) project, an interactive computer-based program that provides rigorous speech measurement and assessment along with effective training procedures to help children quickly improve their articulation skills. This is an ambitious research project that will leverage existing speech processing and speech recognition software in our lab, but it will also require research into new speech processing and artificial intelligence technologies to achieve the project goals.
Each week, the hospital receives about five new referrals for speech and language evaluation. There is a significant wait before these children can be scheduled for a clinical assessment. If speech therapy is recommended, there may be an additional wait to begin therapy. Clearly, there is a substantial population of children who need speech therapeutic services. Better access to speech therapeutic services would benefit the hospital and its patients. Computer-based tools would facilitate the process of assessing speech and delivering speech therapy.
The STAR system, developed in the Speech Research Lab, is a prototype computer-based speech assessment and training tool that uses an animated star character to interact with and encourage children in speech learning and production tasks. These tasks are incorporated into different games. The star character acts as a guide and motivator throughout these games. The system, when completed, will start with basic phoneme recognition tasks, and progress through phoneme production tasks, first in isolation, then in simple, and eventually more complex phonetic contexts, and finally in sentences. The long range goal for the STAR project is to combine the latest technologies involving believable animated agents, speech recognition, artificial intelligence, and speech synthesis to create a computer-based speech training tool that children find enjoyable and easy to work with.
What We're Doing
We are currently evaluating one of the games. It is a simple speech production game in which a child is asked to produce words drawn from a minimal pair that contrasts a phoneme to be trained with one the child has already acquired. For example, if a child produces /w/ correctly, but also substitutes /w/ for /r/, we would use the minimal pair "red" and "wed" as test words in the game. The game asks the child to say one of the words, captures and analyzes the child's utterance to determine how closely it matches recognition models for both contrasting words, and gives the child feedback on the accuracy of the production in the form of a spoken response from the star character.
The game is being evaluated for its ability to shorten the time children need to acquire the sounds being trained. Children between 4 and 7 years of age who had difficulty articulating the /r/ sound were recruited to participate in the evaluation study. Children in one group used STAR to train the /r/ sound. Children in another group played with STAR using sounds that they had already mastered (i.e., sounds other than the one they were having difficulty with). All children in the study received three half-hour sessions with STAR each week in addition to one half-hour of conventional speech therapy provided by a Speech Language Pathologist who did not know which group each child was in.
Before and after each session with STAR, all children recorded a series of words designed to probe their pronunciation of a variety of phonemes including /r/. These probe words are being analyzed for the percentage of phonemes correctly pronounced over the course of the experiment, and the results are being compared between the two groups. In addition, we plan to use the data recorded from all children as they pronounced the words presented to them during the STAR game to further improve our speech recognition models for children's speech.
The STAR program is required to recognize and rate children's utterances. Because so little work has been done with children's speech, this is an important area of research in our lab. To continue improving the speech recognition technology that this program uses, we are developing a database of voice recordings of typically developing, American English-speaking children between the ages of 6 and 8. The database will be used to analyze the speech patterns of typically developing children.
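The comparison of an utterance against the models for both words in a minimal pair can be sketched as follows. This is only an illustration under stated assumptions: it supposes that each word model returns a log-likelihood score for the utterance, and the `Trial` class, the `rate_production` function, the `margin` threshold, and the score values are all hypothetical rather than taken from the STAR software.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    target: str            # word the child was asked to say, e.g. "red"
    contrast: str          # other member of the minimal pair, e.g. "wed"
    target_score: float    # assumed log-likelihood under the target word model
    contrast_score: float  # assumed log-likelihood under the contrasting word model


def rate_production(trial: Trial, margin: float = 2.0) -> str:
    """Classify an utterance by how clearly one word model outscores the other."""
    diff = trial.target_score - trial.contrast_score
    if diff >= margin:
        return "correct"      # the utterance clearly fits the target word better
    if diff <= -margin:
        return "substituted"  # the utterance clearly fits the contrasting word better
    return "unclear"          # the two models score too closely to decide


# Hypothetical scores for three attempts at "red".
clear_r = rate_production(Trial("red", "wed", -40.0, -55.0))
clear_w = rate_production(Trial("red", "wed", -52.0, -45.0))
ambiguous = rate_production(Trial("red", "wed", -50.0, -49.5))
```

A rating like this could then be mapped to the spoken response given by the star character; using a margin rather than a strict comparison is one way to avoid giving confident feedback on borderline productions.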
Some of What We’ve Found
Preliminary analyses from the STAR evaluation study are encouraging. Calculations using data from the probe recordings have been completed for 10 of the subjects who have finished the experiment. For each child, the amount of improvement between the probe recordings at the beginning of the study and those at the end was calculated and compared between groups. Children who used STAR to train on the /r/ sound improved by an average of 22.7 percentage points over the course of the study. Children who also played with STAR, but not for the /r/ sound that they were having trouble with, improved by an average of only 14.6 percentage points. This preliminary pattern suggests that STAR may be effective in decreasing acquisition time for the sounds being trained, but more data are needed before any firm conclusions can be offered.
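The group comparison described above amounts to a difference in percentage points between the first and last probe scores, averaged within each group. The sketch below shows that arithmetic only. The scores in it are made-up placeholder values chosen for readability, not data from the study, and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def percent_correct(judgments: List[bool]) -> float:
    """Percentage of probed phonemes judged correct in one recording session."""
    return 100.0 * sum(judgments) / len(judgments)


def mean_improvement(scores: List[Tuple[float, float]]) -> float:
    """Average gain in percentage points from (first, last) probe scores per child."""
    return mean(last - first for first, last in scores)


# Placeholder (first, last) percent-correct scores per child; NOT study data.
trained_group = [(40.0, 60.0), (50.0, 80.0)]
comparison_group = [(45.0, 55.0), (30.0, 50.0)]

trained_gain = mean_improvement(trained_group)
comparison_gain = mean_improvement(comparison_group)
advantage = trained_gain - comparison_gain
```

Note that the gains are differences in percentage points, not relative percent changes, which matches how the improvement figures are reported in the text.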
In the current phase of the project, we are preparing raw data from 18 subjects for analysis by collaborating Speech Language Pathologists (SLPs). Seventeen SLPs have been enlisted to listen to and rate the accuracy of the children’s pronunciations of the probe words, which were recorded before and after each half-hour session with STAR over the course of the experiment.