Home
This Title All WIREs
WIREs RSS Feed
How to cite this WIREs title:
WIREs Cogn Sci
Impact Factor: 3.175

Speech perception and production

Full article on Wiley Online Library:   HTML PDF

Can't access this content? Tell your librarian.

Abstract Until recently, research in speech perception and speech production has largely focused on the search for psychological and phonetic evidence of discrete, abstract, context‐free symbolic units corresponding to phonological segments or phonemes. Despite this common conceptual goal and intimately related objects of study, however, research in these two domains of speech communication has progressed more or less independently for more than 60 years. In this article, we present an overview of the foundational works and current trends in the two fields, specifically discussing the progress made in both lines of inquiry as well as the basic fundamental issues that neither has been able to resolve satisfactorily so far. We then discuss theoretical models and recent experimental evidence that point to the deep, pervasive connections between speech perception and production. We conclude that although research focusing on each domain individually has been vital in increasing our basic understanding of spoken language processing, the human capacity for speech communication is so complex that gaining a full understanding will not be possible until speech perception and production are conceptually reunited in a joint approach to problems shared by both modes. Copyright © 2010 John Wiley & Sons, Ltd. This article is categorized under: Linguistics > Language in Mind and Brain Psychology > Language

Speech waveform of the words typical and yesteryear as produced by an adult male speaker, representing variations in amplitude over time. Vowels are generally the most resonant speech component, corresponding to the most extreme amplitude levels seen here. The identifying formant frequency information in the acoustics is not readily accessible from visual inspection of waveforms such as these.

[ Normal View | Magnified View ]

A sagittal view of the human vocal tract showing the main speech articulators as labeled. (Reprinted with permission from Ref. 70 Copyright 2001 Blackwell Publishers Inc.)

[ Normal View | Magnified View ]

Observations from early perceptual speech cue studies. In the first case, two different acoustic signals (consonant/vowel formant frequency transitions) result in the same percept. In the latter case, identical acoustics (release burst at 1440 Hz) result in two different percepts, depending on the vocalic context. In both cases, however, perception reflects articulatory, rather than acoustic, contrast. (Adapted and reprinted with permission from Ref. 29 Copyright 1996 American Institute of Physics.)

[ Normal View | Magnified View ]

Data for a single subject from a categorical perception experiment. The upper panel gives labeling or identification data for each step on a [b]/[g] place‐of‐articulation continuum. The lower graph gives this subject's ABX discrimination data (filled circles) for the same stimuli with one step difference between pairs, as well as the predicted discrimination performance (open circles). Discrimination accuracy is high at category boundaries and low within categories, as predicted. (Reprinted with permission from Ref. 9 Copyright 1957 American Psychological Association.)

[ Normal View | Magnified View ]

Top panel: A diagram of the principles and components at work in the Haskins Pattern Playback speech synthesizer. (Reprinted with permission from Ref. 68 Copyright 1951 national Academies of Science.) Lower panel: A series of hand‐painted schematic spectrographic patterns used as input to the Haskins Pattern Playback in early research on perceptual ‘speech cues.’ (Reprinted with permission from Ref. 69 Copyright 1957 American Institute of Physics.)

[ Normal View | Magnified View ]

A wide‐band speech spectrogram of the same utterance as in Figure 1, showing the change in component frequencies over time. Frequency is represented along the y‐axis and time on the x‐axis. Darkness corresponds to greater signal strength at the corresponding frequency and time.

[ Normal View | Magnified View ]

Related Articles

Mirror systems
Cognitive Science: Overviews

Browse by Topic

Psychology > Language
Linguistics > Language in Mind and Brain

Access to this WIREs title is by subscription only.

Recommend to Your
Librarian Now!

The latest WIREs articles in your inbox

Sign Up for Article Alerts