How to cite this WIREs title:
WIREs Cogn Sci
Impact Factor: 1.413

Speaker perception

While humans use their voice mainly for communicating information about the world, paralinguistic cues in the voice signal convey rich dynamic information about a speaker's arousal and emotional state, and extralinguistic cues reflect more stable speaker characteristics including identity, biological sex and social gender, socioeconomic or regional background, and age. Here we review the anatomical and physiological bases for individual differences in the human voice, before discussing how recent methodological progress in voice morphing and voice synthesis has promoted research on current theoretical issues, such as how voices are mentally represented in the human brain. Special attention is dedicated to the distinction between the recognition of familiar and unfamiliar speakers, in everyday situations or in the forensic context, and to the processes and representational changes that accompany the learning of new voices. We describe how specific impairments and individual differences in voice perception could relate to specific brain correlates. Finally, we consider that voices are produced by speakers who are often visible during communication, and review recent evidence that shows how speaker perception involves dynamic face–voice integration. The representation of para‐ and extralinguistic vocal information plays a major role in person perception and social communication, could be neuronally encoded in a prototype‐referenced manner, and is subject to flexible adaptive recalibration as a result of specific perceptual experience. WIREs Cogn Sci 2014, 5:15–25. doi: 10.1002/wcs.1261

Conflict of interest: The authors have declared no conflicts of interest for this article.

A short excerpt (heute ist ‘today is’) of speech from a male speaker illustrating typical acoustic parameters measured to assess voice similarity. (a) Sound pressure wave. (b) Excerpt of oscillogram showing six periods of voicing. Different jitter measures characterize local differences in the duration of the periods (blue curved arrows). Shimmer measures characterize local differences in period amplitude (vertical red arrows). (c) Formant frequencies (here F1‐F6) can be estimated from the time–frequency–intensity display (spectrogram). (d) The individual peaks in the spectral slice represent the individual harmonics (whole integer multiples of f0). The difference between the strength of the first and second harmonics (H1‐H2) is an important indication of breathiness in the voice source.
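The measures named in this caption can be illustrated with a minimal Python sketch. It assumes one common "local" definition of jitter and shimmer (mean absolute difference between consecutive periods, divided by the mean), which is not necessarily the variant used by the authors. The function names and all numeric values are hypothetical and chosen only for illustration; they are not measurements from the figure.

```python
import math
from statistics import mean


def local_jitter(periods):
    """Mean absolute difference between consecutive period durations,
    relative to the mean period (assumed 'local jitter' definition)."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)


def local_shimmer(amplitudes):
    """Same idea as local_jitter, applied to per-period peak amplitudes."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return mean(diffs) / mean(amplitudes)


def harmonic_frequencies(f0, n):
    """First n harmonics: whole integer multiples of f0."""
    return [k * f0 for k in range(1, n + 1)]


def h1_minus_h2_db(h1_amplitude, h2_amplitude):
    """Level difference in dB between the first and second harmonic,
    computed from linear amplitudes."""
    return 20 * math.log10(h1_amplitude / h2_amplitude)


# Hypothetical values for six periods of voicing (durations in seconds,
# amplitudes in arbitrary linear units).
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100, 0.0098]
amplitudes = [1.0, 0.9, 1.1, 1.0, 0.95, 1.05]

jitter = local_jitter(periods)
shimmer = local_shimmer(amplitudes)
harmonics = harmonic_frequencies(120.0, 3)
h1_h2 = h1_minus_h2_db(2.0, 1.0)
```

In this sketch a larger H1‐H2 value means a first harmonic that is strong relative to the second, which the caption describes as an indication of breathiness in the voice source.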
(a) Independent of speaker identity correspondence, a frontocentral ERP negativity to dynamic and time‐synchronized audiovisual face‐voice stimuli emerges around 50–80 milliseconds (arrow), substantially earlier than in the algebraic sum of the ERPs to the same stimuli presented unimodally. Note: Electrodes C3 and C4 are located 20% to the left and right of the vertex (top of the head; the percentage is relative to the distance between the two pre‐auricular points when measured across the vertex), and thus approximately over the left and right hemispheric central sulci, respectively. (b) Scalp voltage maps of the speaker correspondence effect (difference between the audiovisual (AV) noncorresponding and AV corresponding conditions, noncorresponding minus corresponding). Reliable correspondence effects first emerge around 250 milliseconds, when a central negativity is seen for noncorresponding pairs. This negativity then increases and shifts to a right frontotemporal maximum between 600 and 1200 milliseconds. (Reprinted with permission from Ref . Copyright 2011 Elsevier)
Voice‐sensitive (anterior, middle, and posterior STS regions depicted as blue, red, and green spheres) and face‐sensitive brain areas (FFA, yellow sphere) and direct connections between them, as found with probabilistic fiber tracking. Panels A to D show different views. Data are from one representative participant of a larger study. Results of that study suggested particularly prominent connections between the FFA and more anterior voice‐sensitive areas in superior temporal cortex. (Reprinted with permission from Ref . Copyright 2011 Society for Neuroscience)

Browse by Topic

Linguistics > Language in Mind and Brain
Psychology > Perception and Psychophysics
Neuroscience > Cognition