Research Article | Volume 34, Issue 3, P487.e1-487.e9, May 2020

Crossing Gender Borders: Bidirectional Dynamic Interaction Between Face-Based and Voice-Based Gender Categorization

Published: October 27, 2018


The processing of voices and faces is known to interact, for example, when recognizing other people. However, few studies have examined both directions of this interaction, including the influence of incongruent visual stimulation on voice perception. In the present study, we implemented an interference paradigm involving 1152 videos of faces paired with either gender-congruent or gender-incongruent voices. Participants categorized the gender of either the face or the voice via key press. Task (face-based vs. voice-based gender categorization) was manipulated both block-wise (relatively low executive control demands) and within a mixed block (relatively high executive control demands due to trial-by-trial task switches). We aimed to test whether and how gender-incongruent stimuli impaired the speed and accuracy of gender categorization. The results indicate significant congruency effects in both directions: gender-incongruent visual information slowed voice categorization and increased errors, and gender-incongruent voices likewise affected face categorization. However, the former effect (faces interfering with voice categorization) was stronger, supporting theories that postulate visual dominance in face-voice integration. Congruency effects were not significantly reduced over the course of the experiment and were larger under high executive control demands (task switches), suggesting that fewer attentional resources were available for resolving incongruency. Overall, voices appear to be processed in conjunction with facial information, which yields enhanced processing for more authentic voices, that is, voices that do not violate face-based expectancies. The data strengthen theories of face-voice processing that emphasize strong interaction between the two processing channels.
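As an illustration of the congruency effect described above, the following minimal sketch computes it as the mean response time on gender-incongruent trials minus the mean on gender-congruent trials, separately for each task and block type. It is not the authors' analysis code: the field names (`task`, `block`, `congruent`, `rt_ms`), the labels `blocked` and `mixed`, and every response time are invented placeholders, chosen only so that the arithmetic is easy to follow.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial records (NOT data from the study). Each trial stores the
# judged modality, the block type, whether face and voice gender matched,
# and a response time in milliseconds.
trials = [
    {"task": "voice", "block": "mixed",   "congruent": True,  "rt_ms": 600},
    {"task": "voice", "block": "mixed",   "congruent": False, "rt_ms": 700},
    {"task": "voice", "block": "blocked", "congruent": True,  "rt_ms": 550},
    {"task": "voice", "block": "blocked", "congruent": False, "rt_ms": 610},
    {"task": "face",  "block": "mixed",   "congruent": True,  "rt_ms": 580},
    {"task": "face",  "block": "mixed",   "congruent": False, "rt_ms": 620},
    {"task": "face",  "block": "blocked", "congruent": True,  "rt_ms": 520},
    {"task": "face",  "block": "blocked", "congruent": False, "rt_ms": 540},
]


def congruency_effects(trials):
    """Return {(task, block): mean incongruent RT - mean congruent RT}."""
    # Group response times by design cell, split by congruency.
    cells = defaultdict(lambda: {True: [], False: []})
    for t in trials:
        cells[(t["task"], t["block"])][t["congruent"]].append(t["rt_ms"])
    # Only cells with both trial types yield a defined effect.
    return {
        cell: mean(rts[False]) - mean(rts[True])
        for cell, rts in cells.items()
        if rts[True] and rts[False]
    }


effects = congruency_effects(trials)
for (task, block), effect in sorted(effects.items()):
    print(f"{task:5s} {block:7s} congruency effect: {effect} ms")
```

A positive value means that incongruent trials were slower than congruent ones in that cell. The same difference could be computed on error rates instead of response times; a real analysis would also aggregate per participant before any statistical test.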
