Research Article|Articles in Press

Perceptual and Computational Estimates of Vocal Breathiness and Roughness in Sustained Phonation and Connected Speech

  • Supraja Anand
    Address correspondence and reprint requests to: Supraja Anand, 4202 E. Fowler Avenue, PCD 1017, Tampa, FL, 33620.
    Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida



      Clinical assessment of voice quality (VQ) often uses a combination of sustained phonations and more prolonged and more complex vocalizations. The purpose of this study was to compare the perceived vocal breathiness and vocal roughness of sustained phonations and connected speech over a wide range of dysphonia severity and to evaluate their relationship with acoustic measures and bioinspired models of breathiness and roughness.


      A VQ dimension-specific single-variable matching task (SVMT) was used to index the perceived breathiness or roughness of five male and five female talkers on the basis of a sustained /a/ phonation and the 5th CAPE-V sentence. Acoustic measures (cepstral peak and autocorrelation peak) and psychoacoustic measures (pitch strength and temporal envelope standard deviation [EnvSD]) were used to predict perceived breathiness and roughness, respectively, as judged by 10 listeners.
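      To make one of the acoustic measures named above concrete, the following is a minimal sketch of a normalized autocorrelation peak computed on a single frame. It is an illustration only: the abstract does not describe the study's implementation, so the function name, the biased normalization, and the 60-400 Hz search range are all assumptions.

```python
import math
import random


def autocorrelation_peak(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Return (peak_value, f0_estimate) for one frame of samples.

    Hypothetical helper: the peak of the normalized autocorrelation is
    searched over lags that correspond to f0_min..f0_max Hz (assumed range).
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)         # zero-lag autocorrelation
    if energy == 0.0:
        return 0.0, 0.0                    # silent frame: no periodicity
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(n - 1, int(sample_rate / f0_min))
    best_value, best_lag = -1.0, lag_min
    for lag in range(lag_min, lag_max + 1):
        # Biased estimate, normalized by the zero-lag energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_value:
            best_value, best_lag = r, lag
    return best_value, sample_rate / best_lag


# Synthetic test signals (not study data): a 100 Hz sine and white noise.
SAMPLE_RATE = 8000
sine = [math.sin(2 * math.pi * 100 * i / SAMPLE_RATE) for i in range(800)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(800)]

sine_peak, sine_f0 = autocorrelation_peak(sine, SAMPLE_RATE)
noise_peak, _ = autocorrelation_peak(noise, SAMPLE_RATE)
```

      In this sketch a strongly periodic signal gives a peak close to 1 and an aperiodic signal gives a peak close to 0. That is the intuition behind using such a measure as a correlate of dysphonic voice quality, although the measure used in the study may differ in its details.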


      High intra- and inter-listener reliability was observed for sustained phonations and connected speech. Perceived breathiness and roughness of sustained vowels and sentences obtained using SVMT were highly correlated for most dysphonic voices. The pitch strength model of breathiness captured a larger amount of perceptual variance than cepstral peak in both vowels and sentences. Autocorrelation peak was strongly correlated with perceived roughness in sentences, while EnvSD was strongly correlated with perceived roughness in vowels.
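      The comparison of "perceptual variance captured" by two predictors can be sketched as a squared Pearson correlation between each predictor and the perceptual judgments. This is a hedged illustration, not the study's analysis: the abstract does not state the statistical model used, the function names are hypothetical, and the numbers below are made-up toy values rather than data from the article.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def variance_explained(predictor, perceived):
    """Proportion of variance in `perceived` shared with `predictor` (r squared)."""
    return pearson_r(predictor, perceived) ** 2


# Toy values for illustration only; these are NOT results from the study.
perceived = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical matching-task scores
predictor_a = [2.0, 4.0, 6.0, 8.0, 10.0]   # hypothetical model output
predictor_b = [1.0, 3.0, 2.0, 5.0, 4.0]    # hypothetical acoustic measure

r2_a = variance_explained(predictor_a, perceived)
r2_b = variance_explained(predictor_b, perceived)
```

      With these toy values the first predictor shares more variance with the judgments than the second, which is the sense in which one measure is said to capture more perceptual variance than another.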


      Results provide evidence that perception of VQ via SVMT can be successfully extended to connected speech. Computational models of VQ can be easily adapted to connected speech. Such automated models of VQ perception are valuable due to their computational efficiency and their ability to accurately capture the non-linearities of the human auditory system.



