Relating Cepstral Peak Prominence to Cyclical Parameters of Vocal Fold Vibration from High-Speed Videoendoscopy Using Machine Learning: A Pilot Study

  • Peter S. Popolo
    Correspondence
    Address correspondence and reprint requests to Peter S. Popolo, Department of Communication Sciences and Disorders, Montclair State University, 1515 Broad Street, Building B, Bloomfield, NJ 07003.
    Affiliations
    Department of Communication Sciences and Disorders, Montclair State University, Montclair, New Jersey
  • Aaron M. Johnson
    Affiliations
    Department of Otolaryngology-Head and Neck Surgery, New York State University, New York, New York

      Summary

      Objective

      Smoothed cepstral peak prominence (CPPs) has been shown to be an effective indicator of breathiness (Hillenbrand and Houde, 1996). High-speed videoendoscopy (HSV) is frequently used as a complement to stroboscopy, especially when asymmetric or aperiodic vocal fold vibration is present in dysphonic voices. In an HSV image data set obtained from subjects with normal (nondisordered) voices, we observed some degree of asymmetry in many of the vocal fold displacement curves extracted from the HSV exam videos. We therefore used this data set for a pilot study investigating the relationship of CPPs to cyclical vocal fold vibration parameters, including left-right vocal fold (LRVF) phase asymmetry, in subjects with normal voices.

      Methods

      Twenty subjects with normal (nondisordered) voices produced sustained vowel phonations while undergoing a transoral HSV examination of the vocal folds with synchronized recording of the voice signal. The glottal area waveform (GAW) and the cyclical parameters open quotient (OQ), closed quotient (CQ), speed quotient (SQ), and LRVF skew were extracted from the HSV exam videos, and CPPs measures were obtained from acoustic analysis of the audio recordings. Correlations among the cyclical parameters and CPPs values were investigated using machine learning with the Regression Learner application in the MATLAB© Statistics and Machine Learning Toolbox (version 9.5.0.944444, R2018b, August 28, 2018, (c) 1984-2018, The MathWorks, Inc., Natick, MA).
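
      As an illustration of the cyclical parameters named above, the following minimal Python sketch computes OQ, CQ, and SQ from one cycle of a sampled GAW, plus a simple left-right phase measure. The summary does not state the definitions used in the study, so the sketch assumes common textbook ones: OQ is the fraction of the cycle with the glottis open, CQ = 1 - OQ, and SQ is the opening-phase duration divided by the closing-phase duration. The function names, the area threshold, and the phase measure (difference in time of maximum displacement, normalized by cycle length) are hypothetical and are not taken from the study's analysis software.

```python
def cycle_quotients(area, threshold=0.0):
    """Return (OQ, CQ, SQ) for ONE glottal cycle of uniformly sampled area values.

    Assumed definitions (not confirmed by the source):
      OQ = open frames / total frames in the cycle
      CQ = 1 - OQ
      SQ = opening duration / closing duration, measured from the last closed
           frame before opening to the peak, and from the peak to the first
           closed frame after closing.
    """
    n = len(area)
    open_idx = [i for i, a in enumerate(area) if a > threshold]
    if not open_idx:
        raise ValueError("glottis never opens in this cycle")

    oq = len(open_idx) / n
    cq = 1.0 - oq

    first, last = open_idx[0], open_idx[-1]
    # Frame of maximum glottal area within the open phase.
    peak = max(range(first, last + 1), key=lambda i: area[i])
    opening = peak - (first - 1)   # frames from last closed frame to peak
    closing = (last + 1) - peak    # frames from peak to first closed frame
    sq = opening / closing
    return oq, cq, sq


def phase_asymmetry(left, right):
    """Hypothetical left-right phase measure for one cycle.

    Difference between the frames of maximum left and right fold
    displacement, as a fraction of the cycle length. Zero means the
    folds reach maximum excursion in the same frame.
    """
    if len(left) != len(right):
        raise ValueError("left and right curves must have equal length")
    n = len(left)
    t_left = max(range(n), key=lambda i: left[i])
    t_right = max(range(n), key=lambda i: right[i])
    return (t_left - t_right) / n


# Made-up single-cycle data for illustration only (10 frames per cycle).
gaw = [0.0, 2.0, 4.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
oq, cq, sq = cycle_quotients(gaw)
print(oq, cq, sq)  # 0.5 0.5 0.5 (fast opening, slow closing)

left = [0, 1, 2, 3, 2, 1, 0, 0]
right = [0, 0, 1, 2, 3, 2, 1, 0]
asym = phase_asymmetry(left, right)
print(asym)  # -0.125 (left fold peaks one frame earlier in an 8-frame cycle)
```

      Under these assumed definitions CQ is fully determined by OQ, so including both as predictors in a regression would make them perfectly collinear.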

      Results

      Because the sample size was small, and because the predictor variables may have been multicollinear, the only meaningful result obtained with the data set of 20 normal subjects and four predictor variables, when the model validation feature of the app was turned on to protect against overfitting, was the constant model (ie, the best prediction of CPPs was simply the average of the 20 observations). To investigate the usefulness of the Regression Learner app more fully, however, the validation feature was turned off and 48 more model types were investigated. Although the models obtained in this manner were not necessarily indicative of the best regression model for the current data set, they nevertheless demonstrated the utility of the automated approach for finding a regression model for a larger data set to be collected in the future.
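
      The role of validation described above can be illustrated with a short Python sketch. It compares a constant (mean-only) model with a one-predictor least-squares line on 20 synthetic observations, scoring each model both without validation (resubstitution error on the fitting data) and with leave-one-out cross-validation. The data are random numbers chosen only to resemble an OQ-like predictor and a CPPs-like response with no true relationship; they are not the study's data. Leave-one-out is used here for simplicity and is an assumption, not necessarily the validation scheme selected in the Regression Learner app.

```python
import math
import random


def fit_constant(xs, ys):
    """Constant model: always predict the mean of the training responses."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Ordinary least-squares line with intercept, one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def resubstitution_rmse(fit, xs, ys):
    """Error measured on the same data used for fitting (no validation)."""
    model = fit(xs, ys)
    return rmse([y - model(x) for x, y in zip(xs, ys)])


def loocv_rmse(fit, xs, ys):
    """Leave-one-out cross-validation: refit without each point, then predict it."""
    errors = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(ys[i] - model(xs[i]))
    return rmse(errors)


# Synthetic, hypothetical data: 20 "subjects", predictor unrelated to response.
random.seed(0)
xs = [random.uniform(0.4, 0.7) for _ in range(20)]
ys = [random.gauss(12.0, 1.5) for _ in range(20)]

resub_const = resubstitution_rmse(fit_constant, xs, ys)
resub_line = resubstitution_rmse(fit_line, xs, ys)
cv_const = loocv_rmse(fit_constant, xs, ys)
cv_line = loocv_rmse(fit_line, xs, ys)

print("no validation:  constant %.3f  line %.3f" % (resub_const, resub_line))
print("leave-one-out:  constant %.3f  line %.3f" % (cv_const, cv_line))
```

      Without validation, the fitted line can never score worse than the constant model on the fitting data, so added flexibility always appears to help. With validation, each model's error is at least as large as its resubstitution error, and a more flexible model on a small sample may no longer beat the mean, which is consistent with the constant model being the only meaningful result when validation was enabled.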

      Conclusion

      Further work is warranted to collect a data set from a larger sample of patients with disordered voices that are breathy and/or rough. We speculate that a correlation between CPPs and cyclical parameters of vocal fold vibration may be more evident in disordered voices, because greater asymmetry in LRVF displacement is expected, with a corresponding effect on the acoustic voice signal.

      References

        • Behrman A
        Speech and Voice Science.
        Plural Publishing, Inc., San Diego, CA, 2018
        • Mehta DD
        • Zeitels SM
        • Burns JA
        • et al.
        High-speed videoendoscopic analysis of relationships between cepstral-based acoustic measures and voice production mechanisms in patients undergoing phonomicrosurgery.
        Ann Otol Rhinol Laryngol. 2012; 121: 341-347
        • Mehta DD
        • Deliyski DD
        • Zeitels SM
        • et al.
        Voice production mechanisms following phonosurgical treatment of early glottic cancer.
        Ann Otol Rhinol Laryngol. 2010; 119: 1-9. https://doi.org/10.1177/000348941011900101
        • Verdonck-De Leeuw IM
        • Festen JM
        • Mahieu HF
        Deviant vocal fold vibration as observed during videokymography: the effect on voice quality.
        J Voice. 2001; 15: 313-322. https://doi.org/10.1016/S0892-1997(01)00033-9
        • Niimi S
        • Miyaji M
        Vocal fold vibration and voice quality. The 75th IALP Anniversary Göteborg Composium August 26-29, 1999.
        Folia Phoniatr Logop. 2000; 52: 32-38
        • Khosla S
        • Murugappan S
        • Gutmark E
        What can vortices tell us about vocal fold vibration and voice production.
        Curr Opin Otolaryngol Head Neck Surg. 2008; 16: 183-187. https://doi.org/10.1097/MOO.0b013e3282ff5fc5
        • Khosla S
        • Murugappan S
        • Paniello R
        • et al.
        Role of vortices in voice production: normal versus asymmetric tension.
        Laryngoscope. 2009; 119: 216-221. https://doi.org/10.1002/lary.20026
        • Hillenbrand J
        • Cleveland RA
        • Erickson RL
        Acoustic correlates of breathy vocal quality.
        J Speech Hear Res. 1994; 37: 769-778
        • Hillenbrand J
        • Houde RA
        Acoustic correlates of breathy vocal quality: dysphonic voices and continuous speech.
        J Speech, Lang Hear Res. 1996; 39: 311-321. https://doi.org/10.1044/jshr.3902.311
        • Patel RR
        • Awan SN
        • Barkmeier-Kraemer J
        • et al.
        Recommended protocols for instrumental assessment of voice: American speech-language-hearing association expert panel to develop a protocol for instrumental assessment of vocal function.
        Am J Speech-Language Pathol. 2018; 27: 887-905. https://doi.org/10.1044/2018_AJSLP-17-0009
        • Haben CM
        • Kost K
        • Papagiannis G
        Lateral phase mucosal wave asymmetries in the clinical voice laboratory.
        J Voice. 2003; 17: 3-11. https://doi.org/10.1016/S0892-1997(03)00032-8
        • Bonilha HS
        • Deliyski DD
        • Gerlach TT
        Phase asymmetries in normophonic speakers: visual judgments and objective findings.
        Am J Speech-Language Pathol. 2008; 17: 367-376. https://doi.org/10.1044/1058-0360(2008/07-0059)
        • Ciaburro G
        Matlab for Machine Learning.
        Packt Publishing Ltd., Birmingham, UK, 2017
        • Kleinbaum DG
        • Kupper LL
        • Muller KE
        • et al.
        Applied Regression Analysis and Other Multivariable Methods.
        3rd ed. Duxbury Press, Pacific Grove, CA, 1998
        • Glantz SA
        • Slinker BK
        Primer of Applied Regression and Analysis of Variance.
        2nd ed. McGraw-Hill, New York, NY, 2001
        • Frost J
        Overfitting regression models: problems, detection, and avoidance.
        Statistics by Jim: Making Statistics Intuitive. Available at: https://statisticsbyjim.com/regression/overfitting-regression-models/#comment-4753. Accessed September 26, 2019.