
Deep-Learning-Based Representation of Vocal Fold Dynamics in Adductor Spasmodic Dysphonia during Connected Speech in High-Speed Videoendoscopy

Published: September 23, 2022. DOI: https://doi.org/10.1016/j.jvoice.2022.08.022

      Summary

      Objective

      Adductor spasmodic dysphonia (AdSD) is a neurogenic dystonia that causes spasms of the laryngeal muscles. The disorder mainly affects the production of connected speech. To understand how AdSD affects vocal fold (VF) movements and, hence, the speech signal, it is necessary to study VF kinematics during running speech. This paper introduces an automated method for analyzing VF vibrations in AdSD using laryngeal high-speed videoendoscopy (HSV) during running speech.

      Methods

      A monochrome HSV system was used to record vocally normal individuals and AdSD patients as they produced the six CAPE-V sentences and the “Rainbow Passage.” A deep neural network based on the U-Net architecture was designed for glottal area segmentation in HSV data, providing a tool for quantitative analysis of VF vibrations in both vocally normal speakers and AdSD patients. The network was trained and validated on manually labeled HSV frames. After training, segmentation quality was evaluated quantitatively against visual analysis results for a test dataset comprising isolated HSV frames and a short sequence of VF vibrations in consecutive frames.
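
The abstract does not give the network's depth, filter counts, loss, or training settings, so the following is only a minimal NumPy sketch of the U-Net idea it names: an encoder that downsamples, a decoder that upsamples, and a skip connection that concatenates same-resolution encoder features before a per-pixel sigmoid produces a glottis probability map. It assumes a single resolution level, hypothetical channel widths (`base=8`), and random, untrained weights; all helper names (`conv3x3`, `unet_forward`, and so on) are illustrative, and a practical model would be built and trained in a deep-learning framework.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3x3(x, w, b):
    """Zero-padded ('same') 3x3 convolution (cross-correlation, as in CNNs).

    x: (C_in, H, W), w: (C_out, C_in, 3, 3), b: (C_out,) -> (C_out, H, W)
    """
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    windows = sliding_window_view(xp, (3, 3), axis=(1, 2))  # (C_in, H, W, 3, 3)
    return np.einsum("chwij,ocij->ohw", windows, w) + b[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)


def maxpool2(x):
    """2x2 max pooling (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def upsample2(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def init_params(rng, base=8):
    """Random He-style weights for a one-level U-Net (hypothetical widths)."""
    def conv(c_out, c_in):
        w = rng.normal(0.0, np.sqrt(2.0 / (9 * c_in)), size=(c_out, c_in, 3, 3))
        return w, np.zeros(c_out)

    return {
        "enc": conv(base, 1),                 # encoder, full resolution
        "bottleneck": conv(2 * base, base),   # half resolution
        "dec": conv(base, 2 * base + base),   # after concatenating the skip
        "head": (rng.normal(0.0, 0.1, size=(1, base)), np.zeros(1)),  # 1x1 conv
    }


def unet_forward(image, params):
    """Map an (H, W) frame to an (H, W) map of glottis probabilities."""
    skip = relu(conv3x3(image[None], *params["enc"]))            # (base, H, W)
    deep = relu(conv3x3(maxpool2(skip), *params["bottleneck"]))  # (2*base, H/2, W/2)
    merged = np.concatenate([upsample2(deep), skip], axis=0)     # skip connection
    dec = relu(conv3x3(merged, *params["dec"]))                  # (base, H, W)
    w_head, b_head = params["head"]
    logits = np.einsum("oc,chw->ohw", w_head, dec)[0] + b_head[0]
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
frame = rng.random((64, 64))          # stand-in for a normalized grayscale HSV frame
prob = unet_forward(frame, init_params(rng))
mask = prob > 0.5                     # binary glottal-area mask
print(prob.shape, mask.dtype)         # (64, 64) bool
```

The skip connection is the design choice that matters for thin structures such as the glottal gap: the decoder sees full-resolution edge detail that pooling would otherwise discard.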

      Results

      The developed convolutional network was successfully trained and segmented the test dataset accurately, with a mean Intersection over Union (IoU) of 0.81 and a mean Boundary-F1 score of 0.93. Moreover, visual assessment showed that the automated technique accurately detected the glottal edges/area in the HSV data, even with challenging image quality and the excessive laryngeal maneuvers of AdSD patients during running speech.
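
To make the two reported metrics concrete, here is a minimal NumPy sketch of how IoU and a Boundary-F1 (BF) score can be computed for a predicted mask and a manually labeled mask, then averaged over test frames. The boundary definition (foreground pixels with a 4-connected background neighbour) and the 2-pixel distance tolerance are assumptions for illustration; the abstract does not state the tolerance the authors used, and implementations differ on this point.

```python
import numpy as np


def iou(pred, gt):
    """Intersection over Union of two binary masks (1.0 when both are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else float(np.logical_and(pred, gt).sum() / union)


def boundary(mask):
    """Foreground pixels that have at least one 4-connected background neighbour."""
    m = np.pad(mask.astype(bool), 1)          # pad with background
    interior = (m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1]
                & m[1:-1, :-2] & m[1:-1, 2:])
    return m[1:-1, 1:-1] & ~interior


def boundary_f1(pred, gt, tol=2.0):
    """F1 of boundary precision and recall, matching pixels within `tol` pixels."""
    pb, gb = np.argwhere(boundary(pred)), np.argwhere(boundary(gt))
    if len(pb) == 0 or len(gb) == 0:
        return 1.0 if len(pb) == len(gb) else 0.0
    # Pairwise Euclidean distances between predicted and reference boundary pixels.
    d = np.sqrt(((pb[:, None, :] - gb[None, :, :]) ** 2).sum(axis=-1))
    precision = np.mean(d.min(axis=1) <= tol)  # predicted pixels near the reference
    recall = np.mean(d.min(axis=0) <= tol)     # reference pixels that were found
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


def mean_scores(preds, gts, tol=2.0):
    """Mean IoU and mean BF score over paired lists of masks (e.g. test frames)."""
    ious = [iou(p, g) for p, g in zip(preds, gts)]
    bfs = [boundary_f1(p, g, tol) for p, g in zip(preds, gts)]
    return float(np.mean(ious)), float(np.mean(bfs))


# Toy example: a narrow 10 x 4 "glottis" and a prediction shifted down by one pixel.
gt = np.zeros((32, 32), dtype=bool)
gt[10:20, 14:18] = True
pred = np.zeros_like(gt)
pred[11:21, 14:18] = True
print(round(iou(pred, gt), 3), boundary_f1(pred, gt))  # 0.818 1.0
```

In the toy example, a one-pixel shift lowers IoU to 36/44 while every boundary pixel stays within the tolerance, so the BF score remains 1.0. This illustrates how a small misalignment can cost more IoU than BF score, particularly for thin structures such as a narrow glottal gap.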

      Conclusion

      The introduced automated approach provides an accurate representation of the glottal edges/area during connected speech in HSV data from vocally normal speakers and AdSD patients. This method facilitates the development of HSV-based measures to quantify VF dynamics in AdSD. Using HSV to analyze VF vibrations in AdSD automatically can help in understanding the vocal mechanisms and characteristics of AdSD.
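
As one illustration of the kind of HSV-based measure such segmentations enable, the sketch below computes a glottal area waveform (GAW), the glottal area in pixels in each frame, from a stack of binary masks, and estimates the dominant vibratory frequency from the peak of its spectrum. The frame rate, the synthetic slit-shaped masks, and the FFT-peak estimator are assumptions chosen for the example and are not taken from the paper; in real connected speech, where vibration starts, stops, and changes frequency, such an estimate would need short analysis windows.

```python
import numpy as np


def glottal_area_waveform(masks):
    """Glottal area in pixels for each frame of a (T, H, W) boolean mask stack."""
    return masks.reshape(len(masks), -1).sum(axis=1).astype(float)


def dominant_frequency(gaw, fps):
    """Frequency in Hz of the strongest non-DC peak in the Hann-windowed GAW spectrum."""
    x = gaw - gaw.mean()                      # remove the mean (DC) area
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    return float(freqs[1 + np.argmax(spectrum[1:])])  # skip bin 0


# Synthetic example: a slit-shaped "glottis" that opens and closes at 200 Hz.
fps = 4000.0                                  # hypothetical HSV frame rate
f0 = 200.0                                    # synthetic vibration frequency
t = np.arange(400) / fps                      # 0.1 s of video
widths = np.round(8 * np.clip(np.sin(2 * np.pi * f0 * t), 0, None)).astype(int)
masks = np.zeros((len(t), 64, 64), dtype=bool)
for k, w in enumerate(widths):
    masks[k, 20:40, 32 - w // 2 : 32 - w // 2 + w] = True  # closed phase: w == 0

gaw = glottal_area_waveform(masks)
print(gaw.max(), round(dominant_frequency(gaw, fps), 1))  # 160.0 200.0
```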


