Summary
Objective
Adductor spasmodic dysphonia (AdSD) is a neurogenic dystonia that causes spasms
of the laryngeal muscles. The disorder mainly affects the production of connected speech.
To understand how AdSD affects vocal fold (VF) movements and, hence, the speech signal,
it is necessary to study VF kinematics during running speech. This paper introduces
an automated method for analyzing VF vibrations in AdSD using laryngeal high-speed
videoendoscopy (HSV) during running speech.
Methods
A monochrome HSV system was used to obtain video recordings from vocally normal individuals
and AdSD patients during production of the six CAPE-V sentences and the “Rainbow Passage.”
A deep neural network based on the U-Net architecture was designed for glottal area
segmentation in HSV data, providing a tool for quantitative analysis of VF vibrations
in both vocally normal individuals and AdSD patients. The network was trained and validated
on manually labeled HSV frames. After training, the segmentation quality was evaluated
quantitatively against the results of visual analysis on a test dataset that included
individual HSV frames and a short sequence of VF vibrations in consecutive frames.
Results
The convolutional network was trained successfully and segmented the test dataset
accurately, with a mean Intersection over Union (IoU) of 0.81 and a mean Boundary-F1
score of 0.93. Moreover, visual assessment of the automated technique showed accurate
detection of the glottal edges/area in the HSV data, even with challenging image quality
and the excessive laryngeal maneuvers of AdSD patients during running speech.
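The two reported metrics can be illustrated with a minimal sketch. It assumes binary glottal masks stored as NumPy arrays, a 4-connected definition of the mask boundary, and a pixel tolerance `tol` for boundary matching. The function names and the tolerance value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def iou(pred, truth):
    """Intersection over Union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        # Both masks empty (e.g. closed glottis in both): assumed perfect agreement.
        return 1.0
    return np.logical_and(pred, truth).sum() / union


def boundary(mask):
    """Foreground pixels that have at least one 4-connected background neighbour."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    # A pixel is interior when its up, down, left and right neighbours are all foreground.
    interior = (
        padded[:-2, 1:-1] & padded[2:, 1:-1] & padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    return mask & ~interior


def boundary_f1(pred, truth, tol=2.0):
    """Boundary F1: harmonic mean of boundary precision and recall within `tol` pixels."""
    bp = np.argwhere(boundary(pred))
    bt = np.argwhere(boundary(truth))
    if len(bp) == 0 and len(bt) == 0:
        return 1.0
    if len(bp) == 0 or len(bt) == 0:
        return 0.0
    # Pairwise Euclidean distances between predicted and reference boundary pixels.
    dist = np.linalg.norm(bp[:, None, :] - bt[None, :, :], axis=2)
    precision = (dist.min(axis=1) <= tol).mean()
    recall = (dist.min(axis=0) <= tol).mean()
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction is one pixel column wider than the reference mask.
truth = np.zeros((8, 8), dtype=bool)
truth[2:6, 3:5] = True
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 3:6] = True
print(iou(pred, truth), boundary_f1(pred, truth, tol=1.0))
```

In this sketch, IoU penalizes any mismatch in area, whereas Boundary-F1 tolerates small edge offsets up to `tol` pixels. This is one reason a boundary score can exceed IoU on small structures such as the glottis.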
Conclusion
The introduced automated approach provides an accurate representation of the glottal
edges/area during connected speech in HSV data for vocally normal individuals and AdSD
patients. This method facilitates the development of HSV-based measures to quantify
VF dynamics in AdSD. Automated HSV analysis of VF vibrations in AdSD can help clarify
the vocal mechanisms and characteristics of the disorder.
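As one hypothetical example of an HSV-based measure that could be built on the segmentation output, the sketch below computes a glottal area waveform (segmented glottal pixels per frame) from a stack of binary masks. The array layout `(frames, height, width)` and the function names are assumptions for illustration. The abstract does not say which measures the authors derive.

```python
import numpy as np


def glottal_area_waveform(masks):
    """Number of segmented glottal pixels in each frame of a (frames, H, W) mask stack."""
    masks = np.asarray(masks, dtype=bool)
    return masks.reshape(masks.shape[0], -1).sum(axis=1)


def fraction_of_open_frames(area):
    """Share of frames in which any glottal opening was segmented."""
    area = np.asarray(area)
    return float((area > 0).mean())


# Toy sequence of four 4x4 frames: closed, partly open, open, closed.
masks = np.zeros((4, 4, 4), dtype=bool)
masks[1, 1:3, 2] = True
masks[2, 1:3, 1:3] = True
area = glottal_area_waveform(masks)
print(area, fraction_of_open_frames(area))
```

Pixel counts depend on the endoscope distance and image resolution, so in practice such a waveform would usually be normalized before comparison across recordings.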
Article info
Publication history
Published online: September 23, 2022
Accepted: August 17, 2022
Publication stage
In Press Corrected Proof
Identification
Copyright
© 2022 The Voice Foundation. Published by Elsevier Inc. All rights reserved.