Spatial Segmentation for Laryngeal High-Speed Videoendoscopy in Connected Speech

Published: November 27, 2020



      This study proposes a new computational framework for automated spatial segmentation of the vocal fold edges in high-speed videoendoscopy (HSV) data during connected speech. The resulting spatiotemporal representation of the vocal fold edges enables HSV-based measurement of the glottal area waveform and other vibratory characteristics in the context of running speech.


      HSV data were obtained from a vocally normal adult during production of the “Rainbow Passage.” An algorithm based on active contour modeling was developed to analyze the HSV data. The algorithm was applied to a series of HSV kymograms taken at different intersections of the vocal folds to detect the edges of the vibrating vocal folds across the frames. The active contours deform according to a set of rules and settle on the vocal fold edges through an energy optimization procedure. The edges detected in the kymograms were then registered back to the HSV frames. Finally, the glottal area waveform was calculated from the area of the glottis enclosed by the vocal fold edges in each frame.
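      The kymogram-based edge detection described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the glottal axis is vertical in the image (so each horizontal image row is one cross section), treats the glottis as darker than the surrounding tissue, and swaps in a simple dynamic-programming energy minimization with a quadratic smoothness term in place of the study's deformation rules. The function names, the `polarity`, `alpha`, and `max_step` parameters, and the backward-difference gradient are all assumptions made for illustration.

```python
def extract_kymogram(frames, row):
    """Stack the same image row from every frame: result[t][x].

    frames[t][row][x] is the pixel intensity at frame t (hypothetical layout).
    """
    return [list(frame[row]) for frame in frames]


def trace_edge(kymogram, polarity, alpha=1.0, max_step=2):
    """Trace one edge through a kymogram by minimizing a contour energy.

    polarity=-1 follows bright-to-dark transitions (left glottal edge),
    polarity=+1 follows dark-to-bright transitions (right glottal edge).
    Returns one column index per time step.
    """
    n_t, n_x = len(kymogram), len(kymogram[0])

    def external(t, x):
        # Backward-difference intensity gradient along the scan line.
        # Edges of the requested polarity receive low (negative) energy.
        grad = kymogram[t][x] - kymogram[t][x - 1] if x > 0 else 0
        return -polarity * grad

    # cost[t][x]: lowest total energy of any contour ending at column x at time t
    cost = [[external(0, x) for x in range(n_x)]]
    back = []  # back[t - 1][x]: best predecessor column for (t, x)
    for t in range(1, n_t):
        row_cost, row_back = [], []
        for x in range(n_x):
            best_prev, best_val = x, float("inf")
            lo, hi = max(0, x - max_step), min(n_x, x + max_step + 1)
            for px in range(lo, hi):
                # Internal energy penalizes jumps between consecutive frames.
                val = cost[t - 1][px] + alpha * (x - px) ** 2
                if val < best_val:
                    best_prev, best_val = px, val
            row_cost.append(best_val + external(t, x))
            row_back.append(best_prev)
        cost.append(row_cost)
        back.append(row_back)

    # Backtrack from the lowest-energy end point to recover the contour.
    x = min(range(n_x), key=lambda i: cost[-1][i])
    path = [x]
    for t in range(n_t - 1, 0, -1):
        x = back[t - 1][x]
        path.append(x)
    path.reverse()
    return path
```

      Calling `trace_edge` once with `polarity=-1` and once with `polarity=+1` on each kymogram gives a left and a right edge trajectory per cross section. With the backward-difference gradient used here, the left trajectory marks the first dark (glottal) pixel and the right trajectory marks the first bright pixel past the glottis. The sketch always returns a trajectory, so frames in which the glottis is closed would need separate handling.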


      The developed algorithm successfully captured the edges of the vocal folds in the HSV kymograms. This method enabled automated measurement of the glottal area waveform from the HSV frames during vocalizations in connected speech.
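      As a rough illustration of how a glottal area waveform can follow from edges detected in kymograms, the sketch below registers per-cross-section edge trajectories back to frames and sums the glottal width over cross sections. It is a simplified stand-in, not the study's implementation: the function names, the `pixel_area` scale factor, and the convention that the left edge is the first glottal pixel and the right edge is the first tissue pixel past the glottis are assumptions.

```python
def register_to_frames(edge_paths):
    """Turn per-cross-section trajectories into per-frame edge lists.

    edge_paths[k][t] is the edge column for cross section k at frame t;
    the result is indexed as edges[t][k].
    """
    return [list(per_frame) for per_frame in zip(*edge_paths)]


def glottal_area_waveform(left_edges, right_edges, pixel_area=1.0):
    """Compute one glottal area value per frame.

    left_edges[t][k] and right_edges[t][k] are edge columns at frame t and
    cross section k. Each cross section contributes its width in pixels;
    a crossed or touching pair of edges counts as a closed glottis (zero).
    """
    waveform = []
    for lefts, rights in zip(left_edges, right_edges):
        width_sum = sum(max(r - l, 0) for l, r in zip(lefts, rights))
        waveform.append(width_sum * pixel_area)
    return waveform


# Hypothetical trajectories: two cross sections tracked over three frames.
left_paths = [[5, 4, 3], [5, 5, 4]]
right_paths = [[7, 8, 9], [6, 7, 8]]
gaw = glottal_area_waveform(
    register_to_frames(left_paths), register_to_frames(right_paths)
)
```

      Here the area is expressed in pixels; converting it to physical units would require a calibration that this sketch does not attempt.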


      The proposed algorithm provides an automated method for spatial segmentation of the vocal folds in HSV data during connected speech. This study is one of the initial steps toward developing HSV-based measures for studying vocal fold vibratory characteristics and voice production mechanisms, in both normal and disordered voices, in the context of connected speech.

