Pathological Voice Detection Based on Phase Reconstitution and Convolutional Neural Network

Published: October 21, 2022
DOI: https://doi.org/10.1016/j.jvoice.2022.08.028

      Summary

      Nonlinear dynamic features can effectively describe the acoustic characteristics of normal and pathological voices. In this paper, phase space reconstruction and a convolutional neural network are used to classify normal and pathological voices. The phase space of each voice signal is reconstructed using a delay time and an embedding dimension, which converts the one-dimensional signal into a two-dimensional matrix from which a reconstructed trajectory graph is generated. The trajectory graphs are used as the input to a VGG-like convolutional neural network, which extracts graphical features to classify normal and pathological voices. To compensate for the scarcity of clinical data, a data augmentation scheme is used. Normal-versus-pathological classification experiments are carried out separately on three databases: the Massachusetts Eye and Ear Infirmary (MEEI) database, the Saarbrücken Voice Database (SVD), and a clinical database collected by the authors. With five-fold cross-validation, the average recognition rates on the three databases are 99.42%, 97.30%, and 95.88%, respectively. For the classification of normal, vocal fold paralysis, and vocal fold non-paralysis voices, the average recognition rates are 96.04% on the MEEI database and 92.27% on the SVD. The experimental results show that the method achieves a high recognition rate and good robustness, and that it has a degree of general applicability to the recognition of normal and pathological voices.
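
      The reconstruction step described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name delay_embed, the toy sine input, the values tau=20 and m=2, and the 64 x 64 rasterization with np.histogram2d are all assumptions chosen only to show how a delay time and an embedding dimension turn a one-dimensional signal into a two-dimensional matrix and then into an image-like sample.

```python
import numpy as np

def delay_embed(signal, tau, m):
    """Delay-embed a 1-D signal: row i is [x[i], x[i+tau], ..., x[i+(m-1)*tau]]."""
    x = np.asarray(signal, dtype=float)
    n_rows = x.size - (m - 1) * tau
    if n_rows <= 0:
        raise ValueError("signal too short for the chosen tau and m")
    # Column j holds the signal shifted by j*tau samples.
    return np.stack([x[j * tau : j * tau + n_rows] for j in range(m)], axis=1)

# Toy input (assumption): a 100 Hz sine sampled at 8 kHz stands in for a voiced frame.
t = np.arange(800) / 8000.0
frame = np.sin(2 * np.pi * 100 * t)

# tau and m are illustrative values, not the ones selected in the paper.
traj = delay_embed(frame, tau=20, m=2)

# One possible way (assumption) to turn the 2-D trajectory into an image:
# mark every cell of a 64 x 64 grid that the trajectory visits.
hist, _, _ = np.histogram2d(
    traj[:, 0], traj[:, 1], bins=64, range=[[-1.05, 1.05], [-1.05, 1.05]]
)
image = (hist > 0).astype(np.float32)
```

      With a quarter-period delay, the sine traces a circle in the reconstructed plane; an irregular signal would produce a more scattered trajectory, which is the kind of graphical difference an image classifier could pick up.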

