Research Article | Journal of Voice, Volume 33, Issue 5, pp. 634-641, September 2019

Detection of Pathological Voice Using Cepstrum Vectors: A Deep Learning Approach



      Computerized detection of voice disorders has attracted considerable academic and clinical interest as a potential screening method for voice diseases before endoscopic confirmation. This study proposes a deep-learning-based approach to detecting pathological voices and compares its performance and utility with those of other automatic classification algorithms.


      This study retrospectively collected 60 normal voice samples and 402 pathological voice samples representing 8 common clinical voice disorders at the voice clinic of a tertiary teaching hospital. We extracted Mel-frequency cepstral coefficients (MFCCs) from 3-second samples of a sustained vowel. The performance of three machine learning algorithms, namely a deep neural network (DNN), a support vector machine (SVM), and a Gaussian mixture model (GMM), was evaluated using fivefold cross-validation. Cases from the voice disorder database of the Massachusetts Eye and Ear Infirmary (MEEI) were used to validate the classifiers.
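The feature-extraction step can be illustrated with a minimal numpy sketch of conventional MFCC computation. It assumes a 16 kHz mono recording, 25 ms Hamming frames with a 10 ms hop, pre-emphasis of 0.97, 26 mel filters and 13 coefficients; these are common defaults, not settings reported in the abstract, and the function names are hypothetical. The abstract also does not say which three coefficients were chosen as the "representative" features, so the sketch returns the full per-frame matrix.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale; shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(signal, sr, n_mfcc=13, n_filters=26, frame_ms=25.0, hop_ms=10.0):
    """Return an (n_frames, n_mfcc) matrix of MFCCs for a mono signal."""
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    frame_len = int(round(sr * frame_ms / 1000.0))
    hop = int(round(sr * hop_ms / 1000.0))
    n_fft = 1 << (frame_len - 1).bit_length()  # next power of two >= frame_len
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft   # power spectrum per frame
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    return dct_ii(log_mel, n_mfcc)                              # cepstrum of log mel energies

# Example: a synthetic 3-second "sustained vowel" (150 Hz harmonics) at 16 kHz
sr = 16000
t = np.arange(3 * sr) / sr
vowel = sum(np.sin(2 * np.pi * 150 * h * t) / h for h in range(1, 6))
features = mfcc(vowel, sr)
print(features.shape)  # (298, 13)
```

A 3-second clip at these settings yields 298 frames; a per-recording feature vector is often formed by averaging or otherwise summarizing these frames before classification.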


      The experimental results showed that the DNN outperformed both the Gaussian mixture model and the support vector machine. Its accuracy in detecting voice pathologies reached 94.26% in male subjects and 90.52% in female subjects, based on three representative Mel-frequency cepstral coefficient features. When applied to the MEEI database for validation, the DNN also achieved higher accuracy (99.32%) than the other two classification algorithms.
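The within-clinic accuracies come from fivefold cross-validation. The sketch below shows one common way to compute such an estimate: a stratified split keeps the normal/pathological ratio similar in every fold, each fold is held out once, and correct predictions are pooled across folds (averaging per-fold accuracies is an equally common convention). The nearest-centroid classifier is a hypothetical placeholder for the DNN, SVM or GMM, and the feature vectors are synthetic, sized only to match the 60 normal and 402 pathological samples.

```python
import numpy as np

def stratified_folds(labels, k=5, seed=0):
    """Assign each sample a fold index in 0..k-1, keeping class proportions similar across folds."""
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Deal the shuffled samples of this class round-robin into the k folds
        fold_of[idx] = np.arange(len(idx)) % k
    return fold_of

# Hypothetical placeholder classifier standing in for the DNN, SVM or GMM
def nearest_centroid_fit(X, y):
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(model, X):
    classes, centroids = model
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool correct predictions."""
    folds = stratified_folds(y, k, seed)
    correct = 0
    for f in range(k):
        test = folds == f
        model = fit(X[~test], y[~test])
        correct += int((predict(model, X[test]) == y[test]).sum())
    return correct / len(y)

# Synthetic 13-dimensional feature vectors: 60 "normal" (0) and 402 "pathological" (1)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (60, 13)), rng.normal(3.0, 1.0, (402, 13))])
y = np.array([0] * 60 + [1] * 402)
acc = cross_val_accuracy(X, y, nearest_centroid_fit, nearest_centroid_predict)
print(f"fivefold accuracy: {acc:.4f}")
```

Stratification matters here because the classes are imbalanced: with only 60 normal samples, an unstratified split could leave a fold with very few normal voices and make its accuracy unrepresentative.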


      By stacking several layers of neurons with optimized weights, the proposed DNN can make effective use of the acoustic features and efficiently differentiate between normal and pathological voice samples. Building on this pilot study, future research may explore further applications of DNNs from laboratory and clinical perspectives.
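The idea of stacking layers of neurons whose weights are optimized can be made concrete with a minimal numpy sketch of a fully connected binary classifier trained by backpropagation. The layer sizes (13-32-32-1), ReLU hidden units, sigmoid output, cross-entropy loss, learning rate and epoch count are illustrative assumptions; the abstract does not report the study's architecture or training setup, and the data are synthetic.

```python
import numpy as np

def init_layers(sizes, seed=0):
    """He-initialised (weights, bias) pairs for a stack of fully connected layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, X):
    """Return the input plus every layer's output: ReLU for hidden layers, sigmoid at the end."""
    acts = [X]
    for i, (W, b) in enumerate(layers):
        z = acts[-1] @ W + b
        if i < len(layers) - 1:
            acts.append(np.maximum(z, 0.0))
        else:
            acts.append(1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0))))
    return acts

def train(layers, X, y, lr=0.05, epochs=1500):
    """Full-batch gradient descent on binary cross-entropy (label 1 = pathological)."""
    y = y.reshape(-1, 1).astype(float)
    for _ in range(epochs):
        acts = forward(layers, X)
        # For a sigmoid output with cross-entropy loss, dLoss/dz is (p - y) / N
        delta = (acts[-1] - y) / len(X)
        for i in range(len(layers) - 1, -1, -1):
            W, b = layers[i]
            grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:
                # Propagate the error through W and the previous layer's ReLU
                delta = (delta @ W.T) * (acts[i] > 0)
            layers[i] = (W - lr * grad_W, b - lr * grad_b)
    return layers

def predict(layers, X):
    """Threshold the sigmoid output at 0.5 to get a 0/1 label per sample."""
    return (forward(layers, X)[-1][:, 0] >= 0.5).astype(int)

# Synthetic, balanced 13-dimensional features (illustration only)
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (200, 13)), rng.normal(1.5, 1.0, (200, 13))])
y = np.array([0] * 200 + [1] * 200)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each feature
net = train(init_layers([13, 32, 32, 1]), X, y)
print("training accuracy:", (predict(net, X) == y).mean())
```

Each hidden layer re-represents the output of the one below it, which is what "stacking" buys over a single-layer model; in practice the weights would be fitted only on training folds and accuracy reported on held-out data.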
