Different Performances of Machine Learning Models to Classify Dysphonic and Non-Dysphonic Voices

  • Danilo Rangel Arruda Leite
    Affiliations
    Department of Statistics, Graduate Program in Health Decision Models, Universidade Federal da Paraíba – UFPB, João Pessoa, Paraíba, Brasil

    Brazilian Hospital Services Company (Ebserh), Universidade Federal da Paraíba – UFPB, João Pessoa, Paraíba, Brasil
  • Ronei Marcos de Moraes
    Affiliations
    Department of Statistics, Graduate Program in Health Decision Models, Universidade Federal da Paraíba – UFPB, João Pessoa, Paraíba, Brasil

    Department of Statistics, Universidade Federal da Paraíba – UFPB, João Pessoa, Paraíba, Brasil
  • Leonardo Wanderley Lopes
    Correspondence
    Address correspondence and reprint requests to Leonardo Wanderley Lopes, Department of Speech-Language and Hearing Sciences, Health Sciences Center, University City-Campus I, Bairro Castelo Branco, João Pessoa (PB), Brazil, CEP: 58051-900.
    Affiliations
    Department of Statistics, Graduate Program in Health Decision Models, Universidade Federal da Paraíba – UFPB, João Pessoa, Paraíba, Brasil

    Department of Speech-Language and Hearing Sciences, Graduate Program in Linguistics, Universidade Federal da Paraíba – UFPB, João Pessoa, Paraíba, Brasil
Published: December 10, 2022
DOI: https://doi.org/10.1016/j.jvoice.2022.11.001

      Summary

      Objective

      To analyze the performance of 10 machine learning (ML) classifiers in discriminating between dysphonic and non-dysphonic voices, using a variance threshold to select and reduce the acoustic measurements supplied to the classifiers.

      Method

      We analyzed voice samples from 435 individuals (337 female and 98 male; mean age 41.07 ± 13.73 years), of whom 384 were dysphonic and 51 were non-dysphonic. From each sustained /ε/ vowel sample, 34 acoustic measurements were extracted, including traditional perturbation and noise measurements, cepstral/spectral measurements, and measurements based on nonlinear models. A variance threshold was used to select the best subset of acoustic measurements. We then tested this subset with 10 ML classifiers, evaluating precision, sensitivity, specificity, accuracy, and F1-score. The kappa coefficient was used to verify reproducibility between the two datasets (training and testing).
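
As an illustration of the selection step described above, the following is a minimal sketch of variance-threshold feature selection using only the Python standard library. It is not the authors' implementation: the function name `variance_threshold`, the measurement names, the toy values, and the threshold of 0.1 are all assumptions made for the example. The summary does not state the threshold used or whether the measurements were rescaled first.

```python
from statistics import pvariance


def variance_threshold(rows, names, threshold):
    """Keep the columns of `rows` whose population variance exceeds `threshold`.

    rows  : list of samples, each a list of measurement values
    names : one (hypothetical) measurement name per column
    """
    columns = list(zip(*rows))  # transpose: one tuple per measurement
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in keep] for row in rows]
    return reduced, [names[i] for i in keep]


# Toy data: 4 voices x 3 measurements. Values are invented for illustration.
names = ["jitter", "shimmer", "cpps"]
rows = [
    [0.50, 3.0, 12.0],
    [0.50, 4.0, 15.0],
    [0.50, 5.0, 9.0],
    [0.50, 4.0, 12.0],
]

# "jitter" is constant here (variance 0), so it carries no information
# for a classifier and is dropped; the other two columns are kept.
reduced, kept = variance_threshold(rows, names, threshold=0.1)
print(kept)
```

Because variance depends on the unit of each measurement, a single threshold is only meaningful when the columns are on comparable scales. That is a design consideration for any use of this technique, not a detail taken from the study.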

      Results

      The naive Bayes (NB) and stochastic gradient descent classifier (SGDC) models performed best in terms of accuracy, AUC, sensitivity, and specificity, and did so with the reduced set of 15 acoustic measures rather than the full set of 34. SGDC reached an accuracy of 0.91 and NB an accuracy of 0.76. Both classifiers showed moderate agreement, with kappa values of 0.57 (SGDC) and 0.45 (NB).
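
The metrics reported above can all be derived from a binary confusion matrix. The sketch below shows one way to compute them, together with Cohen's kappa and the commonly used Landis and Koch agreement bands. The counts are invented for illustration. Computing kappa between predicted and reference labels is an assumption, since the summary does not detail how kappa was obtained across the training and testing sets. The function names are hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for agreement expected
    # by chance, given the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch (1977) agreement label."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Invented counts (not from the study): 100 test voices.
m = binary_metrics(tp=40, fp=5, fn=10, tn=45)
print({name: round(value, 3) for name, value in m.items()})

# The kappa values quoted in the Results both fall in the 0.41-0.60 band.
print(landis_koch(0.57), landis_koch(0.45))
```

Under these bands, values from 0.41 to 0.60 are labeled moderate, which matches the description of the reported kappas of 0.57 and 0.45.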

      Conclusion

      Among the tested models, NB and SGDC performed best in discriminating between dysphonic and non-dysphonic voices from a set of 15 acoustic measures.
