Pathological Voice Detection and Classification Based on Multimodal Transmission Network

  • Lei Geng
    Affiliations
    School of Life Sciences, Tiangong University, Tianjin, China

    Tianjin Key Laboratory of Optoelectronic Detection Technology and System, Tianjin, China
  • Yan Liang
    Affiliations
    Tianjin Key Laboratory of Optoelectronic Detection Technology and System, Tianjin, China

    School of Electronic and Information Engineering, Tiangong University, Tianjin, China
  • Hongfeng Shan
    Affiliations
    Tianjin Key Laboratory of Optoelectronic Detection Technology and System, Tianjin, China

    School of Electronic and Information Engineering, Tiangong University, Tianjin, China
  • Zhitao Xiao
    Affiliations
    School of Life Sciences, Tiangong University, Tianjin, China

    Tianjin Key Laboratory of Optoelectronic Detection Technology and System, Tianjin, China
  • Wei Wang
    Affiliations
    Department of Otorhinolaryngology Head and Neck Surgery, Tianjin First Central Hospital, Tianjin, China

    Institute of Otolaryngology of Tianjin, Tianjin, China

    Key Laboratory of Auditory Speech and Balance Medicine, Tianjin, China

    Key Clinical Discipline of Tianjin (Otolaryngology), Tianjin, China

    Otolaryngology Clinical Quality Control Centre, Tianjin, China
  • Mei Wei
    Correspondence
    Address correspondence and reprint requests to: Mei Wei, Tianjin First Central Hospital, No. 24, Fukang Rd, Tianjin, Nankai, 300192 China.
    Affiliations
    Department of Otorhinolaryngology Head and Neck Surgery, Tianjin First Central Hospital, Tianjin, China

    Institute of Otolaryngology of Tianjin, Tianjin, China

    Key Laboratory of Auditory Speech and Balance Medicine, Tianjin, China

    Key Clinical Discipline of Tianjin (Otolaryngology), Tianjin, China

    Otolaryngology Clinical Quality Control Centre, Tianjin, China
Published: December 03, 2022
DOI: https://doi.org/10.1016/j.jvoice.2022.11.018

      Abstract

      Objectives

      Describing pronunciation features from multiple perspectives can help doctors accurately diagnose the pathological type of a patient's voice. Using two modalities, the sound signal and the electroglottography (EGG) signal, this paper proposes a pathological voice detection and classification algorithm based on a multimodal transmission network.

      Methods

      First, we used the short-time Fourier transform (STFT) to map the features of the two signals, and designed a Mel filter to obtain the Mel spectrogram. Then, the constructed multimodal transmission network extracted features from the Mel spectrograms and applied the Multimodal Transfer Module (MMTM). Finally, the fusion layer integrated the multimodal information, and the fully connected layer diagnosed and classified voice pathology according to the fused features.
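
The pipeline described in the Methods can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no window length, hop size, number of Mel bands, or network details, so `n_fft`, `hop`, `n_mels`, `ratio`, the random weights, and the ReLU inside the gate are all assumptions chosen for illustration. The first part builds a log-Mel spectrogram from an STFT; the second part shows a squeeze-and-excitation style cross-modal gate in the spirit of MMTM, where pooled statistics from both modalities jointly rescale each modality's channels.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def stft_power(x, n_fft=512, hop=128):
    """Power spectrogram of a Hann-windowed STFT, shape (n_fft//2 + 1, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop: i * hop + n_fft] * window for i in range(n_frames)])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

def mel_filterbank(sr, n_fft=512, n_mels=40):
    """Triangular filters spaced evenly on the Mel scale, shape (n_mels, n_fft//2 + 1)."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel(x, sr, n_fft=512, hop=128, n_mels=40):
    """Log-Mel spectrogram, shape (n_mels, frames)."""
    return np.log(mel_filterbank(sr, n_fft, n_mels) @ stft_power(x, n_fft, hop) + 1e-10)

def mmtm_gate(feat_a, feat_b, rng, ratio=4):
    """Cross-modal channel gating on feature maps of shape (channels, height, width).

    Weights are random here (hypothetical); in a trained network they are learned.
    """
    ca, cb = feat_a.shape[0], feat_b.shape[0]
    # Squeeze: global average pooling per channel, concatenated across modalities.
    squeeze = np.concatenate([feat_a.mean(axis=(1, 2)), feat_b.mean(axis=(1, 2))])
    dim_z = (ca + cb) // ratio
    w_z = rng.standard_normal((dim_z, ca + cb)) * 0.1
    w_a = rng.standard_normal((ca, dim_z)) * 0.1
    w_b = rng.standard_normal((cb, dim_z)) * 0.1
    z = np.maximum(w_z @ squeeze, 0.0)           # joint representation
    gate_a = 1.0 / (1.0 + np.exp(-(w_a @ z)))    # excitation for modality A
    gate_b = 1.0 / (1.0 + np.exp(-(w_b @ z)))    # excitation for modality B
    return feat_a * gate_a[:, None, None], feat_b * gate_b[:, None, None]

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    voice = np.sin(2 * np.pi * 220.0 * t)        # stand-in for a sound recording
    egg = np.sin(2 * np.pi * 220.0 * t + 0.5)    # stand-in for an EGG recording
    mel_voice, mel_egg = log_mel(voice, sr), log_mel(egg, sr)
    rng = np.random.default_rng(0)
    # Treat each spectrogram as a one-channel feature map for demonstration only.
    out_voice, out_egg = mmtm_gate(mel_voice[None], mel_egg[None], rng, ratio=1)
    print(mel_voice.shape, out_voice.shape, out_egg.shape)
```

In a full model the gate would sit between convolutional stages of the two branches, so that each branch keeps its own architecture while still exchanging information before the final fusion layer.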

      Results

      The experiment was based on 1179 subjects in the Saarbrücken voice database (SVD); the average accuracy, recall, specificity, and F1 score of pathological voice classification reached 98.02%, 98.23%, 97.82%, and 97.95%, respectively. Compared with other algorithms, the classification accuracy is significantly improved.
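
For reference, the four reported metrics have standard definitions in terms of confusion-matrix counts. The sketch below shows the binary case only; the counts in the usage example are made-up illustrative values, not data from the paper, and the abstract does not state how the per-class values were averaged.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, recall, specificity, and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)            # sensitivity: pathological voices correctly detected
    specificity = tn / (tn + fp)       # healthy voices correctly rejected
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in binary_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {value:.4f}")
```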

      Conclusions

      The proposed model can integrate information from multiple modalities to obtain more comprehensive and stable voice features and improve the accuracy of pathological voice classification. Future research will explore reducing the model's running time and complexity.

