Abstract
Objectives
Describing pronunciation features from multiple perspectives can help doctors accurately
diagnose the pathological type of a patient's voice. Using two modalities, the sound signal
and the electroglottography (EGG) signal, this paper proposes a pathological voice
detection and classification algorithm based on a multimodal transmission network.
Methods
First, we used the short-time Fourier transform (STFT) to map the features of the
two signals and designed a Mel filter to obtain the Mel spectrogram of each. Then, the
constructed multimodal transmission network extracted features from the Mel spectrograms
and applied the Multimodal Transfer Module (MMTM). Finally, a fusion layer integrated
the multimodal information, and a fully connected layer diagnosed and classified the
voice pathology according to the fused features.
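The front end and the cross-modal gating described above can be sketched as follows. This is a minimal illustration in NumPy, not the authors' implementation: the frame length, hop size, number of Mel bands, the reduction ratio, and the function names (`mel_spectrogram`, `cross_modal_gate`) are all assumptions, and the gate uses random placeholder weights where a trained network would use learned ones.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram(signal, sr, n_fft=512, hop=128, n_mels=40):
    """STFT power spectrum projected onto Mel filters; returns log energies, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10).T

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_gate(feat_a, feat_b, rng, ratio=4):
    """Squeeze-and-excitation-style gate across two feature maps of shape (channels, height, width).

    Assumption: weights are random placeholders standing in for learned parameters.
    """
    c_a, c_b = feat_a.shape[0], feat_b.shape[0]
    # Squeeze: global average pooling per channel, then concatenate both modalities.
    squeeze = np.concatenate([feat_a.mean(axis=(1, 2)), feat_b.mean(axis=(1, 2))])
    c_z = (c_a + c_b) // ratio
    w_z = rng.standard_normal((c_z, c_a + c_b)) * 0.1
    w_a = rng.standard_normal((c_a, c_z)) * 0.1
    w_b = rng.standard_normal((c_b, c_z)) * 0.1
    joint = np.maximum(0.0, w_z @ squeeze)  # shared joint representation
    gate_a = sigmoid(w_a @ joint)           # per-channel gate for modality A
    gate_b = sigmoid(w_b @ joint)           # per-channel gate for modality B
    return feat_a * gate_a[:, None, None], feat_b * gate_b[:, None, None]
```

In a pipeline of this shape, `mel_spectrogram` would be applied separately to the sound signal and to the EGG signal, and a gate such as `cross_modal_gate` would sit between the two convolutional branches so that each modality can recalibrate the other's channels before the fusion layer.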
Results
The experiment was based on 1179 subjects in the Saarbrücken Voice Database (SVD). The
average accuracy, recall, specificity, and F1 score of pathological voice classification
reached 98.02%, 98.23%, 97.82%, and 97.95%, respectively. Compared with other algorithms,
the classification accuracy is significantly improved.
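For readers less familiar with the four reported metrics, the sketch below shows how they are commonly computed from a binary confusion matrix. The function name `binary_metrics` and the counts in the usage line are hypothetical and are not taken from the paper; the abstract does not state how the averages were formed (for example, over folds or over classes).

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)            # sensitivity: pathological voices correctly flagged
    specificity = tn / (tn + fp)       # healthy voices correctly cleared
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "f1": f1}

# Hypothetical counts, for illustration only (not the paper's data).
scores = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```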
Conclusions
The proposed model can integrate information from multiple modalities to obtain more
comprehensive and stable voice features and improve the accuracy of pathological voice
classification. Future research will explore reducing the time cost and complexity
of the model.
Article info
Publication history
Published online: December 03, 2022
Accepted: November 11, 2022
Publication stage
In Press Corrected Proof
Footnotes
Supported by: This work was supported by Tianjin Health Science and Technology Project (Science and Technology Personnel Training Project, KJ20136).
Copyright
© 2022 The Voice Foundation. Published by Elsevier Inc. All rights reserved.