Diagnosis of Early Glottic Cancer Using Laryngeal Image and Voice Based on Ensemble Learning of Convolutional Neural Network Classifiers

Published: September 05, 2022



      The purpose of this study is to improve classification accuracy for the diagnosis of glottic cancer by comparing the results of decision tree ensemble learning, a method for increasing classification accuracy on relatively small datasets, with the results of convolutional neural network (CNN) classifiers.


      The Pusan National University Hospital (PNUH) dataset was used to establish the classifiers, and the Pusan National University Yangsan Hospital (PNUYH) dataset was used to verify the performance of the resulting models. For the diagnosis of glottic cancer, deep learning-based CNN models were established to classify laryngeal image and voice data. Decision tree ensemble learning was then performed on the probabilities output by the CNN classifiers, using the classification and regression tree (CART) method, to obtain the classification accuracy. Finally, the laryngeal image and voice decision tree classifiers were fused, and the classification accuracy of decision tree ensemble learning was compared with that of the individual CNN classifiers.
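The fusion step described above can be illustrated with a minimal sketch, assuming each person contributes one CNN cancer probability from the laryngeal image and one from the voice recording. It grows a single small CART-style tree (Gini impurity, binary threshold splits) over those two probabilities. The function names, the depth limit, and the sample probabilities are all hypothetical; the abstract does not state the tree configuration or how many trees were combined.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate threshold between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=2):
    """Grow a tree; internal nodes are tuples, leaves are class labels."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    """Walk the tree until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical (image probability, voice probability) pairs; 1 = cancer, 0 = normal.
train_rows = [
    (0.90, 0.80), (0.40, 0.95), (0.85, 0.30), (0.70, 0.75),
    (0.20, 0.10), (0.45, 0.25), (0.10, 0.40), (0.30, 0.15),
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

tree = build_tree(train_rows, train_labels)
print(predict(tree, (0.80, 0.90)), predict(tree, (0.15, 0.20)))
```

In this toy data neither probability separates the two classes with a single threshold, but a depth-2 tree that splits first on one modality and then on the other does, which is the intuition behind fusing the two classifiers.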


      Using the PNUH training dataset, the laryngeal image and voice classification models achieved classification accuracies of 81.03% and 99.18%, respectively. On the external PNUYH dataset, however, the classification accuracy of the CNN classifiers dropped to 68.92% for laryngeal image and 73.88% for voice. To address this, decision tree ensemble learning was applied to laryngeal image and voice, and classification accuracy improved when the laryngeal image and voice data of the same person were integrated. The individual laryngeal image and voice decision tree models achieved classification accuracies of 87.88% and 89.06%, respectively, and fusing the laryngeal image and voice decision tree results yielded a classification accuracy of 95.31%.
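Integrating image and voice data of the same person requires pairing the two classifier outputs by person before any accuracy is computed. The following is a minimal sketch of that bookkeeping, assuming per-person probabilities keyed by a hypothetical identifier and a hypothetical 0.5 decision threshold. The values are invented for illustration and do not reproduce the accuracies reported above.

```python
def pair_by_person(image_probs, voice_probs):
    """Keep only people who have both an image and a voice probability."""
    shared = sorted(image_probs.keys() & voice_probs.keys())
    return {pid: (image_probs[pid], voice_probs[pid]) for pid in shared}

def accuracy(predictions, labels):
    """Fraction of people whose predicted label matches the reference label."""
    correct = sum(predictions[pid] == labels[pid] for pid in predictions)
    return correct / len(predictions)

# Hypothetical CNN cancer probabilities and reference labels (1 = cancer, 0 = normal).
image_probs = {"P01": 0.82, "P02": 0.35, "P03": 0.64, "P04": 0.20}
voice_probs = {"P01": 0.91, "P02": 0.30, "P03": 0.15, "P05": 0.77}
labels = {"P01": 1, "P02": 0, "P03": 0, "P04": 0, "P05": 1}

# P04 and P05 are dropped because each lacks one of the two modalities.
paired = pair_by_person(image_probs, voice_probs)

# Per-modality decisions on the paired cohort, using the assumed 0.5 threshold.
image_only = {pid: int(p_img >= 0.5) for pid, (p_img, _) in paired.items()}
voice_only = {pid: int(p_voc >= 0.5) for pid, (_, p_voc) in paired.items()}

print(accuracy(image_only, labels), accuracy(voice_only, labels))
```

Evaluating every model on the same paired cohort keeps the single-modality and fused accuracies comparable, since all of them are then measured on the same people.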


      The results of our study suggest that decision tree ensemble learning, which trains multiple classifiers, is useful for obtaining higher classification accuracy despite a small dataset. Although a large amount of data is essential for AI analysis, high diagnostic classification accuracy can be expected when an integrated approach combining various input data is taken.


