Research Article | Journal of Voice, Volume 33, Issue 5, pp. 591-602, September 2019

An Objective Parameter to Classify Voice Signals Based on Variation in Energy Distribution

  • Boquan Liu
    Department of Surgery-Division of Otolaryngology, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin
  • Evan Polce
    Department of Surgery-Division of Otolaryngology, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin
  • Jack Jiang
    Department of Surgery-Division of Otolaryngology, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin
    Correspondence: Address correspondence and reprint requests to Jack Jiang, University of Wisconsin School of Medicine and Public Health, Department of Surgery-Division of Otolaryngology, 1300 University Avenue, 2745 Medical Sciences Center, Madison, WI 53706.

      Summary

      Objectives

      This paper introduces an iterative nonlinear weighted method, based on the variation in a voice signal's spectral energy distribution, for differentiating among four voice types: type 1 signals are nearly periodic, type 2 signals contain strong modulations and subharmonics, type 3 signals are chaotic, and type 4 signals are dominated by stochastic noise.
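To make "variation in spectral energy distribution" concrete, here is a minimal Python sketch, assuming NumPy. It splits a signal into Hann-windowed frames, normalizes each frame's FFT energy so it sums to about 1, and reports the mean L1 distance between consecutive frames. The frame length, hop size, function names, and the L1-distance index are all hypothetical illustration choices; this is not the NEDR formula, which the abstract does not give.

```python
import numpy as np

def frame_energy_distributions(x, frame_len=512, hop=256):
    """One normalized spectral energy distribution per Hann-windowed frame.

    Each row sums to ~1 (all zeros for a silent frame).
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    rows = []
    for start in range(0, len(x) - frame_len + 1, hop):
        energy = np.abs(np.fft.rfft(x[start:start + frame_len] * window)) ** 2
        rows.append(energy / (energy.sum() + 1e-12))  # tiny constant guards silent frames
    return np.array(rows)

def distribution_variation(x, frame_len=512, hop=256):
    """Hypothetical index: mean L1 distance between consecutive frame distributions.

    0 means the energy distribution never changes between frames; 2 is the maximum.
    """
    d = frame_energy_distributions(x, frame_len, hop)
    if len(d) < 2:
        raise ValueError("signal too short for two frames")
    return float(np.mean(np.abs(np.diff(d, axis=0)).sum(axis=1)))

# Synthetic stand-ins (not real voice data): a steady tone and white noise.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 150 * t)                     # nearly periodic, type 1-like
noise = np.random.default_rng(0).standard_normal(fs)   # noise-dominated, type 4-like
print(distribution_variation(tone), distribution_variation(noise))
```

A steady tone keeps nearly the same energy distribution from frame to frame, so its index is close to 0, while white noise redistributes energy randomly across bins and scores much higher. Real type 2 and type 3 recordings would be needed to see where intermediate signals fall.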

      Study Design

      A total of 135 voice samples of the sustained vowel /a/ were obtained from the Disordered Voice Database and individually assigned to voice types using the classification system described by Sprecher et al. (2010). The samples were then analyzed with three nonlinear parameters (spectrum convergence ratio, rate of divergence, and nonlinear energy difference ratio [NEDR]) to compare their efficacy as classifiers.

      Methods

      An iterative nonlinear weighted method, based on the derivative of the instantaneous frequency and on Fourier transforms, is applied to calculate the spectral energy distribution of each signal. This distribution is then used to calculate the NEDR, which serves to classify the voice signal type.
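The derivative of the instantaneous frequency is one named ingredient of the method. A common way to obtain instantaneous frequency is from the unwrapped phase of the analytic signal; the sketch below does this with NumPy only, building the analytic signal with the same FFT construction that `scipy.signal.hilbert` uses. How the authors combine this derivative with Fourier transforms and iterative weighting is not described in the abstract, so the function names and the simple finite-difference derivative here are assumptions, not their method.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + j*H{x}, built by zeroing negative-frequency FFT bins."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0   # keep DC and Nyquist once
        h[1:n // 2] = 2.0        # double the positive frequencies
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz (length n - 1) from the unwrapped analytic phase."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)

def instantaneous_frequency_derivative(x, fs):
    """Finite-difference derivative of instantaneous frequency in Hz/s (length n - 2)."""
    return np.diff(instantaneous_frequency(x, fs)) * fs

fs = 8000
t = np.arange(fs) / fs
tone = np.cos(2 * np.pi * 200 * t)   # synthetic steady tone, 200 whole cycles
f_inst = instantaneous_frequency(tone, fs)
df_inst = instantaneous_frequency_derivative(tone, fs)
print(f_inst.mean(), np.abs(df_inst).max())
```

For a steady tone with a whole number of cycles in the window, the estimate stays at 200 Hz and its derivative stays near zero; modulation, subharmonics, or noise would appear as a nonzero, fluctuating derivative. With real recordings, edge effects and the noise amplification of differentiation usually call for trimming or smoothing.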

      Results

      Statistical analysis showed that NEDR effectively differentiated among all four voice types (P < 0.001). Subsequent multiclass receiver operating characteristic (ROC) analysis showed that NEDR (area under the curve [95% CI] = 0.99 [0.96–1.0]) had the greatest classification accuracy of the three parameters, exceeding both spectrum convergence ratio and rate of divergence.
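The abstract reports a multiclass ROC area under the curve without stating the averaging scheme. One common choice is the unweighted (macro) mean of one-vs-rest AUCs, each computed as the Mann-Whitney probability that a randomly chosen member of a class scores higher than a randomly chosen non-member. The sketch below assumes that scheme and a per-class score matrix (for example, class probabilities from some classifier fitted to a parameter such as NEDR); the function names are hypothetical, and the paper may have used a different multiclass generalization.

```python
import numpy as np

def binary_auc(scores, positives):
    """AUC = P(random positive outscores random negative); ties count as 1/2."""
    scores = np.asarray(scores, dtype=float)
    positives = np.asarray(positives, dtype=bool)
    pos, neg = scores[positives], scores[~positives]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative sample")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

def macro_ovr_auc(score_matrix, labels, classes):
    """Unweighted mean of one-vs-rest AUCs; column j scores membership in classes[j]."""
    score_matrix = np.asarray(score_matrix, dtype=float)
    labels = np.asarray(labels)
    per_class = [binary_auc(score_matrix[:, j], labels == c) for j, c in enumerate(classes)]
    return float(np.mean(per_class)), per_class

# Toy example: voice types 1-4 with perfectly separating one-hot scores.
labels = np.array([1, 2, 3, 4, 1, 2, 3, 4])
classes = [1, 2, 3, 4]
perfect = (labels[:, None] == np.array(classes)[None, :]).astype(float)
print(macro_ovr_auc(perfect, labels, classes)[0])  # 1.0
```

The pairwise comparison is quadratic in sample count, which is trivial for a data set of about 135 recordings; a confidence interval like the one reported would additionally require resampling (for example, bootstrapping) on top of this point estimate.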

      Conclusion

      NEDR was shown to be an effective metric for objectively differentiating among all four voice signal types. NEDR was computed almost instantaneously, a substantial improvement over the lengthy computation time required by previous nonlinear parameters. This metric could help clinicians diagnose voice disorders and monitor treatment efficacy by tracking acoustic improvement in the voice over time.

      References

        • Little M.A., McSharry P.E., Roberts S.J., et al. Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection. Biomed Eng Online. 2007; 6: 23.
        • Titze I.R. Workshop on acoustic voice analysis: summary statement. National Center for Voice and Speech, Denver, CO; 1995: 26-27.
        • Sprecher A., Olszewski A., Zhang Y., et al. Updating signal typing in voice: addition of type 4 signals. J Acoust Soc Am. 2010; 127: 3710-3716.
        • Gerratt B.R., Kreiman J. Measuring vocal quality with speech synthesis. J Acoust Soc Am. 2001; 110: 2560-2566.
        • Kreiman J., Gerratt B.R. Validity of rating scale measures of voice quality. J Acoust Soc Am. 1998; 104: 1598-1608.
        • Millet B., Dejonckere P.H. What determines the differences in perceptual rating of dysphonia between experienced raters? Folia Phoniatr Logop. 1998; 50: 305-310.
        • Gupta R., Chaspari T., Kim J., et al. Pathological speech processing: state-of-the-art, current challenges, and future directions. IEEE Conference Proceedings. 2016: 6470-6474.
        • Awan S.N., Novaleski C.K., Rousseau B. Nonlinear analyses of elicited modal, raised, and pressed rabbit phonation. J Voice. 2014; 28: 538-547.
        • Choi S.H., Zhang Y., Jiang J.J., et al. Nonlinear dynamic-based analysis of severe dysphonia in patients with vocal fold scar and sulcus vocalis. J Voice. 2012. https://doi.org/10.1016/j.jvoice.2011.09.006
        • Jiang J.J., Zhang Y., Ford C.N. Nonlinear dynamics of phonations in excised larynx experiments. J Acoust Soc Am. 2003; 114: 2198-2205.
        • Jiang J.J., Zhang Y., McGilligan C. Chaos in voice, from modeling to measurement. J Voice. 2005; 20: 2-17.
        • Mende W., Herzel H., Wermke K. Bifurcations and chaos in newborn infant cries. Phys Lett. 1990; 145: 418-424.
        • Awan S.N., Roy N., Jiang J.J. Nonlinear dynamic analysis of disordered voice: the relationship between the correlation dimension (D2) and pre-/post-treatment change in perceived dysphonia severity. J Voice. 2010; 24: 285-293.
        • Ma E.P., Yiu E.M. Suitability of acoustic perturbation measures in analysing periodic and nearly periodic voice signals. Folia Phoniatr Logop. 2005; 57: 38-47.
        • Lin L., Calawerts W.M., Dodd K., et al. An objective parameter for quantifying the turbulent noise portion of voice signals. J Voice. 2016; 30: 664-669.
        • Calawerts W.M., Lin L., Sprott J.C., et al. Using rate of divergence as an objective measure to differentiate between voice signal types based on the amount of disorder in the signal. J Voice. 2017; 31: 16-23.
        • Herzel H., Reuter R. Quantifying correlations in pitch- and amplitude contours of sustained phonation. Acta Acustica United Acustica. 2000; 86: 129-135.
        • Heman-Ackah Y.D., Michael D.D., Goding G.S. The relationship between cepstral peak prominence and selected parameters of dysphonia. J Voice. 2002; 16: 20-27.
        • Sauder C., Bretl M., Eadie T. Predicting voice disorder status from smoothed measures of cepstral peak prominence using Praat and Analysis of Dysphonia in Speech and Voice (ADSV). J Voice. 2017; 31: 557-566.
        • Yu P., Ouaknine M., Revis J., et al. Objective voice analysis for dysphonic patients: a multiparametric protocol including acoustic and aerodynamic measurements. J Voice. 2001; 15: 529-542.
        • Oppenheim A.V., Schafer R.W. Discrete-Time Signal Processing. 2nd ed. Prentice Hall, Upper Saddle River, NJ; 1999.
        • Chen J., Li J., Yang S., et al. Weighted optimization-based distributed Kalman filter for nonlinear target tracking in collaborative sensor networks. IEEE Trans Cybern. 2016; 99: 1-14.
        • Liu B., Zeng Y. Uncertainty-aware frequency estimation algorithm for passive wireless resonant SAW sensor measurement. Sens Actuators A Phys. 2016; 237: 136-146.
        • Rafajlowicz E., Pawlak M., Steland A. Nonlinear image processing and filtering: a unified approach based on vertically weighted regression. Int J Appl Math Comput Sci. 2008; 18: 49-61.
        • Shmaliy Y.S. Suboptimal FIR filtering of nonlinear models in additive white Gaussian noise. IEEE Trans Signal Process. 2012; 60: 5519-5527.
        • Kelley K., Preacher K.J. On effect size. Psychol Methods. 2012; 17: 137-152.
        • Rabiner L.R., Schafer R.W. Digital Processing of Speech Signals. Prentice Hall, Upper Saddle River, NJ; 1978.
        • Zhang Y., Jiang J.J., Biazzo L., et al. Perturbation and nonlinear dynamic analyses of voices from patients with unilateral laryngeal paralysis. J Voice. 2005; 19: 519-528.
        • Zhang Y., McGilligan C., Zhou L., et al. Nonlinear dynamic analysis of voices before and after surgical excision of vocal polyps. J Acoust Soc Am. 2004; 115: 2270-2277.
        • Tao C., Jiang J.J. Chaotic component obscured by strong periodicity in voice production system. Phys Rev E Stat Nonlin Soft Matter Phys. 2008; 77: 061922.
        • Neubauer J., Edgerton M., Herzel H. Nonlinear phenomena in contemporary vocal music. J Voice. 2004; 18: 1-12.
        • Kumar A., Mullick S.K. Nonlinear dynamical analysis of speech. J Acoust Soc Am. 1996; 100: 615-629.
        • Abdelli-Beruh N.B., Drugman T., Red Owl R.H. Occurrence frequencies of acoustic patterns of vocal fry in American English speakers. J Voice. 2016; 30: 759.e11-759.e20.
        • Aronsson C., Bohman M., Ternstrom S., et al. Loud voice during environmental noise exposure in patients with vocal nodules. Logoped Phoniatr Vocol. 2007; 32: 60-70.
        • Awan S.N., Roy N. Outcomes measurement in voice disorders: application of an acoustic index of dysphonia severity. J Speech Hear Res. 2009; 52: 482-499.
        • Higgins M.B., Netsell R., Schulte L. Vowel-related differences in laryngeal articulatory and phonatory function. J Speech Hear Res. 1998; 41: 712-724.
        • Moon K.R., Chung S.M., Park H.S., et al. Materials of acoustic analysis: sustained vowel versus sentence. J Voice. 2012; 26: 563-565.
        • MacCallum J.K., Zhang Y., Jiang J.J. Vowel selection and its effects on perturbation and nonlinear dynamic measures. Folia Phoniatr Logop. 2011; 63: 88-97.
        • Kiliç M.A., Öğüt F., Dursun G., et al. The effects of vowels on voice perturbation measures. J Voice. 2004; 18: 318-324.
        • Orlikoff R.F. Vocal stability and vocal tract configuration: an acoustic and electroglottographic investigation. J Voice. 1995; 9: 173-181.
        • Parsa V., Jamieson D.G. Acoustic discrimination of pathological voice: sustained vowels versus continuous speech. J Speech Hear Res. 2001; 44: 327-339.
        • Zhang Y., Jiang J.J. Acoustic analyses of sustained and running voices from patients with laryngeal pathologies. J Voice. 2008; 22: 1-9.
        • Awan S.N., Roy N., Jett M.E., et al. Quantifying dysphonia severity using a spectral/cepstral-based acoustic index: comparisons with auditory-perceptual judgements from the CAPE-V. Clin Linguist Phon. 2010; 24: 742-758.
        • Narayanan S.S., Alwan A.A. A nonlinear dynamical systems analysis of fricative consonants. J Acoust Soc Am. 1995; 97: 2511-2524.