
Analysis of COVID-19 Resulting Cough Using Formants and Automatic Speech Recognition System

      SUMMARY

As part of our contribution to research on the ongoing worldwide COVID-19 pandemic, we have studied the cough changes of infected people based on Hidden Markov Model (HMM) speech recognition classification, formant frequencies, and pitch analysis. In this paper, an HMM-based cough recognition system was implemented with 5 HMM states, 8 Gaussian Mixture Distributions (GMMs), and 13 basic Mel-Frequency Cepstral Coefficients (MFCCs) within a 39-dimensional overall feature vector. A comparison between the extracted formant frequency and pitch values of coughs from COVID-19 infected people and healthy ones was carried out to confirm the results of our cough recognition system. The experimental results show that the difference between the recognition rates of infected and non-infected people is 6.7%, whereas the formant variation between the coughs of infected and non-infected people is clearly observed for F1, F3, and F4 and is lower for F0 and F2.

      Key Words

      INTRODUCTION

Cough is a natural protective mechanism: it helps clear secretions from the respiratory tract and prevents noxious particles from entering the respiratory system. It is generally defined as a sudden expulsion of air accompanied by a typical sound. This sound is a characteristic that allows a cough to be identified and distinguished from other vocal manifestations (Korpáš, Widdicombe, and Vrabec, "Influence of simulated mucus on cough sounds in cats").
Effective measurement of cough is needed in order to assess the severity of a particular patient's cough and the effectiveness of treatment. So far, this assessment of cough intensity has mainly relied on subjective measures, such as cough reflex sensitivity, and on the patient's symptom perception, assessed through visual analog scores for cough, various cough symptom scores, and quality of life questionnaires (Chung, "Assessment and measurement of cough: the value of new tools").
Subburaj et al. ("Methods of recording and analysing cough sounds") described a system that uses audio signals sampled at 8 kHz. Data reduction is achieved by selecting 1-s segments that contain signals above an energy threshold; the selected segments of the recording are then played back for the identification of cough sounds. Matos et al. ("Detection of cough signals in continuous audio recordings using hidden Markov models") proposed an automatic system based on hidden Markov models to detect cough sounds in ambulatory recordings; their system achieved a success rate of approximately 82%. In "The origin of cough sounds" (Korpas et al.) and "Information obtained from tussigrams and the possibilities of their application in medical practice" (Kelemen et al.), cough sounds were described according to their waveforms, with the finding that the signal envelope appears to differ between patients with different diseases.
On the other hand, Muhammad et al. ("Formant analysis in dysphonic patients and automatic Arabic digit speech recognition") developed an automatic speech recognition system to evaluate six different types of voice disorder and also calculated the first four formants (F1, F2, F3, and F4). The aim of their work was to classify the type of voice pathology and to compare distortion in terms of formants. In another study, an Automatic Speech Recognition (ASR) system was developed to transcribe speech signals from subjects with a speech disorder into equivalent text (Maier et al., "PEAKS–a system for the automatic evaluation of voice and speech disorders").
In other similar works (Satori et al., "Voice comparison between smokers and non-smokers using HMM speech recognition system"; Zealouk et al., "Vocal parameters analysis of smoker using Amazigh language"; Ma et al., "Towards the objective speech assessment of smoking status based on voice features: a review of the literature"), the authors evaluated the speech signals of smokers, measuring parameters such as pitch, the first four formant frequencies, and jitter. Moreover, they employed ASR technology to develop a system that differentiates between smokers' and non-smokers' voices, using Mel-frequency cepstral coefficients (MFCCs) to determine the voice features. Dubuisson et al. ("On the use of the correlation between acoustic descriptors for the normal/pathological voices discrimination") analyzed normal and pathological voices by exploiting the correlation between two kinds of acoustic descriptors, temporal and spectral. Temporal descriptors include energy, mean, standard deviation, and zero-crossing rate, whereas spectral descriptors include delta, mean, various moments, spectral decrease, roll-off, etc. Their findings show a correct classification rate of 94.7% for pathological voices and 89.5% for normal voices.
Costa et al. ("Parametric cepstral analysis for pathological voice assessment") discriminated the pathological voices of speakers affected by vocal fold edema using linear predictive coding (LPC)-based spectral analysis. Their findings show that the LPC-based cepstral technique is a good method for illustrating the changes produced in the vocal tract by vocal fold edema. Recently, a new epidemic called COVID-19 appeared, and among the most common symptoms at the onset of this epidemic disease were coughing and fever. This led researchers to make great efforts to understand and combat the phenomenon from medical and interdisciplinary points of view, including computer science and engineering, with "digital health" solutions aimed at maximizing the use of the available and achievable means.
In this work, we develop an open-source ASR system able to compare acoustic features of cough sounds produced by healthy and COVID-19 infected people, based on Mel-frequency cepstral coefficients and an HMM classifier. We also carry out a formant frequency and pitch-based analysis of the two kinds of cough. The first step is the automated recognition of the cough resulting from COVID-19; the second is the confirmation of the obtained results using voice analysis methods.
Apart from the introduction in Section 1, the paper is organized as follows. An overview of COVID-19 is presented in Section 2. Section 3 briefly describes cough production. Section 4 introduces the techniques and methods employed in this study. The system architecture is described in Section 5. Section 6 investigates the experimental results. We finish with a conclusion.

      THE OVERVIEW OF COVID-19

Coronaviruses (CoV) are a large family of viruses that cause illnesses ranging from the common cold to more serious diseases such as Middle East Respiratory Syndrome (MERS-CoV) and Severe Acute Respiratory Syndrome (SARS-CoV). A novel coronavirus (nCoV) corresponds to a new strain that has not previously been identified in humans. In late December 2019, an outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections occurred in Wuhan, Hubei Province, China, and spread rapidly within China and beyond (Lai et al., "Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and corona virus disease-2019 (COVID-19): the epidemic and the challenges"). On February 12, 2020, the WHO officially named the disease caused by the novel coronavirus Coronavirus Disease 2019 (COVID-19), and it declared the COVID-19 epidemic a pandemic on March 12, 2020 (Gautret et al., "Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial").
Since most COVID-19 patients have been diagnosed with pneumonia and characteristic CT findings, radiological examinations and laboratory analyses have become vital for early diagnosis and the evaluation of the disease course (Zu et al., "Coronavirus disease 2019 (COVID-19): a perspective from China").
The common symptoms of pandemic COVID-19 are fever, dry cough, fatigue, and breathing difficulties. In the most serious cases, the infection may cause pneumonia, severe acute respiratory syndrome, kidney failure, and even death (Pan et al., "Clinical characteristics of COVID-19 patients with digestive symptoms in Hubei, China: a descriptive, cross-sectional, multicenter study").

      COUGH PRODUCTION

The cough is a defense reaction mechanism that clears the airways of irritants, particles, and microbes by expelling air from the lungs through the epiglottis at high speed. Cough is generated in three stages: inhalation (breathing in); increased pressure in the throat and lungs with the vocal cords closed; and an explosive release of air when the vocal cords open, giving the cough its characteristic sound (Pantaleo, Bongianni, and Mutolo, "Central nervous mechanisms of cough").
Cough is a prominent feature of more than 100 diseases and other medical conditions. As a reflex-generated perturbation of respiratory function, cough is an important symptom in many respiratory diseases and irritations.
Cough can be triggered by several mechanisms, notably receptors: the activation of these specific receptors generates action potentials that are carried, in particular by the vagus nerve, to the nucleus of the solitary tract (NTS) in the brainstem, which connects to neurons in the cortical and subcortical respiratory and cough-coordinating centers. Once the information has been integrated, the signal is transmitted through the efferent pathways to all of the effectors (muscles of the upper airways, accessory respiratory muscles, phrenic muscle, and abdominal muscles), allowing the cough motor effort via effector motor neurons (Chung and Pavord, "Prevalence, pathogenesis, and causes of chronic cough").
The schematic description of the cough reflex, with the location of the receptors, the afferent pathways, the nerve centers, the efferent pathways, and the effectors, is shown in Figure 1.
FIGURE 1. Anatomical description of cough pathways (Pantaleo, Bongianni, and Mutolo, "Central nervous mechanisms of cough").

      TECHNOLOGY AND METHOD

In this study, the popular HMM statistical method from machine learning was first used to classify COVID-19 and non-COVID-19 cough sounds. Second, the acoustic measurements of the cough sound, pitch and formants, were exploited to confirm the results obtained with our ASR system.

      Pitch

Pitch describes the perceived fundamental frequency (F0) of a sound and is one of the main auditory attributes of sounds, along with loudness and quality (Gerhard, "Pitch Extraction and Fundamental Frequency: History and Current Techniques"). It corresponds to the vibration rate of the vocal cords under the flow of air out of the glottis. Usually, pitch is ignored by ASR systems and considered irrelevant to recognition tasks, although much of the information carried by pitch lies above the lexical and phonetic levels. In addition to providing necessary information on the nature of the excitation source of the vocal signal, the speech pitch contour can be exploited for speaker identification, emotional state recognition, voice activity detection, and many other applications. Different pitch extraction methods have been described in the literature (Bořil and Pollák, "Direct time domain fundamental frequency estimation of speech in noisy conditions").
The autocorrelation approach is one of the most widely utilized time-domain methods for estimating the pitch period of a speech signal (Rabiner, "On the use of autocorrelation analysis for pitch detection"). This approach is based on detecting the highest value of the autocorrelation function in the region of interest. For a known discrete signal {s(q), q = 0, 1, …, Q − 1}, the autocorrelation function is generally defined as:
$$R_S(m) = \lim_{Q \to \infty} \frac{1}{2Q+1} \sum_{q=-Q}^{Q} s(q)\, s(q+m), \qquad 0 \le m \le M_0$$

where Q is the length of the analyzed sequence and M_0 is the number of autocorrelation points we want to compute.
For pitch estimation, if s(q) is assumed to be a periodic sequence with period P, so that s(q) = s(q + P) for all q, then the autocorrelation function is also periodic: R_S(m) = R_S(m + P). Conversely, periodicity in the autocorrelation function indicates periodicity in the signal.
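To make the autocorrelation approach concrete, the following minimal sketch estimates F0 from a single voiced frame by locating the highest autocorrelation peak within the plausible pitch-period range. It is an illustrative Python/NumPy implementation, not the exact extractor used in this study; the 16 kHz sampling rate and the 75–500 Hz search range match the analysis parameters reported later in this paper, while the function and variable names are our own.

```python
import numpy as np

def estimate_pitch_autocorr(frame, fs=16000, f0_min=75.0, f0_max=500.0):
    """Estimate F0 of one voiced frame with the autocorrelation method.

    The lag of the highest autocorrelation peak inside the plausible
    pitch-period range is taken as the period P, and F0 = fs / P.
    The frame should span at least two pitch periods.
    """
    frame = np.asarray(frame, dtype=np.float64)
    frame = frame - frame.mean()              # remove DC offset
    r = np.correlate(frame, frame, mode="full")
    r = r[len(r) // 2:]                       # keep lags m = 0 .. M0
    lag_min = int(fs / f0_max)                # shortest period of interest
    lag_max = min(int(fs / f0_min), len(r) - 1)
    peak_lag = lag_min + np.argmax(r[lag_min:lag_max + 1])
    return fs / peak_lag                      # fundamental frequency in Hz
```

For example, a 25 ms frame of a 300 Hz sine sampled at 16 kHz yields an estimate of about 302 Hz, the small deviation coming from the integer-lag resolution of the method.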

      Formants

By changing the shape of the vocal tract, several shapes of an ideal tube are generated, which in turn can be used to change the preferred vibration frequencies. Each of the preferred resonant frequencies of the vocal tract (corresponding to the relevant bump in the frequency response curve) is known as a formant (Khelifa et al., "Constructing accurate and robust HMM/GMM models for an Arabic speech recognition system").
The characteristic cough sound results from the vibrations of the vocal cords, of the mucosal folds above and below the glottis, and of accumulated secretions. The variation in cough sounds is due to various factors, including the nature and amount of the secretions, anatomical differences, pathological changes in the larynx and the rest of the respiratory tract, and the strength of the cough. Cough vibrations also help dislodge secretions from the walls of the airways. There are several formants, each at a separate frequency, occurring at intervals of approximately 1000 Hz. At any point in time (as with spectra) there may be any number of formants; in the case of speech, most of the information relating to vowels is contained in the first four formants, called F1 (the first formant), F2 (the second), F3 (the third), and F4 (the fourth). By moving the body of the tongue and the lips, the positions of the formants can be changed (Alotaibi and Hussain, "Comparative analysis of Arabic vowels using formants and an automatic speech recognition system").

Praat

"Praat" (Version 6.1.03, 64-bit, 2020; available at https://www.fon.hum.uva.nl/praat/download_linux.html) is open-source software widely used by phoneticians and researchers for determining various phonetic features of speech. It is a flexible tool for the analysis and reconstruction of acoustic speech signals. It performs speech analysis, synthesis, manipulation, labeling and segmentation, and graphics, and has much other functionality (Sauder et al., "Predicting voice disorder status from smoothed measures of cepstral peak prominence using praat and analysis of dysphonia in speech and voice (ADSV)").
      Praat was used to record and analyze the wav files to obtain all the parameters presented in this work.

      ASR

Speech processing is defined as the study of speech signals and their processing methods (Hamidi et al., "Speech coding effect on Amazigh alphabet speech recognition performance"). The signals are generally treated in a digital representation, so speech processing can be regarded as a special case of digital signal processing applied to the speech signal. Automatic Speech Recognition (ASR) is considered one of the main research fields in speech processing. ASR is the procedure through which a sound is converted into a word sequence by an algorithm implemented as a computer program. The main role of an ASR system is to hypothesize the most probable discrete sequence of symbols, out of all valid sequences in a target language, from the given input acoustic speech vector. In automatic speech recognition, the most common generative learning method is based on hidden Markov models combined with the Gaussian Mixture Model (GMM). This combination is exploited by conventional ASR systems to represent the sequential structure of speech signals. Typically, a Gaussian mixture is used by each HMM state to model the spectral representation of the sound wave. The GMM-HMM model is parameterized by λ = (A, B, μ), where μ is the state prior probability vector, A = (a_ij) is the transition probability matrix, and B = {b_1, …, b_n} is the set in which b_j represents the GMM of state j. A state is generally related to a phone sub-segment in speech (Karpagavalli and Chandra, "A review on automatic speech recognition architecture and approaches"; Hamidi et al., "Amazigh digits through interactive speech recognition system in noisy environment").
The same topology is used for all the HMMs; it contains three active states with observation functions and two non-emitting states (the initial and final states, which have no observation function); see Figure 2.
FIGURE 2. Five-state Hidden Markov Model (HMM).
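As an illustration of this left-to-right topology, the sketch below builds the transition matrix A of a five-state HMM with three emitting states and two non-emitting entry/exit states. The self-loop and advance probabilities are arbitrary placeholders for readability, not values trained by our system.

```python
import numpy as np

# Five-state left-to-right HMM: state 0 (entry) and state 4 (exit) are
# non-emitting; states 1-3 are the emitting states with observation GMMs.
# Each emitting state may loop on itself or advance to the next state.
A = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0],   # entry -> first emitting state
    [0.0, 0.6, 0.4, 0.0, 0.0],   # emitting state 1: self-loop or advance
    [0.0, 0.0, 0.6, 0.4, 0.0],   # emitting state 2
    [0.0, 0.0, 0.0, 0.6, 0.4],   # emitting state 3 -> exit
    [0.0, 0.0, 0.0, 0.0, 1.0],   # exit state (absorbing)
])
assert np.allclose(A.sum(axis=1), 1.0)  # each row is a valid distribution
```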

      SYSTEM ARCHITECTURE

      System overview

In this study, an HMM-based ASR system was implemented to evaluate the difference between the coughs of healthy and COVID-19 infected people. The system is divided into three phases according to their functions. The first is the training phase, whose function is to build knowledge about the coughs and their types for use in the system. The second is the HMM model bank, which organizes the system knowledge produced by the first step. Finally, there is the recognition phase, whose function is to match the extracted features against the trained model of each class. The parameters of the system were a 25-millisecond Hamming window with a step size of 10 milliseconds, a cepstral liftering length of 22 with 26 filter bank channels, 13 MFCC coefficients, and a pre-emphasis coefficient of 0.97.
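As an illustration of this front end, the sketch below computes the 39-dimensional feature vectors (13 static MFCCs plus delta and delta-delta coefficients) with the parameters listed above. It uses the python_speech_features package, which exposes exactly these parameters; the file name is hypothetical, and this is a sketch of an equivalent front end rather than the SphinxTrain internals.

```python
import numpy as np
import scipy.io.wavfile as wav
from python_speech_features import mfcc, delta

# Read one 16 kHz cough recording (file name is illustrative).
rate, signal = wav.read("cough_001.wav")

# 13 static MFCCs with the parameters used in this work: 25 ms Hamming
# window, 10 ms step, 26 filter bank channels, cepstral lifter 22,
# pre-emphasis 0.97.
static = mfcc(signal, samplerate=rate, winlen=0.025, winstep=0.01,
              numcep=13, nfilt=26, ceplifter=22, preemph=0.97,
              winfunc=np.hamming)

# Append delta and delta-delta coefficients to obtain the 39-dimensional
# feature vectors fed to the HMM/GMM models.
d1 = delta(static, 2)
d2 = delta(d1, 2)
features = np.hstack([static, d1, d2])   # shape: (num_frames, 39)
```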
In addition, the first four formants (F1, F2, F3, and F4) were extracted to analyze and compare the formant frequencies of both groups (normal subjects and COVID-19 patients). These formants were measured manually using spectrograms, automatic formant track detection, and spectra, with the analysis parameters set as follows: maximum number of formants, 5; maximum formant frequency, 6000 Hz; analysis window, 0.025 s. Values were taken in a central, stable part of the cough. The pitch (F0) was extracted using the 'Get Pitch' command, with the analysis parameters set to a pitch floor of 75 Hz and a pitch ceiling of 500 Hz. The cough sound is divided into three phases (see Figure 3): the first is an explosive expiration due to the sudden opening of the glottis, the second is an intermediate phase with attenuation of the cough sound, and the third is a voiced phase due to the closing of the vocal cords (Shi et al., "Theory and application of audio-based assessment of cough"). We relied especially on phase three for measuring these parameters.
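For reference, the following sketch reproduces these measurements programmatically through Parselmouth, a Python interface to Praat. The analysis parameters mirror those stated above (at most 5 formants up to 6000 Hz, 0.025 s window, pitch floor 75 Hz, pitch ceiling 500 Hz); the file name and the sampling time are placeholders, since in our study the values were read manually in the stable voiced phase.

```python
import parselmouth

# Load one cough recording (file name is illustrative).
snd = parselmouth.Sound("cough_001.wav")

# Pitch analysis with the parameters used in this study:
# floor 75 Hz, ceiling 500 Hz.
pitch = snd.to_pitch(pitch_floor=75.0, pitch_ceiling=500.0)

# Formant analysis: at most 5 formants up to 6000 Hz, 25 ms window.
formants = snd.to_formant_burg(max_number_of_formants=5,
                               maximum_formant=6000.0,
                               window_length=0.025)

# Sample the values in the voiced (third) phase of the cough; the time
# point here is a placeholder for the manually chosen stable region.
t = 0.5 * snd.duration
f0 = pitch.get_value_at_time(t)
f1_to_f4 = [formants.get_value_at_time(n, t) for n in (1, 2, 3, 4)]
```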

      Cough recording protocol

Our study includes the compilation of a data corpus of coughing sounds recorded in a controlled environment. For recording the coughs, we used a microphone and a laptop with 4 GB of RAM and an Intel Core i5 CPU at 1.2 GHz, running Ubuntu 14.04 LTS. The microphone was placed at a distance of 20 cm from the mouth of the subjects; the actual distance could vary from 10 cm to 30 cm due to the subjects' movement. We kept the sampling rate at Fs = 16 k samples/s with 16-bit resolution to obtain the best sound quality. The database used in our system covers 10 people divided into two categories: the first consists of 5 healthy people and the second of 5 people infected with COVID-19 (for more detail, see Table 1). For the healthy subjects, we recorded the coughs in our laboratory; for the infected people, we recorded the sound data in the quarantine rooms. All subjects were free of any other respiratory disease according to personal history and basic examination. At this stage, we faced several difficulties, most notably reaching affected people during the onset of symptoms and recording natural coughs from volunteer patients. The coughs were extracted from the recordings by detecting bursts of audio energy delimited by silence and then manually validating and adjusting the start and end times of each detected region. This resulted in 10 segments of audio per person, each segment corresponding to one complete cough; a sketch of the detection step is given after Table 1. The audio recording of each cough was saved as a ".wav" file.
TABLE 1. Database by Gender Distribution

Group               Male   Female   Age
Healthy             3      2        27 to 49 years
COVID-19 patients   3      2        30 to 52 years
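A minimal sketch of the energy-based burst detection described above is shown below, assuming mono 16-bit recordings; the frame sizes and the relative threshold are illustrative choices, and in our protocol the detected boundaries were still validated and adjusted by hand.

```python
import numpy as np
import scipy.io.wavfile as wav

def find_cough_bursts(path, frame_ms=25, hop_ms=10, rel_threshold=0.1):
    """Return (start, end) times of energy bursts delimited by silence.

    A frame is 'active' when its short-time energy exceeds a fraction of
    the recording's peak frame energy; consecutive active frames are
    merged into one candidate cough segment.
    """
    rate, x = wav.read(path)          # assumes a mono recording
    x = x.astype(np.float64)
    frame, hop = int(rate * frame_ms / 1000), int(rate * hop_ms / 1000)
    energies = np.array([np.sum(x[i:i + frame] ** 2)
                         for i in range(0, len(x) - frame, hop)])
    active = energies > rel_threshold * energies.max()

    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            segments.append((start * hop / rate, (i * hop + frame) / rate))
            start = None
    if start is not None:
        segments.append((start * hop / rate, len(x) / rate))
    return segments   # candidates to be validated and trimmed by hand
```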

      Cough acoustic model

This step involved the generation of acoustic models using Sphinxbase and SphinxTrain. We exploited the lexicon, language model, filler dictionary, phone list, transcription, and fileids files, together with the wave audio data. The generated model includes the information needed to estimate the probabilities of the recordings. Figure 4 summarizes the cough acoustic model preparation step.
FIGURE 4. Cough acoustic model preparation structure.

      Lexicon file

The lexicon establishes the correspondence between the words in the transcription file and the phonemes used in the .phone file. The dictionary provides pronunciations for each word present in the language model: it contains the words we want to train, followed by their pronunciations, separating words into sequences of sub-word units. Our dictionary includes the proposed symbolic representations of the cough sound. The pronunciation dictionary plays the role of an intermediary between the language and acoustic models.

      Binary language model file

The language model determines the words used in a speech application, where each word must be mentioned in the lexicon file. A language model specifies a set of constraints on the sequences of words accepted in a given language (Zealouk et al., "Amazigh digits speech recognition system under noise car environment"). These constraints can be expressed, for example, through grammar rules or through statistics on each word estimated from training speech data. The binary form of the language model is used to convert the language model file into the N-gram language model format.

      Transcription and fileid files

The training and testing transcription files contain the coughing sound labels, organized in sequence with capital letters, punctuation, and tagging symbols marking the start and end of each sentence, followed by the cough corpus file name. The .fileids files contain the paths of the sound files: for each recording, we generate a line with the file name and its path in the control file, as presented in Figure 5; the extension of this file is ".fileids". A sketch of how such files can be generated follows.
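As an illustration, the short script below generates a transcription file and the matching .fileids control file following the usual CMU Sphinx conventions; the corpus layout and the COUGH label are hypothetical stand-ins for the symbolic cough representations used in our dictionary.

```python
# Write SphinxTrain control files for a small cough corpus. The file
# names, utterance label, and corpus layout are illustrative; the
# formats follow the usual CMU Sphinx conventions.
recordings = ["subject01/cough_01", "subject01/cough_02"]

with open("cough_train.fileids", "w") as fileids, \
     open("cough_train.transcription", "w") as trans:
    for rec in recordings:
        fileids.write(rec + "\n")                 # path without .wav
        utt_id = rec.split("/")[-1]
        trans.write(f"<s> COUGH </s> ({utt_id})\n")
```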

      Filler file

In the filler file, we list the silence events as "words." In our work, this file includes the entries shown in Figure 6.
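As an illustration, a typical SphinxTrain filler dictionary maps each silence "word" to the silence phone SIL; conventional entries (a generic example, not necessarily the exact contents of Figure 6) look like this:

```
<s>   SIL
</s>  SIL
<sil> SIL
```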

      EXPERIMENTAL RESULTS

The formants of the coughs will be analyzed to determine their values. These values are expected to prove helpful in subsequent speech processing tasks such as COVID-19 cough recognition and classification.

      Cough sound recognition

In this section, we describe our cough recognition system, which allows us to illustrate the difference between the cough of healthy people and the cough of infected ones. The designed system is based on Mel-frequency cepstral coefficients in the feature extraction phase, modeled by Gaussian Mixture Models, and the hidden Markov model is used as the classifier for cough classification. The conducted experiments were based on cough sounds in the training and testing phases, where each cough sound was modeled by five HMM states. The state transitions were left-to-right, and Gaussian Mixture Models were used to model the observation probability density functions, with 8 mixtures per state. All training and recognition experiments were implemented with CMU Sphinx. Training was performed using the coughs of healthy people, while testing was performed using the coughs of both healthy people and people with COVID-19. For the first experiment, the system was trained using the cough sounds of healthy people (five speakers) and tested with the coughs of healthy people (three speakers). In the second experiment, the system was trained on the same data (five speakers) and tested with the cough sounds of COVID-19 infected people (three patients). The recognition rate of each experiment was recorded.
Figure 7 illustrates the cough recognition rates of the two experiments. For the first, the system accuracy is 93.33%, and for the second, the recognition rate is 86.66%; the difference between the two recognition rates is 6.67%. Both experiments show that it is possible to observe the difference between healthy people and COVID-19 patients. The small observed difference may be caused by other factors, such as the influence of the coronavirus on the vocal cords or glottis; the database size can also play an important role in the obtained results.
FIGURE 7. Cough recognition rates of non-infected and infected people.

      Formant based cough analysis

The aim of this part of the experiments was to perform and evaluate the acoustic analysis of the extracted pitch (F0) values, as well as the measurements of the formant frequencies F1, F2, F3, and F4, for the two types of cough. For the calculation of these frequencies (in Hz), we averaged ten coughs for each person and concentrated on phase three, the voiced phase, whose duration was about 60 ms. Figure 8 presents the extracted features of male coughing sounds based on the average of three patients and three healthy people, while Figure 9 shows the same features based on female coughing data from two patients and two healthy people.
FIGURE 8. The extracted features of male coughing sounds.
FIGURE 9. The extracted features of female coughing sounds.
Based on the overall measurement results: in the case of males, the average pitch (F0) is lower by 21 Hz for healthy people compared to patients, while the opposite holds for women, where the value for healthy people is slightly higher than for patients, by approximately 6 Hz. It can also be seen from Figures 8 and 9 that F1 is lower for healthy people than for COVID-19 infected people, with a difference of 100 Hz for males; we observed the same for females, with a difference of 88 Hz. For F2, the values were slightly higher for patients, with differences of 20 Hz and 14 Hz for males and females, respectively. Concerning the F3 and F4 values, the observed differences between healthy people and patients are as follows: for males, the estimated differences are 175 Hz and 279 Hz, respectively; for females, they are 44 Hz and 138 Hz, respectively. For these two formants, the extracted values of healthy people are higher than those of patients for both genders.
Figure 10 presents the values obtained from the F0 analysis, divided into quartiles, which depends on the vocal fold vibration. The median values for males are 300 Hz and 325.11 Hz, and those for females are 321.5 Hz and 328 Hz, for healthy people and patients, respectively. The differences between the medians of the two groups are 25.11 Hz for males and 7 Hz for females. The lower quartile (the middle value of the lower half) is 301.25 Hz for infected females and 305.75 Hz for non-infected females; it is 257 Hz for infected males and 286 Hz for healthy males. On the other hand, the difference between the upper quartiles (the middle values of the upper halves) is 19.75 Hz for females and 52.5 Hz for males.
FIGURE 10. Boxplot of pitch (F0) values based on the coughs of infected and non-infected people, including both males and females.
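For reproducibility, the boxplot statistics reported here (median, lower and upper quartiles, and inter-quartile range) can be computed as in the short sketch below; the sample values are placeholders, not the study's measurements.

```python
import numpy as np

# Placeholder F0 values (Hz) for one group; not the study's data.
f0_values = np.array([286.0, 298.5, 300.0, 312.0, 338.5])

q1, median, q3 = np.percentile(f0_values, [25, 50, 75])
iqr = q3 - q1   # inter-quartile range plotted as the box height
print(f"Q1={q1:.2f} Hz, median={median:.2f} Hz, Q3={q3:.2f} Hz, IQR={iqr:.2f} Hz")
```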
Figure 11 presents the boxplot of the F3 values. The median for non-infected females is 2694.5 Hz, which is higher than that of the infected ones by 128.5 Hz. For males, the lower median value, 2829 Hz, is observed for patients, with a variation of 186 Hz. The inter-quartile ranges for healthy and patient females are 291 Hz and 266 Hz, respectively; in the case of males, we observed 217 Hz for healthy people and 366 Hz for patients.
FIGURE 11. Boxplot of F3 values based on the coughs of infected and non-infected people, including both males and females.
Figure 12 presents a comparison of the extracted F4 formant values based on the cough data of COVID-19 infected and non-infected people, including females and males. The boxplots show that the median of healthy females is greater than that of the infected ones by 124 Hz, whereas the median of healthy males is 4029 Hz, higher than that of patients by 350.5 Hz. The inter-quartile range is 232 Hz for healthy males and 353 Hz for male patients, and 215 Hz for healthy females and 182 Hz for female patients.
FIGURE 12. Boxplot of F4 values based on the coughs of infected and non-infected people, including both males and females.
The median is generally considered the best representative of the central location of the data. The more skewed the distribution, the greater the difference between the median and the mean, and the more weight should be placed on using the median rather than the mean. For the F0 values, the mean is higher than the median for females, with a difference of 12 Hz for infected and 11 Hz for non-infected subjects. Concerning the male findings, the difference between mean and median is 3 Hz for healthy people and 0 Hz for patients. For the F3 values, the median is lower than the mean for females, with a difference of 100 Hz for patients and 16 Hz for healthy people, whereas the opposite was observed for males, with differences of 72 Hz and 42 Hz for patients and healthy people, respectively. For the F4 values, differences between mean and median are observed in all experimental sets: for females, the calculated difference is 37 Hz for patients and 17 Hz for healthy people; for males, the differences are 102 Hz for patients and 46 Hz for healthy people. In the F4 case, the mean is higher than the median for all sets except healthy males, where the opposite holds.
The coughing sound provides information about the pathophysiological mechanisms of coughing through several parameters, as well as about the structural nature of the tissue. The pitch of the vibration is determined mainly by the degree of stretch of the vocal cords, by their approximation to one another, and by the mass of their edges. In the literature (Korpáš, Sadloňová, and Vrabec, "Analysis of the cough sound: an overview"; Braga and Allegra, "Clinical Methods for the Study of Cough"), the values of pitch (F0) reported by different authors range from 300 to 700 Hz in normal conditions, whereas in the cough sounds of bronchitis the bands between 500 and 1200 Hz are the most expressive. Based on our findings, we can say that for healthy people we are in agreement with Korpáš et al., but for infected people we observed a difference between the values of COVID-19 infected people and those reported for other diseases such as bronchitis; perhaps the glottis behaves differently in COVID-19 than in other pathological conditions. Moreover, the four formant values obtained for the females of both groups, healthy and COVID-19 infected, are lower than those obtained for the men, except for F4 in patients, whereas for vowels the four formants of females are usually higher than those of men, as mentioned in the literature (Vorperian et al., "Corner vowels in males and females ages 4 to 20 years: fundamental and F1–F4 formant frequencies"; Bonzi et al., "Study of the characteristic parameters of the normal voices of Argentinian speakers"). Through our findings, we noticed a difference between healthy and COVID-19 cough sounds in their physical sound features, especially in F3 and F4. These formants change in relation to the dimensions of the vocal tract cavities, and a reduction of a cavity would lead to increased frequencies. On the other hand, we cannot compare our COVID-19 cough formant values with published results because, at present, we could not find results of this type in the literature.

      CONCLUSION

Generally, it was interesting to find out how the cough sound changes in the pathological condition caused by COVID-19. In this paper, we have presented an HMM-based automatic speech recognition analysis and a formant-based analysis of cough sounds, exploiting a spectrogram technique. Agreement between the recognition outputs and the formant analysis can be inferred from these results. The overall accuracy of the cough recognition system was 93.33% for healthy coughs and 86.66% for COVID-19 coughs, a difference of 6.67%. In addition, the pitch and formant analysis shows that, in the case of females, the F0, F1, F3, and F4 values are higher for healthy people, by differences of 6 Hz, 88 Hz, 44 Hz, and 138 Hz, respectively, unlike F2, which is higher for the infected people. In the case of males, the lower values are observed with F0 and F2 for healthy people, with variations of 21 Hz and 20 Hz, respectively, whereas F1, F3, and F4 are lower for patients, by differences of 100 Hz, 250 Hz, and 279 Hz. Our results show agreement between the conclusions drawn from the cough recognition results and the formant analysis. To the best of our knowledge, this is the first work that tries to examine the accuracy of ASR and sound parameters for the coughs of people with COVID-19.

      REFERENCES

  • Korpáš J., Widdicombe J.G., Vrabec M. Influence of simulated mucus on cough sounds in cats. Respir Med. 1993;87:49-54.
  • Chung K.F. Assessment and measurement of cough: the value of new tools. Pulm Pharmacol Ther. 2002;15:267-272.
  • Subburaj S., Parvez L., Rajagopalan T.G. Methods of recording and analysing cough sounds. Pulm Pharmacol. 1996;9:269-279.
  • Matos S., Birring S.S., Pavord I.D., et al. Detection of cough signals in continuous audio recordings using hidden Markov models. IEEE Trans Biomed Eng. 2006;53:1078-1083.
  • Korpas J., Sadlonova J., Salat D., et al. The origin of cough sounds. Bull Eur Physiopathol Respir. 1987;23:47s-50s.
  • Kelemen S.A., Cseri T., Marozsan I. Information obtained from tussigrams and the possibilities of their application in medical practice. Bull Eur Physiopathol Respir. 1987;23:51s-56s.
  • Muhammad G., Mesallam T.A., Malki K.H., et al. Formant analysis in dysphonic patients and automatic Arabic digit speech recognition. Biomed Eng Online. 2011;10:41.
  • Maier A., Haderlein T., Eysholdt U., et al. PEAKS–a system for the automatic evaluation of voice and speech disorders. Speech Communication. 2009;51:425-437.
  • Satori H., Zealouk O., Satori K., et al. Voice comparison between smokers and non-smokers using HMM speech recognition system. Int J Speech Technol. 2017;20:771-777.
  • Zealouk O., Satori H., Hamidi M., et al. Vocal parameters analysis of smoker using Amazigh language. Int J Speech Technol. 2018;21:85-91.
  • Ma Z., Bullen C., Chu J.T.W., et al. Towards the objective speech assessment of smoking status based on voice features: a review of the literature. J Voice. 2021.
  • Dubuisson T., Dutoit T., Gosselin B., et al. On the use of the correlation between acoustic descriptors for the normal/pathological voices discrimination. EURASIP J Advances in Signal Processing. 2009;173967.
  • Costa S.C., Neto B.G.A., Fechine J.M., et al. Parametric cepstral analysis for pathological voice assessment. In: Proceedings of the 2008 ACM Symposium on Applied Computing. 2008:1410-1414.
  • Lai C.C., Shih T.P., Ko W.C., et al. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and corona virus disease-2019 (COVID-19): the epidemic and the challenges. Int J Antimicrob Agents. 2020;105924.
  • Gautret P., Lagier J.C., Parola P., et al. Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial. Int J Antimicrob Agents. 2020;105949.
  • Zu Z.Y., Jiang M.D., Xu P.P., et al. Coronavirus disease 2019 (COVID-19): a perspective from China. Radiology. 2020;200490.
  • Pan L., Mu M., Yang P., et al. Clinical characteristics of COVID-19 patients with digestive symptoms in Hubei, China: a descriptive, cross-sectional, multicenter study. Am J Gastroenterol. 2020;115.
  • Pantaleo T., Bongianni F., Mutolo D. Central nervous mechanisms of cough. Pulm Pharmacol Ther. 2002;15:227-233.
  • Chung K.F., Pavord I.D. Prevalence, pathogenesis, and causes of chronic cough. Lancet. 2008;371:1364-1374.
  • Gerhard D. Pitch Extraction and Fundamental Frequency: History and Current Techniques. Regina, Canada: Department of Computer Science, University of Regina; 2003:0-22.
  • Bořil H., Pollák P. Direct time domain fundamental frequency estimation of speech in noisy conditions. In: 2004 12th European Signal Processing Conference. IEEE; 2004:1003-1006.
  • Rabiner L. On the use of autocorrelation analysis for pitch detection. IEEE Trans Acoust Speech Signal Process. 1977;25:24-33.
  • Khelifa M.O., Elhadj Y.M., Abdellah Y., et al. Constructing accurate and robust HMM/GMM models for an Arabic speech recognition system. Int J Speech Technol. 2017;20:937-949.
  • Alotaibi Y.A., Hussain A. Comparative analysis of Arabic vowels using formants and an automatic speech recognition system. Int J Signal Processing, Image Processing and Pattern Recognition. 2010;3:11-22.
  • "Praat", Version 6.1.03, 64-bit. 2020. Available at: https://www.fon.hum.uva.nl/praat/download_linux.html.
  • Sauder C., Bretl M., Eadie T. Predicting voice disorder status from smoothed measures of cepstral peak prominence using Praat and Analysis of Dysphonia in Speech and Voice (ADSV). J Voice. 2017;31:557-566.
  • Hamidi M., Satori H., Zealouk O., et al. Speech coding effect on Amazigh alphabet speech recognition performance. J Adv Res Dyn Control Syst. 2019;11:1392-1400.
  • Huang X., Acero A., Hon H.W., et al. Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall PTR; 2001.
  • Karpagavalli S., Chandra E. A review on automatic speech recognition architecture and approaches. Int J Signal Processing, Image Processing and Pattern Recognition. 2016;9:393-404.
  • Hamidi M., Satori H., Zealouk O., et al. Amazigh digits through interactive speech recognition system in noisy environment. Int J Speech Technol. 2020;23:101-109.
  • Shi Y., Liu H., Wang Y., et al. Theory and application of audio-based assessment of cough. J Sensors. 2018.
  • Zealouk O., Hamidi M., Satori H., et al. Amazigh digits speech recognition system under noise car environment. In: Embedded Systems and Artificial Intelligence. Singapore: Springer; 2020:421-428.
  • Korpáš J., Sadloňová J., Vrabec M. Analysis of the cough sound: an overview. Pulm Pharmacol. 1996;9:261-268.
  • Braga P.C., Allegra L. Clinical methods for the study of cough. In: Braga P.C., Allegra L., eds. Cough. New York: Raven Press; 1989:73-93.
  • Vorperian H.K., Kent R.D., Lee Y., et al. Corner vowels in males and females ages 4 to 20 years: fundamental and F1–F4 formant frequencies. J Acoust Soc Am. 2019;146:3255-3274.
  • Bonzi E.V., Grad G.B., Maggi A.M., et al. Study of the characteristic parameters of the normal voices of Argentinian speakers. arXiv preprint arXiv:1508.06226; 2014.