Summary
Key Words
INTRODUCTION
- 1. What makes one believe that swings in BG trigger voice changes?
- 2. What are the existing approaches to BG estimation from voice? Are there any research biases?
- 3. What are the definite successes in the field? What remains an open problem?
THE SEARCH PROTOCOL
- 1. To find potentially applicable studies, we searched five databases of scientific literature, covering publications up to the article submission date in 2020: Cochrane Library, PubMed, Scopus, Web of Science, and Google Scholar. The logical formula was: (speech OR voice) AND (diabetes OR glucose OR sugar OR hyperglycemia OR hypoglycemia). The queries are listed in Table 1. Each query was kept as broad as possible, retrieving articles that contain the keywords anywhere. In Google Scholar, the search was restricted to titles, but the formula was extended with synonyms.
- 2. Once the potentially applicable studies were retrieved, the irrelevant ones were discarded after manual scanning of the abstracts and full texts. We took a conservative approach, including and summarizing every piece of research that describes either the construction of a vocal biomarker that detects glucose swings from voice or voice changes in response to a change in glucose concentration.
- 3. The articles cited by the relevant articles, and the articles that cite them, were added to the pool of potentially applicable studies.
- 4. The research biases in the retrieved studies were analyzed: the design alternatives are described in detail with a Transparency and Reliability Matrix, which was built to account for missing values, and the issues of data collection are addressed.

TABLE 1. Queries for the Literature Search

Database | Query | # Hits (Not Listed, If Found in the Previous Database) | # Manually Selected as Relevant and Relevant Articles That Cite the Hit |
---|---|---|---|
Cochrane Library | #1 speech MeSH; #2 glucose OR hypoglycemia OR sugar OR hypoglycemia OR diabetes OR SMBG OR self-monitoring of glucose; #3 #1 AND #2 | 223 | N/A |
PubMed | (("speech"[MeSH Terms] OR "speech"[All Fields]) OR ("voice"[MeSH Terms] OR "voice"[All Fields])) AND (("glucose"[MeSH Terms] OR "glucose"[All Fields]) OR ("hyperglycemia"[All Fields] OR "hyperglycemia"[MeSH Terms] OR "hyperglycemia"[All Fields]) OR ("hypoglycemia"[All Fields] OR "hypoglycemia"[MeSH Terms] OR "hypoglycemia"[All Fields]) OR ("sugars"[MeSH Terms] OR "sugars"[All Fields] OR "sugar"[All Fields]) OR ("diabetes mellitus"[MeSH Terms] OR ("diabetes"[All Fields] AND "mellitus"[All Fields]) OR "diabetes mellitus"[All Fields] OR "diabetes"[All Fields] OR "diabetes insipidus"[MeSH Terms] OR ("diabetes"[All Fields] AND "insipidus"[All Fields]) OR "diabetes insipidus"[All Fields]) OR SMBG[All Fields] OR (("ego"[MeSH Terms] OR "ego"[All Fields] OR "self"[All Fields]) AND monitoring[All Fields] AND ("glucose"[MeSH Terms] OR "glucose"[All Fields]))) | 956 (more recent than 1999) | N/A |
Web of Science | (speech OR voice) AND (diabetes OR glucose OR sugar OR hyperglycemia OR hypoglycemia) | 808 | 20, 21, 24 |
Scopus | (“speech” OR “voice”) and (“diabetes” OR “glucose” OR “sugar” OR “hyperglycemia” OR “hypoglycemia”) | 886 (more recent than 2014) | 19 |
Google Scholar | allintitle: diabetes OR glucose OR hypoglycemia OR hyperglycemia OR sugar vocal OR acoustic OR perceptual OR speech OR voice; time span of 2009-2019 in articles and patents. | 221 | 22, 26, 27, 28, 29 |
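
The logical formula from the first step of the protocol can be sketched as a simple text filter. This is only an illustration of the Boolean structure, not the databases' own search engines: the function names and the example titles are made up for this sketch, and whole-word, case-insensitive matching is an assumption.

```python
import re

# Keyword groups taken from the logical formula in the search protocol.
VOICE_TERMS = ("speech", "voice")
GLUCOSE_TERMS = ("diabetes", "glucose", "sugar", "hyperglycemia", "hypoglycemia")


def contains_any(text, terms):
    """Return True if any term occurs in text as a whole word (case-insensitive)."""
    return any(re.search(rf"\b{re.escape(t)}\b", text, re.IGNORECASE) for t in terms)


def matches_formula(text):
    """(speech OR voice) AND (diabetes OR glucose OR sugar OR hyperglycemia OR hypoglycemia)."""
    return contains_any(text, VOICE_TERMS) and contains_any(text, GLUCOSE_TERMS)


# Hypothetical titles, invented for illustration only.
titles = [
    "Voice changes during induced hypoglycemia",
    "Speech emotion recognition with hidden Markov models",
    "Glucose monitoring with wearable sensors",
]
relevant = [t for t in titles if matches_formula(t)]
print(relevant)
```

Only the first title satisfies both groups of keywords; the other two match one group each and are filtered out.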
WHY DOES BG AFFECT VOICE?
- the change in the glucose level of the blood flowing through the larynx and the vocal cords changes the elastic properties of the tissue of these organs, which in turn changes the spectral characteristics of the voice, in accordance with Hooke's law.26
- hypoglycemia is often accompanied by a feeling of anxiety, which causes people to speak faster and with greater urgency, whereas hyperglycemia is often accompanied by lethargy, which makes speech slower or slurred.
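
As a rough illustration of the first argument, consider an idealized mass-spring oscillator. This is a textbook simplification, not a model taken from the cited patent: the symbols $k$ (effective tissue stiffness), $m$ (effective vibrating mass), $x$ (displacement), and $f_0$ (natural frequency) are introduced here only for the sketch.

```latex
F = -kx, \qquad
f_0 = \frac{1}{2\pi}\sqrt{\frac{k}{m}}, \qquad
\frac{\Delta f_0}{f_0} \approx \frac{1}{2}\,\frac{\Delta k}{k} \quad (\text{small } \Delta k,\ \text{fixed } m)
```

Under these assumptions, a 2% change in stiffness would shift the natural frequency by about 1%, which is the kind of spectral change the argument relies on.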
EXISTING SYSTEMS FOR GLUCOSE ESTIMATION FROM VOICE
Ground truth
Study | Ground Truth | Blood Reading or CBGM |
---|---|---|
Motorin, 2017 25 | Numeric | BG |
Ulanovsky et al, 2009 26 , 27 | Low: <3.5 mmol/L, Norm: 3.5-6.0 mmol/L, High: >6 mmol/L | BG |
Michaelis, 2014 28 | Mild and extreme hypo- and hyperglycemia | BG |
Rasmusson et al, 2019 29 | Numeric values are implied | CBGM |
Tschope et al, 2015 24 | Numeric | BG |
Czupryniak et al, 2019 23 | Low: <70 mg/dL (hypoglycemia), Norm: 70-200 mg/dL, High: >200 mg/dL (extreme hyperglycemia) | CBGM |
A converter between blood glucose units in different systems is available at: http://www.unit-conversion.info/blood-sugar.html. Accessed January 22, 2020.
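
Because the thresholds in the table are given in two unit systems, a small conversion helper makes them comparable. The sketch below assumes the usual factor derived from the molar mass of glucose (about 180.16 g/mol, so 1 mmol/L is about 18.016 mg/dL); the function names and the example reading are hypothetical.

```python
# Glucose has a molar mass of about 180.16 g/mol, so 1 mmol/L is about 18.016 mg/dL.
MGDL_PER_MMOLL = 18.016


def mmoll_to_mgdl(value_mmoll):
    """Convert a glucose concentration from mmol/L to mg/dL."""
    return value_mmoll * MGDL_PER_MMOLL


def mgdl_to_mmoll(value_mgdl):
    """Convert a glucose concentration from mg/dL to mmol/L."""
    return value_mgdl / MGDL_PER_MMOLL


def classify(value, low, high):
    """Three-way label; value, low, and high must share one unit."""
    if value < low:
        return "Low"
    if value > high:
        return "High"
    return "Norm"


# Thresholds as reported in the table: 70 and 200 mg/dL versus 3.5 and 6.0 mmol/L.
reading_mgdl = 65.0  # hypothetical reading, for illustration only
print(classify(reading_mgdl, 70.0, 200.0))              # labelled with the mg/dL thresholds
print(classify(mgdl_to_mmoll(reading_mgdl), 3.5, 6.0))  # labelled with the mmol/L thresholds
```

The same hypothetical reading falls below the 70 mg/dL threshold but above the 3.5 mmol/L one, which shows why the ground-truth definitions of the studies are not directly interchangeable.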
Patient groups
Study | Clinical Study | T1D or T2D |
---|---|---|
Motorin, 2017 25 | Yes | Both implied |
Ulanovsky et al, 2009 26 , 27 | Unspecified | Both |
Michaelis, 2014 28 | Unspecified | Unspecified |
Rasmusson et al, 2019 29 | Unspecified | Both |
Tschope et al, 2015 24 | No, volunteers | T1D |
Czupryniak et al, 2019 23 | Unspecified | T1D |
Speech corpora
- one phoneme /a/,23
- matched sentences for the ease of comparison,24
Study | Speech Unit | Recording Device |
---|---|---|
Motorin, 2017 25 | Fragments of speech, where the vowels are more frequent | Smartphone |
Ulanovsky et al, 2009 26 , 27 | Free speech, mobile phone | Smartphone |
Michaelis, 2014 28 | Any fragment of free speech | Unspecified |
Rasmusson et al, 2019 29 | Unspecified | Unspecified |
Tschope et al, 2015 24 | Matching sentences in German | Unspecified |
Czupryniak et al, 2019 23 | Vowel /a/ for 3 seconds | Professional voice recorder |
Computational approaches
Study | Voice Parameters |
---|---|
Motorin, 2017 25 | From the Fourier spectrum of the voice, ie, values in the coordinates of intensity vs frequency, the coefficients are determined for the solution of equations describing a physical system. |
Ulanovsky et al, 2009 26 , 27 | The Fourier spectrum. (The voice is transformed into a spectrum; spectral peaks in the low (100-1,500 Hz) and high (7,000-10,000 Hz) frequency regions are sampled; the intensities of the selected peaks are determined; and the ratio between the selected low- and high-frequency peaks is obtained.) |
Michaelis, 2014 28 | Short-term features from formants, pitch, articulation rate (eg, number of transitions between voiced and unvoiced sounds), intensity, number of speech errors, response time, nonfluency, speech quality, “S” sounds shifted to “SH”, “R” to “L”, “EZ” to “ES”, delayed responsiveness. |
Rasmusson et al, 2019 29 | Features include frequency patterns and amplitude patterns in speech spectrum. |
Tschope et al, 2015 24 | openSMILE extractor, 2,375 features. |
Czupryniak et al, 2019 23 | Number of fundamental periods, time of fundamental periods, fundamental frequency, energy, amplitude of fundamental frequency, indicator of voiced probability, simple voice quality, relative average perturbation, shimmer, amplitude perturbation quotient, F1-F4 frequencies, harmonic perturbation quotients, residual to harmonic ratio, unharmonic to harmonic ratio, subharmonic to harmonic ratio, noise to harmonic ratio, F1 to F4 harmonic to all energy ratio. |
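
The peak-ratio parameter listed for Ulanovsky et al in the table above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the patented method: it takes the single strongest spectral bin in each band, uses a direct DFT on one short frame, and runs on a synthetic two-tone signal rather than voice data. The function names are hypothetical, and the sampling rate must exceed 20 kHz for the high band to be observable.

```python
import cmath
import math


def band_peak(signal, sample_rate, f_lo, f_hi):
    """Return (frequency_hz, magnitude) of the strongest DFT bin in [f_lo, f_hi]."""
    n = len(signal)
    best_freq, best_mag = 0.0, 0.0
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        if not f_lo <= freq <= f_hi:
            continue
        # Direct DFT of bin k: slow, but dependency-free and fine for short frames.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        mag = abs(coeff) / n
        if mag > best_mag:
            best_freq, best_mag = freq, mag
    return best_freq, best_mag


def low_high_peak_ratio(signal, sample_rate):
    """Ratio of the strongest low-band peak to the strongest high-band peak."""
    _, low_mag = band_peak(signal, sample_rate, 100.0, 1500.0)
    _, high_mag = band_peak(signal, sample_rate, 7000.0, 10000.0)
    if high_mag == 0.0:
        raise ValueError("no energy found in the high band")
    return low_mag / high_mag


# Synthetic 20 ms frame (not real voice data): a 500 Hz tone plus a weaker 8 kHz tone.
SAMPLE_RATE = 24000
frame = [
    math.sin(2 * math.pi * 500 * i / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 8000 * i / SAMPLE_RATE)
    for i in range(480)
]
ratio = low_high_peak_ratio(frame, SAMPLE_RATE)
print(round(ratio, 3))
```

For this synthetic frame the ratio is about 4, matching the amplitude ratio of the two tones. A real recording would need windowing and averaging over many frames.
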
Study | Number of Subjects | Experiments | How Generic the Method Is |
---|---|---|---|
Motorin, 2017 25 | 7,000 | Unspecified | Speaker-dependent |
Ulanovsky et al, 2009 26 , 27 | Five subjects with T1D, two with T2D, three healthy | Some testing examples provided | Speaker-dependent |
Michaelis, 2014 28 | Unspecified | Unspecified | Speaker-dependent |
Rasmusson et al, 2019 29 | Unspecified | Unspecified | Unspecified |
Tschope et al, 2015 24 | Two | Yes | Two persons |
Czupryniak et al, 2019 23 | 4 men and 5 women | Unspecified | Within a gender group, but limited to 4-5 speakers |
Study | Computational Approach | Conclusion |
---|---|---|
Tschope et al, 2015 24 | R² in linear regression | A nonrandom relation was detected between glucose and a set of voice features |
Czupryniak et al, 2019 23 | Multivariate statistical comparison | The values of those acoustic parameters are significantly altered for hypoglycemia and extreme hyperglycemia. |
Ulanovsky et al, 2009 26 , 27 | Generalized statistically average functional dependencies | Individual examples on which the method works are given |
Michaelis, 2014 28 | Conditional probabilities, Hidden Markov Models | Hypo- and hyperglycemia detection in individual speakers |
Rasmusson et al, 2019 29 | Unspecified | It is taken for granted that glucose can be accurately approximated from voice; the emphasis is on its further uses. |
Motorin, 2017 25 | Not probabilistic, no pattern recognition, a system of differential equations instead | 98% accurate glucose estimation in 7,000 patients |
- Approach I: Hidden Markov Models working on window-based features.
- Approach II: a classification with a large set of global features for emotion recognition.
- Approach III: a small set of features and statistical tests to check the significance of the found changes.23
- Approach V: a set of differential equations describing a physical model of the speech organs, which links the BG concentration and the coefficients from the Fourier transform.25
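
The "R² in linear regression" entry in the table above can be illustrated with a single feature. This is a simplified sketch, assuming ordinary least squares with one predictor (the study itself worked with a large feature set), and the numbers are synthetic, not measurements.

```python
def r_squared(x, y):
    """Coefficient of determination of a one-feature least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares around the fitted line and the mean.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic values, for illustration only: one acoustic feature and paired glucose readings.
feature = [1.0, 2.0, 3.0, 4.0]
glucose = [1.0, 3.0, 2.0, 4.0]
print(round(r_squared(feature, glucose), 2))
```

An R² close to 0 would indicate that the feature explains almost none of the variation in glucose, while a value close to 1 would indicate a nearly linear relation.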
Experimental results
MISSING VALUES
Study | T1D or T2D | Speech Unit | Lab or Mobile | Experiments | Speaker-independency | Comp. Appr. |
---|---|---|---|---|---|---|
Tschope et al, 2015 24 | | | missing | | | |
Czupryniak et al, 2019 23 | | | | missing | | |
Ulanovsky et al, 2009 26 , 27 | | | | | | |
Michaelis, 2014 28 | missing | | missing | missing | | |
Rasmusson et al, 2019 29 | | missing | missing | missing | missing | missing |
Motorin, 2017 25 | missing | | | missing | | |
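
The matrix above can be summarized by counting the missing entries per study. The sketch below only tallies the "missing" marks shown in the table; the "reported fraction" is a hypothetical summary introduced here for illustration, not a measure defined by the authors.

```python
# Number of attribute columns in the matrix (T1D or T2D ... Comp. Appr.).
ATTRIBUTES = 6

# Count of "missing" marks per study, read off the table above.
missing_counts = {
    "Tschope et al, 2015": 1,
    "Czupryniak et al, 2019": 1,
    "Ulanovsky et al, 2009": 0,
    "Michaelis, 2014": 3,
    "Rasmusson et al, 2019": 5,
    "Motorin, 2017": 2,
}


def reported_fraction(missing, total=ATTRIBUTES):
    """Hypothetical summary: share of attributes that a study actually reports."""
    return (total - missing) / total


# Studies ordered from most to least completely described.
ranking = sorted(missing_counts, key=lambda study: missing_counts[study])
for study in ranking:
    print(f"{study}: {reported_fraction(missing_counts[study]):.2f}")
```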
BIASES OF DATA COLLECTION
CBGM values need to be correctly retrofitted
Subjects must be blinded to their glucose value
FUTURE PATHS
Special success/error measure

Fusion of different technologies
DISCUSSION
- 1) All the studies on glucose estimation from voice report positive results: from a nonrandom relation between the acoustic patterns and the BG value in a few subjects to 98% correct estimation in 7,000 subjects. Because the field is new and the authors are often unaware of the published studies (eg, several claim to be the first), there are five types of computational system designs, which come from different research fields.
- 2) Unlike other vocal biomarkers, instantaneous glucose estimation may be impossible to generalize beyond one speaker, because the trend in endocrinology research is that many aspects of diabetes are highly individual and average responses are useless.
- 3) The community working on glucose estimation from voice should be aware of the specific practices in endocrinology regarding success/error measures when comparing different methods, and should avoid data collection biases, namely, (i) the subjects must be blinded to their BG; (ii) if CBGM values are used, the BG traces need to be correctly retrofitted.
CONCLUSIONS
Future Work
- to collect data by recording either a short sentence or the phoneme /a/ with a smartphone;
- to extract a large number of acoustic features from the speech samples with the openSMILE extractor44 (or similar software); and then
- to train, depending on the size of the database, either a deep neural network (several thousand voice samples) or a classical classification algorithm such as the Support Vector Machine or Naïve Bayes (several hundred voice samples), with an appropriate problem formulation (eg, as in reference 42).
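
The last step of this plan can be sketched with a small Gaussian Naïve Bayes classifier written in plain Python. This is a minimal illustration under stated assumptions: the two-dimensional feature vectors and the labels are synthetic, whereas in practice the features would come from an extractor such as openSMILE, and the class and method names are hypothetical.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Gaussian Naive Bayes with per-class feature means and variances."""

    def fit(self, X, y):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        self.priors = {}
        self.stats = {}
        for label, rows in grouped.items():
            self.priors[label] = len(rows) / len(X)
            columns = list(zip(*rows))
            means = [sum(c) / len(c) for c in columns]
            # A small variance floor avoids division by zero on constant features.
            variances = [
                sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                for c, m in zip(columns, means)
            ]
            self.stats[label] = (means, variances)
        return self

    def predict_one(self, row):
        def log_posterior(label):
            means, variances = self.stats[label]
            total = math.log(self.priors[label])
            for v, m, var in zip(row, means, variances):
                total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            return total

        return max(self.stats, key=log_posterior)


# Synthetic feature vectors and labels, for illustration only.
X_train = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.8], [3.0, 2.0], [3.1, 2.2], [2.9, 1.9]]
y_train = ["low", "low", "low", "norm", "norm", "norm"]
model = GaussianNaiveBayes().fit(X_train, y_train)
print(model.predict_one([1.1, 5.0]), model.predict_one([3.0, 2.1]))
```

With only a few hundred samples, a simple model of this kind has few parameters to estimate, which is the motivation for preferring classical classifiers over deep networks on small databases.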
ACKNOWLEDGMENTS
AUTHOR DISCLOSURE STATEMENT
REFERENCES
- 1. Towards measuring stress with smartphones and wearable devices during workday and sleep. BioNanoScience. 2013; 3: 172-183
- 2. Classification of Cognitive Load From Speech Using an I-Vector Framework. Interspeech, Singapore. 2014: 751-755
- 3. Speech Stress Assessment Using Physiological and Psychological Measures. Ubicomp, Zurich, Switzerland. 2013: 921-930
- 4. Random Subset Feature Selection in Automatic Recognition of Developmental Disorders, Affective States, and Level of Conflict From Speech. Interspeech, Lyon, France. 2013: 210-214
- 5. Classifying Language-Related Developmental Disorders From Speech Cues: The Promise and the Potential Confounds. Interspeech, Lyon, France. 2013: 182-186
- 6. Automatic Recognition of Speaker Physical Load Using Posterior Probability Based Features From Acoustic and Phonetic Tokens. Interspeech, Singapore. 2014: 437-441
- 7. Fully automated assessment of the severity of Parkinson's disease from speech. Comput Speech Lang. 2015; 29: 172-185
- 8. Automatic Evaluation of Parkinson's Speech-Acoustic, Prosodic and Voice Related Cues. Interspeech, Lyon, France. 2013: 1149-1153
- 9. Automatic detection of Parkinson's disease in running speech spoken in three different languages. J Acoust Soc Am. 2016; 139: 481-500
- 10. Automatic Estimation of Parkinson's Disease Severity From Diverse Speech Tasks. Interspeech, Dresden, Germany. 2015: 914-918
- 11. On automatic diagnosis of Alzheimer's disease based on spontaneous speech analysis and emotional temperature. Cogn Comput. 2015; 7: 44-55
- 12. Speaker state recognition using an HMM-based feature extraction method. Comput Speech Lang. 2013; 27: 135-150
- 13. HALEF: an open-source standard-compliant telephony-based modular spoken dialog system: a review and an outlook. in: Geunbae Lee G, Kook Kim H, Jeong M, Kim J-H. Natural Language Dialog Systems and Intelligent Assistants. Springer International Publishing, Switzerland. 2015: 53-61 (ch. 5)
- 14. Towards disorder-independent automatic assessment of emotional competence in neurological patients with a classical emotion recognition system: application in foreign accent syndrome. IEEE Trans Affect Comput. 2019
- 15. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Biomarkers Definitions Working Group. Clin Pharmacol Ther. 2001; 69: 89-95
- 16. Syntactic learning for ESEDA.1, tool for enhanced speech emotion detection and analysis. in: Proceedings of Internet Technology and Secured Transactions Conference, London, UK. 2009: 1-6
- 17. Non-invasive glucose monitoring: a review of challenges and recent advances. Curr Trends Biomed Eng Biosci. 2017; 6: 1-8
- 18. Continuous glucose monitoring: a review of successes, challenges, and opportunities. Diab Technol Ther. 2016; 18 (S2-3-13)
- 19. Effect of diabetes mellitus on voice: a systematic review. Pract Diab. 2019; 36: 177-180
- 20. Voice based detection of type 2 diabetes mellitus. in: International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics, Chennai, India. 2016: 83-87
- 21. Instrumental acoustic voice characteristics in adults with type 2 diabetes. J Voice. 2019 (in press)
- 22. Vocal characteristics in patients with type 2 diabetes mellitus. Eur Arch Otorhinolaryngol. 2012; 269: 1489-1495
- 23. 378-P: human voice is modulated by hypoglycemia and hyperglycemia in type 1 diabetes. in: Poster Presentation. American Diabetes Association, San Francisco, California. 2019
- 24. Estimating blood sugar from voice samples: a preliminary study. in: International Conference on Computational Science and Computational Intelligence, Las Vegas, USA. 2015: 804-805
- 25. Scientific solutions for the parameter’s automation in biochemical and biomechanical processes of the operational estimation of blood glucose from human voice. Theory Pract Modern Sci. 2016; 7 (in Russian): 214-226
- 26. Y. Ulanovsky, A. Frolov, A. Kozlova, et al., “Device for blood glucose level determination”, Patent WO2014072823, 2014.
- 27. Y. Ulanovsky, A. Frolov, A. Kozlova, “Method of non-invasive determination of glucose concentration in blood and device for the implementation thereof”, Patent WO2014/049438.
- 28. P. R. Michaelis, “Detection of extreme hypoglycemia and hyperglycemia based on automatic analysis of speech patterns”, US Patent 7,925,508 B1, 2011.
- 29. J. Rasmusson, P. Karlsson, M. Svensson, et al., “Method and device for blood glucose monitoring”, Patent EP 3 574 830 A1, 2019.
- 30. Blood glucose estimation and symptoms during hyperglycemia and hypoglycemia in patients with insulin-dependent diabetes mellitus. Am J Med. 1995; 98: 22-31
- 31. Speech emotion recognition with TGI+.2 classifier. in: Proceedings of European Association for Computational Linguistics, Student Session, Athens, Greece. 2009: 54-60
- 32. ESEDA: tool for enhanced speech emotion detection and analysis. in: The 4th International Conference on Automated Solutions for Cross Media Content and Multi-Channel Distribution, Florence, Italy. 2008: 17-19
- 33. A new consensus error grid to evaluate the clinical significance of inaccuracies in the measurement of blood glucose. Diabetes Care. 2000; 23: 1143-1148
- 34. Converter between blood glucose units in different systems. Available at: http://www.unit-conversion.info/blood-sugar.html. Accessed January 22, 2020.
- 35. C. Cobelli, S. Del Favero, A. Facchinetti, et al., “Retrospective retrofitting method to generate a continuous glucose concentration profile by exploiting continuous glucose monitoring sensor data and blood glucose measurements”, Patent US 2019/0223807.
- 36. Rate-of-change dependence of the performance of two CGM systems during induced glucose swings. J Diab Sci Technol. 2015; 9: 801-807
- 37. Correlates of hypoglycemic fear in type I and type II diabetes mellitus. Health Psychol. 1992; 11: 199-202
- 38. Speech emotion recognition using hidden Markov models. in: Proceedings of Eurospeech, Aalborg, Denmark. 2001
- 39. Impact of diabetes mellitus on voice: a methodological commentary. J Voice. 2020
- 40. Sidorova J, Arlos P, Vendrell J, et al., “Collection and Analysis of Voice Data for Medical Research”, manuscript in preparation.
- 41. Knowledge extraction and improved data fusion for sales prediction in local agricultural markets. Sensors. 2019; 19: 286
- 42. Architecture for trajectory-based fishing ship classification with AIS data. Sensors. 2020; 20: 3782
- 43. Diabetes and metabolic syndrome. Clin Res Rev. 2020; 14: 739-751
- 44. Recent developments in openSMILE, the Munich open-source multimedia feature extractor. MM '13, Barcelona, Spain. 2013: 835-838
- 45. Optimization techniques for speech emotion recognition. PhD thesis, Universidad Pompeu Fabra. 2009
- 46. DUPROSY: Dual probabilistic system for biochemical activity prediction. Procs. of the 8th International Conference on Computing Technology and Information Management. 2012: 800-803
- 47. Automatic Recognition of Emotive Voice and Speech. Emotions in the Human Voice, Culture and Perception. 3. Plural Publishing, USA. 2007: 217-242
- 48. Speech emotion recognition, DEA Report. Universidad Pompeu Fabra, 2007
User license: Creative Commons Attribution (CC BY 4.0)