The Effects of Caffeine on Voice: A Systematic Review



INTRODUCTION

Caffeine
Caffeine is one of the most widely consumed substances in the world. 1 It is metabolized into three central nervous system (CNS) stimulants: paraxanthine (84%), theobromine (12%) and theophylline (4%). 2,3 Caffeine consumption has been linked with a lower risk of particular types of cancer, a reduced risk of Type II diabetes and a reduced risk of developing Parkinson's disease, Alzheimer's disease and depression. 4 In spite of these beneficial effects, consumption has also been associated with bone loss, reduced bone density, increased pregnancy risks, behavioural changes and sleep deprivation. 5,6 The impact of caffeine on the body's fluid balance as a result of increased diuresis has also been investigated. A review by Maughan & Griffin 7 reported that 300 mg of caffeine can induce diuresis, while a more recent meta-analysis suggested that 300 mg of caffeine does not influence diuresis. 8 Three studies have demonstrated that higher dosages (>450 mg) of caffeine affect total body water (TBW) volume, fluid balance and urine output volume. [9][10][11] Conversely, two studies reported that hydration status was not altered following consumption of a moderate dosage of caffeine (244 mg-370 mg). 12,13 Side-effects of caffeine include diuresis, increased alertness and sleep deprivation, although individuals can develop tolerance to these. The degree of tolerance varies amongst individuals and depends on an individual's age and sex. 14 According to a review by Nehlig, 15 the metabolic, pharmacokinetic, functional and physiological effects of caffeine may vary due to age, sex, diet, lifestyle and genetic factors.
Caffeine consumption is noted to be increasing worldwide, since caffeine is contained in numerous sources such as coffee, tea, chocolate, sodas, energy drinks and medications. 16 Despite a lack of consensus regarding safe levels of intake, Health Canada (HC) provides advice about the amount of caffeine that is considered safe to consume, 17 with values based on the review by Nawrot et al. 18 (Table 1).

Hydration and voice
Hydration refers to the TBW concentrations in the human body. Hydration status is described with the following terms: euhydration, dehydration and hyperhydration. Euhydration is the presence of fluid equilibrium and refers to adequate hydration levels within the human body. 19 Research has shown that age, sex, adiposity levels and population characteristics (eg, occupation, ethnicity) are some of the factors contributing to different TBW volumes. 20,21 On average, the TBW volume comprises approximately 63.3% of total body weight (0.5-0.6 L per kg), of which 24.9% is located extracellularly and 38.4% intracellularly. 22 More specifically, the extracellular volume contains 5% plasma water and about 20% interstitial fluid. 22 Voice-related studies of hydration have been concerned with systemic and superficial hydration. Systemic hydration refers to adequate fluid located within body tissues and is predominantly achieved through water intake. 23 Conversely, superficial (or surface) hydration is defined as the hydration of the surface of the laryngeal mucosa that keeps the epithelial cells moisturized and lubricated, and is achieved via steam inhalation or increased environmental humidity. 24 Inadequate hydration can adversely affect vocal fold viscoelasticity, oscillation threshold and voice quality, since the vocal folds are covered with a thin mucosal surface layer that has biomechanical and protective properties. 24 Ex vivo studies on excised animal larynges examined the physiology and biomechanics of the vocal folds after induced dehydration and rehydration challenges. The desiccation challenges used to dehydrate the excised larynges resulted in increased tissue stiffness, with consequently increased phonation threshold pressure (PTP), vocal fold tension and viscosity alterations. [25][26][27][28][29][30] Human studies have demonstrated that both systemic and superficial dehydration adversely affect phonation.
Reduced hydration over a fasting period greatly affected maximum phonation time (MPT) and perceived phonatory effort (PPE) in males and females. 30,31 Inadequate hydration also negatively impacted acoustic, perceptual and aerodynamic measures in professional singers following a 2-hour rehearsal. 32 Three studies demonstrated that superficial dehydration due to low relative humidity (RH) adversely influenced PPE, PTP and other acoustic measures compared to moderate or high RH. [33][34][35] Two more studies investigated the effects of oral breathing combined with various levels of RH; the results indicated that PPE and PTP measures were negatively affected, and that low RH exacerbated these outcomes. 36,37 While increased water intake is a common recommendation by clinicians, especially if caffeinated beverages are consumed, the available evidence on the effects of caffeine on the body's hydration levels is inconsistent. Clinical advice is based on the anecdotal notion that caffeine increases diuresis and thereby affects fluid balance within the body. This systematic review seeks to summarize the available evidence about the effects of caffeine on voice-related measures in healthy participants.

MATERIALS AND METHODS
Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed in this systematic review. 38 A protocol was developed to reduce author bias and increase methodological quality. 39 The protocol was registered in the PROSPERO database 40 (PROSPERO ID: CRD42020196488).

Search strategy
Prior to search strategy development, scoping (preliminary) searches using simple terms were conducted in order to exclude the possibility of an existing systematic review (SR) on the same topic and to locate essential studies. 41 Next, a search strategy was developed in collaboration with a subject librarian (IH). This search was piloted and refined to minimize irrelevant results. The search was devised to capture synonyms and related terms for: "healthy adults AND caffeine AND decaffeinated AND voice outcome measures". To ensure the search strategy's accuracy, the Peer Review of Electronic Search Strategies (PRESS) guidelines were used by the second reviewer (NK) to appraise the quality of the search and offer recommendations for change. 42 Overall, the search strategy was deemed appropriate and no revisions were proposed. Six databases (PubMed, Cinahl Complete, Web of Science Core Collection, EMBASE, Cochrane Central, ProQuest Dissertation and Theses A&I) were searched for eligible studies in June 2020. The finalized search strategy consisted of 9 thesaurus (or subject heading) terms and 24 title & abstract terms. The 'human' filter option was used across databases to exclude animal studies. Otherwise, no filters or restrictions were used. A sample search strategy for the PubMed database is provided in the Appendix. Backward citation chaining was also used; this refers to identifying important articles by scanning the reference lists of included studies. 43

Eligibility criteria
The PICOS framework 44 was used to guide study eligibility. Participants had to be vocally healthy without diagnosed voice disorders. No age or sex restrictions applied. Studies were eligible for inclusion only if the substance type (e.g. coffee, caffeine tablets, energy drinks) and dosage of caffeine consumed were reported. Although comparators were not a prerequisite for inclusion, if the study had a control group, the comparator could be a decaffeinated beverage, placebo or water. Outcomes of interest were any acoustic, aerodynamic, auditory-perceptual or self-perceptual voice measures.
Randomized controlled trials (RCTs) and non-randomized studies (NRS) were eligible irrespective of publication status, date or language.

Study selection
All retrieved studies were imported into Covidence and were automatically de-duplicated. The software de-duplicates the articles on the basis of author, title, year and volume of publication. Title and abstract screening, as well as full-text screening, were conducted independently by two reviewers (VG & NK) in order to minimize the possibility of human error and bias. 39 Disagreements were resolved by consensus. In cases where disagreements could not be resolved, a third reviewer (CK) acted as an arbitrator by providing a casting vote.
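
The de-duplication step described above can be illustrated with a minimal sketch. It is not Covidence's actual algorithm, which is not described in this review; it only assumes that two records count as duplicates when author, title, year and volume match after simple normalization. The `Record` class, the helper names and the sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical bibliographic record; field names are assumptions."""
    author: str
    title: str
    year: int
    volume: str


def _norm(text: str) -> str:
    # Lower-case and collapse whitespace so trivial formatting
    # differences do not hide a duplicate.
    return " ".join(text.lower().split())


def dedup_key(rec: Record) -> tuple:
    # Key built from the four fields named in the text.
    return (_norm(rec.author), _norm(rec.title), rec.year, _norm(rec.volume))


def deduplicate(records: list) -> list:
    """Keep the first occurrence of each key, preserving input order."""
    seen = set()
    unique = []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Made-up records for illustration only (not from the review).
records = [
    Record("Smith J", "Caffeine and the voice", 2001, "15"),
    Record("smith j", "Caffeine  and the Voice", 2001, "15"),
    Record("Jones A", "Hydration and phonation", 2010, "24"),
]
unique = deduplicate(records)
print(len(records), "->", len(unique))
```

Exact-match keys of this kind miss near-duplicates (for example, abbreviated titles), which is one reason manual screening by two reviewers remains useful.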

Data analysis and synthesis
A data extraction form was designed to capture information about participants, study methods, outcome measures and results. Each section was revised and tailored to the review's needs by the principal investigator (VG). Prior to data extraction, the form was piloted on three randomly chosen included studies. Since the data and methodology of each study may vary, piloting was necessary in order to ensure that the data collection form could meet each study's requirements. 39 After piloting, the final data collection form included the following domains: population and setting, study methodology, participant information, study characteristics and experimental procedures, and outcome measures. Data were extracted independently by two reviewers (VG, NK). Missing data were addressed by the principal investigator (VG), who contacted authors in order to request information essential for this review. The extracted data were tabulated and summarized narratively, since a meta-analysis was not possible due to heterogeneity across studies.

Quality assessment
Risk of bias
Risk of bias was independently appraised by two reviewers (VG, NK). The Risk of Bias (RoB) tool by the Cochrane Collaboration was selected for RCTs. For NRS methodologies, the Downs & Black (D&B) checklist was selected as an appropriate tool that can be implemented across an array of study methodologies. 45 A modified version of the D&B checklist was utilized. 46 The adapted version implemented in this systematic review has a 28-point scale, compared to the original 32-point scale. The difference in total score results from the altered scoring of the last domain, Power. In the original version by D&B, the score in the Power domain ranges from 0 to 5 depending on the number of subjects allocated to each group. Conversely, in this adapted version the last question in the Power domain was scored on the basis of whether the study reported a power analysis or not. Thus, the score in this domain ranges from 0 to 1 (refer to Table 2 for a detailed description of the tool).
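
The relationship between the 32-point and 28-point scales can be checked with a short worked example. This is only an arithmetic sketch based on the figures given above; the constant and function names are illustrative and do not come from the checklist itself.

```python
# Maximum scores as stated in the text above.
ORIGINAL_TOTAL_MAX = 32  # original D&B checklist
ORIGINAL_POWER_MAX = 5   # Power item scored 0-5 in the original
ADAPTED_POWER_MAX = 1    # Power item scored 0-1 in the adapted version


def adapted_total_max(
    original_total: int = ORIGINAL_TOTAL_MAX,
    original_power: int = ORIGINAL_POWER_MAX,
    adapted_power: int = ADAPTED_POWER_MAX,
) -> int:
    """Swap the original Power ceiling for the adapted one."""
    return original_total - original_power + adapted_power


print(adapted_total_max())  # 28
```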

Levels of evidence
The Oxford Centre for Evidence-Based Medicine (OCEBM) guide appraises the evidence level of a study based on type of evidence and study methodology. 47 This guide consists of seven questions. Each is a typical question a clinician might encounter when providing advice. For this review the question "What are the COMMON harms?" was chosen on the basis that caffeine is considered to be potentially harmful to the voice (Table 3). Other questions from the guide were not included as they were irrelevant. It was agreed by the review team to include pilot studies at Level 5 of evidence, as pilot studies, much like "mechanism-based reasoning" studies, explore the feasibility of larger scale RCTs and summarize information for future hypothesis testing. 48

Level 1: Systematic review of randomized trials, systematic review of nested case-control studies, n-of-1 trial with the patient you are raising the question about, or observational study with dramatic effect
Level 2: Individual randomized trial or (exceptionally) observational study with dramatic effect
Level 3: Non-randomized controlled cohort/follow-up study (post-marketing surveillance) provided there are sufficient numbers to rule out a common harm. (For long-term harms the duration of follow-up must be sufficient.)
Level 4: Case-series, case-control, or historically controlled studies
Level 5: Mechanism-based reasoning & pilot studies a
a Pilot study designs were added by the review team.

RESULTS

Database search results
A total of n = 1818 records were retrieved from all databases. Following de-duplication, n = 1443 titles and abstracts were screened and n = 1435 of them were rejected based on the inclusion/exclusion criteria. A total of n = 8 studies were deemed eligible for full-text screening. The full text of one study was not available online, 49 thus the primary investigator (VG) corresponded with the primary author. This study was excluded due to lack of response, therefore n = 7 studies were included for full-text review. During full-text review, two further studies were removed as they did not meet eligibility criteria. No disagreements between the reviewers occurred. A total of n = 5 studies were included for data extraction. The citation chaining method did not identify additional articles. The PRISMA flow diagram (Figure 1) illustrates the review phases.
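
The counts reported above can be reconciled step by step. The sketch below simply recomputes each stage from the figures in the text; the variable names are illustrative, and the number of duplicates removed is derived here rather than stated in the review.

```python
# Figures taken from the text above.
retrieved = 1818              # records from all databases
screened = 1443               # titles/abstracts left after de-duplication
rejected_at_screening = 1435  # rejected on inclusion/exclusion criteria
full_text_unavailable = 1     # no response from the primary author
excluded_at_full_text = 2     # did not meet eligibility criteria

# Derived values (duplicates_removed is not stated in the text).
duplicates_removed = retrieved - screened
eligible_for_full_text = screened - rejected_at_screening
reviewed_in_full = eligible_for_full_text - full_text_unavailable
included = reviewed_in_full - excluded_at_full_text

print("duplicates removed:", duplicates_removed)
print("eligible for full text:", eligible_for_full_text)
print("reviewed in full:", reviewed_in_full)
print("included:", included)
```

Each derived value matches the stage totals reported in the text (8, 7 and 5), so the flow is internally consistent.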

Study characteristics
All eligible studies had non-randomized study designs. [50][51][52][53][54] Three of five were pilot studies, 50,51,53 while the remaining two were true experimental studies. 52,54 The findings of the pilot study by Ahmed et al. 50 were published as a "Letter to the Editor".
Sample size and participant characteristics
A total of n = 155 healthy participants were recruited amongst the included studies. The age range was reported in four out of five studies and the overall range was 18 to 55 years, 51-54 with the most common age range between 18 and 35 years. Mean participant age was reported in only one study 53 and was 23. In another paper raw data were available, so mean age and standard deviation were computed as 22.7 ± 3.86. 52 Participant sex was reported in only four out of five studies and was 91% female, 9% male [51][52][53][54] (Table 4). With regards to participant eligibility, perceptual or self-reported normal voice with absence of voice pathology was the main inclusion criterion in every study. In the majority of the studies, subjects were excluded if high blood pressure and coronary disease were reported. 50,51,53,54 Reflux symptoms, smoking and medication (except oral contraceptives) were exclusion criteria in one study. 52 Lastly, three studies deemed participants ineligible for inclusion on the basis of self-reported or diagnosed respiratory disorders. [50][51][52] (Table 4)

Experimental procedures
Variation was noted in the experimental procedures employed in each study (Table 5). With regards to the intervention, the type and dose of caffeine varied amongst the studies. The participants in three of the studies consumed caffeine tablets, 51,53,54 while in the remaining studies subjects ingested coffee. 50,52 The amount of caffeine ingested by participants in the included studies ranged from 100 mg to 480 mg. The comparator intervention consisted of placebo, water, decaffeinated coffee or no intervention at all. While all studies had pre-caffeine baseline measures, only four of them collected voice measures on the same day. The duration of the study procedures varied from two hours to two days, though duration was not reported in two studies. 53,54 Abstention from caffeine to better control experimental procedures was used in four out of five studies, but with varying instructions.

Outcome measures
The primary outcome measures utilized in the included studies were aerodynamic, acoustic and perceptual. Acoustic measures like jitter and shimmer were the most frequently reported outcomes, while perceptual measures were utilized in only one study. Perceptual and acoustic measures were obtained under different conditions, such as sustained "ah" sound, singing, reading or speaking. PTP and airflow were the only aerodynamic measures collected; they were obtained while the subjects repeatedly uttered the syllables /pi/ or /pa/ respectively. Secondary outcome measures were considered in one study, where the participants were requested to rate their vocal effort using a visual analog scale. 52 Overall, all studies reported non-significant effects of caffeine on voice-related measures (P ≥ 0.05). Notably, only two studies reported the exact p values for each outcome measure. 53,54 One study reported subtle changes in irregularity of fundamental frequency; however, the authors attributed these irregularities to individual characteristics. 51 A detailed description of the results is provided in Table 6.
Methodological quality assessment
Risk of bias appraisal was conducted using the D&B checklist. Overall, none of the studies was deemed of 'Good' or 'Excellent' quality (Table 7). Three of the studies were rated as 'Fair' and the remaining two as 'Poor' quality. The Power, External Validity and Internal Validity-Confounding domains presented the highest risk of bias across all studies. Conversely, low risk of bias was noted in the Reporting domain in three studies; in the remaining studies, the risk of bias in this domain was unclear.
External Validity was at high risk of bias due to lack of information regarding the sampling method, non-representative populations and non-representative experimental conditions. Random allocation of participants into intervention or control groups was not performed in any of the experiments and potential confounders were not reported, thus high risk of bias was observed in the Internal Validity-Confounding domain. Lastly, none of the studies presented evidence of a power analysis. Although no power analysis was conducted, it can be inferred that power was reduced by the small sample sizes, which may have increased the chance of a Type II error.
Using the OCEBM Levels of Evidence, the two experimental studies were placed at Level 3 of evidence, since they were non-randomized experimental studies. Pilot studies were automatically assigned to Level 5 of evidence, as their study design explores the feasibility of larger-scale studies and does not provide substantial evidential information.

DISCUSSION
The aim of this study was to identify and critically appraise the available evidence regarding the potential effects of caffeine on voice-related measures. Due to the small number of included studies, their lack of methodological integrity and their high risk of bias, the evidence regarding the effects of caffeine on phonation is unreliable. This review therefore cannot provide robust advice about the effects of caffeine on voice.

Methodological considerations
Caffeine is known to have a potentially systemic dehydrating effect. The degree of localised dehydration within the vocal folds is however unknown. Vocal fold dehydration can lead to aberrant voice quality through mechanisms such as reduction in vocal fold lubrication, reduced oscillation and increased risk of trauma through vocal fold collision. 23 These findings have been confirmed in a canine model, where dehydrated vocal folds were found to be stiffer and more viscous. 25 The studies in this systematic review are thus based on the hypothesis that systemic dehydration may induce vocal fold dehydration, which would manifest as abnormal voice production due to the factors above.
All included studies had a repeated measures experimental design in order to demonstrate the effects of caffeine intake. This is considered acceptable for measuring cause-and-effect relationships between independent and dependent variables across groups. 55 Three of these, however, employed the pre-post design in a pilot study methodology. Pilot studies provide non-evidential information, as they are designed to explore the feasibility of larger-scale studies and not to answer hypotheses or draw firm conclusions. 48 The recruited subjects were not representative of the entire population, hence a higher risk of selection bias is noted. Indeed, the studies' participants were predominantly female, and the age range in three out of five studies was between 18 and 35 years. Therefore, the demographic characteristics of the subjects do not represent those of the population, and conclusions about differences based on sex could not be drawn.
Sample size and statistical power are positively related; the bigger the sample, the greater the power. Since the majority of the studies recruited a relatively small number of participants, their power was reduced. Consequently, the probability of detecting any effects is reduced and the probability of a Type II error is increased. 56 Another limitation that adversely influenced the generalizability and applicability of the evidence is the type of intervention implemented in the studies. The participants of the intervention groups predominantly consumed caffeine tablets, which do not reflect a realistic caffeine source, as the average population's sources of caffeine are coffee, tea and soft drinks. 57 This reduced the ecological validity of the studies.
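
The link between sample size, power and Type II error noted above can be illustrated numerically. The sketch below is not an analysis of the included studies: it uses a normal approximation to a two-sided paired (pre-post) test, and the standardized effect size of 0.5, the sample sizes and the function name `approx_power` are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided paired test (normal approximation).

    effect_size is the standardized mean of the paired differences and
    n is the number of participants measured before and after.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical scenario: a medium effect across increasing sample sizes.
for n in (8, 16, 32, 64):
    power = approx_power(0.5, n)
    print(f"n={n:3d}  power={power:.2f}  type II error={1 - power:.2f}")
```

Under these assumptions power rises steadily with n, which is why a prospective power analysis is needed to judge whether a non-significant result reflects a true absence of effect.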

Experimental considerations
The findings suggest that voice production was not adversely affected by caffeine consumption. This non-significant outcome could be attributed to a variety of factors. The majority of the studies utilized moderate caffeine dosages, which are considered safe for the human body. 17 Another factor that should be taken into account is the caffeine absorption rate. Caffeine absorption is completed in approximately 50 minutes; however, caffeine can be detected in blood plasma within approximately 35 minutes. 58 In two of the studies, voice measures were collected 30 minutes following caffeine consumption. 53,54 Although theoretically 30 minutes are sufficient for the caffeine to be detected in blood plasma, the absorption and metabolism of caffeine vary amongst subjects depending on individual characteristics like sex and pharmacokinetics. 15 Since the presence of caffeine in blood was not objectively measured, it is impossible to determine whether the caffeine was completely absorbed within the 30 minutes allotted. One study did, however, use blood tests to detect plasma caffeine, and caffeine concentrations in blood increased following caffeine consumption. 51 Following oral intake, caffeine's distribution through the biological membranes is completed within 20 minutes. 59 However, regular consumption of higher caffeine dosages can increase the distribution and excretion rate of caffeine. 59,60 Participants' caffeine consumption habits were not reported in any of the studies, thus differences in distribution and excretion rates may have occurred. Caffeine's metabolic rate is significantly increased in smokers, a factor that was taken into account in only one study. 15,52
Caffeine tolerance is another variable that could have influenced outcomes. Information about the average daily caffeine intake of participants was not provided in any of the studies. Various studies have examined the tolerance levels and the effects of caffeine on habitual and non-habitual coffee drinkers. It is reported that repeated caffeine consumption can induce tolerance, a factor that mitigates caffeine's effects, such as diuresis, increased alertness, sleep deprivation and reduced sense of fatigue. 2,7 Tolerance can be induced in approximately 10 days; however, the degree of tolerance varies amongst individuals, as it depends on individual characteristics (eg, sex, age). 14
Objective assessment of hydration status (eg, bioelectrical impedance analysis, urine samples) was not employed in any of the studies. Such assessment could provide insight into systemic hydration status following caffeine consumption, since hydration levels vary amongst individuals. It should be noted, though, that the investigators in two of the studies instructed participants to avoid liquids for 12 hours prior to the experimental procedures, 53,54 in an attempt to create equal baseline levels of hydration. Since an objective hydration assessment was not employed, it was impossible to determine whether the participants adhered to the investigators' instructions or whether equal baseline hydration levels were achieved. The influence of environmental humidity on superficial hydration of the vocal folds was acknowledged in one study, where the investigators adjusted the RH to moderate levels (70% RH). 52 Overall, a few potential confounders were taken into account (eg, menstrual cycle, smoking); however, none of the experiments described comprehensive experimental control by listing and exhaustively controlling potential confounders.

Implications for clinical practice
Physicians and speech and language therapists (SLTs) advise patients to refrain from caffeine consumption on the assumption that caffeine intake induces diuresis, which lowers fluid balance. Such imbalances can induce dehydration, which can be detrimental to phonation. The implied transitive relationship must, however, be supported empirically. This systematic review focused on the impact of caffeine on voice in healthy adults and found inconclusive evidence about any deleterious effects. Due to lack of methodological quality and risk of bias, the outcomes of this review do not provide robust evidence regarding the potential adverse effects of caffeine on phonation. Thus, clinicians should be cautious when counselling patients to refrain from caffeine, as such clinical recommendations cannot be supported. Health professionals could refer to published guidelines 17 regarding the safe values of caffeine consumption to advise patients to moderate caffeine intake if uncertain. This is not to say that counselling patients to reduce caffeine intake is totally without merit. Clinicians should consider the other relationships between caffeine and voice. For instance, caffeine intake might exacerbate existing disorders such as laryngopharyngeal reflux, making avoidance appropriate in some cases. 61 In vivo studies 62 have also demonstrated that caffeine can interfere with circadian rhythms, altering sleep and/or wake cycles. While this is a systemic effect and not one isolated to the vocal folds, it may prevent those with voice difficulties from obtaining refreshing sleep, which could exacerbate feelings of stress. Despite the fact that caffeine has been associated with a lower risk of depression, caffeine consumption might adversely affect people with mental health issues. Caffeine has been associated with an increased risk of deterioration of anxiety symptoms and a higher risk of relapse episodes. 63 SLTs working with patients with mental health issues should consider the effects of caffeine consumption on these patients, since no specific guidelines have been published for safe dosages of caffeine consumption amongst people with mental and psychiatric health issues. 63

Directions for future research
The low quantity and quality of the studies that were identified highlight the need for further, more robust research.
Five studies were deemed eligible for inclusion, but all were characterized by methodological flaws and high risk of bias rendering the results unreliable. The methodological limitations of the included studies that are described in this review could however provide the basis for more robust investigations of the effects of caffeine on phonation in the future.
RCTs utilizing randomization (eg, allocation, sampling) and blinding could ensure that cause-and-effect relationships may be attributed to the implemented intervention. However, RCTs are not always feasible, so carefully designed true experimental studies can also yield evidence that can be internally valid. Future investigations must focus on controlling extraneous factors that potentially influence the outcomes. Controlling for all potential confounders might not be possible, thus it is essential to include an objective hydration assessment method in order to determine whether caffeine has an influence on the hydration status of the participants and to utilize blood or urine tests to measure caffeine absorption.
A recent review by Wikoff et al. 6 examined data on potential adverse effects of caffeine consumption, and the results showed that low to moderate doses (up to 400 mg) of caffeine do not negatively affect the body. Thus, greater dosages of caffeine (>400 mg) should be used to increase the possibility of detecting effects on voice quality and to reflect the fact that some individuals consume caffeine above recommended amounts. Future research should also consider the concentration of caffeine in the medium in which it is served. For instance, weaker coffee (higher water to caffeine ratio) might promote hydration and mitigate the dehydrating effects of caffeine compared to stronger coffee (lower water to caffeine ratio). In addition, control over environmental confounders such as humidity is advisable, as superficial hydration of the vocal fold tissues is achieved through higher RH, a factor that might influence outcomes.
Prospective studies should recruit an adequate number of participants who are representative of the entire population. A large sample size would ensure adequate power levels, so the probability of detecting potential effects would be increased. Additionally, effects should be investigated amongst different age groups and sexes in order to render outcomes more generalizable and representative. Another factor that could be taken into consideration is the long-term effects of caffeine consumption on phonation. Future prospective longitudinal studies could shed light on the long-term effects of caffeine intake on phonation, rendering the findings more representative of real-life situations, where caffeine is consumed regularly and not in isolation.
It should be noted that no studies investigating the effects of caffeine in children under 18 years of age were identified, highlighting the lack of research and evidence in this area. In terms of occupation, future research should also investigate the potentially adverse effects of caffeine on occupational and professional voice users, a population that is particularly prone to voice disorders. This systematic review focused on the impact of caffeine on voice in healthy adults and found inconclusive evidence about any deleterious effects. Clinicians working with those who have voice disorders may be interested in how caffeine affects those with dysphonia, since dysphonic individuals could be at increased risk of harm due to their underlying pathology. Even modest dehydration of the vocal folds in such individuals could exacerbate voice difficulties and it would be illuminating to identify whether caffeine might contribute to this. Any researchers engaging in such studies would, however, need to carefully consider the ethical difficulties of potentially exposing participants to further vocal harm, in addition to controlling for anatomical, physiological and biomechanical variation between different voice disorders.

Limitations
Due to the small number of studies, the poor quality of data and the lack of homogeneity amongst the studies, a meta-analysis was not conducted; hence a quantitative analysis and presentation of the findings was not possible.
Although sample sizes were small in some studies, which most likely had a detrimental effect on statistical power, two of the studies used relatively large samples. Since no power analysis was reported, the ability of the included studies to detect an effect could not be precisely estimated.

CONCLUSION
The findings of the present review cannot provide robust evidence regarding the effects of caffeine on voice-related measures. Since no firm conclusions can be elicited to guide clinical practice, clinicians should be cautious when recommending caffeine abstinence to patients.
The results of this review demonstrate the lack of research in the field and the necessity to inform evidence-based practice through reliable and valid outcomes. Future research should recruit a more representative sample and employ robust experimental procedures.

DECLARATION OF INTEREST STATEMENT
No potential conflict of interest was reported by the authors.