Research Article| Volume 34, ISSUE 3, P485.e1-485.e21, May 2020

# Effects of the Lung Volume on the Electroglottographic Waveform in Trained Female Singers

Open Access. Published: October 16, 2018

## Abstract

### Objectives

To determine if in singing there is an effect of lung volume on the electroglottographic waveform, and if so, how it varies over the voice range.

### Study design

Eight trained female singers sang the tune “Frère Jacques” in 18 conditions: three phonetic contexts, three dynamic levels, and high or low lung volume. Conditions were randomized and replicated.

### Methods

The audio and EGG signals were recorded in synchrony with signals tracking respiration and vertical larynx position. The first 10 Fourier descriptors of every EGG cycle were computed. These spectral data were clustered statistically, and the clusters were mapped by color into a voice range profile display, thus visualizing the EGG waveform changes under the influence of fo and SPL. The rank correlations and effect sizes of the relationships between relative lung volume and several adduction-related EGG wave shape metrics were similarly rendered on a color scale, in voice range profile-style ʻvoice maps.ʼ

### Results

In most subjects, EGG waveforms varied considerably over the voice range. Within subjects, reproducibility was high, not only across the replications, but also across the phonetic contexts. The EGG waveforms were quite individual, as was the nature of the EGG shape variation across the range. EGG metrics were significantly correlated to changes in lung volume, in parts of the range of the song, and in most subjects. However, the effect sizes of the relative lung volume were generally much smaller than the effects of fo and SPL, and the relationships always varied, even changing polarity from one part of the range to another.

### Conclusions

Most subjects exhibited small, reproducible effects of the relative lung volume on the EGG waveform. Some hypothesized influences of tracheal pull were seen, mostly at the lowest SPLs. The effects were however highly variable, both across the moderately wide fo-SPL range and across subjects. Different singers may be applying different techniques and compensatory behaviors with changing lung volume. The outcomes emphasize the importance of making observations over a substantial part of the voice range, and not only of phonations sustained at a few fundamental frequencies and sound levels.

## 1. INTRODUCTION

Professional singers and vocal coaches consider breathing control to be important to the singing voice, and believe that poor breath management can be particularly problematic for singing, based on the general assumption that the breathing technique affects phonation.

Doscher BM. The Functional Unity of the Singing Voice. Metuchen NJ: Scarecrow Press; 1994.

• Vennard W
Singing: The Mechanism and the Technic.
A common way to assess breathing objectively for speech and singing is by measuring lung volume (LV) expressed as a percentage of the vital capacity (VC). Lung volumes have been measured in vocally untrained subjects and in professional singers. Hixon, Goldman, and Mead
• Hixon TJ
• Goldman MD
Kinematics of the chest wall during speech production: volume displacements of the rib cage, abdomen, and lung.
• Hixon TJ
• Goldman MD
Dynamics of the chest wall during speech production: function of the thorax, rib cage, diaphragm, and abdomen.
observed LV in nonsingers’ speech and singing and found it to be in the midrange of the VC for most of the utterances during normal conversation, reading, and singing, but 10%–20% higher during loud reading and speaking. They found also that LV at initiation of the phrases (ILV) was 50%–60% VC and at termination of the phrases (TLV) was 30%–50% VC. ILV reached up to 90% VC during singing in eight nonprofessional singers.
• Bouhuys A
• Proctor DF
• et al.
Kinetic aspects of singing.
ILV was in the range of 60%–90% VC and TLV at 13%–35% VC in six professional male singers.
• Watson PJ
• Hixon TJ
Respiratory kinematics in classical (opera) singers.
Similar results were found in four female professional singers featuring ILV in the range of 43%–93% VC and TLV between 28% and 52% VC,
• Watson PJ
• Hixon TJ
• Stathopoulos ET
• et al.
Respiratory kinematics in female classical singers.
suggesting that respiratory behavior, in terms of the expenditure of air over musical phrases in classical singing, is not gender-specific. Mean ILV was 70% VC and mean TLV was 30% VC across seven professional female and male classical singers.
• Thomasson M
• Sundberg J
Lung volume levels in professional classical singing.
Furthermore, ILV was observed to increase and TLV to decrease as a singer became more familiar with a novel piece while learning it.
• Watson PJ
• Hixon TJ
Respiratory behavior during the learning of a novel aria by a highly trained classical singer.
The respiratory behavior of six male professional country singers resembled that of untrained singers, with ILV ranging between 34% and 80% VC and TLV between 9% and 46% VC during singing.
• Hoit JD
• Jenks CL
• Watson PJ
• et al.
Respiratory function during speaking and singing in professional country singers.
LV has been found to affect a number of aspects of phonation in vocally untrained subjects, such as the larynx height and the glottal voice source. These effects might be interpreted largely as the result of the tracheal pull, referring to a biomechanical force linking the breathing and laryngeal systems and affecting the laryngeal configuration. The tracheobronchial tree is lowered in the torso during inspiration,
• Macklin CC
X-ray studies on bronchial movements.
inducing a lowering of the trachea because of the elastic interconnections between the tracheal cartilages. This lowering then generates a caudal force on the larynx, the so-called tracheal pull, which will be greater at high LV than at low LV.
• Zenker W
• Glaninger J
Die Stärke des Trachealzuges beim lebenden Menschen und seine Bedeutung für die Kehlkopfmechanik.
A number of studies have investigated LV (relative to VC) in relation to the vertical larynx position (VLP). Iwarsson, Thomasson, and Sundberg
• Thomasson M
• Sundberg J
Lung volume and phonation: a methodological study.
found that high LV was associated with a lower larynx position than that in low LV, in vocally untrained participants. They also found that nonsingers elevated their larynx with increasing pitch, corroborating previous investigations revealing a correlation of VLP with pitch.
• Thomasson M
• Sundberg J
Effects of lung volume on the glottal voice source.
• Pabst F
• Sundberg J
Tracking multi-channel electroglottograph measurement of larynx height in singers.
• Shipp T
Vertical laryngeal position during continuous and discrete vocal frequency change.
Effects of inhalatory abdominal wall movement on vertical laryngeal position during phonation.
studied VLP in vocally untrained singers in relation to two different inhalatory behaviors: inward movement and expansion of the abdominal wall (AW). The results suggested that VLP can be affected by inhalatory behaviors if no attention is paid to posture. Surprisingly, the abdomen-out condition was significantly associated with high VLP, contradicting the hypothesis that an expansion of the abdominal wall lowers the diaphragm, thereby increasing the tracheal pull and lowering the larynx. These results can be understood in light of the postural changes associated with the two inhalatory modes: the inward movement featured a recession of the chin toward the neck, resulting in a lower VLP.
An analysis of the breathing strategy in professional singing showed that higher LV was significantly correlated with lower VLP, but no significant effect of the inhalatory AW position (belly in versus belly out) was found in male opera singers.
• Thomasson M
Belly-in or belly-out? Effects of inhalatory behaviour and lung volume on voice function in male opera singers.
No significant difference between the supported and unsupported voice of professional singers was found.
• Griffin B
• Woo P
• Colton R
• et al.
Physiological characteristics of the supported singing voice. A preliminary study.
High intrasinger consistency of LV behavior and rib cage (RC) movements has been revealed in opera singing, but no intersinger consistency has been observed, suggesting that the contribution of RC and AW to LV differs across singers and that operatic singing does not feature a uniform breathing strategy.
• Thomasson M
• Sundberg J
Consistency of inhalatory breathing patterns in professional operatic singers.
Previous research has also demonstrated the possibility that variations in the lung volume affect the glottal voice source, based on the hypothesis that high LV is associated with an abductive glottal force component.
• Zenker W
• Glaninger J
Die Stärke des Trachealzuges beim lebenden Menschen und seine Bedeutung für die Kehlkopfmechanik.
• Zenker W
Questions regarding the function of external laryngeal muscles.
• Thomasson M
• Sundberg J
Lung volume and phonation: a methodological study.
• Thomasson M
• Sundberg J
Effects of lung volume on the glottal voice source.
investigated whether relative LV affects phonation in nonsingers, and found significant effects of lung volume on voice source characteristics analyzed by inverse filtering: high LV was associated with higher subglottal pressure, smaller closed quotient, larger glottal leakage, and greater flow amplitude than those observed at low LV. Similar results were obtained by analyzing respiratory, acoustic, aerodynamic, electroglottographic, and videostroboscopic recordings of normal speaking women phonating at high, mid, and low relative lung volume.
• Milstein C.
Laryngeal function associated with changes in lung volume during voice and speech production in normal speaking women.
LV might affect the glottal voice source also of professional singers. Using inverse filtering, Thomasson
• Thomasson M
Belly-in or belly-out? Effects of inhalatory behaviour and lung volume on voice function in male opera singers.
found that an increase in relative LV was significantly correlated with higher subglottal pressure and peak-to-peak flow amplitude and smaller closed quotient, in male opera singers. Contrasting results were obtained by Thomasson,
• Thomasson M
Effects of lung volume on the glottal voice source and the vertical laryngeal position in male professional opera singers.
who found that some voice parameters, such as subglottal pressure, closed quotient, and glottal leakage, remained constant across the relative LV range, suggesting that professional singers might learn to compensate for LV changes via laryngeal adjustments.
The effect of lung volume on the voice source remains underexplored. Therefore, the present experiment investigated how LV affects the pattern of vocal fold contacting, as represented by the electroglottographic (EGG) waveform, in trained female singers. For assessing the results, some novel modes of presentation were devised.

## 2. THEORY

Since the possible effect of LV on phonation is thought to be mediated by tracheal pull, one ought to measure directly the force of this pull. This might be done by attaching strain gauges to the inside of the subglottal tracheal wall, or by measuring how the distance between, say, ink spots spaced vertically on the elastic tracheal wall varies with LV. Clearly, the research question here does not warrant such invasive procedures. We resort instead to observing the electroglottogram.

### 2.1 Time-domain EGG analysis

Introductions to the principles and applicability of electroglottography are given by several sources, such as Baken.
• Baken RJ
Electroglottography.
Here we address only issues that are especially pertinent to the present study.
The electroglottogram is usually plotted with the electrical admittance and hence also the relative contact area increasing upward. A large positive EGG value corresponds to a relatively large change in medial vocal fold contact area.
• Scherer RC
• Druker D
• Titze I
Electroglottography and direct measurement of vocal fold contact area.
• Hampala V
• Garcia M
• Švec JG
• et al.
Relationship between the electroglottographic signal and vocal fold contact area.
This orientation helps reduce confusion of the typical EGG signal with that of the flow glottogram, which would otherwise be deceptively similar and in-phase with the EGG. Since vocal fold (VF) tissue contact is usually more abrupt than tissue separation, EGG pulses tend to be skewed to the left rather than to the right. The baseline and peak amplitude of the EGG represent the minimum and maximum areas of VF contact, respectively. The incline of the pulse flanks represents the rate-of-change of contact area. Note that in a (rare) scenario in which the medial faces of the VFs are essentially flat and parallel to each other, the contact area can change very quickly, even if the collision is quite gentle. Also, even when the VFs are vibrating without colliding at all, the EGG amplitude may still be nonzero, although very low. The variation in contact area is then much smaller and very nearly sinusoidal, occurring only at the very ends of the VFs.
The connection between the EGG waveform and the produced sound is indirect. A rapid rise in the EGG signal will usually, but not necessarily, correspond to a rapid cessation of glottal flow. The normally abrupt transition in the EGG waveform from no contacting to some contacting is not typically associated with an abrupt change in the sound.
For quantitative interpretation, the EGG waveform is usually characterized using scalar time-domain metrics. These include (our symbols): (1) the peak-to-peak amplitude App, indicating the change in contact area; (2) the contact quotient Qc, estimating the portion of the glottal cycle for which the VFs are in contact; and (3) the peak amplitude of the time derivative of the EGG, dEGGmax, indicating the maximum rate-of-change in contact area (Figure 1).
(1) The App metric potentially holds one answer to our question: we would expect it to decrease with the decreasing adduction that might result from increased tracheal pull at high LV.
• Thomasson M
Effects of lung volume on the glottal voice source and the vertical laryngeal position in male professional opera singers.
Unfortunately, the strength of the EGG signal depends on many things, including skin conductance, tissue distribution, and especially the distance between the skin electrodes and the glottis. In singing, VLP relative to the EGG electrodes changes a great deal, and we do not wish to fixate a subject and her larynx while the LV is exercised during singing. For this reason, App will be treated sceptically in this study.
(2) The Qc metric, too, reports on the degree of adduction; and fortunately, it is independent of App. Several ways of calculating Qc are found in the literature, giving quite diverse results.
• Hampala V
• Garcia M
• Švec JG
• et al.
Relationship between the electroglottographic signal and vocal fold contact area.
• Herbst C
• Ternström S
A comparison of different methods to measure the EGG contact quotient.
We propose here a threshold-free definition, Qci, being simply the integral over time, of the EGG pulse normalized from zero to one in amplitude and zero to one in cycle time, in other words, the area under the normalized curve. The greater this area, the more contact there has been between the vocal folds, over the cycle. Because the vast majority of EGG pulses have only one peak, Qci is also highly correlated with conventional Qc metrics. In principle, the value of this integral can range from zero to one. In the present material, we have seen Qci values from 0.18 to 0.6 (Supplementary file A-13). The advantages of this metric are that it considers the relative amount of contacting over the entire cycle waveform, rather than the relative time of a threshold crossing or peak; it does not rely on the existence of identifiable peaks in the EGG derivative; and it is less sensitive to noise in the signal. Figure 2 shows how the integral Qci differentiates between three EGG pulse shapes that would have received the same Qc if calculated using a conventional threshold at 3/7 of App. In this study, we are looking for effects that may be subtle, which Qci may reveal.
(3) If tracheal pull does affect the VF configuration, then it might have an effect also on the speed of contacting, i.e., the dEGGmax metric. In order to make the dEGGmax metric independent of App, we normalize it such that dEGGmaxN for a pure sine wave (no VF collision) receives the value one (representing the maximum derivative, or the peak of the cosine). Any VF contacting will give values greater than one, due to the contacting flank being steeper than that of the sine wave. This allows the comparison of maxima in the dEGG. In the present material, we have seen values of dEGGmaxN of up to six (Supplementary file A-14).
An ideal metric of VF contact would be absolute, in area units, and would approach zero in the noncontacting condition. Hence, Qci is not ideal, because it is computed from the normalized pulse waveform, and so even a very weak sine wave will receive a Qci of 0.5. In other words, while Qci increases from brief contact to long contact, it also increases from brief contact down to no contact, where Qci becomes 0.5, and this must be kept in mind when interpreting the results in section 4.6. Nor can Qci be any more equivalent to the closed quotient of the flow pulse than are other variants of Qc.
• Hampala V
• Garcia M
• Švec JG
• et al.
Relationship between the electroglottographic signal and vocal fold contact area.
Although one may conceive, in principle, of a VF vibratory pattern that gives a sinusoidal EGG wave shape and complete closure, it seems highly unlikely. Therefore, we will take a dEGGmaxN approaching one to indicate little or no contacting.
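The two normalized metrics described in this section can be illustrated with a minimal Python sketch. It is not the FonaDyn implementation. It assumes that `cycle` holds exactly one EGG period sampled uniformly, that the cycle is not flat, and that dEGGmaxN is normalized by the peak slope of a sine wave with the same App; the text only states that a pure sine must receive the value one, so that particular normalization is an assumption. The function names `qci` and `degg_max_n` are hypothetical.

```python
import numpy as np

def qci(cycle):
    """Area under one EGG cycle scaled to 0..1 in amplitude and 0..1 in time."""
    x = np.asarray(cycle, dtype=float)
    x = (x - x.min()) / (x.max() - x.min())   # assumes a non-flat cycle
    # With uniform samples over one period, the mean equals the
    # integral over the unit cycle time.
    return float(x.mean())

def degg_max_n(cycle):
    """Peak EGG slope relative to that of a sine with the same App (assumed normalization)."""
    x = np.asarray(cycle, dtype=float)
    app = x.max() - x.min()                   # peak-to-peak amplitude
    slope = np.roll(x, -1) - x                # circular first difference, per sample
    sine_peak_slope = np.pi * app / x.size    # peak slope of a sine with equal App
    return float(slope.max() / sine_peak_slope)
```

Under these assumptions, a sampled sine wave yields a Qci of 0.5 and a dEGGmaxN very close to one, in line with the limiting cases discussed above, while a pulse with a steep contacting flank yields a dEGGmaxN greater than one.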

### 2.2 Harmonic-domain EGG analysis

Both Qci and dEGGmaxN are scalar values that conveniently summarize some property of the EGG waveform. Still, this waveform contains more information: a single number cannot describe it completely. As can be inferred from Figure 2, any number of different wave shapes may correspond to a given value of Qci or of dEGGmaxN. Here, we are searching for an unknown change in the EGG wave shape, and so we desire a more specific characterization, in more dimensions. Authors ST, AS, and Dennis Johansson have developed a software system, named FonaDyn, that does such an analysis, with visual feedback, in real time, and which has been released in full into the public domain.
• Ternström S
• Johansson D
• Selamtzis A
FonaDyn—a system for real-time analysis of the electroglottogram, over the voice range.
The remainder of section 2 introduces briefly those aspects of the FonaDyn analysis that are relevant to the present study.
In normal phonation, the EGG wave shape is cyclical: on completing a cycle, it returns to the starting point. This suggests the use of Fourier Descriptors (FD) for describing wave shapes, and so does the plausible connection to different vibratory modes of the vocal folds. FD-based EGG analysis was introduced earlier by authors AS and ST.
• Selamtzis A
Electroglottographic Analysis of Phonatory Dynamics and States.
• Selamtzis A
• Ternström S
Investigation of the relationship between electroglottogram waveform, fundamental frequency, and sound pressure level using clustering.
The FDs of the normalized EGG cycle correspond to the value pairs of amplitudes and phases of its harmonics, hence the term “harmonic-domain analysis.” In other words, it is a spectral analysis that is sampled in the frequency domain at exactly the harmonic frequencies. For any periodic EGG (as expected in singing), the nth FD (FDn) is the (amplitude, phase) pair of harmonic number n. For a purely sinusoidal wave shape, only FD1 has a nonzero amplitude. We can represent any cycle wave shape, to an arbitrary precision, by summing the cosine waves as specified by the FDs. In this study, we chose to use 10 FDs throughout, because this allows a visually acceptable reconstruction of the wave shapes, while harmonics >10 tend to approach the EGG noise floor, in soft phonation. This means that EGG wave shapes are completely described up to and including the frequency of 10fo. Any DC component (corresponding to FD0) was removed prior to analysis.
The EGG signal is automatically segmented into individual cycles.
The automatic segmentation algorithm is robust, but outside the scope of this article. A full description is given in the FonaDyn documentation.
Only cycles that meet certain threshold criteria of duration and regularity are accepted for further analysis. By computing the first 10 FDs for each cycle, we obtain a series of data frames, each containing 20 values, that are updated at the fundamental frequency fo. To remove the influence of App, and also that of the variable location of the cycle-segmentation trigger point, the amplitude and phase of FD1 are used as a reference, keeping only the level differences ΔLn and the phase differences Δφn:
$\Delta L_n = 20 \cdot \log_{10}\left(\frac{|FD_n|}{|FD_1|}\right), \quad \Delta\varphi_n = \angle FD_n - \angle FD_1, \quad 2 \le n \le 10$

Therefore, this harmonic-domain spectrum representation, by its very nature, is a cycle-normalized description of the EGG pulse shape. It contains no information about fo, the acoustic SPL, or the total EGG amplitude App. It contains only the harmonic levels and phases relative to the fundamental. For each glottal cycle, it creates one point in a space with 2 × (10-1) = 18 dimensions.
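As an illustration of this representation, the following Python sketch computes the relative levels and phases for a single cycle using a discrete Fourier transform. It is a simplified stand-in, not the FonaDyn code: it assumes that `cycle` contains exactly one period with at least 21 uniform samples, and the function name `relative_fds` and the small floor that guards against taking the logarithm of zero are assumptions made for this example.

```python
import numpy as np

def relative_fds(cycle, n_fd=10):
    """Return (level differences in dB, phase differences in rad) for harmonics 2..n_fd."""
    x = np.asarray(cycle, dtype=float)
    x = x - x.mean()                        # remove the DC component (FD0)
    spec = np.fft.rfft(x)[1:n_fd + 1]       # harmonics 1..n_fd of one period
    mag = np.maximum(np.abs(spec), 1e-12)   # floor avoids log10(0) (assumption)
    level = 20.0 * np.log10(mag[1:] / mag[0])
    dphi = np.angle(spec[1:]) - np.angle(spec[0])
    return level, dphi
```

For 10 FDs this yields 9 level differences and 9 phase differences per cycle, that is, the 18 coordinates mentioned above.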
The FD representation of wave shapes as points in an M-dimensional space implies that the Euclidean distance between the points in this space will be a measure of the dissimilarity between two EGG wave shapes. A small distance means that the two EGG wave shapes are quite similar to each other. Conversely, a large distance indicates that the wave shapes are quite different. Given a space with M orthogonal axes, the distance |d| between points x and y is computed using the extension of the Pythagorean formula to M dimensions:
$|d| = \sqrt{\sum_{i=1}^{M} (y_i - x_i)^2}$

Here, the coordinates are the relative levels and phases that describe a cycle-normalized wave shape.

### 2.3 Clustering of wave shapes

Clustering of data is a central textbook topic in machine learning. Here, we used the so-called K-means clustering algorithm, adapted for real-time operation.
• McFee B
More like this machine learning approaches to music similarity.
First, assume that there are already many data points in the M-dimensional space, each of which belongs to one of K clusters. For each cluster, a centroid location in the space has been computed. Each new incoming data frame (one per EGG cycle) is treated as a point in the space. Its position is compared to the positions of the K centroids, and the new point is assigned to the cluster whose centroid lies nearest. The centroid position of that cluster is then updated to account for the new point. As more points are accrued, the cluster centroids gradually adapt to the data. In our case, this means that the system ʻlearnsʼ to classify different EGG wave shapes into different ʻbinsʼ or clusters. The clusterer can be changed into a classifier simply by inhibiting the updating of the centroid locations.
The FonaDyn system performs the clustering in real time, so the data points arrive sequentially. If there are no prior data points, the very first one is assigned to all K clusters, whose centroids therefore all become the same. The next data point will have the same distance to all of them, so one cluster is picked at random, and its centroid is updated to account for the new point. This means that the outcome of the clustering is very sensitive to the location of the first few data points. Ideally, the initializing centroid should be close to the centroid of all subsequent data points. This position, however, is not known until all the data has been seen. In practice, the analysis can be started when the subject is phonating somewhere in midrange.
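The sequential scheme described above can be sketched in Python as follows. This is a minimal illustration, not the FonaDyn implementation. The class name `OnlineKMeans`, the running-mean centroid update, and the `learning` flag are assumptions; the text states only that the nearest centroid is updated to account for each new point and that updating can be inhibited.

```python
import numpy as np

class OnlineKMeans:
    """Sequential K-means sketch; the running-mean update is an assumption."""

    def __init__(self, k, seed=0):
        self.k = k
        self.centroids = None
        self.counts = np.zeros(k, dtype=int)
        self.learning = True                  # set False to classify only
        self.rng = np.random.default_rng(seed)

    def update(self, point):
        p = np.asarray(point, dtype=float)
        if self.centroids is None:
            # The very first point is assigned to all K clusters.
            self.centroids = np.tile(p, (self.k, 1))
            self.counts[:] = 1
            return 0                          # arbitrary label for the first point
        dist = np.linalg.norm(self.centroids - p, axis=1)
        nearest = np.flatnonzero(np.isclose(dist, dist.min()))
        j = int(self.rng.choice(nearest))     # ties are broken at random
        if self.learning:
            self.counts[j] += 1
            # Move the centroid to the mean of all points assigned so far.
            self.centroids[j] += (p - self.centroids[j]) / self.counts[j]
        return j
```

Setting `learning` to `False` freezes the centroids, which turns the clusterer into a classifier in the sense described above.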
The K-means clustering scheme is ʻhard,ʼ which means that every EGG cycle is classified into exactly one cluster. Other schemes exist that instead distribute points into several clusters with weighted probabilities.
The phases are problematic for clustering, in that they are cyclically bounded to the interval [−π, π) radians; and thus phase differences are bounded to (−2π, 2π). When such a bound is crossed, which sometimes happens, the wrap-around causes an abrupt discontinuity. This could be interpreted as a large change in distance, when in fact the wave shape has changed very little. In other words, the clustering space should be cylindrical rather than Cartesian. For this reason, each phase difference Δφ is instead represented as two numbers, cos(Δφ) and sin(Δφ), which eliminates such discontinuities. Here, this increases the number M of dimensions to 3 × 9 = 27, but the computational penalty is small.
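A short numerical sketch in Python shows why the cosine and sine representation helps. The helper name `phase_features` and the example angles are chosen only for illustration.

```python
import numpy as np

def phase_features(dphi):
    """Represent each phase difference by the pair (cos, sin)."""
    dphi = np.asarray(dphi, dtype=float)
    return np.column_stack([np.cos(dphi), np.sin(dphi)]).ravel()

# Two nearly identical phases on either side of the wrap-around point.
a, b = np.pi - 0.01, -np.pi + 0.01
raw_gap = abs(a - b)                      # close to 2*pi
wrapped_gap = np.linalg.norm(phase_features([a]) - phase_features([b]))  # close to 0.02
```

With raw angles, the two phases appear to be almost 2π apart, whereas in the (cos, sin) representation their distance is about 0.02, which reflects the actual small difference in wave shape.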
By looking at only the first 10 FDs, we lose the higher frequency components of the EGG. Therefore, FonaDyn estimates also the residual energy in the remainder of the EGG spectrum, and clusters it in a separate dimension. Furthermore, the phase φ1 of the fundamental is needed for reconstructing a wave shape from the cluster centroid. This adds two more dimensions, but they are weighted by 0.001 to minimize their influence. Thus, M finally becomes 3 × 9 + 1 + 2 = 30 or N × 3 for N FDs.
Measuring the harmonic magnitudes in a logarithmic unit introduces a biased sensitivity that increases with the harmonic number, because although higher harmonics are typically much weaker, the logarithmic level assigns equal weight to any level changes of k Bels (=10k dB), regardless of whether a harmonic is strong or weak. The phase, too, emphasizes changes in the higher harmonics, because a given time shift in the waveform translates to a phase shift that is scaled up by n for harmonic number n. Therefore, our definition of distance assigns greater weight to small details and medium-frequency features than would a uniform metric, such as the mean square error.
Once a set of appropriate cluster centroids has been found, the ʻlearning modeʼ is turned off, and the centroids are ʻfrozenʼ and saved. These fixed centroids can then be used to classify any signals into the learned categories, without further adaptation. The operator must experiment with the number of clusters, until an optimal separation of wave shapes is achieved. The criterion for “optimal” depends entirely on the research question. For a binary classification, such as modal-falsetto
• Selamtzis A
• Ternström S
Analysis of vibratory states in phonation using spectral features of the electroglottographic signal.
or normal-pressed,
• Nilsson I
Electroglottography in Real-Time Feedback for Healthy Singing.
a bare minimum of two clusters would have to be ʻlearned.ʼ In the present study, we have used five clusters per subject, as in most cases this appeared to give a useful separation into various EGG wave shapes across the voice range.
One great benefit of using clustering is that it automatically gives succinct descriptions of a highly variable phenomenon, without having to predefine thresholds or make any prior assumptions as to what EGG wave shapes might be expected. A cluster centroid also represents a useful average shape of all the cycles in a cluster, which, thanks to the levels and phases of the harmonics being relative, is not smeared by a shifting cycle trigger point or by a varying signal amplitude. By reconstructing the waveform from the centroid, we can obtain time-domain metrics such as Qci and dEGGmaxN of the average shape in each cluster.
(Note: since a reconstructed EGG signal is spectrally limited to the chosen number of harmonics (N = 10), this constrains also dEGGmaxN. Let us adopt an ideal saw-tooth wave as a reference for maximum contacting speed. Its harmonic phases are all equal to zero, and the amplitude of the kth harmonic is Ak = 1/k. This limits dEGGmaxN of the reconstructed EGG signal to $\sum_{k=1}^{N} \frac{k}{k} = N$. For the original EGG signal, with all harmonics contributing, dEGGmaxN could be larger.)

### 2.4 Measuring the percentage of vital capacity λ

Changes in lung volume can be estimated by means of respiratory inductance plethysmography (RIP). The subject wears two elastic transducer belts, one horizontally around the RC at the height of the armpits, and another around the abdomen (AB). A RIP instrument generates two signals that are proportional to the respective circumferences, which change during inhalations and exhalations. It has been shown
• Konno K
Measurement of the separate volume changes of rib cage and abdomen during breathing.
that, when signals from two such belts are correctly weighted, their sum, here RC + AB, varies nearly linearly with the lung volume. The RIP instrument carries also a potentiometer for the weighting, and a third output whose voltage represents the weighted sum RC + AB. With appropriate calibration using a spirometer, this sum can represent lung volume changes (in liters). The tracheal pull, however, can be expected to depend on the relative lung volume, in relation to the subject's minimum and the maximum. Therefore, calibration of the changes in lung volume in liters was not required. Also, since the objective was only to compare EGG wave shapes in a high range to a low range of lung volumes, the measurement linearity across that range was not considered to be crucial. In the present study, we are concerned only with the relative lung volume, expressed in percent of the subject's vital capacity, and hereinafter denoted λ (lambda). λ is thus a fraction with no dimension.
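As a minimal sketch of how such a relative measure could be derived, the Python function below rescales the weighted RC + AB signal linearly between two reference readings. The function name `relative_lung_volume` and the parameters `reading_min` and `reading_max` are hypothetical; they are assumed here to be the subject's RIP readings at maximal exhalation and maximal inhalation, respectively. How these reference readings were obtained is not described in this section.

```python
import numpy as np

def relative_lung_volume(rip_sum, reading_min, reading_max):
    """Map the weighted RC + AB signal to percent of vital capacity (lambda)."""
    rip_sum = np.asarray(rip_sum, dtype=float)
    # Linear rescaling: reading_min -> 0 %, reading_max -> 100 % (assumed references).
    return 100.0 * (rip_sum - reading_min) / (reading_max - reading_min)
```

Because only the position between the two reference readings matters, no calibration in liters is needed for this rescaling.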

### 2.5 The fo-SPL plane

As the singer's voice changes from soft to loud and from low pitch to high pitch, phonatory conditions change, and so does the EGG waveform. The fo-SPL plane is well established for making so-called Voice Range Profiles (VRPs) that document this two-dimensional range of a voice.
• Ternström S
• Pabon P
• Södersten M
The voice range profile: Its function, applications, pitfalls and potential.
Although the full voice range was not exercised in the present study, only that of a song, we will still call this display format the “VRP.” As an example, a partial screenshot from the FonaDyn program is shown in Figure 3. The VRP cells (or “pixels”) are 1 semitone wide and 1 dB high.
Each cluster contains a large number of points that represent EGG cycles of a similar shape. For display, each cluster is assigned a color, and its cycle counts are stored in a separate layer in the VRP (Figure 3e). Since a given fo and SPL can be reached with different laryngeal settings, EGG cycles from more than one cluster may occur at the same position in the VRP. The layers can be displayed one at a time, or overlaid, at each position showing only the color of the wave shape cluster with the largest number of EGG cycles.
(Note: The order in which the K-means algorithm numbers the clusters is necessarily arbitrary (section 2.3), and may change on repeated "learning" runs with the same voice. The mapping of cluster numbers to colors therefore needs to be adjusted manually after each "learning" pass through a trial, so that the subject-specific correspondence between colors and wave shapes is maintained.)
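The layered cell counts and the overlaid display can be sketched in Python as follows. This is an illustrative reconstruction, not FonaDyn code. The function name `vrp_layers`, the grid limits, the rounding of fo to the nearest semitone on the MIDI scale (A4 = 440 Hz = 69), and the flooring of SPL to whole decibels are all assumptions.

```python
import numpy as np

def vrp_layers(fo_hz, spl_db, cluster, k, midi_range=(36, 96), spl_range=(40, 120)):
    """Count EGG cycles per cluster in cells of 1 semitone by 1 dB."""
    semitone = np.round(69 + 12 * np.log2(np.asarray(fo_hz, dtype=float) / 440.0)).astype(int)
    level = np.floor(np.asarray(spl_db, dtype=float)).astype(int)
    n_spl = spl_range[1] - spl_range[0]
    n_semi = midi_range[1] - midi_range[0]
    counts = np.zeros((k, n_spl, n_semi), dtype=int)   # one layer per cluster
    for s, l, c in zip(semitone, level, cluster):
        if midi_range[0] <= s < midi_range[1] and spl_range[0] <= l < spl_range[1]:
            counts[c, l - spl_range[0], s - midi_range[0]] += 1
    total = counts.sum(axis=0)
    # Overlaid view: the cluster with the most cycles in each cell, or -1 if empty.
    dominant = np.where(total > 0, counts.argmax(axis=0), -1)
    return counts, dominant
```

Each layer of `counts` corresponds to one cluster, and `dominant` corresponds to the overlaid view in which each cell shows only the cluster with the largest number of cycles.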
Following the work of Pabon,
• Ternström S
• Pabon P
• Södersten M
The voice range profile: Its function, applications, pitfalls and potential.
we can map by color any scalar metric derived from the voice, into the VRP. In the Results section, we will thus display several selected metrics, across the voice range that was elicited by the acquisition protocol. Ultimately, we will arrive at the correlation and effect sizes of Qci and dEGGmaxN versus λ, thereby addressing the title of this article.

## 3. METHOD

### 3.1 Participants

Eight female singers (mean age 35.1 years, SD 8.4 years) with healthy voices were recruited. The age, voice classification, and taxonomy of each singer are shown in Table 1, with the taxonomy following Bunch and Chapman (Bunch M, Chapman J. Taxonomy of singers used as subjects in scientific research). The singers reported having normal hearing, and neither having absolute pitch nor suffering from pulmonary disease. They each received two cinema vouchers as compensation for participating. Ethical approval for the study was obtained from the Physical Sciences Ethics Committee at the University of York.
TABLE 1. Participants with age, voice classification and taxonomy according to Bunch and Chapman (2000)

| Subject | Age | Voice classification | Bunch and Chapman taxonomy |
|---|---|---|---|
| S1 | 44 | Contralto | 5.1 Local community club singers |
| S2 | 43 | Mezzosoprano | 3.1 National singer |
| S3 | 28 | Soprano | 4.1 Regional singer |
| S4 | 24 | Mezzosoprano | 7.2 Full-time student in singing |
| S5 | 40 | Mezzosoprano | 3.1 National singer |
| S6 | 40 | Soprano | 2.1 International operatic singer |
| S7 | 38 | Mezzosoprano | 5.1 Local community club singers |
| S8 | 24 | Soprano | 7.2 Full-time student in singing |

A within-subject study design was adopted. Singers were asked to sing Frère Jacques (4/4, 8 measures long) from the score using only the vowel [ɑ:], only the syllable [pɑ:], and also to the original lyric, in the language of their choice (English, Danish, French, Finnish, or Swedish). They were also asked to sing each rendition of the tune legato, in a comfortable performance pitch range, and at a specified tempo of 100 quarter notes per minute, whilst taking shallow breaths every other bar as written in the score (Figure 4).
Participants were asked to sing the piece twice in direct succession (2 × 8 = 16 bars), at both high lung volume and at low lung volume, but in randomized order. Additional conditions for the singing were: three different dynamic levels (forte, mezzoforte, piano), and three different phonetic contexts (the original lyric, the vowel [ɑ:], and the syllable [pɑ:]). This design resulted in 36 repetitions of Frère Jacques: 2 (lung volume conditions) × 2 (replications) × 3 (dynamic levels) × 3 (phonetic contexts). These were applied in a randomized order, and recorded in 18 takes. The subject would pause for 30–60 seconds between takes, while the operator registered the names of the recorded files. Most subjects stayed in the same key for all takes, but a few sang some takes in slightly different keys. This was not a problem, since more pitches were visited in this way.
The procedural condition ʻhigh/low lung volumeʼ was included only as a means of eliciting a wide range of lung volumes from the subject. In the analysis, the ʻlung volumeʼ factor was replaced by a running measurement of the actual relative lung volume at 10 ms intervals. Similarly, the condition ʻdynamic levelʼ was incorporated only as a means of eliciting different vocal efforts from the subject. In the analysis, the resulting SPL was used instead. No instructions were given regarding the choice of using ʻheadʼ or ʻchestʼ voice. The relative vertical larynx position was also measured, but the participants received no instructions as to larynx position. In the [pɑ:] condition, the intraoral pressure at the p-occlusion was measured, but with no instructions regarding breath support.
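The arithmetic of the design (36 repetitions in 18 takes) can be checked with a short sketch. It assumes, in line with the procedure described in section 3.4, that each take is sung at one lung volume, one dynamic level and one phonetic context, and that the two renditions sung in direct succession within a take are the two replications. The labels and the function name are hypothetical.

```python
import itertools
import random

LUNG_VOLUMES = ("high", "low")
DYNAMICS = ("forte", "mezzoforte", "piano")
CONTEXTS = ("lyric", "vowel", "syllable")
RENDITIONS_PER_TAKE = 2  # the tune is sung twice in direct succession

def make_takes(seed=None):
    """Return the takes for one subject, in a randomized order."""
    takes = [
        {"lung_volume": lv, "dynamic": dyn, "context": ctx,
         "renditions": RENDITIONS_PER_TAKE}
        for lv, dyn, ctx in itertools.product(LUNG_VOLUMES, DYNAMICS, CONTEXTS)
    ]
    # A separate generator per call, so each subject can get a different order.
    random.Random(seed).shuffle(takes)
    return takes
```

Calling `make_takes()` once per subject, with a different seed each time, would give each subject her own order of presentation.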

### 3.3 Equipment

Vocalists were asked to sing in a small room treated with absorptive acoustic material, with a volume of 50 m³ and a reverberation time of less than 0.1 seconds. The experimental setup is shown in Figure 5. Voice audio was acquired using a head-worn cardioid condenser microphone (AKG model C520, www.akg.com) placed near the cheek at approximately 2 cm from the lips. Respiratory signals were acquired with a RIP device (RespTrack, Department of Linguistics, Stockholm University) that transforms the belt elongations into voltages in the range of ±2 V (section 2.4). Vocal fold contact patterns and the vertical larynx position (VLP) were recorded by means of a dual-channel electroglottograph (EG2, Glottal Enterprises, www.glottal.com). Electrical interference (beating) between the EG2 and the RespTrack was noted (both come with a ≈2 MHz oscillator). The RespTrack was therefore refitted by the manufacturer with a nonstandard crystal of 4.194304 MHz, which moved the beat frequency outside the audio band.
Mean subglottal pressure was recorded as the intraoral pressure pio during p-occlusion, using a plastic tube with an inner diameter of about 3 mm. The tube was held by the subject at the corner of the lips, and connected to a pressure transducer (PG-100E, Glottal Enterprises). The pressure values are reported only in Supplementary file A-15. Unlike the other signals, they were acquired during periods of non-phonation, and thus are more weakly related to the phonatory settings. For each production of [pɑ:], only a single pio value taken prior to phonation onset is available, whereas the other metrics are updated continuously during phonation (Table 2). Also, not all subjects were able to perform the [pɑ:] task consistently on the melody without aspirating the [p] release, and the pio signal was faulty for one subject.
TABLE 2. Contents of a LOG file (tracks 1-29), an EXTRA file (30-32), and post-computed data frames (33-35), one frame per EGG cycle

| Track # | Based on | Description |
|---|---|---|
| 1 | clock | Starting time in seconds, for this cycle (monotonically increasing, but not contiguous) (A-11, A-12) |
| 2 | audio | fo in floating-point semitones; 57.0 = 220 Hz. Updated at a fixed interval of 21.53 ms. |
| 3 | audio | SPL @ 0.3 m in calibrated dB re 20 µPa. |
| 4 | audio | Clarity, a metric of periodicity 0…1 (cycles for which clarity < 0.96 were rejected) |
| 5 | audio | Crest factor of the audio signal, the peak-to-rms ratio in dB (not reported here) (A-17, A-18) |
| 6 | egg | Cluster number (1…5) as assigned to egg cycle wave shapes by FonaDyn (supervised learning) |
| 7 | egg | Running estimate of the sample entropy SampEn of the egg cycle data (not discussed here) (A-19) |
| 8…17 | egg | Level, in Bels re. full scale, of egg harmonic 1…N (here, N = 10 was used throughout) |
| 18 | egg | Level in Bels of the residual (the power remaining when harmonics 1…N are accounted for) (A-16) |
| 19…28 | egg | Phase in radians of egg harmonic 1…N |
| 29 | egg | Twice the phase of harmonic 1 (for internal use in FonaDyn) |
| 30 | a/d | Vertical laryngeal position (VLP), updated every 10 ms |
| 31 | a/d | Intraoral pressure during [p]-occlusion, pio, updated for every /pa/ (A-15) |
| 32 | a/d | λ (% vital capacity), updated every 10 ms |
| 33 | egg | EGG peak-to-peak amplitude Ap-p (based on FDs 1…10) |
| 34 | egg | Integrated contact quotient Qci (based on FDs 1…10) (A-13) |
| 35 | egg | Normalized dEGG peak amplitude dEGGmaxN (ditto) (A-14) |

Supplementary File A gives additional voice maps of several metrics that are interesting but are not discussed in the text.
For each subject and each experimental condition, two signal files were recorded using the bespoke software FonaDyn v1.3.6 (Ternström S, Johansson D, Selamtzis A. FonaDyn: a system for real-time analysis of the electroglottogram, over the voice range).
The audio and EGG signals were recorded into a two-channel wav file (44.1 kHz, 16 bit integer resolution per channel) using a high-end digital audio interface (RME Fireface UCX, www.rme-audio.com). This file type will be referred to as audio + egg. The five physiological signals (VLP, pio, RC + AB, RC, AB) were recorded synchronously into a five-channel wav file (100 Hz sampling rate, 16 bit integer resolution per channel). DC coupling of the latter signals was achieved by using ±10V A/D converters (model ES-6 CV, Expert Sleepers, www.expert-sleepers.co.uk), linked to the audio interface via ADAT/TosLink optical fiber. The file type containing these signals will be referred to as extra.

### 3.4 Procedure

On the day of the appointment, singers were asked to read the information sheet and fill in the consent form and a background questionnaire. Then, the microphone, EGG electrodes, and respiratory bands were placed on each singer. The satisfactory placement of the EGG electrodes was verified by checking the indicator on the EG2’s bar meter, and the live EGG display on the computer monitor. The microphone gain was calibrated for sound pressure level for each participant, by matching the level on the FonaDyn display to that shown by a sound level meter (model 8922, AZ Instrument, www.az-instrument.com.tw) at 0.3 m from the subject's mouth. The pio signal was calibrated by exposing the transducer to pressures of 0 cm H2O and 20 ± 0.5 cm H2O from a U tube manometer. This modest precision was deemed sufficient for the present study.
Setting up the acquisition of the respiratory signals involves (1) adjusting the relative gains of the respiratory band signals RC and AB, using an isovolume manoeuvre; and then (2) calibrating the sum RC + AB to minimum and maximum λ. For the isovolume manoeuvre (1), the subject inhales to about midcapacity, and then keeps the glottis closed so that the lung volume remains approximately constant. The subject then alternately expands and contracts the rib cage and the abdomen wall, while the operator adjusts the relative gains of the RC and AB signals so that their sum (as displayed by an on-screen trace) varies as little as possible (Konno K. Measurement of the separate volume changes of rib cage and abdomen during breathing).
For step (2), singers then performed three maximum inhalations and exhalations, to locate the maximum and minimum lung volumes. The maximum value thus obtained of the summed signal RC + AB was taken to represent λ = 100% of vital capacity, and the minimum to represent λ = 0%. Finally, the singers performed a series of relaxed sighs to locate the resting expiratory level. Singers were standing in an unrestricted upright position, but avoiding large torso movements, which could distort the respiratory signals.
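The two calibration steps above can be expressed numerically as in the following sketch. Note that in the study the relative gains were adjusted by the operator watching an on-screen trace; the least-squares gain computed here is a substitute technique, shown only to illustrate what "varies as little as possible" means. The function names are hypothetical.

```python
def isovolume_gain(rc, ab):
    """Gain g for AB that minimizes the variance of RC + g * AB.

    rc, ab: equally long sequences sampled during the isovolume manoeuvre.
    This least-squares estimate stands in for the operator's manual adjustment.
    """
    n = len(rc)
    mean_rc, mean_ab = sum(rc) / n, sum(ab) / n
    cov = sum((r - mean_rc) * (a - mean_ab) for r, a in zip(rc, ab))
    var_ab = sum((a - mean_ab) ** 2 for a in ab)
    return -cov / var_ab

def percent_vital_capacity(summed, calib_min, calib_max):
    """Map the summed signal RC + AB to lambda, in percent of vital capacity.

    calib_min and calib_max are the extremes of RC + AB observed during the
    maximum exhalations and inhalations (lambda = 0% and 100%).
    """
    return 100.0 * (summed - calib_min) / (calib_max - calib_min)
```

With this linear mapping, any later sample that falls outside the calibrated extremes gives a λ below 0% or above 100%.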
After the calibration procedure, the task was explained to the subject: to adopt for each take a certain lung volume (“sing with nearly filled lungs” or “sing with nearly empty lungs;” and “take only shallow breaths between phrases”), as well as a given dynamic level and a given phonetic context. Singers then listened for a few seconds to a metronome set at 100 bpm, and were invited to familiarize themselves with the piece and the different conditions. Finally, the 18 takes were recorded, with short breaks in between. The conditions were presented in a randomized order that was different for each subject. Each take took about one minute. Singers were not aware of the purpose of the study. The session as a whole lasted for approximately one hour per subject.
A flow chart of the subsequent processing is given in Figure 6. The raw audio + egg files were first labeled manually, so as to indicate only the parts actually sung, using a signal editor (Swell 4.0, in the Soundswell Signal Workstation, www.neovius.se). This labeling excluded incidental extraneous sounds, such as throat clearing and dialogue with the operator. A script written in MATLAB (version R2017b, The MathWorks, Inc., www.mathworks.com) then unraveled the previously randomized order, and extracted and concatenated the labeled portions only, from both the audio + egg and extra files. Each of the 18 takes per subject was stored in a pair of wav files. At this point, the lung volume track was also normalized per subject to represent 0%…100% of the vital capacity readings initially acquired. For each subject, audio + egg files were concatenated into contiguous files for each phonetic context ([ɑ:] [pɑ:] lyric), of 4–6 minutes duration, with synchronous editing performed by the MATLAB script, on the corresponding 5-track physiological signal files.

### 3.5 Cluster analysis of EGG wave shapes

The script-modified audio + egg files were passed as input to FonaDyn. FonaDyn examines the EGG signal cycle-by-cycle, and applies a statistical clustering algorithm that dynamically classifies the cycles according to their wave shapes, as described in section 2.3.
The number of clusters must be chosen manually, with regard to the research question. We know from earlier work (Selamtzis A. Electroglottographic Analysis of Phonatory Dynamics and States) (1) that the EGG waveform varies considerably over the voice range (soft-loud, low-high), (2) that if SPL and fo are kept constant, then small EGG wave shape changes brought on only by vowel changes can be resolved by the clustering, and (3) that the EGG waveform can also vary considerably from one subject to another. Therefore, in order to interpret any influence of lung volume on the EGG, we wish first to understand the effects of SPL, fo, and phonetic context, for each subject. A first pass through the recordings was thus conducted to build ʻEGG mapsʼ per subject and per production condition, using five clusters. These clusters were ʻlearnedʼ from all productions of the [ɑ:] context, i.e., the eight bars of the song repeated with all 12 combinations of high/low LV × 3 dynamics × 2 replications. The cluster centroids thus obtained from the [ɑ:] context were then ʻfrozenʼ and used to classify all three contexts ([ɑ:] [pɑ:] lyric).
This gave an overview of how the EGG varies over the voice range, for each subject and each phonetic context condition. The EGG maps and EGG wave shape reconstructions were done in MATLAB, from CSV data files saved from FonaDyn. The results from this mode of analysis are reported in section 4.1.
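The ʻlearnʼ and ʻfreezeʼ steps can be illustrated with a plain K-means sketch. This is not FonaDyn's clustering code: it assumes a batch version of the algorithm (Lloyd's iteration), a simplified initialization from the first k points, and feature vectors that are already computed per EGG cycle. All function names are hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def learn_centroids(points, k, iterations=50):
    """Plain K-means; the first k points serve as initial centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        members = [[] for _ in range(k)]
        for p in points:
            members[nearest(p, centroids)].append(p)
        for j, group in enumerate(members):
            if group:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def classify(points, frozen_centroids):
    """Assign each point to its nearest centroid, without updating the centroids."""
    return [nearest(p, frozen_centroids) for p in points]
```

In the terms of this section, `learn_centroids` would be run once on the [ɑ:] cycles with k = 5, and `classify` would then be applied to the cycles of all three contexts using those same centroids.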

### 3.6 Relating EGG metrics to other physiological signals

FonaDyn can optionally also generate a bespoke log file (.aiff, 32-bit floating-point) containing one frame of data for each accepted EGG cycle. The frame rate is thus variable and equal to fo. Silences are skipped. This file type will be referred to as log. The content of the log data frames is shown in Table 2.
The log files were read into MATLAB as a 29-column array, with one track per column, and one row per egg cycle. The number of cycles for any one subject, with all productions from one phonetic context concatenated, was on the order of 10⁵. The VLP values, pio readings (on [pɑ:] only), and the lung volume percentages from the extra files were then time-aligned and appended into columns 30–32, respectively. Finally, a MATLAB routine reconstructed the EGG wave shape for every cycle from its FDs, and from this synthetic wave shape computed the App, Qci, and dEGGmaxN as described in section 2.1. These values were appended into columns 33–35.
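The reconstruction step can be sketched as follows. The exact definitions of the three metrics are given in section 2.1 and are not restated here, so this sketch rests on several labeled assumptions: that a level L in Bels corresponds to an amplitude of 10^(L/2), that the phases are cosine phases, that Qci is the mean of the cycle after scaling it to unit amplitude and unit period, and that dEGGmaxN is the steepest rising slope of that scaled cycle divided by π, so that a pure sinusoid yields 1. The function names are hypothetical.

```python
import math

def reconstruct_cycle(levels_bels, phases_rad, n_points=200):
    """Resynthesize one EGG cycle from harmonic levels and phases.

    Assumes amplitude = 10 ** (level / 2) and cosine phase (both assumptions).
    """
    wave = []
    for i in range(n_points):
        t = i / n_points  # time in cycles, 0 <= t < 1
        sample = 0.0
        for h, (level, phase) in enumerate(zip(levels_bels, phases_rad), start=1):
            amplitude = 10.0 ** (level / 2.0)
            sample += amplitude * math.cos(2.0 * math.pi * h * t + phase)
        wave.append(sample)
    return wave

def time_domain_metrics(wave):
    """Return (App, Qci, dEGGmaxN) for one resynthesized cycle."""
    n = len(wave)
    lo, hi = min(wave), max(wave)
    app = hi - lo  # peak-to-peak amplitude
    if app == 0.0:
        raise ValueError("flat cycle: the metrics are undefined")
    unit = [(x - lo) / app for x in wave]  # scaled to the range 0..1
    qci = sum(unit) / n  # area under the scaled pulse, per unit period
    # Circular first difference, expressed as slope per cycle.
    max_slope = max((unit[(i + 1) % n] - unit[i]) * n for i in range(n))
    return app, qci, max_slope / math.pi
```

For example, `time_domain_metrics(reconstruct_cycle(levels, phases))` would be called once per row of the log array, with `levels` taken from tracks 8…17 and `phases` from tracks 19…28.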
To give the reader some feeling for these data, Figure 7 shows an example graph of tracks that are relevant to this study (fo, SPL, cluster number, λ, VLP) for subject S2 on [ɑ:] only. Two out of the 12 productions of the tune (Figure 4) can be seen in the fo graph at the top. For a shorter subset of that same excerpt, Figure 8 shows the relative levels and phases of EGG harmonics 2…5, and also the App, Qci, and dEGGmaxN, as computed from the resynthesized sum of all 10 harmonics. Some further comments are given in the figure captions.
The data format of Table 2 allows us to compute the correlations between the values in any two tracks. By selecting one fo-SPL position at a time, we can do so cell-by-cell in the VRP. Again, such a correlation is a scalar metric that can be mapped across fo and SPL. The results are visualized as correlation VRPs, in section 4.5. The color mapping is a gradient, from red for −1, through gray for 0, to green for +1. It cannot be assumed that any such relationships will be linear, so the Spearman's rank correlation was used rather than the Pearson correlation. Spearman's rho was computed for each VRP cell, and a significance threshold of p < 0.05 was imposed. If the correlation within a cell was not significant, then that cell is empty (white) in the graphs.
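A cell-by-cell rank correlation of this kind can be sketched with the standard library alone. The sketch computes Spearman's rho as the Pearson correlation of average ranks. It does not compute the p-value needed for the p < 0.05 threshold; a statistics library would normally supply that (for instance, `scipy.stats.spearmanr` returns both rho and a p-value). The `min_cycles` parameter and the function names are hypothetical.

```python
import math
from collections import defaultdict

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation, or None if either variable is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0.0 or sy == 0.0:
        return None
    return cov / (sx * sy)

def correlation_vrp(frames, min_cycles=10):
    """frames: iterable of (cell, x, y), one per EGG cycle. Returns {cell: rho}."""
    by_cell = defaultdict(lambda: ([], []))
    for cell, x, y in frames:
        by_cell[cell][0].append(x)
        by_cell[cell][1].append(y)
    result = {}
    for cell, (xs, ys) in by_cell.items():
        if len(xs) >= min_cycles:
            rho = spearman_rho(xs, ys)
            if rho is not None:
                result[cell] = rho
    return result
```

Cells that are absent from the returned dictionary correspond to the empty (white) cells in the graphs, although here they are excluded by cycle count rather than by significance.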

### 3.7 Comparing EGGs across conditions

Given two VRPs, each constructed from many EGG cycles but in separate replications or conditions, we would like to quantify and visualize the resulting amount of change in the EGG, also across the VRP. The distance in FD space between EGG wave shapes (section 2.2) is a scalar metric that can be mapped across the VRP, to yield a delta-VRP. To do so, we need, for each cell, a FD representation of the average wave shape in that cell. Such an average can be computed in two ways, either based on the five cluster centroids, weighting the influence of cluster n by the relative occurrence of such cycles in the cell (method OW, overlap-weighting); or by computing the FD centroid of all cycles in the cell, without wave shape clustering (method CC, cell centroids).
These two methods were compared, and the results were highly correlated, as can be seen in Supplementary file C. Since the clustered data are preprocessed by orders of magnitude, method OW is much quicker, and was adopted. The algorithm is as follows. Each VRP cell contains five cycle counts, one for each of the five clusters. If cycles from only one cluster have contributed to a cell, use that cluster's centroid as the averaged FD for the cell. If cycles from several clusters have contributed, compute an intermediate FD, by weighting the contributing centroids according to their relative occurrence in that cell. For example, if a cell contains 60% cycles from cluster 1 and 40% from cluster 2, compute a centroid that lies 60% of the way from centroid 2 to centroid 1. This works also for more than two contributing centroids. Then, cell by cell, compute the distance |d| between the weighted FDs derived from the two VRPs under comparison. Another advantage of method OW is that the existing centroid wave shape plots give some idea of what the wave shape difference looks like. The disadvantage is that, even within clusters, wave shapes can be different; and these smaller differences are lost, because of the binning into clusters. This is not a problem with method CC, because it does not involve any binning of wave shapes.
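The OW algorithm described above can be sketched as follows. The sketch assumes that both VRPs were classified with the same set of frozen centroids, and that each VRP is held as a dictionary from cell to a list of per-cluster cycle counts; these data structures and the function names are hypothetical.

```python
import math

def weighted_centroid(counts, centroids):
    """Count-weighted average of the cluster centroids for one cell."""
    total = sum(counts)
    if total == 0:
        return None
    dims = len(centroids[0])
    return [
        sum(c * centroid[d] for c, centroid in zip(counts, centroids)) / total
        for d in range(dims)
    ]

def delta_vrp_ow(vrp_a, vrp_b, centroids):
    """Per-cell distance |d| between the weighted FDs of two VRPs.

    vrp_a, vrp_b: {cell: [cycle count per cluster]}.
    Only cells that are populated in both VRPs are compared.
    """
    delta = {}
    for cell in vrp_a.keys() & vrp_b.keys():
        fd_a = weighted_centroid(vrp_a[cell], centroids)
        fd_b = weighted_centroid(vrp_b[cell], centroids)
        if fd_a is not None and fd_b is not None:
            delta[cell] = math.dist(fd_a, fd_b)
    return delta
```

With two centroids at (0, 0) and (10, 0), a cell holding 60% of cycles from the first and 40% from the second gets the weighted FD (4, 0), which lies 60% of the way from the second centroid to the first, as in the example in the text.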
The OW method of difference analysis was used for assessing the magnitude of change in EGG wave shape across replications, phonetic context and λ conditions, as reported in section 4.3. The data sets used for comparing high and low λ conditions were constructed as follows. For each subject, the set of EGG cycle data frames was partitioned by percentiles of λ values into three sets, containing the lowest (<34%), middle, and highest (>66%) λ values. The middle third was discarded. The low-λ and high-λ sets were compared by making the delta-VRP, for each subject.
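The partitioning by λ percentiles can be sketched as below. The sketch assumes that the low and high sets each take 34% of the frames, counted after sorting by λ, and that the remainder is discarded; the frame layout and the function name are hypothetical.

```python
def partition_by_lambda(frames, tail_percent=34):
    """Split frames into low-lambda and high-lambda sets, discarding the middle.

    frames: list of (lambda_percent, payload) tuples, one per EGG cycle.
    tail_percent: share of all frames placed in each of the two outer sets.
    """
    ordered = sorted(frames, key=lambda f: f[0])
    n = len(ordered)
    n_tail = n * tail_percent // 100  # integer arithmetic avoids rounding surprises
    low = ordered[:n_tail]
    high = ordered[n - n_tail:] if n_tail > 0 else []
    return low, high
```

Because the split is made per subject, the boundaries between the sets fall at different absolute λ values for different subjects.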

## 4. RESULTS

### 4.1 Overviews of the EGG over the range of the song

Figures 9 and 10 show examples from two of the eight subjects, chosen for illustrative strength. To reduce ʻdroopingʼ at voice offsets, a threshold has been applied such that cells must contain at least 5 EGG cycles to be included in these plots. Each of the five colors corresponds to an EGG wave shape as shown on the left. The reader is reminded that the clustering was done by the normalized EGG wave shape (section 2.3), not by proximity in the fo-SPL-plane.
The color plotted in any given cell is that which corresponds to the most frequently occurring wave shape at that location in the fo-SPL-plane. So if, for example, a given cell has accumulated 47% ʻredʼ cycles, 45% ʻyellowʼ cycles, and 8% ʻgreenʼ cycles, it is plotted in red. Note that the EGG wave shapes and their color mapping are specific to each subject. The cluster-to-color mapping was manually arranged post hoc for similarity across conditions, such that one (red) would correspond to the firmest vocal fold contact, and five (purple) to the least amount of contact. Because of individual variation, this principle could not be consistently upheld. The corresponding figures for all eight subjects are given in the Supplementary file A, with comments.
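The choice of color per cell amounts to a simple majority rule, sketched below together with the threshold of at least 5 EGG cycles per cell mentioned above. The function name is hypothetical, and the tie-breaking rule (the lowest cluster number wins) is an assumption.

```python
def dominant_cluster(counts, min_cycles=5):
    """Index of the cluster with the most cycles in a cell.

    counts: cycle counts per cluster for one VRP cell.
    Returns None if the cell holds fewer than min_cycles cycles in total.
    Ties are resolved in favor of the lowest index (an assumption).
    """
    if sum(counts) < min_cycles:
        return None
    return max(range(len(counts)), key=lambda k: counts[k])
```

For the example in the text, a cell with counts of 47, 45 and 8 in the first three clusters returns index 0, which would be plotted in red if red is assigned to the first cluster.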
In Figure 9 (of subject S2), it can be seen that the color maps are similar for all three phonetic contexts. This means that the distribution of EGG wave shapes over the range of the song (second, third, and fourth panel from left) was highly consistent across these contexts. For subject S2, above 80 dB, the EGG wave shapes depend mostly on fo, with a progression from red to yellow and green. Here dEGGmaxN decreases with rising fo, while Qci increases. Below 80 dB, red is replaced by blue, with a slightly slower contacting and lower dEGGmaxN. The lowest levels, around 60 dB SPL, are associated with the softest phonation (purple). Finally, going from [ɑ:] to [pɑ:] to lyric, the maps become more speckled and less distinct, presumably due to the greater variation incurred by singing on syllables rather than on one vowel only.
Turning to subject S4 (Figure 10), the clustering again showed variability of the EGG shapes over fo and SPL. The distribution of EGG shapes is quite consistent across the three phonetic contexts. Above 70 dB, there are three main diagonal bandings from red, to yellow to purple, corresponding to different EGG wave shapes and indicating a covariation with SPL and fo. S4 was very consistent and had an unusually brief EGG pulse, quantified by the lowest values observed of Qci. The region below 70 dB is characterized mostly by the purple shape. At these low levels, the EGG signal is very weak and nearly sinusoidal, since there is little or no VF contact, so the Qci value for this cluster is not comparable to the others.
In summary, the examples in Figures 9 and 10 show how the EGG wave shape varied substantially across the song's range and across subjects. This variation was quantified using the intercentroid distances in FD space, and compared to the (smaller) variations incurred by phonetic contexts and high-low λ conditions, see section 4.4. The figures illustrate also how phonation within subjects was quite consistent across the three phonetic contexts, suggesting that the [ɑ:] context is in fact representative of more realistic singing.

### 4.2 Variation achieved in λ

While subjects were instructed to try to maintain either a high or a low lung volume for the duration of the song, they were not always entirely successful or consistent in doing so. For instance, a few subjects would indeed start with filled lungs in the ʻhighʼ condition, only to descend quite soon to a lower lung volume for the rest of the take. For each subject, the criterion for ʻlow lung volumeʼ and ʻhigh lung volumeʼ was set post hoc as the lowest third and highest third of all λ values observed in that subject (section 3.7). Table 3 shows the mean values of λ achieved under this criterion, i.e., the mean λ of the lowest third and the mean λ of the highest third, as well as the difference. It can be seen that the effective high-to-low difference in mean λ varied from 30.9% (subject S5) to 91.5% (subject S3). When looking at effect sizes for different subjects (Figures 15 and 16), one thus needs to bear in mind that some varied their average λ much less than did others. The distribution histograms of λ for all subjects are shown in Supplementary file C. The best yield of high/low lung volume comparisons would be obtained when the λ histogram is wide and clearly bimodal.
TABLE 3. Performance of subjects in producing λ settings over a large range. λ was measured every 10 ms during phonation. Columns: (2) mean λ of the 34% of all measurements with the lowest values of λ (“low lung volume”); (3) mean λ of the 34% of all measurements with the highest values of λ (“high lung volume”); (4) the difference, or how much each subject actually varied her relative lung volume, on average.

| (1) Subject | (2) Mean λ (%) “low LV” | (3) Mean λ (%) “high LV” | (4) Change in λ (%) |
|---|---|---|---|
| S1 | 4.97 | 61.3 | 56.3 |
| S2 | 21.3 | 72.8 | 51.5 |
| S3 | -0.62 | 90.9 | 91.5 |
| S4 | -4.34 | 71.9 | 76.2 |
| S5 | 18.7 | 49.6 | 30.9 |
| S6 | 13.2 | 59.3 | 46.1 |
| S7 | 6.48 | 60.9 | 54.4 |
| S8 | 11.2 | 65.3 | 54.1 |
| Average | 8.9 | 66.5 | 57.6 |

When singing, some subjects unexpectedly exhibited a larger range in λ than during the initial max-min calibration. This is why some values in Table 3 are below 0% or rather close to 100%. The reasons could be an imprecisely performed calibration, unnoticed dislocations of the elastic bands of the RespTrack, or a warming up to the task over the rather long session. Strictly speaking, this means that the λ calibration for those subjects should be considered invalid. This is not a serious problem, however, since we have defined ʻlow lung volumeʼ and ʻhigh lung volumeʼ in terms of the ensemble of each subject's productions over the 36 repetitions of the song.

### 4.3 Subject consistency across replications

In order to assess any effect of λ on the EGG waveform, we need to know how large such an effect is in relation to the variation between replications in the same conditions. Therefore, each subject's consistency across replications was assessed, by making separate cluster VRPs of the two sets of replicated [ɑ:]-context conditions and then computing the delta-VRP between them (section 3.7). The result is shown for one subject in Figure 11, and for all subjects in the Supplementary file B, which also contains more detailed comments. The difference between EGG wave shapes was quantified using the Euclidean distance in the 30 dimensions between the FD centroids of two EGG wave shapes (section 2.2). This metric has no direct physical interpretation, because some of the dimensions are in Bels, and others are in sines or cosines (section 2.3). Still, the distance affords a relative quantification of how different wave shapes are from each other. To give an idea of its magnitude, Figure 11b shows an example of the distances computed for the five wave shapes (Figure 11a) that were clustered from the [ɑ:] productions of subject S2. Note in Figure 11b how this distance metric increases as wave shapes become visibly more dissimilar. For this subject, the cluster centroids were spaced by 1.41–3.96.
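The distance metric can be sketched as follows. The precise construction of the FD vector is described in sections 2.2 and 2.3 and is not restated here, so the sketch assumes that the 30 dimensions are the 10 harmonic levels in Bels, followed by the cosines and the sines of the 10 harmonic phases. The function names are hypothetical.

```python
import math

def fd_vector(levels_bels, phases_rad):
    """Build a feature vector: levels, then cos(phase), then sin(phase).

    With 10 harmonics this gives 30 dimensions (an assumed layout).
    """
    return (
        list(levels_bels)
        + [math.cos(p) for p in phases_rad]
        + [math.sin(p) for p in phases_rad]
    )

def fd_distance(shape_a, shape_b):
    """Euclidean distance between two wave shapes given as (levels, phases)."""
    return math.dist(fd_vector(*shape_a), fd_vector(*shape_b))
```

Representing each phase by its cosine and sine keeps phases that differ by a multiple of 2π at zero distance from each other, which the raw angle would not.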
Comparing the two replications (Figure 11d and e) made by this subject, always at the same fo and SPL, the pair-wise distances between 335 replicated cells were small, with a median of 0.32 (Figure 11c). Hence the replications were much more similar to each other than were the five clustered wave shapes that could portray the EGG variation of this subject over the fo and SPL range of the song. The same was generally the case for all eight subjects (Table 4 and Supplementary file B).
TABLE 4. Distances in EGG FD space resulting from replications of [ɑ:], change of phonetic context ([ɑ:] to [pɑ:], and [ɑ:] to lyric), and changes in relative lung volume λ (Section 4.6, and Supplementary File C). Each value in columns 2-5 is the median or mean over all cells in one delta-VRP. The distributions of these positive distances are skewed toward zero, so the medians are smaller and more representative than the means. Distances were computed using the OW method (see text). Column 6 gives the mean distances between each subject's wave shapes as clustered over the song's range (see Figure B-2 in Supplementary File B). These range-induced variations were much larger than the variations across context and λ conditions.

| Subject | [ɑ:] to [ɑ:] (replications) | [ɑ:] to [pɑ:] | [ɑ:] to lyric | All contexts ʻhigh λʼ to ʻlow λʼ | All distances between all five clusters, means (SD) |
|---|---|---|---|---|---|
| Tokens per set | 6:6 | 12:12 | 12:12 | 18:18 | |
| Medians of all per-cell distances | | | | | |
| S1 | 0.69 | 0.88 | 0.86 | 0.80 | |
| S2 | 0.32 | 0.71 | 0.51 | 0.30 | |
| S3 | 0.24 | 0.56 | 0.53 | 0.32 | |
| S4 | 0.56 | 0.62 | 0.76 | 0.63 | |
| S5 | 0.48 | 0.38 | 0.44 | 0.43 | |
| S6 | 0.53 | 0.51 | 0.58 | 0.58 | |
| S7 | 0.46 | 0.60 | 0.86 | 0.72 | |
| S8 | 0.43 | 0.40 | 0.43 | 0.31 | |
| Average | 0.46 | 0.58 | 0.62 | 0.51 | |
| Means of all per-cell distances | | | | | |
| S1 | 1.23 | 1.41 | 1.49 | 1.24 | 4.16 (1.96) |
| S2 | 0.45 | 1.61 | 0.64 | 0.47 | 2.59 (0.87) |
| S3 | 0.46 | 1.04 | 0.87 | 0.54 | 2.99 (1.42) |
| S4 | 0.72 | 0.82 | 0.99 | 0.80 | 3.62 (1.40) |
| S5 | 0.56 | 0.54 | 0.58 | 0.55 | 2.40 (0.98) |
| S6 | 0.66 | 0.66 | 0.72 | 0.70 | 2.43 (0.78) |
| S7 | 0.59 | 0.81 | 1.01 | 0.88 | 2.76 (1.10) |
| S8 | 0.48 | 0.45 | 0.50 | 0.39 | 1.51 (0.27) |
| Average | 0.64 | 0.92 | 0.85 | 0.70 | 2.81 (1.41) |

### 4.4 Effect of conditions

Table 4 shows the median and mean distances between per-cell EGG wave shapes, with comparisons of the [ɑ:] replications, [ɑ:] to [pɑ:], and [ɑ:] to lyric. One purpose of Table 4 is to test whether the variations in EGG shape that were incurred by replications were on average smaller than those that arose from changes in phonetic context. The averages confirm that this was the case, although the margin was modest: the average median distance was 0.46 for replications, against 0.58 and 0.62 for the two changes of phonetic context.
Table 4 shows also the distances incurred by differences in λ, computed as described in section 3.7. This was done for all phonetic contexts pooled (column 5). Compared to the EGG wave shape variation over the song's range (column 6), the influence of λ is quite small. Supplementary file C gives the individual results, where it can be seen that the largest distances tend to appear near the bottom of the SPL range. This implies that it is in soft singing that the lung volume has its largest effect on the EGG. We did not compute the significance of these differences, because they signify only the absolute distances between centroids, regardless of their direction in the high-dimensional FD space. In other words, these results show that the EGG waveform did change a little with lung volume, but not in what way. Two distances of equal magnitude might correspond to quite different changes of the EGG shape. To visualize the shape changes, we would need to reconstruct the wave shapes in very many cells, which the FD representation allows us to do. However, the effect sizes are quite small. In the interest of brevity, we will look at LV effects on the time-domain parameters only (section 4.6).

### 4.5 Correlations

As mentioned, the EGG peak-to-peak amplitude App must be interpreted with caution in this study. Figure 12 supports this caveat, exposing a strong but shifting correlation of App to the vertical larynx position VLP, visualized as correlation VRPs (section 3.6) for all subjects. This correlation would be positive when the vocal folds are below the electrodes (shown as green), and negative with vocal folds positioned above the electrodes (red). Most subjects exhibit simple patterns, which however differ from one individual to another. Overall, this result indicates that the EGG amplitude is contaminated by changing larynx elevation, and by the electrode placement, which might even have shifted a little over the course of the procedure. Although in principle it could be possible to make a subject-specific model for this dependency, and then compensate for it, this was deemed to be too complicated, and prone to misinterpretation.
If increased lung volume incurs a tracheal pull for which the singer does not compensate, then a lowering of the larynx might be expected in connection with larger lung volume. Figure 13 shows the VRPs of the correlation between λ and VLP. The expectation would be for negative correlation, i.e., a red color in the correlation VRP. For subject S8 this is indeed the case over the entire range of the song; to a lesser degree also for S2, S4, and perhaps S6. Subjects S3 and S7 show fairly strong correlations, but the polarity varies across the range (red–green). This means that S8 very consistently allowed her larynx to descend with increasing lung volume, while the others behaved differently. The effect size cannot be judged from the figure, and no other attempt was made to quantify it.
If increased lung volume leads to a decrease in adduction, presumably mediated by tracheal pull, then one might expect a reduction in Qci as λ increases. Figure 14 shows the VRPs of the correlation between λ and Qci. The general impression is mixed. The expected negative correlation (red) is clear, though weak, only for subjects S6 and perhaps S5.
One can make similar correlation plots of dEGGmaxN, and indeed of any scalar metric derived from the EGG harmonic levels or phases. The plots become similar to Figure 14 in character, but different in the details.

### 4.6 λ effect on EGG time-domain metrics

To answer the research question as posed at the outset, the size of the effects of λ on dEGGmaxN and Qci was assessed as follows. For each subject, the set of data frames (Table 2) from all phonetic contexts was partitioned by λ, as described in section 3.7. Then, for each VRP cell, high-λ and low-λ averages for these two metrics were computed. Finally, delta-VRPs were constructed from the per-cell differences in the metric averages.
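The last two steps can be sketched as below. The sketch assumes that each data frame has already been reduced to a (cell, value) pair for the metric in question, that the sign convention is high λ minus low λ, and that only cells populated in both sets are compared. The function names are hypothetical.

```python
from collections import defaultdict

def cell_means(frames):
    """Mean of a metric per VRP cell; frames is an iterable of (cell, value)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cell, value in frames:
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

def metric_delta_vrp(high_frames, low_frames):
    """Per-cell difference in the metric average, high lambda minus low lambda."""
    high = cell_means(high_frames)
    low = cell_means(low_frames)
    return {cell: high[cell] - low[cell] for cell in high.keys() & low.keys()}
```

A negative value in a cell then means that the metric was lower, on average, at high λ than at low λ for that fo and SPL.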
Figure 15 thus shows the differences in average dEGGmaxN, between high λ and low λ, for all subjects. If larger tracheal pull at high λ has a lowering effect on dEGGmaxN, then we would expect the color in Figure 15 to be toward the negative (ochre color). In all subjects except perhaps S7, this effect can be seen at the very lowest SPLs, in varying degrees for the different subjects. In S1 and S7, the effect extends to about half of the cells in the delta-VRP.
Similarly, Figure 16 shows the differences in the average contact quotient Qci, between high λ and low λ, for all subjects. This metric is harder to interpret than is dEGGmaxN, since Qci increases not only when the EGG pulse widens because of a longer closed phase, but also when VF contacting ceases and the EGG pulse becomes weak and sinusoidal. If larger tracheal pull at high λ has a lowering effect on Qci, then we would expect the color in Figure 16 to be toward the negative (ochre color), except at the lowest SPLs, where reduced adduction actually leads to increased Qci, because of the more sinusoidal EGG wave shape. In all subjects except S2 we can thus again conclude that increased λ did have a small abducting effect at the lowest SPLs (turquoise color). At moderate and high SPL, the expected effect is seen only in S2 and S6 (ochre), and it is generally very small.
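The two-sided behavior of Qci can be illustrated numerically. The sketch below assumes that Qci is the area under one EGG cycle after normalizing its amplitude to the range 0 to 1 and its period to 1; that definition is not restated in this section and is taken here as an assumption. The half-sine "contacting pulses" are synthetic stand-ins, not measured EGG data.

```python
import math


def qci(cycle):
    """Area under one cycle, normalized to unit amplitude and unit period."""
    lo, hi = min(cycle), max(cycle)
    if hi == lo:
        raise ValueError("cycle has no amplitude variation")
    return sum((s - lo) / (hi - lo) for s in cycle) / len(cycle)


def half_sine_pulse(width, n=1000):
    """Synthetic cycle: a half-sine pulse occupying `width` of the period."""
    return [
        math.sin(math.pi * k / (width * n)) if k < width * n else 0.0
        for k in range(n)
    ]


n = 1000
sinusoid = [math.sin(2.0 * math.pi * k / n) for k in range(n)]
weak_sinusoid = [0.01 * s for s in sinusoid]  # same shape, much smaller amplitude

narrow = qci(half_sine_pulse(0.4))  # shorter contacting phase
wide = qci(half_sine_pulse(0.6))    # longer contacting phase
print(narrow, wide, qci(sinusoid), qci(weak_sinusoid))
```

Under these assumptions the wider pulse gives a larger Qci than the narrow one, while the sinusoid gives a value near 0.5 regardless of its amplitude, which is higher than either pulse. This is why a rise in Qci at the lowest SPLs can indicate reduced rather than increased adduction.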
A plot in the manner of Figures 15 and 16 was made also for App, but the outcome was sufficiently similar to Figure 12 to confirm that a changing VLP had indeed overridden the effect of changes in λ on the EGG amplitude. Supplementary file D gives this plot for App and also for the spectral metrics ΔL2 (being analogous to H2–H1 of the voice spectrum), ΔL3 and ΔL4, which report on other aspects of the EGG waveform than do dEGGmaxN and Qci.

## 5. DISCUSSION AND CONCLUSION

This study sought to establish whether or not there is a measurable effect of the relative lung volume, expressed as percent of vital capacity, on the EGG waveform, in reasonably realistic singing. To detect such an effect, the variations in the EGG wave shape that are due to fo and SPL were first accounted for, by collecting spectral data for all EGG cycles in a voice range profile format. Using statistical clustering of wave shapes, a systematic overview per subject was obtained of these EGG wave shape variations. The distance in Fourier Descriptor space was taken as a metric of dissimilarity between EGG wave shapes, and was used to assess the magnitude of changes in EGG wave shape induced by fo and SPL, replications, phonetic context, and changes in percent vital capacity λ. The FD representation of wave shapes was also used to compute the time-domain metrics of normalized maximum contacting speed and normalized contact quotient. The findings can be summarized as follows:
• The shape of the EGG waveform is quite personal, yet within individuals varies consistently over fo and SPL. This variation in the EGG wave shape was much larger than the variations due to phonetic context or percent vital capacity λ. It was accounted for by making so-called delta-VRPs, in which EGG wave shapes are always compared at the same fo and SPL.
• Within subjects, the variation in EGG wave shape across the range of the song was found to be very similar in the different singing tasks [ɑ:], [pɑ:], or a lyric (section 4.1), with good reproducibility across replications (section 4.3). This means that EGG studies may be ecologically valid for nonphonetic conclusions about normal singing, even if only [ɑ:] vocalizations are used.
• At the lowest SPLs, the higher λ values were associated with small reductions in the normalized maximum contacting speed dEGGmaxN (Figure 15) and small increases in the contact quotient Qci (Figure 16). These observed effects of λ on the cycle-normalized EGG waveform are consistent with the abducting effect that is postulated by the tracheal-pull hypothesis. Note that the lowest SPLs shown here resulted from the productions sung in the softest dynamic, piano, which were still well above the threshold of phonation.
• Also at moderate and high SPLs, the eight subjects exhibited small yet statistically significant changes in dEGGmaxN and Qci with λ, but these changes were different from subject to subject, and not consistently in line with the tracheal pull hypothesis.
In this experiment, the subjects were free to compensate for changing forces acting on the larynx, for example, to increase glottal adduction in response to an increasing tracheal pull. While this is ecologically valid, it does confound a clear interpretation of some effects.
The main contribution of this study, however, is not so much its findings on the effects of lung volume as its demonstration of how the combined application of the VRP paradigm and of statistical clustering can provide an overview of complex and individual behaviors.
The VRP paradigm is applicable to any scalar metric of the voice. Connected regions of color appear in Figures 12, 13, and 14, which are based on 12 takes of the song, and in Figures 15 and 16 (36 takes). This fact is in itself a confirmation of systematic changes in the portrayed metrics. The humbling observation is how individual these systematic behaviors are from subject to subject.
Statistical clustering can help the researcher to see connections between data points that otherwise might not be readily recognized as being similar. Its value is clear from Figures 9, 10, and 11 and Supplementary file A, where we see how a subject's typical EGG wave shapes are distributed across the range of the chosen song. Again, we observe that while the behavior of a given subject is consistent over many takes, different subjects exhibit different distributions of EGG wave shape. With this information, we are now perhaps in a position to construct a set of more generic EGG wave shapes, and use this single set to classify the EGG wave shapes of all subjects, across their voice range. This would enable more direct intersubject comparisons. But we cannot take this step before being familiar with the range of variation that might be encountered. This will be especially true once the method is applied to the assessment of pathological phonation. Such studies are being planned.
We conclude from the ʻvoice mapsʼ presented herein that comparisons across conditions will, in general, require a matching of fo and SPL. Intrasubject comparisons are possible with careful control, and this is valuable, e.g., for assessing the effect of a treatment or training on an individual. However, direct intersubject comparisons of any given voice metric would be precarious to interpret. This more general notion has repeatedly been articulated by Pabon (Ternström, Pabon, and Södersten, 2016; Pabon and Ternström, 2018), and it has profound implications for how we assess voices by measurements. To complicate matters further, although the present method of mapping the results onto the fo-SPL plane helps to account for variations due to fo and SPL, the temporal relationships, if any, are lost. For instance, results for a given fo and SPL might be different when fo is rising and when it is falling, depending on the immediately preceding configuration of the larynx and lungs; and such differences might be systematic. Exposing such temporal effects would require a different analysis.

## Acknowledgments

The visit of D'Amario to KTH in Stockholm, in April–May of 2017, was funded by a WROCAH White Rose Network Scholarship at the University of York. The study was also supported by KTH faculty funding (Ternström's time), as a continuation of work done under contract 2010-4565 with the Swedish Research Council. The funding sources were not involved in this research work. We wish to thank the singers for their voluntary participation, and Johan Stark and Peter Branderud at Stockholm University for their assistance in modifying the RespTrack device.

## References

• Doscher BM. The Functional Unity of the Singing Voice. Metuchen NJ: Scarecrow Press; 1994.
• Vennard W. Singing: The Mechanism and the Technic. New York, NY: Fisher; 1967.
• Hixon TJ, Goldman MD. Kinematics of the chest wall during speech production: volume displacements of the rib cage, abdomen, and lung. J Speech Lang Hear Res. 1973; 16: 78-115. https://doi.org/10.1044/jshr.1601.78
• Hixon TJ, Goldman MD. Dynamics of the chest wall during speech production: function of the thorax, rib cage, diaphragm, and abdomen. J Speech Lang Hear Res. 1976; 19: 297-356. https://doi.org/10.1044/jshr.1902.297
• Bouhuys A, Proctor DF, et al. Kinetic aspects of singing. J Appl Physiol. 1966; 21: 483-496. https://doi.org/10.1152/jappl.1966.21.2.483
• Watson PJ, Hixon TJ. Respiratory kinematics in classical (opera) singers. J Speech Lang Hear Res. 1985; 28: 104-122. https://doi.org/10.1044/jshr.2801.104
• Watson PJ, Hixon TJ, Stathopoulos ET, et al. Respiratory kinematics in female classical singers. J Voice. 1990; 4: 120-128. https://doi.org/10.1016/S0892-1997(05)80136-5
• Thomasson M, Sundberg J. Lung volume levels in professional classical singing. Logop Phoniatr Vocol. 1997; 22: 61-70. https://doi.org/10.3109/14015439709075316
• Watson PJ, Hixon TJ. Respiratory behavior during the learning of a novel aria by a highly trained classical singer. In: Davis PJ, Fletcher NH, eds. Vocal Fold Physiology/Controlling Complexity and Chaos. San Diego, CA: Singular Publishing Group, Inc.; 1996: 325-343.
• Hoit JD, Jenks CL, Watson PJ, et al. Respiratory function during speaking and singing in professional country singers. J Voice. 1996; 10: 39-49. https://doi.org/10.1016/S0892-1997(96)80017-8
• Macklin CC. X-ray studies on bronchial movements. Am J Anatomy. 1922; 35: 303-320.
• Zenker W, Glaninger J. Die Stärke des Trachealzuges beim lebenden Menschen und seine Bedeutung für die Kehlkopfmechanik. Ztschr Biol. 1959; 111: 155-164.
• Thomasson M, Sundberg J. Lung volume and phonation: a methodological study. Logop Phoniatr Vocol. 1996; 21: 13-20. https://doi.org/10.3109/14015439609099198
• Thomasson M, Sundberg J. Effects of lung volume on the glottal voice source. J Voice. 1998; 12: 424-433. https://doi.org/10.1016/S0892-1997(98)80051-9
• Pabst F, Sundberg J. Tracking multi-channel electroglottograph measurement of larynx height in singers. Scand J Logop Phoniatr. 1993; 18: 143-152. https://doi.org/10.3109/14015439309101360
• Shipp T. Vertical laryngeal position during continuous and discrete vocal frequency change. J Speech Lang Hear Res. 1975; 18: 707-718. https://doi.org/10.1044/jshr.1804.707
• Effects of inhalatory abdominal wall movement on vertical laryngeal position during phonation. J Voice. 2001; 15: 384-394. https://doi.org/10.1016/S0892-1997(01)00040-6
• Thomasson M. Belly-in or belly-out? Effects of inhalatory behaviour and lung volume on voice function in male opera singers. TMH-QPSR. 2003; 45: 61-74.
• Griffin B, Woo P, Colton R, et al. Physiological characteristics of the supported singing voice. A preliminary study. J Voice. 1995; 9: 45-56. https://doi.org/10.1016/S0892-1997(05)80222-X
• Thomasson M, Sundberg J. Consistency of inhalatory breathing patterns in professional operatic singers. J Voice. 2001; 15: 373-383. https://doi.org/10.1016/S0892-1997(01)00039-X
• Zenker W. Questions regarding the function of external laryngeal muscles. In: Brewer D, ed. Research Potentials in Voice Physiology. Syracuse, NY: State University of NY; 1964: 20-40.
• Milstein C. Laryngeal function associated with changes in lung volume during voice and speech production in normal speaking women. Unpublished doctoral dissertation. University of Arizona; 1999. (Available at:) (Accessed April 25, 2018)
• Thomasson M. Effects of lung volume on the glottal voice source and the vertical laryngeal position in male professional opera singers. TMH-QPSR. 2003; 45. (Accessed April 19, 2018)
• Baken RJ. Electroglottography. J Voice. 1992; 6: 98-110. https://doi.org/10.1016/S0892-1997(05)80123-7
• Scherer RC, Druker D, Titze I. Electroglottography and direct measurement of vocal fold contact area. In: Fujimura O, ed. Vocal Physiology: Voice Production, Mechanisms and Function. New York: Raven Press; 1988: 279-291.
• Hampala V, Garcia M, Švec JG, et al. Relationship between the electroglottographic signal and vocal fold contact area. J Voice. 2016; 30: 161-171. https://doi.org/10.1016/j.jvoice.2015.03.018
• Herbst C, Ternström S. A comparison of different methods to measure the EGG contact quotient. Logop Phoniatr Vocol. 2006; 31: 126-138. https://doi.org/10.1080/14015430500376580
• Ternström S, Johansson D, Selamtzis A. FonaDyn—a system for real-time analysis of the electroglottogram, over the voice range. SoftwareX. 2018; 7: 74-80. https://doi.org/10.1016/j.softx.2018.03.002
• Konno K. Measurement of the separate volume changes of rib cage and abdomen during breathing. J Appl Physiol. 1967; 22: 407-422. https://doi.org/10.1152/jappl.1967.22.3.407
• Selamtzis A. Electroglottographic Analysis of Phonatory Dynamics and States. Licentiate thesis. Stockholm, Sweden: KTH Royal Institute of Technology; 2014. ISBN 978-91-7595-189-8.
• Selamtzis A, Ternström S. Analysis of vibratory states in phonation using spectral features of the electroglottographic signal. J Acoust Soc Am. 2014; 136: 2773-2783. https://doi.org/10.1121/1.4896466
• Selamtzis A, Ternström S. Investigation of the relationship between electroglottogram waveform, fundamental frequency, and sound pressure level using clustering. J Voice. 2016. https://doi.org/10.1016/j.jvoice.2016.11.003
• McFee B. More like this. Doctoral thesis in Computer Science. University of California at San Diego; 2012: 151-152. (Available online at) (September 2018)
• Nilsson I. Electroglottography in Real-Time Feedback for Healthy Singing. M.Sc. degree thesis in computer science and communication. Stockholm, Sweden: KTH Royal Institute of Technology; 2016. (Available online at) (February 2018)
• Ternström S, Pabon P, Södersten M. The voice range profile: Its function, applications, pitfalls and potential. Acta Acust United Acust. 2016; 102: 268-283. https://doi.org/10.3813/AAA.918943
• Bunch M, Chapman J. Taxonomy of singers used as subjects in scientific research. J Voice. 2000; 14: 363-369. https://doi.org/10.1016/S0892-1997(00)80081-8
• Pabon P, Ternström S. Feature Maps of the Acoustic Spectrum of the Voice. J Voice, e-publication available online 27 September 2018. https://doi.org/10.1016/j.jvoice.2018.08.014 (open access)