Research Article|Articles in Press

An Investigation in the Measurable Differences between Pitch Perception in the Voice and Pitch Perception of External Sound Sources

Open Access. Published: January 27, 2023. DOI: https://doi.org/10.1016/j.jvoice.2022.11.026

      Abstract

      Objectives

      Pitch perception is an important part of accurate singing. Therefore, accurate singing requires the ability to accurately assess the pitch in one's own voice. There were two objectives of this study. The first was to investigate whether there is a measurable difference between the pitch one perceives in one's own voice and the pitch one perceives from an external sound source. The second was to measure the effects of occlusion on pitch accuracy over a melodic phrase.

      Study Design

      We recruited 16 participants for this study. The study, which was designed to investigate the perceptual difference, was split into two parts. The first was a one-to-one pitch matching test in which participants recreated a pitch by singing and by matching external pitches. The second was singing the familiar song ‘Happy Birthday’, which was used to measure pitch accuracy over a melodic phrase and the effects of occlusion on pitch accuracy while singing.

      Methods

      The one-to-one test involved singing back a series of 5 notes to a set vowel; the same 5 notes were then matched against a series of possible pitches in the button test. The melodic test was to sing ‘Happy Birthday’ 3 times: the first normally, the second wearing headphones to occlude the ear and reduce air-conducted hearing, and the third with white noise to mask all hearing.

      Results

      The results showed higher pitch-matching accuracy with external sounds than with the voice, and the version sung with some form of occlusion (headphones, or headphones with white noise) showed higher pitch accuracy.

      Conclusions

      The results of this study showed that pitch accuracy was higher when comparing two external sounds, and that, when singing, occlusion of some form improved pitch accuracy. This could suggest a difference between recreating pitch with the voice and matching external sound sources. Furthermore, the improvements shown from occluding the ears could further suggest a difference in pitch perception abilities between the voice and external sound sources. This could have implications for improving pitch accuracy in a studio environment.

      INTRODUCTION

      The physical act of singing requires a source of energy, which is airflow; a sound source (or oscillator), which is the vocal folds; and sound modifiers, which are the vocal and nasal tracts.
      • Fant G.
      ,
      • Titze IR
      • Martin DW.
      Principles of voice production.
      Being ‘in tune’ is a basic requirement of ‘good’ singing, and accurate perception of pitch by singer and listener is necessary to underpin and monitor excellent tuning when singing.
      Because pitch is a psychological attribute of voice as well as a physical one, vocal pitch can be characterized through both subjective perceptual evaluation and objective acoustic assessment.
      • van der Woerd B.
      Voice Analysis with Iphones: A Low Cost Experimental Solution.
      Successful singing requires accurate voice perceptual skills, and to be able to sing accurately a singer needs to perceive their vocal pitch accurately.
      • Howard DM
      • Angus JAS.
      A comparison between singing pitching strategies of 8 to 11 year olds and trained adult singers.
      Loui et al.
      • Loui P
      • Demorest SM
      • Pfordresher PQ
      • et al.
      Neurological and developmental approaches to poor pitch perception and production.
      suggest that there are amusical individuals whose pitch production abilities do not necessarily reflect their pitch perception abilities, and vice versa, which could mean that there can be a disconnect between being able to hear what is in tune and being able to sing in tune.
      The hearing system is an important element in perceiving and recreating pitch accurately. The fact that the larynx is internally linked via bone conduction to the ear canal means that ‘internal head transmission paths give colouration to the sound of one's voice that is quite different from the colouration that arises from external airborne paths around the head and within room and concert hall enclosures’.
      v. Békésy G
      The structure of the middle ear and the hearing of one's own voice by bone conduction.
      • Békésy G.
      Experiments in Hearing.
      • von Békésy G
      • Peake WT.
      Experiments in hearing.
      People commonly perceive themselves as sounding different from how they expect when listening to a recording of their voice; this is because bone and muscle conduction filter out the higher frequencies of the voice. When speaking or singing, the sound is quite directional and is radiated from the nose and lips. At a distance of one meter, at frequencies from C4 (261.6 Hz) to C5 (523.3 Hz), ‘the intensity of the sound at the ears is less by half than that directly in front of the mouth’.

      Fourcin A. Hearing and Singing. Chapman J, Howard DM, eds. Fourth edition. ed. San Diego, CA: Plural Publishing Inc; 2021

      At these frequencies, bone conduction will be more important when hearing one's own voice because the sound travelling to the ear will be at a reduced level compared to the sound travelling from the nose and/or mouth. For higher frequency notes between ‘C7 (2093Hz) and C8, (4186Hz) the region of the singer's formant, the sound at a meter in front of the mouth is twice as loud as that at one meter from either ear’.

      Fourcin A. Hearing and Singing. Chapman J, Howard DM, eds. Fourth edition. ed. San Diego, CA: Plural Publishing Inc; 2021

      The tuning system used in reference to the musical notes is based on A440 in equal temperament. This would suggest that the sound received via air conduction (AC) does not give the singer an accurate or full representation of the sound they are emitting, due to a loss in sound intensity of their own voice.
      Howard and Angus
      • Howard DM
      • Angus JAS.
      A comparison between singing pitching strategies of 8 to 11 year olds and trained adult singers.
      conducted a listening test in which they found that trained singers could not discriminate better than 20 cents. This could be one definition of what is classed as in tune if a change in pitch cannot be detected. Small changes in pitch less than 20 cents could be described as timbral
      • Howard DM
      • Angus JAS.
      A comparison between singing pitching strategies of 8 to 11 year olds and trained adult singers.
      otherwise known as tone colour or tone quality. However, Sundberg
      • Sundberg J.
      The Science of the Singing Voice.
      argues that ‘[i]nvestigations have revealed that […] [p]honation frequency errors exceeding 3 cents are detected by the most skilled musical ears, and musically trained listeners hear errors of 5 cents corresponding to about 0.3% of the frequency’.
      • Sundberg J.
      The Science of the Singing Voice.
      A variance of 100 cents from a target note would be a semitone. In her study, Cobes uses the term ‘uncertain singer’
      • Cobes CJ.
      The conditioning of a pitch response using uncertain singers.
      for those of her subjects, ‘who sang three uncertain pitches out of four’. The criterion for an ‘uncertain’ pitch was a sound, ‘…at least a half-step [50 cents] sharp or flat’.
      • Cobes CJ.
      The conditioning of a pitch response using uncertain singers.
      Cents are important when assessing fo because the cent is a logarithmic measure of the ratio between two frequencies, whereas the spacing of musical notes measured in Hz grows exponentially with pitch: the intervals between notes in Hz grow larger the higher the notes. This would make a comparison between a low male voice and a high female voice misleading when comparing pitch accuracy. Measured in cents, 100 cents is equal to a semitone in western classical music, and the unit of measurement remains the same regardless of the height of the notes.
      • Howard DM
      • Angus JAS.
      Acoustics and Psychoacoustics.
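      To make the contrast between Hz and cents concrete, the following minimal Python sketch compares the width of one equal-tempered semitone at two octaves. It assumes A440 equal temperament, as stated earlier in this section; the function names are illustrative and are not taken from the study.

```python
import math

A4_HZ = 440.0  # reference pitch, A440 equal temperament (as stated in the text)


def semitone_up(freq_hz: float) -> float:
    """Frequency one equal-tempered semitone above freq_hz."""
    return freq_hz * 2 ** (1 / 12)


def cents_between(f1_hz: float, f2_hz: float) -> float:
    """Signed distance from f2_hz up to f1_hz in cents (1200 cents per octave)."""
    return 1200 * math.log2(f1_hz / f2_hz)


# Compare the same musical interval (one semitone) two octaves apart.
for label, base in [("A2", A4_HZ / 4), ("A4", A4_HZ)]:
    upper = semitone_up(base)
    print(f"{label}: {upper - base:6.2f} Hz wide, "
          f"{cents_between(upper, base):.1f} cents wide")
```

      The semitone above A2 is roughly 6.5 Hz wide and the semitone above A4 roughly 26.2 Hz wide, yet both measure 100 cents, which is why cents allow low and high voices to be compared on the same scale.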
      Loui et al
      • Loui P
      • Demorest SM
      • Pfordresher PQ
      • et al.
      Neurological and developmental approaches to poor pitch perception and production.
      evaluated the relationship between vocal pitch-matching skills and pitch perception skills in untrained accurate and inaccurate singers; it is one of a small number of studies similar to the current one that compare pitch perception accuracy with pitch-matching accuracy.
      In a study conducted by Watts et al. it was found that pitch discrimination accuracy was the cause of more than 40% of error in a pitch matching task.
      • Watts C
      • Murphy J
      • Barnes-Burroughs K.
      Pitch matching accuracy of trained singers, untrained subjects with talented singing voices, and untrained subjects with nontalented singing voices in conditions of varying feedback.
      These results highlighted the importance of the relationship between pitch discrimination, or perception and pitch-matching. To accurately match a target pitch, the singer must simultaneously coordinate their auditory perception and their motor abilities of using their vocal mechanism to reproduce the perceived pitch.
      • Watts C
      • Murphy J
      • Barnes-Burroughs K.
      Pitch matching accuracy of trained singers, untrained subjects with talented singing voices, and untrained subjects with nontalented singing voices in conditions of varying feedback.
      The ‘sensation of the voice is a result of the transmission of frequency through the bone and muscle of the head. This transmission via bone and muscle conductions filters the higher frequencies creating a bias toward the lower frequencies.

      Fourcin A. Hearing and Singing. Chapman J, Howard DM, eds. Fourth edition. ed. San Diego, CA: Plural Publishing Inc; 2021

      This ‘darker’ sound bias in an individual's voice suggests that there is an existing perceptual difference in how we perceive our voices and how we perceive external or stimulus pitch.
      Von Békésy reported that these internal paths of acoustical energy transmission were responsible for about half of the self-perceived loudness of the speaking and, by implication, the singing voice.
      v. Békésy G
      Vibration of the head in a sound field and its role in hearing by bone conduction.
      This suggestion indicates a balance to how we hear our voices when we sing and subsequently, pitch. However, ‘[p]oor pitch singers may not necessarily be deficient in pitch discrimination, but rather have inadequate control over the voice mechanism in reproducing vocally a given pitch’.
      • Welch GF.
      This paper investigates measurable differences between how singers perceive the pitch of their own voice and how they perceive the pitch of an external sound source. The ability to assess the accuracy of one's own singing is essential to good singing. Pitch perception accuracy is a basic requirement in singing and, for the purposes of this work, pitch is defined as a particular quality of a sound (e.g., an individual musical note) that fixes its position in the scale.

      Haynes B, Cooke P. Pitch. Www.OxfordMusicOnline.com Web site. https://www.oxfordmusiconline.com/grovemusic/view/10.1093/gmo/9781561592630.001.0001/omo-9781561592630-e-0000040883?rskey=JkVelZ. Updated 2001. Accessed 27/01/, 2022

      Pitch perception can also be described as the ability to detect a change of pitch when different notes are played or sung at the same volume. This is generally described as raising or lowering pitch, which gives the listener the perception of higher and lower pitches.
      • Howard DM
      • Hunter EJ.
      Perceptual Features in Singing.
      The practice of ear covering is commonplace in choirs and vocal studios to help a singer ‘tune in’ to their own sound more. This study also measures and assesses the effects that ear canal occlusion (by wearing headphones or ear defenders) can have on pitching accuracy when singing the melody of the familiar tune ‘Happy Birthday’. The literature has stated that good singing requires accurate perceptual skills, and that when these are not present the result can be singing out of tune whilst being unaware of it. These observations have also been noted in the author's experience as a voice teacher and singer. The purpose of this investigation is to provide evidence to enhance overall understanding of pitch perception in singing and to provide strategies to improve vocal pitching accuracy in practice.
      Two themes of this literature to consider in relation to this study are that perceptual skills are important in the process of pitch matching, and that vocal coordination can be a factor in poor pitch singing separate from perceptual skills.

      METHODS

      Pitch discrimination accuracy
      • Watts C
      • Murphy J
      • Barnes-Burroughs K.
      Pitch matching accuracy of trained singers, untrained subjects with talented singing voices, and untrained subjects with nontalented singing voices in conditions of varying feedback.
      is a basic skill, and two tests were devised: the first to assess pitch discrimination accuracy, and the second to see how occluding a singer's ear canal to reduce air conduction (AC) affects that singer's pitch perception and overall tuning accuracy. In these two tests, an occlusion of the ear canal was created to reduce air conduction, enabling any resulting change in the accuracy of the fo (fundamental frequency) of the singer's sung notes to be measured and the occluded version to be compared with their normal singing.

      Procedures for data collection

      As part of the study, participants completed a questionnaire to give information on age, gender, any knowledge of hearing loss or impairment, and the headphones used in the study, and to rank their sung results. Screening for hearing loss or impairment in the questionnaire was to exclude outliers. No participants in this study declared any hearing loss or impairment. All singers in this study were choral singers.
      At the time of the study, Covid-19 restrictions meant that conducting the study in person was not possible. The data collection process was therefore arranged to be carried out remotely, based on clear instructions given to the participants to minimise the variation between data sets recorded on different devices and microphones in different environments. A study which compared the use of a smartphone microphone against a specialised microphone found that ‘Calculation of fo […] [was] not significantly different when comparing the choice of microphone in this study’.
      • van der Woerd B.
      Voice Analysis with Iphones: A Low Cost Experimental Solution.
      However, as each recording generated results relative to itself and was not directly compared to another recording, it was decided that the variable factors would not be detrimental to the results. The study was conducted via Zoom

      Eric Yuan. Zoom. 2011

      and prior to the study taking place, a series of checks was done to test the quality of the microphone and the stability of the internet connection. Set-up checks were done in the Zoom settings to enable original sound for musicians and high fidelity music mode, and to ensure that background noise suppression was set to low, for optimal recording settings. This was done on an individual basis, based on the participant's equipment, by singing something short that they already knew or singing a single note loudly or at a high pitch. This was to check that the individual microphone did not clip any of the sound and disrupt the recording. In most cases there was only one microphone option, but where there was an additional option it was made sure that the same one was used throughout.
      All the sound files were converted to WAV from MP3 and M4A that were taken from a Zoom

      Eric Yuan. Zoom. 2011

      recording, mobile phone, tablet or computer, using Audacity, before being analysed for fo using Praat.

      Boersma P. Praat. 1991;6.1.49

      The frequency analysis range was set in Praat

      Boersma P. Praat. 1991;6.1.49

      to include fo; clipped data or formants would therefore be excluded from this analysis range. As this study analyses only fo and not the spectral properties of sound, data samples were not affected by the variable factors of different recording spaces, different equipment and different original file types.
      Uloza et al.
      • Uloza V
      • Padervinskis E
      • Vegiene A
      • et al.
      Exploring the feasibility of smart phone microphone for measurement of acoustic voice parameters and voice pathology screening.
      found the use of smartphones to be reliable in clinical settings for measurements of acoustic voice parameters. On the basis of these findings, it was deemed feasible to accept recordings made on smartphones. Participants were asked to place their phone or recording device between 30 and 50 cm from the mouth and to maintain this distance for the duration of the recording. This distance is based on what Titze describes as a commonly used distance
      • Titze IR
      • Winholtz WS.
      Effect of microphone type and placement on voice perturbation measurements.
      (30 cm), with the additional margin allowed if proximity to the microphone was causing the sound to distort. Instructions for this were given in a participant information document prior to the recording.
      Issues that can arise when measuring pitch accuracy can relate to data collection and to the tests implemented to assess pitch accuracy. Some studies measure only single pitch matching to a piano timbre, whereas others have participants sing a song from memory.
      • Loui P
      • Demorest SM
      • Pfordresher PQ
      • et al.
      Neurological and developmental approaches to poor pitch perception and production.
      There are many different ways in which singing can be assessed. It was important for this study to have some sort of calibration between perception and recreation of pitch as well as a more realistic representation of singing. This study was divided into two tests to assess a range of skills needed for singing in tune. Test 1 was a one-to-one pitch matching test designed to measure pitch perception and recreation using the voice and an external sound generator. This is because ‘successful singing requires perceptual skills’.
      • Howard DM
      • Angus JAS.
      A comparison between singing pitching strategies of 8 to 11 year olds and trained adult singers.
      Five sampled piano notes were played and, after each note, the subject was asked to sing the note back on an ‘i’ vowel (see Table 1), with no overlapping. The ‘i’ vowel was chosen for its high tongue position and the benefits this has for singing. The external sound was created using a vocal synthesiser patch used in the Vocal Tract Organ
      • Howard DM.
      The vocal tract organ: a new musical instrument using 3-D printed vocal tracts.
      based on the software Pure Data.

      Puckette M. Pure Data. 1996;0.52-2

      This was used in the form of a button test (see Figure 1). Participants selected a pitch by clicking on numbered buttons, each of which played the test sound at a different pitch, and their task was to select the button which played the note that they believed was closest to the target pitch. The subjects were allowed to repeat any stage of the button selection and compare it back to the target note until they were satisfied with their answer. However, they were not able to overlap the notes to make the comparison.
      TABLE 1. The 24 Consonants and 20 Vowel Sounds in English With Their SAMPA Symbols (Sym.), an Example Word (Word) and Its SAMPA Transcription (Trans.). The Voice, Place and Manner Descriptions are Listed for the Consonants.
      • Wells JC.
      Computer-coded Phonemic notation of individual languages of the European community.
      • Plural Publishing I
      • Murphy D
      Voice Science, Acoustics, and Recording.
      English Consonants (first six columns) and English Vowels (last three columns)
      Sym. | Word | Trans. | Voice | Place | Manner | Sym. | Word | Trans.
      p | Pun | /[email protected]/ | V- | Bilabial | Plosive | i | Neap | /nip/
      b | Buoy | /bOI/ | V+ | Bilabial | Plosive | I | Jib | /dZIb/
      t | Tide | /taId/ | V- | Alveolar | Plosive | E | Red | /rEd/
      d | Deck | /dEk/ | V+ | Alveolar | Plosive | { | Anchor | /{[email protected]/
      k | Cabin | /k{bIn/ | V- | Velar | Plosive | A | Hard | /hAd/
      g | Galley | /g{li/ | V+ | Velar | Plosive | Q | Locker | /[email protected]/
      T | Thwart | /TwOt/ | V- | Dental | Fricative | O | Port | /pOt/
      D | Weather | /[email protected]/ | V+ | Dental | Fricative | U | Foot | /fUt/
      f | Fog | /fQg/ | V- | Labio-dental | Fricative | u | Food | /fud/
      v | Vang | /v{N/ | V+ | Labio-dental | Fricative | V | Rudder | /[email protected]/
      s | Sea | /si/ | V- | Alveolar | Fricative | 3 | Stern | /sT3n/
      z | Zenith | /zEnIT/ | V+ | Alveolar | Fricative | @ | Tiller | /[email protected]/
      S | Ship | /Sip/ | V- | Palato-alveolar | Fricative | eI | Weigh | /weI/
      Z | Treasure | /[email protected]/ | V+ | Palato-alveolar | Fricative | aI | Light | /laIt/
      h | Heeling | /hilIN/ | V- | Glottal | Fricative | OI | Oilskin | /OIlskIn/
      tS | Chain | /tSeIn/ | V- | Palato-alveolar | Affricate | @U | Row | /[email protected]/
      dZ | Jibe | /dZaIb/ | V+ | Palato-alveolar | Affricate | aU | Bow | /baU/
      m | Mast | /mAst/ | V+ | Bilabial | Nasal | [email protected] | Pier | /[email protected]/
      n | Main | /meIn/ | V+ | Alveolar | Nasal | [email protected] | Fare | /[email protected]/
      N | Rigging | /rIgIN/ | V+ | Velar | Nasal | [email protected] | Fuel | /[email protected]/
      w | Winch | /wIntS/ | V+ | Bilabial | Semi-vowel | | |
      r | Rain | /rein/ | V+ | Alveolar | Semi-vowel | | |
      l | Lee | /li/ | V+ | Alveolar | Semi-vowel | | |
      j | Yacht | /jQt/ | V+ | Palatal | Semi-vowel | | |
      FIGURE 1. Button test screenshot from Pure Data

      Puckette M. Pure Data. 1996;0.52-2

      used for test 1.
      The white buttons running horizontally numbered 1-11 are the variations of the target note set 10 cents apart ascending in pitch from left to right.
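      The button layout described above can be sketched in Python as follows. The source states only that the 11 buttons are 10 cents apart and ascend from left to right; the assumption that the middle button (button 6) plays the target pitch exactly, and the 440 Hz example target, are illustrative and are not taken from the study.

```python
def button_frequencies(target_hz: float, n_buttons: int = 11,
                       step_cents: float = 10.0) -> dict[int, float]:
    """Map button number (1..n_buttons) to frequency in Hz.

    Assumption: the middle button plays the target pitch, so the
    buttons span symmetric offsets around it (hypothetical layout).
    """
    centre = (n_buttons + 1) / 2
    return {
        button: target_hz * 2 ** ((button - centre) * step_cents / 1200)
        for button in range(1, n_buttons + 1)
    }


# Example with a hypothetical target of A4 = 440 Hz.
for button, freq in button_frequencies(440.0).items():
    offset = (button - 6) * 10
    print(f"button {button:2d}: {offset:+3d} cents -> {freq:7.2f} Hz")
```

      Under this assumed layout the buttons span -50 to +50 cents around the target, so the full row covers 100 cents, or one semitone.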
      The sung notes are analysed using Praat

      Boersma P. Praat. 1991;6.1.49

      for their fo (fundamental frequency) which is condensed into an average for the duration of the note and then plotted into a chart.
      Test two is designed to measure the effect of occlusion of the ear on tuning while singing, and whether occlusion changes an individual's perception of the pitch of their own voice and their overall tuning accuracy whilst singing a familiar song. The addition of occlusion has the potential to attenuate the air-conducted component of own-voice perception, which links back to the research question: ‘Is there a measurable psychoacoustic difference in how we perceive pitch via air conduction and via bone and muscle conduction?’ This test assesses pitch matching in a ‘real world’ scenario in which participants sing the familiar song ‘Happy Birthday’. The findings will show whether there is a measurable difference in how pitch is perceived via bone and muscle conduction and via air conduction.
      The participants were asked to sing the song ‘Happy Birthday’ three times starting on a given starting note appropriate for their voice fach or vocal range.
      The first time, subjects sang ‘Happy Birthday’ normally with no occlusion. The second time, they sang it with a pair of headphones in or on the ears (in-ear bud style headphones, or over-ear or on-ear headphones) to occlude air-borne sound. The third time, they sang it wearing headphones connected to the device playing the instructional track, with white noise playing through the headphones to disrupt anything they might still be hearing via air conduction. The same pair of headphones was used for both headphone versions of ‘Happy Birthday’. The volume of the white noise was set to a comfortable level on an individual basis, agreed with the participant.
      Each note from the song is analysed using the Praat

      Boersma P. Praat. 1991;6.1.49

      instruction ‘pitch read’, which generates a measurement of fo every 100th of a second for the duration of the note, giving each time point on the recording and its corresponding fo.
      As per test 1, an fo average is taken for each sung note in cents deviation from the starting note. An overall average is calculated and plotted (see Figure 2). Each of the 25 notes of the song ‘Happy Birthday’ is plotted on the X-axis of Figure 2, where the accuracy of each note is quantified in cents as the variation from the equal-tempered version of the target note. A variation of 100 cents would be a semitone and an octave variation would be 1200 cents; a variation of 1150 cents as the sung response to an octave jump would indicate that the singer was 50 cents (or a quarter tone) flat in pitch.
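      As a minimal sketch of the per-note calculation described above, the following Python code averages fo readings taken every 100th of a second over one note and expresses the result in cents relative to the starting note. The frame values are invented for illustration, and skipping unvoiced frames (represented here as None) is an assumption about how gaps in the pitch track would be handled, not a detail given in the study.

```python
import math
from statistics import mean
from typing import Optional, Sequence


def mean_fo_hz(frames: Sequence[Optional[float]]) -> float:
    """Average fo in Hz over one note, ignoring unvoiced frames (None)."""
    voiced = [f for f in frames if f is not None]
    if not voiced:
        raise ValueError("note contains no voiced frames")
    return mean(voiced)


def cents_from_reference(fo_hz: float, reference_hz: float) -> float:
    """Distance of fo_hz from reference_hz in cents (positive = sharp)."""
    return 1200 * math.log2(fo_hz / reference_hz)


# Hypothetical fo readings (Hz), one per 10 ms, for a note sung an octave
# above a 220 Hz starting note; None marks a frame with no detected pitch.
note_frames = [None, 426.1, 427.5, 428.0, 428.3, 427.9, None]
start_note_hz = 220.0

sung = cents_from_reference(mean_fo_hz(note_frames), start_note_hz)
print(f"sung interval: {sung:.0f} cents, error vs octave: {sung - 1200:+.0f} cents")
```

      With these invented values the sung octave comes out at about 1150 cents, which corresponds to the quarter-tone-flat case described in the text.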
      FIGURE 2. Plot of 3 versions of ‘Happy Birthday’ showing the intervals in cents across the 25-note series, shown in the SAMPA phonetic alphabet (see Table 1).

      Calculation of results

      The difference in cents (‘The cent is defined as one-hundredth of an equal-tempered semitone, which is equivalent to one twelve-hundredth of an octave since there are 12 semitones to the octave’) is given in Equation 1
      • Howard DM
      • Angus JAS.
      Acoustics and Psychoacoustics.
      where F1 is the variable data point and F2 is the fixed-point average:
      $$C = 3986.3137 \times \log_{10}\left(\frac{F_1}{F_2}\right) \tag{1}$$
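      The constant in Equation 1 follows directly from the quoted definition of the cent (1200 cents per octave). The short derivation below is a sketch added for clarity, using the same symbols F1 and F2; the two check cases are the octave and semitone values mentioned in the text.

```latex
\begin{align*}
C &= 1200 \,\log_{2}\!\left(\frac{F_1}{F_2}\right)
   = \frac{1200}{\log_{10} 2}\,\log_{10}\!\left(\frac{F_1}{F_2}\right)
   \approx 3986.3137 \,\log_{10}\!\left(\frac{F_1}{F_2}\right) \\[4pt]
\frac{F_1}{F_2} &= 2 \;\Rightarrow\; C = 1200 \text{ cents (one octave)} \\
\frac{F_1}{F_2} &= 2^{1/12} \;\Rightarrow\; C = 100 \text{ cents (one semitone)}
\end{align*}
```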


      The results from the vocal attempt and the button test are calculated to show the error

      Boersma P. Praat. 1991;6.1.49

      from the target note in cents. An average fo in Hz is then taken over the duration of the note, with an fo readout taken every 100th of a second; the average error for the note is then plotted on a chart (Figure 3).
      FIGURE 3. Plotted results of test 1 showing both vocal and button test attempts across the 5 notes for a single subject. This shows the error in cents from 0 on the X axis, which is adjusted to represent each different pitch 1-5. The adjustment is made so that 0 on the X axis represents no change in pitch for each note.
      FIGURE 4. Results of chart 2 equalised to show only the error from the target pitch. The linear trend line shows pitch drift.
      The averages for all five notes in the voice and the button test of test one, and all 25 unique notes of ‘Happy Birthday’ for all three versions are plotted into charts to generate visual empirical data to demonstrate the comparative accuracy in test one and the overall melodic accuracy of test two.
      For test 1, Equation 1 is used to calculate the distance in cents between the sung note and the target note and likewise with the button test.
      The pitch drift is measured using the first sung note as the reference point rather than a target note. This shows each singer's ability to stay in tune relative to themselves. It is measured by interval accuracy (as opposed to staying in tune with the exact Hz of the song ‘Happy Birthday’). Interval accuracy is quantified by calculating the frequency ratio between successive notes in cents (e.g., ‘Ha-’ to ‘-ppy’ of Happy Birthday) using Equation 1, repeated for each frequency ratio across all the notes in ‘Happy Birthday’. This enables inter-note musical intervals to be plotted as shown in Figure 3. When there is an octave leap, the expectation is to see an interval of 1200 cents, or 12 semitones. The error is obtained by subtracting the number of cents of the musical interval from the average of the sung data.
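      The interval-accuracy calculation described above can be sketched in Python as follows. The note frequencies and expected intervals are hypothetical (a three-note fragment containing an octave leap, not data from the study), and the function names are illustrative.

```python
import math


def interval_cents(from_hz: float, to_hz: float) -> float:
    """Sung interval between two consecutive notes, in cents."""
    return 1200 * math.log2(to_hz / from_hz)


def interval_errors(note_means_hz: list[float],
                    expected_semitones: list[int]) -> list[float]:
    """Sung interval minus expected interval for each consecutive note pair.

    expected_semitones[i] is the notated interval from note i to note i + 1.
    """
    if len(expected_semitones) != len(note_means_hz) - 1:
        raise ValueError("need one expected interval per consecutive pair")
    return [
        interval_cents(a, b) - 100 * semis
        for a, b, semis in zip(note_means_hz, note_means_hz[1:], expected_semitones)
    ]


# Hypothetical per-note average fo values (Hz): a repeated note, then an octave leap.
sung_means = [220.0, 221.0, 436.0]
expected = [0, 12]

errors = interval_errors(sung_means, expected)
drift = sum(errors)  # cumulative deviation relative to the first sung note
print([round(e, 1) for e in errors], round(drift, 1))
```

      Summing the interval errors gives the position of the last note relative to where it should sit above the first sung note, which is one way to express drift without reference to an absolute target frequency.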
      The results of the two tests are expressed in cents to demonstrate a uniform comparison of pitch accuracy across different voice types.

      RESULTS

      The results of Table 2 show an overall average error from the target note in cents for the first test showing the vocal pitch recreation on the left and the button test pitch matching on the right. The averages show an overall average for all participants as well as a breakdown of the averages across the different gender groups and age groups which took part in the study.
      TABLE 2. Results from Test 1 (average error from the target note, in cents).
      Group | Test 1 Results Voice | Test 1 Results Button Test
      Overall average | -8.23 | -0.32
      Male average | -7.48 | 4.83
      Female average | -8.99 | -4.19
      21-30 average | -10.49 | 2.29
      31-40 average | -7.40 | -0.84
      41-50 average | 6.72 | 12.24
      51-60 average | -11.01 | -4.17
      61+ average | -12.68 | 1.76
      The ratio of male to female participants was 50:50 which is shown in Table 3. Table 4 shows the average error in cents for the 3 different attempts of ‘Happy Birthday’ in test 2. The average error, or pitch drift (these are normally small deviations in pitch which cause the singer to drift off key and away from the target notes), is shown by +/- cents broken down into gender and age group as well as the overall trend for all participants.
      TABLE 3. Demographic Breakdown of Participants Taken from a Study Group which Included 8 Men and 8 Women. The Age Groups are Shown in Table 2.
      Age | 21-30 | 31-40 | 41-50 | 51-60 | 61+
      Participants | 6 | 3 | 2 | 2 | 3
      Gender | F total | M total
      Participants | 8 | 8
      TABLE 4. Results from Test 2 (average error, in cents).
      Group | Normal | Occlusion | White Noise
      Overall average | 23.62 | 39.46 | 10.94
      Male average | 21.00 | 35.40 | 12.21
      Female average | 26.23 | 42.78 | 9.66
      21-30 average | 17.34 | 26.65 | 19.17
      31-40 average | 10.43 | 19.25 | 8.28
      41-50 average | 15.41 | 55.31 | 2.36
      51-60 average | 68.74 | 100.84 | -2.08
      61+ average | 24.75 | 33.79 | 11.53
      The overall average error from the target pitch for aural pitch matching using the button test (shown in Table 2) was -0.32 cents, which as a total average suggests good pitch matching capabilities. An example of this can be seen in Figure 4. The individual averages ranged from -18.84 cents (F4) to +27.61 cents (M1). Vocal pitch matching, where the participant sang back the 5 pitches, was on average -8.2 cents from the target, which is less accurate than aural matching of the same 5 pitches with the button test (-0.32 cents on average). The individual averages for this test ranged from -35.56 cents (M6) to +10.65 cents (M4) (see Table 5). Male subjects were, on average, -7.5 cents in the vocal pitch matching and -4.8 cents for the button test, and female subjects were -8.9 cents on average when matching pitch with their voices and -4.2 cents for the button test. The most accurate age group for vocal pitch matching was the 41-50 group, with +6.7 cents on average. The most accurate group for the button test was the 31-40 group, with -0.84 cents, which is a smaller margin from the target pitch than the best vocal pitch matching. Given that it was found that trained singers could not discriminate better than 20 cents
      • Howard DM
      • Angus JAS.
      A comparison between singing pitching strategies of 8 to 11 year olds and trained adult singers.
      the overall trend of pitch drift would still be classed as in tune based on previous definitions. It should be noted that pitch accuracy is based on equal temperament, which is the same tuning system a piano would use. This has been used because singers often sing in reference to an instrument that gives them their pitch, or are accompanied by one or more instruments, which are also in equal temperament.

      DISCUSSION

      As this study was conducted remotely, using a range of microphones and devices for the recordings, a follow-up study was conducted to investigate the effect the variable factors had on fo. Three different recordings of the same source audio were taken simultaneously using a mobile phone, a PC using a Rode NT55 microphone, and an iPad recorded over Zoom.

      Eric Yuan. Zoom. 2011

      This equipment test recorded a piano playing 5 notes as in test 1.
      Each note in Table 6 was sampled over 10 cycles of the note's sound wave, measured from peak to peak, so that every result is based on the same number of cycles.
      The results in Table 7 show that there is less than a cent of difference between the three recording devices in this test. The cents calculation is the same as the one given in the calculation of results section. This means that the recording devices differ by less than 1/100 of a semitone, which is not a significant difference and would not have affected the results of this study.
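      The device comparison can be sketched in Python as follows. The sketch assumes the standard definition of the cent, 1200 × log2(f_a / f_b), and takes the Table 6 frequencies as printed. Because those values are rounded to two decimals, the recomputed differences need not agree with Table 7 digit for digit. The function and variable names are illustrative.

```python
import math
from itertools import combinations

# Table 6 frequencies in Hz, as printed (rounded to two decimals).
TABLE_6_HZ = {
    "Phone": [165.38, 220.43, 278.27, 330.90, 370.68],
    "PC":    [165.32, 220.40, 278.35, 330.77, 370.61],
    "iPad":  [165.39, 220.40, 278.23, 330.79, 370.51],
}


def cents_between(f_a, f_b):
    """Signed interval from f_b to f_a in cents (positive if f_a is higher)."""
    return 1200.0 * math.log2(f_a / f_b)


def max_abs_difference_cents(table):
    """Largest absolute difference in cents over all device pairs and pitches."""
    worst = 0.0
    for dev_a, dev_b in combinations(table, 2):
        for f_a, f_b in zip(table[dev_a], table[dev_b]):
            worst = max(worst, abs(cents_between(f_a, f_b)))
    return worst


worst_case = max_abs_difference_cents(TABLE_6_HZ)
print(f"largest device-to-device difference: {worst_case:.2f} cents")
```

      Taking the maximum over every device pair and pitch gives a single worst-case figure to compare against the one cent bound stated above.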
      It could be argued that these findings show a marginal difference between vocal pitch matching and matching using aural perception, with aural matching the more accurate. This could be due to the physical ability to match pitch and to vocal coordination rather than an inability to assess the pitch of one's own voice accurately. These findings support those of Loui et al. (2015), where perceptual abilities were reflected by pitch production. They also support the results of Watts et al. (2003), where more than 40% of pitch matching accuracy was due to pitch discrimination error.
      The purpose of test 1 was to ascertain whether the participant could match pitch both aurally and vocally. The results show that there is a small difference, but that aural perception and recreation of pitch is more accurate than vocal pitch matching.
      The difference in pitch accuracy between aural and vocal attempts could be down to vocal coordination rather than misjudgement of pitch as they hear it in the voice.
      Test 2 measured pitch drift in a melodic setting by comparing 3 different attempts at singing the familiar tune ‘Happy Birthday’: the first normally, the second with occlusion (by wearing headphones) and the third with white noise (by wearing headphones with white noise playing through them while singing). The results in Table 4 show that the total averages for all 3 versions of ‘Happy Birthday’ drifted sharp. This could be a conscious drift in an attempt to stay in tune and not let the pitch go flat, as the stated context of the study was the assessment of pitch perception. However, the version that showed the highest accuracy, and the least overall average drift, was the white noise attempt.
      Table 5. Average Pitch Accuracy Across Both Tests for Each Individual Participant, with Age and Gender. All values are in cents; the Test 2 columns give the average pitch drift.
      Participant | Test 1 Voice | Test 1 Button Test | Test 2 Normal | Test 2 Occlusion | Test 2 White Noise | Gender | Age
      F1          | −7.74        | −2.84              | −19.06        | 12.77            | −11.83             | F      | 31-40
      F2          | −7.74        | −2.84              | 28.26         | 15.47            | 23.57              | F      | 31-40
      F3          | −9.52        | −7.31              | 1.89          | 31.10            | 23.71              | F      | 21-30
      F4          | −18.51       | −18.84             | 33.80         | 7.74             | −2.13              | F      | 61+
      M1          | −6.28        | 27.61              | 18.12         | 1.89             | 40.00              | M      | 61+
      M2          | 19.62        | 23.61              | 34.20         | 39.89            | −5.38              | M      | 41-50
      M3          | −23.09       | 3.61               | −21.54        | −3.25            | 32.70              | M      | 21-30
      F5          | −6.19        | 0.86               | −3.39         | 70.73            | 10.09              | F      | 41-50
      F6          | −6.71        | 3.16               | 22.08         | 29.51            | 13.09              | F      | 31-40
      F7          | −7.74        | −2.84              | 65.04         | 75.28            | 24.62              | F      | 21-30
      M4          | 10.65        | 1.61               | 5.19          | −59.51           | 10.07              | M      | 21-30
      M5          | 2.32         | −0.39              | 35.50         | 60.02            | 34.65              | M      | 21-30
      M6          | −35.56       | −8.39              | 17.95         | 56.26            | −10.73             | M      | 21-30
      M7          | −14.28       | −5.50              | 56.22         | 96.19            | −0.34              | M      | 51-60
      F8          | −7.74        | −2.84              | 81.25         | 105.49           | −3.81              | F      | 51-60
      M8          | −13.24       | −3.50              | 22.33         | 91.73            | −3.28              | M      | 61+
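      The per-condition averages for Test 2 can be recomputed from the individual rows of Table 5. The minimal Python sketch below takes a simple unweighted mean of the participant values as read from Table 5. This is an assumption about how the totals are formed, so the output may differ from Table 4 if those averages were calculated in another way. The names used are illustrative.

```python
from statistics import mean

CONDITIONS = ("normal", "occlusion", "white noise")

# Test 2 average pitch drift per participant in cents, as read from Table 5:
# (normal, occlusion, white noise)
TEST_2_DRIFT = {
    "F1": (-19.06, 12.77, -11.83),
    "F2": (28.26, 15.47, 23.57),
    "F3": (1.89, 31.10, 23.71),
    "F4": (33.80, 7.74, -2.13),
    "M1": (18.12, 1.89, 40.00),
    "M2": (34.20, 39.89, -5.38),
    "M3": (-21.54, -3.25, 32.70),
    "F5": (-3.39, 70.73, 10.09),
    "F6": (22.08, 29.51, 13.09),
    "F7": (65.04, 75.28, 24.62),
    "M4": (5.19, -59.51, 10.07),
    "M5": (35.50, 60.02, 34.65),
    "M6": (17.95, 56.26, -10.73),
    "M7": (56.22, 96.19, -0.34),
    "F8": (81.25, 105.49, -3.81),
    "M8": (22.33, 91.73, -3.28),
}


def condition_means(rows):
    """Unweighted mean drift in cents for each Test 2 condition."""
    columns = zip(*rows.values())  # regroup the per-participant tuples by condition
    return {name: mean(column) for name, column in zip(CONDITIONS, columns)}


means = condition_means(TEST_2_DRIFT)
for name, value in means.items():
    print(f"{name}: {value:+.2f} cents")
```

      A positive mean indicates sharp drift on average, and the condition with the smallest mean is the one with the least overall drift under this simple averaging.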
      The use of occlusion to change pitch drift is based on a studio method in which the singer's ears are masked in some form to stop pupils self-assessing their own singing, with the aim of improving pitch accuracy and quality of sound. Its positive effect on pitch drift in a melodic setting suggests that it could be used in studio and teaching settings to help with this issue.
      Table 6. Results from the audio recording equipment test, measured in Hz.
      Pitch Number | Phone  | PC     | iPad
      1            | 165.38 | 165.32 | 165.39
      2            | 220.43 | 220.40 | 220.40
      3            | 278.27 | 278.35 | 278.23
      4            | 330.90 | 330.77 | 330.79
      5            | 370.68 | 370.61 | 370.51
      Table 7. A Comparison of Table 6 Results in Cents.
      Pitch Number | PC to Phone | iPad to Phone | iPad to PC
      1            | 0.61        | 0.19          | 0.80
      2            | 0.23        | 0.27          | −0.04
      3            | −0.56       | 0.21          | −0.77
      4            | 0.66        | 0.56          | 0.09
      5            | 0.34        | 0.81          | −0.47
      There is no obvious trend regarding age or gender in the results. A larger study would be required to assess whether there is a trend based on demographics. Of the participants who chose to answer the question rating their attempts at ‘Happy Birthday’, only 1 said the white noise attempt was their best and 3 said the occlusion attempt was their best, whereas all other participants said the first attempt, with no occlusion or white noise, was the best, when in fact the data suggest that it was the second most accurate overall.

      CONCLUSION

      The results given in this paper show that, on average within the sample group, there is a small difference between pitch matching using one's voice and matching two external sounds, with the latter being the more accurate. A larger study group would give more confidence in the results from this test. In addition, the differences in the results could be due to vocal coordination and the ability to achieve the desired pitch.
      The second test shows that on average, pitch drift is reduced with the introduction of occlusion to the ear canal.

      Declaration of competing interest

      The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

      Acknowledgements

      The authors thank all the subjects for taking part in these experiments, especially given the restrictions during the COVID-19 pandemic.

      Appendix 1

      Appendix 1. This shows the calibration of the piano notes given in the case of an Alto voice demonstrating the target pitch error from exact fo in cents.

      Appendix 2

      Appendix 2. This is the questionnaire that the participants completed for the study (the questions regarding the study were completed once they had completed the tests).

      References

        Fant G. Acoustic Theory of Speech Production. Vol 2. Mouton, 1970.
        Titze IR, Martin DW. Principles of voice production. J Acoust Soc Am. 1998; 104: 1148. https://doi.org/10.1121/1.424266
        van der Woerd B. Voice Analysis with Iphones: A Low Cost Experimental Solution. ProQuest Dissertations Publishing, 2019.
        Howard DM, Angus JAS. A comparison between singing pitching strategies of 8 to 11 year olds and trained adult singers. Logopedics Phoniatrics Vocol. 1997; 22: 169-176. https://doi.org/10.3109/14015439709075331
        Loui P, Demorest SM, Pfordresher PQ, et al. Neurological and developmental approaches to poor pitch perception and production. Ann N Y Acad Sci. 2015; 1337: 263-271. https://doi.org/10.1111/nyas.12623
        v. Békésy G. The structure of the middle ear and the hearing of one's own voice by bone conduction. J Acoust Soc Am. 1949; 21: 217-232. https://doi.org/10.1121/1.1906501
        Békésy G. Experiments in Hearing. Acoustical Soc. of America, 1960.
        von Békésy G, Peake WT. Experiments in hearing. J Acoust Soc Am. 1990; 88: 2905. https://doi.org/10.1121/1.399656
        Fourcin A. Hearing and Singing. Chapman J, Howard DM, eds. Fourth edition. San Diego, CA: Plural Publishing Inc; 2021.
        Sundberg J. The Science of the Singing Voice. Northern Illinois Univ. Pr, 1987.
        Cobes CJ. The conditioning of a pitch response using uncertain singers. Bull Counc Res Music Educ. 1972; : 28-29.
        Howard DM, Angus JAS. Acoustics and Psychoacoustics. 5th ed. Routledge, 2017.
        Watts C, Murphy J, Barnes-Burroughs K. Pitch matching accuracy of trained singers, untrained subjects with talented singing voices, and untrained subjects with nontalented singing voices in conditions of varying feedback. J Voice. 2003; 17: 185. https://doi.org/10.1016/S0892-1997(03)00023-7
        v. Békésy G. Vibration of the head in a sound field and its role in hearing by bone conduction. J Acoust Soc Am. 1948; 20: 749-760. https://doi.org/10.1121/1.1906433
        Welch GF. Improvability of poor pitch singing experiments in feedback [dissertation]. Institute of Education, University of London, 1983.
        Haynes B, Cooke P. Pitch. Www.OxfordMusicOnline.com Web site. https://www.oxfordmusiconline.com/grovemusic/view/10.1093/gmo/9781561592630.001.0001/omo-9781561592630-e-0000040883?rskey=JkVelZ. Updated 2001. Accessed 27/01/, 2022.
        Howard DM, Hunter EJ. Perceptual Features in Singing. Oxford University Press, 2019.
        Eric Yuan. Zoom. 2011.
        Boersma P. Praat. 1991;6.1.49.
        Uloza V, Padervinskis E, Vegiene A, et al. Exploring the feasibility of smart phone microphone for measurement of acoustic voice parameters and voice pathology screening. Eur Arch Otorhinolaryngol. 2015; 272: 3391-3399. https://doi.org/10.1007/s00405-015-3708-4
        Titze IR, Winholtz WS. Effect of microphone type and placement on voice perturbation measurements. J Speech Hear Res. 1993; 36: 1177-1190. https://doi.org/10.1044/jshr.3606.1177
        Howard DM. The vocal tract organ: a new musical instrument using 3-D printed vocal tracts. J Voice. 2018; 32: 660-667. https://doi.org/10.1016/j.jvoice.2017.09.014
        Puckette M. Pure Data. 1996;0.52-2.
        Wells JC. Computer-coded Phonemic notation of individual languages of the European community. J Int Phonetic Assoc. 1989; 19: 31-54. https://doi.org/10.1017/S0025100300005892
        Plural Publishing I, Murphy D. Voice Science, Acoustics, and Recording. Plural Publishing, Incorporated, 2007.