The mechanics behind human voice production are unique and in many ways quantifiable. Understanding human speech and its perceived properties is an important factor in the development and engineering of communications equipment. The following are some salient points associated with the production of human speech.
Human speech results from air being forced from the lungs, through the vocal cords and along the vocal tract, which stretches from the opening between the vocal cords to the mouth and nose. Speech is made up of several different types of sound, including voiced, unvoiced and plosive sounds. Voiced sounds result from the vocal cords vibrating, which interrupts the flow of air from the lungs and produces sounds in a frequency range of roughly 50 to 500Hz. Unvoiced sounds result when air passes some impediment in the mouth or a constriction in the vocal tract. Finally, plosive sounds are sudden bursts of air, released for example when the vocal tract is closed and then suddenly opened, or when the mouth is suddenly opened. All of these sounds are influenced by the person’s sinuses and nasal cavities, and together they make up what we understand as normal human speech.
The resulting sound, and the mix of frequencies produced by these different sound sources, is what determines the unique sound of each person’s voice. This range of frequencies can vary dramatically from one person to another. However, there is enough commonality in the frequencies of human speech that it can be (and has been) used as the basis for designing telephony equipment for decades.
Typically, human speech contains frequencies from about 50Hz upwards. The majority of the energy is concentrated between 300Hz and 3kHz. The human ear, on the other hand, can detect sounds over a range of frequencies from around 20Hz to 20kHz, with most sensitivity in the region between about 300Hz and 10kHz. Taking these factors into account, along with functional testing, the frequency range of 300Hz to 3.4kHz has been found to be the most important for speech intelligibility and speech recognition.
Reducing this bandwidth (300Hz to 3.4kHz) can significantly reduce speech intelligibility. Increasing it, however, has been found not to significantly improve recognition or intelligibility. Note that increased bandwidth will improve overall sound quality, but the incremental gains in sound quality have to be weighed against the increased frequency usage.
The frequency band of 300Hz to 3.4kHz is therefore used in our everyday telephone system. In practice this bandwidth provides exceptionally understandable speech and has been the basis of our society’s telephony equipment for many decades.
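To make the effect of the 300Hz to 3.4kHz band a little more concrete, here is a minimal Python sketch of an idealized band-pass operation. It is only an illustration under stated assumptions, not how real telephony equipment is built: the 16kHz sample rate, the three test tones (100Hz, 1000Hz and 5000Hz) and the function names are all hypothetical choices of mine, and real filters roll off gradually rather than cutting frequencies off completely.

```python
import cmath
import math

SAMPLE_RATE = 16_000              # Hz; assumed for this sketch
NUM_SAMPLES = 160                 # 10 ms of signal, giving 100 Hz per DFT bin
LOW_HZ, HIGH_HZ = 300.0, 3400.0   # telephone voice band from the text


def tone(freq_hz):
    """Unit-amplitude sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(NUM_SAMPLES)]


def band_limit(samples, sample_rate, low_hz, high_hz):
    """Idealized band-pass: zero every DFT bin outside [low_hz, high_hz]."""
    n = len(samples)
    # Forward DFT (naive O(n^2) form, fine for a short illustration).
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 mirror the negative frequencies.
        freq = min(k, n - k) * sample_rate / n
        if not low_hz <= freq <= high_hz:
            spectrum[k] = 0
    # Inverse DFT; the imaginary part is only rounding noise for real input.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def mean_power(samples):
    """Average of the squared sample values."""
    return sum(v * v for v in samples) / len(samples)


# Three hypothetical components: below, inside and above the voice band.
mix = [a + b + c for a, b, c in zip(tone(100), tone(1000), tone(5000))]
filtered = band_limit(mix, SAMPLE_RATE, LOW_HZ, HIGH_HZ)

print(f"power before: {mean_power(mix):.3f}")
print(f"power after:  {mean_power(filtered):.3f}")
```

In this sketch only the 1000Hz tone survives, since the 100Hz and 5000Hz components fall outside the band. The three tones have equal power, so the filtered signal keeps one third of the original power.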
In a future article I will cover the topic of digitizing this (analog) voice into binary 1s and 0s, along with the modern-day packetization of these 1s and 0s, also known as VoIP.
Voice Fundamentals, Nortel Networks, 2001