Spectrogram

This is a spectrogram of a human voice, which breaks the sound down into its acoustic components. The x-axis represents time and the y-axis represents frequency (i.e., pitch): the lower (bass) parts of the sound are at the bottom, the higher parts at the top. The brighter a line, the louder that component is.

Press the spacebar to start or pause playback, and click on the image to (de)activate different parts. This allows you to hear how each part contributes to the overall sound.

More information

The horizontal lines are the harmonic partials. Each line represents a “pure” sine tone, and together they form a harmonic sound that is perceived as a single note. From a linguistic perspective, this is what forms the vowels (as well as some consonants, such as m and l). The frequency of the fundamental tone is usually around 100 Hz, which lies between G2 and G♯2. The first overtone is always twice the fundamental frequency (that is, an octave higher), so around 200 Hz; the second overtone is three times the fundamental, around 300 Hz (an octave and a fifth); the third is around 400 Hz (two octaves); and so on. Try listening to individual overtones selectively to hear the intervals between them.
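The arithmetic behind these intervals can be sketched in a few lines of Python. This is an illustration only, not the code behind this page: the function names are made up for the example, and it assumes equal temperament with A4 tuned to 440 Hz.

```python
import math

NOTE_NAMES = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]


def harmonic_partials(fundamental_hz, count):
    """Frequencies of the first `count` partials (fundamental included)."""
    return [fundamental_hz * n for n in range(1, count + 1)]


def semitones_above(freq_hz, reference_hz):
    """Interval from reference_hz up to freq_hz, in equal-tempered semitones."""
    return 12 * math.log2(freq_hz / reference_hz)


def nearest_note(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note name and the deviation from it in cents.

    Assumes A4 = 440 Hz unless another reference pitch is given.
    """
    midi = 69 + semitones_above(freq_hz, a4_hz)  # MIDI note 69 is A4
    nearest = round(midi)
    name = f"{NOTE_NAMES[nearest % 12]}{nearest // 12 - 1}"
    return name, 100 * (midi - nearest)


if __name__ == "__main__":
    fundamental = 100.0  # Hz, the approximate value used in the text
    for n, freq in enumerate(harmonic_partials(fundamental, 4), start=1):
        interval = semitones_above(freq, fundamental)
        name, cents = nearest_note(freq)
        print(f"partial {n}: {freq:6.1f} Hz, "
              f"{interval:5.2f} semitones above the fundamental, "
              f"near {name} ({cents:+.0f} cents)")
```

An octave is 12 semitones and a fifth is 7, so the partial at three times the fundamental lands roughly 19 semitones up, which is the "octave and a fifth" mentioned above. With A4 at 440 Hz, 100 Hz falls about a third of a semitone above G2.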

What distinguishes one vowel from another is the emphasis on certain overtones: you can see in the image that overtones in specific frequency ranges are sometimes louder than the others. These emphasized ranges are known as formants. This is why vowels are more clearly audible when more overtones are activated.
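To make this concrete, here is a minimal additive-synthesis sketch in Python: it sums harmonic sine tones over the same 100 Hz fundamental, once with the lower overtones emphasized and once with the higher ones. The function names and amplitude patterns are invented for illustration and are not measured vowel data; setting an amplitude to 0.0 mimics deactivating that line in the image.

```python
import math


def synthesize(fundamental_hz, amplitudes, duration_s=0.5, sample_rate=8000):
    """Additive synthesis: sum harmonic sine partials.

    amplitudes[k] is the loudness of partial k + 1; 0.0 switches it off.
    """
    n_samples = int(duration_s * sample_rate)
    return [
        sum(
            amp * math.sin(2 * math.pi * fundamental_hz * (k + 1) * i / sample_rate)
            for k, amp in enumerate(amplitudes)
        )
        for i in range(n_samples)
    ]


def partial_strength(samples, freq_hz, sample_rate=8000):
    """Amplitude of one frequency in the signal (a single-bin Fourier sum).

    Accurate when the signal contains a whole number of cycles of freq_hz,
    which holds for the durations and frequencies used below.
    """
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)


# Hypothetical emphasis patterns (illustrative values, not real vowel data).
emphasis_low = [1.0, 0.8, 0.6, 0.1, 0.1, 0.1, 0.1, 0.1]
emphasis_high = [1.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.8, 0.6]

sound_a = synthesize(100.0, emphasis_low)   # same pitch ...
sound_b = synthesize(100.0, emphasis_high)  # ... different timbre
```

Both signals share the same fundamental and therefore the same perceived pitch; only the balance between the overtones differs. `partial_strength(sound_a, 200.0)` recovers the amplitude that was assigned to the first overtone, which is essentially what the brightness of a line in the spectrogram shows.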

The parts of the image that appear more blurred are the inharmonic components of the sound. These make up most of the consonants, including fricatives (like s, f, th, v) and plosives (like k, b, g, t). These components consist of noise rather than pure sine tones. While harmonic sounds are created by the vibration of the vocal cords in the throat, these inharmonic sounds are created by air flowing through the mouth.