Image: sound waves (© Coneyl Jay)
The same words spoken by the same person can have completely different meanings depending on pitch and where emphasis is placed, and now researchers are closer to understanding how the brain picks up on the difference.

Consider, for a moment, the comic genius of Mel Blanc, the renowned "man of a thousand voices" who played Bugs Bunny, Daffy Duck, Foghorn Leghorn, Porky Pig, and every other major male Warner Brothers cartoon character with the exception of Elmer Fudd.

Actually, late in his career, he even did Elmer Fudd.

Casual observers would never have guessed that each of those voices - many of them instantly recognizable to millions of people - came from the same actor.

But Blanc's ability to imbue each character with a distinct persona tapped into the human brain's predilection to place specific meaning on the combination of word choice, who is speaking, and how words are expressed.

Now, scientists have taken a significant step toward understanding how the brain accomplishes that task by identifying a set of neurons responsible for detecting relative changes in pitch within a speaker's voice, as well as the absolute pitch differences between men's and women's voices.

"Pitch is the melody of speech," said Claire Tang, a graduate student at the University of California, San Francisco who led the study. "It can be used to change the meaning of what you're saying, even without changing the words. What we were trying to do with this study was find out how neurons in the auditory cortex respond to pitch in speech."

The study was published this week in the journal Science.

Unraveling how the brain perceives the nuances of human voices could eventually improve the capacity of artificially intelligent algorithms to generate human-sounding speech. In other words, if science can understand exactly how we process changes in pitch and tone, engineers might have a leg up on teaching computers to speak the way humans do.

When it comes to spoken communication, pitch matters. The same words spoken by the same person can have completely different meanings depending on whether the voice rises or falls in pitch and where the emphasis is placed. Those differences can be subtle yet dramatic.

Take, for example, this simple sentence: Bill called you.

By changing the pitch, you can create a question: Bill called you?

Moving the emphasis can indicate that certain parts of the sentence warrant surprise - for example, that it was Bill who called.

Or that Bill called you, not someone else.

Tang and her colleagues constructed an experiment involving 10 patients, each of whom had an implant monitoring their brain activity because they suffered from seizures.

The team repurposed the implants, using them to collect data about what happened in the brain when the patients listened to synthesized voices that varied in speaker (male or female), intonation, and phonetic content.

The data revealed brain regions that respond in the same way to completely different sentences, so long as they are spoken with the exact same pitch pattern. In other words, the scientists located the parts of the brain associated with knowing the difference between a question and a statement and the many other pitch-dependent shades of meaning in between.

In the study, Tang and her colleagues monitored the activity of neurons in a region of the auditory cortex called the superior temporal gyrus, or STG. Some neurons in the STG seemed to distinguish different sentences based on their phonetic content - the consonants and vowels used - no matter how they were spoken. Another group of neurons in the STG responded to changes in intonation patterns, including where the emphasis fell in the sentence.
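The logic behind that dissociation can be pictured with a small thought experiment: cross a handful of sentences with a handful of intonation patterns, then ask whether a given unit's response changes more when the words change or when the pitch pattern changes. The sketch below is a toy illustration with made-up response values - it is not the authors' analysis, and the unit names and numbers are assumptions.

```python
# Toy sketch (not the authors' analysis): does a simulated unit's response
# vary with the words of a sentence or with its intonation pattern?
import itertools
import numpy as np

sentences = ["Bill called you", "Anna saw the dog", "We left early"]   # phonetic content
contours  = ["statement", "question", "early emphasis"]                # intonation patterns

rng = np.random.default_rng(0)

# Hypothetical tuning: one unit cares about the sentence, the other about the contour.
sentence_pref = {s: rng.uniform(0.2, 1.0) for s in sentences}
contour_pref  = {c: rng.uniform(0.2, 1.0) for c in contours}

def response(unit, sentence, contour):
    """Simulated firing rate of a hypothetical unit to one stimulus."""
    base = sentence_pref[sentence] if unit == "phonetic_unit" else contour_pref[contour]
    return base + rng.normal(scale=0.02)   # small measurement noise

for unit in ("phonetic_unit", "intonation_unit"):
    rates = {(s, c): response(unit, s, c) for s, c in itertools.product(sentences, contours)}
    # How much the response spreads when the words change but the contour is fixed:
    spread_over_words = np.mean([np.std([rates[(s, c)] for s in sentences]) for c in contours])
    # How much the response spreads when the contour changes but the words are fixed:
    spread_over_pitch = np.mean([np.std([rates[(s, c)] for c in contours]) for s in sentences])
    driver = "the words" if spread_over_words > spread_over_pitch else "the intonation"
    print(unit, "is driven mainly by", driver)
```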

"There really seems to be a separate group of neurons that respond to pitch rather than to the words themselves," Tang said.

The researchers also found distinct neural responses for male and female voices, which tend to have absolute differences in pitch.

How these insights might eventually inform the development of AI speech technology, of course, remains to be determined. But engineers are already staking out territory. Canadian startup Lyrebird, for example, recently unveiled an algorithm that it claims can clone anyone's voice after listening to just a minute of sampled audio.

"When you hear these artificial systems, something clearly sounds unnatural," Tang said. "What you're picking up on is that in natural speech, people use these pitch contours all the time. You can hear the meaning immediately. I think this research points out how important it is to get the pitch right and how important that part of speech is for conveying meaning."