Screen of 250,000 species reveals unexpected tweaks to genetic code

Dan Robitzski
The Scientist
Tue, 09 Nov 2021 12:00 UTC

A massive screen of bacterial and archaeal genomes revealed five previously unknown instances where an organism uses an alternate code to translate genetic blueprints into proteins.

The genetic code that dictates how genetic information is translated into specific proteins is less rigid than scientists have long assumed, according to research published today (November 9) in eLife. In the paper, scientists report screening the genomes of more than 250,000 species of bacteria and archaea and finding five organisms that rely on an alternate genetic code, signifying branches in evolutionary history that haven't been fully explained.

The genetic code refers to how sequences of DNA nucleotide bases lead to specific chains of amino acids during the process of protein synthesis. To perform this synthesis, ribosomes read strands of mRNA (copies of bits of the organism's genome) in chunks of three bases at a time. Each three-base sequence, known as a codon, binds to a specific transfer RNA (tRNA) that ferries a corresponding amino acid to the ribosome to be added to the protein chain. An organism with an alternate genetic code, like the five new instances that the study authors found, has codons that correspond to different amino acids than they would in the standard genetic code employed by the vast majority of known life forms.
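To make the idea concrete, here is a minimal Python sketch of codon-by-codon translation. The standard table is NCBI translation table 1; the gene sequence and the single reassignment (CGG read as tryptophan) are illustrative assumptions, not data from the study.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna, code=STANDARD_CODE):
    """Read a coding sequence three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = code[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical alternate code: identical to the standard except CGG means W (tryptophan).
ALT_CODE = {**STANDARD_CODE, "CGG": "W"}

gene = "ATGCGGAAATGGTAA"  # made-up example sequence
print(translate(gene))            # standard reading
print(translate(gene, ALT_CODE))  # same DNA, different protein
```

The same stretch of DNA yields a different protein under the two tables, which is why assuming the wrong code produces wrong protein predictions.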

"The genetic code has been set in stone for 3 billion years," study coauthor Yekaterina Shulgina, a Harvard University graduate student in systems biology, tells The Scientist. "The fact that some organisms have found a way to change it is really fascinating to me. Changing the genetic code requires changing ancient, important molecules like tRNAs that are so fundamental to how biology works."

As such, the code was thought to be largely preserved across all forms of life, with scientists finding the occasional exception during the past several decades of research. In addition to finding five new alternate genetic codes, the team also verified seven others that had been discovered one-by-one in the past, bringing the total number of known exceptions in bacteria to 12.

"I'm pleased to see that all of the results that we had so far came out in [the new paper]," Yale University biochemist Dieter Söll, who didn't work on the study, tells The Scientist. Söll has been studying the evolution of the genetic code for decades and was the first to find an alternate genetic code in bacteria. The team's methodology was "very good," he says — especially because of how it illustrates that "the genetic code is exceedingly flexible."

"To re-find everything that we already knew and to double the number of known reassignments" was particularly impressive, says University College Dublin genomic evolution professor Kenneth Wolfe, who also didn't work on the study. "What they're looking for is really rare."

Shulgina and study coauthor Sean Eddy, a biologist at Harvard and a Howard Hughes Medical Institute investigator, developed an algorithm called Codetta (named after the Rosetta Stone) that can screen an organism's genome and predict which amino acids its codons will add into a given protein. The algorithm quickly scans a genome and compares it against profiles of known protein families in a database called Pfam. If enough variations from the standard code appear in a consistent manner, Codetta flags the organism as potentially using an alternate amino acid for a particular codon. From there, researchers can experimentally validate its work by looking for the predicted tRNA in the organism.
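The flagging step can be pictured with a toy example. The Python sketch below is not Codetta's actual method; it is a simplified majority vote that assumes we already have, for each codon, a list of the amino acids expected at the protein positions it aligns to. The function name, thresholds, and observations are all hypothetical.

```python
from collections import Counter, defaultdict

def infer_codon_meanings(observations, min_count=5, min_fraction=0.9):
    """Toy inference: call a codon's amino acid only when the evidence is
    both plentiful and consistent; otherwise report '?' (no call)."""
    tallies = defaultdict(Counter)
    for codon, expected_aa in observations:
        tallies[codon][expected_aa] += 1

    calls = {}
    for codon, counts in tallies.items():
        total = sum(counts.values())
        top_aa, top_n = counts.most_common(1)[0]
        confident = total >= min_count and top_n / total >= min_fraction
        calls[codon] = top_aa if confident else "?"
    return calls

# Made-up observations: (codon in the genome, amino acid expected at that position)
observations = (
    [("CGG", "W")] * 9 + [("CGG", "R")]        # consistent deviation from the standard
    + [("AAA", "K")] * 6                       # matches the standard code
    + [("CTG", "L")] * 3 + [("CTG", "S")] * 3  # conflicting evidence
)
calls = infer_codon_meanings(observations)
print(calls)
```

In this toy data, CGG is called as tryptophan, AAA as lysine, and CTG is left uncalled because the evidence is split.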

Codetta excels at finding genetic codes that are nearly the same as the standard. For example, all of the newly discovered reassignments affect arginine, specifically its codons AGG, CGA, and CGG (three of the six that normally encode it). In some of the newly discovered alternate genetic codes, these codons are reassigned to the amino acid tryptophan, which is already associated with the similar codon TGG.

Prior to the new study, all known alternate genetic codes in bacteria involved reassigning a "stop" codon, which tells the cellular machinery that it has reached the end of a protein, to an amino acid. The new results reveal the first alternate genetic codes in bacteria that represent a sense codon reassignment: that is, changing a codon from one amino acid to another.

Identifying these alternate codes is important, Eddy says, because scientists who predict what proteins an organism will synthesize based on its genome do so under the assumption that the standard genetic code is at play. Accounting for deviations, then, will improve the accuracy of those predictions and prevent errors from being codified into databases as more and more genomes are sequenced.

But these changes shouldn't happen under normal circumstances, Eddy explains.

"If you tried to change the meaning of a codon, you're essentially introducing simultaneous mutations all over the genome," Eddy tells The Scientist. "Every place where that codon is used, you just substituted an amino acid. It's just mind-boggling that an organism could survive that." Stop codon shifts are considerably less "dramatic," Eddy adds, because changing a stop codon to a sense codon doesn't really change the function of a protein, but just extends its tail.

Part of the reason changes do happen, Shulgina explains, is that some bacterial genomes may have a low composition of certain nucleotides compared to others. That brings the usage of codons that rely on those nucleotides down to nearly zero, making it easier for an organism to survive shifts without altering too many proteins in a drastic way.
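A rough way to see this is to count how often a codon actually appears in a genome's coding sequences. The Python sketch below uses two made-up sequences, one AT-rich and one GC-rich, and a hypothetical helper named `codon_usage`; it is an illustration of the reasoning, not an analysis from the paper.

```python
from collections import Counter

def codon_usage(cds):
    """Count in-frame codons in a coding sequence (trailing partial codon ignored)."""
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

# Made-up coding sequences for illustration only.
at_rich = "ATGAAATTTAAAGCGTAA"
gc_rich = "ATGCGGGCCCGGCGGTAA"

# Every occurrence of a codon is a site that would change if the codon were reassigned.
print(codon_usage(at_rich)["CGG"])  # sites affected in the AT-rich sequence
print(codon_usage(gc_rich)["CGG"])  # sites affected in the GC-rich sequence
```

Reassigning CGG would alter nothing in the AT-rich sequence but three positions in the GC-rich one, which is the sense in which a rarely used codon is easier to change.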

"At least in bacteria, it seems like these sorts of forces might explain why the genetic code evolved this way," Shulgina says. "This might be totally different if we looked in other forms of life like eukaryotes." Exceptions to the standard genetic code have been found in single-celled eukaryotes such as yeast, but experts expect alternate codes to be rare in more complex eukaryotic organisms.

Tracking down why these alternate genetic codes emerged during evolutionary history is difficult, multiple researchers tell The Scientist, in no small part because humans couldn't watch it happen. But the authors do have some hypotheses.

In one case, Shulgina identified a bacterium that uses the same alternate code as a bacteriophage virus that infects it, suggesting that the bacterium evolved an alternate code that prevented its cellular machinery from being hijacked, and that the phage may have then made the same adaptation to follow its host.

Bacteria and archaea represented the easiest test run for Codetta. Shulgina, Eddy, and other researchers say they're eager to see what the algorithm finds in eukaryotic life; running a screen on a eukaryotic genome would, however, be difficult because of how much noise would be added to the signal. Eukaryotic genomes are full of non-functional pseudogenes, Shulgina explains, that would either need to be filtered out or accounted for in a Codetta update lest they confuse the algorithm.

The team also ran Codetta on the genome of yeast, a eukaryote notorious for having alternate genetic codes, and made a new discovery that helped validate its predictions for bacteria. At one point, the algorithm wasn't confident enough to assign an amino acid to a particular codon. Upon further investigation, it turned out that the yeast uses the same codon to encode two different amino acids.

"My method isn't built to find that; it was built to find different codings, not ambiguous codings," Shulgina says. "Codetta didn't want to pick any amino acid for that codon, so in a sense it failed in a very correct way."

Eddy and Shulgina plan to refine the algorithm and say that knowing these limitations in advance will enable other researchers to take their work and run genomic screens of their own. To that end, Shulgina and Eddy have made Codetta available on GitHub to any researchers who may want to use it or develop their own version.

"As we are only scratching the surface of the real microbial biodiversity, I think it is very likely that Codetta will be helpful in the discovery of additional codon reassignments," University of Cork biochemist Pavel Baranov, who didn't work on the study, tells The Scientist over email. "I am eager to try it myself."

Still, Codetta's computational predictions must be validated with mass spectrometry and other tools, University of Göttingen systems biologist Martin Kollmar tells The Scientist.

In particular, Kollmar would have liked to see more instances of experimental verification, such as double-checking that the proteins synthesized by alternate genetic codes had the expected amino acid composition. But, he adds, "this is hard work and out of the scope of that paper."

Meanwhile, every researcher who spoke with The Scientist said they expect Codetta to continue to find new exceptions to the rules laid out by the genetic code.

"This study is a nice illustration that there is almost nothing universally conserved [across all life]," says Vanderbilt University biologist Antonis Rokas, who was not involved in the work. "Biology is just, in some ways, the science of exceptions."
