In a paper published in Genome Research on Nov. 4, scientists at the Genome Institute of Singapore (GIS) report that what was previously believed to be "junk" DNA is one of the important ingredients distinguishing humans from other species.

More than 50 percent of human DNA has been referred to as "junk" because it consists of copies of nearly identical sequences. A major source of these repeats is viral sequences that inserted copies of themselves throughout the genome at various times during mammalian evolution.

Using the latest sequencing technologies, GIS researchers showed that many transcription factors, the master proteins that control the expression of other genes, bind specific repeat elements. For five key transcription factors with important roles in cancer and stem cell biology, between 18 and 33 percent of their binding sites are embedded in distinctive repeat families.

Over evolutionary time, these repeats were dispersed within different species, creating new regulatory sites throughout these genomes. Thus, the set of genes controlled by these transcription factors is likely to differ significantly from species to species, and this variation may be a major driver of evolution.

This research also shows that these repeats are anything but "junk DNA," since they provide a great source of evolutionary variability and might hold the key to some of the important physical differences that distinguish humans from all other species.

The GIS study also highlighted the functional importance of portions of the genome that are rich in repetitive sequences.

"Because much of biomedical research uses model organisms such as mice and primates, it is important to have a detailed understanding of the differences between these model organisms and humans in order to explain our findings," said Guillaume Bourque, Ph.D., GIS Senior Group Leader and lead author of the Genome Research paper.

"Our research findings imply that these surveys must also include repeats, as they are likely to be the source of important differences between model organisms and humans," added Dr. Bourque. "The better our understanding of the particularities of the human genome, the better our understanding will be of diseases and their treatments."

"The findings by Dr. Bourque and his colleagues at the GIS are very exciting and represent what may be one of the major discoveries in the biology of evolution and gene regulation of the decade," said Raymond White, Ph.D., Rudi Schmid Distinguished Professor at the Department of Neurology at the University of California, San Francisco, and chair of the GIS Scientific Advisory Board.

"We have suspected for some time that one of the major ways species differ from one another (for instance, why rats differ from monkeys) is in the regulation of the expression of their genes: where the genes are expressed in the body, when during development, and how much they respond to environmental stimuli," he added.

"What the researchers have demonstrated is that DNA segments carrying binding sites for regulatory proteins can, at times, be explosively distributed to new sites around the genome, possibly altering the activities of genes near where they locate. The means of distribution seem to be a class of genetic components called 'transposable elements' that are able to jump from one site to another at certain times in the history of the organism. The families of these transposable elements vary from species to species, as do the distributed DNA segments which bind the regulatory proteins."

Dr. White also added, "This hypothesis for formation of new species through episodic distributions of families of gene regulatory DNA sequences is a powerful one that will now guide a wealth of experiments to determine the functional relationships of these regulatory DNA sequences to the genes that are near their landing sites. I anticipate that as our knowledge of these events grows, we will begin to understand much more about how and why the rat differs so dramatically from the monkey, even though they share essentially the same complement of genes and proteins."