Junk DNA
How many times have we heard it claimed that the vast majority of the human genome is "junk" and therefore could not have been designed? Even in the face of overwhelming evidence from the ENCODE project and numerous other studies showing that most of our genome has biochemical function, most evolutionists still maintain that our genomes are largely junk. But a few brave scientists, including some rare evolutionists, have been willing to buck that trend.

In a new article at Advanced Science News, titled "That 'Junk' DNA... Is Full of Information!", Andrew Moore, the Editor-in-Chief of the respected biology journal BioEssays, comments on a new BioEssays paper. The paper finds that our DNA contains overlapping, layered "'dual-function' pieces of information," including a "genomic code" that spans virtually the entire genome in order to "defin[e] the shape and compaction of DNA into the highly-condensed form known as 'chromatin.'" More about that paper in just a moment. It was written by leading Italian biologist Giorgio Bernardi, who played a major role in the discovery of isochores. Isochores are important in this story. But for now, let's look at Moore's essay. It has something worth mentioning in almost every paragraph.

Moore starts by saying that it should not be surprising that there is more function in the genome than we initially expected:
It should not surprise us that even in parts of the genome where we don't obviously see a 'functional' code (i.e., one that's been evolutionarily fixed as a result of some selective advantage), there is a type of code, but not like anything we've previously considered as such.
What Side He's On

From an intelligent design (ID) perspective, Moore is absolutely correct: finding more function in the genome "should not surprise us." But Moore is not an ID proponent; he's clearly writing from an evolutionary perspective. Even as he describes extensive function in our genome, he frequently adds evolutionary "narrative gloss" just to remind you what side he's on. Within the evolutionary camp, however, his support for widespread genomic functionality does not represent the majority view. There is a long history of evolutionary biologists predicting that non-protein-coding DNA is largely "junk." (See "Post-ENCODE Posturing: Rewriting History Won't Erase Bad Evolutionary Predictions.") As one example, in 1980 Francis Crick and Leslie Orgel wrote that "Much DNA in higher organisms is little better than junk," and "it would be folly in such cases to hunt obsessively for" its function. Numerous similar claims have been made over the years.

Though clearly evolution-based, Moore's perspective stands out in an important way: it is open to seeing coordinated function across the entire genome. Moore thus proposes an idea with which ID proponents would heartily agree:
And what if it [this other code] were doing something in three dimensions as well as the two dimensions of the ATGC code? A paper just published in BioEssays explores this tantalizing possibility...
So there are multiple layers of information in DNA controlling cellular processes that operate in multiple dimensions. Not only that, but as Moore explains, these codes are frequently "overlapping" within our DNA sequence:
One of the intriguing things about DNA sequences is that a single sequence can "encode" more than one piece of information depending on what is "reading" it and in which direction — viral genomes are classic examples in which genes read in one direction to produce a given protein overlap with one or more genes read in the opposite direction (i.e., from the complementary strand of DNA) to produce different proteins. It's a bit like making simple messages with reverse-pair words (a so-called emordnilap). For example: REEDSTOPSFLOW, which, by an imaginary reading device, could be divided into REED STOPS FLOW. Read backwards, it would give WOLF SPOTS DEER.
Though highly specified and difficult to produce by chance, overlapping codes are demonstrably present in our DNA. Proponents of intelligent design have long identified overlapping genes as a signature of design. For example, one chapter in the volume Biological Information: New Perspectives argues that "Multiple Overlapping Genetic Codes Profoundly Reduce the Probability of Beneficial Mutation." The chapter observes that "DNA sequences are typically 'poly-functional'" with "overlapping protein-coding sequences" which "can contribute to multiple overlapping codes simultaneously." But the likelihood of producing such information-rich, tightly constrained sequences by chance is exceedingly low: "it is difficult to understand how poly-functional DNA could arise through random isolated mutations."
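Moore's reverse-reading analogy can be made concrete with a small sketch. What follows is only a toy illustration, not anything taken from Moore's article or Bernardi's paper: the word list and the helper names segment and reverse_complement are hypothetical, chosen for this example. It splits the message into words in both directions, then shows the DNA counterpart, in which a reader on the opposite strand sees the reverse complement of the sequence.

```python
# Toy illustration only: WORDS, segment and reverse_complement are
# hypothetical names invented for this sketch.
WORDS = {"REED", "STOPS", "FLOW", "WOLF", "SPOTS", "DEER"}


def segment(text, words=WORDS):
    """Split text into known words, or return None if no split exists."""
    if not text:
        return []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in words:
            rest = segment(text[end:], words)
            if rest is not None:
                return [head] + rest
    return None


# Map each base to its pairing partner on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the sequence as read from the complementary strand."""
    return dna.translate(COMPLEMENT)[::-1]


message = "REEDSTOPSFLOW"
forward = segment(message)         # read left to right
backward = segment(message[::-1])  # read right to left

print(forward)
print(backward)
print(reverse_complement("ATGACC"))
```

The same thirteen letters yield two different three-word messages depending on reading direction, which is the constraint Moore's analogy is pointing at: changing one letter would alter both readings at once.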

The Current Situation

How do overlapping codes relate to the current situation? Moore explains that these "'dual-function' pieces of information" are found throughout our genome where DNA can both encode proteins and simultaneously define a "genomic code":
For two distinct pieces of information to be encoded in the same piece of genetic sequence we would, similarly, expect the constraints to be manifest in biases of word and letter usage — the analogies, respectively, for amino acid sequences constituting proteins, and their three-letter code. Hence a sequence of DNA can code for a protein and, in addition, for something else. This "something else", according to Giorgio Bernardi, is information that directs the packaging of the enormous length of DNA in a cell into the relatively tiny nucleus. Primarily it is the code that guides the binding of the DNA-packaging proteins known as histones. Bernardi refers to this as the "genomic code" — a structural code that defines the shape and compaction of DNA into the highly-condensed form known as "chromatin".
This "genomic code" is thus a genome-wide feature, woven throughout our DNA, including portions of the genome that evolutionists have typically assumed had no important function. This code is defined by the "GC" content of a stretch of DNA, that is, the proportion of base pairs that are guanine-cytosine (hence "GC") rather than adenine-thymine. In protein-coding DNA, the third base pair in a codon can often vary from AT/TA to CG/GC without affecting the amino acid being specified. Evolutionists have presumed that the precise nucleotide in this third position was irrelevant, so long as the codon was "synonymous," and that variation in the third nucleotide represented an unimportant, non-functional feature. But Moore explains that the third nucleotide in a codon can have great functional importance apart from merely specifying the amino acid, and could actually help define this "genomic code," which overlaps with the protein code:
Protein-coding sequences are also packed and condensed in the nucleus — particularly when they're not "in use" (i.e., being transcribed, and then translated into protein) — but they also contain relatively constant information on precise amino acid identities, otherwise they would fail to encode proteins correctly: evolution would act on such mutations in a highly negative manner, making them extremely unlikely to persist and be visible to us. But the amino acid code in DNA has a little "catch" that evolved in the most simple of unicellular organisms (bacteria and archaea) billions of years ago: the code is partly redundant. For example, the amino acid Threonine can be coded in eukaryotic DNA in no fewer than four ways: ACT, ACC, ACA or ACG. The third letter is variable and hence "available" for the coding of extra information. This is exactly what happens to produce the "genomic code", in this case creating a bias for the ACC and ACG forms in warm-blooded organisms. Hence, the high constraint on this additional "code" — which is also seen in parts of the genome that are not under such constraint as protein-coding sequences — is imposed by the packaging of protein-coding sequences that embody two sets of information simultaneously.
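The redundancy Moore describes can be checked with a short sketch. This is an illustration under stated assumptions, not an analysis from either article: it uses the standard genetic code, the two short example sequences are invented, and the helper names codons, translate and gc3 are hypothetical. It confirms that threonine has four codons differing only in the third letter, and that two sequences can encode the same amino acids while differing sharply in third-position GC content.

```python
from itertools import product

# Standard genetic code, written compactly with bases ordered T, C, A, G.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons(cds):
    """Split a coding sequence into complete three-letter codons."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def translate(cds):
    """Translate a coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[c] for c in codons(cds))


def gc3(cds):
    """Fraction of codons whose third base is G or C."""
    thirds = [c[2] for c in codons(cds)]
    return sum(base in "GC" for base in thirds) / len(thirds)


# All codons for threonine (one-letter code "T").
threonine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "T")

# Two invented sequences that differ only at synonymous third positions.
at_rich = "ATGACTGCATAA"
gc_rich = "ATGACCGCGTAA"

print(threonine_codons)
print(translate(at_rich), gc3(at_rich))
print(translate(gc_rich), gc3(gc_rich))
```

Both invented sequences translate to the same short peptide, yet their third-position GC fractions differ, which is the sense in which the third letter is "available" to carry a second signal.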
An Application of Narrative Gloss

Moore's evolutionary bias is evident here as he repeatedly adds "narrative gloss," ascribing functional aspects of our genome to evolution, rather than simply describing the functional nature of DNA and leaving evolution out of it. But the substance of what he's saying identifies function in an aspect of the genome that evolutionists have frequently ignored as junk.

He goes on to explain that this genomic code is not limited to protein-coding sequences, where it overlaps with the code that specifies protein sequences. The code also persists throughout giant portions of our genome characterized by repetitive sequences that evolutionary scientists have, again, frequently ignored as junk. Read the following carefully, and try to filter out the gloss. It basically admits that these massive segments of our genome are functional:
But didn't we start with an explanation for non-coding DNA, not protein-coding sequences? Yes, and in the long stretches of non-coding DNA we see information in excess of mere repeats, tandem repeats and remnants of ancient retroviruses: there is a type of code at the level of preference for the GC pair of chemical DNA bases compared with AT. As Bernardi reviews, synthesizing his and others' groundbreaking work, in the core sequences of the eukaryotic genome, the GC content in structural organizational units of the genome termed "isochores" increased during the evolutionary transition between so-called cold-blooded and warm-blooded organisms. And, fascinatingly, this sequence bias overlaps with sequences that are much more constrained in function: these are the very protein-coding sequences mentioned earlier, and they — more than the intervening non-coding sequences — are the clue to the "genomic code". ... In eukaryotic genomes, the GC sequence bias proposed to be responsible for structural condensation extends into non-coding sequences, some of which have identified activities, though less constrained in sequence than protein-coding DNA. There it directs their condensation via histone-containing nucleosomes to form chromatin.
What we see here is that major portions of our genome, traditionally viewed as junk, are actually full of "information in excess of mere repeats, tandem repeats and remnants of ancient retroviruses" because "there is a type of code at the level of preference for the GC pair of chemical DNA bases compared with AT." The purpose of the code, in short, is to direct DNA-packing in the nucleus.

And Now for Isochores

The genomic code is largely defined by huge GC-biased portions of the genome called "isochores." When you hear the word "isochore," think of humongous portions of our genome characterized by repetitive sequences of DNA that most evolutionists have typically ignored as junk, but that ID proponents have predicted would probably turn out to have function.
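To give a feel for what "GC-biased portions" means in practice, here is a minimal sketch of a windowed GC profile. It is an assumption-laden toy, not Bernardi's method: the function names gc_fraction and windowed_gc are hypothetical, the input string is invented, and the ten-base window is chosen only so the output is easy to read. Real isochores span vastly longer stretches of DNA than this example.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq, window, step=None):
    """GC fraction of each full window, as (start_index, fraction) pairs."""
    step = step or window  # default: non-overlapping windows
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Invented toy sequence: 20 AT-only bases followed by 20 GC-only bases.
toy = "AT" * 10 + "GC" * 10
profile = windowed_gc(toy, window=10)

for start, fraction in profile:
    print(start, fraction)
```

In the toy profile, the first two windows are GC-poor and the last two are GC-rich, so the boundary between two compositionally uniform regions shows up as a jump in the profile. That compositional uniformity over a long stretch is the property the word "isochore" refers to.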

Giorgio Bernardi's paper in BioEssays provides an extensive discussion of the literature. It shows that isochores have "functional importance" and that the GC level of isochores defines an important "genomic code." Bernardi explains:
[T]he genomic code, which is responsible for the pervasive encoding and molding of primary chromatin domains (LADs and primary TADs, namely the "gene spaces"/"spatial compartments") resolves the longstanding problems of "non-coding DNA," "junk DNA," and "selfish DNA" leading to a new vision of the genome as shaped by DNA sequences.
Bernardi's view is that most of the genome is functional, contradicting the typical "junk DNA" perspective:
By the end of the 1980s, our knowledge of the isochore organization of the human genome had not only rejected what had been called the "bean-bag" view of the genome, that is, a collection of genes randomly scattered over vast expanses of "junk DNA"; but it had also indicated that the genome is an integrated structural, functional, and evolutionary system. This view arose from a comparative study of vertebrate genomes, centered on the analysis of their compositional patterns, namely of the compositional distributions of large DNA segments, coding sequences, and introns.
Thus, the presence of GC-rich isochores leads us to reject the "junk DNA" view. It indicates that "the genome is an integrated structural, functional, and evolutionary system." Ignoring Bernardi's evolutionary gloss, which wrongly assumes that integrated structural and functional systems can arise by blind evolutionary mechanisms, his statement is exactly what ID theory would expect. Bernardi continues explaining how we know that isochores are functional and carry the "genomic code" which "overlaps" with the genetic code:
The functional importance of isochores was already evident in the 1980s because of the correlations of their GC levels with all the genome properties tested. It was later confirmed by investigations carried out in the 1990s. ... The first indications that the base composition of isochores was under constraint came from the strong correlations between the composition of interspersed repeats, such as the GC-poor LINES and GC-rich SINES, and the composition of the GC-poor and GC-rich isochores, respectively, in which those sequences were located. The next step was the extension of the compositional correlations to genes (exons, introns, codon positions) located in GC-poor and GC-rich isochores, correlations that affect codon usage and amino acid composition of the encoded proteins. These points were subsequently reinforced, leading to the proposal that a "genomic code" was responsible for the compositional correlations just mentioned. As shown in Table S3, Supporting Information, the genomic code was further extended in the following years to include the sequence distributions, the functional properties associated with GC-poor and GC-rich isochores, and the structure and nuclear location of interphase chromatin.

Only recent investigations showed, however, that the genomic code: 1) is a "structural code" in that it directly encodes and molds chromatin structures and defines nucleosome binding; 2) is pervasive because it applies to the totality of the genome; 3) overlaps the "genetic code" and constrains it, by affecting the composition (but not the function) of coding sequences (and contiguous non-coding sequences), codon usage, and amino acid composition of the encoded proteins, as already mentioned.
A Striking Conclusion

Moore's article, describing Bernardi's findings, concludes strikingly:
These regions of DNA may then be regarded as structurally important elements in forming the correct shape and separation of condensed coding sequences in the genome, regardless of any other possible function that those non-coding sequences have: in essence, this would be an "explanation" for the persistence in genomes of sequences to which no "function" (in terms of evolutionarily-selected activity), can be ascribed (or, at least, no substantial function).

We may marvel at such complicated structures and ask "but do they need to be quite so complicated for their function?" Well, maybe they do in order to condense and position parts of the protein in the exact orientation and place that generates the three-dimensional structure that has been successfully selected by evolution. But with a knowledge that the "genomic code" overlaps protein coding sequences, we might even start to become suspicious that there is another selective pressure at work as well...
Moore doesn't specify what the other "selective pressure" is, but clearly he sees the functionally important "genomic code" as pervasive throughout the genome. So here's what we have: evolutionary scientists proposing that most of our genome's sequence has functional importance because it carries a genomic code, controlling the three-dimensional packing in the nucleus. This code even "overlaps" with the genetic code in protein-coding DNA. Such a perspective directly contradicts the evolutionary paradigm of a genome flooded with junk.

Why would evolutionary scientists like Moore and Bernardi step outside that paradigm? The answer is simple: Their views are driven by the data. Moore — or rather, more — power to them!