What is Life?

Where did it come from?

A Very Sophisticated Code
No. 2: Overlapping Genes
Overlapping genes are a remarkable feature of our genome

Imagine, if you would, being given the challenge of writing a piece of text that carries more than one meaning in the same string of letters. Perhaps you have to smuggle valuable information to others, knowing your letter will be screened and censored.
That would be no easy task, but it would be doable.

Now imagine that, instead of just hiding your real intended message within some vaguely plausible top layer of text, you had to write two layers of critically important, detailed code, one on top of the other, in an effort to reduce the file size and improve download times. That is going to be more of a challenge.


Here is how it works. Below is our previous line of DNA from line 4, spaced out so we can see how the three nucleotide bases (letters) combine to form codons (words). In reality there are no spaces between the DNA "letters" (this is what is referred to as a commaless code); the letters have been spaced in our example to help visualise what is occurring.

aca  aga  tgc  att  gtc  ccc  ctg  ccc  ttc  cga  tat  tag

So our first codon is aca. But just look at what happens if you begin reading the sequence one letter later (known as shifting the reading frame).

(a)  caa  gat  gca  ttg  tcc  ccc  tgc  cct  tcc  gat  att  ag(?)
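The frame shift shown above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name codons_in_frame is made up for this illustration, and the sequence is the twelve-codon example from the text with the spaces removed.

```python
# The example sequence from the text; the spaces are removed because
# real DNA has no gaps between its "letters".
SEQUENCE = "aca aga tgc att gtc ccc ctg ccc ttc cga tat tag".replace(" ", "")


def codons_in_frame(sequence, offset):
    """Split a sequence into three-letter codons, starting at `offset`.

    Letters left over at the end that do not fill a whole codon are dropped.
    (Hypothetical helper written for this illustration.)
    """
    usable = sequence[offset:]
    whole = len(usable) - len(usable) % 3
    return [usable[i:i + 3] for i in range(0, whole, 3)]


frame_0 = codons_in_frame(SEQUENCE, 0)  # reading from the first letter
frame_1 = codons_in_frame(SEQUENCE, 1)  # reading one letter later

print(" ".join(frame_0))
print(" ".join(frame_1))
```

The first printed line matches the original spaced sequence, and the second matches the shifted reading, minus the stray letters at either end that no longer fit into a complete codon.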

Just as the vast majority of randomly chosen letters in the English language do not produce a meaningful sentence, the vast majority of randomly chosen codon sequences are not able to form a stable folded protein.


Bearing in mind that a typical human protein-coding gene is several hundred codons long, it would be incredibly difficult to encode two precisely sequenced genes in the same string of DNA by simply shifting the reading frame, but that is exactly what has been discovered. It should also be noted that the same thing can occur by shifting the reading frame along by two bases, allowing the possibility of a third functioning code.
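To see how different the three readings are, the example sequence can be translated into amino acids in each frame. The sketch below assumes the standard genetic code, written as one-letter amino acid symbols with "*" marking a stop codon. The names CODON_TABLE and translate are chosen for this illustration only, and the short example sequence is not claimed to be a real overlapping gene.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# Each character is a one-letter amino acid symbol; "*" is a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The example sequence from the text, with the spaces removed.
SEQUENCE = "aca aga tgc att gtc ccc ctg ccc ttc cga tat tag".replace(" ", "")


def translate(sequence, offset):
    """Translate a DNA string codon by codon, starting at `offset`.

    Trailing letters that do not complete a codon are ignored.
    (Hypothetical helper written for this illustration.)
    """
    dna = sequence.upper()[offset:]
    whole = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, whole, 3))


# Shifting by 0, 1 or 2 letters gives the three possible reading frames.
for offset in range(3):
    print(offset, translate(SEQUENCE, offset))
```

Each frame produces a completely different string of amino acids from the same letters, which is why a working gene in more than one frame is so demanding.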

Overlapping genes severely restrict adaptation by mutation

A very interesting side effect of overlapping genes is highlighted in the following paper.

"Overlapping genes represent a fascinating evolutionary puzzle, since they encode two functionally unrelated proteins from the same DNA sequence."

"The second particularly interesting feature of overlapping genes is that they represent a clear example of adaptive conflict. Indeed, they simultaneously encode two proteins whose freedom to change is constrained by each other which would be expected to severely reduce the ability of the virus to adapt"

Having two codes written within the same data string makes it much harder for mutational changes to one gene to be selected by an evolutionary process.
Why? Because no matter how beneficial such a mutational change may be to the first gene, it would be likely to damage the second gene, and thus "severely reduce the ability of the virus to adapt".
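The constraint can be illustrated with a toy calculation: change each letter of the example sequence in turn and check whether the amino acids change in the first frame, the shifted frame, or both. This is a rough sketch under stated assumptions. It uses the standard genetic code, the short example sequence from earlier (not a real pair of overlapping genes), and a made-up helper name, classify_point_mutations. It only records whether the protein sequence changes, not whether the change helps or harms.

```python
from itertools import product

# Standard genetic code in T, C, A, G order; "*" is a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

SEQUENCE = "aca aga tgc att gtc ccc ctg ccc ttc cga tat tag".replace(" ", "")


def translate(dna, offset):
    """Translate an upper-case DNA string starting at `offset`."""
    dna = dna[offset:]
    whole = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, whole, 3))


def classify_point_mutations(sequence):
    """Count single-letter changes by which reading frame they alter.

    Hypothetical helper: frame 0 starts at the first letter, frame 1
    starts one letter later.
    """
    dna = sequence.upper()
    original_0 = translate(dna, 0)
    original_1 = translate(dna, 1)
    counts = {"neither": 0, "frame 0 only": 0, "frame 1 only": 0, "both": 0}
    for position, old_base in enumerate(dna):
        for new_base in BASES:
            if new_base == old_base:
                continue
            mutant = dna[:position] + new_base + dna[position + 1:]
            changed_0 = translate(mutant, 0) != original_0
            changed_1 = translate(mutant, 1) != original_1
            if changed_0 and changed_1:
                counts["both"] += 1
            elif changed_0:
                counts["frame 0 only"] += 1
            elif changed_1:
                counts["frame 1 only"] += 1
            else:
                counts["neither"] += 1
    return counts


counts = classify_point_mutations(SEQUENCE)
for outcome, number in counts.items():
    print(outcome, number)
```

Every mutation counted under "both" is one that cannot be made to the first reading without also rewriting the second.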

Overlapping genes (sometimes called nested genes) are a stunningly clever innovation which without question reduces the genome's file size and improves replication speeds, but they are in fact a major obstacle to evolutionary development.

There is, however, much more to this problem than we have covered so far.
Researchers have also discovered that DNA is written in two separate languages.

Two languages nested within the same data string


Scientists from the University of Washington "were stunned to discover that genomes use the genetic code to write two separate languages. One describes how proteins are made, and the other instructs the cell on how genes are controlled. One language is written on top of the other, which is why the second language remained hidden for so long."

“For over 40 years we have assumed that DNA changes affecting the genetic code solely impact how proteins are made … Now we know that this basic assumption about reading the human genome missed half of the picture. These new findings highlight that DNA is an incredibly powerful information storage device, which nature has fully exploited in unexpected ways.”

"The genetic code uses a 64-letter alphabet called codons. The UW team discovered that some codons, which they called “duons,” can have two meanings, one related to protein sequence, and one related to gene control."

"We found that ~15% of human codons are dual-use codons (“duons”) that simultaneously specify both amino acids and TF recognition sites"

Please note that this discovery of two separate nested languages embedded in the same data string is a further level of complex coding, in addition to the previously discussed overlapping genes, which are written in the same language. In fact, researchers have discovered that DNA is supremely optimised to carry multiple layers of additional information at the same time.


Take note of the title of this paper:

The genetic code is nearly optimal for allowing additional information within protein-coding sequences


"DNA sequences that code for proteins need to convey, in addition to the protein-coding information, several different signals at the same time. These “parallel codes” include binding sequences for regulatory and structural proteins, signals for splicing, and RNA secondary structure. Here, we show that the universal genetic code can efficiently carry arbitrary parallel codes much better than the vast majority of other possible genetic codes."

The multiple layers of interwoven information embedded in the DNA sequence are nothing short of breathtaking. Every additional layer of information within a given DNA sequence makes it vastly less likely that a random mutational change will be of net benefit to the organism. In fact, these features make genetic information highly resistant to change. In this context natural selection would act as a stabilizing force, not as the driver of change it is usually portrayed to be.

It is interesting to note that many of the features mentioned here have been discovered since Bill Gates described the information in life as being "like a computer program but far more advanced than any software we have created".

The question we should be asking ourselves at this point is again:

Does the discovery of such a supremely optimised code indicate an undirected natural process or a purposeful design?