
Retrogenes – YOU CAN’T EXPLAIN THAT!

April 23, 2011

Everyone* knows that DNA codes for proteins. In between the DNA and the protein, though, there is an intermediate molecule called RNA – ribonucleic acid (DNA is deoxyribonucleic acid). RNA is similar to DNA in many respects – a long chain of bases that we can consider “letters”, which form a code – but it’s less stable, and thus less suitable for long-term storage of information (that’s DNA’s job). RNA has many uses in the cell – the actual machinery that makes proteins? That’s largely made of RNA! – but for now we’ll just concentrate on the mRNA.

That’s right, I said mRNA. It stands for messenger-RNA, to distinguish it from the various other types of RNA. This is the bit that gets directly copied (transcribed) from DNA; and this is what forms the template for translation to a protein. (Transcription and translation are technical terms. Don’t get them mixed up.)
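If it helps to see those two steps as code, here is a toy Python sketch. It assumes we are handed the coding (sense) strand of the DNA, so transcription is just a matter of swapping T for U, and it uses only a tiny subset of the real codon table (the full one has 64 entries). The function names are mine, purely for illustration.

```python
# Toy subset of the standard codon table: only the codons used below.
CODON_TABLE = {"AUG": "M", "GCU": "A", "UGG": "W", "AAA": "K", "UAA": "*"}


def transcribe(coding_dna):
    """DNA -> mRNA. Assumes the coding strand is given, so T simply becomes U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein. Reads three bases at a time and stops at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the protein ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCTTGGAAATAA")
print(mrna)             # AUGGCUUGGAAAUAA
print(translate(mrna))  # MAWK
```

Transcription changes the alphabet (T to U); translation changes the language entirely (bases to amino acids). That is one way to keep the two terms apart.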

But sometimes, things get a little mixed up. Sometimes, instead of going on to make a nice protein, the mRNA gets reverse transcribed back into DNA – and that DNA copy gets inserted back into the genome at a more or less random place. This is called a retrocopy.

Now even a perfectly good bit of code is useless in a genome without something to flag it up to the cell’s machinery. These flags are called promoters. Promoters sit in the non-coding DNA just outside of the gene proper, so they aren’t found in the resulting mRNA. Thus retrocopies don’t bring any promoters along with them.

Without promoters, the retrocopies are just playing to an empty room, so they tend to degenerate into pseudogenes – they look like the parent gene, but they’re non-functional, and they tend to accumulate mutations. These exist primarily to screw up my alignments.

But once in a while, a retrocopy will find itself inserted next to some pre-existing promoters. Now it’s a whole new copy of the original gene – a retrogene.

The twist? Many genes in eukaryotes (that’s essentially every living thing except bacteria and archaea) have gaps in the coding sequence – bits of DNA that get transcribed but are then cut out of the RNA before it is translated. These are called introns, because they’re in-between the coding sequence. (The bits that get expressed are called exons. Again, don’t get them mixed up.)

But the retrogenes come from mRNA. And mRNA has the introns already cut (spliced) out. So retrogenes don’t have any introns – that’s how you can tell which one is the original, or parent gene, and which one is the copy. Neat, huh?
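Here is that idea as a toy Python sketch. The sequence, the exon coordinates and the function names are all invented for illustration, and real retrocopy searches rely on sequence alignment rather than exact string matching.

```python
def mature_mrna(gene, exons):
    """Transcribe and splice: keep only the exon slices (half-open intervals)."""
    spliced = "".join(gene[start:end] for start, end in exons)
    return spliced.upper().replace("T", "U")


def reverse_transcribe(mrna):
    """mRNA -> DNA copy, written in the same orientation as the coding strand."""
    return mrna.replace("U", "T")


# Hypothetical parent gene: two exons (upper case) around one intron (lower case).
EXON_1 = "ATGGCT"
INTRON = "gtaagtttcag"
EXON_2 = "TGGAAATAA"
parent_gene = EXON_1 + INTRON + EXON_2
exons = [(0, len(EXON_1)), (len(EXON_1) + len(INTRON), len(parent_gene))]

retrocopy = reverse_transcribe(mature_mrna(parent_gene, exons))
print(retrocopy)  # ATGGCTTGGAAATAA

# The intron is present in the parent but absent from the retrocopy.
print(INTRON.upper() in parent_gene.upper(), INTRON.upper() in retrocopy)  # True False
```

The retrocopy is the parent with the intron missing, which is exactly the signature you look for when deciding which copy came first.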

The final thing that can happen is that a retrocopy ends up not only with a new promoter, but also close enough to another bit of coding sequence that this new bit gets transcribed along with it, all as one gene. This is called a chimeric retrogene – a gene composed of different original bits.

OK? Everyone up to speed?


Zhu, Z., Zhang, Y., & Long, M. (2009). Extensive Structural Renovation of Retrogenes in the Evolution of the Populus Genome. Plant Physiology, 151 (4), 1943-1951. DOI: 10.1104/pp.109.142984

Up until recently, it was thought that retrocopies and retrogenes were mostly an animal thing, and didn’t really play a big role in plant evolution. This was mostly based on the fact that very few retrogenes were found in Arabidopsis thaliana, the major model species in plant genetics. This rather neatly highlights the problem of basing judgements about a whole kingdom on a single species. As soon as researchers started looking elsewhere, they found LOTS of retrogenes, including lots of chimeric retrogenes (Charlesworth et al., 1998; Wang et al., 2006).

The current paper looked at retrocopies in Populus trichocarpa, still the major model tree in genetics (yes, I know I just said that using model species was inherently flawed… but you gotta start somewhere, that’s why we have them).

The first step, and really the core of the paper, is their “pipeline” for identifying potential retrocopies. They started with the whole Populus genome, got 71,278 candidates… and ended up with 106 retrocopies. That’s a hell of a lot of narrowing down; I’m not going to break down the specifics here, but they admit that their criteria for inclusion tended towards the stringent. When they ran the same method on Arabidopsis, they recovered only 32 of the 69 that had previously been identified – clearly, they were prioritising finding unambiguous cases rather than all possible cases.

Then they took their 106 retrocopies and tried to figure out if they were just copies or actual functional genes – retrogenes; and if so, whether they were straight-up copies of the parent, or if they were chimerised. They found 95 out of their 106 copies showed evidence that they were being transcribed. That’s 89.6%. As they state, for comparison, only 16% of retrocopies so far identified in the human genome are potentially functional.
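For anyone who wants to check the arithmetic, the proportions quoted so far work out like this in Python. The counts are the ones given in this post; the helper name is mine.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100 * part / whole


print(round(percent(95, 106), 1))     # transcribed retrocopies in Populus: 89.6
print(round(percent(32, 69), 1))      # known Arabidopsis cases recovered: 46.4
print(round(percent(106, 71278), 2))  # candidates surviving the pipeline: 0.15
```

So the pipeline kept roughly one candidate in 670, and recovered a bit under half of the known Arabidopsis cases.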

Whoa, that’s some serious stuff. But wait, let’s think about this… Their “pipeline” was super-stringent, as we already said. They were most likely throwing out a lot of potential retrocopies along the way. Have they really found that “OMG 89.6% of Populus retrocopies are functional!?!” Or is it that functional retrocopies are more likely to survive their inclusion process, giving a biased proportion? I would guess the latter, and my major criticism of this paper is that I think they were a bit premature in throwing this out as the major finding.
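To make that worry concrete, here is a small Python sketch of how a filter can inflate a proportion. Every number in it is hypothetical and chosen only for illustration; none of them comes from the paper, and the function name is mine.

```python
def observed_functional_fraction(true_fraction, pass_functional, pass_nonfunctional):
    """Fraction of retrocopies that look functional AFTER filtering.

    true_fraction      -- real share of functional retrocopies (hypothetical)
    pass_functional    -- chance a functional retrocopy survives the filter
    pass_nonfunctional -- chance a degraded, non-functional one survives
    """
    kept_functional = true_fraction * pass_functional
    kept_nonfunctional = (1 - true_fraction) * pass_nonfunctional
    return kept_functional / (kept_functional + kept_nonfunctional)


# Hypothetical: 20% are truly functional, but the filter keeps half of those
# and only 5% of the mutation-riddled non-functional ones.
print(round(observed_functional_fraction(0.20, 0.50, 0.05), 3))  # 0.714

# If the filter treats both kinds equally, the true fraction comes back out.
print(round(observed_functional_fraction(0.20, 0.50, 0.50), 3))  # 0.2
```

With those made-up numbers, a true 20% turns into an observed 71%. That does not show the real figure is biased, only that a stringent filter could produce a high percentage on its own.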

Because – let’s not mess around here – they found 95 potentially functional retrogenes in the Populus genome! In one study! That’s awesome! And wait – it gets better!

Not only were there 95 retrogenes, but there were also 12 chimeric retrogenes. This is pretty cool in itself. But they also found what may be a NEW way for retrogenes to produce novel genetic information: intronisation. Some of the retrogenes generated a new intron out of previously coding DNA.

This is big! As you’ll recall, one of the features we expect from retrogenes is that they don’t have any introns – because they come from mRNA, where the introns are already spliced out. These ones not only have introns – they’re different introns from the parent gene.

So to sum up: plants have retrogenes; they may have a LOT of retrogenes; intronisation of retrogenes is potentially a new mechanism for generating new genetic information; and the authors really downplayed that in their own paper in favour of a number (89.6%!) which I don’t trust. Oh well.

Charlesworth, D., Liu, F.L. & Zhang, L. The evolution of the alcohol dehydrogenase gene family by loss of introns in plants of the genus Leavenworthia (Brassicaceae). Molecular Biology and Evolution 15, 552-559 (1998).

Wang, W. et al. High Rate of Chimeric Gene Origination by Retroposition in Plant Genomes. The Plant Cell Online 18, 1791-1802 (2006).

Zhu, Z., Zhang, Y. & Long, M. Extensive structural renovation of retrogenes in the evolution of the Populus genome. Plant Physiol 151, 1943-1951 (2009).

3 Comments
  1. April 25, 2011 00:35

    The footnote for “Everyone” is missing. This causes mental blockage for me. Am I part of everyone, or am I special?

    Promotors = promoters?

    Nice post.

  2. April 25, 2011 01:33

    Thank you.

    I had originally intended to leave a footnote. But now I think I prefer it to be left ambiguous. You will have to decide for yourself if you are everyone or if you are special.

  3. June 27, 2013 17:01

    Oh wow! Thanks so much for this perfect explanation 🙂
