Abstracts - 2006
Mechanisms for Novel Gene Emergence
Pouya Kheradpour & Manolis Kellis
Adaptation through the emergence of new genes is a fundamental process in evolutionary biology. Previous studies have shown that genes can arise through the duplication and subsequent mutation of existing genes. However, duplication does not explain how genes can arise whose functions are unrelated to any previously existing function.
One potential source of functionally novel genes is the alternative reading frames of other genes. Each DNA sequence can be translated into an amino acid sequence using one of six possible reading frames (three in each orientation), and each frame yields an entirely different protein. For a given gene, typically only one of these frames is open (devoid of stop codons); the alternative reading frames are nonsensical and usually rich in stop codons.
Consider Figure 1. The DNA sequence shown can be translated in six frames, depending on the strand read and the offset used. Reading frame 1 yields the codons ATG, TTT, whereas reading frame -2 yields TAA, ACA. Note that TAA is a stop codon, so reading the sequence in this frame would produce no amino acids.
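The six-frame reading described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frame numbering (frames 1 to 3 are offsets 0 to 2 on the given strand, frames -1 to -3 the same offsets on the reverse complement) is one common convention and may not match the one used in Figure 1, and the eight-base sequence is a made-up string chosen so that the codons agree with those quoted in the text. It is not the sequence from the figure.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def codons_in_frame(seq, frame):
    """Split seq into codons for a frame in {1, 2, 3, -1, -2, -3}.

    Assumed convention: positive frames read the given strand,
    negative frames read its reverse complement, and |frame| - 1
    is the offset of the first codon.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of 1, 2, 3, -1, -2, -3")
    strand = seq if frame > 0 else reverse_complement(seq)
    offset = abs(frame) - 1
    # Trailing bases that do not fill a whole codon are dropped.
    return [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]


def stop_count(seq, frame):
    """Count stop codons encountered when reading seq in the given frame."""
    return sum(codon in STOP_CODONS for codon in codons_in_frame(seq, frame))


# Hypothetical example sequence (not taken from Figure 1).
example = "ATGTTTAC"
for frame in (1, 2, 3, -1, -2, -3):
    print(frame, codons_in_frame(example, frame), stop_count(example, frame))
```

With this example string, frame 1 gives ATG, TTT and frame -2 gives TAA, ACA, so frame -2 is closed by a stop codon at its first position.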
We and others have noticed, however, that specific properties of the genetic code leave the alternative frames of genes with fewer stop codons than might be expected. In addition, the gene in the original frame constrains that genomic region to conserve its reading frame. While these properties have been used to clean annotations of overlapping genes, they also make these alternative frames a potentially more common source of new genes than intergenic space.
We are evaluating the potential of this and other sources of new genes by examining identified genes with unknown orthology, by looking at existing overlaps of genes, and by measuring the potential of DNA sequence to code for overlapping proteins.
A strong indication that more than one of the overlapping genes is actually translated is higher-than-expected conservation in the overlapping region. We expect higher conservation there because encoding multiple genes restricts the genomic sequence more tightly.
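As a rough illustration of this comparison, the sketch below computes sequence identity inside and outside a candidate region of two aligned orthologous sequences. The function names, the choice to skip alignment columns containing a gap, and the toy alignment are all assumptions made for the example. The sketch reports only the two identity values and does not assess whether the difference is statistically significant.

```python
def identity(aln_a, aln_b):
    """Fraction of identical positions between two aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def region_vs_rest(aln_a, aln_b, start, end):
    """Return (identity inside [start, end), identity outside it)."""
    inside = identity(aln_a[start:end], aln_b[start:end])
    outside = identity(aln_a[:start] + aln_a[end:],
                       aln_b[:start] + aln_b[end:])
    return inside, outside


# Toy alignment (hypothetical data): the first four columns are identical.
a = "ACGTACGTAC"
b = "ACGTTCGAAC"
inside, outside = region_vs_rest(a, b, 0, 4)
print(f"inside: {inside:.2f}, outside: {outside:.2f}")
```

A region that overlaps a second translated gene would be expected to show a higher inside value than outside value, although deciding how large a gap is meaningful requires a separate statistical test.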
We examined long open alternative reading frames of identified genes in eight fungal genomes. We identified several genes, together with their orthologs, that contained an open alternative reading frame whose sequence conservation was higher than that of the rest of the gene.
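A search for long open alternative reading frames of this kind could look roughly like the following. This is a minimal sketch, not the procedure used in the study: the frame numbering convention, the `min_codons` threshold, the assumption that the annotated gene lies in frame 1, and the test sequence are all hypothetical choices made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def frame_codons(seq, frame):
    """Codons of seq in a frame from {1, 2, 3, -1, -2, -3} (assumed convention)."""
    strand = seq if frame > 0 else seq.translate(COMPLEMENT)[::-1]
    start = abs(frame) - 1
    return [strand[i:i + 3] for i in range(start, len(strand) - 2, 3)]


def longest_open_run(codons):
    """Return (first_codon_index, length) of the longest stop-free run."""
    best_start, best_len = 0, 0
    run_start = 0
    # A sentinel stop codon at the end closes the final run.
    for i, codon in enumerate(codons + ["TAA"]):
        if codon in STOP_CODONS:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = i + 1
    return best_start, best_len


def open_alternative_frames(seq, coding_frame=1, min_codons=50):
    """Map each non-coding frame to its longest open run, if long enough."""
    hits = {}
    for frame in (1, 2, 3, -1, -2, -3):
        if frame == coding_frame:
            continue
        start, length = longest_open_run(frame_codons(seq, frame))
        if length >= min_codons:
            hits[frame] = (start, length)
    return hits


# Hypothetical gene-like sequence: a start codon, ten GCC codons, a stop.
gene = "ATG" + "GCC" * 10 + "TAA"
print(open_alternative_frames(gene, coding_frame=1, min_codons=5))
```

Candidate regions found this way would then be compared across orthologs for conservation, which this sketch does not attempt.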
Figure 2 shows the stop codons and sequence identity of orthologous genes from two fungal species. The indicated region is more highly conserved than the rest of the gene and contains an open alternative reading frame in both genes, for both frames 2 and -3. We are currently evaluating the statistical significance of findings such as these.
Manolis Kellis, Bruce Birren and Eric Lander. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature, 428: 617-624, 8 April 2004.
Zsolt Boldogkoi. Coding in the Noncoding DNA Strand: A Novel Mechanism of Gene Evolution? Journal of Molecular Evolution, 51(6): 600-606, December 2000.
Pawel Mackiewicz, Maria Kowalczuk, Agnieszka Gierlik, Miroslaw R. Dudek and Stanislaw Cebrat. Origin and properties of non-coding ORFs in the yeast genome. Nucleic Acids Research, 27(17): 3503-3509, September 1999.
Matt Rasmussen and Manolis Kellis. SynPhyl: Ortholog and paralog detection in multiple complete genomes. Unpublished work.