Evolutionary remnants in non-coding regions

William R. Pearson wrp at alpha0.bioch.virginia.edu
Fri Dec 12 18:22:56 EST 1997

newsmgr at merrimack.edu writes:
> I am well out of my area of expertise, so please forgive my ignorance. 
> My lab has cloned a fission yeast gene that encodes a member of a
> highly-conserved gene family.  The putative product is 44% identical to
> family members found from squid to human; however, our protein lacks
> the first 36 residues (about 10% of the protein) relative to all other
> members.  Even so, if I look at the coding capacity of the 5'
> untranslated region, I find properly placed codons for absolutely
> conserved residues found in the larger proteins.  By "eyeball alignment"
> using only two one-residue gaps I can produce 25% identity between the
> "non-product" of the 5' UTR of our gene and the first 36 residues of a
> mouse homolog.
> Here are my questions.
> 1.  Is it valid to suggest that this indicates the fission yeast gene
> arose from a larger "standard-length" precursor and that what I am
> seeing is an evolutionary remnant?
> 2.  Are there computer programs that can ignore the fact that I have a
> stop codon and produce a protein alignment that includes this region?
> 3.  Are there other examples of this phenomenon?  I do not consider this
> to be the same as pseudogenes, since my protein is expressed.
> I would greatly appreciate your comments.

The FASTX and TFASTX programs are variations of the FASTA program that
are designed to align DNA to protein sequences, or vice versa,
allowing frameshifts to improve the alignment. We have used the
programs extensively to identify putative sequencing errors in
bacterial genome sequences by searching with the DNA encoding an ORF +
500 nt on either end; if the FASTX alignment extends well beyond the
called ORF, we predict that a sequencing error has been made.
Likewise, we identified several clear protein homologues in intergenic
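
The query construction described above (the ORF plus 500 nt on either
end, then checking whether the alignment runs past the called ORF) can
be sketched roughly as follows. This is only an illustration in Python
and is not part of the FASTA package: the function names, the 0-based
end-exclusive coordinates, and the 30 nt cutoff for "well beyond" are
all assumptions made for the example, and the alignment coordinates are
assumed to have been read from the alignment output by hand or by a
separate parser.

```python
def flanked_window(genome, orf_start, orf_end, flank=500):
    """Cut out an ORF plus up to `flank` nt on either side.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    Returns the window and the ORF's start/end positions inside that window.
    """
    if not 0 <= orf_start < orf_end <= len(genome):
        raise ValueError("ORF coordinates fall outside the sequence")
    # Clip the flanks at the ends of the sequence.
    win_start = max(0, orf_start - flank)
    win_end = min(len(genome), orf_end + flank)
    window = genome[win_start:win_end]
    return window, orf_start - win_start, orf_end - win_start


def overhang(aln_start, aln_end, orf_start, orf_end):
    """Nucleotides by which an alignment extends past each end of the ORF."""
    upstream = max(0, orf_start - aln_start)
    downstream = max(0, aln_end - orf_end)
    return upstream, downstream


def suggests_sequencing_error(aln_start, aln_end, orf_start, orf_end,
                              min_overhang=30):
    """True if the alignment runs at least `min_overhang` nt past the ORF.

    The 30 nt default is a hypothetical threshold, not a published value.
    """
    up, down = overhang(aln_start, aln_end, orf_start, orf_end)
    return up >= min_overhang or down >= min_overhang


if __name__ == "__main__":
    # Toy sequence: 600 nt upstream, a 306 nt ORF, 600 nt downstream.
    genome = "A" * 600 + "ATG" + "GCT" * 100 + "TAA" + "C" * 600
    window, o_start, o_end = flanked_window(genome, 600, 906)
    print(len(window), o_start, o_end)
    # Pretend the DNA:protein alignment began 120 nt upstream of the ORF.
    print(suggests_sequencing_error(o_start - 120, o_end, o_start, o_end))
```

The window returned by flanked_window would be the DNA query, and the
positions it returns let the alignment coordinates be compared directly
with the called ORF boundaries. For the question at hand, the same check
applies to a 5' UTR: an alignment that continues upstream of the
annotated start is the signal of interest.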

FASTX and TFASTX are available in the FASTA package from:


Both fasta2 and fasta3 have the programs; fasta3 is more up-to-date.

Bill Pearson

More information about the Mol-evol mailing list
