I'm looking at a degenerate repeat promoter polymorphism. I have 43 sequences from 8 species (the polymorphism is found in humans/apes and Old World monkeys). Each sequence contains 7 to 23 repeats, and each repeat is 19 to 25 nucleotides long. I would like to do a nucleotide alignment that is "repeat based", that is, one that uses the repeats as the basic unit of the alignment. There are 112 distinct repeats in the whole sample. I can think of two ways to do this:
1. Generate a matrix for all the repeats and find a piece of software that will allow me to input my own matrix AND allow me to use a 112-character alphabet for repeat-based sequences. (Unicode can handle this on the alphabet end.) I have yet to find any such software.
2. Globally align the repeats and insert gaps accordingly into each repeat of the sequences, then do a DNA alignment that would preserve the gaps and allow me to set the constant gapped repeat length as the minimum "chunk" size to be considered.
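To make the first option concrete, here is a minimal Python sketch of what I have in mind, not an existing tool. It treats each whole repeat as one symbol, builds a repeat-by-repeat score table, and runs a plain Needleman-Wunsch global alignment over two repeat arrays. The function names, the edit-distance-based score (1 for identical repeats, falling toward -1), the gap penalty of -1 per repeat, and the toy repeats are all my own assumptions for illustration; real scores would come from whatever matrix is generated for the 112 repeats.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance between two nucleotide strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # match or mismatch
        prev = cur
    return prev[-1]


def repeat_score_table(repeats):
    """Assumed scoring: 1.0 for identical repeats, down to -1.0 for unrelated ones."""
    table = {}
    for r, s in product(repeats, repeat=2):
        table[r, s] = 1.0 - 2.0 * edit_distance(r, s) / max(len(r), len(s))
    return table


def align_repeat_arrays(x, y, score, gap=-1.0):
    """Needleman-Wunsch over two lists of repeats; each repeat is one symbol."""
    n, m = len(x), len(y)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score[x[i - 1], y[j - 1]],
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right cell; None marks a whole-repeat gap.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score[x[i - 1], y[j - 1]]:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append(None); i -= 1
        else:
            ax.append(None); ay.append(y[j - 1]); j -= 1
    return ax[::-1], ay[::-1], F[n][m]


# Toy data (hypothetical 4-nt "repeats", far shorter than the real 19-25 nt ones).
seq_a = ["ACGT", "ACGA", "TTTT"]
seq_b = ["ACGT", "TTTT"]
table = repeat_score_table(set(seq_a) | set(seq_b))
aligned_a, aligned_b, total = align_repeat_arrays(seq_a, seq_b, table)
for ra, rb in zip(aligned_a, aligned_b):
    print(ra or "----", rb or "----")
```

Because the symbols are just Python strings used as dictionary keys, the size of the "alphabet" is not a limitation here. This only handles pairs of sequences, though; aligning all 43 at once would still need a progressive or other multiple-alignment scheme on top.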
Is there anything out there that can do this?
Thank you.
Bryan Maloney
IUPUI.