I would appreciate some advice in analyzing gene trees and
relationships among satellite DNA sequences isolated from
teleost genomes. I've been in this business a while, but
find that my mind has been molded by thinking about gene
frequencies, gels, and related stuff for so long that I
sometimes find it difficult to think in terms of trees
(phylogenetic or otherwise). And tree "gurus" with whom I
can talk are thin on the ground here in southwestern VA...
About a year ago John Elder and I reported (ref: PNAS 91:994)
that the canonical (most frequent) monomers of an alpha-type
satellite DNA showed population-specific concerted evolution
in an Atlantic coast pupfish, Cyprinodon variegatus. Ten of
12 populations we surveyed had different characteristic
canonical monomers, and the divergence (as measured by
proportional similarity after aligning all of them by GCG)
was not obviously related to the geographic distances between
the samples. The sequences (or comparisons among them) have
the following interesting features:
1. Nearly all of the divergence is confined to the 3' half
of the 155-180 bp monomer.
2. The divergences are frequently extensive.
3. They are very rich in "indels."
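
As an illustration of feature 1, the following is a minimal Python
sketch (not part of the original analysis) that compares divergence in
the 5' and 3' halves of two aligned monomers. It assumes the sequences
are already aligned to equal length with "-" for gaps, that the
midpoint of the alignment columns is an acceptable split point, and
that every gapped column counts as a difference. The function names
and the example sequences are hypothetical.

```python
def p_distance(a, b, gap="-"):
    """Proportion of differing columns between two aligned sequences.

    Columns gapped in both sequences are ignored; a column gapped in
    only one sequence counts as a difference (an assumption).
    """
    pairs = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def half_divergences(a, b, gap="-"):
    """Return (5' half distance, 3' half distance) for an aligned pair.

    The split at the alignment midpoint is an illustrative assumption.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    mid = len(a) // 2
    return p_distance(a[:mid], b[:mid], gap), p_distance(a[mid:], b[mid:], gap)


# Made-up sequences for illustration only, not real satellite monomers.
five_prime, three_prime = half_divergences("ACGTACGTAC", "ACGTACG-TT")
print(five_prime, three_prime)
```

A consistently larger second value across many pairs would correspond
to the pattern described in feature 1.
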
We are now cloning and sequencing comparable satellite
monomers from isolated pupfish populations in Death Valley
and environs (C. nevadensis, C. salinus, C. diabolis, C.
radiosus, etc.). This system has the advantages of worked
out phylogenies (based on allozymes or mtDNA RFLPs) and
well characterized variance in both the durations of
isolation of various populations and population sizes. We
would like to use the system to test various hypotheses about
satellite DNA evolution and the mechanisms involved in
concerted evolution in general. It is important to note that
we are using the organisms to study the DNA sequences, and
not the reverse.
We need to estimate:
1. The extent to which the divergences of the canonical
monomers are congruent with the phylogeny of the populations.
2. The extent to which secondary (less frequent) monomers in
these genomes resemble the canonical monomer in each case or
are independently derived (this will give a rough test of the
"library expansion" vs "genomic replacement" models of
concerted evolution).
3. The relationship between canonical monomer divergence and
elapsed
In a recent grant proposal I suggested that this could be
done by the minimum evolution method of Rzhetsky & Nei,
which, in turn, is based on Saitou & Nei's "neighbor-joining"
method of tree construction. The inputs into the NJ program
would be pairwise genetic distances derived from the
proportional similarities computed by the GCG program.
Indels would be counted as single
nucleotide changes. This proposed analysis was criticized in
the review statement of the advisory panel: " ...pairwise
proportional similarities may not be representative of
evolutionary distance for a sequence that is probably not
evolving by simple nucleotide substitutions."
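
To make the proposed calculation concrete, here is a minimal Python
sketch of a pairwise distance in which each contiguous run of gaps is
counted as one change, as described above. It is not the GCG
computation: the function names, the "-" gap symbol, and the choice to
count each gap run as one compared site are assumptions made for
illustration. It also does not answer the panel's criticism; it only
shows what the criticized distance would look like.

```python
from itertools import combinations


def gap_event_distance(a, b, gap="-"):
    """Proportion of differences between two aligned sequences,
    counting each contiguous indel run as a single change."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    diffs = 0
    sites = 0
    open_gap = None  # which sequence (0 or 1) carries the open gap run
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue  # gapped in both: uninformative for this pair
        if x == gap or y == gap:
            side = 0 if x == gap else 1
            if side != open_gap:
                # A new indel run starts: one event, one compared site.
                diffs += 1
                sites += 1
                open_gap = side
            continue
        open_gap = None  # an ungapped column closes any open run
        sites += 1
        if x != y:
            diffs += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    return diffs / sites


def distance_matrix(seqs):
    """Map each unordered pair of names to its distance.

    `seqs` is a dict of name -> aligned sequence (hypothetical input).
    """
    return {
        (m, n): gap_event_distance(seqs[m], seqs[n])
        for m, n in combinations(sorted(seqs), 2)
    }


# Made-up sequences for illustration only.
example = {"popA": "ACGT--ACGT", "popB": "ACGTTTACGA"}
print(distance_matrix(example))
```

In the example, the two-column gap contributes one change and the
final T/A mismatch another, over nine compared sites. A matrix of this
kind is what would be passed to a neighbor-joining program.
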
Now (finally) for my questions:
1. What methods of tree construction/sequence comparison
would be appropriate for these data and would satisfy the
reviewer's criticism? Would one use "character state" methods
such as "maximum likelihood", "evolutionary parsimony", what?
2. I could align the sequences by using the mtDNA or
allozyme phylogenies and locate the gaps accordingly, but
isn't this circular if I then seek to measure the congruence
of the satellite DNA and (say) the mtDNA trees?
3. What is the best way of asking if the divergences of the
3' end of the monomer are essentially random with respect to
I thank anyone who has stayed with me to the end of this
massive missive. I would appreciate any comments that
anyone might like to make, references, etc. Thanks again.