In article <DJ5H45.FJD at zoo.toronto.edu>, mes at zoo.toronto.edu (Mark
Siddall) wrote:
> I like Doug's idea of coding gaps separately, like:
> Taxon 1 AACCGTCAGTCAGT-----CGACGTACGTACGTAC 0
> Taxon 2 AACCGTCAGTCAGT-----CGACGTACGTACGTAC 0
> Taxon 3 AACCGTCAGTCAGTGGACTCGACGTACGTACGTAC 1
> Taxon 4 AACCGTCAGTCAGTGGACTCGACGTACGTACGTAC 1
>
> But it has only limited utility and is still a matter of inference since
> if we add a Taxon 5 and Taxon 6 we could get:
>
> Taxon 1 AACCGTCAGTCAGT-----CGACGTACGTACGTAC
> Taxon 2 AACCGTCAGTCAGT-----CGACGTACGTACGTAC
> Taxon 3 AACCGTCAGTCAGTGGACTCGACGTACGTACGTAC
> Taxon 4 AACCGTCAGTCAGTGGACTCGACGTACGTACGTAC
> Taxon 5 AACCGTCAGTCAGT---CTCGACGTACGTACGTAC
> Taxon 6 AACCGTCAGTCAGT-GACTCGACGTACGTACGTAC
>
> and now what?

For what it's worth, my software outputs the following:
Taxon 1 AACCGTCAGTCAGT-----CGACGTACGTACGTAC000
Taxon 2 AACCGTCAGTCAGT-----CGACGTACGTACGTAC000
Taxon 3 AACCGTCAGTCAGTGGACTCGACGTACGTACGTAC111
Taxon 4 AACCGTCAGTCAGTGGACTCGACGTACGTACGTAC111
Taxon 5 AACCGTCAGTCAGT---CTCGACGTACGTACGTAC001
Taxon 6 AACCGTCAGTCAGT-GACTCGACGTACGTACGTAC011
or perhaps the reverse (with gaps as 1); I can't remember, and
it doesn't matter. It appends only characters, such as these three,
that are informative for parsimony, i.e., those with at least two
taxa having gaps and at least two taxa having nongaps.
BTW, if you want to do this with your software, just
convert all nongap or nonmissing characters to "1",
all gap characters to "0", keep only one site from each
run of identical adjacent sites, get rid of uninformative
sites (if you want), and append the resulting
matrix to the original file. Transposing the matrix
helps with the string comparisons.
This approach is not without its problems, especially
concerning treatment of alternative equally plausible
gap placements, dealing with missing data, and the
possibility of homoplasy. Other posters have expressed
legitimate frustrations.
--
Doug Eernisse <DEernisse at fullerton.edu>
Dept. Biological Science MH282
California State University
Fullerton, CA 92634