In article <bengt.oxelman-0412951325470001 at mac38.systbot.gu.se> bengt.oxelman at systbot.gu.se (Bengt Oxelman) writes:
(actually I wrote this following bit)
>> 1) indel events are not observed data (that is one does not
>> observe gaps in a sequence), they are matter of inference, thus, should
>> not be treated as observed data points (i.e., code them as "missing").
>Length differences are as 'observed' as polymorphisms at 'inferred'
>nucleotide positions.
I disagree to a point.
Polymorphisms are not observed either. No one sequence has more than one
base at any given position. Multiple sequences from multiple isolates
may have different bases that are observed, but no one "observes"
a polymorphism.
Regardless, this differs from my point.
My point is, change the alignment parameters and you change the
homology statement about gaps (in many circumstances).
Take for example:
Taxon I AACCGTACT
TaxonII AACT
In so far as one could get:
AACCGTACT
AAC-----T
or
AACCGTACT
AA-----CT
or
AACCGTACT
A-----ACT
under the same alignment parameters, these are obviously not "observations"
but inferences.
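
The tie among those three placements can be checked with a small
sketch. It is not a real alignment program: it assumes one contiguous
gap in the shorter sequence and counts only base mismatches, so every
placement pays the same gap cost whatever the gap penalty is. The
function names are made up for illustration.

```python
def single_gap_alignments(long_seq, short_seq):
    """Try every position for one contiguous gap in short_seq.

    Returns a list of (aligned_short_seq, mismatch_count) pairs.
    Assumption: a single gap block, no gaps in long_seq.
    """
    gap_len = len(long_seq) - len(short_seq)
    results = []
    for k in range(len(short_seq) + 1):
        aligned = short_seq[:k] + "-" * gap_len + short_seq[k:]
        # Count mismatches only at columns where both sequences have a base.
        mismatches = sum(
            1 for a, b in zip(long_seq, aligned) if b != "-" and a != b
        )
        results.append((aligned, mismatches))
    return results


def best_alignments(long_seq, short_seq):
    """Return every gap placement that ties for the fewest mismatches."""
    scored = single_gap_alignments(long_seq, short_seq)
    best = min(m for _, m in scored)
    return [aligned for aligned, m in scored if m == best]


if __name__ == "__main__":
    for aligned in best_alignments("AACCGTACT", "AACT"):
        print("AACCGTACT")
        print(aligned)
        print()
```

For Taxon I and TaxonII this returns the same three placements listed
above, each with zero mismatches, so nothing in the score prefers one
homology statement over the others.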
I do not think this is all that trivial and it makes me wonder about the
validity of coding gaps as a "fifth state".
The alternative is to treat them as uninformative, but this really does
not treat them as "nothing"; it treats them as one of the four observed
states (ACGT), whichever is most parsimonious, notwithstanding that
none of the four observed states was observed or could rationally
be placed in that position.
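
To make the difference between the two codings concrete, here is a
rough sketch of a Fitch-style parsimony count for a single alignment
column. The four-taxon tree, the taxon names and the gap_mode switch
are all hypothetical; this is only meant to show the effect, not how
any particular program does it.

```python
ALPHABET = frozenset("ACGT")


def fitch_length(tree, column, gap_mode="missing"):
    """Fitch down-pass for one column on a rooted binary tree.

    tree     : nested 2-tuples of taxon names, e.g. (("t1", "t2"), ("t3", "t4"))
    column   : dict mapping taxon name -> character in that column
    gap_mode : "missing" treats "-" as any of A, C, G, T;
               "fifth" treats "-" as a state of its own.
    Returns (state set at the root, number of steps).
    """
    def leaf_set(ch):
        if ch == "-":
            return ALPHABET if gap_mode == "missing" else frozenset("-")
        return frozenset(ch)

    def downpass(node):
        if isinstance(node, str):
            return leaf_set(column[node]), 0
        left, right = node
        left_set, left_cost = downpass(left)
        right_set, right_cost = downpass(right)
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        # No shared state: take the union and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return downpass(tree)


if __name__ == "__main__":
    tree = (("t1", "t2"), ("t3", "t4"))          # hypothetical topology
    column = {"t1": "-", "t2": "-", "t3": "G", "t4": "G"}
    print(fitch_length(tree, column, "missing"))
    print(fitch_length(tree, column, "fifth"))
```

With gaps coded as missing, the column costs no steps and the root set
is just G, which amounts to putting a G in the two gapped taxa. With
gaps as a fifth state, the same column costs one step.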
I like Doug's idea of coding gaps separately like:
Taxon 1 AACCGTCAGTCAGT-----CGACGTACGTACGTAC 0
Taxon 2 AACCGTCAGTCAGT-----CGACGTACGTACGTAC 0
Taxon 3 AACCGTCAGTCAGTGGACTCGACGTACGTACGTAC 1
Taxon 4 AACCGTCAGTCAGTGGACTCGACGTACGTACGTAC 1
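
A sketch of how that extra 0/1 column might be generated from the
alignment. The function name, the "?" code for partial overlap and the
hard-wired region (columns 14 to 18, counting from 0) are my
assumptions for the example, not part of Doug's proposal.

```python
def code_indel(alignment, start, end):
    """Score one gap region as a separate presence/absence character.

    alignment  : dict mapping taxon name -> aligned sequence
    start, end : half-open column range of the region (0-based)
    Returns a dict: 0 if the region is all gaps, 1 if it has no gaps,
    "?" if the taxon only partly overlaps the region.
    """
    codes = {}
    for taxon, seq in alignment.items():
        region = seq[start:end]
        if all(ch == "-" for ch in region):
            codes[taxon] = 0
        elif "-" not in region:
            codes[taxon] = 1
        else:
            codes[taxon] = "?"
    return codes


if __name__ == "__main__":
    alignment = {
        "Taxon 1": "AACCGTCAGTCAGT-----CGACGTACGTACGTAC",
        "Taxon 2": "AACCGTCAGTCAGT-----CGACGTACGTACGTAC",
        "Taxon 3": "AACCGTCAGTCAGTGGACTCGACGTACGTACGTAC",
        "Taxon 4": "AACCGTCAGTCAGTGGACTCGACGTACGTACGTAC",
    }
    for taxon, code in code_indel(alignment, 14, 19).items():
        print(taxon, alignment[taxon], code)
```

The nucleotide columns inside the region would then be scored as
missing for Taxon 1 and Taxon 2, with the indel itself counted once in
the added character.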
But it has only limited utility and is still a matter of inference, since
if we add a Taxon 5 and Taxon 6 we could get:
Taxon 1 AACCGTCAGTCAGT-----CGACGTACGTACGTAC
Taxon 2 AACCGTCAGTCAGT-----CGACGTACGTACGTAC
Taxon 3 AACCGTCAGTCAGTGGACTCGACGTACGTACGTAC
Taxon 4 AACCGTCAGTCAGTGGACTCGACGTACGTACGTAC
Taxon 5 AACCGTCAGTCAGT---CTCGACGTACGTACGTAC
Taxon 6 AACCGTCAGTCAGT-GACTCGACGTACGTACGTAC
and now what?
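
One way to see the problem is to tabulate the gap pattern each taxon
shows across the region. This is only a sketch: the function name is
invented, and treating both "-" and "_" as gap characters is an
assumption on my part.

```python
GAP_CHARS = "-_"  # assumption: either character marks a gap


def gap_patterns(alignment, start, end):
    """Group taxa by their gap/base pattern in columns start..end-1.

    Each pattern is a string with "-" for a gap and "N" for any base.
    Returns a dict mapping pattern -> list of taxon names.
    """
    patterns = {}
    for taxon, seq in alignment.items():
        region = seq[start:end]
        pattern = "".join("-" if ch in GAP_CHARS else "N" for ch in region)
        patterns.setdefault(pattern, []).append(taxon)
    return patterns


if __name__ == "__main__":
    alignment = {
        "Taxon 1": "AACCGTCAGTCAGT-----CGACGTACGTACGTAC",
        "Taxon 2": "AACCGTCAGTCAGT-----CGACGTACGTACGTAC",
        "Taxon 3": "AACCGTCAGTCAGTGGACTCGACGTACGTACGTAC",
        "Taxon 4": "AACCGTCAGTCAGTGGACTCGACGTACGTACGTAC",
        "Taxon 5": "AACCGTCAGTCAGT---CTCGACGTACGTACGTAC",
        "Taxon 6": "AACCGTCAGTCAGT-GACTCGACGTACGTACGTAC",
    }
    for pattern, taxa in gap_patterns(alignment, 14, 19).items():
        print(pattern, taxa)
```

With four taxa there are two patterns and a 0/1 character covers them.
With six there are four overlapping patterns, and deciding how many
indel events they represent, and which are homologous, is again an
inference.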
Mark
--
Mark E. Siddall "I don't mind a parasite...
mes at vims.edu I object to a cut-rate one"
Virginia Inst. Marine Sci. - Rick
Gloucester Point, VA, 23062