Thanks guys, I appreciate your replies.
Well, I also talked it over with some people around here and nobody really had any
idea of how to get around that, and I still wanted to run DNADIST and
see the results...
Well, here is what I did:
I saw that DNADIST saved the distance matrices for the data sets it had
analyzed (that is, those without bootstrapped sequences that yielded
infinite distances) to the output file, up to the point where the program hit a
data set with infinitely distant sequences.
Then I used other bootstrap sets to run DNADIST and appended the results to the
previous file. In the end I had around the number of replicates I wanted
(I haven't had the time yet to see if the resulting consensus tree makes sense).
Lots of work, but I couldn't think of any other way to get around it.
So here is my question:
Is that OK? Would this interfere with my bootstrapping analysis, because I'm
filtering out some of the bootstrap data sets?
joe at removethispart.gs.washington.edu wrote:
> In article <and3qk$pm7$1 at mercury.hgmp.mrc.ac.uk>,
> Chris Hoffman <choffman at lucas.cis.temple.edu> wrote:
> >I have a question regarding the DNADIST program from Phylip Package.
> >I run SEQBOOT with my seqs and get my new data sets produced using
> >bootstrap, and so far so good. But when I use these new data sets to run
> >DNADIST, the program can't run it because it finds one or more sequences
> >that are supposedly too different to allow the computation to proceed.
> >I tried all the methods available in the program and all give similar
> >btw: I haven't found any similar msgs running DNAPARS or DNAML
>
> This occurs for a reason inherent to bootstrapping. When you have
> sequences, some of which are fairly distant, and bootstrap, two
> sequences can become so far apart that their distance would be infinite.
>
> For example, when you use a Jukes-Cantor distance, any two sequences
> that are more than 75% different will have an infinite distance. Thus
> when your original sequences are (say) 70% different, bootstrapping
> can occasionally make those sequences 76% different.
>
> What should the distance program do in such a case? I chose in PHYLIP
> to make it complain and stop. Other people's programs sometimes are
> set to assign a large number (say 10) as the distance. Both of these
> policies have disadvantages. One denies you the ability to use that
> replicate, the other puts in somewhat fictional information.
>
> Parsimony and likelihood don't have this problem, though likelihood
> could put a species on the end of a very long branch. In Dnaml, I
> just have the branch length get fairly long, then at some point the
> program has iterated it enough and leaves it at that length.
> Joe Felsenstein joe at removethispart.gs.washington.edu
> Department of Genome Sciences, University of Washington,
> Box 357730, Seattle, WA 98195-7730 USA
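
To make the Jukes-Cantor point quoted above concrete, here is a rough sketch in Python. It is not PHYLIP code and does not reproduce what DNADIST does internally. The function names and the toy input (one true/false flag per aligned site saying whether two sequences differ there) are assumptions made up for this illustration. It uses the standard Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3), which has no finite value once the proportion of differing sites p reaches 0.75, and then resamples sites with replacement to estimate how often a bootstrap replicate crosses that threshold.

```python
import math
import random


def jc_distance(p):
    """Jukes-Cantor distance for a proportion p of differing sites.

    Returns math.inf when p >= 0.75, where the formula has no finite value.
    """
    inner = 1.0 - 4.0 * p / 3.0
    if inner <= 0.0:
        return math.inf
    return -0.75 * math.log(inner)


def fraction_infinite(differs, replicates=1000, seed=1):
    """Estimate how often bootstrapping sites gives an infinite distance.

    differs: list of booleans, one per aligned site (hypothetical input),
             True where the two sequences differ.
    """
    rng = random.Random(seed)
    n = len(differs)
    bad = 0
    for _ in range(replicates):
        # One bootstrap replicate: draw n sites with replacement.
        sample = rng.choices(differs, k=n)
        p = sum(sample) / n
        if math.isinf(jc_distance(p)):
            bad += 1
    return bad / replicates


if __name__ == "__main__":
    # Toy pair of sequences: 100 sites, 70% of them different.
    toy = [True] * 70 + [False] * 30
    print("distance at 70% difference:", jc_distance(0.70))
    print("fraction of replicates with infinite distance:",
          fraction_infinite(toy))
```

With a pair that is 70% different over only 100 sites, a noticeable share of replicates lands at 75% or more, which is the situation described in the quoted reply. Longer alignments or less divergent pairs should make it rarer.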