In article <wblank.1180078558B at news.srv.ualberta.ca>,
Panhead McNipper <wblank at gpu.srv.ualberta.ca> wrote:
>I want to place 16S sequences obtained in our lab on a tree with closely
>related sequences from GenBank, but one of them isn't complete (e.g. a
>string of about 60 Ns between nucleotides 940-1000). Should I:
> A) go ahead anyway?
> B) cut the offending region out of all the sequences in the alignment?
> C) leave the incomplete sequence out entirely?
>I plan to go ahead with option B (60 bases out of 1400-some isn't _too_ much
>info lost, is it?) but want to know how others treat this sort of thing.
Essentially all programs implementing phylogeny methods can cope
easily with the missing data if you code the bases there as "?" or "N",
meaning that they are ambiguous. There is no need to discard that region
in all sequences. You need to put _something_ in there, though, as the
programs need aligned sequences.
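The sketch below shows why an ambiguous code costs nothing. It runs Fitch parsimony on one fixed four-taxon tree. This is a swapped-in illustration, not how any particular package is written. Following the usual convention, "N" and "?" stand for the set of all four bases. The tree, the taxon names (s1 to s4), the sequences and the helper names are hypothetical. Gaps ("-") are left out on purpose, because programs differ in how they treat them.

```python
# Map a small subset of IUPAC codes to sets of possible bases.
# Assumption for illustration: "N" and "?" both mean "any of A, C, G, T".
BASES = frozenset("ACGT")
CODES = {
    "A": frozenset("A"), "C": frozenset("C"),
    "G": frozenset("G"), "T": frozenset("T"),
    "R": frozenset("AG"), "Y": frozenset("CT"),
    "N": BASES, "?": BASES,
}


def fitch_site(tree, states):
    """Return (state_set, steps) for one alignment column.

    tree: a taxon name (str) or a 2-tuple (left_subtree, right_subtree).
    states: dict mapping taxon name -> character in this column.
    """
    if isinstance(tree, str):
        # A leaf contributes its set of possible bases and no steps.
        return CODES[states[tree].upper()], 0
    left, right = tree
    ls, lsteps = fitch_site(left, states)
    rs, rsteps = fitch_site(right, states)
    common = ls & rs
    if common:
        # Children can agree, so no extra change is needed here.
        return common, lsteps + rsteps
    # No shared base: take the union and count one change.
    return ls | rs, lsteps + rsteps + 1


def parsimony_length(tree, alignment):
    """Sum Fitch steps over every column of a dict of aligned sequences."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    total = 0
    for i in range(lengths.pop()):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


tree = (("s1", "s2"), ("s3", "s4"))
alignment = {
    "s1": "ACGTA",
    "s2": "ACGTA",
    "s3": "ATGCA",
    "s4": "ATNNA",  # two unknown bases, as in an incomplete sequence
}
print(parsimony_length(tree, alignment))  # prints 2
```

The two "N" columns add only the changes that the complete sequences already force. The incomplete sequence is therefore not penalized on any tree. Cutting those columns from every sequence would throw away the variation the complete sequences show there.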
--
Joe Felsenstein joe at genetics.washington.edu (IP No. 128.95.12.41)
Dept. of Genetics, Univ. of Washington, Box 357360, Seattle, WA 98195-7360 USA