Sequence alignments -- where to chop?

higgins at ebi.ac.uk
Thu Apr 18 05:45:02 EST 1996

In article <wblank.1180078558B at news.srv.ualberta.ca>, wblank at gpu.srv.ualberta.ca (Panhead McNipper) writes:
> Just trying to figure out the most "appropriate" course of action...
> I want to place 16S sequences obtained in our lab on a tree with closely
> related sequences from GenBank, but one of them isn't complete (e.g. a
> string of about 60 Ns between nucleotides 940-1000).  Should I:
>     A) go ahead anyway?
>     B) cut the offending region out of all the sequences in the alignment?
>     C) leave the incomplete sequence out entirely?
> I plan to go ahead with option B (60 bases out of 1400-some isn't _too_ much
> info lost, is it?) but want to know how others treat this sort of thing.

Hi Wally:

There are advantages and disadvantages to all three :-).
c) is drastic but effective.  A similar argument applies to full length
sequences that are so distantly related to the rest that they largely contain 
regions that are unalignable.  You have to decide if the extra information is 
worth it or not.  
b) is conservative but nice and safe.  It is "good" because you compare like
with like in all sequences and you avoid "funny" problems that can arise
when you compare sequences of greatly different lengths.  Again, the same
argument can be applied to regions of alignment that are unalignable.  Such
regions occur in most alignments.   It is "bad" because you throw away
information.  In the case of blocks/regions that are unalignable, you lose
nothing and help protect yourself from artefacts of the alignment process
(automatic OR manual).  
a) In this case, the 60 out of 1400 nt should not make much difference.  Why
not try it and see?  If you use the "bootstrap" to measure support for the
different groupings, you can see the effect easily by running the analysis
with and without the block.
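
As a rough illustration of option b), here is a minimal Python sketch that
drops every alignment column in which any sequence carries an N.  The function
name and the toy alignment are made up for the example; they do not come from
any particular package, and a real 16S alignment would be read from a file
rather than typed in.

```python
def drop_ambiguous_columns(alignment, ambiguous="Nn"):
    """Return a copy of the alignment without any column that contains
    an ambiguous base in at least one sequence.

    `alignment` is a dict mapping sequence name -> aligned sequence string.
    (Hypothetical helper, written only for this illustration.)
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # Keep a column only if no sequence has an ambiguous base there.
    keep = [i for i in range(length)
            if not any(s[i] in ambiguous for s in seqs)]

    return {name: "".join(seq[i] for i in keep)
            for name, seq in zip(names, seqs)}


# Toy alignment: one lab sequence with a run of Ns, two complete ones.
toy = {
    "lab_isolate": "ACGTNNNACGT",
    "genbank_1":   "ACGTACGACGT",
    "genbank_2":   "ACGAACGAC-T",
}

trimmed = drop_ambiguous_columns(toy)
for name, seq in trimmed.items():
    print(name, seq)
```

Running the tree-building step once on `toy` and once on `trimmed` is the
"with and without the block" comparison described above.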

Personally, I like to use B and C where it is convenient, and you will find
that this is common practice in the literature.  Sometimes, however, you have no
choice but to include everything you can.  Just be careful :-).
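
For the bootstrap comparison mentioned under a), the resampling step itself is
simple: draw alignment columns at random, with replacement, until you have as
many as the original alignment.  The Python sketch below shows only that step,
under the assumption that the alignment is held as a dict of equal-length
strings; the function name, the toy data and the seed are invented for the
example, and the tree building on each replicate is left to whatever program
you normally use.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate of the alignment.

    Columns are sampled with replacement, and the same sampled columns
    are used for every sequence so that sites stay aligned.
    (Hypothetical helper, written only for this illustration.)
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in cols)
            for name in names}


toy = {
    "lab_isolate": "ACGTNNNACGT",
    "genbank_1":   "ACGTACGACGT",
    "genbank_2":   "ACGAACGAC-T",
}

rng = random.Random(1996)  # fixed seed so the run is repeatable
replicates = [bootstrap_columns(toy, rng) for _ in range(100)]

# Each replicate would now be passed to the tree-building program, and the
# fraction of replicate trees containing a grouping is its bootstrap support.
print(len(replicates), "replicates of length",
      len(replicates[0]["lab_isolate"]))
```

Doing this once for the full alignment and once for the alignment with the
block removed lets you compare the support values directly.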

Des Higgins


> Thanks much
> Wally
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> "The company of the future has only two employees:
>  a man
>  and a dog.
>  The job of the man is to feed the dog.
>  The job of the dog is to keep the man away from the computer." -- Chico
> It's all much appreciated.  Out!
