Des Higgins' excellent commentary

Steve Thompson: VADMS genetics THOMPSON at WSUVMS1.CSC.WSU.EDU
Thu Apr 8 10:51:43 EST 1993

Good Day folks -

Des Higgins' reply to McKenna [Message-Id: <9304081426.AA20368 at net.bio.net>]
concerning the construction of molecular phylogenies with "low homology"
sequences was SO BEAUTIFUL that it ought to be framed!  How often have I
repeated exactly the same arguments to my students and colleagues!?  So often
they want the one best (simple 8^) automated) way, the "right way," to perform
a phylogenetic analysis, when, as Des so clearly describes, the most realistic
analysis will usually involve a complex interplay of methods and MUCH reliance
on your own subjective BIOLOGICAL KNOWLEDGE.  Where have I heard this before? 
It rings so true.  His note will become required reading in our course.  I'm
enclosing a copy of Des' reply for those of you who missed it the first time.
					Have a good one, Steve T

                              Steven M. Thompson
            Consultant in Molecular Genetics and Sequence Analysis
VADMS (Visualization, Analysis & Design in the Molecular Sciences) Laboratory
           Washington State University, Pullman, WA 99164-1224, USA
          AT&Tnet:  (509) 335-0533 or 335-3179  FAX:  (509) 335-0540
                  BITnet:  THOMPSON at WSUVMS1 or STEVET at WSUVM1
                   INTERnet:  THOMPSON at wsuvms1.csc.wsu.edu

Des' reply ======================================


>My experience is that it is certainly possible technically but the results
>may not be very reliable.  If you do not have enough information to align
>sequences comfortably, trees are usually even more difficult.   Finding 
>close groupings will not be a problem, but the deep branches may be.
>I have generated trees where the identity levels dropped below 10 percent for 
>the most divergent pairs.  The trees were useful as long as I did not try to
>over-interpret the deepest branches. To get the trees you need very high 
>quality alignments (i.e. EVEN better than you get from clustal :-)).  These 
>have to be made with reference to structures if they are available.  Usually 
>structures are not available but you may still get parts of the sequences 
>aligned well by trying to match the more obvious looking secondary structure 
>elements.  This cannot yet be done automatically.  If you are lucky, you will
>find "blocks" of conserved segments with very few gaps, separated by regions 
>that are totally ambiguous.  These ambiguous pieces must be removed.   Some 
>parts of homologous proteins are simply unalignable from primary sequence 
>information alone.  You can guess at the alignment in these difficult parts 
>using an "algorithm" but the guess may not mean anything biologically.  
>If you use these badly guessed at pieces, then the tree topology may only 
>depend on how the guess was made.  
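Editor's note: the step Des describes (keep the conserved, nearly gap-free "blocks" and throw away the ambiguous stretches between them) is a judgment call made by eye, not by an algorithm. Once the blocks have been chosen, extracting them is mechanical. The sketch below assumes the alignment is a plain Python dict of name to aligned sequence and that the block boundaries were picked by hand; `keep_blocks` is a hypothetical helper, not part of PHYLIP, PAUP or Clustal.

```python
# Hypothetical helper (not from any package): keep only the alignment columns
# that fall inside conserved blocks chosen by eye.

def keep_blocks(alignment: dict[str, str],
                blocks: list[tuple[int, int]]) -> dict[str, str]:
    """Concatenate the chosen column ranges (0-based, end-exclusive) of every sequence."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    ncols = lengths.pop()
    for start, end in blocks:
        if not 0 <= start < end <= ncols:
            raise ValueError(f"block {(start, end)} outside columns 0..{ncols}")
    return {name: "".join(seq[start:end] for start, end in blocks)
            for name, seq in alignment.items()}


if __name__ == "__main__":
    aln = {
        "seqA": "MKV--LLGAQWERTY",
        "seqB": "MRVQ-LIGSKWDRSF",
        "seqC": "MKIEELLGQ-WEKTY",
    }
    # Suppose columns 0-2 and 10-14 were judged reliably aligned and the
    # gappy stretch in between was judged ambiguous and discarded.
    for name, seq in keep_blocks(aln, [(0, 3), (10, 15)]).items():
        print(name, seq)
```

The toy alignment and block coordinates are invented for illustration; the point is only that the tree is then built from the retained columns, so the "badly guessed at pieces" cannot drive the topology.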
>A further problem is how to treat gaps (insertions and deletions).  I have seen
>many cases where people include gaps in difficult alignments and score them as 
>characters (for parsimony or distances).   You may end up with the effect of 
>the gaps completely outweighing the aligned residues, in determining the 
>topology of the final tree.  If the tree was derived manually, then, in 
>effect, you are also manufacturing the tree topology manually.  One drastic 
>but clean solution is to remove all sites where any sequence has a gap.  This 
>may throw away half your data though.
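Editor's note: the "drastic but clean" option (drop every site where any sequence has a gap, often called complete deletion) can be sketched as follows. It assumes the same dict-of-aligned-sequences layout and treats both '-' and '.' as gap symbols, which is an assumption about the alignment format. The helper also reports how many columns were discarded, so you can see how much data the filter costs.

```python
GAP_CHARS = frozenset("-.")  # assumption: '-' and '.' are the gap symbols


def remove_gapped_sites(alignment: dict[str, str]) -> tuple[dict[str, str], int]:
    """Drop every column in which any sequence has a gap (complete deletion).

    Returns the trimmed alignment and the number of columns discarded.
    """
    seqs = list(alignment.values())
    ncols = len(seqs[0]) if seqs else 0
    if any(len(s) != ncols for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(ncols)
            if not any(s[i] in GAP_CHARS for s in seqs)]
    trimmed = {name: "".join(seq[i] for i in keep)
               for name, seq in alignment.items()}
    return trimmed, ncols - len(keep)


if __name__ == "__main__":
    aln = {
        "seqA": "MKV--LLGAQWERTY",
        "seqB": "MRVQ-LIGSKWDRSF",
        "seqC": "MKIEELLGQ-WEKTY",
    }
    trimmed, dropped = remove_gapped_sites(aln)
    print(f"discarded {dropped} of {len(aln['seqA'])} columns")
    for name, seq in trimmed.items():
        print(name, seq)
```

With many divergent sequences, nearly every column picks up a gap somewhere, which is exactly why this filter can remove a large share of the alignment.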
>If you do manage to generate a multiple alignment with enough conserved blocks
>and remove the nasty bits, actually generating a topology is the easy part
>(e.g. using bits of PHYLIP or PAUP).  Neighbor-Joining trees from distances are
>fast and you can bootstrap them easily but beware that you cannot use the 
>usual corrections for "multiple hits" on the distances if any of the
>sequence pairs are less than about 18% identical (over the aligned regions).   
>Des Higgins
>EMBL, Heidelberg, Germany.
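Editor's note: to see why multiple-hit corrections fail at low identity, the sketch below computes the uncorrected p-distance (fraction of differing residues over ungapped aligned positions) and two standard protein corrections: the Poisson correction d = -ln(1 - p) and Kimura's empirical approximation d = -ln(1 - p - 0.2p^2). These particular formulas are my choice for illustration, not necessarily the ones Des had in mind. Kimura's argument reaches zero at about 14.6% identity, and the corrected distance climbs very steeply before that point, which fits treating corrected distances as unreliable below roughly 18% identity. A one-replicate bootstrap resampler is included because bootstrapping a distance tree amounts to resampling columns, recomputing distances and rebuilding the tree many times; the tree building and consensus steps are left to programs such as PHYLIP's neighbor and consense. All function names here are hypothetical.

```python
import math
import random

GAP_CHARS = frozenset("-.")  # assumption: '-' and '.' are the gap symbols


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues, counted only where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b)
             if x not in GAP_CHARS and y not in GAP_CHARS]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(p: float) -> float:
    """Poisson correction for multiple hits: d = -ln(1 - p)."""
    if p >= 1.0:
        raise ValueError("Poisson correction undefined for p >= 1")
    return -math.log(1.0 - p)


def kimura_protein_distance(p: float) -> float:
    """Kimura's empirical protein correction: d = -ln(1 - p - 0.2 p^2)."""
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        raise ValueError(f"correction undefined at {100 * (1 - p):.1f}% identity")
    return -math.log(arg)


def bootstrap_columns(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One bootstrap replicate: resample alignment columns with replacement."""
    names = list(alignment)
    ncols = len(alignment[names[0]])
    picks = [rng.randrange(ncols) for _ in range(ncols)]
    return {n: "".join(alignment[n][i] for i in picks) for n in names}


if __name__ == "__main__":
    # How the corrected distance behaves as identity falls.
    for identity in (0.40, 0.25, 0.18, 0.15, 0.14):
        p = 1.0 - identity
        try:
            kimura = f"{kimura_protein_distance(p):.2f}"
        except ValueError:
            kimura = "undefined"
        print(f"identity {identity:.0%}: p = {p:.2f}, "
              f"Poisson = {poisson_distance(p):.2f}, Kimura = {kimura}")
```

Running it shows the Kimura distance jumping sharply between 18% and 15% identity and becoming undefined at 14%, while the Poisson value stays finite but grows without bound as p approaches 1; neither carries much information at that level of divergence.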

P.S. my apologies for taking up bandwidth with a repeated message, however, I
     feel that Des' message is definitely worth repeating. SMT

More information about the Mol-evol mailing list
