Ambiguity in RNA/DNA alignments

Dave Carmean carmean at sfu.ca
Wed Dec 20 11:17:52 EST 1995


Another aspect of alignment is whether to include ambiguously aligned
sequence. I suggest a method in PAUP that excludes portions of an
individual taxon's sequence from the analysis while keeping the ignored
sequence in the data matrix.

As shown in the file below, one may add "respectcase" to the FORMAT
command: this allows the ambiguous/unalignable data to be included in the
file (as lowercase) but ignored in the analysis, because each lowercase
symbol is equated to the missing-data symbol. (Currently MacClade does not
support the 'respectcase' option, but one can rapidly produce a data
matrix for MacClade from PAUP with all the ignored characters as '?'.)  By
equating the purines with each other and the pyrimidines with each other
(equate="A=G" and equate="T=C", or the R/Y recoding shown in brackets in
the file) one may do transversion parsimony.
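
For anyone preparing such a matrix outside PAUP, the two ideas above
(lowercase treated as missing, and purine/pyrimidine recoding for
transversion parsimony) can be sketched in a few lines of Python. This is
only an illustration under my own assumptions: the function names are
hypothetical, nothing here is a PAUP or MacClade command, and '?' is used
as the missing symbol as in the MacClade export described above.

```python
# Illustrative sketch only; these helper names are hypothetical, not PAUP commands.

def mask_lowercase(seq, missing="?"):
    """Replace lowercase (ambiguously aligned) bases with the missing symbol."""
    return "".join(missing if ch.islower() else ch for ch in seq)

# Purines (A, G) become R and pyrimidines (C, T) become Y, so only
# transversions remain visible as differences between sequences.
RY_TABLE = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y"})

def to_ry(seq):
    """Recode uppercase bases as R/Y; other symbols pass through unchanged."""
    return seq.translate(RY_TABLE)

row = "TTTTATCCTGCATTAACT TTACTATtagtaagtagt"  # LUCMTPIEA row from the file
masked = mask_lowercase(row)
print(masked)         # lowercase stretch replaced by '?'
print(to_ry(masked))  # same row reduced to purine/pyrimidine states
```

Masking is done before recoding so that the ignored stretch stays missing
rather than being turned into R/Y states.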

Cheers,
Dave

#NEXUS
[! CO1 Data][Comments in brackets are ignored by PAUP; the '!' next to the
left bracket makes the comment visible when the file is executed by PAUP]

 begin data;
 dimensions  ntax=6 nchar=36 [If you do not know the number of characters,
use a very large number here, place a '@' as the last character of the
last taxon, execute the file, and PAUP will generate an error message
reporting one more than the actual number of characters];
FORMAT     
    MISSING=N    respectcase  
  [Enclose the "equate...  =N" in brackets and re-execute file to produce
MacClade data matrix in PAUP]
   equate="a=N"     equate="c=N" equate=".=N"     equate="n=N"         
   equate="g=N"     equate="t=N"   equate="I=N" 
   [equate="A=R" equate="G=R"   equate="T=Y" equate="C=Y" ][Allows
Transversion Parsimony]
   SYMBOLS="ACGTacgtI" INTERLEAVE  [Don't interleave if using a PIR file
or a PHYLIP sequential file] 
   GAP=-;  OPTIONS IGNOR=INVAR;  
  matrix
DROMTTGNC  TACTACCCTGCTCTTTCT TTATTATTAGTAAGAAGA
     Dros  TACTATCCTGCTCTTTCT TTATTATTAGTAAGAAGA
 YMU09206  TATTATCCATCCTTAACa cTATTAATTTCTAGAAGA
LUCMTPIEA  TTTTATCCTGCATTAACT TTACTATtagtaagtagt [lower case ignored if
using the respectcase format]
 MSQNCATR  TATTACCCCTCTTTAACT CTTCTAATTTCTAGAAGT
     Apis  TACTTTCCCTCATTATTT ATACTTTTATTAAGAAAT   ;
  end;
begin assumptions;
charset begin = 1-5;  [Character sets (for excluding/including etc). This
one is named 'begin' and makes the first 5 characters a set]
charset various = 8 10 13-16;
charset first = 1-36\3;   [For protein-coding sequence: every third base, starting at position 1]
charset second = 2-36\3;
charset third = 3-36\3;
taxset  one = 1 3 5;    [Taxon sets; note that taxa may be referred to by
number or name]
taxset Diptera = DROMTTGNC LUCMTPIEA MSQNCATR Dros;
end;
begin PAUP;
outgroup Apis; [Automatically roots trees at Apis instead of the first
taxon when file is executed]
delete DROMTTGNC  ;  [Automatically excludes taxa or taxa sets]
exclude various  third ; [Automatically excludes characters or character sets]

  constraints   both_genera =  ((Dros,DROMTTGNC),(LUCMTPIEA, MSQNCATR));
[Constraint tree, only enforced when ticked in the search box]
end;
begin trees;
uTREE both_genera =  (((2,3),6),(4,5)) [Places tree in memory upon execution];
end;
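
To see which characters the charset definitions and the "exclude various
third" line in the file above actually cover, here is a rough Python
sketch that expands the "start-end\step" notation. The helper name is
hypothetical and it handles only the plain numeric forms used in this
file (single positions, ranges, and ranges with a \step), not set names
or other NEXUS shorthand.

```python
# Illustrative sketch only; expand_charset is a hypothetical helper.

def expand_charset(spec):
    """Expand a spec such as '8 10 13-16' or '1-36\\3' into 1-based positions."""
    positions = []
    for token in spec.split():
        rng, _, step = token.partition("\\")   # optional '\step' suffix
        start, _, end = rng.partition("-")     # optional '-end' part
        first = int(start)
        last = int(end) if end else first
        positions.extend(range(first, last + 1, int(step) if step else 1))
    return positions

various = expand_charset(r"8 10 13-16")
first = expand_charset(r"1-36\3")
third = expand_charset(r"3-36\3")

# Characters left in the analysis after 'exclude various third' (nchar=36).
excluded = set(various) | set(third)
retained = [p for p in range(1, 37) if p not in excluded]

print(first)
print(sorted(excluded))
print(len(retained))
```

Raw strings are used so that the backslash in the charset notation is
passed through literally.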


