Another aspect of alignment is whether to include ambiguously aligned
sequence. I suggest a method in PAUP that excludes portions of an
individual taxon's sequence from analysis while keeping the ignored
sequence in the data matrix.
As shown in the file below, one may add RESPECTCASE to the FORMAT command
and equate the lowercase symbols to missing data (N): the
ambiguous/unalignable data then stay in the file (as lowercase) but are
ignored in the analysis. MacClade does not currently support the
RESPECTCASE option, but PAUP can quickly produce a data matrix for
MacClade with all the ignored characters written as '?'. Transversion
parsimony is possible by pooling purines and pyrimidines, e.g. with
equate="A=G" and equate="T=C" (the file below shows an alternative R/Y
coding of the same idea in a commented-out equate line).
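The two recodings described above can be illustrated outside PAUP, for example when preparing a matrix for a program without RESPECTCASE support. This is a minimal Python sketch under stated assumptions: `mask_lowercase` and `transversion_recode` are hypothetical helpers (not PAUP or MacClade commands), '?' is assumed as the missing-data symbol, and the recoding maps A/G to R and C/T to Y as in the commented-out equate line of the file.

```python
# Hypothetical helpers (not PAUP or MacClade commands). They reproduce, outside
# PAUP, the effect of RESPECTCASE plus lowercase equates, and of R/Y recoding.

# Purines (A, G) -> R; pyrimidines (C, T) -> Y; other symbols pass through.
TRANSVERSION_CODE = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def mask_lowercase(seq: str, missing: str = "?") -> str:
    """Replace lowercase (to-be-ignored) bases with the missing-data symbol."""
    return "".join(missing if ch.islower() else ch for ch in seq)


def transversion_recode(seq: str) -> str:
    """Collapse A/G and C/T so that only transversions change the state."""
    return "".join(TRANSVERSION_CODE.get(ch, ch) for ch in seq)


if __name__ == "__main__":
    # LUCMTPIEA from the matrix in the file below, with its two blocks joined.
    luc = "TTTTATCCTGCATTAACT" "TTACTATtagtaagtagt"
    masked = mask_lowercase(luc)
    print(masked)                       # last 11 positions become '?'
    print(transversion_recode(masked))  # A/G -> R, C/T -> Y, '?' unchanged
```

Masking before recoding keeps the ignored positions as missing data rather than turning them into R or Y states.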
Cheers,
Dave
#NEXUS
[! CO1 Data][Comments in brackets are ignored by PAUP; the '!' next to the
left bracket makes the comment visible when PAUP executes the file]
begin data;
dimensions ntax=6 nchar=36 [If you do not know the number of characters,
enter a very large number here, place an '@' after the last character of
the last taxon, and execute the file: PAUP's error message will cite a
position one greater than the actual number of characters];
FORMAT
MISSING=N respectcase
[To produce a MacClade data matrix in PAUP, enclose the "equate... =N"
commands in brackets and re-execute the file]
equate="a=N" equate="c=N" equate=".=N" equate="n=N"
equate="g=N" equate="t=N" equate="I=N"
[equate="A=R" equate="G=R" equate="T=Y" equate="C=Y" ][Allows
Transversion Parsimony]
SYMBOLS="ACGTacgtI" INTERLEAVE [Omit INTERLEAVE if the data come from a
PIR file or a sequential PHYLIP file]
GAP=-; OPTIONS IGNOR=INVAR;
matrix
DROMTTGNC TACTACCCTGCTCTTTCT TTATTATTAGTAAGAAGA
Dros TACTATCCTGCTCTTTCT TTATTATTAGTAAGAAGA
YMU09206 TATTATCCATCCTTAACa cTATTAATTTCTAGAAGA
LUCMTPIEA TTTTATCCTGCATTAACT TTACTATtagtaagtagt [lower case ignored if
using the respectcase format]
MSQNCATR TATTACCCCTCTTTAACT CTTCTAATTTCTAGAAGT
Apis TACTTTCCCTCATTATTT ATACTTTTATTAAGAAAT ;
end;
begin assumptions;
charset begin = 1-5; [Character sets (for excluding/including, etc.). This
one is named 'begin' and makes the first 5 characters a set]
charset various = 8 10 13-16;
charset first = 1-36\3; [Every third base starting at base 1, i.e. first codon positions in protein-coding sequence]
charset second = 2-36\3;
charset third = 3-36\3;
taxset one = 1 3 5; [Taxon sets; taxa may be referred to by number
or by name]
taxset Diptera = DROMTTGNC LUCMTPIEA MSQNCATR Dros; end;
begin PAUP;
outgroup Apis; [Automatically roots trees at Apis instead of the first
taxon when file is executed]
delete DROMTTGNC ; [Automatically removes taxa or taxon sets from the analysis]
exclude various third ; [Automatically excludes characters or character sets]
constraints both_genera = ((Dros,DROMTTGNC),(LUCMTPIEA, MSQNCATR));
[Constraint tree; enforced only when the constraint option is checked
in the search dialog box] end;
begin trees; uTREE both_genera = (((2,3),6),(4,5)) [Places tree in
memory upon execution]; end;
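As an alternative to the '@' trick for finding nchar, and to check what the `\3` step notation in the charsets selects, the matrix and charsets can be inspected outside PAUP. This Python sketch is hypothetical (neither function is part of PAUP); it assumes one taxon per line in each interleaved block and handles only the charset forms used in this file: single numbers, ranges, and ranges with a \step.

```python
import re


def strip_comments(text: str) -> str:
    """Remove NEXUS [bracketed] comments, including ones spanning lines."""
    return re.sub(r"\[[^\]]*\]", "", text)


def count_chars(matrix_text: str) -> dict:
    """Total the sequence length of each taxon across interleaved blocks.

    Assumes one taxon per line: first token = taxon name, remaining
    whitespace-separated tokens = sequence. Comments and ';' are dropped.
    """
    totals = {}
    for line in strip_comments(matrix_text).replace(";", "").splitlines():
        tokens = line.split()
        if not tokens:
            continue
        name, seq = tokens[0], "".join(tokens[1:])
        totals[name] = totals.get(name, 0) + len(seq)
    return totals


def expand_charset(spec: str) -> list:
    r"""Expand a charset such as '8 10 13-16' or '1-36\3' to 1-based positions."""
    positions = []
    for part in spec.split():
        m = re.fullmatch(r"(\d+)(?:-(\d+))?(?:\\(\d+))?", part)
        if m is None:
            raise ValueError(f"unsupported charset element: {part!r}")
        start = int(m.group(1))
        end = int(m.group(2) or start)
        step = int(m.group(3) or 1)
        positions.extend(range(start, end + 1, step))
    return positions


if __name__ == "__main__":
    matrix = """
    Dros TACTATCCTGCTCTTTCT TTATTATTAGTAAGAAGA
    LUCMTPIEA TTTTATCCTGCATTAACT TTACTATtagtaagtagt [lower case ignored if
    using the respectcase format]
    Apis TACTTTCCCTCATTATTT ATACTTTTATTAAGAAAT ;
    """
    print(count_chars(matrix))          # every taxon should total 36
    print(expand_charset("1-36\\3"))    # first codon positions 1, 4, ..., 34
    print(expand_charset("8 10 13-16")) # the 'various' charset
```

Comments are stripped from the whole text before splitting into lines, so a bracketed comment that wraps onto the next line (as in the LUCMTPIEA row) does not get counted as sequence.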