Sequence alignment software
carmean at sfu.ca
Tue Apr 18 12:40:25 EST 1995
For the Macintosh, I suggest SeqApp:
Suggestions For Using SeqApp (Preliminary version 0.6)
Dave Carmean, Simon Fraser University
carmean at sfu.ca (Please tell me of better ways of doing things!)
SeqApp is a freeware program written by Don Gilbert and available by
anonymous ftp (Fetch) or gopher from ftp.bio.indiana.edu. SeqApp is a
powerful program: it is used for storing and aligning data, for writing
formatted files for PHYLIP, PAUP, and other analysis programs, and for
printing hard copies of alignments. It can also compare regions of
homology between two sequences (with DottyPlot) and send a sequence to BLAST
for similarity searches against all sequences in GenBank. However, it has
several bugs and unfinished features. Exploring new areas of the menu
can lead to a program or system crash, so always keep backup copies of
your files, and save your file regularly while working and before
doing anything new. These dire warnings are intended to keep your
expectations low so you will be pleasantly surprised when you use the program;
they are not intended to discourage use of the program. I rarely have any
problems with SeqApp. Also, there usually is more than one way to do
things, so please tell me if you learn a simpler method.
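DottyPlot's own algorithm is not described here. As a rough illustration of what a dot plot shows, the Python sketch below marks a dot wherever a window of `window` bases in one sequence matches the corresponding window in the other at `threshold` or more positions. The function names and default parameters are hypothetical.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 3, threshold: int = 3) -> list[list[bool]]:
    """Return a matrix where cell [i][j] is True if the window starting at
    seq_a[i] and the window starting at seq_b[j] share at least
    `threshold` identical bases (case-insensitive)."""
    a, b = seq_a.upper(), seq_b.upper()
    rows = max(len(a) - window + 1, 0)
    cols = max(len(b) - window + 1, 0)
    matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count position-by-position identities inside the two windows.
            matches = sum(1 for k in range(window) if a[i + k] == b[j + k])
            row.append(matches >= threshold)
        matrix.append(row)
    return matrix


def render(matrix: list[list[bool]]) -> str:
    """Draw the matrix as text: '*' for a dot, '.' for no dot."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)
```

For example, `print(render(dot_plot(x, y, window=10, threshold=7)))`: runs of '*' along a diagonal mark regions of similarity, and raising `threshold` relative to `window` filters out chance matches.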
A couple of hints (of general use for the Macintosh): if the program locks
up, do not hit keys randomly. Wait a while, then press command
(apple)-option-esc (three keys simultaneously). That will give you the
message "Force SeqApp to quit? Unsaved changes will be lost." Click
yes. This spares any other programs in progress. If nothing
happens, simultaneously press the power-on key on the keyboard, the
command (apple) key, and the ctrl key. This will restart the computer, and all
unsaved work will be lost.
The alignment window will only display ~2700 bp; the program will tell you
if you exceed this. It is still possible to work with the first 2700 base
pairs in the alignment window, and with all of the sequence in the edit windows.
SeqApp will not work under System 6 or on an SE.
I suggest keeping an extra sequence at the bottom of the alignment, because
sometimes a portion of the end of the last sequence is lost when saving.
It is worthwhile occasionally checking the alignment to make sure no portions
are corrupted, especially the ends. I have not had trouble with this
recently and do not know why I had this problem when I first used the
program.
Two Major Modes: Padlocked and unpadlocked (see the padlock near the upper left
corner of the align window). Double click on a sequence name to see that
sequence in an edit window. Click and then drag a sequence name up or
down to change the position of a taxon in the alignment. Under the edit
menu, 'Clear' will delete the entire sequence.
Padlocked: Allows aligning of sequences, either individually or in blocks.
Does not allow changing the sequence, only inserting or deleting gaps.
Highlight a block, click on an area within the block you wish to move to
another area, and drag it to that area (actually just to the left of it, or it
will overshoot). It will insert '~' into the gap. This '~' character and
periods are not locked, and will be swallowed instead of pushed while
aligning other areas. To lock the gaps, highlight the '~'s that
should be locked and choose "Lock indels" (insertion/deletion) under the
sequence menu. If you wish to lock the entire alignment
(recommended regularly), highlight a single base and choose "Lock
indels." If a taxon name is highlighted, only the indels of that sequence
will be locked; to undo the highlight, unpadlock the alignment, click
inside the alignment, and padlock the alignment again.
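The difference between unlocked gaps (swallowed) and other characters (pushed) can be illustrated with a toy model of inserting a gap into one row. This is not SeqApp's code: the function name is hypothetical, and it assumes that a locked gap is displayed as '-', which the text above does not state.

```python
UNLOCKED_GAPS = "~."  # gap characters that get swallowed, per the text above


def insert_gap(row: str, pos: int, n: int = 1) -> str:
    """Toy model: insert n '~' at index `pos` of one alignment row.

    Up to n unlocked gap characters ('~' or '.') to the right of the
    insertion point are swallowed, nearest first, so the rest of the row
    does not move. Any other character (a base, or a locked gap, assumed
    here to be '-') is pushed to the right instead.
    """
    head, tail = row[:pos], list(row[pos:])
    swallowed = 0
    k = 0
    while k < len(tail) and swallowed < n:
        if tail[k] in UNLOCKED_GAPS:
            del tail[k]  # swallow this unlocked gap
            swallowed += 1
        else:
            k += 1
    return head + "~" * n + "".join(tail)
```

With `insert_gap("ACG~~TT", 1)` the row keeps its length ("A~CG~TT"), while `insert_gap("ACG--TT", 1)` grows to "A~CG--TT". In this model, locking is what keeps earlier gap placements from being absorbed by later drags.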
Never use the drag-alignment option with several sequences when some of
the sequences do not reach the area being dragged. SeqApp normally will
add tildes (in the middle of a sequence) or periods (at the beginning of a
sequence), but if there is no sequence it will add garbage, and it may not
be possible to undo. For example, if you are working towards the end of a
1000 bp alignment, one of the taxa has only a few hundred base pairs of
sequence, and you wish to insert a gap in the alignment, either place the
unfinished taxon outside of the working area (at the top or bottom) or
add placeholders ("-", "~", ".", or n's) at the end of the short
sequence.
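If the sequences are exported and edited outside SeqApp, padding short taxa to the full alignment length is easy to script. A minimal Python sketch, assuming the alignment is held as a dict of name to row string (a hypothetical representation); '.' is the default filler, but any of the placeholders above can be passed.

```python
def pad_rows(rows: dict[str, str], fill: str = ".") -> dict[str, str]:
    """Pad every row on the right with `fill` up to the length of the longest row."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    width = max((len(seq) for seq in rows.values()), default=0)
    return {name: seq.ljust(width, fill) for name, seq in rows.items()}
```

For example, `pad_rows({"a": "ACGT", "b": "AC"})` gives `{"a": "ACGT", "b": "AC.."}`.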
Unpadlocked: Allows editing of the actual sequence, cutting and pasting,
searching for sequence (but note the warning that follows), and changing the case of a
block of a single sequence. I know of no way to quickly delete a block or
change the case of a block spanning several sequences; one can do this in a PAUP-formatted
file in Microsoft Word, using the option key and the mouse to highlight a
rectangular block of data.
NEVER use (apple)-F or "Find..." under the edit menu. To search for
any string of bases (say gtttaa) with the sequences unpadlocked,
highlight the string to be searched for (it may be typed or pasted at the
beginning of a sequence and then deleted), then go to the edit menu with
the mouse and choose Find "gtttaa"; then place the pointer anywhere in
the alignment and use (apple)-G (find again).
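Outside SeqApp, the same kind of search (every place a string of bases such as gtttaa occurs) can be sketched in a few lines of Python. The helper name is hypothetical. It searches case-insensitively and, as an assumption beyond what SeqApp is described as doing, skips the gap characters '~', '.', and '-' so that a motif interrupted by an alignment gap is still found; reported positions refer to the gapped row.

```python
GAP_CHARS = set("~.-")


def find_motif(row: str, motif: str) -> list[int]:
    """Return the 0-based positions in `row` (an aligned, possibly gapped
    sequence) where `motif` starts, ignoring case and gap characters."""
    # Keep only real bases, remembering where each one sits in the gapped row.
    positions = [i for i, ch in enumerate(row) if ch not in GAP_CHARS]
    bases = "".join(row[i] for i in positions).lower()
    target = motif.lower()
    hits = []
    start = bases.find(target)
    while start != -1 and target:
        hits.append(positions[start])
        start = bases.find(target, start + 1)  # allow overlapping matches
    return hits
```

For example, `find_motif("AAGTT~TAACC", "gtttaa")` returns `[2]` even though a gap interrupts the motif.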
Initially Putting Sequence into the program:
SeqApp can read many different formats, so you may import files directly
from GenBank, GCG, Fasta, PAUP (some limitations), and PHYLIP (few
limitations). One may open any text file by dragging it over the SeqApp
icon (I keep an alias of SeqApp and other major programs on the desktop
expressly for this). To add sequence to an existing alignment, choose Open
under the file menu and hold down the shift key while double clicking on
a sequence, or click on the append box. New sequences are always placed at
the bottom of the alignment. Be forewarned that GenBank comments are not
saved, even in GenBank format (the recommended format), though I find
Pearson/FastA to be convenient.
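Because Pearson/FastA is a convenient interchange format, here is a minimal Python sketch for reading one, for example to check a file before opening it in SeqApp. It handles only the basics (a '>' title line followed by sequence lines) and takes the first word of the title as the name, which is an assumption rather than a rule of the format; the function name is hypothetical, and a repeated name overwrites the earlier record.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse Pearson/FastA text into {name: sequence}.

    The name is the first word after '>'; sequence lines are joined with
    all whitespace removed. Text before the first '>' is ignored.
    """
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            title = line[1:].split()
            name = title[0] if title else ""
            chunks = []
        elif name is not None and line:
            chunks.append("".join(line.split()))
    if name is not None:
        records[name] = "".join(chunks)
    return records
```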
Making an Alignment:
While SeqApp has elegant tools for manual alignment, Clustal will
automatically produce an initial alignment that can be refined in SeqApp.
Use Pearson/FastA format for the input file to Clustal. I suggest tying down
the ends of the alignment by changing all periods (which Clustal ignores) to
N's, and setting the Clustal output file format to PIR.
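The period-to-N step can be scripted if the file is prepared outside SeqApp. A minimal Python sketch with hypothetical helper names: it changes every '.' to 'N' as suggested above and writes Pearson/FastA text for Clustal. The 60-character line width is an arbitrary choice.

```python
def periods_to_n(rows: dict[str, str]) -> dict[str, str]:
    """Replace every '.' with 'N' so the ends are not ignored by Clustal."""
    return {name: seq.replace(".", "N") for name, seq in rows.items()}


def write_fasta(rows: dict[str, str], width: int = 60) -> str:
    """Format {name: sequence} as Pearson/FastA text, `width` characters per line."""
    out = []
    for name, seq in rows.items():
        out.append(">" + name)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + "\n"
```

For example, `write_fasta(periods_to_n({"a": "..ACGT.."}))` gives the text `>a` followed by `NNACGTNN`.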
Making a PAUP file and Getting a Printed Alignment:
One may use prettyprint for printing, but I do it by saving the file in
PAUP format and then printing it in a word processor. Change to PAUP
format (top right of the align window) and highlight the area to be printed.
All sequences must be long enough to fill the highlighted area, as SeqApp
cannot make a PAUP file with some sequences shorter than the alignment. Under
the sequence menu first choose 'Lock indels', then under file choose 'Save
Selection...', change the filename, and save. If some of the
highlighted area has no sequence you will get an error message; fill in
those sequences with '.'.
For printing, open the saved file in Word, highlight all ((apple)-A), and
set the font to Courier 9 point. Under the file menu choose Page Setup and
change the orientation to sideways (landscape); then choose Print Preview,
drag the left margin to 0.5" (1.3 cm), and drag the page number. You may
want to delete the header, add a paragraph return before each occurrence
of the first taxon to separate the pieces of alignment, and strategically
insert page breaks to avoid breaking up a clump of alignment.
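Adding a paragraph return before each occurrence of the first taxon can also be done on the saved text before it is opened in Word. A sketch in Python, assuming each matrix line of the saved file begins with the taxon name followed by whitespace (an assumption about the layout, not something stated above); the function name is hypothetical.

```python
def separate_blocks(lines: list[str], first_taxon: str) -> list[str]:
    """Insert an empty line before every line whose first word is
    `first_taxon`, unless it is the very first line or is already
    preceded by an empty line."""
    out: list[str] = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == first_taxon and out and out[-1] != "":
            out.append("")
        out.append(line)
    return out
```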
For use in PAUP, change the header so that "missing=;" reads
"missing=- gap=. ;" and (if in Word) save the file as a 'Text Only' type.
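A PAUP data block with that header can also be generated directly. The Python sketch below is a guess at a minimal NEXUS layout (one line per taxon, not interleaved), not SeqApp's exact output. The `missing=-` and `gap=.` settings come from the text above, and the function name is hypothetical. Like SeqApp, it refuses to write rows of unequal length.

```python
def write_paup(rows: dict[str, str]) -> str:
    """Return a minimal NEXUS/PAUP data block for equal-length aligned rows."""
    lengths = {len(seq) for seq in rows.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be the same length; pad short ones with '.'")
    nchar = lengths.pop()
    # NEXUS reads '_' as a blank, so spaces in taxon names become underscores.
    names = [name.replace(" ", "_") for name in rows]
    pad = max(len(n) for n in names) + 2
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(rows)} nchar={nchar};",
        "  format datatype=dna missing=- gap=.;",
        "  matrix",
    ]
    for name, seq in zip(names, rows.values()):
        lines.append("  " + name.ljust(pad) + seq)
    lines += ["  ;", "end;"]
    return "\n".join(lines) + "\n"
```

Writing the file as plain text this way has the same effect as saving from Word as 'Text Only'.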
Translating Sequence to Amino Acid
Translation divides the sequence into triplets starting with the first
character and ignores any triplets containing non-coding characters
(including gaps and unknowns). In the sequence window, highlight the names
of the taxa you wish to translate and choose Translate under the sequence
menu. It is not possible to translate only a portion of a sequence.
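The rule described above (fixed triplets from the first character, with any triplet containing a gap or unknown skipped) can be sketched in Python. This assumes the standard genetic code and reads "ignores" as dropping such triplets from the output entirely; SeqApp may treat these cases differently, and the names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate in frame 1, skipping any triplet that contains a gap or
    an ambiguity code; '*' marks a stop codon. A trailing partial
    triplet is dropped."""
    s = seq.upper().replace("U", "T")
    protein = []
    for start in range(0, len(s) - 2, 3):
        codon = s[start:start + 3]
        if codon in CODON_TABLE:  # only pure A/C/G/T triplets are in the table
            protein.append(CODON_TABLE[codon])
    return "".join(protein)
```

Because the triplets are fixed from the first character, a gap that is not a multiple of three long shifts the reading frame of everything after it, just as in the rule above.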