notes on a sequence analysis project
Jeremy John Ahouse
ahouse at hydra.rose.brandeis.edu
Wed Jun 23 12:30:48 EST 1993
Notes on a sequencing project.
Now and again there are questions to this group about which software to
use for molecular evolution projects. We just finished one and I thought I
would share with you a description of the tools that we used.
Machine: Mac II series (System 7.1 with 8 megs) with access to the net
via ethernet through a Unix system with a POP server that allows mail to be
sent and received and access to a news server. LaserWriter for output.
-program <cost>: what it does/did for me.
-Eudora <free>: receiving mail, sending off BLAST requests. This is a
wonderful mail program. You will need the latest MacTCP, and I don't know
what Eudora's current deal with Apple is about TCP. Not including TCP with
system software is an irritating policy from Apple.
-Turbogopher <free>: getting lots of software, searching GenBank, PIR,
SwissProt,... This is a wonderful application, and if you have access to
the internet and are not using this ... then start. Just the ability to
search all of the sequence databases makes it worth it, and this is a tiny
part of what this program gives you access to.
-NewsWatcher <free>: to read newsgroups like this one. Reading news
groups is still a tough thing to recommend. It takes lots of time, but you
can learn a lot. I have a complicated relationship with news groups.
-SeqApp <free>: This application does many things. I used it primarily
for sequence alignment (SeqApp can align using ClustalV and also by hand).
SeqApp allows the amino acids to be colored (and the new version allows
you to choose any color). This (if you are not color blind - if you are
you can use gray scales) makes seeing the alignments much easier.
SeqApp now also supports CAP for contig alignments so you could use it
easily for the initial sequencing. We did not do it this way, instead we
used GCG at that point.
You can get to email and gopher from here, though I find Turbogopher's
speed too addictive, and Eudora's feature set too complete. Still from
SeqApp you can easily send preformatted blast searches and genbank
This program is in development. This means: get the latest version,
save frequently and don't get mad if something crashes, just try to write
down as best as you remember what caused the crash and send a note to the
author. We all benefit from people taking on this kind of task. I found
that I needed to remove the control panel superClock to avoid a conflict.
You may find things as well.
-HyperCard <used to be free, now ~100 clams>: The HyperTalk language is
the best way to write programs that manipulate text on the Macintosh. The
language supports a good array of primitives to manipulate text in
At one point I wanted to shuffle a short sequence (100 times) and
compare the BLAST result for all of those shuffled sequences. So I let HC
do the shuffling and preformatting of the mail messages for me (since BLAST
requires a separate request for each search). It was also trivial to throw
together a stack (the name of HC documents) that calculates sequence
identities for groups of sequences.
-AxoCalc <free>: A C, Pascal, Fortran, Basic Interpreter for
calculating things that need short programs. This is a remarkable tool,
and I encourage you to look at it. It is so handy to have C at my
fingertips this way.
-BBEdit Lite <free>: a wonderful text editor that supports GREP so you
can do interesting searches. I used BBEdit Lite in conjunction with Word
to format my sequences. BBEdit will display your sequences without
wrapping so if you use a monospace font you can look at all of you
-Anarcho <free>: this is a very minimalist text editor that a friend of
mine is developing. It should be available within a week on the net. I
used it mostly as a place to store and manipulate text when I didn't want
to take up too much memory - a problem that can happen when all of these
apps are doing their work.
-PAUP for Mac <50-100 clams>: Phylogenetic Analysis Using Parsimony
v3.1.1. This is a very complete package that works seamlessly with
-MacClade <~70 clams with a good book>: manipulating trees, and
calculating Most Parsimonious Reconstructions (inferring the states of
hypothetical taxonomic units = the internal nodes). A nice piece of work,
with very nice output. I tried working with the output in FreeHand and had
some problems, so I suggest Canvas (as the authors themselves do).
-MSWord <highly variable - try to get the student bundle>: used for
final output of multiple sequences. Has a wonderful option-click feature
that allows copying and deleting text in columns.
-Canvas <~260 clams>: used to manipulate MacClade output for
publication. This allowed me to make very nice figures, which I had image
set at 1100 dpi at a service bureau. This isn't necessary but it was nice.
-Authorin <free>: used to prepare sequences for submission (via email)
to GenBank. The current version is 3.0 and it won't(???!!!) read files
made in previous versions. Using it to format your sequences before you
send them in is appreciated.
-Nentrez <free but provisional>: we were lucky to have access to Net
Entrez from NCBI for the duration of this project. It made retrieving
citations for sequences much easier. (Why doesn't SwissProt include the
names of articles?)
-EndNotePlus <145 clams>: a wonderful way to deal with references.
This program works well with a number of word processors. We used it with
Word. It will help you format your paper for whatever journal you are
submitting to. (Why are there more than 3 (short, long, and exhaustive)
reference types anyway???)
-Fetch <free>: an easy way to do FTP from your Mac. One day we will be
able to use Gopher for all of these things, but now there are still things
that you will need Fetch for. (And while I am at it, if you are really
going to surf the net regularly make sure to get Stuffit Expander, DeHQX,
-Easy View <free>: I only used this at the end of the project, but it
shows real promise. This application allows you to quickly and easily view
any text file that is stored in a particular folder. If you are doing many
comparisons or have sequenced many genes you will have a folder full of
formatted text files. Easy View allows you to move between them very
Some final comments. It is wonderful to be able to move between all of
these programs. This was facilitated because many of them use or support
text-based files. It is far safer and easier to have your data stored as
text files than in some proprietary format. (Thank you SeqApp, MacClade, and
PAUP). This means you can write programs to manipulate your data (e.g.
HyperCard) and if the company that makes your seq software goes extinct, or
no longer supports your machine, or wants many clams for the upgrade you
are not stuck.
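
As an illustration of the kind of small text-manipulating program described
under HyperCard above (shuffling a short sequence 100 times, and calculating
identities for a group of sequences), here is a minimal sketch. It is written
in Python rather than HyperTalk and is not the stack that was actually used.
The function names, the example sequences, and the gap convention (a "-"
character, with columns gapped in both sequences skipped) are all assumptions
made for illustration. Formatting the BLAST mail requests is left out because
the request format is not described here.

```python
import random
from itertools import combinations


def shuffle_sequence(seq, rng):
    """Return a random permutation of the residues in seq.

    The composition of the sequence is preserved; only the order changes.
    """
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def percent_identity(a, b):
    """Percent of identical positions between two aligned sequences.

    Assumption: both sequences come from the same alignment (equal length)
    and gaps are written as "-". Columns gapped in both are not counted.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pairwise_identities(named_seqs):
    """Map each pair of sequence names to the percent identity of the pair."""
    return {
        (n1, n2): percent_identity(s1, s2)
        for (n1, s1), (n2, s2) in combinations(named_seqs.items(), 2)
    }


if __name__ == "__main__":
    rng = random.Random(1993)  # fixed seed so a run can be repeated
    query = "MKTAYIAKQR"  # hypothetical short sequence
    shuffled = [shuffle_sequence(query, rng) for _ in range(100)]
    print(len(shuffled), "shuffled copies, first:", shuffled[0])

    aligned = {"seqA": "ACGT-A", "seqB": "ACGA-A", "seqC": "AC-TTA"}
    for pair, ident in pairwise_identities(aligned).items():
        print(pair, round(ident, 1))
```

Because the input and output are plain strings, the same functions work on
sequences pasted out of any of the text-based programs listed above.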
Jeremy John Ahouse
Center for Complex Systems and Biology Dept
Waltham, MA 02254-9110
email: ahouse at hydra.rose.brandeis.edu
Mail from Mac by Eudora 1.3.1 - RIPEM accepted