Software to graph sequence alignments
cbkfr01 at mailserv.zdv.uni-tuebingen.de
Fri Mar 5 03:14:31 EST 1993
In <9303020004.AA14577 at umailsrv0.UMD.EDU> MAILER-DAEMON at UMAILSRV0.UMD.EDU (Mail Delivery Subsystem) writes:
>I guess that there are very few molecular biologists who have never
>had the need to produce an optimal sequence alignment or at least
>needed to look carefully at one. There are indeed many different
>programs available that optimise these sequence alignments, the
>output of which is usually a series of aligned character strings
>with gaps in appropriate places and special characters to denote
>identity or similarity.
>What I'm looking for is a program that will take a pair of aligned
>sequences and plot graphically either the similarity or identity,
>using a specifiable window size, along the entire length of an
>alignment. By doing this it would be far easier to see the regions
>of high or low similarity/identity and see their relative location
>along the length of the alignment.
>Does anyone know of a program to do this ???
Faced with the problem of producing a graphical display of
aligned sequences that makes the more and the less similar
regions easy to see, I devised a method which displays the degree of
similarity in shades of gray. Each aligned sequence is therefore
presented as a bar filled with stripes of different darkness. It
provides a very compact graphic result, which is also well suited
for sequence presentation in a talk. (This opinion is of course
extremely biased, because it's just my own. For your own
impression have a look at my paper in J. Cell Biol. 144, 443-453
(1991).) The display is not intended for short chunks of sequence
(isolated domains, etc.), which reasonably allow a residue-by-residue
display (as provided, e.g., by Malined), but for the comparison of
whole proteins, the stuff which normally provokes a groan from the
audience when a slide with thousands of aligned letters appears.
However, I did not produce the graphics with one slick,
easy-to-use program which could be offered to and used by others, but
instead wrote some small Hypercard tools for parts of the process
and used Excel and Word (for a lot of search and replace) for
the intermediate steps. In short, a tedious procedure.
I started to integrate the whole process into a single
(Hypercard) program and at the same time to add more flexibility.
I put the project on hold (the total scheme and some modules
(Hypercard Xternals for the time-critical steps) are waiting in
the drawer) because I realized that there might already be
programs around which can perform this task, possibly from a real
programmer who hacked them together in a flash.
This newsgroup is probably the ideal platform to find out if this
is true, or if somebody else would step forward to do it (it
would take ME a VERY long time), and if there is really a demand
for it. I only hope that the bionet programming gurus are reading
my post, too.
I have no access to the GCG package, so I wonder whether
PlotSimilarity, which has been proposed by Steve Thompson, really
performs the same task, summing up over a stretch of aa residues,
or whether its graphic output instead shows single, separate residues?
Here is a short description of my project (which would be a Macintosh
Hypercard stack :-):
A set of prealigned sequences (ClustalV output in NBRF/PIR
format seems a good standard) is first compared pairwise. The
comparison can either check for identity or use amino acid
cross tables, which allows similarities to be taken into account. In the
next step, the values of identity/similarity are summed up in a
sliding window. The output at this step could also be used to
produce a line or area graph with a suitable graphing program.
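As a rough illustration, the pairwise comparison and sliding-window steps
could be sketched in Python as follows. This is not the Hypercard code
described here: the function names, the "-" gap character, scoring gap
columns as 0, the dict-based cross table with made-up values, and
reporting the window mean rather than the plain sum are all assumptions
made for the example.

```python
# Illustrative sketch only; names, gap handling and the toy table are assumptions.

def pair_scores(seq_a, seq_b, table=None, gap="-"):
    """Score each column of two prealigned sequences.

    Identical residues score 1. If a cross table (dict keyed by residue
    pairs) is given, non-identical pairs get the table value, else 0.
    Columns with a gap in either sequence score 0.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            scores.append(0)
        elif a == b:
            scores.append(1)
        elif table is not None:
            # Look the pair up in either order; unknown pairs score 0.
            scores.append(table.get((a, b), table.get((b, a), 0)))
        else:
            scores.append(0)
    return scores


def sliding_window(scores, window):
    """Mean score over every run of `window` consecutive columns."""
    if not 1 <= window <= len(scores):
        raise ValueError("window must be between 1 and the alignment length")
    total = sum(scores[:window])
    means = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]  # slide by one column
        means.append(total / window)
    return means


if __name__ == "__main__":
    a, b = "MKV-LAT", "MKIALAS"
    toy_table = {("V", "I"): 1, ("T", "S"): 1}  # made-up values, not a real matrix
    print(sliding_window(pair_scores(a, b), 3))
    print(sliding_window(pair_scores(a, b, toy_table), 3))
```

The list returned by sliding_window holds one value per window position,
so it can be handed directly to a plotting program for a line or area graph.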
The result is converted into a PostScript file describing the
corresponding gray bars, including the names of the sequences
and optionally a scale. At that stage, a threshold value for the minimal
degree of identity can be chosen, which emphasizes the regions of
homology. This output is in Illustrator format and thus can be
further processed or included in a paper.
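The gray-bar output step might be sketched like this in Python. The
sketch writes generic PostScript Level 2 (using the rectfill and
rectstroke operators), not the Illustrator-specific format mentioned
above. The function name, the page coordinates, the cell size, and the
mapping "value 1.0 = black, below threshold = white" are assumptions
chosen for illustration, with window values assumed to lie between 0 and 1.

```python
# Illustrative sketch only; layout numbers and the gray mapping are assumptions.

def gray_bar_postscript(name, values, threshold=0.0,
                        x0=100, y0=700, cell=2, height=12):
    """Return PostScript text drawing one labelled bar of gray stripes.

    Each value (0..1) becomes a stripe `cell` points wide. Higher values
    are darker; values below `threshold` are left white.
    """
    # Escape characters that are special inside a PostScript string.
    label = name.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    lines = [
        "%!PS-Adobe-3.0",
        "/Helvetica findfont 10 scalefont setfont",
        f"{x0 - 90} {y0 + 2} moveto ({label}) show",
    ]
    for i, v in enumerate(values):
        v = min(max(v, 0.0), 1.0)                 # clamp to 0..1
        gray = 1.0 if v < threshold else 1.0 - v  # setgray: 0 = black, 1 = white
        lines.append(
            f"{gray:.2f} setgray {x0 + i * cell} {y0} {cell} {height} rectfill"
        )
    # Thin black outline around the whole bar.
    lines.append(f"0 setgray {x0} {y0} {len(values) * cell} {height} rectstroke")
    lines.append("showpage")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(gray_bar_postscript("seqA", [1.0, 0.5, 0.2], threshold=0.3))
```

Several sequences could be stacked by calling the function with different
y0 values and merging the output before the final showpage.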
I would really like to produce that program (though it would take
me more time than I should reasonably spend on it), but I would
hate to find out later on that the problem either has long been
solved or that nobody really has a requirement for the gray-bar
stuff. And if somebody could do it faster and in a more
platform-independent way (but hey, everybody needs a Mac), he should speak now...
I hope for some enlightening response.
Kai-Uwe Froehlich, kaifr at mailserv.zdv.uni-tuebingen.de;
Physiologisch-chemisches Institut, Hoppe-Seyler-Str. 4,
7400 Tuebingen, Germany