Software to graph sequence alignments

Kai-Uwe Froehlich cbkfr01 at mailserv.zdv.uni-tuebingen.de
Fri Mar 5 03:14:31 EST 1993


In <9303020004.AA14577 at umailsrv0.UMD.EDU> MAILER-DAEMON at UMAILSRV0.UMD.EDU (Mail Delivery Subsystem) writes:

>G'day all,

>I guess that there are very few molecular biologists who have never
>had the need to produce an optimal sequence alignment or at least
>needed to look carefully at one. There are indeed many different
>programs available that optimise these sequence alignments, the
>output of which is usually a series of aligned character strings
>with gaps in appropriate places and special characters to denote
>identity or similarity.

>What I'm looking for is a program that will take a pair of aligned
>sequences and plot graphically, using a specifiable window size,
>either the similarity or the identity along the entire length of an
>alignment. By doing this it would be far easier to see the regions
>of high or low similarity/identity and their relative location
>along the length of the alignment.

>Does anyone know of a program to do this ???

>BILL

Confronted with the problem of producing a graphical output of 
aligned sequences that makes the more and the less similar regions 
easy to see, I devised a method which displays the degree of 
similarity in shades of gray. Each aligned sequence is therefore 
presented as a bar filled with stripes of different darkness. This 
gives a very compact graphic result, which is also well suited 
for presenting sequences in a talk. (This opinion is of course 
extremely biased, because it's just my own. For your own 
impression, have a look at my paper in J. Cell Biol. 114, 443-453 
(1991).) The display is not intended for short stretches of sequence 
(isolated domains, etc.), which reasonably allow for a residue-by- 
residue display (as provided, e.g., by Malined), but for the 
comparison of whole proteins, the kind of material that normally 
provokes a groan from the audience when a slide with thousands of 
aligned letters is presented.

However, I did not produce the graphics with one slick, easy-to-use 
program that could be offered to and used by others. Instead I 
wrote some small Hypercard tools for parts of the process and used 
Excel and Word (for a lot of search and replace) for the 
intermediate steps. In short, a tedious procedure.

I started to integrate the whole process into a single 
(Hypercard) program and at the same time to add more flexibility. 
I put the project on hold (the overall scheme and some modules, 
Hypercard Xternals for the time-critical steps, are waiting in 
the drawer) because I realized that there might already be 
programs around that can perform this task, possibly written by a 
real programmer who hacked them together in a flash.

This newsgroup is probably the ideal platform to find out whether 
this is true, whether somebody else would step forward to do it (it 
would take ME a VERY long time), and whether there is really a 
demand for it. I only hope that the bionet programming gurus are 
reading my post, too. 

I have no access to the GCG package, so I wonder: does 
PlotSimilarity, which was suggested by Steve Thompson, really 
perform the same task, summing up over a stretch of amino acid 
residues, or does its graphic output rather show single, separate 
residues?

Here is a short description of my project (which would be a 
Macintosh Hypercard stack :-):
A set of prealigned sequences (ClustalV output in NBRF/PIR 
format seems a good standard) is first compared pairwise. The 
comparison can either check for identity or use amino acid cross 
tables (substitution matrices) to take similarities into account. 
In the next step, the identity/similarity values are summed up in a 
sliding window; the output at this step could also be used to 
produce a line or area graph with a suitable graphing program. 
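
A minimal Python sketch of the comparison and windowing steps 
described above, assuming two equal-length aligned sequences with 
'-' or '.' as gap characters; it is not the planned Hypercard 
stack. The PIR reader handles only the simple layout of a 
'>P1;NAME' header, one title line, and sequence lines ending in '*'. 
The residue groups in SIMILAR_GROUPS are a hypothetical stand-in for 
a real amino acid cross table: two residues count as similar if they 
fall into the same group. Instead of a raw sum, each window value is 
the fraction of matching columns in a window centred on that 
position, so the bars do not fade artificially at the sequence ends.

```python
from itertools import accumulate

# Hypothetical residue groups standing in for an amino acid cross table:
# two different residues in the same group are scored as "similar".
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
GROUP_OF = {aa: i for i, grp in enumerate(SIMILAR_GROUPS) for aa in grp}
GAPS = "-."


def read_pir(text):
    """Parse simple NBRF/PIR records: '>P1;NAME', a title line, sequence ending in '*'."""
    records = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if line.startswith(">"):
            name = line[1:].split(";", 1)[-1].strip()
            i += 2  # skip the header and the free-text title line
            seq = []
            while i < len(lines) and not lines[i].startswith(">"):
                seq.append(lines[i].strip())
                i += 1
            records[name] = "".join(seq).replace(" ", "").rstrip("*")
        else:
            i += 1
    return records


def column_scores(a, b, mode="identity"):
    """Score each alignment column 1 (identical, or similar in 'similarity' mode) or 0."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for x, y in zip(a.upper(), b.upper()):
        if x in GAPS or y in GAPS:
            scores.append(0)  # a gap never counts as a match
        elif x == y:
            scores.append(1)
        elif mode == "similarity" and GROUP_OF.get(x, -1) == GROUP_OF.get(y, -2):
            scores.append(1)  # unknown residues get different defaults, so never match
        else:
            scores.append(0)
    return scores


def sliding_window(scores, window):
    """Fraction of matching columns in a window centred on each column.

    The window should be odd; an even value is effectively rounded up by one.
    """
    half = window // 2
    n = len(scores)
    prefix = [0, *accumulate(scores)]  # prefix[k] == sum(scores[:k])
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)  # window clipped at the ends
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out
```

For example, sliding_window(column_scores(recs["SEQA"], 
recs["SEQB"], "similarity"), 21) would give one value between 0 
and 1 per alignment column, where recs is the dictionary returned 
by read_pir; SEQA, SEQB and the window size of 21 are only 
illustrative.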

The result is converted into a PostScript file describing the 
corresponding gray bars, including the names of the sequences 
and, optionally, a scale. At that stage, a threshold value for the 
minimal degree of identity can be chosen, which emphasizes the 
regions of homology. This output is in Illustrator format and thus 
can be further processed or included in a paper.
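
The gray-bar output could be sketched as follows, again in Python 
and under stated assumptions. Window values (0 to 1) are quantised 
into a hypothetical number of gray grades, values below the 
threshold are drawn white, and adjacent columns with the same gray 
are merged into one stripe. This writes plain PostScript (using the 
Level 2 operators rectfill and rectstroke), not Adobe Illustrator 
format, and it omits the optional scale; the layout parameters 
(column width, bar height, margins) are arbitrary choices.

```python
def gray_levels(values, threshold=0.0, grades=8):
    """Map window scores (0..1) to PostScript gray values (0 = black, 1 = white)."""
    if grades < 2:
        raise ValueError("need at least two gray grades")
    grays = []
    for v in values:
        if v < threshold:
            grays.append(1.0)  # below threshold: leave the stripe white
        else:
            step = round(v * (grades - 1)) / (grades - 1)
            grays.append(1.0 - step)  # higher similarity -> darker stripe
    return grays


def stripes(grays):
    """Merge runs of equal gray into (start_column, length, gray) stripes."""
    runs = []
    for i, g in enumerate(grays):
        if runs and runs[-1][2] == g:
            start, length, _ = runs[-1]
            runs[-1] = (start, length + 1, g)
        else:
            runs.append((i, 1, g))
    return runs


def ps_escape(text):
    """Escape the characters that are special inside a PostScript (string)."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def gray_bars_ps(bars, col_width=0.5, height=12, gap=8, x0=100, y_top=700):
    """bars: list of (name, grays) pairs. Returns a one-page PostScript program."""
    out = ["%!PS", "/Helvetica findfont 9 scalefont setfont"]
    for row, (name, grays) in enumerate(bars):
        y = y_top - row * (height + gap)
        # sequence name to the left of its bar
        out.append(f"0 setgray {x0 - 90} {y + 3} moveto ({ps_escape(name)}) show")
        for start, length, g in stripes(grays):
            out.append(f"{g:.3f} setgray {x0 + start * col_width:.2f} {y} "
                       f"{length * col_width:.2f} {height} rectfill")
        # thin black outline around the whole bar
        out.append(f"0 setgray {x0} {y} {len(grays) * col_width:.2f} {height} rectstroke")
    out.append("showpage")
    return "\n".join(out) + "\n"
```

Merging equal neighbours into stripes keeps the file small for 
whole-protein alignments, since long runs of identical gray become a 
single rectangle. Writing the returned string to a file with a .ps 
extension should give a page that a PostScript printer or viewer can 
render.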

I would really like to produce that program (though it would take 
me more time than I should reasonably spend on it), but I would 
hate to find out later on that the problem has long been solved 
or that nobody really needs the gray-bar stuff. And if somebody 
could do it faster and in a more platform-independent way (but 
hey, everybody needs a Mac), let him speak now...

I hope for some enlightening response.

Kai-Uwe Froehlich, kaifr at mailserv.zdv.uni-tuebingen.de; 
Physiologisch-chemisches Institut, Hoppe-Seyler-Str. 4, 
7400 Tuebingen, Germany