DNA sequencing....

Craig F. Reese cfreese at super.org
Thu Apr 2 14:03:14 EST 1992

On the topic of DNA sequencing from bionet.biology.computational & 

>  I am posting this in several other newsgroups as well because I have
>not detected a lot of activity in this newsgroup.
>  I am a graduate student in Computer Science working on the DNA
>sequencing problem for Salmonella.  I am not a  biologist, though I
>have had a little biology in the past.  I would like to get in touch
>with anyone else who is trying to use parallel computation (with
>approximate string matching techniques) for producing a superstring
>from a given set of strings of DNA.
>  As I said, I am just starting in this area and any help would be
>appreciated, e.g. pointers to recent work in this field, references
>  Replies can be posted or sent by e-mail to snayar at cs.umr.edu.
>Sanjay Nayar

(I'm assuming that the original message meant "sequence comparison"
as opposed to "sequencing."  If not, then my apologies.  I'm also
leaving out the multiple sequence alignment stuff....)

I am by no means the best person to answer this question but maybe I
can get the discussion going.  I would very much like to learn more
about this area.  In particular _I_ am interested in the problem of
searching a large (and I mean LARGE) database and identifying similar
sequences.  I would like to know what techniques would be of most use
to the biological community.  (I'm more interested in what techniques
people would like to use if they had the resources rather than which
programs they are currently running on their PCs because that's the
kind of machine they can get access to.)  I have played with some
straightforward dynamic programming approaches on Suns and a CM-2 but
have not taken that work to its full conclusion.

Anyway, here are some references I have collected.  This is not necessarily 
comprehensive or representative of the field.  It's just the stuff I've
stumbled across trying to find something in this area to work on:

Altschul..., "Basic Local Alignment Search Tool," J. Mol. Biol., 1990,
pp. 403-410.

Edmiston..., "Parallel Processing of Biological Sequence Comparison
Algorithms," Intl. Journal of Parallel Programming, 1988, Vol. 17, 
No. 3.

Jones..., "Protein Sequence Comparison on the Connection Machine CM-2,"
Computers and DNA (Santa Fe Inst. Vol. 7), Addison Wesley, 1990.

Hirschberg, "A Linear Space Algorithm for Computing Maximal Common 
Subsequences," Communications of the ACM, June 1975, Vol. 18, No. 6.

Landau, Vishkin, Nussinov, "Fast Alignment of DNA and Protein Sequences,"
Methods in Enzymology, Vol. 183, 1990(?)

Needleman and Wunsch, "A General Method Applicable to the Search for 
Similarities in the Amino Acid Sequence of Two Proteins," J. Mol. Biol.,
1970, pp. 443-453.

Pearson and Lipman, "Improved Tools for Biological Sequence Comparison,"
Proc. Natl. Acad. Sci., Vol. 85, 1988, pp. 2444-2448.

Sellers, "On the Theory and Computation of Evolutionary Distances," SIAM
J. Appl. Math., Vol. 26, No. 4, June 1974.

Smith and Waterman, "Identification of Common Molecular Subsequences,"
J. Mol. Biol., 1981, pp. 195-197.

Smith and Waterman, "Comparison of Biosequences," Advances in Applied 
Mathematics 2, 1981, pp. 482-489.

Waterman, "General Methods of Sequence Comparison," Bulletin of 
Mathematical Biology, 1984, Vol. 46, No. 4, pp. 473-500.

Waterman and Eggert, "A New Algorithm for Best Subsequence Alignments
with Application to tRNA-rRNA Comparisons," J. Mol. Biol., 1987, pp. 723-728.


I do not have (convenient) access to the biosci groups so if you want to
make sure I see something please follow up to comp.parallel or send 
direct Email...

*** The opinions expressed are my own and do not necessarily reflect 
*** those of any other land dwelling mammals or my management....

"The problem ain't what we don't know; it's what we know that just ain't so
Either we take familiar things so much for granted that we never think about 
how they originated, or we "know" too much about them to investigate closely."
Craig F. Reese                           Email: cfreese at super.org
Institute for Defense Analyses/
Supercomputing Research Center
17100 Science Dr.
Bowie, MD  20715-4300
