Phylogenetic tree of various globins

Gaston Gonnet gonnet at inf.ethz.ch
Tue Jan 12 10:27:32 EST 1993


Last December, someone asked our automatic server for the
multiple alignment, phylogenetic tree, etc. of a large set
of sequences (hemoglobins, myoglobins and leghemoglobins,
630 sequences in total).

The job could not be run at the time due to lack of resources,
and I promised the requester that I would send the answer back
once we could allocate it to a machine with enough memory.  Well,
the job has been done, but I cannot find the original message
(we must have deleted some files by mistake) and hence cannot
send the answer back as promised.  Please contact me if you
are still interested in the results.  (Sorry for using the net
for this purpose.)

At that time, the question of how large a phylogenetic tree
can be constructed was also raised.  Our algorithm has the
following characteristics:

For the construction of a phylogenetic tree of n sequences:

Storage: It uses two n x n input matrices, one holding the
	distances between sequences and one holding the variances
	of these distances.  Since the entries are double-precision
	(8-byte) numbers and these matrices dominate the storage
	use, you will need about 16*n^2 bytes.

Time: (On a DEC5000 workstation, tree construction only)
	   n    time        input
	  50     5.3 secs   random distances
	 100    22.0 secs   random distances
	 200    96.9 secs   random distances
	 630    31.3 mins   the globin tree mentioned above

	(In the long run, i.e. for much larger n, an O(n^3) term
	should dominate the time.)
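Both figures above can be checked with a little arithmetic.  The
sketch below is only a back-of-envelope aid, not part of our
software.  It computes the 16*n^2 bytes for the two double-precision
matrices, and it estimates a local growth exponent k (time ~ n^k)
between consecutive timings.  The helper names tree_storage_bytes
and local_exponent are made up for illustration.  Note that the
n=630 run used real globin data rather than random distances, so
the last ratio compares different kinds of input.

```python
import math

# Dominant storage: two n x n matrices (distances and their variances)
# of 8-byte double-precision numbers, i.e. 16 * n^2 bytes in total.
BYTES_PER_DOUBLE = 8
NUM_MATRICES = 2

def tree_storage_bytes(n: int) -> int:
    """Bytes for the distance and variance matrices of n sequences."""
    return NUM_MATRICES * BYTES_PER_DOUBLE * n * n

# Timings quoted in the post (DEC5000, tree construction only), in seconds.
# The n=630 entry is the globin tree, not random distances.
TIMINGS = [(50, 5.3), (100, 22.0), (200, 96.9), (630, 31.3 * 60)]

def local_exponent(n1: int, t1: float, n2: int, t2: float) -> float:
    """Exponent k such that time grows like n^k between two measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)

for n, _ in TIMINGS:
    b = tree_storage_bytes(n)
    print(f"n={n:4d}: {b:>10,d} bytes ({b / 2**20:.2f} MiB)")

for (n1, t1), (n2, t2) in zip(TIMINGS, TIMINGS[1:]):
    print(f"n={n1:3d} -> {n2:3d}: k = {local_exponent(n1, t1, n2, t2):.2f}")
```

For n=630 the two matrices take 6,350,400 bytes (about 6 MiB).
The local exponents come out near 2.05, 2.14 and 2.58.  That rising
trend is at least consistent with a cubic term taking over only
for larger n.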

The algorithm approximates the unrooted tree, with variable-length
branches, that minimizes the weighted sum of squares of distance
differences (between the given distance and the "tree" distance).

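To make the criterion concrete, here is a minimal sketch that
evaluates it for one given tree.  It does not search for the best
tree.  The "tree" distance between two sequences is the sum of
branch lengths on the path joining them.  I assume each squared
difference is weighted by the inverse of the given variance, which
is one natural reading of "weighted" but not confirmed by the text.
The adjacency-map tree representation and the function names are
hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of an unrooted tree:
# node -> list of (neighbour, branch length). Leaves are the sequences.
Tree = Dict[str, List[Tuple[str, float]]]

def leaf_distances(tree: Tree, leaves: List[str]) -> Dict[Tuple[str, str], float]:
    """Path length ("tree" distance) between every ordered pair of leaves."""
    leaf_set = set(leaves)
    dist = {}
    for src in leaves:
        # Iterative DFS from src, accumulating branch lengths (a tree has no cycles).
        stack = [(src, None, 0.0)]
        while stack:
            node, parent, d = stack.pop()
            if node in leaf_set and node != src:
                dist[(src, node)] = d
            for nbr, length in tree[node]:
                if nbr != parent:
                    stack.append((nbr, node, d + length))
    return dist

def weighted_sse(tree: Tree, leaves: List[str],
                 D: List[List[float]], V: List[List[float]]) -> float:
    """Sum over pairs i<j of (D[i][j] - tree distance)^2 / V[i][j].

    Assumption: weights are inverse variances; V[i][j] must be > 0.
    """
    T = leaf_distances(tree, leaves)
    total = 0.0
    for i in range(len(leaves)):
        for j in range(i + 1, len(leaves)):
            diff = D[i][j] - T[(leaves[i], leaves[j])]
            total += diff * diff / V[i][j]
    return total

# Four leaves A, B, C, D joined through internal nodes u and v.
tree = {
    "A": [("u", 1.0)], "B": [("u", 2.0)],
    "C": [("v", 1.0)], "D": [("v", 3.0)],
    "u": [("A", 1.0), ("B", 2.0), ("v", 1.0)],
    "v": [("u", 1.0), ("C", 1.0), ("D", 3.0)],
}
leaves = ["A", "B", "C", "D"]
D = [[0, 3, 3, 5], [3, 0, 4, 6], [3, 4, 0, 4], [5, 6, 4, 0]]
V = [[1.0] * 4 for _ in range(4)]
print(weighted_sse(tree, leaves, D, V))  # exact fit -> 0.0
```

The construction problem is to choose both the topology and the
branch lengths that make this sum as small as possible.  Our
algorithm only approximates that minimum.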

For more information on these trees and multiple alignments,
send an e-mail containing the line "help AllAll" to cbrg at inf.ethz.ch.


Gaston H. Gonnet, Informatik, ETH, Zurich


