measuring distances by amino acid composition
Athel Cornish-Bowden
athel at ir2cbm.cnrs-mrs.fr
Thu Jan 13 08:07:54 EST 2000
Thorsten wrote:
>
>James McInerney wrote:
>>
>> Thorsten,
>>
>> One of the programs in the molphy package calculates amino acid distance
>>matrices.
>> There is a reference to it on Joe Felsenstein's webpage:
>>
>> http://evolution.genetics.washington.edu/
>
>Thanks. I have MOLPHY. However, the program PROTST.EXE does not estimate
>distances but plain aa composition. I would like to estimate distances
>based on the aa compositions of the different proteins.
>
>> I can send you a macintosh version of my program GCUA that will do this,
>> although it needs fasta-formatted protein-coding DNA sequences as input
>> (it converts to proteins and then calculates a distance matrix).
>
>Unfortunately, I don't have a Mac. UNIX or DOS are always welcome.
>
Nice to see that anyone is still interested in doing this after all these
years. I long ago gave up trying to convince people that aa compositions
contained useful information. Unfortunately I don't have a program, but
writing one would be trivial (a few minutes' work) if you start with one
that can read sequences. What you need to calculate is
0.5*Sum(square(niA - niB)), where niA is the number of residues of type i
in sequence A, niB is the same in sequence B, and the sum is over all
types of residue. If the
two sequences have the same lengths the result is an estimate of the number
of differences between the aligned sequences. If the lengths are
appreciably different the formula is more complicated, but still easily
programmable (see J. theor. Biol. 76, 369-386 (1979)). That paper
contained an error in its analysis of the statistical properties of the
index, which was not corrected until a Note Added in Proof on p. 75 of
vol. 91 of Methods Enzymol. (1983). The effect of the error is that earlier
references to 95% confidence actually meant 92.5% confidence.
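
For the equal-length case, the calculation described above might be sketched
roughly as follows in Python. This is only an illustration, not an existing
program: the function name composition_difference and the use of plain
one-letter sequence strings are assumptions, and the more complicated
formula for sequences of appreciably different lengths is not attempted.

```python
from collections import Counter


def composition_difference(seq_a, seq_b):
    """Return 0.5 * Sum((niA - niB)^2) over all residue types i.

    Hypothetical helper: seq_a and seq_b are assumed to be plain strings of
    one-letter amino acid codes. The result is meant as an estimate of the
    number of differences only when the two sequences have the same length.
    """
    counts_a = Counter(seq_a.upper())
    counts_b = Counter(seq_b.upper())
    # A Counter returns 0 for a residue type that is absent from a sequence,
    # so summing over the union of observed types covers every nonzero term.
    residue_types = set(counts_a) | set(counts_b)
    return 0.5 * sum((counts_a[r] - counts_b[r]) ** 2 for r in residue_types)


if __name__ == "__main__":
    # Two short made-up sequences that differ at a single position.
    print(composition_difference("ACDE", "ACDF"))
```

To get a distance matrix, one would call the function for every pair of
sequences read from the input file.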
Athel Cornish-Bowden
