I have a long sequence alignment of roughly 500 sequences. I am
calculating a certain property for each sequence (say charge), and I
want to find the mean value and the standard deviation of this property
over all the sequences. All the sequences are aligned with reference
to a key sequence of known structure. Most of the sequences (~300)
have high homology (~80%) to the key sequence and the remainder have
low homology (~20%). I want to compute statistics that weight the
low-homology sequences more heavily. I thought of weighting each
sequence in the average by 1/(homology^2).
Is there a better weighting scheme? If one has been published, what is
the appropriate literature reference?
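
To make the proposal concrete, here is a minimal sketch of the weighted
mean and standard deviation I have in mind. The function name, the
`power` parameter, and the convention that homology is given as a
fraction between 0 and 1 are all assumptions for illustration, and the
standard deviation uses the simple normalized-weight form with no
small-sample correction.

```python
import math

def weighted_mean_std(values, homologies, power=2.0):
    """Weighted mean and std of a per-sequence property.

    Each sequence gets weight w_i = 1 / homology_i**power (hypothetical
    scheme from the question; power=2 gives 1/homology^2).
    Homology is assumed to be a fraction in (0, 1].
    """
    if len(values) != len(homologies):
        raise ValueError("values and homologies must have the same length")
    if any(h <= 0 for h in homologies):
        raise ValueError("homology must be positive")

    weights = [1.0 / (h ** power) for h in homologies]
    total = sum(weights)

    # Weighted mean: sum(w_i * x_i) / sum(w_i)
    mean = sum(w * x for w, x in zip(weights, values)) / total

    # Weighted variance with normalized weights (no bias correction)
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, values)) / total
    return mean, math.sqrt(var)

# Example: one high-homology and one low-homology sequence
mean, std = weighted_mean_std([1.0, 3.0], [0.8, 0.2])
```

Because the weights are normalized by their sum, it does not matter
whether homology is entered as a fraction or a percentage, as long as
the same convention is used for every sequence.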
