# Methods to prove 2-sets of seq. are from diff. populations

Joe Felsenstein joe at evolution.u.washington.edu
Mon Jun 7 07:00:22 EST 1993

```
In article <01GYK4XD2GG20003N8 at amc.uva.nl> kuiken at amc.uva.nl (Carla Kuiken) writes:
>Hi everybody,
>I'm trying to find a way to 'prove' that 2 sets of sequences are
>from different populations.  ...
>  ... They look different, and come out separate in phylogenetic
>analysis, but bootstrapping doesn't give the distinction very high
>reliability.

The bootstrap here is probably trying to prove more than you need.
After all, some of the sequences could be the same and still the
two sets could be from different populations.  I wonder if a permutation
test would not be appropriate.  Make up some measure of how different
the two sets of sequences are (say the average of the distances between
all pairs of members in the two different sets).  Then compare the
value you get for the actual data with the distribution of values you
would get if you assigned sequences to groups at random (with the
same sequences and the same group sizes).  If the mean distance between
your two sets is in the top 5% of the distribution, then there is
significantly more than random difference between the two sets.

This would require someone to write a short program which would
assign group memberships at random (by calling sequences "1" and "2"
and then shuffling the array of 1's and 2's), then compute the mean
between-group distance for each shuffle and tot up a histogram of them.

These kinds of permutation test are now very common in statistics.

-----
Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195
Internet:         joe at genetics.washington.edu     (IP No. 128.95.12.41)
Bitnet/EARN:      felsenst at uwavm

```
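The short program described in the post could look something like the following. This is only a minimal sketch, not the author's own code: it assumes the sequences are already aligned to equal length, and it uses the proportion of differing sites as the distance, which is an illustrative choice (any pairwise distance could be substituted). The names `p_distance`, `mean_between`, and `permutation_test` are hypothetical.

```python
import random
from itertools import product


def p_distance(a, b):
    """Proportion of sites that differ between two aligned sequences (assumed measure)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_between(seqs, labels, dist=p_distance):
    """Average distance over all pairs with one member in group 1 and one in group 2."""
    g1 = [s for s, lab in zip(seqs, labels) if lab == 1]
    g2 = [s for s, lab in zip(seqs, labels) if lab == 2]
    total = sum(dist(a, b) for a, b in product(g1, g2))
    return total / (len(g1) * len(g2))


def permutation_test(seqs, labels, n_perm=999, seed=0):
    """Shuffle the array of 1's and 2's and compare the observed statistic
    with the values obtained under random group assignment."""
    rng = random.Random(seed)
    observed = mean_between(seqs, labels)
    shuffled = list(labels)  # same sequences, same group sizes
    null_values = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_values.append(mean_between(seqs, shuffled))
    # One-sided p-value: fraction of assignments at least as extreme as the
    # observed one, counting the observed assignment itself.
    extreme = sum(v >= observed for v in null_values)
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value, null_values


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    seqs = ["AAAA", "AAAA", "AAAT", "TTTT", "TTTT", "TTTA"]
    labels = [1, 1, 1, 2, 2, 2]
    observed, p_value, null_values = permutation_test(seqs, labels)
    print(f"observed mean between-group distance: {observed:.3f}")
    print(f"one-sided permutation p-value: {p_value:.3f}")
```

The returned `null_values` list is what one would tally into a histogram; a p-value at or below 0.05 corresponds to the observed value falling in the top 5% of the permutation distribution.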