# analysis questions

Jeff Olsen jolsen at FISH.WASHINGTON.EDU
Wed Feb 5 01:14:26 EST 1997

Maureen,

I found similar results when I tested HWE in a pink salmon population
using a highly polymorphic microsat locus with a null allele.  The p-value
(Ho:HWE) from GENEPOP was 0.00 versus 0.99 from CHIHW.  The potential number
of genotypes in the population was 121 (26 of which were represented in my
sample).  I then did the same comparison on the same population using a
less variable locus with no null allele.  The number of possible genotypes
was 6, of which 4 were represented in my sample.  The p-value (Ho:HWE) from
GENEPOP was 0.55 versus 0.57 from CHIHW.

I believe (and hopefully others will comment) the difference in results at
the first locus was due to the different statistical tests used by the
two programs.  Specifically, GENEPOP uses the exact test (a.k.a.
probability test) that computes the probability of the observed (sample)
genotypic array given HWE.  It then creates other genotypic arrays by
sampling from the allele data set such that each new array has the same
number of each allele. A permutation algorithm is used to compute possible
genotypic arrays when the number of alleles exceeds 4.  The p-value is the
percentage of genotypic arrays with a probability equal to or less than
that of the sample array.  A better description may be found in Weir's
Genetic Data Analysis II (pp. 95-110); see also Guo, S.W. and E.A. Thompson.
1992.  Biometrics 48:361-372.
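
As a rough illustration of the probability test described above, here is a
minimal Python sketch.  It is not GENEPOP's code.  It assumes a plain Monte
Carlo shuffle of the alleles in place of the enumeration or permutation
algorithm mentioned above, and it uses the standard conditional probability
of a genotypic array given its allele counts under HWE.  The function names
(log_prob_given_alleles, shuffle_genotypes, exact_test_pvalue) and the
genotype-as-tuple input format are made up for this example.

```python
import math
import random
from collections import Counter


def log_prob_given_alleles(genotypes):
    """Log probability of a genotypic array under HWE, given its allele counts."""
    n = len(genotypes)
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    hets = sum(c for (a, b), c in geno_counts.items() if a != b)
    # P = n! * prod(n_i!) * 2^H / ((2n)! * prod(n_ij!))
    logp = math.lgamma(n + 1) - math.lgamma(2 * n + 1) + hets * math.log(2)
    logp += sum(math.lgamma(c + 1) for c in allele_counts.values())
    logp -= sum(math.lgamma(c + 1) for c in geno_counts.values())
    return logp


def shuffle_genotypes(genotypes, rng):
    """Build a new array by randomly re-pairing the same set of alleles."""
    alleles = [a for g in genotypes for a in g]
    rng.shuffle(alleles)
    return [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]


def exact_test_pvalue(genotypes, n_perm=2000, seed=0):
    """Share of shuffled arrays that are no more probable than the sample."""
    rng = random.Random(seed)
    observed = log_prob_given_alleles(genotypes)
    hits = 0
    for _ in range(n_perm):
        simulated = log_prob_given_alleles(shuffle_genotypes(genotypes, rng))
        if simulated <= observed + 1e-9:  # tolerance for floating-point ties
            hits += 1
    return hits / n_perm
```

For example, exact_test_pvalue([("A", "A")] * 10 + [("B", "B")] * 10)
describes a sample with no heterozygotes at all, so it should return a
p-value at or near zero.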

CHIHW uses a "pseudoprobability" test.  First, it performs a chi-square
test on the sample genotypic array.  Then it uses the Monte Carlo
method to create new genotypic arrays from the original allele data set
(each array containing the same number of each allele). A chi-square value
(rather than a probability) is computed for each array.  Then, analogous to
the true probability test used by GENEPOP, CHIHW computes the p-value as
the percentage of genotypic arrays with a chi-square value equal to or
greater than that of the sample array.
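
The chi-square version of the procedure can be sketched in Python as
follows.  This is based only on the description above and is not CHIHW's
actual code.  The function names (chi_square_hwe, chi_square_mc_pvalue) and
the genotype-as-tuple input format are made up for this example, and the
chi-square statistic is assumed to sum over every possible genotype,
including those absent from the sample.

```python
import random
from collections import Counter
from itertools import combinations_with_replacement


def chi_square_hwe(genotypes):
    """Chi-square statistic of observed vs. HWE-expected genotype counts."""
    n = len(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freq = {a: c / (2 * n) for a, c in allele_counts.items()}
    chi2 = 0.0
    # Loop over every possible genotype, including ones never observed.
    for a, b in combinations_with_replacement(sorted(freq), 2):
        expected = n * freq[a] * freq[b] * (1 if a == b else 2)
        chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected
    return chi2


def chi_square_mc_pvalue(genotypes, n_perm=2000, seed=0):
    """Share of shuffled arrays whose chi-square is at least the sample's."""
    rng = random.Random(seed)
    sample_chi2 = chi_square_hwe(genotypes)
    alleles = [a for g in genotypes for a in g]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(alleles)  # same allele counts, new random pairing
        shuffled = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
        if chi_square_hwe(shuffled) >= sample_chi2 - 1e-9:
            hits += 1
    return hits / n_perm
```

The only difference from a probability test is the quantity used to rank
the arrays.  With many possible genotypes and small expected counts, a few
rare genotypes can dominate the chi-square value, so the two rankings need
not agree.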

The question is: does the use of Monte Carlo sampling and the chi-square
test (rather than computing probabilities) make CHIHW a less reliable test
of HWE when the potential number of genotypes is large?  Weir cites a
single study suggesting that's the case (p. 109).

Jeff Olsen

On Mon, 3 Feb 1997 smallm at pbs.dfo.ca wrote:

>
>
>
> Hello, I am analyzing some microsatellite population data.  I have been
> using GENEPOP to calculate Hardy-Weinberg equilibriums.  At one of my
> loci I had a few populations show up out of HWE with a p value of 0.000
> according to GENEPOP.  I ran CHIHW, another program, to compare the
> results with a different method.  These same populations were calculated
> as being in HWE with probabilities ranging from 1.0 to 0.96.  Next I tried
> CHIHW on a locus with a null allele.  The population I tested had the null
> at a frequency of over 50%, 30 observed heterozygotes and 130 expected
> heterozygotes and was obviously out of HWE with a p value of 0.000 using
> GENEPOP.  CHIHW said the population was in HWE with a probability of 0.96
> yet the Selander D value (probability of the observed hz = expected hz)
> was 0.  Has anyone had conflicting results when comparing GENEPOP to CHIHW?
> Has anyone had populations out of equilibrium test out within equilibrium
> with CHIHW?  I would appreciate any light people can shed on these puzzling
> results.
> 	Thank you,
> 	Maureen Small
>
>