Large Allele numbers

wijsman at max.u.washington.edu
Mon Jul 11 23:35:13 EST 1994


>> In my group we are scanning the human genome for genes responsible for a 
>> complex disease.  Not too far into the search, we have run into a few 
>> markers which have 16 or more alleles.  I have been able to modify the 
>> LINKAGE programs (v 5.2) to allow up to 14 alleles, but past that, I get 
>> compiling errors informing me that I am out of memory.  Further examination
>> tells me that the UNKNOWN program creates a matrix of the size:
>>   (maxall)*(maxall+1)/2  X (maxall)*(maxall+1)/2
>> which is too big for DOS to handle.  
>> 
>> My question is, is there any way to get around this limitation by splitting
>> up the pedigree set, or some other method?
>> 
>
>Tim Magnus writes:
> 
>Conservative renumbering will allow you to renumber each family down to
>4 alleles.  The founding parents get 1 through 4.  Each time a spouse 
>marries in, the spouse gets the two alleles missing from their mate.
>(of course - if the alleles are the same size they are numbered the same
>so you will not use all 4 alleles in every mating).
>

This type of renumbering is only possible when the genotypes in the
founders are known, which is frequently not so for complex diseases.  In
fact, in human genetics with the exception of marker mapping in CEPH-type
pedigrees, it is typical that there are some missing genotypes in founders. 
Thus the simple answer to renumber alleles usually does not fix the
problem.

>Jonathon Haines writes:

>This is a recurring problem that has been vexing the genetic linkage
>community for many years.  The basic problem is to preserve the genetic/
>segregation information while reducing the number of alleles to a range
>that allows easy computation.  The method of recoding (recycling) alleles
>described by Ott (AJHG, 1978) works very well, but can only be done when
>the mode of inheritance of the disease is known (thus allowing the recoding
>of spouses).

It is usually possible to recode marker alleles to some extent even if the
mode of inheritance of the disease is not known since what is still desired
with respect to the marker is a labelling which preserves the available
information about the source of each marker allele.  It is important,
however, where the full ancestry of alleles cannot be traced in a pedigree,
that the recoded alleles maintain the allele frequencies appropriate to the
original alleles.

>In a complex disorder, this may not be possible.  If the marker
>in question has 14 alleles in the general population, but only 9 alleles
>in the study population, it is possible to reduce the functional number of
>alleles to 9 or 10.  For the former, we usually adjust the allele
>frequencies to sum to 1 by dividing each allele frequency by the sum of
>the (observed) allele frequencies.  For the latter, all the allele
>frequencies remain the same, but the unobserved ones are collapsed into
>a single allele (and frequency).

If there are 9 observed alleles (but we know there are 14 in the
population), then rescaling the frequencies of the observed 9 alleles will
also not produce quite correct results.  Consider the unlikely example of a
huge pedigree with only the most recent generation observed in which the
observed 9 alleles all have very low and equal frequency; if there are
distantly separated relatives who are affected, there is some reasonable
support for linkage since the alleles are rare.  But if we rescale
frequencies to 1/9 per allele, then sharing of alleles isn't so unlikely. 
Coding the marker with 10 alleles produces correct results as it will
produce the same lod scores as would coding the marker with 14 alleles. 
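
To make the contrast concrete, here is a minimal sketch in Python (it is
not part of LINKAGE or LIPED) of the two codings discussed above.  The
frequencies are invented for illustration: 9 observed alleles at 0.01
each and 5 unobserved alleles sharing the remaining 0.91.  The function
names and the "other" label are hypothetical.

```python
from fractions import Fraction


def rescale_observed(freqs, observed):
    """9-allele coding: divide each observed frequency by their sum."""
    total = sum(freqs[a] for a in observed)
    return {a: freqs[a] / total for a in observed}


def lump_unobserved(freqs, observed, other="other"):
    """10-allele coding: keep observed frequencies unchanged and pool
    every unobserved allele into one extra allele."""
    kept = {a: freqs[a] for a in observed}
    kept[other] = sum(f for a, f in freqs.items() if a not in observed)
    return kept


# Invented example: alleles 1-9 are rare and observed, 10-14 are unobserved.
freqs = {a: Fraction(1, 100) for a in range(1, 10)}
freqs.update({a: Fraction(91, 500) for a in range(10, 15)})
observed = list(range(1, 10))

rescaled = rescale_observed(freqs, observed)
lumped = lump_unobserved(freqs, observed)

print("rescaled frequency of allele 1:", rescaled[1])
print("lumped frequency of allele 1:  ", lumped[1])
print("frequency of pooled allele:    ", lumped["other"])
```

Under these assumed numbers, rescaling turns each rare allele (0.01)
into a common one (1/9), while lumping leaves the observed frequencies
untouched and assigns the pooled allele a frequency of 0.91.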

As Jonathon noted, the multiple-allele problem is a big problem in
analysis.  The multiple allele problem became one of our biggest
bottlenecks since we were analyzing families individually to reduce the
number of alleles in the analysis.  Our partial solution was the following. 
We use LIPED instead of LINKAGE for general 2-point analyses for a number
of reasons which I won't go into.   We modified LIPED so that if we assume
a codominant marker and that alleles are labelled in a predetermined
sequence (which we force through a preprocessor program), we can reread the
specific observed alleles and their frequencies for each family.  The
program then assumes one more allele per family to account for all the
other alleles at the locus.  For genomic screening we don't do any
downcoding (although we do downcode by hand for multipoint analyses and
analyses with multi-looped pedigrees for which even 6 alleles is often too
many).  But these program modifications, which allow us to process all our
families together with only the observed number of alleles (plus one) per
pedigree, had an enormous effect on our ability to get most analyses
through relatively quickly.  It is relatively unusual that we find more than 6-7
alleles in any one pedigree, which brings computation time (and memory
requirements) down to reasonable levels.  Thus for 2-point analyses
downcoding is usually not necessary.  I should note that we do our analyses
on a workstation, but I don't see any reason that the modifications we made
should not work on a PC, assuming the Fortran is compatible.
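
The per-family recoding described above might be sketched as follows.
This is Python for illustration only, not the modified LIPED code; the
function names, the data layout, and the example family are all
hypothetical.  It assumes a codominant marker and known population
allele frequencies.

```python
from fractions import Fraction


def recode_family(genotypes, freqs):
    """Relabel the alleles seen in one family as 1..k in sorted order
    and add one lumped allele (k+1) for everything else at the locus.

    genotypes: dict person -> (allele, allele), or None if untyped.
    freqs:     dict original allele -> population frequency.
    Returns (recoded genotypes, list of k+1 frequencies).
    """
    observed = sorted({a for g in genotypes.values() if g is not None
                       for a in g})
    label = {a: i + 1 for i, a in enumerate(observed)}
    recoded = {p: None if g is None else (label[g[0]], label[g[1]])
               for p, g in genotypes.items()}
    new_freqs = [freqs[a] for a in observed]
    # Observed alleles keep their original frequencies; the lumped
    # allele takes whatever remains.
    new_freqs.append(1 - sum(new_freqs))
    return recoded, new_freqs


def genotype_count(n_alleles):
    """Number of unordered genotypes: n(n+1)/2."""
    return n_alleles * (n_alleles + 1) // 2


# Invented example: a 16-allele marker with equal frequencies, and a
# family with an untyped father in which only alleles 3, 7, 12 appear.
freqs = {a: Fraction(1, 16) for a in range(1, 17)}
family = {"father": None, "mother": (3, 7),
          "child1": (7, 12), "child2": (3, 12)}

recoded, new_freqs = recode_family(family, freqs)
print(recoded)
print(new_freqs)
print(genotype_count(16) ** 2, "vs", genotype_count(len(new_freqs)) ** 2)
```

In this invented family the marker drops from 16 alleles to 4 (three
observed plus one lumped), so a genotype-by-genotype matrix of the kind
mentioned in the original question shrinks from 136 x 136 to 10 x 10.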

Ellen Wijsman
Div of Medical Genetics, RG-25
and Dept of Biostatistics
University of Washington
Seattle, WA   98195
wijsman at u.washington.edu


