BIONET.MOLBIO.GENE-LINKAGE FAQ

Dean Flanders dean at lenti.med.umn.edu
Thu Mar 30 16:53:06 EST 1995


    FAQ administrative information 
     * Where can I obtain and/or access the bionet.molbio.gene-linkage
       FAQ? 
     * Who created the bionet.molbio.gene-linkage FAQ? 
     * How can I help improve this FAQ? 
     * Contributors to this FAQ.
       
    Information Resources
     * What anonymous FTP sites have programs/utilities useful for
       linkage analysis? 
     * What books are helpful when learning about linkage analysis? 
     * What WWW sites have useful linkage information? 
     * What gopher sites have useful linkage information?
     * What "linkage centers" make information and assistance available
       to researchers? 
     * What journals are useful for linkage analysis? 
     * What courses are offered in linkage analysis?
       
    Gene-linkage software overview
     * What database management programs do people use for linkage data? 
     * What programs are available for pedigree drawing? 
     * What linkage analysis helper programs are available?
     * Why are some programs used primarily for chromosome mapping, while
       others are used for disease mapping? 
     * What programs are used for chromosome mapping? 
     * What programs are used for disease gene mapping? 
     * What programs are available for running genetic simulations?
     * What programs are available to help detect errors in linkage data?
     * What programs help me recode genetic markers?
       
    Linkage package specific information
     * How do I get my CEPH data into CRI-MAP format?
     * How do you calculate MAXHAP? 
     * When should you use binary coding instead of numeric allele
       coding?
     * What do you do when allele frequencies do not add up to 1; for
       example, when alleles are not present in a pedigree under study?
     * I use LINKAGE and/or FASTLINK. Which references should I include
       in my papers? 
     * What is recoding of alleles all about anyway?
       
    Computer administration and optimization
     * How can I increase the speed of the LINKAGE/FASTLINK package on my
       workstation?
       
    FAQ ADMINISTRATIVE INFORMATION
     * Where can I obtain the bionet.molbio.gene-linkage FAQ?
       
   It is available by anonymous FTP from lenti.med.umn.edu in
   /pub/linkage. The best way to view the FAQ is via the WWW, from
   http://lenti.med.umn.edu/linkage.html . The FAQ is also available
   via gopher at lenti.med.umn.edu in /Biologically Related
   Information/Linkage Analysis. The FAQ will also be posted in the
   USENET groups bionet.molbio.gene-linkage and news.answers regularly.
   
     * Who created the bionet.molbio.gene-linkage FAQ?
       
   Darrell Root (rootd at ohsu.edu) started the
   bionet.molbio.gene-linkage FAQ in May of 1994 to share information
   and experiences that may be of use to other people involved in
   linkage analysis. I am Dean Flanders (dean at lenti.med.umn.edu), the
   current maintainer; I took over the FAQ in December of 1994. The FAQ
   is not meant to serve as a short course in linkage analysis. Ideally
   it is a place to help beginners get started in the area and to help
   experts avoid repeating the mistakes of others. The information in
   this FAQ does not come only from Darrell or me, but from a large
   number of people who work in the area of linkage analysis. Their
   names are listed at the end of this section of the FAQ.
   
     * How can I help improve this FAQ?
       
   Feel free to send any information that you think would benefit
   people who are just beginning in linkage, or who have been doing
   linkage for years, to linkage at lenti.med.umn.edu. Use the same address
   to tell us about information you would like to see or errors you
   find in this FAQ. If you would like something changed or added,
   please send it in a format that can be quickly incorporated into
   the FAQ; for example, correct the relevant section of the FAQ and
   email it back to the FAQ maintainer.
   
     * Contributors to this FAQ:
       
   David Adler, John Attwood, Michael Boehnke, Don Bowden, Michael
   Braverman, Young B Choi, Dave Curtis, Peter Doris, Bennett Dyke, David
   Featherstone, Dean Flanders, Rob Harper, Pierre Janssens, David
   Kikuchi, Tim Little, Tara Matise, Eli Meir, Mike Miller, Jurg Ott,
   Darrell Root, Robert Stodola, Ellen Wijsman, Matthias Wjst, and Kim
   Worley.
   
    INFORMATION RESOURCES
     * What anonymous FTP sites have programs/utilities useful for
       linkage analysis?
       
   At present there is no one site that serves as a repository for all
   linkage software. So the best way of finding FTP site information is
   to read the software package information below, which should provide
   all of the necessary FTP information.
   
     * What books are helpful when learning about linkage analysis?
       
   Bishop, M. J. “Guide to Human Genome Computing.” Academic Press, 1994.
   
   Davies, K. E. “Human Genetic Diseases - A Practical Approach.” IRL
   Press, Oxford England and Washington, D.C., 1986.
   
   Dracopoli, N. C., Haines, J. L., Korf, B. R., Moir, D.T., Morton, C.
   C., Seidman, C. E., Seidman, J. G., Smith, D. R. “Current Protocols in
   Human Genetics.” John Wiley and Sons, Inc., USA, 1994.
   
   Khoury, M. J., Beaty, T. H., and Cohen, B. H. “Fundamentals of Genetic
   Epidemiology.” Oxford University Press, 1993.
   
   Ott, J. “Analysis of Human Genetic Linkage.” Johns Hopkins University
   Press, 1991.
   
   Terwilliger, J. D. and Ott, J. “Handbook of Human Genetic Linkage.”
   Johns Hopkins University Press, 1994.
   
   Thompson, E. A. “Pedigree Analysis in Human Genetics.” Johns Hopkins
   University Press, Baltimore and London, 1986.
   
     * What WWW sites have useful linkage information?
       
   This is in no way an attempt to list the explosion of WWW sites of
   biological interest on the Internet, but it is a listing of some of
   the major ones and ones of particular interest in linkage analysis.
   
   http://www-bprc.mps.ohio-state.edu/cgi-bin/hpp?genetics.html is a
   very comprehensive listing of resources available on the Internet in
   the area of genetics. In particular, it has links to many of the
   genome centers on the Internet.
   
   http://lenti.med.umn.edu/linkage/linkage.html, which serves as the
   linkage analysis home page, will have links to all of the WWW sites
   listed here as well as gopher servers and a hypertext version of the
   FAQ.
   
   http://www.genethon.fr, the Genethon Center, Genethon’s home page.
   
   http://www.chlc.org, the Cooperative Human Linkage Center, CHLC’s home
   page.
   
   http://gdbwww.gdb.org has a version of GDB available and access to
   OMIM.
   
   http://www.pathology.washington.edu has human and mouse standard
   idiograms. The idiograms are useful for making illustrations for gene
   mapping and for constructing abnormal chromosomes. The PostScript
   idiograms can be manipulated band by band with illustration software
   such as Adobe Illustrator, Aldus FreeHand, Canvas, and Altsys
   Virtuoso.
   
   http://diamond.gene.ucl.ac.uk gives access to John Attwood’s software
   on his FTP server as well as local items and the chromosome 9 home
   page. Also, it has the latest versions of Dave Curtis’ software.
   
     * What gopher sites have useful linkage information?
       
   There is one that will be maintained with links to other gophers of
   interest in linkage analysis, as well as links to other gopher servers
   of biologically related information. It is at lenti.med.umn.edu, and
   the path to it is Biologically Related Information/Genetic Linkage
   Analysis.
   
     * What "linkage centers" make information and assistance available
       to researchers?
       
   One such center is the Cooperative Human Linkage Center (CHLC). The
   goal of this center is to generate a high resolution map of the human
   genome and rapidly distribute this information to the genome
   community. They are in the process of identifying more human markers
   and developing high resolution framework maps. One can obtain
   information about CHLC via gopher from gopher.chlc.org, from
   http://www.chlc.org or ftp://ftp.chlc.org, or by email from
   info-server at chlc.org or help at chlc.org. Among other things, CHLC
   provides primer selection and linkage analysis via email. Information
   on those services can be found by sending email to
   primer-server at chlc.org and linkage-server at chlc.org.
   
   David Featherston (davidf at caos.kun.nl) from the Dutch EMBnet Node is
   starting a linkage analysis service: software availability,
   support/advice initially, possibly training, and perhaps consultancy.
   At present they have MapMaker/EXP 3.0b, MapMaker/QTL 1.1, Lathrop and
   Lalouel's LINKAGE package, and Schaffer’s FASTLINK package. This means
   that if users have Genomics Package accounts at the CAOS/CAMM Center,
   they can use these programs on their fast computers to analyze their
   data sets. Please contact David Featherston if you are interested in
   more information about such an account.
   
     * What journals are useful for linkage analysis?
       
   American Journal of Human Genetics, Computer Applications in
   Biosciences (CABIOS), Genomics, Genetic Epidemiology, Human Genome
   News (available by gopher from gopher.gdb.org), Human Genome Project
   Journal, Human Heredity, Journal of Computational Biology, Nature
   Genetics.
   
     * What courses are offered on linkage analysis?
       
   There are three primary courses offered throughout the year on human
   linkage analysis. One is a four day course offered once a year at Duke
   University. This intensive course centers on mapping human genetic
   diseases. The concentration is on the entire disease mapping process,
   including clinical classification, pedigree collection, molecular
   genetic analysis, statistical analysis, and gene characterization. The
   course emphasizes the global decision-making process, rather than
   details of specific techniques. For more information write to Genetic
   Methods Course; c/o Dr. Margaret Pericak-Vance; Division of Neurology,
   Box 2900; Duke University Medical Center; Durham, NC 27710, or you can
   send e-mail to genclass at genemap.mc.duke.edu. The remaining two courses
   are both offered by Jurg Ott on the software used for human linkage.
   One is a beginner’s course, and the other an advanced course for those
   familiar with the linkage analysis software. These courses are offered
   several times throughout the year and you can get more information by
   contacting Katherine Montague/Jurg Ott; Columbia University, Unit 58;
   722 West 168th Street; New York, NY 10032. You can also fax
   (212) 568-2750, call (212) 960-2507, or email jurg.ott at columbia.edu
   for more information.
   
    GENE-LINKAGE SOFTWARE OVERVIEW
     * What database management programs do people use for linkage data?
       
   Note that some pedigree drawing software can also serve as a
   database in addition to drawing pedigrees. See the next question in
   the FAQ for a description of those packages.
   
   CEPH DBMS: The CEPH DataBase Management System is specifically
   designed for chromosome mapping with CEPH style pedigrees. It can
   output data in ped.out format for the LINKAGE package. This program
   can now be picked up via anonymous FTP from ftp.cephb.fr in
   pub/ceph_genotype_db.
   
   DOLINK: This custom database program by D. Curtis manages genetic
   data and sets up input files for linkage analysis. It comes in DOS
   and Windows versions, with C++ source that allows compilation on a
   Unix host running X and possibly on a Macintosh. It is available
   from ftp.bchs.uh.edu.
   
   File Express: This is a DOS shareware database which can be used to
   hold data for DOLINK (largely superseded by QDB). It is available as
   fe51-a/b/c.zip via FTP from ftp.gene.ucl.ac.uk in
   /pub/packages/dcurtis.
   
   LABMAN and LINKMAN: These are linkage analysis databases for holding
   linkage data and exporting it in various formats for linkage analysis.
   They are available via anonymous FTP from lenti.med.umn.edu in
   /pub/linkage/labman. These databases were developed by P. Adams of
   Columbia University.
   
   LINKSYS: This custom-made database program was written by J. Attwood
   and S. Bryant. Although they continue to use it, J. Attwood suggests
   using DOLINK instead. LINKSYS is not currently available at any FTP
   sites.
   
   Map Manager: This is a program for the Macintosh which helps analyze
   the results of genetic mapping experiments using backcrosses,
   intercrosses, or recombinant inbred strains. It also has tools for
   statistical analysis of experiments. The program was created by K. F.
   Manly at the Roswell Park Cancer Institute and is available via FTP
   from mcbio.med.buffalo.edu in /pub/MapMgr.
   
   QDB: This is a database program available as DOS and Windows versions
   and with C++ source allowing compilation for X and possibly Macintosh.
   It is available as qdb16a.zip via FTP from ftp.gene.ucl.ac.uk in
   /pub/packages/dcurtis.
   
     * What programs are available for pedigree drawing?
       
   One of the tricks of managing individuals in a mapping study is trying
   to get the database you are using to export your family data in a
   format acceptable for input into pedigree drawing programs. The
   marriage between these two can be of great assistance. However, some
   pedigree drawing programs have databases as a part of the package.
   
   CYRILLIC: This is a pedigree editor for Windows with facilities for
   including marker data; it can then write the input files for
   LINKAGE. Because it is Windows-based, input of the pedigree is very
   efficient. Each individual has an associated data form where you can
   store names and other pertinent data. It can also interface with
   most standard PC databases.
   
   FTREE: This is a DOS pedigree program written by R. Go at the
   University of Alabama.
   
   GENETREE: GeneTree 1.0 is a DOS package which provides a convenient
   way to draw family tree diagrams suitable for genetics or genealogy.
   The package consists of the GeneTree program, which draws pedigree
   diagrams using a command language, and SC, a menu-driven program
   that facilitates creation of GeneTree commands. GeneTree and SC are
   made available with program manuals, examples of family tree diagrams,
   and a GeneTree Quick Reference Guide. GeneTree is written in C. Note
   that it is a DRAWING program and does not compute genetic parameters.
   The GeneTree program is available from wijsman at max.u.washington.edu at
   a price of $125 (because of licensing fees from a private company
   which wrote one of the drivers used in the program).
   
   KINDRED: This new DOS database program, distributed by Epicenter
   Software, is specifically designed for linkage analysis. A free demo
   is available by calling (818)-304-9487. In addition to database
   duties, this program can draw pedigrees, haplotype marker data, and
   output data in LINKAGE format.
   
   PEDPAK: This package is designed to handle large datasets for animals.
   The package was written and distributed by Alan Thomas, who is in
   Bath, England. The software is not public domain and must be
   purchased.
   
   PEDDRAW: This is a Macintosh-based program written by B. Dyke, P.
   Mamelka, and J. MacCluer. It is available from bdyke at darwin.sfbr.org
   or Pedigree/Draw; Department of Genetics; Southwest Foundation for
   Biomedical Research; P.O. Box 28147; San Antonio, TX 78228-0147. The
   current version is 4.4. An upgrade from a previous version costs
   $10, printed documentation costs $10, and the full package including
   documentation costs $45. A script that converts LINKAGE format to
   PEDDRAW format is available via anonymous FTP from ftp.ee.pdx.edu in
   /pub/users/cat/rootd/convert.new.
   
   PEDRAW: This program is a pedigree drawing program written by D.
   Curtis for DOS and available via FTP from ftp.gene.ucl.ac.uk in
   /pub/packages/dcurtis. The most current version is called
   pedraw16.zip. A companion program, PEDHELP, provides pop-up help
   for PEDRAW.
   
   PAP: The Pedigree Analysis Package (PAP) is a set of FORTRAN 77
   programs for computing likelihoods and simulating phenotypes of
   genetic models on pedigrees. It is available via gopher from
   corona.med.utah.edu in Publicly Accessible Software, probes(sts),
   etc./software/pap.
   
     * What linkage analysis helper programs are available?
       
   CEPH2CRI: This program converts the output from the CEPH DBMS into
   the format used by CRI-MAP. It can be found at ftp.gene.ucl.ac.uk in
   /pub/packages/linkage_utils.
   
   CLUMP: A Monte Carlo method for assessing significance of a
   case-control association study with a multi-allelic marker, available
   as DOS executable and C source. It is available as clump.zip via
   anonymous FTP from ftp.gene.ucl.ac.uk in /pub/packages/dcurtis.
   
   EASISTAT: This is a DOS statistics package. It contains EASIGRAF,
   which draws graphs of lod scores and output from FASTMAP. It is
   available as estat21.zip via anonymous FTP from ftp.gene.ucl.ac.uk in
   /pub/packages/dcurtis.
   
   ERPA: A program for carrying out nonparametric linkage analysis,
   available as DOS executable and C source. It is available as
   erpa12.zip via anonymous FTP from ftp.gene.ucl.ac.uk in
   /pub/packages/dcurtis.
   
   FIRSTORD: A demonstration of a method for preliminary ordering of loci
   based on two-point lod scores. It is available as DOS executable and C
   source called first11.zip from ftp.gene.ucl.ac.uk in
   /pub/packages/dcurtis.
   
     * Why are some programs used primarily for human chromosome mapping,
       while others are used for human disease mapping?
       
   Any family can be used for chromosome mapping, so CEPH has picked a
   particular family "shape" and generated a large database with these
   families. Programs designed for chromosome mapping can be optimized
   for using these families, reducing the time needed for calculations.
   Only families afflicted with a disease can be used for disease gene
   mapping. As a result, programs designed for disease gene mapping need
   to be able to deal with arbitrary pedigrees. In addition, these
   programs need to be able to handle incomplete penetrance.
   
     * What programs are used for chromosome mapping?
       
   CLINKAGE: This is the special version of the LINKAGE programs for
   3-generation CEPH pedigrees and codominant markers. The PC and VAX
   versions are available by FTP from york.ccc.columbia.edu. The Unix
   version is available from corona.med.utah.edu.
   
   CHROMLOOK: It takes input files in the LIPED format. It was written by
   J. Haines.
   
   CINTMAX: This is a modified version of CILINK which permits the usage
   of different map functions in computing the likelihood. This was
   developed by D. Weeks.
   
   CRI-MAP: This program has been used for chromosome mapping for years.
   It has options which can generate maps, calculate order probabilities,
   and print out recombination data. It works on .gen files with data
   from CEPH style families. It is written in K&R style C, and the
   author, Phil Green, has successfully run it on Unix, DOS, VMS, and
   Macintosh systems. It is not available via anonymous FTP. Phil Green
   distributes
   CRI-MAP freely ONLY to academics/academic institutions. Contact him
   at: Phil Green, Dept. of Genetics; Box 8232; Washington University
   School of Medicine; St. Louis, MO 63110; USA; Phone (314) 362-5192;
   Fax (314) 362-4137; or email phg at u.washington.edu.
   
   FASTMAP: This program produces a quick approximation to the
   multipoint lod score. It is available as a DOS executable and C
   source as fstmap11.zip from ftp.gene.ucl.ac.uk in
   /pub/packages/dcurtis.
   
   MULTIMAP: This LISP based expert system uses a customized version of
   CRI-MAP to create a chromosome map. It is available via anonymous FTP
   from genome1.hgen.pitt.edu. The authors T. Matise, M. Perlin, and A.
   Chakravarti continue to improve the code, add new functions, and
   provide excellent support. When used with the CRI-MAP chrompic option
   (to find double-recombinations to identify possible errors), it is
   incredibly useful. This is Unix-only (supported for DEC-Ultrix,
   HP9000, and Suns). The customized CRI-MAP version (called LISPCRI) is
   distributed at the FTP site, but was not meant to be used
   independently of MULTIMAP.
   
   MAPMAKER: Dr. Eric Lander; Whitehead Institute; 9 Cambridge Center;
   Cambridge, MA 02142; mapm%mitwibr at mitvma.mit.edu.
   
     * What programs are used for disease gene mapping?
       
   APM: Affected Pedigree Member package by Dan Weeks is available via
   anonymous FTP from watson.hgen.pitt.edu. The Affected Pedigree Member
   Method distribution contains the new APM programs, a new file
   conversion utility, and a histogram/statistics generator. To build the
   entire distribution, you need C, Pascal, and FORTRAN compilers, and a
   make utility is also helpful. The programs which are built include:
   APM, a program to calculate the single locus statistic over one or
   several marker loci; SIM, a program to simulate pedigrees and, using
   output files of APM, test for asymptotic normality of the null
   distribution; APMMULT, a program to generate the multilocus statistic;
   SIMMULT, a program like SIM but which simulates recombination and uses
   the output of APMMULT; CHAPM, a program to convert LINKAGE files to
   APM files, or APM files of one format to APM files of another format;
   and HIST, a program to compute various statistical figures, plot a
   histogram, and compute empirical p-values.
   
   ESPA: This is a program used for extended sib pair analysis. It comes
   in a DOS version and can only look at markers containing 5 alleles. It
   was written by Lodewijk Sandkuijl and can be obtained by writing to him
   at Voorstraat 27; Delft 2611 JK; THE NETHERLANDS.
   
   FASTLINK: This is a port of the LINKAGE package to C by A. Schaffer,
   R. Cottingham, and R. Idury. The initial port increased the speed by
   an order of magnitude and they continue to optimize the algorithm and
   code, resulting in continued speed improvements. In addition, FASTLINK
   allows you to compile in "fast" or "slow" mode (the slow version of
   FASTLINK is still much faster than the old linkage programs). The
   "fast" version uses lots of memory, but uses that memory to contain
   some of the intermediate results which are repetitively recalculated
   in the "slow" version (and the old linkage package). Good results can
   be obtained by setting up 300 megs of virtual memory on a Unix
   workstation and using the “fast” version.
   
   GAS: It provides facilities for reading, writing, sectioning and
   performing statistical analyses on phenotypic and genotypic data and
   one of its features is sib pair analysis. It has been developed within
   the Department of Medicine at Oxford University and is available via
   FTP from well.ox.ac.uk in the directory pub/genetics/gas.
   
   GREGOR: It is a piece of DOS based software for producing simulated
   genetic data. It does not perform linkage analysis, but it may be
   useful for testing methods or assumptions about linkage analysis.
   GREGOR is operated by a series of hierarchical menus that permit the
   user to define hypothetical genetic scenarios (gene positions and
   effects) and produce simulated data-sets for a variety of population
   structures. GREGOR is available by FTP from the site
   sifon.cc.mcgill.ca in pub/McGill-Contrib. Questions should be directed
   to the authors tinker at agradm.lan.mcgill.ca or
   mather at agradm.lan.mcgill.ca.
   
   LINKAGE: This package of programs was developed by M. Lathrop with
   help from J. M. Lalouel, C. Julier, and J. Ott. The LINKAGE package
   consists of several analysis and several utility programs. Versions
   are available for DOS, OS/2, VAX, and Unix platforms. Here are some of
   the analysis programs: MLINK: 2-point lod-score calculations at fixed
   recombination distances; LINKMAP: multipoint lod score calculations at
   fixed distances; ILINK: calculates the recombination distance with the
   highest lod-score. Unix versions are available from
   corona.med.utah.edu, DOS and VMS versions are available from
   york.ccc.columbia.edu, or on floppy disks, when you write to:
   Katherine Montague/Jurg Ott; Columbia University, Unit 58; 722 West
   168th Street; New York, NY 10032. Send pre-formatted DOS disks if you
   request linkage by mail. You can send email to jurg.ott at columbia.edu
   if you need more information regarding mail requests for the LINKAGE
   package.
   
   LIPED: This DOS program written by J. Ott calculates probabilities for
   linkage between disease markers and genetic markers. Its input file
   differentiates between phenotypes and genotypes. As a result, this
   program is easiest to use when your data is from "old-style"
   genetic markers (such as blood phenotype data). This was one of the
   first programs to do linkage analysis calculations; the LINKAGE
   package is more commonly used now.
   
   SAGE: Statistical Analysis Package for Genetic Epidemiology is
   composed of 18 programs: AGEON: Estimating the Distribution of
   Age-of-Onset, ASSOC: marker-trait Associations in Pedigree Data,
   BCROSS: Genetic Hypothesis for Quantitative Data on Inbred strains,
   their F1 and Backcross(es), CLUSTR: Power Transformation to Obtain
   Normality and Homoscedasticity from Clustered Data, FCOR: Family
   Correlations, FSP: Family Structure Program, LODLINK: Lod Score
   Linkage Analysis, MAPLOC: Mapping a Disease Related Trait Relative to
   a Set of Linked markers, MAXFUN: Function maximization Subroutine,
   REGC,REGD,REGTL,REGTN: Segregation Analysis Programs, RELATE:
   Relationship to Proband, SIBPAL: Sib-Pair Linkage Analysis, and
   DBSORT, RENUM, SPLIT: Toolkit Programs. Author Dr. R.C. Elston,
   address Department of Biometry and Genetics; Louisiana State
   University Medical Center; 1901 Perdido Street; New Orleans, Louisiana
   70112, USA. The email contact address is sage at haldne.biogen.lsumc.edu.
   It is available for the following operating systems: VAX, SunOS 4.1.x,
   Apple Macintosh II, and DOS. This program is not shareware and must be
   bought.
   
     * What programs are available for running linkage simulations?
       
   FASTSLINK: This program is just like SLINK (see SLINK below), but
   it utilizes the enhancements incorporated into FASTLINK. It is
   available via anonymous FTP from watson.hgen.pitt.edu.
   
   SIMAPM: This is the SLINK based simulation program for the APM
   package. It is a hacked together package which only runs under Unix.
   You will need FORTRAN, Pascal, and C compilers to use it. It is
   available via anonymous FTP from watson.hgen.pitt.edu.
   
   SIMLINK: This FORTRAN program developed by L. Ploughman and M. Boehnke
   simulates linkage analysis on a family, and gives you an estimate of
   the probability, or power, of detecting linkage in a given family. It
   allows the researcher to determine whether a family has sufficient
   informativeness to detect linkage. SIMLINK requires large quantities
   of memory. It was written for DOS, but has been ported to many
   platforms. It is available from: Michael Boehnke; Department of
   Biostatistics; School of Public Health; University of Michigan; Ann
   Arbor, MI 48109-2029. No postage-money or blank disks are necessary to
   get SIMLINK sent to you. SIMLINK may be available via anonymous FTP
   soon. For further information send email to
   michael.boehnke at um.cc.umich.edu.
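   
   The idea of estimating power by simulation can be illustrated with
   a toy sketch. This is not SIMLINK's algorithm: it assumes
   phase-known, fully informative meioses instead of a real pedigree,
   and the function names, the lod threshold of 3, and the replicate
   count are all assumptions chosen for illustration.

```python
import math
import random


def max_lod(recombinants, meioses):
    """Maximum two-point lod score for phase-known, fully informative meioses."""
    k, n = recombinants, meioses
    # The recombination fraction estimate is restricted to [0, 0.5].
    theta_hat = min(k / n, 0.5)
    lod = n * math.log10(2) + (n - k) * math.log10(1 - theta_hat)
    if k > 0:
        lod += k * math.log10(theta_hat)
    return lod


def estimate_power(meioses, theta, threshold=3.0, replicates=1000, seed=0):
    """Fraction of simulated replicates whose maximum lod reaches the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        # Each meiosis is recombinant with probability theta.
        k = sum(rng.random() < theta for _ in range(meioses))
        if max_lod(k, meioses) >= threshold:
            hits += 1
    return hits / replicates


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, "meioses: estimated power", estimate_power(n, theta=0.1))
```

   With only 5 meioses the largest possible lod score is 5*log10(2),
   about 1.5, so such a family could never reach a threshold of 3 on
   its own, whatever the true recombination fraction.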
   
   SLINK: It is a Pascal program developed by D. Weeks, M. Lathrop, and
   J. Ott. It is similar to SIMLINK. It is more general than SIMLINK in
   that it allows for partial marker typing at the locus to be generated,
   but it runs slower than SIMLINK. It is available from
   york.ccc.columbia.edu and watson.hgen.pitt.edu or on floppies (use the
   same address as for LINKAGE).
   
     * What programs are available to help detect errors in linkage data?
       
   The linkage packages themselves will typically detect obvious errors
   in linkage data, such as impossible phenotypes and genotypes and
   obvious errors in pedigrees. The programs will typically just grind
   to a halt, allowing you to fix the error and try again until you
   finally succeed. However, errors that “make sense” to linkage
   programs will not be detected.
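   
   As an illustration of what an "impossible genotype" means for a
   codominant marker, here is a minimal sketch of a parent-child
   consistency check. It is not taken from any of the packages above;
   the function name and the use of 0 for an untyped allele are
   assumptions made for this example.

```python
UNKNOWN = 0  # assumed code for an untyped allele


def is_mendelian_consistent(child, father, mother):
    """Check one codominant marker for a father/mother/child trio.

    Each genotype is a pair of allele numbers, e.g. (1, 3).
    Trios with any untyped allele are not checked and pass by default.
    """
    if any(UNKNOWN in genotype for genotype in (child, father, mother)):
        return True
    a, b = child
    # One child allele must come from each parent.
    return (a in father and b in mother) or (a in mother and b in father)


if __name__ == "__main__":
    trios = {
        "child-1": ((1, 2), (1, 3), (2, 4)),  # consistent
        "child-2": ((1, 1), (1, 2), (2, 3)),  # mother cannot transmit allele 1
    }
    for name, (child, father, mother) in trios.items():
        ok = is_mendelian_consistent(child, father, mother)
        print(name, "ok" if ok else "inconsistent")
```

   A check like this only catches errors that break Mendelian
   inheritance within a trio; a wrong genotype that is still
   compatible with both parents passes unnoticed.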
   
     * What programs help me recode genetic markers?
       
   DOLINK can downcode alleles automatically. However, the main use of
   DOLINK is to prepare files for LINKAGE from a database. In addition,
   P. Adams' packages LABMAN and LINKMAN have features for the recoding
   of alleles.
   
    LINKAGE PACKAGE SPECIFIC INFORMATION
     * How do I get my CEPH data into CRI-MAP format?
       
   You can output the file in linkage format and use link2gen in CRI-MAP.
   The disadvantage here is that your marker names are separated from
   your data and it’s easy to make a mistake and get them mixed up. You
   can output the file in ped.out format and use CEPH2CRI mentioned above
   in the FAQ to do the conversion as well.
   
     * How do you calculate MAXHAP?
       
   MAXHAP is the maximum possible number of haplotypes in your analysis.
   To compute it, multiply together the number of alleles at each locus
   used in a particular run: not all loci in your dataset, just the loci
   you are using in that particular calculation. Remember that the
   affection status counts as two alleles, regardless of the number of
   liability classes. For example, suppose a dataset contains an
   affection status locus with liability classes, marker A with 3
   alleles, marker B with 4 alleles, and marker C with 5 alleles. If
   you do a LINKMAP run with affection status, marker A, and marker B,
   then your MAXHAP must be at least 2*3*4=24.
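   
   The same arithmetic as a short sketch. The function and constant
   names are made up for this example and are not part of LINKAGE or
   FASTLINK.

```python
from math import prod

# The affection status locus counts as two alleles,
# regardless of the number of liability classes.
AFFECTION_STATUS_ALLELES = 2


def minimum_maxhap(allele_counts):
    """Product of the allele counts of the loci used in one run."""
    return prod(allele_counts)


if __name__ == "__main__":
    # Run with affection status, marker A (3 alleles), marker B (4 alleles).
    # Marker C is in the dataset but not in this run, so it is left out.
    print(minimum_maxhap([AFFECTION_STATUS_ALLELES, 3, 4]))  # 24
```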
   
     * When should you use binary coding instead of numeric allele
       coding?
       
   Usually there is no advantage to coding disease loci as either binary
   or numeric using liability classes. Generally, binary coding is more
   complex in that we humans often have a hard time thinking that way.
   Some of the codominant phenotypes lend themselves to binary coding;
   for example, ABO blood types: A (101), B (011), O (001), AB (111), and
   unknown (000). Since you cannot distinguish AO from AA at the
   phenotype level you code both genotypes as (101), presence of A and O.
   In reality O represents absence of both A and B. However, do not code
   it as (000), since that would mean unknown. Use of binary codes has
   decreased since DNA markers came into use, because they allow one to
   type an individual with respect to genotype. You can use binary codes
   if you have phenotypic data that does not allow exact discrimination
   of the underlying genotype, and the phenotype can be coded as the
   presence (1) or absence (0) of factors such as the A and B antigens.
   Binary codes can represent loci with codominant and dominant modes
   of inheritance, while allele number notation is good only for
   codominant loci. Few people use binary factor notation. They
   either use allele numbers for codominant loci, or affection status
   notation for dominant loci. The main reason why binary factor notation
   is still currently used is that CEPH’s database is in that notation.
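
   As a rough illustration of the ABO codes above, the sketch below
   assumes that each allele carries a fixed pattern of three factors and
   that a phenotype shows a factor whenever either allele carries it. The
   per-allele patterns are inferred from the phenotype codes in the text
   and are an assumption of this example, as are the names used.

```python
# Assumed factor pattern per allele, in the order (A, B, O). The third
# factor is present on every allele, so type O is never coded 000.
ALLELE_FACTORS = {"A": (1, 0, 1), "B": (0, 1, 1), "O": (0, 0, 1)}

def phenotype_code(allele1, allele2):
    """Binary phenotype: a factor is present if either allele carries it."""
    pairs = zip(ALLELE_FACTORS[allele1], ALLELE_FACTORS[allele2])
    return "".join(str(x | y) for x, y in pairs)

UNKNOWN = "000"  # reserved for untyped individuals

for genotype in ("AA", "AO", "BB", "BO", "OO", "AB"):
    print(genotype, phenotype_code(*genotype))
```

   AA and AO both come out as 101, which is the point of the coding: the
   two genotypes cannot be told apart at the phenotype level.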
   
     * What do you do when allele frequencies do not add up to 1, for
       example, when alleles are not present in a pedigree under study?
       
   The best approach is to specify n+1 alleles, where n is the number of
   alleles actually observed in the pedigree. Use the correct allele
   frequencies for the n observed alleles, and for allele n+1 use 1 minus
   the sum of the frequencies of the observed alleles.
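
   A minimal sketch of that calculation follows. The function name and
   the example frequencies are hypothetical; substitute the population
   frequencies for your own marker.

```python
def add_residual_allele(observed_freqs, tol=1e-9):
    """Append allele n+1 with frequency 1 minus the sum of the others."""
    residual = 1.0 - sum(observed_freqs)
    if residual < -tol:
        raise ValueError("observed frequencies sum to more than 1")
    # Clamp tiny negative rounding errors to zero.
    return list(observed_freqs) + [max(residual, 0.0)]

# Hypothetical population frequencies of the 3 alleles seen in a pedigree.
freqs = add_residual_allele([0.40, 0.25, 0.15])
print([round(f, 4) for f in freqs])  # [0.4, 0.25, 0.15, 0.2]
```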
   
     * I use LINKAGE and/or FASTLINK. What references should I cite in my
       papers?
       
   FASTLINK users should cite:
   
   Cottingham, R. W. Jr., Idury, R. M., and Schaffer, A. A. “Faster
   Sequential Linkage Computations.” American Journal of Human Genetics.
   53:252-263, 1993.
   
   Schaffer, A. A., Gupta, S. K., Shriram, K., and Cottingham, R. W. Jr.
   “Avoiding Recomputation in Linkage Analysis.” Human Heredity.
   44(4):225-237, 1994 Jul-Aug.
   
   In addition, all FASTLINK and LINKAGE users should also cite the
   LINKAGE papers:
   
   Lathrop, G.M., Lalouel, J.M., Julier, C., and Ott, J. “Strategies for
   Multilocus Analysis in Humans.” PNAS. 81:3443-3446, 1984.

   Lathrop, G.M. and Lalouel, J.M. “Easy Calculations of LOD Scores and
   Genetic Risks on Small Computers.” American Journal of Human Genetics.
   36:460-465, 1984.

   Lathrop, G.M., Lalouel, J.M., and White, R.L. “Construction of Human
   Linkage Maps: Likelihood Calculations for Multilocus Analysis.”
   Genetic Epidemiology. 3:39-52, 1986.
   
     * What is recoding of alleles all about anyway?
       
   One of the problems with highly polymorphic markers is that their
   large number of alleles can increase computational requirements by
   several orders of magnitude. This can put the computation of some lod
   scores out of reach for DOS computers and make it take many days on
   higher end systems. It is therefore important to reduce the number of
   alleles in your calculations, and recoding does exactly that.
   
   The method of recoding of alleles described by J. Ott in the Annals of
   Human Genetics, 42:255-257 (1978) works very well, but can only be
   done when the mode of inheritance of the disease is known. An article
   by M. Braverman, inspired by Ott’s original work, in Computers and
   Biomedical Research, 18:24-36 (1985), extends the recoding of alleles
   in two ways: 1) it allows for pedigrees of arbitrary structure, and 2)
   it allows for missing/partially known marker phenotypes. It is usually
   possible to recode marker alleles to some extent even if the mode of
   inheritance of the disease is not known since what is still desired
   with respect to the marker is a labeling which preserves the available
   information about the source of each marker allele. It is important,
   however, where the full ancestry of alleles cannot be traced in a
   pedigree, that the recoded alleles maintain the allele frequencies
   appropriate to the original alleles. In a complex disorder, this may
   not be possible.
   
   Another method applies when the marker in question has, say, 14
   alleles in the general population but only 9 alleles in the study
   population: it is possible to collapse the functional number of
   alleles to 9 or 10. To collapse to 9, you adjust the allele
   frequencies to sum to 1 by dividing each allele frequency by the sum
   of the (observed) allele frequencies. To collapse to 10, all the
   observed allele frequencies remain the same, but the unobserved
   alleles are collapsed into a single allele (and frequency). Rescaling
   the frequencies of the 9 observed alleles will not produce quite
   correct results. Consider the unlikely example of a huge pedigree with
   only the most recent generation observed, in which the 9 observed
   alleles all have very low and equal frequency. If distantly separated
   relatives who are affected share an allele, there is some reasonable
   support for linkage, since the alleles are rare. But if we rescale the
   frequencies to 1/9 per allele, then sharing of alleles isn't so
   unlikely. Coding the marker with 10 alleles produces correct results,
   as it will produce the same lod scores as coding the marker with 14
   alleles would.
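
   The difference between the two ways of collapsing alleles can be seen
   in a small sketch. The marker below is invented for illustration (9
   observed alleles at frequency 0.01 each, 5 unobserved alleles sharing
   the remaining 0.91), and the function names are hypothetical.

```python
def collapse_unobserved(pop_freqs, observed):
    """Keep observed alleles unchanged; pool the rest into one allele."""
    kept = {a: f for a, f in pop_freqs.items() if a in observed}
    kept["pooled"] = sum(f for a, f in pop_freqs.items() if a not in observed)
    return kept

def rescale_observed(pop_freqs, observed):
    """Drop unobserved alleles and renormalise the rest to sum to 1."""
    total = sum(pop_freqs[a] for a in observed)
    return {a: pop_freqs[a] / total for a in observed}

# Hypothetical 14-allele marker; alleles 1-9 are rare and observed.
pop = {i: 0.01 for i in range(1, 10)}
pop.update({i: 0.91 / 5 for i in range(10, 15)})
seen = set(range(1, 10))

ten = collapse_unobserved(pop, seen)
nine = rescale_observed(pop, seen)
print(len(ten), round(ten[1], 4), round(ten["pooled"], 4))  # 10 0.01 0.91
print(len(nine), round(nine[1], 4))                         # 9 0.1111
```

   With 10 alleles the rare alleles stay rare (0.01); after rescaling to
   9 alleles each one looks common (about 0.11), which is why sharing
   among distant relatives would no longer appear unlikely.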
   
    COMPUTER ADMINISTRATION AND OPTIMIZATION
     * How can I increase the speed of the LINKAGE/FASTLINK package on my
       workstation?
       
   1. Use FASTLINK, which is the C version of the LINKAGE package with a
   few algorithmic improvements. It can increase the speed of your
   calculations by an order of magnitude.
   
   2. Set up plenty of paging space (also called swap space), which uses
   the hard drive as virtual memory; 300 megs is usually plenty. Then use
   the "fast" versions of FASTLINK.
   
   3. Use GCC, which is the GNU/Free Software Foundation C compiler, to
   compile FASTLINK. GCC produces machine language that is about 10%
   faster than Sun's C compiler.
   
   4. Install the generic small kernel instead of the generic kernel. The
   generic kernel has device files for almost everything, and can slow
   the system down. The generic small kernel is configured for a system
   without many devices and without many users. Installing a generic
   small kernel is an option during system installation on Sun
   workstations.
   
   5. Reconfigure your kernel so it has only devices you need. This
   should give you a small improvement in overall system speed, but if
   you are already running the generic-small kernel, additional
   improvement may be so small that it's not worth the trouble. If the
   generic small kernel is insufficient for your system this step is a
   must. The generic kernel will slow down your workstation significantly
   and most of the device support is unnecessary.
   
   6. Don't run your linkage analyses in the background, because running
   programs in the background gives them a lower priority. Either do the
   runs in the foreground, or use the root password to nice the pedin
   process by -3 to compensate (negative nice values give a higher
   priority). According to the Sun documentation, nicing below -10 can
   interfere with the operating system and actually reduce the process's
   speed. Running at the standard default level of 0 is usually
   sufficient. If you need to log out, you can use the screen command to
   "detach" a session so that you can log out without programs
   terminating. Later you can log back in and "reattach" the session,
   which continued to run while you were logged out. The screen command
   is available at prep.ai.mit.edu and is also on the O'Reilly Unix Power
   Tools CD-ROM.
   
   7. Runs with 100% penetrance can run faster than runs with incomplete
   penetrance. Of course, if you have an unaffected obligate carrier,
   this won't work. In addition, incomplete penetrance runs may be
   necessary for your research to be "good".
   
   8. Change the block size of your file system. One can increase the
   performance of a file system by increasing the block size, thus
   decreasing the number of read-write operations. A block device, such
   as a hard disk, usually transfers a whole block of data at a time.
   Thus, if one expects to use large files, having large blocks will be
   an advantage. The trade-off is space lost to partially filled
   fragments, since one also has to increase the fragment size to a
   number larger than 1024, for example 2048. That is, each file or part
   of a file occupies at least 2048 bytes; a file of 100 bytes will still
   occupy 2048 bytes. Therefore, bigger blocks give faster access, but
   bigger blocks come with bigger fragments and more lost space.
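
   The space cost described above can be estimated with a short sketch.
   It is a simplified model that assumes every file is stored in whole
   fragments and ignores all other file system overhead; the function
   name is made up for this example.

```python
def bytes_on_disk(file_size, fragment_size):
    """Space a file occupies when allocated in whole fragments."""
    fragments = -(-file_size // fragment_size)  # ceiling division
    return fragments * fragment_size

# A 100-byte file under the two fragment sizes mentioned in the text.
for frag in (1024, 2048):
    used = bytes_on_disk(100, frag)
    print(frag, used, used - 100)  # fragment size, bytes used, bytes lost
```

   For a 100-byte file this gives 924 bytes lost with 1024-byte fragments
   and 1948 bytes lost with 2048-byte fragments.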
   
   9. It has been noted that you can increase the speed of programs which
   create/access large files in the /tmp directory by creating a tmpfs
   file system.
   
   10. Of course, buying more RAM will increase your speed. It’s been
   said that increasing RAM from 16 to 32 megs will result in a large
   increase in speed, and increasing RAM from 32 to 64 megs will result
   in a significant increase. However, increasing beyond 64 megs is not
   particularly helpful.




