BIONET.MOLBIO.GENE-LINKAGE.FAQ

Dean Flanders dean at lenti.med.umn.edu
Fri Mar 15 01:00:23 EST 1996


   
   
   BIONET.MOLBIO.GENE-LINKAGE FREQUENTLY ASKED QUESTIONS (FAQ) AS OF
   1996/22/96
   
   1.0) FAQ ADMINISTRATIVE INFORMATION [1995/05/18] 
   
   1.1) Where can I obtain and/or access the bionet.molbio.gene-linkage
   FAQ? [1995/03/01] 
   
   1.2) Who created the bionet.molbio.gene-linkage FAQ? [1995/03/01] 
   
   1.3) How can I help improve this FAQ? [1995/03/01] 
   
   1.4) Contributors to this FAQ. [1995/09/09] 
   
   1.5) When was the FAQ last updated? [1996/22/96]
   
   2.0) INFORMATION RESOURCES 
   
   2.1) What anonymous FTP sites have programs/utilities useful for
   linkage analysis? [1995/03/01] 
   
   2.2) What books are helpful when learning about linkage analysis?
   [1995/03/01] 
   
   2.3) What WWW sites have useful linkage information? [1996/01/02] 
   
   2.4) What gopher sites have useful linkage information? [1995/03/01] 
   
   2.5) What "linkage centers" make information and assistance available
   to researchers? [1995/12/11] 
   
   2.6) What journals are useful for linkage analysis? [1995/06/02] 
   
   2.7) What courses are offered in linkage analysis? [1995/09/09] 
   
   3.0) GENE-LINKAGE SOFTWARE OVERVIEW 
   
   3.1) What database management programs do people use for linkage data?
   [1995/05/31] 
   
   3.2) What programs are available for pedigree drawing? [1995/04/01] 
   
   3.3) What linkage analysis helper programs are available? [1995/04/01]
   
   
   3.4) Why are some programs used primarily for chromosome mapping,
   while others are used for disease mapping? [1995/03/01] 
   
   3.5) What programs are used for physical mapping? [1995/11/30] 
   
   3.6) What programs are used for disease gene mapping? [1995/09/07] 
   
   3.7) What programs are available for running genetic simulations?
   [1995/11/30] 
   
   3.8) What programs are available to help detect errors in linkage
   data? [1995/11/30] 
   
   3.9) What programs help me recode genetic markers? [1995/03/01] 
   
   4.0) LINKAGE PACKAGE SPECIFIC INFORMATION 
   
   4.1) How do I get my CEPH data into CRI-MAP format? [1995/03/01] 
   
   4.2) How do you calculate MAXHAP? [1995/09/09] 
   
   4.3) When should you use binary coding instead of numeric allele
   coding? [1995/03/01] 
   
   4.4) What do you do when allele frequencies do not add up to 1; for
   example, when alleles are not present in a pedigree under study?
   [1995/03/01]
   
   4.5) I use LINKAGE and/or FASTLINK. Which references should I include
   in my papers? [1995/03/01] 
   
   4.6) What is recoding of alleles all about anyway? [1995/03/01] 
   
   4.7) What do you do when you get thetas greater than 0.5 when using
   linkage? [1996/22/96]
   
   5.0) COMPUTER ADMINISTRATION AND OPTIMIZATION 
   
   5.1) How can I increase the speed of the LINKAGE/FASTLINK package on
   my workstation? [1995/05/18] 
   
   6.0) MOLECULAR BIOLOGY ISSUES IN LINKAGE ANALYSIS
   
   6.1) What screening sets are available for linkage analysis?
   [1995/09/14]
   
   1.0) FAQ ADMINISTRATIVE INFORMATION
   
   1.1) Where can I obtain the bionet.molbio.gene-linkage FAQ? [1995/03/01]
   
   It is available by anonymous FTP from lenti.med.umn.edu in
   /pub/linkage. The best way to view the FAQ is via the WWW, from
   http://lenti.med.umn.edu/linkage/linkage.html. The FAQ is also
   available via gopher at lenti.med.umn.edu in /Biologically Related
   Information/Linkage Analysis. The FAQ will also be posted in the
   USENET groups bionet.molbio.gene-linkage and news.answers the 1st and
   15th of each month.
   
   1.2) Who created the bionet.molbio.gene-linkage FAQ? [1995/03/01]
   
   Darrell Root (rootd at ohsu.edu) originally started the
   bionet.molbio.gene-linkage FAQ in May of 1994 in an attempt to share
   information and experiences that may be of use to other people
   involved in linkage analysis. I am Dean Flanders
   (dean at lenti.med.umn.edu), the current maintainer of the FAQ; I began
   my tenure in December of 1994. The FAQ will never serve as a short
   course in linkage analysis. Ideally, it will instead be a place that
   helps beginners get started in the area and helps experts avoid
   repeating the mistakes of others. The information in this FAQ by no
   means comes only from Darrell or me; it comes from a large number of
   people who work in the area of linkage analysis. Their names are
   listed at the end of this section of the FAQ.
   
   1.3) How can I help improve this FAQ? [1995/03/01]
   
   Feel free to send any information that you think would benefit
   people who are just beginning in linkage, or who have been doing
   linkage for years, to linkage at lenti.med.umn.edu. If there is
   information you would like to see added, or if you find errors in
   this FAQ, please let us know at the same address. If you would like
   to see something changed or added, please send it in a format that
   can be quickly incorporated into the FAQ, for example by correcting
   the relevant section of the FAQ and emailing it back to the FAQ
   maintainer.
   
   1.4) Contributors to this FAQ. [1995/09/09]
   
   David Adler, John Attwood, Michael Boehnke, Marcia Brott, Don Bowden,
   Michael Braverman, Lucien Bachner, Young B Choi, Kevin Crawford, Dave
   Curtis, Peter Doris, Bennett Dyke, David Featherstone, Dean Flanders,
   Jonathan Haines, Rob Harper, Pierre Janssens, David Kikuchi, Wentian
   Li, Tim Little, Tara Matise, Eli Meir, Mike Miller, Jurg Ott, Darrell
   Root, Alex Schaffer, Robert Stodola, Frank Visser, Dan Weeks, Ellen
   Wijsman, Scott Wildenberg, Matthias Wjst, and Kim Worley.
   
   1.5) When was the FAQ last updated? [1996/22/96]
   
   The last update of the FAQ was on 1996/22/96. All sections should
   indicate when they were last updated. In addition, a list of updates
   is maintained at http://lenti.med.umn.edu/linkage/gefaqup.html. It
   lists the updates in chronological order, with direct links to the
   updated sections of the FAQ.
   
   2.0) INFORMATION RESOURCES
   
   2.1) What anonymous-FTP sites have programs/utilities useful for
   linkage analysis? [1995/03/01]
   
   At present there is no one site that serves as a repository for all
   linkage software. So the best way of finding FTP site information is
   to read the software package information below, which should provide
   all of the necessary FTP information.
   
   2.2) What books are helpful when learning about linkage analysis?
   [1995/03/01]
   
   Bishop, M. J. "Guide to Human Genome Computing." Academic Press, 1994.
   
   
   Davies, K. E. "Human Genetic Diseases - A Practical Approach." IRL
   Press, Oxford England and Washington, D.C., 1986.
   
   Dracopoli, N. C., Haines, J. L., Korf, B. R., Moir, D.T., Morton, C.
   C., Seidman, C. E., Seidman, J. G., Smith, D. R. "Current Protocols in
   Human Genetics." John Wiley and Sons, Inc., USA, 1994.
   
   Khoury, M. J., Beaty, T. H., and Cohen, B. H. "Fundamentals of Genetic
   Epidemiology." Oxford University Press, 1993.
   
   Ott, J. "Analysis of Human Genetic Linkage." Johns Hopkins University
   Press, 1991.
   
   Terwilliger, J. D. and Ott, J. "Handbook of Human Genetic Linkage,"
   Johns Hopkins University Press, 1994.
   
   Thompson, E. A. "Pedigree Analysis in Human Genetics." Johns Hopkins
   University Press, Baltimore and London, 1986.
   
   2.3) What WWW sites have useful linkage information? [1996/01/02]
   
   This is in no way an attempt to list the explosion of WWW sites of
   biological interest on the Internet, but it is a listing of some of
   the major ones and ones of particular interest in linkage analysis.
   
   http://www.yahoo.com/Science/Biology/Genetics/, this is a list of
   sites related to genetics that is kept very up to date.
   
   http://www.gdb.org/Dan/DOE/intro.html, this is a short course of sorts
   that gives some very basic information on how to go about gene
   mapping.
   
   http://lenti.med.umn.edu/linkage/linkage.html, which serves as a
   linkage analysis home page, has links to all of the WWW sites listed
   here as well as gopher servers and a hypertext version of the FAQ.
   
   http://www.genethon.fr, the Genethon Center, Genethon's home page.
   
   http://www.chlc.org, the Cooperative Human Linkage Center, CHLC's home
   page.
   
   http://gdbwww.gdb.org has a version of GDB available and access to
   OMIM.
   
   http://www.pathology.washington.edu has human and mouse standard
   idiograms. The idiograms are useful for making illustrations for gene
   mapping and for constructing abnormal chromosomes. The PostScript
   idiograms can be manipulated band by band with illustration software
   such as Adobe Illustrator, Aldus FreeHand, Canvas, and Altsys
   Virtuoso.
   
   http://www.gene.ucl.ac.uk/~john/programs.html contains software by
   John Attwood.
   
   http://www.gene.ucl.ac.uk/packages/dcurtis/ contains software by Dave
   Curtis.
   
   http://linkage.cpmc.columbia.edu has a lot of useful information on
   linkage analysis; in particular it offers information on software, the
   course offered by J. Ott, and the Linkage Newsletter.
   
   2.4) What gopher sites have useful linkage information? [1995/03/01]
   
   There is one that will be maintained with links to other gophers of
   interest in linkage analysis, as well as links to other gopher servers
   of biologically related information. It is at lenti.med.umn.edu, and
   the path to it is Biologically Related Information/Genetic Linkage
   Analysis.
   
   2.5) What "linkage centers" make information and assistance available
   to researchers? [1995/11/11]
   
   One such center is the Cooperative Human Linkage Center (CHLC). The
   goal of this center is to generate a high resolution map of the human
   genome and rapidly distribute this information to the genome
   community. They are in the process of identifying more human markers
   and developing high resolution framework maps. One can obtain
   information about CHLC via gopher from gopher.chlc.org, on the WWW at
   http://www.chlc.org, by FTP from ftp://ftp.chlc.org, or by email from
   info-server at chlc.org or help at chlc.org. Among other things, CHLC
   provides primer selection and linkage analysis via email. Information
   on those services can be found by sending email to
   primer-server at chlc.org and linkage-server at chlc.org.
   
   David Featherston (davidf at caos.kun.nl) from the Dutch EMBnet Node is
   starting a linkage analysis service: software availability,
   support/advice initially, possibly training, and perhaps consultancy.
   At present they have MapMaker/EXP 3.0b, MapMaker/QTL 1.1, Lathrop and
   Lalouel's LINKAGE package, and Schaffer's FASTLINK package. This means
   that if users have Genomics Package accounts at the CAOS/CAMM Center,
   they can use these programs on their fast computers to analyze their
   data sets. Please contact David Featherston if you are interested in
   more information about such an account.
   
   A major European center is the Human Genome Mapping Project Resource
   Centre in Hinxton, England. It is funded by the Medical Research
   Council, and has a broad range of software and databases available,
   mainly focused on the Human Genome Project. In the area of linkage
   analysis it has the following programs available: FASTLINK, CRI-MAP,
   MAP MAPMAKER, HOMOZ, PEDPACK, APM, SIMLINK, FASTMAP, COMDS, DOLINK &
   QDB, HANDLINK, GAS and Jurg Ott's collection of programs. The aim is
   to have all major (Unix-based) gene linkage packages available for its
   users. The Centre also gives courses on linkage analysis. More
   information about the Centre can be obtained from its home page:
   http://www.hgmp.mrc.ac.uk/. If you want to register as a user, send
   e-mail to admin at hgmp.mrc.ac.uk for a registration form. For more
   information about the gene-linkage services you can contact Frank
   Visser (fvisser at hgmp.mrc.ac.uk).
   
   INFOBIOGEN: This is the French GDB node, which also offers a linkage
   server and assistance with linkage analysis. It uses LINKAGE,
   FASTLINK and other programs running on a Sparc Center 2000E with
   1 GB of RAM, 4 GB of swap, and 6 CPUs. For further information
   contact Lucien Bachner at bachner at infobiogen.fr or look at the
   web site http://www.infobiogen.fr/.
   
   2.6) What journals are useful for linkage analysis? [1995/06/02]
   
   American Journal of Human Genetics, Annals of Human Genetics, Computer
   Applications in Biosciences (CABIOS), Genomics, Genetic Epidemiology,
   Human Genome News (available by gopher from gopher.gdb.org), Human
   Genome Project Journal, Human Heredity, Journal of Computational
   Biology, Nature Genetics.
   
   2.7) What courses are offered on linkage analysis? [1995/09/09]
   
   There are three primary courses offered throughout the year on human
   linkage analysis. One is a four day course offered once per year by
   Drs. Margaret Pericak-Vance and Jonathan Haines. The next course will
   be offered in late April, 1996 in Boston. The focus of the course is
   on the overall design of a human disease gene mapping study, with
   particular emphasis on the problems of common/complex disorders. The
   course covers clinical classification; pedigree ascertainment,
   collection, and follow-up; basic linkage techniques; linkage and
   association analysis for complex disorders; laboratory techniques for
   genotyping; and gene characterization. The course emphasizes the
   global decision-making process, rather than details of specific
   techniques. For more information write to Genetic Methods Course; c/o
   Dr. Margaret Pericak-Vance; Division of Neurology, Box 2900; Duke
   University Medical Center; Durham, NC 27710, or you can send e-mail to
   genclass at genemap.mc.duke.edu. The remaining two courses are both
   offered by Jurg Ott on the software used for human linkage. One is a
   beginner's course, and the other an advanced course for those familiar
   with the linkage analysis software. These courses are offered several
   times throughout the year and you can get more information by
   contacting Katherine Montague/Jurg Ott; Columbia University, Unit 58;
   722 West 168th Street; New York, NY 10032. In addition you can fax to
   (212) 568-2750, call (212) 960-2507, or email km165 at columbia.edu
   for more information.
   
   A new beginner's level linkage course will be offered in French on
   October 24-25, 1995 by INFOBIOGEN in Villejuif, a southern suburb of
   Paris. It is free for all academic institutions. For further
   information contact Lucien Bachner at bachner at infobiogen.fr or
   linkage at infobiogen.fr.
   
   3.0) GENE-LINKAGE SOFTWARE OVERVIEW
   
   3.1) What database management programs do people use for linkage data?
   [1995/05/31]
   
   Note that some pedigree drawing software can also serve as a database
   as well as draw pedigrees; see the next question in the FAQ for a
   description of those packages.
   
   CEPH DBMS: The CEPH DataBase Management System is specifically
   designed for chromosome mapping with CEPH style pedigrees. It can
   output data in ped.out format for the LINKAGE package. This program
   can now be picked up via anonymous FTP from ftp.cephb.fr in
   pub/ceph_genotype_db.
   
   DOLINK: This custom database program by D. Curtis manages genetic
   data and sets up input files for linkage analysis. It comes in DOS
   and Windows versions, and the C++ source is included, allowing
   compilation on a Unix host running X and possibly on a Macintosh. It
   is available from ftp.gene.ucl.ac.uk.
   
   File Express: This is a DOS shareware database which can be used to
   hold data for DOLINK (largely superseded by QDB). It is available as
   fe51-a/b/c.zip via FTP from ftp.gene.ucl.ac.uk in
   /pub/packages/dcurtis.
   
   LABMAN and LINKMAN: These are linkage analysis databases for holding
   linkage data and exporting it in various formats for linkage analysis.
   They are available via anonymous FTP from lenti.med.umn.edu in
   /pub/linkage/labman. These databases were developed by P. Adams of
   Columbia University.
   
   LINKSYS: This custom-made database program was written by J. Attwood
   and S. Bryant. Although they continue to use it, J. Attwood suggests
   using DOLINK instead. LINKSYS is not currently available at any FTP
   sites.
   
   Map Manager: This is a program for the Macintosh which helps analyze
   the results of genetic mapping experiments using backcrosses,
   intercrosses, or recombinant inbred strains. It also has tools for
   statistical analysis of experiments. The program was created by K. F.
   Manly at the Roswell Park Cancer Institute and is available via
   FTP from mcbio.med.buffalo.edu in /pub/MapMgr.
   
   QDB: This is a database program available as DOS and Windows versions
   and with C++ source allowing compilation for X and possibly Macintosh.
   It is available as qdb16a.zip via FTP from ftp.gene.ucl.ac.uk in
   /pub/packages/dcurtis.
   
   3.2) What programs are available for pedigree drawing? [1995/04/01]
   
   One of the tricks of managing individuals in a mapping study is trying
   to get the database you are using to export your family data in a
   format acceptable for input into pedigree drawing programs. The
   marriage between these two can be of great assistance. However, some
   pedigree drawing programs have databases as a part of the package.
   
   CYRILLIC: This is a pedigree editor for Windows with facilities for
   including marker data, from which it can write the input files for
   LINKAGE. Because it is Windows-based, input of the pedigree is very
   efficient. Each individual has an associated data form where you can
   store names and other pertinent data. It can also interface with most
   standard PC databases. This program is not public domain and is
   available from Cherwell Scientific Publishing. For more information
   send email to csp at sable.ox.ac.uk; they will be happy to send you a
   demo of the program. Version 2 of Cyrillic should be coming out in
   late summer of 1995.
   
   FTREE: This is a DOS pedigree program written by R. Go at the
   University of Alabama.
   
   GENETREE: GeneTree 1.0 is a DOS package which provides a convenient
   way to draw family tree diagrams suitable for genetics or genealogy.
   The package consists of the GeneTree program, which draws pedigree
   diagrams using a command language, and SC, a menu-driven program
   that facilitates creation of GeneTree commands. GeneTree and SC are
   made available with program manuals, examples of family tree diagrams,
   and a GeneTree Quick Reference Guide. GeneTree is written in C. Note
   that it is a DRAWING program and does not compute genetic parameters.
   The GeneTree program is available from wijsman at max.u.washington.edu at
   a price of $125 (because of licensing fees from a private company
   which wrote one of the drivers used in the program).
   
   KINDRED: This new DOS database program, distributed by Epicenter
   Software, is specifically designed for linkage analysis. A free demo
   is available by calling (818)-304-9487. In addition to database
   duties, this program will draw pedigrees, haplotype marker data, and
   can output data in LINKAGE format.
   
   PEDPACK: This package is designed to handle large datasets for animals.
   The package was written and distributed by Alan Thomas, who is in
   Bath, England. The software is not public domain and must be
   purchased.
   
   Pedigree/Draw: This is a Macintosh-based program written by B. Dyke,
   P. Mamelka, and J. MacCluer. It is available from
   bdyke at darwin.sfbr.org or Pedigree/Draw; Department of Genetics;
   Southwest Foundation for Biomedical Research; P.O. Box 28147; San
   Antonio, TX 78228-0147. The current version is 4.4. An upgrade from a
   previous version is $10, printed documentation costs $10, and the
   full package including documentation costs $45. A script which
   converts LINKAGE format to Pedigree/Draw format is available via
   anonymous FTP at ftp.ee.pdx.edu in /pub/users/cat/rootd/convert.new.
   
   PEDRAW: This is a pedigree drawing program written by D. Curtis for
   DOS and available via FTP from ftp.gene.ucl.ac.uk in
   /pub/packages/dcurtis. The most current version is distributed as
   pedraw16.zip. A companion program, PEDHELP, provides pop-up help for
   PEDRAW.
   
   PAP: The Pedigree Analysis Package (PAP) is a set of FORTRAN 77
   programs for computing likelihoods and simulating phenotypes of
   genetic models on pedigrees. It is available via gopher from
   corona.med.utah.edu in Publicly Accessible Software, probes(sts),
   etc./software/pap.
   
   3.3) What linkage analysis helper programs are available? [1995/04/01]
   
   
   CEPH2CRI: This program converts the output from the CEPH DBMS into the
   format usable in CRI-MAP. It can be found at ftp.gene.ucl.ac.uk in
   /pub/packages/linkage_utils.
   
   EASISTAT: This is a DOS statistics package. It contains EASIGRAF,
   which draws graphs of lod scores from the output of FASTMAP. The lod
   scores first need to be run through the TABLE utility, which is
   included in the DOLINK and FASTMAP packages. It is available as
   estat21.zip via anonymous FTP from ftp.gene.ucl.ac.uk in
   /pub/packages/dcurtis.
   
   FIRSTORD: A demonstration of a method for preliminary ordering of loci
   based on two-point lod scores. It is available as DOS executable and C
   source called first11.zip from ftp.gene.ucl.ac.uk in
   /pub/packages/dcurtis.
   
   LINKMEND: A program for converting LINKAGE-format files to
   MENDEL-format files. It is available by anonymous FTP from
   watson.hgen.pitt.edu as linkmend.tar.Z.
   
   MAP: A program to convert LINKMAP output into a table of multipoint
   lod scores. It is available by anonymous FTP from watson.hgen.pitt.edu
   as map.tar.Z.
   
   PEDPREP: A program for converting a MENDEL-format pedigree file
   ('pedm.dat') to a Pedigree/Draw file for graphical display on a
   Macintosh. It is available by anonymous FTP from watson.hgen.pitt.edu
   as pedprep.tar.Z.
   
   RECODE: A program for recoding character or sized-allele data into
   numbered-allele data. It is available by anonymous FTP from
   watson.hgen.pitt.edu as recode.tar.Z.
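
   The idea behind recoding can be sketched in a few lines of Python.
   This is only an illustration under simple assumptions (a single
   marker, alleles given as strings, and "0" standing for a missing
   allele); it is not the RECODE program and does not reproduce its
   input or output formats. The function name recode_alleles is
   hypothetical.

```python
def recode_alleles(genotypes, missing="0"):
    """Map each distinct allele label at one marker to 1..n.

    genotypes: list of (allele1, allele2) string pairs for one marker.
    Returns (recoded_genotypes, mapping); missing alleles become 0.
    """
    def sort_key(label):
        # Sized alleles (e.g. "148") sort numerically and come first;
        # character alleles (e.g. "A") sort alphabetically after them.
        try:
            return (0, float(label), "")
        except ValueError:
            return (1, 0.0, label)

    observed = {a for pair in genotypes for a in pair if a != missing}
    ordered = sorted(observed, key=sort_key)
    mapping = {label: number for number, label in enumerate(ordered, start=1)}
    recoded = [tuple(mapping.get(a, 0) for a in pair) for pair in genotypes]
    return recoded, mapping
```

   For example, the sized genotypes ("152", "148"), ("148", "0") and
   ("156", "152") would be recoded with 148 as allele 1, 152 as allele 2
   and 156 as allele 3, and the missing allele would stay 0.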
   
   3.4) Why are some programs used primarily for human chromosome
   mapping, while others are used for human disease mapping? [1995/03/01]
   
   
   Any family can be used for chromosome mapping, so CEPH has picked a
   particular family "shape" and generated a large database with these
   families. Programs designed for chromosome mapping can be optimized
   for using these families, reducing the time needed for calculations.
   Only families afflicted with a disease can be used for disease gene
   mapping. As a result, programs designed for disease gene mapping need
   to be able to deal with arbitrary pedigrees. In addition, these
   programs need to be able to handle incomplete penetrance.
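
   As a rough illustration of what handling incomplete penetrance
   involves, the following Python sketch applies Bayes' rule to one
   individual at a single dominant disease locus. The genotype labels
   ("dd", "Dd", "DD"), the penetrance values, and the function names
   are assumptions made for this example; they are not taken from any
   package listed in this FAQ, all of which work on whole pedigrees
   rather than on one person.

```python
def phenotype_likelihood(affected, genotype, penetrance):
    """P(observed phenotype | genotype) for a two-state disease trait."""
    f = penetrance[genotype]  # probability of being affected
    return f if affected else 1.0 - f


def carrier_posterior(affected, prior_carrier, penetrance):
    """P(genotype is Dd | phenotype), comparing Dd against dd only."""
    like_carrier = phenotype_likelihood(affected, "Dd", penetrance)
    like_noncarrier = phenotype_likelihood(affected, "dd", penetrance)
    numerator = prior_carrier * like_carrier
    denominator = numerator + (1.0 - prior_carrier) * like_noncarrier
    if denominator == 0.0:
        raise ValueError("phenotype is impossible under both genotypes")
    return numerator / denominator


# Hypothetical dominant disease with 80% penetrance and no phenocopies.
example_penetrance = {"dd": 0.0, "Dd": 0.8, "DD": 0.8}
```

   With these assumed values, an unaffected child of a carrier parent
   (prior 0.5) still has a carrier probability of 1/6 rather than 0,
   which is why unaffected relatives cannot simply be scored as
   non-carriers when penetrance is incomplete.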
   
   3.5) What programs are used for physical mapping? [1995/11/30]
   
   CLINKAGE: This is the special version of the LINKAGE programs for
   3-generation CEPH pedigrees and codominant markers. The PC and VAX
   versions are available by FTP from linkage.cpmc.columbia.edu. The Unix
   version is available from corona.med.utah.edu.
   
   CHROMLOOK: This is a program for generating haplotypes of marker data
   in nuclear pedigrees with all individuals genotyped. It identifies
   both the maternal and paternal recombination events, and provides the
   resulting haplotypes and recombinants in an easy-to-read format. It
   should be available via FTP server sometime this summer. It was
   written by Jonathan Haines and he can be contacted at
   haines at helix.mgh.harvard.edu.
   
   CINTMAX: This program is an extensively modified version of CILINK. It
   uses map functions to model the transmission of gametes from parent to
   child. Some of these map functions are multilocus feasible, and so can
   be used with more than 3 loci at a time. It is available by anonymous
   FTP from watson.hgen.pitt.edu as cintmax.tar.Z.
   
   CRI-MAP: This program has been used for chromosome mapping for years.
   It has options which can generate maps, calculate order probabilities,
   and print out recombination data. It works on .gen files with data
   from CEPH style families. It is written in K&R-style C, and the
   author, Phil Green, has successfully run it on Unix, DOS, VMS, and
   Macintosh systems. It is not available via anonymous FTP. Phil Green
   distributes
   CRI-MAP freely ONLY to academics/academic institutions. Contact him
   at: Phil Green; Molecular Biotechnology Dept., FJ-20; Fluke Hall on
   Mason Rd.; Univ. of Washington; Seattle, WA 98195; USA; Phone (206)
   685-4341; Fax (206) 685-7344; or email phg at u.washington.edu.
   
   FASTMAP: This program produces a quick approximation to multipoint lod
   scores. It is available as a DOS executable and C source as
   fstmap11.zip from ftp.gene.ucl.ac.uk in /pub/packages/dcurtis.
   
   MULTIMAP: This LISP based expert system uses a customized version of
   CRI-MAP to create a chromosome map. It is available via anonymous FTP
   from chimera.gene.cwru.edu. The authors T. Matise, M. Perlin, and A.
   Chakravarti continue to improve the code, add new functions, and
   provide excellent support. When used with the CRI-MAP chrompic option
   (to find double-recombinations to identify possible errors), it is
   incredibly useful. This is Unix-only (supported for DEC-Ultrix,
   HP9000, and Suns). The customized CRI-MAP version (called LISPCRI) is
   distributed at the FTP site, but was not meant to be used
   independently of MULTIMAP.
   
   MAPMAKER: Dr. Eric Lander; Whitehead Institute; 9 Cambridge Center;
   Cambridge, MA 02142; mapm%mitwibr at mitvma.mit.edu. MAPMAKER is
   available via FTP at genome.wi.mit.edu in /pub/mapmaker3.
   
   RHMAP: It is a set of three FORTRAN 77 programs that provide the means
   for a complete statistical analysis of RH mapping data. RH2PT is a
   program for data description and two-point analysis. It provides
   estimates of locus-specific retention probabilities and pairwise
   breakage probabilities, two-point lod scores for linkage of the
   various marker pairs, and linkage groups. RHMAP is now also available
   at the following URL http://www.sph.umich.edu/group/statgen/software.
   If you would like email notification of updates please send email to
   boehnke at umich.edu.
   
   3.6) What programs are used for disease gene mapping? [1995/09/07]
   
   APM: The Affected Pedigree Member Method distribution contains the new
   APM programs, a new file conversion utility, and a
   histogram/statistics generator. To build the entire distribution, you
   need C, Pascal, and FORTRAN compilers, and a make utility is also
   helpful. The programs which are built include: APM, a program to
   calculate the single locus statistic over one or several marker loci;
   SIM, a program to simulate pedigrees and, using output files of APM,
   test for asymptotic normality of the null distribution; APMMULT, a
   program to generate the multilocus statistic; SIMMULT, a program like
   SIM but which simulates recombination and uses the output of APMMULT;
   CHAPM, a program to convert LINKAGE files to APM files, or APM files
   of one format to APM files of another format; and HIST, a program to
   compute various statistical figures, plot a histogram, and compute
   empirical p-values. The APMember package by D. Weeks is available via
   anonymous FTP from watson.hgen.pitt.edu. Additionally, there are
   pre-compiled executables of the APM programs for Sun-OS and
   Sun-Solaris available as newapm.sunos.tar.Z and newapm.solaris.tar.Z.
   
   CLUMP: A Monte Carlo method for assessing significance of a
   case-control association study with a multi-allelic marker, available
   as DOS executable and C source. It is available as clump.zip via
   anonymous FTP from ftp.gene.ucl.ac.uk in /pub/packages/dcurtis.
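
   The general idea of a Monte Carlo significance test for a
   case-control table with many alleles can be sketched as follows.
   This Python example is a generic permutation test on the ordinary
   Pearson chi-square statistic, written for illustration only; it
   should not be read as a description of the statistics or the
   simulation scheme that CLUMP itself uses. The function names and
   the default number of simulations are assumptions.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic for a rows x columns count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip alleles never observed
                stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_p_value(case_counts, control_counts, n_sims=2000, seed=0):
    """Empirical p-value from shuffling case/control status over alleles."""
    rng = random.Random(seed)
    observed = chi_square([case_counts, control_counts])
    n_alleles = len(case_counts)
    totals = [case_counts[j] + control_counts[j] for j in range(n_alleles)]
    # One entry per sampled chromosome, labelled by its allele index.
    pool = [j for j in range(n_alleles) for _ in range(totals[j])]
    n_case = sum(case_counts)
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(pool)
        sim_case = [0] * n_alleles
        for allele in pool[:n_case]:
            sim_case[allele] += 1
        sim_control = [totals[j] - sim_case[j] for j in range(n_alleles)]
        if chi_square([sim_case, sim_control]) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sims + 1)
```

   Because the p-value is estimated by simulation, it stays usable
   when many alleles are rare and the usual chi-square approximation
   for sparse tables is doubtful.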
   
   ESPA: This is a program used for extended sib pair analysis. It comes
   in a DOS version and can only look at markers containing 5 alleles. It
   was written by Lodewijk Sandkuijl and can be obtained by writing to him
   at Voorstraat 27; Delft 2611 JK; THE NETHERLANDS.
   
   ERPA: A program for carrying out nonparametric linkage analysis,
   available as DOS executable and C source. It is available as
   erpa12.zip via anonymous FTP from ftp.gene.ucl.ac.uk in
   /pub/packages/dcurtis.
   
   FASTLINK: This is a much faster implementation of the main programs in
   LINKAGE (LODSCORE, ILINK, MLINK, LINKMAP) in C. The code is faster due
   to the use of new and better algorithms for the time intensive parts
   of the computation. FASTLINK is distributed by A. A. Schaffer from the
   FTP site softlib.cs.rice.edu (cd pub/fastlink). Version 1 of FASTLINK
   was instigated by R. W. Cottingham Jr. with implementation done by R.
   M. Idury and A. A. Schaffer. Version 2 of FASTLINK includes further
   improvements implemented by A. A. Schaffer, S. K. Gupta, and K.
   Shriram, with guidance from R. W. Cottingham Jr. Version 2 includes
   the capability to recover gracefully from a crash of the computer on
   which FASTLINK is running. FASTLINK was initially intended for UNIX
   machines, but the distribution now includes instructions for porting
   to VMS as well as a version for DOS. FASTLINK allows you to compile in
   "fast" or "slow" mode (the slow version of FASTLINK is still much
   faster than the old LINKAGE programs). The "fast" version uses lots of
   memory, but uses the extra memory to contain some of the intermediate
   results which are repetitively recalculated in the "slow" version (and
   the old linkage package). Best speed can be obtained by setting up 300
   megs of virtual memory on a Unix workstation and using the "fast"
   version. Schaffer maintains a mailing list of fastlink users
   (fastlink-list at cs.rice.edu) to answer queries and keep users up to
   date. Schaffer, Gupta, and other colleagues at Rice University have
   implemented parallel versions of FASTLINK for either a shared-memory
   multiprocessor or a network of UNIX workstations. This version is now
   available as FASTLINK 2.3P at the above mentioned FTP site. Write to
   schaffer at cs.rice.edu for more information.
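
   The "fast" versus "slow" trade-off described above is the familiar
   exchange of memory for time. The toy Python sketch below shows only
   that general idea: a cached function keeps results it has already
   computed, while the uncached one recomputes them on every call. It
   is not FASTLINK code, and the stand-in calculation and all names in
   it are hypothetical.

```python
from functools import lru_cache

# Counts how often each variant actually performs the calculation.
call_count = {"slow": 0, "fast": 0}


def transmission_prob_slow(theta, recombinant):
    """Recomputed on every call (stand-in for an expensive step)."""
    call_count["slow"] += 1
    return theta if recombinant else 1.0 - theta


@lru_cache(maxsize=None)  # keep every result in memory
def transmission_prob_fast(theta, recombinant):
    """Same calculation, but repeated arguments are served from memory."""
    call_count["fast"] += 1
    return theta if recombinant else 1.0 - theta


def likelihood(prob_fn, theta, meioses):
    """Product of per-meiosis probabilities; meioses is a list of bools."""
    result = 1.0
    for recombinant in meioses:
        result *= prob_fn(theta, recombinant)
    return result
```

   For ten meioses with two recombinants, both variants return the same
   likelihood, but the uncached function does the work ten times and
   the cached one only twice, at the cost of storing the two results.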
   
   GAS: It provides facilities for reading, writing, sectioning and
   performing statistical analyses on phenotypic and genotypic data and
   one of its features is sib pair analysis. It has been developed within
   the Department of Medicine at Oxford University and is available via
   FTP from well.ox.ac.uk in the directory pub/genetics/gas.
   
   GREGOR: It is a piece of DOS based software for producing simulated
   genetic data. It does not perform linkage analysis, but it may be
   useful for testing methods or assumptions about linkage analysis.
   GREGOR is operated by a series of hierarchical menus that permit the
   user to define hypothetical genetic scenarios (gene positions and
   effects) and produce simulated data-sets for a variety of population
   structures. GREGOR is available by FTP from the site
   sifon.cc.mcgill.ca in pub/McGill-Contrib. Questions should be directed
   to the authors tinker at agradm.lan.mcgill.ca or
   mather at agradm.lan.mcgill.ca.
   
   LINKAGE: This package of programs was developed by M. Lathrop with
   help from J. M. Lalouel, C. Jlier, and J. Ott. The LINKAGE package
   consists of several analysis and several utility programs. Versions
   are available for DOS, OS2, VAX, and Unix platforms. Here are some of
   the analysis programs: MLINK: 2-point lod-score calculations at fixed
   recombination distances; LINKMAP: multipoint lod score calculations at
   fixed distances; ILINK: calculates the recombination distance with the
   highest lod-score. Unix versions are available via gopher from
   corona.med.utah.edu in Publicly Accessible Software, probes(sts),
   etc./software/linkage. DOS and VMS versions are available from
   linkage.cpmc.columbia.edu, or on floppy disks if you write to:
   Katherine Montague/Jurg Ott; Columbia University, Unit 58; 722 West
   168th Street; New York, NY 10032. Send pre-formatted DOS disks if you
   request linkage by mail. You can send email to km165 at columbia.edu if
   you need more information regarding mail requests for the LINKAGE
   package.
   
   LIPED: This DOS program written by J. Ott calculates probabilities for
   linkage between disease markers and genetic markers. Its input file
   differentiates between phenotypes and genotypes. As a result, this
   program is easiest to use when your data is from "old-style"
   genetic-markers (such as blood phenotype data). This was one of the
   first programs to do linkage analysis calculations, but the LINKAGE
   package is more commonly used now.
   
   SAGE: Statistical Analysis Package for Genetic Epidemiology is
   composed of 18 programs: AGEON: Estimating the Distribution of
   Age-of-Onset, ASSOC: marker-trait Associations in Pedigree Data,
   BCROSS: Genetic Hypothesis for Quantitative Data on Inbred strains,
   their F1 and Backcross(es), CLUSTR: Power Transformation to Obtain
   Normality and Homoscedasticity from Clustered Data, FCOR: Family
   Correlations, FSP: Family Structure Program, LODLINK: Lod Score
   Linkage Analysis, MAPLOC: Mapping a Disease Related Trait Relative to
   a Set of Linked markers, MAXFUN: Function maximization Subroutine,
   REGC,REGD,REGTL,REGTN: Segregation Analysis Programs, RELATE:
   Relationship to Proband, SIBPAL: Sib-Pair Linkage Analysis, and
   DBSORT, RENUM, SPLIT: Toolkit Programs. Author Dr. R.C. Elston,
   address Department of Biometry and Genetics; Louisiana State
   University Medical Center; 1901 Perdido Street; New Orleans, Louisiana
   70112, USA. The email contact address is sage at haldne.biogen.lsumc.edu.
   It is available for the following operating systems: VAX, SunOS 4.1.x,
   Apple Macintosh II, and DOS. This program is not shareware and must be
   bought.
   
   X-LINKED APM: X-linked version of the APM programs (single-marker),
   see APM above for more information on APM. It is available by
   anonymous FTP from watson.hgen.pitt.edu as xlinkapm.tar.Z. Also,
   xlinkapm.readm is available there, which is a readme about the
   X-linked version of the APM programs.
   
   3.7) What programs are available for running linkage simulations?
   [1995/11/30]
   
   FASTSLINK: This program is just like SLINK (see SLINK below), but
   it utilizes the enhancements incorporated into FASTLINK. It is
   available via anonymous FTP from watson.hgen.pitt.edu.
   
   SIMAPM: This is the SLINK-based simulation program for the APM
   package. It is a hacked-together package that runs only under Unix.
   You will need FORTRAN, Pascal, and C compilers to use it. It is
   available via anonymous FTP from watson.hgen.pitt.edu.
   
   SIMLINK: This FORTRAN program developed by L. Ploughman and M. Boehnke
   simulates linkage analysis on a family, and gives you an estimate of the
   probability, or power, of detecting linkage in a given family. It
   allows the researcher to determine whether a family has sufficient
   informativeness to detect linkage. SIMLINK requires large quantities
   of memory. It was written for DOS, but has been ported to many
   platforms. It is available from: Michael Boehnke; Department of
   Biostatistics; School of Public Health; University of Michigan; Ann
   Arbor, MI 48109-2029. No postage-money or blank disks are necessary to
   get SIMLINK sent to you. SIMLINK may be available via anonymous FTP
   soon. For further information send email to boehnke at umich.edu. SIMLINK
   is now also available at the following URL
   http://www.sph.umich.edu/group/statgen/software. If you would like
   email notification of updates please send email to boehnke at umich.edu.
   
   SLINK: It is a Pascal program developed by D. Weeks, M. Lathrop, and
   J. Ott. It is similar to SIMLINK. It is more general than SIMLINK in
   that it allows for partial marker typing at the locus to be generated,
   but it runs slower than SIMLINK. It is available from
   linkage.cpmc.columbia.edu and watson.hgen.pitt.edu or on floppies (use
   the same address as for LINKAGE).
   
   3.8) What programs are available to help detect errors in linkage
   data? [1995/11/30]
   
   Typically the linkage packages in and of themselves will detect errors
   in linkage data that are obvious, such as impossible phenotypes and
   genotypes, and obvious errors in pedigrees. Typically the programs
   will just grind to a halt and allow you to fix the error, and try again
   until you finally succeed. However, errors that "make sense" to
   linkage programs will not be detected.
   
   GENO: It is a genotype entry/edit tool that will allow you to easily
   enter and manipulate genotyping data. You can also check the quality
   of your data with the built-in Mendelian inheritance checker. The
   author of the program is Matt Stephenson, who can be reached at
   stephenm at bioimage.mfldclin.edu. The program is available via FTP from
   dgabby.mfldclin.edu in /pub/geno.
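
   As a rough illustration of what a Mendelian inheritance check does,
   here is a minimal Python sketch for father/mother/child trios. It is
   not GENO's code; the function names, the use of 0 for an untyped
   allele, and the decision to skip any trio with missing data are all
   assumptions made for this example.

```python
MISSING = 0  # assumed code for an untyped allele


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype could come from these parents.

    Each genotype is a pair of allele numbers, e.g. (1, 3). Any trio
    with a MISSING allele is treated as unverifiable and not flagged.
    """
    if MISSING in father or MISSING in mother or MISSING in child:
        return True
    c1, c2 = child
    # One child allele must come from the father, the other from the mother.
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


def find_inconsistent_trios(genotypes, trios):
    """List the children whose genotypes conflict with their parents'.

    genotypes: dict mapping individual ID -> (allele, allele)
    trios: iterable of (father_id, mother_id, child_id)
    """
    return [
        child
        for father, mother, child in trios
        if not is_mendelian_consistent(
            genotypes[father], genotypes[mother], genotypes[child]
        )
    ]


# Hypothetical family: kid2 carries two copies of an allele the mother lacks.
genotypes = {"dad": (1, 2), "mom": (3, 4), "kid1": (1, 3), "kid2": (1, 1)}
trios = [("dad", "mom", "kid1"), ("dad", "mom", "kid2")]
print(find_inconsistent_trios(genotypes, trios))
```

   A check like this only catches genotypes that are impossible given
   the parents; errors that remain consistent with Mendelian
   inheritance need a statistical approach.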
   
   GENOCHECK: It is an error checking program designed to identify
   individuals and loci that are likely to contain errors. The
   statistical method was designed to identify typing errors, but is
   general enough to pinpoint any unlikely genotype still consistent with
   Mendelian inheritance. The author is Dr. Margaret Gelder Ehm. The
   program is written for Unix and is available via FTP from
   softlib.cs.rice.edu in /pub/GenoCheck.
   
   3.9) What programs help me recode genetic markers? [1995/03/01]
   
   DOLINK can downcode alleles automatically. However, the main use of
   DOLINK is to prepare files for LINKAGE from a database. In addition,
   P. Adams' LABMAN and LINKMAN packages have features for the recoding
   of alleles.
   
   4.0) LINKAGE PACKAGE SPECIFIC INFORMATION
   
   4.1) How do I get my CEPH data into CRI-MAP format? [1995/03/01]
   
   You can output the file in linkage format and use link2gen in CRI-MAP.
   The disadvantage here is that your marker names are separated from
   your data and it's easy to make a mistake and get them mixed up. You
   can output the file in ped.out format and use CEPH2CRI mentioned above
   in the FAQ to do the conversion as well.
   
   4.2) How do you calculate MAXHAP? [1995/09/09]
   
   MAXHAP is the maximum possible number of haplotypes in your analysis.
   You multiply together the number of alleles at each locus used in a
   particular run; not all loci in your dataset, just the loci you are
   using in that particular calculation. Remember that the affection
   status counts as two alleles, regardless of the number of liability
   classes. For example, suppose a dataset contains an affection status
   locus with liability classes, marker A with 3 alleles, marker B with
   4 alleles, and marker C with 5 alleles. If your run is a LINKMAP run
   involving affection status, marker A, and marker B, then your MAXHAP
   must be at least 2*3*4=24.
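
   The arithmetic above can be sketched in a few lines of Python. The
   function and variable names are made up for this example; only the
   rule itself (multiply the allele counts of the loci in the run, with
   affection status counting as two) comes from the text.

```python
from math import prod

# Per the rule above: affection status always counts as two alleles,
# regardless of the number of liability classes.
AFFECTION_STATUS_ALLELES = 2


def maxhap(allele_counts):
    """Multiply the allele counts of the loci used in one run."""
    return prod(allele_counts)


# Loci in the LINKMAP run from the example: affection status, A, and B.
# Marker C (5 alleles) is in the dataset but not in this run, so it is
# left out of the product.
run_loci = {"affection": AFFECTION_STATUS_ALLELES, "A": 3, "B": 4}
print(maxhap(run_loci.values()))
```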
   
   FASTLINK 2.3P includes an auxiliary program called ofm (optimize for
   maxhap) which can be used to automatically recompile the desired
   program with the ideal value of maxhap under the following
   assumptions: you are using UNIX or VMS (not DOS); you are running
   ILINK, LINKMAP, or MLINK (not LODSCORE); the main script is produced
   by the LINKAGE auxiliary program LCP; and the locus file is produced
   by the LINKAGE auxiliary program PREPLINK. See README.ofm in the
   FASTLINK distribution.
   
   4.3) When should you use binary coding instead of numeric allele
   coding? [1995/03/01]
   
   Usually there is no advantage to coding disease loci as either binary
   or numeric using liability classes. Generally, binary coding is more
   complex in that we humans often have a hard time thinking that way.
   Some of the codominant phenotypes lend themselves to binary coding;
   for example, ABO blood types: A (101), B (011), O (001), AB (111), and
   unknown (000). Since you cannot distinguish AO from AA at the
   phenotype level you code both genotypes as (101), presence of A and O.
   In reality O represents absence of both A and B. However, do not code
   O as (000), since that would mean unknown. Use of binary codes has
   decreased since DNA markers came into use, because DNA markers allow
   one to type an individual with respect to genotype. You can use
   binary codes if you have phenotypic data that do not allow exact
   discrimination of the underlying genotype, coding the presence (1) or
   absence (0) of factors such as the A and B antigens. Binary codes can
   represent loci with codominant or dominant modes of inheritance,
   while allele number notation is good only for codominant loci. Few
   people use binary factor notation. They
   either use allele numbers for codominant loci, or affection status
   notation for dominant loci. The main reason why binary factor notation
   is still currently used is that CEPH's database is in that notation.
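
   The ABO example above can be written out as a small Python sketch.
   The code table is copied from the text; the function names and the
   two-letter genotype strings such as "AO" are assumptions made for
   illustration, not a LINKAGE input format.

```python
# Binary factor codes copied from the ABO example in the text.
ABO_BINARY_CODES = {"A": "101", "B": "011", "O": "001", "AB": "111"}
UNKNOWN_CODE = "000"  # reserved for untyped individuals, never for type O


def abo_phenotype(genotype):
    """Map an ABO genotype such as "AO" to its observable phenotype."""
    alleles = set(genotype.upper())
    if len(genotype) != 2 or not alleles <= {"A", "B", "O"}:
        raise ValueError(f"not an ABO genotype: {genotype!r}")
    # O is recessive: it only shows when neither A nor B is present.
    antigens = alleles - {"O"}
    return "".join(sorted(antigens)) or "O"


def binary_code(genotype=None):
    """Return the binary factor code for a genotype, or 000 if untyped."""
    if genotype is None:
        return UNKNOWN_CODE
    return ABO_BINARY_CODES[abo_phenotype(genotype)]


# AA and AO cannot be told apart at the phenotype level, so they share a code.
for g in ("AA", "AO", "BO", "AB", "OO", None):
    print(g, binary_code(g))
```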
   
   4.4) What do you do when allele frequencies do not add up to 1, for
   example, when alleles are not present in a pedigree under study?
   [1995/03/01]
   
   The best approach is to specify n+1 alleles, where there are n alleles
   actually observed in the pedigree. Use the correct allele frequencies
   for the n alleles, and for allele n+1, use 1 minus the sum of the
   frequencies of the observed alleles.
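
   A minimal Python sketch of this bookkeeping follows. The function
   name, the tolerance, and the example frequencies are hypothetical.

```python
def add_remainder_allele(observed_freqs, tol=1e-9):
    """Append an extra allele carrying the leftover frequency.

    observed_freqs holds the correct population frequencies of the n
    alleles seen in the pedigree; the returned list has n+1 entries
    that sum to 1.
    """
    total = sum(observed_freqs)
    if total > 1 + tol:
        raise ValueError("observed allele frequencies sum to more than 1")
    remainder = max(0.0, 1.0 - total)
    return list(observed_freqs) + [remainder]


# Three observed alleles whose frequencies sum to 0.80; the fourth
# allele stands in for everything not seen in the pedigree.
freqs = add_remainder_allele([0.40, 0.25, 0.15])
print([round(f, 4) for f in freqs])
```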
   
   4.5) I use LINKAGE and/or FASTLINK, what references should I cite in
   my papers? [1995/03/01]
   
   FASTLINK users should cite:
   
   Cottingham, R. W. Jr., Idury, R. M., and Schaffer, A. A. "Faster
   Sequential Linkage Computations." American Journal of Human Genetics.
   53:252-263, 1993.
   
   Schaffer, A. A. , Gupta, S. K., Shriram, K., and Cottingham, R. W. Jr.
   "Avoiding Recomputation in Linkage Analysis". Human Heredity.
   44(4):225-37, 1994 Jul-Aug.
   
   In addition, all FASTLINK and LINKAGE users should also cite the
   LINKAGE papers:
   
   Lathrop, G.M., Lalouel, J.M., Julier, C. , and Ott, J. "Strategies for
   Multilocus Analysis in Humans." PNAS. 81:3443-3446, 1984.
   
   Lathrop, G.M. and Lalouel, J.M., "Easy Calculations of LOD Scores and
   Genetic Risks on Small Computers." American Journal of Human Genetics.
   36:460-465, 1984.
   
   Lathrop, G.M., Lalouel, J.M., and R. L. White. "Construction of Human
   Linkage Maps: Likelihood Calculations for Multilocus Analysis."
   Genetic Epidemiology. 3:39-52, 1986.
   
   4.6) What is recoding of alleles all about anyway? [1995/03/01]
   
   One of the problems with highly polymorphic markers is that they can
   increase the computational requirements of the computers by several
   orders of magnitude due to the large number of alleles present. This
   can put the computation of some lod scores out of reach for DOS
   computers and take many days on higher end systems. So it is important
   to use methods that reduce the number of alleles, and recoding will
   reduce the number of alleles in your calculations.
   
   The method of recoding of alleles described by J. Ott in the Annals of
   Human Genetics, 42:255-257 (1978) works very well, but can only be
   done when the mode of inheritance of the disease is known. An article
   inspired by Ott's original work, written by M. Braverman in Computers and
   Biomedical Research, 18:24-36 (1985) extends the recoding of alleles
   in two ways: 1) it allows for pedigrees of arbitrary structure, and 2)
   it allows for missing/partially known marker phenotypes. It is usually
   possible to recode marker alleles to some extent even if the mode of
   inheritance of the disease is not known since what is still desired
   with respect to the marker is a labeling which preserves the available
   information about the source of each marker allele. It is important,
   however, where the full ancestry of alleles cannot be traced in a
   pedigree, that the recoded alleles maintain the allele frequencies
   appropriate to the original alleles. In a complex disorder, this may
   not be possible.
   
   Another method applies when the marker in question has, say, 14
   alleles in the general population but only 9 alleles in the study
   population. It is then possible to collapse the functional number of
   alleles to 9 or 10. To collapse to 9, the usual approach is to adjust
   the allele frequencies to sum to 1 by dividing each allele frequency
   by the sum of the (observed) allele frequencies. To collapse to 10,
   all the observed allele frequencies remain the same, but the
   unobserved alleles are collapsed into a single allele (and
   frequency). Rescaling the frequencies of the 9 observed alleles will
   not produce quite correct results. Consider the unlikely example of a
   huge pedigree with only the most recent generation observed, in which
   the observed 9 alleles all have very low and equal frequency. If
   distantly separated relatives who are affected share an allele,
   there is some reasonable support for linkage, since the alleles are
   rare. But if we rescale the frequencies to 1/9 per allele, then
   sharing of alleles isn't so unlikely. Coding the marker with 10
   alleles produces correct results, as it will produce the same lod
   scores as would coding the marker with 14 alleles.
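
   The difference between the two schemes can be seen in a short Python
   sketch. The function names and the frequencies are hypothetical; the
   numbers are chosen to mimic the rare-allele example above (9 observed
   alleles at 0.01 each, 5 unobserved alleles sharing the rest).

```python
def collapse_unobserved(pop_freqs, observed):
    """Renumber observed alleles 1..n and lump the rest into allele n+1.

    pop_freqs: dict of original allele label -> population frequency
    observed:  set of allele labels seen in the pedigree
    Returns (recode, new_freqs): old label -> new number, and
    new number -> frequency. Observed frequencies are left unchanged.
    """
    kept = sorted(a for a in pop_freqs if a in observed)
    recode = {old: new for new, old in enumerate(kept, start=1)}
    lumped = len(kept) + 1
    new_freqs = {recode[a]: pop_freqs[a] for a in kept}
    new_freqs[lumped] = sum(f for a, f in pop_freqs.items() if a not in observed)
    for a in pop_freqs:
        recode.setdefault(a, lumped)
    return recode, new_freqs


def rescale_observed(pop_freqs, observed):
    """The other scheme: keep only observed alleles, rescaled to sum to 1."""
    total = sum(pop_freqs[a] for a in observed)
    return {a: pop_freqs[a] / total for a in sorted(observed)}


# Hypothetical marker: 14 alleles, of which 9 rare ones appear in the pedigree.
pop = {a: 0.01 for a in range(1, 10)}          # observed, rare
pop.update({a: 0.182 for a in range(10, 15)})  # unobserved, common
seen = set(range(1, 10))

recode, collapsed = collapse_unobserved(pop, seen)
rescaled = rescale_observed(pop, seen)
print("collapsed to", len(collapsed), "alleles; allele 1 stays at",
      round(collapsed[1], 3))
print("rescaled to", len(rescaled), "alleles; allele 1 becomes",
      round(rescaled[1], 3))
```

   With collapsing, a rare allele keeps its frequency of 0.01; with
   rescaling it is inflated to 1/9, which is the distortion described
   above.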
   
   4.7) What do you do when you get thetas greater than 0.5 when using
   LINKAGE? [1996/22/96]
   
   This seems to occur when the GEMINI optimization procedure settles on
   a local optimum at a theta greater than 0.5 because the starting
   theta values are too high in a LINKAGE run using ILINK or LODSCORE.
   This can easily be fixed by modifying the starting theta directly
   with LCP or by editing the LCP-generated script. One can also modify
   the starting value with PREPLINK or by editing the data file
   containing allele and disease frequencies. This can be an iterative
   process, and one should change theta values by an order of magnitude
   until reasonable thetas are obtained. One must also be careful of
   having initial thetas too low, since this can also cause problems in
   the form of erroneous values. One can also run MLINK to examine what
   is happening at different thetas to determine the best starting
   theta.
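
   To illustrate the idea of examining lod scores at a grid of fixed
   thetas before choosing a starting value, here is a toy Python sketch.
   It uses the textbook two-point lod score for fully informative,
   phase-known meioses, which is far simpler than the pedigree
   likelihoods MLINK actually evaluates; the function names, the grid,
   and the counts are assumptions made for this example.

```python
from math import log10


def lod_phase_known(theta, recombinants, meioses):
    """Two-point lod score for fully informative, phase-known meioses.

    lod(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n )
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta is only meaningful between 0 and 0.5")
    nonrec = meioses - recombinants
    if theta == 0.0:
        # Any observed recombinant rules out theta = 0 entirely.
        return float("-inf") if recombinants else meioses * log10(2)
    return (recombinants * log10(theta)
            + nonrec * log10(1 - theta)
            + meioses * log10(2))


def scan_thetas(recombinants, meioses, thetas=(0.01, 0.05, 0.1, 0.2, 0.3, 0.4)):
    """Tabulate lod scores on a grid and return the best grid point."""
    table = {t: lod_phase_known(t, recombinants, meioses) for t in thetas}
    best = max(table, key=table.get)
    return best, table


# Hypothetical data: 2 recombinants among 20 informative meioses.
best, table = scan_thetas(recombinants=2, meioses=20)
for theta, lod in table.items():
    print(f"theta={theta:.2f}  lod={lod:6.2f}")
print("candidate starting theta:", best)
```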
   
   5.0) COMPUTER ADMINISTRATION AND OPTIMIZATION
   
   5.1) How can I increase the speed of the LINKAGE/FASTLINK package on
   my workstation? [1995/05/18]
   
   1. Use FASTLINK, which is the C version of the LINKAGE package with a
   few algorithmic improvements. It can increase the speed of your
   calculations by an order of magnitude.
   
   2. Set up lots of paging space, which uses the hard drive as
   virtual memory (300 megs is usually plenty). Note that paging space is
   the same as swap space. Then use the "fast" versions of FASTLINK.
   
   3. Use GCC, which is the GNU/Free Software Foundation C compiler, to
   compile FASTLINK. GCC produces machine language that is about 10%
   faster than Sun's C compiler.
   
   4. Install the generic small kernel instead of the generic kernel. The
   generic kernel has device files for almost everything, and can slow
   the system down. The generic small kernel is configured for a system
   without many devices and without many users. Installing a generic
   small kernel is an option during system installation on Sun
   workstations.
   
   5. Reconfigure your kernel so it has only devices you need. This
   should give you a small improvement in overall system speed, but if
   you are already running the generic-small kernel, additional
   improvement may be so small that it's not worth the trouble. If the
   generic small kernel is insufficient for your system this step is a
   must. The generic kernel will slow down your workstation significantly
   and most of the device support is unnecessary.
   
   6. Don't run your linkage analyses in the background, because running
   programs in the background gives them a lower priority. Either do the
   runs in the foreground or you can use the root password to nice the
   pedin process by -3 to compensate (negative nice values give a higher
   priority). If you need to log out, you can use the screen command and
   "detach" a session so you can log out without programs terminating.
   Later you can log back in and "reattach" the session, which continued
   to run while you were logged out. The screen command is available at
   prep.ai.mit.edu and is also on the O'Reilly Unix Power Tools CD-ROM.
   According to the Sun documentation, nicing below -10 can interfere
   with the operating system and actually reduce the process' speed.
   Running them at the standard default level of 0 is usually sufficient.
   Some people recommend running a background job using nice +19 (!).
   In this way, the job will not interfere with other normal processes
   like login.
   
   7. Runs with 100% penetrance can run faster than runs with incomplete
   penetrance. Of course, if you have an unaffected obligate carrier,
   this won't work. In addition, incomplete penetrance runs may be
   necessary for your research to be "good".
   
   8. Change the block size of your file system. One can increase
   performance of a file system by increasing the block size, thus
   decreasing the number of read-write operations. A block device, such
   as a hard disk, usually accesses a block of data simultaneously. Thus,
   if one is expecting to use large files, having large blocks will be an
   advantage. However, one pays for this in bytes lost to partially
   filled fragments, since one has to increase the fragment size to a
   number larger than 1024, for example 2048. That is, each file or part
   of a file occupies at least 2048 bytes, so a file of 100 bytes will
   still occupy 2048 bytes. Therefore, bigger blocks give faster access,
   but bigger blocks come with bigger fragments and more lost space.
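
   The space cost can be estimated with a few lines of Python. This is
   a simplified model that just rounds every file up to a whole number
   of fragments; the function names and file sizes are hypothetical.

```python
def allocated_bytes(file_size, fragment_size):
    """Disk space a file occupies when rounded up to whole fragments."""
    if file_size == 0:
        return 0
    fragments = -(-file_size // fragment_size)  # ceiling division
    return fragments * fragment_size


def wasted_bytes(file_sizes, fragment_size):
    """Total slack space across a collection of files."""
    return sum(allocated_bytes(s, fragment_size) - s for s in file_sizes)


# A 100-byte file still takes a full 2048-byte fragment, as noted above.
print(allocated_bytes(100, 2048))

sizes = [100, 1500, 5000]  # hypothetical file sizes in bytes
for frag in (1024, 2048):
    print(f"fragment size {frag}: {wasted_bytes(sizes, frag)} bytes lost")
```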
   
   9. It has been noted that you can increase the speed of programs which
   create/access large files in the /tmp directory by creating a tmpfs
   file system.
   
   10. Of course, buying more RAM will increase your speed. It's been
   said that increasing RAM from 16 to 32 megs will result in a large
   increase in speed and increasing RAM from 32 to 64 megs will result in a
   significant increase. However, increasing beyond 64 megs is not
   particularly helpful.
   
   6.0) MOLECULAR BIOLOGY ISSUES IN LINKAGE ANALYSIS 
   
   6.1) What screening sets are available for linkage analysis?
   [1995/09/14] 
   
   For humans there are the Weber lab screening sets: 3, 3A, 4, 4A, 5,
   5A, and 6. Primers for the markers within these sets are available
   from Research Genetics, both in unlabeled and fluorescent
   dye-conjugated forms. The information on these screening sets can be
   downloaded via FTP from dgabby.mfldclin.edu in /pub.