bionet.molbio.gene-linkage FREQUENTLY ASKED QUESTIONS (part 2 of 3)

Conan the Librarian rootd at ee.pdx.edu
Sun Nov 20 03:51:44 EST 1994


   however, uses a text interface, so you don't need a fancy x-window (or
   MacMosaic or WinMosaic) to browse the web (this is great when I dial
   in from home, and only have an ascii terminal). You can find lynx (and
   several other web-browsers) at ftp.isri.unlv.edu in
   /pub/mirror/infosystems/WWW/clients. Of course, you could always
   use archie to find other ftp sites with lynx. 

   I can telnet to the internet. Can I access the web? [harper;22Jul94] 

   Surprise, surprise: not everyone has a workstation or X Windows, and
   many scientists only have simple vt100 emulation on their desktop
   machine. They read about www, gopher, archie, etc., but due to
   hardware or software limitations they cannot get at any of the goodies
   on the net. There are not many "public access" sites that allow you to
   open up a telnet session and then choose from the most popular services
   on the net today. 

   Well, anyone who can open up a telnet session can now play with the
   big boys, even if their equipment is from the last decade. 

   Give the command: 

   telnet info.funet.fi 

   and you will be presented with the following menu. 

           Finnish University and Research Network FUNET

                        Information Service

   The following information services are available:
   gopher       Menu-based global information tool
   www          World Wide Web, Global hypertext web
   wais         Wide Area Information Server, global databases on
                different topics
   x500         X.500 clients are on  nic.funet.fi, login: dua, no password
   archie       Database of Internet Archive contents
   exit         Exit FUNET information services

   What www sites have useful genetic-linkage information?
   [rootd;21may94][rootd;19Nov94] 

      http://www.genethon.fr the Genethon center 
      http://www.chlc.org the Cooperative Human Linkage Center 
      http://www.gdb.org/hopkins.html has a molecular genetics
      primer 
      http://gdbwww.gdb.org has a very useful but restricted version
      of GDB available. It may also be possible to access OMIM
      without an account here, but I haven't tried it out. 
      http://www.ai.sri.com/people/pkarp/mimbd/rsmith.html
      has the Survey of Molecular Biology Databases and Servers 
      http://mendel.berkeley.edu/dog.html is the home of the dog
      genome project. 
      http://www.pathology.washington.edu has human and mouse
      standard idiograms. The idiograms are useful for making
      illustrations for gene mapping (i.e. physical mapping) and for
      constructing abnormal chromosome illustrations, like
      translocations, deletions, etc. The PostScript versions produce
      high quality output that can be sent to lino for publication figures.
      The PostScript idiograms can be manipulated band-by-band
      with illustration software such as Adobe Illustrator, Aldus
      FreeHand, Canvas, Altsys Virtuoso, etc. 

   What "linkage centers" make information and assistance available to
   researchers? [rootd;29may94] 

   The Cooperative Human Linkage Center (CHLC) can be contacted at: 
      gopher.chlc.org (gopher) 
      http://www.chlc.org (world-wide-web) 
      ftp://ftp.chlc.org (ftp) 
      info-server at chlc.org (an automated information service) or 
      help at chlc.org (real humans!) 
   We encourage people to use the info-server first, and then explore the
   gopher or www site before trying to contact a human at help. It will
   probably be faster too, since the humans at chlc are working on tons of
   neat projects. 

   Among other things, CHLC provides primer selection and linkage
   analysis via email. Information on those services can be found by
   sending email to: 
      primer-server at chlc.org 
      linkage-server at chlc.org 

   According to Bob Stodola at chlc: 

   "Currently, our email server is fairly crude -- it does crimap
   two-point analysis and maps the data with respect to the CHLC
   markers. Our plan is to include a substantially enhanced version
   which replicates what CHLC is using in terms of data diagnostics
   and mapping information." 

   And another center:

   I am David Featherston, from the Dutch EMBnet Node, where we are
   starting a linkage analysis service: software availability and
   support/advice (at first), and (if I ever get my Drosophila/F2
   geneticist's head wrapped around pedigrees and maximum likelihood)
   training and perhaps consultancy. At present, we have MapMaker/EXP 3.0b,
   MapMaker/QTL 1.1, Lathrop and Lalouel's Linkage programmes and
   Schaffer et al's Fastlink versions of ILINK, LINKMAP, LODSCORE
   and MLINK on offer. "On offer" means that if a user has a Genomics
   Package account at the CAOS/CAMM Center, they can use these
   programmes on our fast computers to analyse their data sets. Anyone is
   welcome to contact me for information about what else is included in a
   Genomics Package account, and for the details about opening one. 

   Their ftp site is camms1.caos.kun.nl, and the email contact is
   davidf at caos.kun.nl

   What journals are useful for genetic linkage analysis? [Young Bae Choi
   put this on the net, I edited it--rootd,19may94] 
      Journal of Computational Biology (New!) Editors: Michael
      Waterman and David Kingsbury. Contact: 
      dkinsbu at merlot.welch.jhu.edu or msw at hto.usc.edu 
      Human Genome Project Journal (?) Contact: Tim Stearns, 
      tim at eeg.com 
      Computer Applications in the Biosciences (CABIOS) 
      American Journal of Human Genetics 
      Nature Genetics 
      Genomics 
      Human Genome News, available by gopher from gopher.gdb.org 
      Human Heredity, edited by Jurg Ott (jurg.ott at columbia.edu) 

   GENE-LINKAGE SOFTWARE OVERVIEW

   What database management programs do people use for
   genetic-linkage data? [rootd;15may94] 
      Paradox: This is a full database-management system available
      from Borland for IBM machines. Like most
      other "full feature" databases, it is reliable and supported on
      most IBM platforms, but not tailored specifically to the needs
      of genetic researchers. It has a good educational discount. We
      use it, but have to repeatedly set up our report formats for
      linkage output. Getting output in liped format is nontrivial. 
      Linksys: This custom-made database program was written by J
      Attwood and S Bryant. Although they continue to use it, John
      Attwood suggests using dolink instead. Linksys is not currently
      available at any ftp sites. 
      Dolink: This DOS custom database program (by D Curtis I
      think??) manages genetic data and sets up input files for your
      analysis. It is available from ftp.bchs.uh.edu. 
      Kindred: This new DOS database program, distributed by
      Epicenter Software, is specifically designed for linkage
      analysis. A free demo is available by calling (818)-304-9487.
      In addition to database duties, this program (according to the ad,
      not from personal experience) will draw pedigrees, haplotype
      marker data, and can output in linkage format. The demo did
      not work on our IBM because our monitor is from the stone
      age. We were able to get the demo to run on a Power-PC Mac
      with SoftWindows emulation, but it crashed the Mac when we
      hit the escape-key during the demo. Be forewarned: the list
      price is about $500. 
      CEPH: This database is specifically designed for chromosome
      mapping with CEPH-style pedigrees. It can output data in
      ped.out format or linkage format. Our version (5.0) fails when
      we output more than 90 markers (but not the entire dataset).
      Santosh Mishra wrote a program (called mkcrigen) that converts
      ped.out files to .gen files. Unfortunately, we only have an old
      binary that was compiled with a maximum of about 85
      markers. If you try to convert a ped.out file with more than 85
      markers, the resulting .gen file is corrupted. Santosh
      Mishra modified the program to work with 500 markers, but
      we do not have source code for any version of mkcrigen, and
      we do not have a binary of the improved version. Some other
      labs output the data in linkage format and convert that to .gen
      format. We don't like that approach because it separates the
      marker name from the marker data, and can result in errors. I
      believe that the CEPH database is available on the CEPH ftp site,
      but I do not have the address. [Also see the "What is Cyrillic?"
      question] 

   Please send comments on database programs you use! 

   What programs are available for pedigree drawing? [rootd;16may94] 
      peddraw (IBM version): This program (Possibly written by
      Dave Curtis) is a pedigree drawing program for IBMs available
      from ftp.bchs.uh.edu in the /pub/gene-server/dos directory. I
      have never used it. 
      ftree: This is another IBM pedigree program written by Rodney
      Go at the University of Alabama. I have a copy, but do not
      know where this program is available. I don't use it, but some
      old pedigrees in a notebook look very pretty. 
      peddraw (Mac version): This program, written by B Dyke, P
      Mamelka, and J MacCluer, is available from: 
         bdyke at darwin.sfbr.org
         Pedigree/Draw
         Department of Genetics
         Southwest Foundation for Biomedical Research
         P.O. Box 28147
         San Antonio, TX 78228-0147

      An upgrade from a previous version is $10 (current version =
      4.4). Documentation costs $10 (get it). The full package
      including documentation costs $45. The best thing about (Mac)
      peddraw is that the text file formats are included in the
      documentation. I have a sed-awk-sh script which converts
      linkage format to peddraw format, making generation of large
      pedigrees easy. My simple script is available via anonymous ftp
      at ftp.ee.pdx.edu in /pub/users/cat/rootd/convert.new 

      Genetree: "The GeneTree 1.0 package provides a convenient
      way to draw family tree diagrams suitable for genetics or
      genealogy using an IBM PC or compatible computer. The
      package consists of the GeneTree program, which draws
      pedigree diagrams using a command language, and SC, a
      menu-driven program that facilitates creation of GeneTree
      commands. GeneTree and SC are made available with program
      manuals, examples of family tree diagrams, and a GeneTree
      Quick Reference Guide. GeneTree is written in the C
      programming language. Note that it is a DRAWING program
      and does not compute genetic parameters." The genetree
      program is available from wijsman at max.u.washington.edu at a
      price of $125 (because of licensing fees from a private company
      which wrote one of the drivers used in the program) 

   [Also see the "What is Cyrillic?" question] 

   Why are some programs used primarily for chromosome mapping,
   while others are used for disease-mapping? [rootd;15may94] 

   Any family can be used for chromosome mapping, so CEPH has picked
   a particular family "shape" and generated a large database with these
   families. Programs designed for chromosome mapping can be
   optimized for using these families, reducing the time needed for
   calculations. Only families afflicted with a disease can be used for
   disease-gene-mapping. As a result, programs designed for
   disease-gene-mapping need to be able to deal with arbitrary pedigrees.
   In addition, these programs need to be able to handle
   incomplete-penetrance. 

   What programs are used for chromosome mapping? [rootd;21may94] 

      crimap: This program has been used for chromosome mapping
      for years. It has options which can generate maps, calculate
      order probabilities, and print out recombination data. It works on
      .gen files with data from CEPH-style families. It is written in
      K&R-style C, and Phil Green (the author) has successfully
      run it on UNIX, DOS, VMS, and Macintosh systems. It is not
      available via anonymous ftp. 
      multimap: This Lisp-based expert system uses a customized
      version of crimap to create a chromosome map. It is available
      via anonymous ftp from genome1.hgen.pitt.edu. The authors (T
      Matise, M Perlin, and A Chakravarti) continue to improve the
      code, add new functions, and provide excellent support. When
      used with the crimap chrompic option (which finds
      double recombinations to identify possible errors), it is
      incredibly useful. This is Unix-only (supported for DEC Ultrix,
      HP9000, and Suns). The customized crimap version (called
      lispcri) is distributed at the ftp site, but was not meant to be
      used independently of multimap. 
      mapmaker: For MAPMAKER, contact: 
         Dr. Eric Lander
         Whitehead Institute
         9 Cambridge Center
         Cambridge, MA 02142
         mapm%mitwibr at mitvma.mit.edu
      CHROMLOOK: This is somewhat similar to
      chrompic-crimap (I hear that the output is easier to read). It
      takes input files in the liped format. It was written by Jonathan
      Haines, and I currently do not have an ftp site for this program. 
      clinkage: This is the special version of the LINKAGE
      programs for 3-generation (CEPH) pedigrees and codominant
      markers. PC version available by ftp from
      york.ccc.columbia.edu. Unix version from corona.med.utah.edu.

   What programs are used for disease-gene mapping? [rootd;21may94] 

      LABMAN and LINKMAN are made available free of charge
      (via anonymous ftp) by Dr. Phil Adams of Columbia
      University. I don't know what they do, or what the specific ftp
      site is, but a paper reference is: Genetic Epidemiology (1994),
      vol. 11, no. 1, pp. 87-94. 
      Simlink: This Fortran program (by L Ploughman and M
      Boehnke) simulates linkage analysis on a family, and lets you
      "estimate the probability, or power, of detecting linkage
      given family history information on a set of identified
      pedigrees." It allows the researcher to determine whether a
      family is informative enough to detect linkage. In
      addition, it can help the researcher decide how far apart to
      space their genetic probes without "missing" the disease
      locus (e.g. do I use probes separated by 30cM, or will 40cM be
      close enough given the informativeness of this family?). This can
      save the researcher considerable time and money: the
      researcher won't waste money doing a genome search on an
      insufficiently informative family. Large families can be
      "trimmed" during the initial genome search, and then the entire
      family can be used later during marker localization. Simlink
      data can be useful on grant applications (to show that the
      family you propose to analyze is sufficiently informative).
      Simlink requires large quantities of memory. It was written for
      IBMs, but has been ported to many platforms, including: 
         Sequent Symmetry S8000s. 
            It is available from:
            Michael Boehnke
            Michael.Boehnke at um.cc.umich.edu
            Department of Biostatistics
            School of Public Health
            University of Michigan
            Ann Arbor, MI 48109-2029
         No postage-money or blank disks are necessary to get
         simlink sent to you (thanks, Dr. Boehnke!). Simlink
         "may" be available via anonymous ftp "soon".
      Slink: This Pascal program (by D Weeks, M Lathrop, J. Ott) is
      similar to Simlink. It is more general than Simlink in that it
      allows for partial marker typing at the locus to be generated,
      but it runs slower than Simlink. Available from
      york.ccc.columbia.edu or on floppies (see Linkage). 
      Liped: This IBM program (written by Jurg Ott) calculates
      probabilities for genetic linkage between disease-markers and
      genetic-markers. Its input file differentiates between
      phenotypes and genotypes. As a result, this program is easiest to
      use when your data is from "old-style" genetic-markers (such
      as blood phenotype data). 

      Cathy Falk writes: There ARE a couple of versions of Liped
      around that work on the Sun, but each one seems to have its own
      developmental path (from the original), so it's not so easy to
      describe. We have a version that came from UCLA (Dr. Anne
      Spence) which we have had running on the Sun for some time.
      It accepts up to 6 alleles per locus, and we now want to increase
      that. It also has a somewhat different structure for the input
      files. Dr. Peggy Pericak-Vance, at Duke, has a version that
      accepts up to 8 alleles, but it is a modification of an earlier
      LIPED and is not totally compatible with our current (UCLA)
      version. Dr. Ott has a PC version which he thinks would be easy
      to modify for the Sun, and Dr. David Greenberg informed me
      that he has a version for DEC (VMS) machines. 

      GREGOR: This is a piece of software (IBM PC compatible) for
      producing simulated genetic data. It does not perform linkage
      analysis, but it may be useful for _testing_ methods or
      assumptions about linkage analysis. GREGOR is operated by a
      series of hierarchical menus that permit the user to define
      hypothetical genetic scenarios (gene positions and effects) and
      produce simulated data-sets for a variety of population
      structures. GREGOR is available by ftp from the site
      "sifon.cc.mcgill.ca" in a "pkzip" archive called
      "pub/McGill-Contrib/GREGOR.ZIP". Further information can be
      found in the following reference: N.A. Tinker and D.E. Mather.
      1993. GREGOR: software for genetic simulation. J. Hered.
      84(3):237-238. Questions should be directed to the authors
      (tinker at agradm.lan.mcgill.ca or
      mather at agradm.lan.mcgill.ca). [thanks to tinker for sending
      this--rootd] 
      Linkage: This package of programs was developed by Mark
      Lathrop with help from JM Lalouel, C Julier, and J Ott. Jurg Ott
      maintains the IBM versions. The Linkage package consists of
      several analysis programs (each of which does a particular type of
      analysis) and several utility programs (which make the analysis
      programs easy to use). Versions are available for IBMs and
      Unix platforms. Here are some of the analysis programs: 
         mlink: 2-point lod-score calculations at fixed
         recombination distances 
         linkmap: multipoint lod-score calculations at fixed
         distances 
         ilink: calculates the recombination distance with the
         highest lod-score 
      Unix versions are available from corona.med.utah.edu. PC and
      VMS versions are available from york.ccc.columbia.edu, or on
      floppy disks, when you write to: 
         Katherine Montague/Jurg Ott
         Columbia University, Unit 58
         722 West 168th Street
         New York, NY 10032
      Send a bunch of preformatted IBM disks if you request linkage
      by mail. Jurg Ott (jurg.ott at columbia.edu) can send you more
      information regarding mail requests for the linkage package. 
      fastlink: This is a port of the linkage package to C (by A
      Schaffer, R Cottingham, and R Idury). The initial port increased
      the speed by an order of magnitude. They continue to optimize
      the algorithm and code, resulting in continued speed
      improvements. In addition, fastlink allows you to compile in
      "fast" or "slow" mode (the slow version of fastlink is still much
      faster than the old linkage programs). The "fast" version uses a
      ton of memory, but uses that memory to hold some of the
      intermediate results which are repeatedly recalculated in the
      "slow" version (and the old linkage package). We obtain good
      results by setting up 300 megs of virtual memory on our sparc
      and using the fast version (at one point we ran a fastlink
      linkmap run with 700 haplotypes). The fastlink programs are
      also more portable. Earlier versions of fastlink required
      installation of p2c (the Free Software Foundation's Pascal-to-C
      converter). That is no longer necessary. 
      Affected Pedigree Member Method: This package is by Dan Weeks
      and is available via anonymous ftp from watson.hgen.pitt.edu.
      Here's some info that Dr. Weeks sent me:

      The Affected Pedigree Member Method distribution contains
      the new APM programs, a new file conversion utility, and a
      histogram/statistics generator (all of which are version 2.0). 

      To build the entire distribution, you need C, Pascal, and Fortran
      compilers. A make utility is also helpful. 

      Instructions on building the distribution are in the file HowTo.
      Please read the file READ_ME_FIRST before doing anything.
      For an introduction to the APM programs, read the Intro file.
      For a list of known bugs, read the BUGS file. 

         The programs which are built include: 
         apm, a program to calculate the single locus statistic
         over one or several marker loci 
         sim, a program to simulate pedigrees and, using output
         files of apm, test for asymptotic normality of the null
         distribution 
         apmmult, a program to generate the multi-locus statistic
         simmult, a program like sim but which simulates
         recombination and uses the output of apmmult 
         chapm, a program to convert LINKAGE files to APM
         files, or APM files of one format to APM files of
         another format 
         hist, a program to compute various statistical figures,
         plot a histogram, and compute empirical p-values 

      emaillink: I was working on a system to allow people to submit
      FASTLINK runs via email, but it's on indefinite hold while I
      work on classes and stuff. Perhaps I'll get back to it
      someday--rootd. 

   What programs are available to help detect errors in linkage data? 

   By linkage data, I mean any genetic-linkage dataset, not just those for
   the Lathrop Linkage package. This is an important question, and I
   simply do not know the answer. 

   I've used the crimap-chrompic option, and played with xpic/phap a
   little bit, but I really hope some people send me some information on
   this topic. 

   What is Cyrillic? [P Janssens; 22may94] 

   Cyrillic is a pedigree editor with facilities for including marker data.
   You can then ask it to interface with LINKAGE, i.e. it creates the input
   files for MAKEPED, and runs the whole show. It is Windows based, so
   input of the pedigree is very efficient. You also have a data form
   associated with each individual where you can store names, DNA
   numbers, etc. If you want I can email you version 1.11, to have a look
   at. They also have technical support by email from Oxford. Let me
   know if you are interested. I had to learn to use the program here, and
   teach everyone else in the lab. Just before I started working here they
   had bought it. I also had to learn the old way of preparing the datain
   files for MAKEPED, and I promise you that I will never look back.
   There were some serious bugs in version 1, but as far as I can tell they
   have all been fixed quite nicely. There are of course some features that they
   are still busy implementing, but it is an excellent interface with
   LINKAGE! 

   What programs help me recode genetic markers? [dcurtis;20Jul94]

   If anyone's interested, DOLINK can downcode alleles automatically.
   I'm not sure if it uses the same algorithm as Ott's, but it's described in
   the documentation along with potential drawbacks. The main use of
   DOLINK is to prepare files for LINKAGE etc. from a database. It's at
   diamond.gene.ucl.ac.uk in /pub/packages/dcurtis. I've _nearly_ got a
   version ready to run under X (the current one is only for DOS), and I
   will try to accelerate this if there is huge public interest. 
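
   To make the idea of downcoding concrete, here is a minimal Python
   sketch. It is NOT the algorithm used by DOLINK or by Ott's programs
   (neither is described here); it only shows the simplest form of
   recoding, where the alleles actually observed at one marker are
   renumbered 1..k and 0 is kept as "unknown". The function name and the
   sample genotypes are made up for illustration.

```python
def downcode(genotypes):
    """Renumber the alleles seen in `genotypes` as 1..k; 0 stays 'unknown'.

    `genotypes` is a list of (allele1, allele2) pairs for one marker.
    Returns the recoded pairs and the old-to-new allele mapping.
    """
    # Collect the distinct non-zero alleles that actually occur.
    observed = sorted({a for pair in genotypes for a in pair if a != 0})
    # Assign consecutive new numbers in sorted order.
    recode = {old: new for new, old in enumerate(observed, start=1)}
    recode[0] = 0  # unknown alleles are left as unknown
    recoded = [(recode[a], recode[b]) for a, b in genotypes]
    return recoded, recode


# Hypothetical marker whose alleles are numbered 3, 7 and 12.
genotypes = [(3, 7), (7, 12), (0, 0), (3, 12)]
recoded, mapping = downcode(genotypes)
print(recoded)           # [(1, 2), (2, 3), (0, 0), (1, 3)]
print(len(mapping) - 1)  # 3 alleles after recoding
```

   Fewer allele numbers per marker means a smaller product of allele
   counts in a run, which is what MAXHAP (below) measures.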

   LINKAGE PACKAGE SPECIFIC INFORMATION

   How do you calculate MAXHAP? [rootd;15may94] 

   Maxhap is the maximum possible number of haplotypes in your
   analysis. You multiply together the number of alleles at each locus
   used in a particular run (not all the loci in your dataset, just the loci
   you use). Remember that affection status counts as two alleles,
   regardless of the number of liability classes. 

   For example, if a dataset has the following information: 

   affection status: 4 liability classes
   Marker A: 3 alleles
   marker B: 4 alleles
   marker C: 5 alleles

   And your run includes a linkmap run between affection-status, A, and
   B, then your MAXHAP must be (at least) 2*3*4 = 24. 
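
   The same arithmetic as a small Python sketch, in case you want to
   check several runs at once. The dictionary of allele counts mirrors
   the example dataset above; the names are illustrative only and are
   not part of the LINKAGE package.

```python
from math import prod


def maxhap(allele_counts):
    """Product of the allele counts of the loci used in one run."""
    return prod(allele_counts)


# Allele counts for the example dataset.
alleles = {
    "affection": 2,  # affection status always counts as 2 alleles,
                     # regardless of the number of liability classes
    "A": 3,
    "B": 4,
    "C": 5,
}

# Only the loci used in this particular run are multiplied together.
run_loci = ["affection", "A", "B"]
print(maxhap(alleles[name] for name in run_loci))  # 24
```

   Adding marker C to the same run would raise the requirement to
   2*3*4*5 = 120.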

   When should you use binary coding instead of numeric allele coding?
   [Gerard Tromp;29may94] 

   Usually, there is no advantage to coding disease or loci as either binary
   or numeric using liability classes. Generally binary coding is more
   complex in that we humans have a hard time thinking that way. Some
   co-dominant phenotypes lend themselves to binary coding e.g. ABO
   bloodtypes: 

   A   - 1 0 1
   B   - 0 1 1
   O   - 0 0 1
   AB  - 1 1 1
   unk - 0 0 0

   In this case, one codes the O type factor as present in all cases except
   unknowns. Since one cannot distinguish AO from AA at the phenotype
   level, one codes both genotypes as 1 0 1 (presence of A and O). In reality
   O represents absence of both A and B. One cannot, however, code that
   using 0 0, since 0 0 would be an unknown. 

   Use of binary codes has decreased since DNA markers have come into
   use, as they allow one to type an individual with respect to genotype.
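
   The ABO table above can be written as a simple lookup. This Python
   sketch only restates that table (three factors per phenotype, in the
   order A, B, O); the function name is made up, and anything that is
   not a recognised phenotype is treated as unknown.

```python
# Binary factor codes from the table above: (A factor, B factor, O factor).
ABO_CODES = {
    "A": (1, 0, 1),
    "B": (0, 1, 1),
    "O": (0, 0, 1),
    "AB": (1, 1, 1),
}
UNKNOWN = (0, 0, 0)


def encode_abo(phenotype):
    """Return the binary factor code for an ABO phenotype string."""
    if not phenotype:
        return UNKNOWN
    # Unrecognised strings (e.g. "unk") fall back to the unknown code.
    return ABO_CODES.get(phenotype.strip().upper(), UNKNOWN)


for phenotype in ("A", "B", "O", "AB", "unk"):
    code = " ".join(str(bit) for bit in encode_abo(phenotype))
    print(f"{phenotype:<4}- {code}")
```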


