bionet.molbio.gene-linkage FREQUENTLY ASKED QUESTIONS (part 2 of 3)
Conan the Librarian
rootd at ee.pdx.edu
Sun Nov 20 03:51:44 EST 1994
however, uses a text interface, so you don't need a fancy X Window display (or
MacMosaic or WinMosaic) to browse the web. This is great when I dial
in from home and only have an ASCII terminal. You can find lynx (and
several other web browsers) at ftp.isri.unlv.edu in
/pub/mirror/infosystems/WWW/clients. Of course, you could always
use archie to find other ftp sites that carry lynx.
I can telnet to the internet. Can I access the web? [harper;22Jul94]
Surprise, surprise: not everyone has a workstation or X Windows, and
many scientists have only simple VT100 emulation on their desktop
machines. They read about WWW, gopher, archie, etc., but hardware or
software limitations keep them from getting at any of the goodies on
the net. There are not many "public access" sites that let you open a
telnet session and then choose from the most popular services on the
net today.
Well, anyone who can open a telnet session can now play with the big
boys, even with equipment from the last decade.
Give the command:
and you will be presented with the following menu.
Finnish University and Research Network FUNET
The following information services are available:
gopher    Menu-based global information tool
www       World Wide Web, Global hypertext web
wais      Wide Area Information Server, global databases
          on different topics
x500      X.500 clients are on nic.funet.fi, login: dua, no password
archie    Database of Internet Archive contents
exit      Exit FUNET information services
What www sites have useful genetic-linkage information?
http://www.genethon.fr: the Genethon center
http://www.chlc.org: the Cooperative Human Linkage Center
http://www.gdb.org/hopkins.html has a molecular genetics
http://gdbwww.gdb.org has a very useful but restricted version
of GDB available. It may also be possible to access OMIM
without an account here, but I haven't tried it.
has the Survey of Molecular Biology Databases and Servers
http://mendel.berkeley.edu/dog.html is the home of the dog
http://www.pathology.washington.edu has human and mouse
standard idiograms. The idiograms are useful for making
illustrations for (physical) gene mapping and for
constructing illustrations of abnormal chromosomes, such as
translocations, deletions, etc. The PostScript versions produce
high-quality output that can be sent to a lino for publication figures.
The PostScript idiograms can be manipulated band-by-band
with illustration software such as Adobe Illustrator, Aldus
FreeHand, Canvas, Altsys Virtuoso, etc.
What "linkage centers" make information and assistance available to
The cooperative human linkage center can be contacted at:
info-server at chlc.org (an automated information service) or
help at chlc.org (real humans!)
We encourage people to use the info-server first, and then explore the
gopher or www site before trying to contact a human at help. It will
probably be faster too, since the humans at chlc are working on tons of
Among other things, CHLC provides primer selection and linkage
analysis via email. Information on those services can be found by
sending email to:
primer-server at chlc.org
linkage-server at chlc.org
According to Bob Stodola at chlc:
"Currently, our email server is fairly crude -- it does crimap
two-point analysis and maps the data with respect to the CHLC
markers. Our plan is to include a substantially enhanced version
which replicates what CHLC is using in terms of data diagnostics
and mapping information."
And another center:
I am David Featherston, from the Dutch EMBnet Node, where we are
starting a linkage analysis service: software availability and
support/advice (at first), and (if I ever get my Drosophila/F2 geneticist's
head wrapped around pedigrees and maximum likelihood) training and
perhaps consultancy. At present, we have MapMaker/EXP 3.0b,
MapMaker/QTL 1.1, Lathrop and Lalouel's Linkage programmes, and
Schaffer et al.'s Fastlink versions of ILINK, LINKMAP, LODSCORE
and MLINK on offer. "On offer" means that if a user has a Genomics
Package account at the CAOS/CAMM Center, they can use these
programmes on our fast computers to analyse their data sets. Anyone is
welcome to contact me for information about what else is included in a
Genomics Package account, and for details about opening one.
Their ftp site is camms1.caos.kun.nl, and the email contact is
davidf at caos.kun.nl
What journals are useful for genetic linkage analysis? [Young Bae Choi
put this on the net, I edited it--rootd,19may94]
Journal of Computational Biology (new!). Editors: Michael
Waterman and David Kingsbury. Contact:
dkinsbu at merlot.welch.jhu.edu or msw at hto.usc.edu
Human Genome Project Journal (?) Contact: Tim Stearns
tim at eeg.com
Computer Applications in Biosciences (CABIOS)
American Journal of Human Genetics
Human Genome News, available by gopher from gopher.gdb.org
Human Heredity, edited by Jurg Ott (jurg.ott at columbia.edu)
GENE-LINKAGE SOFTWARE OVERVIEW
What database management programs do people use for
genetic-linkage data? [rootd;15may94]
Paradox: This is a full database-management system available
from Borland for IBM machines. Like most other "full-featured"
databases, it is reliable and supported on most IBM platforms,
but it is not tailored to the needs of genetic researchers. It has
a good educational discount. We use it, but have to set up our
report formats for linkage output over and over. Getting output
in LIPED format is nontrivial.
Linksys: This custom-made database program was written by J
Attwood and S Bryant. Although they continue to use it, John
Attwood suggests using dolink instead. Linksys is not currently
available at any ftp sites.
Dolink: This DOS custom database program (by D Curtis, I
think) manages genetic data and sets up input files for your
analysis. It is available from ftp.bchs.uh.edu.
Kindred: This new DOS database program, distributed by
Epicenter Software, is specifically designed for linkage
analysis. A free demo is available by calling (818)-304-9487.
In addition to database duties, this program (according to the ad,
not from personal experience) will draw pedigrees, haplotype
marker data, and can output in linkage format. The demo did
not work on our IBM because our monitor is from the stone
age. We were able to get the demo to run on a Power-PC Mac
with SoftWindows emulation, but it crashed the Mac when we
hit the escape-key during the demo. Be forewarned: the list
price is about $500.
CEPH: This database is specifically designed for chromosome
mapping with CEPH-style pedigrees. It can output data in
ped.out format or linkage format. Our version (5.0) fails when
we output over 90 markers, but not the entire dataset. Santosh
Mishra wrote a program (called mkcrigen) which converts
ped.out files to .gen files. Unfortunately, we only have an old
binary, which was compiled with a maximum of about 85
markers. If you try to convert a ped.out file with more than 85
markers to a .gen file, the resulting .gen file is messed up. Santosh
Mishra modified the program to work with 500 markers, but
we do not have source code for mkcrigen (any version), and
we do not have a binary for the improved version. Some other
labs output the data in linkage format and convert that to .gen
format. We don't like that, because it separates the marker
names from the marker data, which can lead to errors. I believe
that the CEPH database is available on the CEPH ftp site, but I do
not have the address. [Also see the "What is Cyrillic?" question]
Please send comments on database programs you use!
What programs are available for pedigree drawing? [rootd;16may94]
peddraw (IBM version): This program (possibly written by
Dave Curtis) is a pedigree-drawing program for IBMs, available
from ftp.bchs.uh.edu in the /pub/gene-server/dos directory. I
have never used it.
ftree: This is another IBM pedigree program, written by Rodney
Go at the University of Alabama. I have a copy, but do not
know where this program is available. I don't use it, but some
old pedigrees drawn with it in a notebook look very pretty.
peddraw (Mac version): This program, written by B Dyke, P
Mamelka, and J MacCluer, is available from:
bdyke at darwin.sfbr.org
Department of Genetics
Southwest Foundation for Biomedical Research
P.O. Box 28147
San Antonio, TX 78228-0147
An upgrade from a previous version costs $10 (current version =
4.4). Documentation costs $10 (get it). The full package,
including documentation, costs $45. The best thing about (Mac)
peddraw is that the text file formats are included in the
documentation. I have a sed-awk-sh script which converts
linkage format to peddraw format, making generation of large
pedigrees easy. My simple script is available via anonymous ftp
at ftp.ee.pdx.edu in /pub/users/cat/rootd/convert.new
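Any linkage-to-peddraw converter has to start by reading the pre-MAKEPED LINKAGE pedigree file, whose first five columns are pedigree ID, individual ID, father ID, mother ID and sex (0 marks an unknown parent). The Python sketch below shows only that first step, grouping children under each mating pair the way a drawing program needs them. It is not the script mentioned above: the function names and sample data are hypothetical, and the peddraw output format (documented in the peddraw manual) is not reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    pedigree: str
    ident: str
    father: str   # "0" means unknown parent (founder)
    mother: str   # "0" means unknown parent (founder)
    sex: int      # 1 = male, 2 = female, 0 = unknown
    phenotypes: list = field(default_factory=list)  # remaining columns, kept as text


def parse_pedigree(lines):
    """Parse whitespace-separated pre-MAKEPED LINKAGE pedigree lines."""
    people = []
    for line in lines:
        cols = line.split()
        if not cols:
            continue  # skip blank lines
        ped, ident, father, mother, sex, *rest = cols
        people.append(Person(ped, ident, father, mother, int(sex), rest))
    return people


def matings(people):
    """Map (pedigree, father, mother) to the IDs of their children."""
    families = {}
    for p in people:
        if p.father != "0" and p.mother != "0":
            key = (p.pedigree, p.father, p.mother)
            families.setdefault(key, []).append(p.ident)
    return families


# Hypothetical data: affection status followed by one marker's two alleles.
example = [
    "1 1 0 0 1  1  1 2",
    "1 2 0 0 2  1  3 4",
    "1 3 1 2 1  2  1 3",
    "1 4 1 2 2  1  2 4",
]
people = parse_pedigree(example)
print(matings(people))  # {('1', '1', '2'): ['3', '4']}
```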
Genetree: "The GeneTree 1.0 package provides a convenient
way to draw family tree diagrams suitable for genetics or
genealogy using an IBM PC or compatible computer. The
package consists of the GeneTree program, which draws
pedigree diagrams using a command language, and SC, a
menu-driven program that facilitates creation of GeneTree
commands. GeneTree and SC are made available with program
manuals, examples of family tree diagrams, and a GeneTree
Quick Reference Guide. GeneTree is written in the C
programming language. Note that it is a DRAWING program
and does not compute genetic parameters." The GeneTree
program is available from wijsman at max.u.washington.edu at a
price of $125 (because of licensing fees charged by a private
company that wrote one of the drivers used in the program).
[Also see the "What is Cyrillic?" question]
Why are some programs used primarily for chromosome mapping,
while others are used for disease mapping? [rootd;15may94]
Any family can be used for chromosome mapping, so CEPH has picked
a particular family "shape" and generated a large database of these
families. Programs designed for chromosome mapping can be
optimized for these families, reducing the time needed for
calculations. Only families affected by a disease can be used for
disease-gene mapping. As a result, programs designed for
disease-gene mapping need to be able to deal with arbitrary pedigrees.
In addition, these programs need to be able to handle
What programs are used for chromosome mapping? [rootd;21may94]
crimap: This program has been used for chromosome mapping
for years. It has options to generate maps, calculate
order probabilities, and print out recombination data. It works on
.gen files containing data from CEPH-style families. It is written in
K&R-style C, and Phil Green (the author) has successfully
run it on UNIX, DOS, VMS, and Macintosh systems. It is not
available via anonymous ftp.
multimap: This Lisp-based expert system uses a customized
version of crimap to create a chromosome map. It is available
via anonymous ftp from genome1.hgen.pitt.edu. The authors (T
Matise, M Perlin, and A Chakravarti) continue to improve the
code, add new functions, and provide excellent support. When
used with the crimap chrompic option (to find double
recombinants that point to possible errors), it is incredibly
useful. It is Unix-only (supported on DEC Ultrix, HP 9000, and
Suns). The customized crimap version (called lispcri) is
distributed at the ftp site, but was not meant to be used
independently of multimap.
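The chrompic check relies on a simple idea: once the grandparental origin of each allele along a chromosome is known, a single marker whose origin differs from both of its informative neighbours implies two crossovers in a short interval, which is more often a typing error than real biology. A minimal Python sketch of that idea, assuming the per-marker origins have already been inferred and are labelled with hypothetical "P"/"M" codes (None for uninformative markers); it does not read crimap output and ignores map distances.

```python
def flag_double_recombinants(origins):
    """Return indices of markers whose grandparental origin differs from both
    nearest informative neighbours (candidate typing errors).

    origins: per-marker labels in map order, e.g. "P" or "M" for the paternal
    or maternal grandparent; None marks an uninformative marker.
    """
    informative = [(i, o) for i, o in enumerate(origins) if o is not None]
    suspects = []
    for k in range(1, len(informative) - 1):
        before = informative[k - 1][1]
        here = informative[k][1]
        after = informative[k + 1][1]
        # Flanking markers agree but this one switches: two crossovers in a row.
        if before == after and here != before:
            suspects.append(informative[k][0])
    return suspects


print(flag_double_recombinants(["P", "P", "M", "P", "P"]))  # [2]
print(flag_double_recombinants(["P", None, "M", "P"]))       # [2]
print(flag_double_recombinants(["P", "P", "M", "M"]))        # []
```

A real screen would also weigh how close the flanking markers are, since a double crossover across a long interval is plausible.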
Dr. Eric Lander
9 Cambridge Center
Cambridge, MA 02142
mapm%mitwibr at mitvma.mit.edu
CHROMLOOK: This is somewhat similar to crimap's chrompic
option (I hear that the output is easier to read). It takes input
files in LIPED format. It was written by Jonathan Haines, and I
currently do not have an ftp site for this program.
clinkage: This is the special version of the LINKAGE
programs for 3-generation (CEPH) pedigrees and codominant
markers. PC version available by ftp from
york.ccc.columbia.edu. Unix version from corona.med.utah.edu.
What programs are used for disease-gene mapping? [rootd;21may94]
LABMAN and LINKMAN are made available free of charge
(via anonymous ftp) by Dr. Phil Adams of Columbia
University. I don't know what they do, or what the specific ftp
site is, but a paper reference is: Genetic Epidemiology (1994),
vol. 11, no. 1, pp. 87-94.
Simlink: This Fortran program (by L Ploughman and M
Boehnke) simulates linkage analysis on a family and lets you
"estimate the probability, or power, of detecting linkage
given family history information on a set of identified
pedigrees." It allows the researcher to determine whether a
family is sufficiently informative to detect linkage. In
addition, it can help the researcher decide how far apart to
space their genetic probes without "missing" the disease
locus (i.e., do I use probes separated by 30 cM, or will 40 cM be
close enough given the informativeness of this family?). This can
save the researcher considerable time and money: the
researcher won't waste money doing a genome search on an
insufficiently informative family, and large families can be
"trimmed" during the initial genome search, with the entire
family used later during marker localization. Simlink
data can be useful in grant applications (to show that the
family you propose to analyze is sufficiently informative).
Simlink requires large quantities of memory. It was written for
IBMs, but has been ported to many platforms, including
Sequent Symmetry S8000s.
It is available from:
Michael.Boehnke at um.cc.umich.edu
Department of Biostatistics
School of Public Health
University of Michigan
Ann Arbor, MI 48109-2029
No postage money or blank disks are necessary to get
Simlink sent to you (thanks, Dr. Boehnke!). Simlink
"may" be available via anonymous ftp "soon".
Slink: This Pascal program (by D Weeks, M Lathrop, J. Ott) is
similar to Simlink. It is more general than Simlink in that it
allows for partial marker typing at the locus to be generated,
but it runs slower than Simlink. Available from
york.ccc.columbia.edu or on floppies (see Linkage).
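To make the idea of linkage power concrete, here is a toy Monte Carlo sketch in Python. It is not the algorithm used by Simlink or Slink: it assumes fully informative, phase-known meioses (each one simply recombinant or not), scores each simulated sample with the standard two-point LOD score at its maximum-likelihood recombination fraction, and reports how often the conventional LOD >= 3 threshold is reached. The function names and defaults are hypothetical.

```python
import math
import random


def lod(r, n, theta):
    """Two-point LOD score for n phase-known meioses with r recombinants."""
    score = n * math.log10(2)  # dividing by the theta = 0.5 likelihood, 0.5**n
    if r:
        score += r * math.log10(theta)
    if n - r:
        score += (n - r) * math.log10(1 - theta)
    return score


def max_lod(r, n):
    """LOD at the maximum-likelihood estimate theta_hat = r/n, capped at 0.5."""
    return lod(r, n, min(r / n, 0.5))


def simulated_power(n_meioses, true_theta, replicates=2000, threshold=3.0, seed=1):
    """Fraction of simulated samples whose maximum LOD reaches the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        # Each meiosis is recombinant with probability true_theta.
        r = sum(rng.random() < true_theta for _ in range(n_meioses))
        if max_lod(r, n_meioses) >= threshold:
            hits += 1
    return hits / replicates


for n in (10, 20, 40):
    print(n, simulated_power(n, true_theta=0.1))
```

Power rises with the number of informative meioses and falls as the marker moves farther from the disease locus (larger theta), which is the trade-off behind the 30 cM versus 40 cM question. Real pedigrees with untyped or uninformative members need the full likelihood machinery that Simlink and Slink provide.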
Liped: This IBM program (written by Jurg Ott) calculates
probabilities for genetic linkage between disease markers and
genetic markers. Its input file differentiates between
phenotypes and genotypes. As a result, this program is easiest to
use when your data come from "old-style" genetic markers (such
as blood-group phenotype data).
Cathy Falk writes: There ARE a couple of versions of Liped
around that work on the Sun, but each one seems to have its own
developmental path (from the original), so it's not so easy to
describe. We have a version that came from UCLA (Dr. Anne
Spence) which we have had running on the Sun for some time.
It accepts up to 6 alleles per locus, and we now want to increase
that. It also has a somewhat different structure for the input
files. Dr. Peggy Pericak-Vance, at Duke, has a version that
accepts up to 8 alleles, but it is a modification of an earlier
LIPED and is not totally compatible with our current (UCLA)
version. Dr. Ott has a PC version which he thinks would be easy
to modify for the Sun, and Dr. David Greenberg informed me
that he has a version for DEC (VMS) machines.
GREGOR is a piece of software (IBM PC compatible) for
producing simulated genetic data. It does not perform linkage
analysis, but it may be useful for _testing_ methods or
assumptions about linkage analysis. GREGOR is operated by a
series of hierarchical menus that let the user define
hypothetical genetic scenarios (gene positions and effects) and
produce simulated datasets for a variety of population
structures. GREGOR is available by ftp from the site
"sifon.cc.mcgill.ca" as a "pkzip" archive called
"pub/McGill-Contrib/GREGOR.ZIP". Further information can be found in
the following reference: N.A. Tinker and D.E. Mather. 1993.
GREGOR: software for genetic simulation. J. Hered.
84(3):237-238. Questions should be directed to the authors:
(tinker at agradm.lan.mcgill.ca or
mather at agradm.lan.mcgill.ca). [thanks to tinker for sending
Linkage: This package of programs was developed by Mark
Lathrop with help from JM Lalouel, C Julier, and J Ott. Jurg Ott
maintains the IBM versions. The Linkage package consists of
several analysis programs (each of which does a particular type of
analysis) and several utility programs (which make the analysis
programs easy to use). Versions are available for IBMs and
Unix platforms. Here are some of the analysis programs:
mlink: 2-point lod-score calculations at fixed
linkmap: multipoint lod-score calculations at fixed
ilink: calculates the recombination distance with the
Unix versions are available from corona.med.utah.edu. PC and
VMS versions are available from york.ccc.columbia.edu, or on
floppy disks when you write to:
Katherine Montague/Jurg Ott
Columbia University, Unit 58
722 West 168th Street
New York, NY 10032
Send a bunch of preformatted IBM disks if you request Linkage
by mail. Jurg Ott (jurg.ott at columbia.edu) can send you more
information regarding mail requests for the Linkage package.
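The lod scores reported by programs such as mlink and linkmap are base-10 log likelihood ratios comparing linkage at recombination fraction theta with free recombination (theta = 1/2). For the simplest textbook case, n fully informative phase-known meioses of which R are recombinant (an assumption; the Linkage programs compute likelihoods over whole pedigrees), the two-point lod score, its maximum-likelihood estimate of theta, and a worked example are:

```latex
\begin{aligned}
Z(\theta) &= \log_{10}\frac{\theta^{R}\,(1-\theta)^{\,n-R}}{(1/2)^{n}}, \qquad 0 \le \theta \le \tfrac{1}{2},\\
\hat{\theta} &= \min\!\left(\frac{R}{n},\ \tfrac{1}{2}\right),\\
Z(0.1)\big|_{n=10,\ R=1} &= \log_{10}0.1 + 9\log_{10}0.9 + 10\log_{10}2 \approx -1 - 0.412 + 3.010 \approx 1.60.
\end{aligned}
```

A lod score of 3 or more is the conventional threshold for declaring linkage, so ten such meioses with one recombinant are suggestive but not sufficient.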
fastlink: This is a port of the Linkage package to C (by A
Schaffer, R Cottingham, and R Idury). The initial port increased
the speed by an order of magnitude. They continue to optimize
the algorithm and code, resulting in continued speed
improvements. In addition, fastlink can be compiled in
"fast" or "slow" mode (even the slow version of fastlink is much
faster than the old Linkage programs). The "fast" version uses a
lot of memory, but uses that memory to store intermediate
results that are repeatedly recalculated in the "slow" version
(and in the old Linkage package). We obtain good results by
setting up 300 MB of virtual memory on our SPARC and using
the fast version (at one point we ran a fastlink linkmap run with
700 haplotypes). The fastlink programs are also more portable.
Earlier versions of fastlink required installation of p2c (the Free
Software Foundation's Pascal-to-C converter). That is no longer
necessary.
APM: The Affected Pedigree Member Method package was written by
Dan Weeks and is available via anonymous ftp from watson.hgen.pitt.edu.
Here's some info that Dr. Weeks sent me:
The Affected Pedigree Member Method distribution contains
the new APM programs, a new file conversion utility, and a
histogram/statistics generator (all of which are version 2.0).
To build the entire distribution, you need C, Pascal, and Fortran
compilers. A make utility is also helpful.
Instructions on building the distribution are in the file HowTo.
Please read the file READ_ME_FIRST before doing anything.
For an introduction to the APM programs, read the Intro file.
For a list of known bugs, read the BUGS file.
The programs which are built include:
apm, a program to calculate the single locus statistic
over one or several marker loci
sim, a program to simulate pedigrees and, using output
files of apm, test for asymptotic normality of the null
apmmult, a program to generate the multi-locus statistic
simmult, a program like sim but which simulates
recombination and uses the output of apmmult
chapm, a program to convert LINKAGE files to APM
files, or APM files of one format to APM files of
hist, a program to compute various statistical figures,
plot a histogram, and compute empirical p-values
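The empirical p-values that hist reports come from comparing an observed statistic with statistics computed on data simulated under the null hypothesis (which is what sim produces). The generic Python sketch below is not APM code: the (count + 1) / (N + 1) form is one common convention that avoids reporting p = 0 from a finite simulation, and APM may compute it differently. The function names and statistic values are made up for illustration.

```python
import statistics


def empirical_p_value(observed, simulated):
    """One-sided empirical p-value with the (count + 1) / (N + 1) correction."""
    at_least_as_large = sum(1 for s in simulated if s >= observed)
    return (at_least_as_large + 1) / (len(simulated) + 1)


def summarize(simulated):
    """Basic figures for a set of simulated statistics."""
    return {
        "n": len(simulated),
        "mean": statistics.mean(simulated),
        "sd": statistics.stdev(simulated),
    }


null_stats = [0.3, -1.2, 0.8, 2.1, -0.4, 1.5, -0.9, 0.1, 2.6, -1.7]
print(summarize(null_stats))
print(empirical_p_value(2.5, null_stats))  # 2/11, about 0.18
```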
emaillink: I was working on a system to allow people to submit
FASTLINK runs via email, but it's on indefinite hold while I
work on classes and stuff. Perhaps I'll get back to it
What programs are available to help detect errors in linkage data?
By linkage data, I mean any genetic-linkage dataset, not just those for
the Lathrop Linkage package. This is an important question, and I
simply do not know the answer.
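One simple check that applies to any linkage dataset is Mendelian consistency: at an autosomal codominant marker, a child's genotype must consist of one allele the father could transmit and one the mother could transmit. A minimal Python sketch under those assumptions, with genotypes written as LINKAGE-style pairs of allele numbers and 0 0 meaning untyped; the function names are hypothetical. Errors that happen to remain Mendelian-consistent slip through, which is where double-recombinant checks such as chrompic help.

```python
def could_transmit(parent, allele):
    """A parent can pass on an allele it carries; a 0 (untyped) allele excludes nothing."""
    return 0 in parent or allele in parent


def mendel_consistent(child, father, mother):
    """Check one autosomal codominant marker in a father-mother-child trio."""
    if 0 in child:
        return True  # nothing to check for an untyped child
    a, b = child
    return (could_transmit(father, a) and could_transmit(mother, b)) or (
        could_transmit(father, b) and could_transmit(mother, a)
    )


def find_errors(trios):
    """trios: list of (child_id, child, father, mother); returns inconsistent child IDs."""
    return [cid for cid, c, f, m in trios if not mendel_consistent(c, f, m)]


trios = [
    ("3", (1, 2), (1, 1), (2, 3)),
    ("4", (3, 3), (1, 2), (2, 3)),  # father carries no allele 3
    ("5", (1, 4), (0, 0), (2, 4)),  # untyped father: cannot be excluded
]
print(find_errors(trios))  # ['4']
```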
I've used the crimap-chrompic option, and played with xpic/phap a
little bit, but I really hope some people send me some information on
What is Cyrillic? [P Janssens; 22may94]
Cyrillic is a pedigree editor with facilities for including marker data.
You can then ask it to interface with LINKAGE, i.e., it creates the input
files for MAKEPED and runs the whole show. It is Windows-based, so
input of the pedigree is very efficient. You also have a data form
associated with each individual, where you can store names, DNA
numbers, etc. If you want, I can email you version 1.11 to have a look
at. They also have technical support by email from Oxford. Let me
know if you are interested. I had to learn to use the program here and
teach everyone else in the lab; they had bought it just before I started
working here. I also had to learn the old way of preparing the data input
files for MAKEPED, and I promise you that I will never look back.
There were some serious bugs in version 1, but as far as I can tell they have
all been fixed quite nicely. There are of course some features that they
are still busy implementing, but it is an excellent interface with
What programs help me recode genetic markers? [dcurtis;20Jul94]
If anyone's interested, DOLINK can downcode alleles automatically.
I'm not sure whether it uses the same algorithm as Ott's, but it's described in
the documentation along with potential drawbacks. The main use of
DOLINK is to prepare files for LINKAGE etc. from a database. It's at
diamond.gene.ucl.ac.uk in /pub/packages/dcurtis. I've _nearly_ got a
version ready to run under X (the current one is only for DOS), and I will try to
accelerate this if there is huge public interest.
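DOLINK's own downcoding algorithm is described in its documentation and is not reproduced here. The simplest form of recoding, shown below as a hypothetical Python sketch, just renumbers the alleles actually observed in a pedigree as 1..k in increasing order, leaving 0 (untyped) alone. The locus description must then list only k alleles, with the allele frequencies adjusted to match; the sketch does not handle frequencies. Fewer alleles per locus also means a smaller MAXHAP.

```python
def recode_alleles(genotypes):
    """Renumber the alleles actually observed as 1..k, in increasing order.

    genotypes: list of (allele, allele) pairs; 0 means untyped and stays 0.
    Returns the recoded genotypes and the old -> new mapping.
    """
    observed = sorted({a for g in genotypes for a in g if a != 0})
    mapping = {old: new for new, old in enumerate(observed, start=1)}
    mapping[0] = 0
    recoded = [tuple(mapping[a] for a in g) for g in genotypes]
    return recoded, mapping


genos = [(3, 7), (7, 12), (0, 0), (12, 12)]
recoded, mapping = recode_alleles(genos)
print(recoded)  # [(1, 2), (2, 3), (0, 0), (3, 3)]
```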
LINKAGE PACKAGE SPECIFIC INFORMATION
How do you calculate MAXHAP? [rootd;15may94]
MAXHAP is the maximum possible number of haplotypes in your
analysis. You multiply together the number of alleles at each locus
used in a particular run (not all the loci in your dataset, just the loci
you use). Remember that affection status counts as two alleles, regardless
of the number of liability classes.
For example, if a dataset has the following information:
affection status: 4 liability classes
Marker A: 3 alleles
Marker B: 4 alleles
Marker C: 5 alleles
and your run is a linkmap run involving affection status, A, and
B, then your MAXHAP must be (at least) 2*3*4 = 24.
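The same calculation as a small Python sketch (math.prod needs Python 3.8 or later). The locus names mirror the example above, and the dictionary layout and function name are hypothetical; the affection-status locus is counted as two alleles no matter how many liability classes it has.

```python
from math import prod

AFFECTION = "affection status"  # always counts as 2 alleles


def maxhap(allele_counts, loci_in_run):
    """Product of allele counts over the loci used in one run."""
    sizes = [2 if locus == AFFECTION else allele_counts[locus] for locus in loci_in_run]
    return prod(sizes)


dataset = {"A": 3, "B": 4, "C": 5}  # marker -> number of alleles
print(maxhap(dataset, [AFFECTION, "A", "B"]))  # 24
```

Marker C does not enter the product because it is not part of this run.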
When should you use binary coding instead of numeric allele coding?
Usually, there is no advantage to coding a disease or marker locus
as binary rather than numeric (with liability classes). Binary coding
is generally more complex, in that we humans have a hard time
thinking that way. Some codominant phenotypes lend themselves to
binary coding, e.g. ABO (factor columns A, B, O; 1 = present):
Phenotype  A  B  O
A          1  0  1
B          0  1  1
O          0  0  1
AB         1  1  1
unknown    0  0  0
In this case, one codes the O factor as present in all cases except
unknowns. Since one cannot distinguish AO from AA at the phenotype
level, one codes both genotypes as 1 0 1 (presence of A and O). In
reality, O represents the absence of both A and B. However, one cannot
code that as all zeros, since 0 0 0 would be an unknown.
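The coding rule in the ABO table can be written as a short Python function. This is an illustrative sketch with a hypothetical name: it maps each phenotype to the presence (1) or absence (0) of the A, B and O factors, with None standing for an untyped person.

```python
FACTORS = ("A", "B", "O")


def abo_binary(phenotype):
    """Binary factor codes (A, B, O) for an ABO phenotype.

    None (untyped) gets all zeros. The O factor is coded present for every
    typed person, because coding O as absence would collide with "unknown".
    """
    if phenotype is None:
        return (0, 0, 0)
    return (int("A" in phenotype), int("B" in phenotype), 1)


for pheno in ("A", "B", "O", "AB", None):
    print(pheno, dict(zip(FACTORS, abo_binary(pheno))))
```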
The use of binary codes has decreased since DNA markers came into
use, as they allow one to type an individual's genotype directly.
More information about the Gen-link