BCM Genome Center YAC Database online
Robert Cottingham
bwc at bcm.tmc.edu
Mon Oct 18 14:24:40 EST 1993
The Baylor College of Medicine (BCM) Human Genome Center has as one of its
activities the screening of YAC libraries. Primarily this work has been
done on the original CEPH YAC library. To date about 1000 yacs have been
screened. Of these about one-third were screened for labs outside of BCM.
This data is regularly submitted to GDB; however, we are frequently asked
questions not easily answered by GDB, such as which yacs are positive for a
particular probe.
Answers to questions like this can help others avoid duplicating work we
have already done, so we have set up a new directory on our ftp/gopher
service to help answer them. Information is available
on yacs and contigs built from these yacs. The data is presented as
simple flat files which can be analyzed directly, or loaded as tab-delimited
data into a variety of database systems. Further explanation is given in
the README included below.
-Bob
------------------------------------------------------------
Bob Cottingham Phone: 713/798-4275
Cell Biology & Human Genome Center Fax: 798-5386
Baylor College of Medicine Email: bwc at bcm.tmc.edu
Houston, TX 77030
------------------------------------------------------------
INTRODUCTION
------------
The file yaclist contains the latest publicly released list of yacs
which have been screened in the Baylor College of Medicine Human
Genome Center's YAC Screening Lab. The yaclab provides a screening
service primarily for researchers within the Genome Center, but has
also fulfilled many requests from outside the Center. Researchers
requesting a screen may specify that the data not be released publicly
for 6 months, in accordance with the NIH/DOE guidelines. After 6 months
the data is made publicly available.
Frequently researchers outside the Center ask for information about
yacs screened here. The data from the yaclab is regularly submitted to
GDB, so we have in the past suggested that those interested obtain the
information from there. However, in an effort to make the information
more accessible, we are now providing this list which answers the
most common questions:
1. What yacs are positive for a particular probe?
2. What yacs map to a particular region?
3. Who can I contact about a particular probe/yac?
In addition, the file ctglist contains the list of contigs created with
the yaclist data using the program Segmap [1]. This file can be used to
answer questions like:
4. Which yacs are in a common contig?
5. What contigs exist in a particular chromosome?
HOW TO OBTAIN A COPY OF YACLIST or CTGLIST
------------------------------------------
To obtain a copy of the yaclist follow these directions as appropriate
for your machine. Using ftp,
    ftp gc.bcm.tmc.edu               connect to ftp server
    login: anonymous                 login as anonymous
    password:<your email address>    use your email address as the password
    cd yac                           go to the yac directory
    ls                               lists the files in the directory
    get yaclist (or ctglist)         retrieves yaclist from the directory
    bye                              logs off the ftp connection
Using gopher,
Host: gc.bcm.tmc.edu
Port: 70
Go into the yac directory and fetch the file of interest.
FILE STRUCTURE
--------------
The file yaclist is in a simple tab-delimited ASCII text file format.
As such it is easy to manipulate with various text manipulation tools,
editors and database management systems. The data for a library
screen using one probe is provided on each line. The fields within a
line are:
    Locus               name as given in GDB
    Primer Name 1       the first primer sequence name
    Primer Name 2       the second primer sequence name
    Primer Sequence 1   the first primer sequence
    Primer Sequence 2   the second primer sequence
    Band Location       chromosomal map position
    Contact Person      the Principal Investigator who requested the screen
    Institution         BCM - Baylor College of Medicine
    YAC/Library         each positive yac and its library separated
                        by a space. The yac/lib field usually
                        contains repeated entries separated by spaces
                        for each of the positive yacs.
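
The field layout above can be turned into a small parser. The following
is a minimal sketch in Python, not part of the yaclab tools; the names
YacRecord, split_yac_library and parse_yaclist_line are hypothetical.
Because a library name can itself contain a space (ST. LOUIS), the
sketch assumes that only the two library names shown in the examples of
this README (ST. LOUIS and CEPH) occur in the file.

```python
from typing import NamedTuple

# Assumption: only the library names that appear in the README examples.
KNOWN_LIBRARIES = ("ST. LOUIS", "CEPH")


class YacRecord(NamedTuple):
    locus: str
    primer_name_1: str
    primer_name_2: str
    primer_seq_1: str
    primer_seq_2: str
    band: str
    contact: str
    institution: str
    yacs: list  # list of (yac, library) tuples


def split_yac_library(field):
    """Split 'A25H9 ST. LOUIS 411A10 CEPH' into [(yac, library), ...]."""
    pairs = []
    rest = field.strip()
    while rest:
        # The yac name is the next space-delimited token.
        yac, _, rest = rest.partition(" ")
        rest = rest.lstrip()
        for lib in KNOWN_LIBRARIES:
            if rest == lib or rest.startswith(lib + " "):
                pairs.append((yac, lib))
                rest = rest[len(lib):].lstrip()
                break
        else:
            raise ValueError(f"unrecognized library after yac {yac!r}: {rest!r}")
    return pairs


def parse_yaclist_line(line):
    """Parse one tab-delimited yaclist line into a YacRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    return YacRecord(*fields[:8], split_yac_library(fields[8]))


# The D17S29 line from the HOW TO USE section, rebuilt with tabs.
example = "\t".join([
    "D17S29", "2812", "2813",
    "TCTTCATCCCTACGTATCACTAGGCC", "CACCCCATTCTCCGTCTGTCCCCTTGC",
    "17p11.2.", "Jim Lupski", "BCM",
    "A25H9 ST. LOUIS 411A10 CEPH A25H9 ST. LOUIS",
])
record = parse_yaclist_line(example)
print(record.locus, record.yacs)
```

If the file turns out to contain other library names, they would need
to be added to KNOWN_LIBRARIES; the sketch raises an error rather than
guessing.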
The file ctglist is also tab delimited and can be treated in the same
manner as the yaclist. The Primer Pair and YAC fields are space delimited
between items in the list. Fields within a line are:
    Contig Name         given by Segmap - based on most frequent primer pair
                        in contig.
    Chromosome Number
    Start Band          Chromosome band where contig begins
    End Band            Chromosome band where contig ends
    Contig Size         given by Segmap - based on linear programming
                        algorithms used in contig construction.
    Primer Pair         Primer pair from yaclist which pulls yacs to make
                        contig. The list may contain multiple entries
                        separated by spaces.
    YAC                 Yac which was pulled by primer pair and, with other
                        yacs, forms the contig. The list will contain
                        multiple entries separated by spaces.
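
A ctglist line can be parsed the same way. This is a minimal sketch in
Python under the field order listed above; the names Contig and
parse_ctglist_line are hypothetical, and the contig size is kept as
text because the field list does not state its unit.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    chromosome: str
    start_band: str
    end_band: str
    size: str           # kept as text; unit is not given in the field list
    primer_pairs: list  # space-delimited in the file
    yacs: list          # space-delimited in the file


def parse_ctglist_line(line):
    """Parse one tab-delimited ctglist line into a Contig."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated fields, got {len(fields)}")
    name, chromosome, start_band, end_band, size, pairs, yacs = fields
    return Contig(name, chromosome, start_band, end_band, size,
                  pairs.split(), yacs.split())


# The B10c/B10d line from the HOW TO USE section, rebuilt with tabs.
example = "\t".join([
    "B10c/B10d", "6", "p21.3", "p21.1", "47",
    "B10c/B10d", "A149E2 A149E3 A150A5",
])
contig = parse_ctglist_line(example)
print(contig.name, contig.chromosome, contig.yacs)
```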
HOW TO USE
----------
Once you have obtained a copy of the yaclist (see above), it is
possible to answer the common questions. For instance, to answer
the question:
Which yacs contain the locus D17S29?
one can....
grep D17S29 yaclist (on the unix command line)
or
using emacs, search for D17S29
or
using vi, search for /D17S29/
to find the line:
D17S29 2812 2813
TCTTCATCCCTACGTATCACTAGGCC CACCCCATTCTCCGTCTGTCCCCTTGC
17p11.2. Jim Lupski BCM
A25H9 ST. LOUIS 411A10 CEPH A25H9 ST. LOUIS
so the answer to the above question is the yacs:
A25H9 in the ST. LOUIS library
and 411A10 in the CEPH library
Another question could be:
What sequences have pulled YAC A101D4?
using the same search techniques as above, for instance:
grep A101D4 yaclist
returns the two lines:
D3S601 ML7 ML8
GTTGGCTATGGGTAGAATTGG CAGGGTAGCCTTGATCTAAGT
3p25 Michael Lerman BCM
A101D4 ST. LOUIS A80G10 ST. LOUIS B62H5 ST. LOUIS
and
D3S601 P5-1 P5-2
ATCTATTGACAGGGTGCTCT ACATCCAGTGGCTGACGTGT
3p25 Michael Lerman BCM
A101D4 ST. LOUIS A289A6 ST. LOUIS A80G10 ST. LOUIS B62H5 ST. LOUIS
so the answer to this question is that the two primer pairs:
GTTGGCTATGGGTAGAATTGG CAGGGTAGCCTTGATCTAAGT
and
ATCTATTGACAGGGTGCTCT ACATCCAGTGGCTGACGTGT
are known to be positive for YAC A101D4.
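
A plain grep matches the search string anywhere on the line, so a query
such as A101D4 would also match a longer yac name that merely contains
it. The following minimal Python sketch looks only at the YAC/Library
field and compares whole space-delimited tokens. The function name
primers_for_yac is hypothetical, and the sample lines are the ones
quoted in this README, rebuilt with tabs.

```python
def primers_for_yac(lines, yac):
    """Return (locus, primer names, primer sequences) for screens that pulled yac."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skip blank or malformed lines
        # Compare whole tokens of the YAC/Library field, not substrings.
        if yac in fields[8].split():
            hits.append(tuple(fields[:5]))
    return hits


sample = [
    "\t".join(["D3S601", "ML7", "ML8",
               "GTTGGCTATGGGTAGAATTGG", "CAGGGTAGCCTTGATCTAAGT",
               "3p25", "Michael Lerman", "BCM",
               "A101D4 ST. LOUIS A80G10 ST. LOUIS B62H5 ST. LOUIS"]),
    "\t".join(["D3S601", "P5-1", "P5-2",
               "ATCTATTGACAGGGTGCTCT", "ACATCCAGTGGCTGACGTGT",
               "3p25", "Michael Lerman", "BCM",
               "A101D4 ST. LOUIS A289A6 ST. LOUIS A80G10 ST. LOUIS B62H5 ST. LOUIS"]),
    "\t".join(["TCTE1", "B10c", "B10d",
               "TCTGACAGTTCCGGAGTGCA", "AGAGCCTGGTCTCACAAGAG",
               "6p21", "Huda Zoghbi", "BCM",
               "A149E2 ST. LOUIS A149E3 ST. LOUIS A150A5 ST. LOUIS"]),
]

hits = primers_for_yac(sample, "A101D4")
for locus, name1, name2, seq1, seq2 in hits:
    print(locus, name1, name2, seq1, seq2)
```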
A final example question. Suppose you are interested in what yacs
have been pulled in chromosome band 6p21. Again using the same
search techniques, for instance:
grep 6p21 yaclist
to find the line:
TCTE1 B10c B10d
TCTGACAGTTCCGGAGTGCA AGAGCCTGGTCTCACAAGAG
6p21 Huda Zoghbi BCM
A149E2 ST. LOUIS A149E3 ST. LOUIS A150A5 ST. LOUIS
so the known yacs in 6p21 are A149E2, A149E3 and A150A5, all in the
ST. LOUIS library.
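
The band search can also be restricted to the Band Location field, so
that the query does not match text in other fields and sub-bands such
as 6p21.3 are still found. This is a minimal Python sketch; the
function name screens_in_band is hypothetical, and matching by prefix
is an assumption about how band locations are written in the file.

```python
def screens_in_band(lines, band_prefix):
    """Return (locus, band, yac/library field) where the band starts with band_prefix."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skip blank or malformed lines
        band = fields[5]
        # Prefix match on the band field only: "6p21" matches "6p21" and
        # "6p21.3", but not a band on another chromosome.
        if band.startswith(band_prefix):
            hits.append((fields[0], band, fields[8]))
    return hits


sample = [
    "\t".join(["TCTE1", "B10c", "B10d",
               "TCTGACAGTTCCGGAGTGCA", "AGAGCCTGGTCTCACAAGAG",
               "6p21", "Huda Zoghbi", "BCM",
               "A149E2 ST. LOUIS A149E3 ST. LOUIS A150A5 ST. LOUIS"]),
    "\t".join(["D3S601", "ML7", "ML8",
               "GTTGGCTATGGGTAGAATTGG", "CAGGGTAGCCTTGATCTAAGT",
               "3p25", "Michael Lerman", "BCM",
               "A101D4 ST. LOUIS A80G10 ST. LOUIS B62H5 ST. LOUIS"]),
]

band_hits = screens_in_band(sample, "6p21")
for locus, band, yac_field in band_hits:
    print(locus, band, yac_field)
```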
Using the ctglist, and the techniques explained above, you can do the
following:
grep A149E2 ctglist (a yac mentioned in the previous example)
to find the line:
B10c/B10d 6 p21.3 p21.1 47 B10c/B10d
A149E2 A149E3 A150A5
so A149E2 is in a contig named B10c/B10d which is approximately 47 kbp long.
The contig is formed using the yacs A149E2, A149E3 and A150A5 and the STS
B10c/B10d, and is located between p21.3 and p21.1 on chromosome 6.
These are some of the simple questions that can be answered with these
data files. The data contained in these files can be imported into any
database program that can import ASCII files. The fields are tab
delimited, a format that most database programs accept.
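
As an illustration of such an import, the following minimal Python
sketch loads yaclist lines into an in-memory SQLite database. SQLite,
the table name yaclist and the column names are illustrative choices,
not something this README prescribes; the sample lines are the ones
quoted above, rebuilt with tabs.

```python
import csv
import io
import sqlite3

# Hypothetical column names, following the field order in FILE STRUCTURE.
COLUMNS = [
    "locus", "primer_name_1", "primer_name_2",
    "primer_seq_1", "primer_seq_2",
    "band", "contact", "institution", "yac_library",
]


def load_yaclist(conn, handle):
    """Create a yaclist table and fill it from a tab-delimited text handle."""
    conn.execute(
        "CREATE TABLE yaclist (" + ", ".join(f"{c} TEXT" for c in COLUMNS) + ")"
    )
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Keep only lines with the expected number of fields.
    rows = [row for row in reader if len(row) == len(COLUMNS)]
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(f"INSERT INTO yaclist VALUES ({placeholders})", rows)
    return len(rows)


sample_text = "\n".join([
    "\t".join(["D17S29", "2812", "2813",
               "TCTTCATCCCTACGTATCACTAGGCC", "CACCCCATTCTCCGTCTGTCCCCTTGC",
               "17p11.2.", "Jim Lupski", "BCM",
               "A25H9 ST. LOUIS 411A10 CEPH A25H9 ST. LOUIS"]),
    "\t".join(["TCTE1", "B10c", "B10d",
               "TCTGACAGTTCCGGAGTGCA", "AGAGCCTGGTCTCACAAGAG",
               "6p21", "Huda Zoghbi", "BCM",
               "A149E2 ST. LOUIS A149E3 ST. LOUIS A150A5 ST. LOUIS"]),
]) + "\n"

conn = sqlite3.connect(":memory:")
loaded = load_yaclist(conn, io.StringIO(sample_text))
rows = conn.execute(
    "SELECT yac_library FROM yaclist WHERE locus = ?", ("D17S29",)
).fetchall()
print(loaded, rows)
```

To load the real file, the io.StringIO object would be replaced by an
open file handle, for example open("yaclist", newline="").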
CONTACTS
--------
If you have any questions, problems, or comments regarding the YAC list
or lab services, email yaclab at bcm.tmc.edu to contact us.
REFERENCES
----------
[1] C. Magness, Y. Xu, and P. Green. SEGMAP -- A program for computing and
displaying YAC-based STS-content maps. Washington University School of
Medicine, 1993.