BCM Genome Center YAC Database online

Robert Cottingham bwc at bcm.tmc.edu
Mon Oct 18 14:24:40 EST 1993


The Baylor College of Medicine (BCM) Human Genome Center has as one of its
activities the screening of YAC libraries.  Primarily this work has been
done on the original CEPH YAC library.  To date about 1000 yacs have been 
screened.  Of these about one-third were screened for labs outside of BCM.
This data is regularly submitted to GDB; however, we are frequently asked
questions that GDB does not easily answer, such as which yacs are positive
for a particular probe.

Answers to questions like this can help others avoid duplicating work we
have already done.  So we have set up a new directory on our ftp/gopher
service which can help answer such questions.  Information is available
on yacs and contigs built from these yacs.  The data is presented as
simple flat files which can be analyzed directly, or loaded as tab-delimited
data into a variety of database systems.  Further explanation is given in
the README included below.

-Bob

        ------------------------------------------------------------
        Bob Cottingham                        Phone:  713/798-4275
        Cell Biology & Human Genome Center    Fax:        798-5386
        Baylor College of Medicine            Email: bwc at bcm.tmc.edu
        Houston, TX   77030
        ------------------------------------------------------------



INTRODUCTION
------------

The file yaclist contains the latest, publicly released list of yacs
which have been screened in the Baylor College of Medicine Human
Genome Center's YAC Screening Lab.  The yaclab provides a screening
service primarily for researchers within the Genome Center, but has
also fulfilled many requests from outside the Center.  Researchers
requesting a screen are allowed to specify that the data will not be
released publicly for 6 months according to the NIH/DOE guidelines.
After 6 months the data is made publicly available.

Frequently researchers outside the Center ask for information about
yacs screened here.  The data from the yaclab is regularly submitted to
GDB, so we have in the past suggested that those interested obtain the
information from there.  However, in an effort to make the information
more accessible, we are now providing this list which answers the
most common questions:

	1. What yacs are positive for a particular probe?
	2. What yacs map to a particular region?
	3. Who can I contact about a particular probe/yac?

In addition, the file ctglist contains the list of contigs created with
the yaclist data using the program Segmap [1]. This file can be used to
answer questions like:
	
	4. Which yacs are in a common contig?
	5. What contigs exist in a particular chromosome?



HOW TO OBTAIN A COPY OF YACLIST OR CTGLIST
------------------------------------------

To obtain a copy of the yaclist follow these directions as appropriate
for your machine.  Using ftp,

  ftp gc.bcm.tmc.edu		connect to ftp server
  login: anonymous		login as anonymous
  password:<your email address> use your email address as the password
  cd yac			go to the yac directory
  ls				lists the files in the directory
  get yaclist (or ctglist)	retrieves the file from the directory
  bye				logs off the ftp connection
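
The same ftp session can be scripted.  The following is a minimal sketch
using Python's standard ftplib module; the function name fetch_list and the
ftp_factory parameter (which lets the connection class be swapped out for
testing) are illustrative assumptions, not part of the service.

```python
from ftplib import FTP


def fetch_list(name, email, host="gc.bcm.tmc.edu", directory="yac",
               ftp_factory=FTP):
    """Download `name` ("yaclist" or "ctglist") and return it as bytes.

    Mirrors the manual session: anonymous login with an email address
    as the password, cd into the yac directory, get the file, log off.
    """
    chunks = []
    ftp = ftp_factory(host)                # connect to ftp server
    ftp.login("anonymous", email)          # login as anonymous
    ftp.cwd(directory)                     # go to the yac directory
    ftp.retrbinary("RETR " + name, chunks.append)  # retrieve the file
    ftp.quit()                             # log off the ftp connection
    return b"".join(chunks)


# Hypothetical usage (requires network access to the server):
#   data = fetch_list("yaclist", "you@your.site")
#   open("yaclist", "wb").write(data)
```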

Using gopher,

  Host: gc.bcm.tmc.edu
  Port: 70

Go into the yac directory and fetch the file of interest.



FILE STRUCTURE
--------------

The file yaclist is in a simple tab-delimited ASCII text file format.
As such it is easy to manipulate with various text manipulation tools,
editors and database management systems.  The data for a library
screen using one probe is provided on each line.  The fields within a
line are:

    Locus name		as given in GDB

    Primer Name 1	the first primer sequence name

    Primer Name 2	the second primer sequence name

    Primer Sequence 1	the first primer sequence

    Primer Sequence 2	the second primer sequence

    Band Location	chromosomal map position

    Contact Person	the Principal Investigator who requested the screen

    Institution		BCM - Baylor College of Medicine

    YAC/Library		each positive yac and its library separated
			by a space.  The yac/lib field usually 
			contains repeated entries separated by spaces
			for each of the positive yacs.
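
As an illustration of this layout, here is a hedged Python sketch that
splits one yaclist line into named fields.  Because a library name such as
"ST. LOUIS" itself contains a space, the YAC/Library field cannot simply be
split on spaces; the sketch assumes the only library names are the two that
appear in the examples in this README.  The names Screen, KNOWN_LIBRARIES,
split_yac_field and parse_yaclist_line are hypothetical.

```python
from collections import namedtuple

Screen = namedtuple(
    "Screen",
    "locus primer_name_1 primer_name_2 primer_seq_1 primer_seq_2 "
    "band contact institution yacs",
)

# Assumption: only these library names occur in the file.
KNOWN_LIBRARIES = ("ST. LOUIS", "CEPH")


def split_yac_field(field):
    """Turn 'A25H9 ST. LOUIS 411A10 CEPH' into [(yac, library), ...]."""
    pairs = []
    rest = field.strip()
    while rest:
        yac, _, rest = rest.partition(" ")
        rest = rest.lstrip()
        for lib in KNOWN_LIBRARIES:
            if rest.startswith(lib):
                pairs.append((yac, lib))
                rest = rest[len(lib):].lstrip()
                break
        else:
            raise ValueError("unknown library after yac %r" % yac)
    return pairs


def parse_yaclist_line(line):
    """Split one tab-delimited line into the nine documented fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 fields, got %d" % len(fields))
    return Screen(*fields[:8], yacs=split_yac_field(fields[8]))


# The D17S29 entry from this README, joined with tabs as one line.
EXAMPLE = "\t".join([
    "D17S29", "2812", "2813",
    "TCTTCATCCCTACGTATCACTAGGCC", "CACCCCATTCTCCGTCTGTCCCCTTGC",
    "17p11.2.", "Jim Lupski", "BCM",
    " A25H9 ST. LOUIS 411A10 CEPH A25H9 ST. LOUIS",
]) + "\n"

record = parse_yaclist_line(EXAMPLE)
print(record.locus, record.contact, record.yacs)
```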

The file ctglist is also tab delimited and can be treated in the same
manner as the yaclist.  Within the Primer Pair and YAC fields, the items
in each list are separated by spaces.  Fields within a line are:
    
    Contig Name		given by Segmap - based on most frequent primer pair
			in contig.
    
    Chromosome		Number

    Start Band		Chromosome band where contig begins

    End Band		Chromosome band where contig ends

    Contig Size		given by Segmap - based on linear programming 
			algorithms used in contig construction.

    Primer Pair		Primer pair from yaclist which pulls yacs to make
			contig.  The list may contain multiple entries
			separated by spaces.

    YAC			Yac which was pulled by the primer pair and which,
			together with other yacs, forms the contig.  The
			list will contain multiple entries separated by spaces.
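
A ctglist line can be split the same way.  The sketch below is only an
illustration: the names Contig and parse_ctglist_line are hypothetical, and
it assumes the Contig Size field is always numeric.  The chromosome is kept
as text so that values such as X or Y would not be rejected.

```python
from collections import namedtuple

Contig = namedtuple(
    "Contig",
    "name chromosome start_band end_band size primer_pairs yacs",
)


def parse_ctglist_line(line):
    """Split one tab-delimited ctglist line into the seven documented fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    name, chrom, start, end, size, pairs, yacs = fields
    return Contig(
        name, chrom, start, end,
        float(size),      # assumption: size is numeric
        pairs.split(),    # space-delimited list of primer pairs
        yacs.split(),     # space-delimited list of yacs
    )


# The B10c/B10d entry from this README, joined with tabs as one line.
EXAMPLE = "B10c/B10d\t6\tp21.3\tp21.1\t47\tB10c/B10d\tA149E2 A149E3 A150A5 \n"

contig = parse_ctglist_line(EXAMPLE)
print(contig.name, contig.chromosome, contig.yacs)
```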



HOW TO USE
----------

Once you have obtained a copy of the yaclist (see above), it is
possible to answer the common questions.  For instance, to answer
the question:

	Which yacs contain the locus D17S29?

one can
 
	grep D17S29 yaclist  (on the unix command line)
  or
	using emacs, search for D17S29
  or 
	using vi, search for /D17S29/
      
to find the line:

	D17S29	2812	2813
	TCTTCATCCCTACGTATCACTAGGCC	CACCCCATTCTCCGTCTGTCCCCTTGC
	17p11.2.	Jim Lupski	BCM
	 A25H9 ST. LOUIS 411A10 CEPH A25H9 ST. LOUIS

so the answer to the above question is the yacs:

	A25H9 in the ST. LOUIS library
  and	411A10 in the CEPH library


Another question could be:
    
	What sequences have pulled YAC A101D4?

using the same search techniques as above, for instance:

	grep A101D4 yaclist

returns the two lines:

	D3S601	ML7	ML8
	GTTGGCTATGGGTAGAATTGG	CAGGGTAGCCTTGATCTAAGT
	3p25	Michael Lerman	BCM
	 A101D4 ST. LOUIS A80G10 ST. LOUIS B62H5 ST. LOUIS
  and
	D3S601	P5-1	P5-2
	ATCTATTGACAGGGTGCTCT	ACATCCAGTGGCTGACGTGT
	3p25	Michael Lerman	BCM
	 A101D4 ST. LOUIS A289A6 ST. LOUIS A80G10 ST. LOUIS B62H5 ST. LOUIS

so the answer to this question is that the two primer pairs:

	GTTGGCTATGGGTAGAATTGG   CAGGGTAGCCTTGATCTAAGT   (ML7, ML8)
  and 
	ATCTATTGACAGGGTGCTCT    ACATCCAGTGGCTGACGTGT    (P5-1, P5-2)

are known to be positive for YAC A101D4.
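
The reverse lookup, from a yac to the primer pairs that pulled it, can be
sketched in Python as follows.  The yac name is compared against whole
space-separated items in the YAC/Library field, so a search for A101D4 would
not match a longer name that merely contains it.  The function name
primers_for_yac is hypothetical.

```python
def primers_for_yac(lines, yac):
    """Return (name1, name2, seq1, seq2) for every screen that pulled `yac`."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue                      # skip lines that are not full records
        if yac in fields[8].split():      # whole-item match, not substring
            hits.append((fields[1], fields[2], fields[3], fields[4]))
    return hits


# The two D3S601 entries from this README, each joined with tabs as one line.
SAMPLE = [
    "D3S601\tML7\tML8\tGTTGGCTATGGGTAGAATTGG\tCAGGGTAGCCTTGATCTAAGT\t"
    "3p25\tMichael Lerman\tBCM\t"
    " A101D4 ST. LOUIS A80G10 ST. LOUIS B62H5 ST. LOUIS\n",
    "D3S601\tP5-1\tP5-2\tATCTATTGACAGGGTGCTCT\tACATCCAGTGGCTGACGTGT\t"
    "3p25\tMichael Lerman\tBCM\t"
    " A101D4 ST. LOUIS A289A6 ST. LOUIS A80G10 ST. LOUIS B62H5 ST. LOUIS\n",
]

for name1, name2, seq1, seq2 in primers_for_yac(SAMPLE, "A101D4"):
    print(name1, name2, seq1, seq2)
```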


As a final example, suppose you are interested in which yacs have been
pulled in chromosome band 6p21.  Again using the same search techniques,
for instance:

	grep 6p21 yaclist

to find the line:
 
	TCTE1   B10c    B10d
	TCTGACAGTTCCGGAGTGCA    AGAGCCTGGTCTCACAAGAG    
	6p21    Huda Zoghbi  BCM
	 A149E2 ST. LOUIS A149E3 ST. LOUIS A150A5 ST. LOUIS

so the known yacs in 6p21 are A149E2, A149E3 and A150A5 all in the
St. Louis library.
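
A plain text search for a band can over-match: for example, searching for
6q21 would also report lines for 16q21.  The Python sketch below instead
tests only the Band Location field, accepting the band itself or any of its
sub-bands.  It assumes sub-bands are written with a period (as in 17p11.2)
and ignores a trailing period in the field; the names in_band and
yacs_in_band are hypothetical.

```python
def in_band(band, query):
    """True if `band` is `query` itself or a sub-band of it."""
    band = band.strip().rstrip(".")       # tolerate a trailing period
    return band == query or band.startswith(query + ".")


def yacs_in_band(lines, query):
    """Return (locus, yac/library field) for screens mapped within `query`."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and in_band(fields[5], query):
            hits.append((fields[0], fields[8].strip()))
    return hits


# Two entries from this README, each joined with tabs as one line.
SAMPLE = [
    "TCTE1\tB10c\tB10d\tTCTGACAGTTCCGGAGTGCA\tAGAGCCTGGTCTCACAAGAG\t"
    "6p21\tHuda Zoghbi\tBCM\t"
    " A149E2 ST. LOUIS A149E3 ST. LOUIS A150A5 ST. LOUIS\n",
    "D17S29\t2812\t2813\tTCTTCATCCCTACGTATCACTAGGCC\t"
    "CACCCCATTCTCCGTCTGTCCCCTTGC\t17p11.2.\tJim Lupski\tBCM\t"
    " A25H9 ST. LOUIS 411A10 CEPH A25H9 ST. LOUIS\n",
]

for locus, yacs in yacs_in_band(SAMPLE, "6p21"):
    print(locus, yacs)
```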

Using the ctglist, and the techniques explained above, you can do the
following:
	
	grep A149E2 ctglist   (A149E2 is a yac mentioned in the previous example)

to find the line:

	B10c/B10d       6       p21.3   p21.1   47      B10c/B10d	
	A149E2 A149E3 A150A5 

so A149E2 is in a contig named B10c/B10d which is approximately 47 kbp long.
The contig is formed from the yacs A149E2, A149E3 and A150A5 using the STS
B10c/B10d, and is located between p21.3 and p21.1 on chromosome 6.
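
For repeated lookups, the ctglist can be read once and indexed, which
answers both "which contig contains this yac?" and "what contigs exist in a
particular chromosome?".  This is a minimal Python sketch; the function name
index_contigs is hypothetical.

```python
from collections import defaultdict


def index_contigs(lines):
    """Build two lookups from ctglist lines: yac -> contigs, chromosome -> contigs."""
    by_yac = defaultdict(list)
    by_chromosome = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 7:
            continue                      # skip lines that are not full records
        name, chromosome = fields[0], fields[1]
        by_chromosome[chromosome].append(name)
        for yac in fields[6].split():     # YAC field is space delimited
            by_yac[yac].append(name)
    return by_yac, by_chromosome


# The B10c/B10d entry from this README, joined with tabs as one line.
SAMPLE = ["B10c/B10d\t6\tp21.3\tp21.1\t47\tB10c/B10d\tA149E2 A149E3 A150A5 \n"]

by_yac, by_chromosome = index_contigs(SAMPLE)
print(by_yac["A149E2"])        # contigs containing yac A149E2
print(by_chromosome["6"])      # contigs on chromosome 6
```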


These are some of the simple questions that can be answered with these
data files.  Because the fields are tab delimited, the files can be
imported into any database program that can import delimited ASCII text.
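
As one possible illustration of such an import, the Python sketch below
loads yaclist lines into an SQLite table using the standard csv and sqlite3
modules.  The table name, the column names and the function load_yaclist are
assumptions chosen for this example; lines that do not have exactly nine
fields are skipped.

```python
import csv
import io
import sqlite3

# Hypothetical column names, one per documented yaclist field.
COLUMNS = [
    "locus", "primer_name_1", "primer_name_2", "primer_seq_1",
    "primer_seq_2", "band", "contact", "institution", "yac_library",
]


def load_yaclist(conn, handle):
    """Create the yaclist table if needed and insert every 9-field line."""
    conn.execute("CREATE TABLE IF NOT EXISTS yaclist (%s)" % ", ".join(COLUMNS))
    rows = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    conn.executemany(
        "INSERT INTO yaclist VALUES (?,?,?,?,?,?,?,?,?)",
        (row for row in rows if len(row) == 9),
    )
    conn.commit()


# The TCTE1 entry from this README, joined with tabs as one line.
SAMPLE = (
    "TCTE1\tB10c\tB10d\tTCTGACAGTTCCGGAGTGCA\tAGAGCCTGGTCTCACAAGAG\t"
    "6p21\tHuda Zoghbi\tBCM\t"
    " A149E2 ST. LOUIS A149E3 ST. LOUIS A150A5 ST. LOUIS\n"
)

conn = sqlite3.connect(":memory:")
load_yaclist(conn, io.StringIO(SAMPLE))
result = conn.execute(
    "SELECT locus, contact FROM yaclist WHERE band = ?", ("6p21",)
).fetchall()
print(result)
```

With a real copy of the file, the handle would come from something like
open("yaclist", newline="") instead of io.StringIO.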



CONTACTS
--------

If you have any questions, problems, or comments regarding the YAC list
or lab services, email yaclab at bcm.tmc.edu to contact us.



REFERENCES
----------

[1] C. Magness, Y. Xu, and P. Green. SEGMAP -- A program for computing and
    displaying YAC-based STS-content maps.  Washington University School of
    Medicine, 1993.


