(none)

arnold at GENOME.GENETICS.UGA.EDU arnold at GENOME.GENETICS.UGA.EDU
Mon Nov 21 10:41:00 EST 1994


	      	A fast random cost algorithm for physical mapping


PROGRAM DESCRIPTION:

        This program is to assist in constructing a physical map
        of a whole chromosome or chromosome fragment from a
        binary clone/probe hybridization matrix.  

	The theory for the physical mapping method can be
        found in

        Cuticchia, A.J., J. Arnold, and W.E. Timberlake (1992).
        The use of simulated annealing in chromosome reconstruction
        experiments based on binary scoring.  Genetics 132: 591-601

        The input data is a binary clone/probe hybridization matrix (with 
        a '1' indicating hybridization and a '0' indicating no hybridization)
        of n clones (rows) and m probes (columns).  Each row of
        the clone/probe hybridization matrix is the digital 'call number'
        of a particular clone.  When clones overlap, they tend
        to hybridize to the same probes, and their digital call numbers
        tend to be similar.  Probes (columns) link together clones
        into contiguous blocks or contigs by their shared pattern
        of clonal hybridization.  This program for ordering cloned DNA
        fragments into 'contig map' does so by permuting the rows
        of the clone/probe hybridization matrix so that clones
        with similar call numbers appear next to each other.
        DNA fragments with a high degree of overlap are expected
        to show a high degree of similarity in their profiles.
        A distance is defined between each clone by counting
        the number of differences in their digital call numbers.
        The ordering process is based on minimizing the sum of the linking
        distances between clones as a function of their ordering along the
        chromosome. 


        The algorithm has been used to map the fourth chromosome of
        Aspergillus nidulans, the whole genome of Schizosaccharomyces
        pombe genome, and a region of human chromosome 9.

	This algorithm used here to minimize the total linking distance
        is a new one called random cost algorithm. It is detailed in 

	Wang, Y., Prade, R. A., Griffith, J., Timberlake, W.E., and Arnold, J.
	(1994) A fast random cost algorithm for physical mapping. PNAS,
	91, 11094-11098


PROGRAM INPUT

	A typical batch file is given as follows:

	$set def [wang.cm.bootstrap]
	$run cost 
	cost.dat
	probe.dat
	cost.map
	rec
	0

	The first line is to set the default directory.

	The second line is to run this program

	The third line is the input file name. In the input hybridization file,
	the first line should be the number of probes, clones  and bootstrap
        run numbers. For all other lines, the first ten columns are reserved
        for the clone name, and the hybridization data should start at any
         column after 10th column. Total length of each 
	line is defined by MAX_BUFFER in the program.

	The fourth line is the probe names file name. It contains the probe
        names in the same order across the columns as in the input file.

	The fifth line is the output file name. The finally reconstructed
        physical map, the statistical confidence statistics from the
        bootstrap run and other statistics are written to this file.

	The sixth line is the name of the file that stores the total linking
        distance for each bootstrap run. This is for tracing how the program
        is running.

	The seven line is the seed for random number generator.

  
PROGRAM INPUT LIMITATIONS:

        The number of clones must be between 1 and 600 for a DEC VAX
	station 4000.

        Filenames (with directory path, if specified) must be
        no longer than 40 characters.

	If the bootstrap run number is not given in the input binary
        hybridization data file, it will be set to the default number of 1000. 


PROGRAM SPEED:

        The program assembled a physical map of 593  clones
        probed with 115 probes within less than 2 mins on a DEC
	VAX station 4000. Total time for 1000 bootstraps run will take
	30 CPU hours on this workstation.


OBTAINING THE SOFTWARE:

        The software is only distributed via
        Internet using EMAIL. Please send an EMAIL request to:

                    ARNOLD at BSCR.UGA.EDU

        if you wish copies of the program. I will EMAIL you:

        1) a C program  cost.c;

        2) this documentation file, cost.DOC;

        3) a test input file, cost.dat;

	4) a test probe name file, probe.dat

        4) an example output file, map.cost ; and

        5) a command file, cost.COM.

        This last file is what you would use to submit a batch job in
        the VAX/VMS operating system to generate cost.MAP.



SOFTWARE SUPPORT IN THE USE OF THE PROGRAMS:

        If you have questions about
        the programs, please contact Yuhong Wang currently located
        at University of Georgia:

                    wang at bscr.uga.edu
        or myself.


HARDWARE LIMITATIONS:

        The programs have been run without modification on VAXstations,
        a DECstation 3100,  a Silicon Graphics IRIS 4D70/GT workstation
        and IBM Risc 6000 workstation.



  . - - - - - - - - - - - Jonathan Arnold - - - - - - - - - - - - - - - .
  |                       Dept. of Genetics,                            |
  |                       University of Georgia                         |
  |                       Athens, Georgia 30602                         |
  | Phone:       (706) 542-1449                                         |
  | messages:    (706) 542-8000                                         |
  | FAX:         (706) 542-3910                                         |
  | Internet:    ARNOLD at BSCR.UGA.EDU                                    |
  | Alternate:   ARNOLD at BSCF.UGA.EDU                                    |
  . - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - .




More information about the Bio-soft mailing list