IUBio

agp2ace.pl

Richard Durbin rd at sanger.ac.uk
Thu Jul 25 05:21:34 EST 2002


I wrote an agp2ace.pl script with Dan Lawson's help that converts the C.briggsae .agp
file into a .ace file.  You need an S_child tag AGP_fragment for this to work, and a
recent version of acedb that allows fragments of sequences to be aligned using SMAP.

This handles segments in both orientations and multiple segments of a sequence mapping
to different parts of the golden path, though these have to be in the same parent
sequence (chromosome for standard human .agps) - it checks that requirement.

It only knows about N, F and W fragment types for now.  I don't know what else there
is.

I have not tried it on a human or mouse .agp file - would be interesting.

This should get put into wtools/ in the release.

Richard

Here it is:

#!/usr/local/bin/perl
#
# agp2ace.pl
#
# Usage: perl agp2ace.pl < agpfile > acefile
#
# Richard Durbin 020712
#
# adapted from agp2superlink.pl prototype from dl1, with help from dl1

use strict;

# Strategy is to pass once through the input keeping track for each
# fragment of which parent sequence (e.g. chromosome) it belongs to, and 
# the min and max values in that.  We also save all informative lines in
# a cache @lines.  The second pass, through @lines, creates output.
#
# Note that current acedb SMAP requires that each fragment can only
# contribute to one parent, and I suspect that if it contributes on
# both strands there will be trouble....

my (%parent, %xmin, %xmax) ;	# parent contig, min and max in parent
my @lines ;			# cache for useful input lines

while (<>) {
    chomp ;
    my ($seq,$x1,$x2,$count,$type,$frag,$y1,$y2,$strand) = split ;
    ($type eq 'N') && next ;	# ignore N lines
    ($type eq 'W' || $type eq 'F') || die "Bad type in line: $_\n" ;
    push (@lines, $_) ;
    if ($strand eq '-') {
	my $tmp = $y1 ; $y1 = $y2 ; $y2 = $tmp ;
    }
    if (!$parent{$frag}) {
	$parent{$frag} = $seq ;
	$xmin{$frag} = $x1 ;
	$xmax{$frag} = $x2 ;
    } elsif ($parent{$frag} ne $seq) {
	die "Double parent for fragment $frag : $parent{$frag} and $seq\n" ;
    } else {
	if ($x1 < $xmin{$frag}) { $xmin{$frag} = $x1 ; }
	if ($x2 > $xmax{$frag}) { $xmax{$frag} = $x2 ; }
    }
}

my $supercontig = "" ;
while ($_ = shift @lines) {
    my ($seq,$x1,$x2,$count,$type,$frag,$y1,$y2,$strand) = split ;
    ($seq eq $parent{$frag}) || die "Bad line $_\n" ;
    if ($seq ne $supercontig) {
        print  "\n\n// New supercontig $seq\n" ;
	$supercontig = $seq ;
        print "\nSequence : \"$supercontig\"\n" ;
    }
    my $len = $y2-$y1+1 ;
    if ($strand eq '+') {
	print "AGP_fragment $frag $xmin{$frag} $xmax{$frag} Align $x1 $y1 $len\n" ;
    } else {
	print "AGP_fragment $frag $xmax{$frag} $xmin{$frag} Align $x2 $y1 $len\n" ;
    }
}

print "\n\n// end of file\n" ;

# end of file
---





More information about the Acedb mailing list

Send comments to us at biosci-help [At] net.bio.net