I wrote an agp2ace.pl script with Dan Lawson's help that converts the C.briggsae .agp
file into a .ace file. You need an S_child tag AGP_fragment for this to work, and a
recent version of acedb that allows fragments of sequences to be aligned using SMAP.
This handles segments in both orientations and multiple segments of a sequence mapping
to different parts of the golden path, though these have to be in the same parent
sequence (chromosome for standard human .agps) - it checks that requirement.
It only knows about N, F and W fragment types for now. I don't know what else there
is.
I have not tried it on a human or mouse .agp file - would be interesting.
This should get put into wtools/ in the release.
Richard
Here it is:
#!/usr/local/bin/perl
#
# agp2ace.pl
#
# Usage: perl agp2ace.pl < agpfile > acefile
#
# Richard Durbin 020712
#
# adapted from agp2superlink.pl prototype from dl1, with help from dl1
use strict;
# Strategy is to pass once through the input keeping track for each
# fragment of which parent sequence (e.g. chromosome) it belongs to, and
# the min and max values in that. We also save all informative lines in
# a cache @lines. The second pass, through @lines, creates output.
#
# Note that current acedb SMAP requires that each fragment can only
# contribute to one parent, and I suspect that if it contributes on
# both strands there will be trouble....
my (%parent, %xmin, %xmax) ; # parent contig, min and max in parent
my @lines ; # cache for useful input lines
while (<>) {
chomp ;
my ($seq,$x1,$x2,$count,$type,$frag,$y1,$y2,$strand) = split ;
($type eq 'N') && next ; # ignore N lines
($type eq 'W' || $type eq 'F') || die "Bad type in line: $_\n" ;
push (@lines, $_) ;
if ($strand eq '-') {
my $tmp = $y1 ; $y1 = $y2 ; $y2 = $tmp ;
}
if (!$parent{$frag}) {
$parent{$frag} = $seq ;
$xmin{$frag} = $x1 ;
$xmax{$frag} = $x2 ;
} elsif ($parent{$frag} ne $seq) {
die "Double parent for fragment $frag : $parent{$frag} and $seq\n" ;
} else {
if ($x1 < $xmin{$frag}) { $xmin{$frag} = $x1 ; }
if ($x2 > $xmax{$frag}) { $xmax{$frag} = $x2 ; }
}
}
my $supercontig = "" ;
while ($_ = shift @lines) {
my ($seq,$x1,$x2,$count,$type,$frag,$y1,$y2,$strand) = split ;
($seq eq $parent{$frag}) || die "Bad line $_\n" ;
if ($seq ne $supercontig) {
print "\n\n// New supercontig $seq\n" ;
$supercontig = $seq ;
print "\nSequence : \"$supercontig\"\n" ;
}
my $len = $y2-$y1+1 ;
if ($strand eq '+') {
print "AGP_fragment $frag $xmin{$frag} $xmax{$frag} Align $x1 $y1 $len\n" ;
} else {
print "AGP_fragment $frag $xmax{$frag} $xmin{$frag} Align $x2 $y1 $len\n" ;
}
}
print "\n\n// end of file\n" ;
# end of file
---