Consed, sequence assemble
andy law
andy.law at bbsrc.ac.uk
Fri Oct 15 03:51:09 EST 1999
In article <7u5hg0$g1l$1 at news.tamu.edu>, "Mei" <hmpeng at ppserver.tamu.edu> wrote:
> I am interested in assembling some of the EST sequences that I have downloaded
> from Entrez. So far, I have been using the csplit command in Unix, then a Perl
> script to rename the files, and finally a shell script to generate fake phd
> files for Consed. This approach works well if I have fewer than 100
> sequences, because csplit only splits up to 99 files. I'd like to know how
> to split and rename the FASTA file according to the gi numbers in the
> definition lines when I have a large number of sequences to assemble. A hint
> on how to write a Perl script for this purpose would be greatly appreciated.
>
The following *should* do what I *think* you need. No guarantees
whatsoever though.
Later,
Andy Law
---------
#!/usr/bin/perl -w
#
# Andy Law 15th October 1999
#
# Do with this what you will. No restrictions. If you can make money from it,
# then good luck to you
use strict;
# Grab all the input (from STDIN, or from any files named on the command
# line) and strip any end of line characters
# Exit if nothing was supplied
# Die if the first line doesn't begin with a '>'
my (@lines) = <>; chomp(@lines);
exit 0 unless scalar (@lines);
die "First line doesn't begin with a '>'" unless $lines[0] =~ /^>/;
# For each line in turn, look for a leading '>'
# If there is one, strip out the first characters in that line
# after the >, stopping just before the first space, |, ;, : or /
# character. This is the file name.
#
# Open a file with that name for writing into.
#
# Note that this will overwrite previous versions with the same name (as
# identified by our method above). You could get smart here by looking
# for the existence of the file and adding a counter until you found an
# 'empty slot'
#
# Write the contents of the line into the file we just opened
my $line;
foreach $line (@lines) {
    if ($line =~ /^>/) {
        my $seqname = $line;
        $seqname =~ s/^> *([^ |;:\/]+).*/$1/;
        open OUTFILE, ">$seqname"
            or die "Can't open file '$seqname' for writing: $!";
    }
    print OUTFILE $line, "\n";
}
close OUTFILE;
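
One caveat with the naming pattern above: it stops at the first '|', so for definition lines of the form ">gi|1234567|gb|..." every record would end up in a file called "gi", each overwriting the last. What follows is a minimal sketch of a variant, assuming the Entrez headers really do have that ">gi|number|..." layout. It takes the digits after "gi|" as the file name and adds a numeric suffix rather than overwriting an existing file (the 'empty slot' idea from the comments). The sub names name_from_header, unique_name and split_fasta are made up for illustration, and this has not been run against real Entrez output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pick a file name from a FASTA definition line.
# ASSUMPTION: Entrez-style headers look like ">gi|1234567|gb|AA000001|...".
# Anything else falls back to the first token after '>', as in the script above.
sub name_from_header {
    my ($header) = @_;
    return $1 if $header =~ /^>\s*gi\|(\d+)/;
    return $1 if $header =~ /^>\s*([^\s|;:\/]+)/;
    return 'unnamed';
}

# Return $base if no file of that name exists in $dir,
# otherwise the first free name among $base.1, $base.2, ...
sub unique_name {
    my ($dir, $base) = @_;
    my $name = $base;
    my $n    = 0;
    $name = $base . '.' . ++$n while -e "$dir/$name";
    return $name;
}

# Read FASTA records from filehandle $in one line at a time and write each
# record to its own file in $dir. Returns the list of file names written.
sub split_fasta {
    my ($in, $dir) = @_;
    my ($out, @written);
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^>/) {
            close $out if $out;
            my $name = unique_name($dir, name_from_header($line));
            open $out, '>', "$dir/$name"
                or die "Can't open '$dir/$name' for writing: $!";
            push @written, $name;
        }
        die "Input doesn't begin with a '>'\n" unless $out;
        print {$out} $line, "\n";
    }
    close $out if $out;
    return @written;
}

# Only do real work when a FASTA file is named on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Can't open '$ARGV[0]': $!";
    print "$_\n" for split_fasta($in, '.');
    close $in;
}
```

Saved under a name of your choosing and run with the FASTA file as its only argument, this writes one file per record into the current directory and prints the names it used. Because it reads one line at a time rather than slurping everything into memory, it should also cope with large downloads.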