Readseq update (reformat biosequences)

Don Gilbert gilbertd at bio.indiana.edu
Wed Jun 6 05:26:04 EST 2001



Readseq: Read & reformat biosequences

Home of this package is 
http://iubio.bio.indiana.edu/soft/molbio/readseq/java/

Readseq was originally written around 1989 as a component of a
sequence analysis program, but when I added a simple command-line
interface, it took on a life of its own for data conversions.
Its main contribution to bioinformatics is that it takes on the
job of guessing what your input biosequence format is and converting
it to what your software knows how to handle.
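
As a rough illustration of what format guessing involves (this is
not Readseq's own detection code, which is in the Java source), a
guesser can look at the first non-blank line of the input for the
marker that each format conventionally starts with. The function
name guess_format and the returned labels are hypothetical, and only
a handful of the supported formats are covered:

```python
def guess_format(text):
    """Guess a biosequence format from the first non-blank line.

    Illustrative sketch only: real files need more checks than this,
    and the labels returned here are made up for the example.
    """
    # Conventional opening markers for a few well-known formats.
    markers = [
        ("LOCUS", "genbank"),         # GenBank entries open with a LOCUS line
        ("ID   ", "embl"),            # EMBL entries open with an ID line
        ("#NEXUS", "nexus"),          # PAUP/NEXUS files open with #NEXUS
        ("CLUSTAL", "clustal"),       # Clustal alignments open with CLUSTAL
        ("##GFF-VERSION", "gff"),     # GFF files open with a version pragma
        (">", "fasta"),               # Pearson/Fasta headers open with '>'
    ]
    for line in text.splitlines():
        if not line.strip():
            continue                  # skip leading blank lines
        upper = line.upper()
        for prefix, name in markers:
            if upper.startswith(prefix):
                return name
        return "unknown"              # first non-blank line decides
    return "unknown"                  # empty input
```

A caller would pass the start of a file, for example
guess_format(open("myseq.fasta").read()), and fall back to asking the
user when the answer is "unknown".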

This version includes a Graphical User Interface (GUI) for those
who prefer not to learn the many command-line options (which
probably includes all but the hardiest bioinformaticians :), or
whose workstation lacks a command line.
 
This version includes a Common Gateway Interface (CGI) for use
from a web server.  One web server that runs it is IUBio Archive
( http://iubio.bio.indiana.edu/cgi-bin/readseq.cgi )
 
This version also supports the same old Command Line Interface
from 1992, so you can drop it in place of the older, slower,
buggier, less functional Readseq from that decade.

Formats supported are
 GenBank, EMBL, Pearson|Fasta, GCG, MSF, Clustal, NBRF,
 PIR|CODATA, ACEDB, Phylip, Plain|Raw, PAUP|NEXUS, XML,
 FlatFeat|FFF, GFF, BLAST, Pretty, SCF, DNAStrider, IG|Stanford

Programmers will find the source code here also. Anyone can use
Readseq. There are no copyright restrictions. It is in the PUBLIC
DOMAIN.

For detailed documentation, see 
http://iubio.bio.indiana.edu/soft/molbio/readseq/java/Readseq2-help.html
 

-- Don Gilbert

software at bio.indiana.edu, June 2001
Bioinformatics group, 
Biology Department & Cntr. Genomics & Bioinformatics, 
Indiana University, Bloomington, Indiana

Release notes

Readseq version 2.1.1 (29 May 2001) updates: 
  Added  -reverse option for reverse-complement of sequence
  Feature extraction of complement() locations now does reverse-complement
  Added feature subrange extractions:  
    -subrange=-1000..10    extract subrange of sequence for feature locations
    -subrange=1..end      
    -subrange=end-10..end+99  
            -1000..10 is 1000 bases upstream/before to base 10 in feature,
             1..end is full feature range (default for no -subrange option)
             end-10..end+99 is 10 before end to 99 after end of feature
             only valid with -feat/-nofeat
    -extract=1000..9999   extract all features, sequence in given base range

  Added pair/unpair option for combining feature files and sequence files
      pair=1 myseq.gff myseq.fasta format=genbank 
          == means combine features + sequence in one output file
      unpair=1 myseq.genbank format=fff 
          == means output paired features, sequence files from one input

  Added ClustalW alignment, AceDB sequence formats
  Added FFF/Flat-Feature-Format (one-line DDBJ/EMBL/GenBank Feature Table)
  Added GFF/General-Feature-Format  
  Fixed EMBL to handle aminos - SwissProt format; fixed GenBank to do aminos  
  A Perl script to convert readseq source to javac compatible form is included.
  Various bug fixes; Java 1.2/1.3 compatibility
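
The -subrange coordinates listed above are relative to each feature,
which is easier to see with numbers. The sketch below is not taken
from the Readseq source; it only shows one plausible reading of the
notation. Assumptions made here: positions are 1-based, a positive
number n means the n-th base of the feature, a negative number -n
means n bases before the feature start, "end" is the last base of
the feature, results are clipped to the sequence length, and only
forward-strand features are considered. The function names are
hypothetical.

```python
import re

# One endpoint is either a signed number, "end", or "end" plus a signed offset.
_ENDPOINT = re.compile(r"^(end)?([+-]?\d+)?$")


def resolve_endpoint(token, feat_start, feat_end):
    """Map one endpoint such as '-1000', '10', 'end' or 'end+99'
    to a 1-based position on the sequence (assumed semantics)."""
    m = _ENDPOINT.match(token)
    if not token or not m:
        raise ValueError("bad subrange endpoint: %r" % token)
    anchored_to_end, offset = m.group(1), m.group(2)
    if anchored_to_end:
        # 'end', 'end-10', 'end+99': offset from the last feature base.
        return feat_end + int(offset or 0)
    n = int(offset)
    if n == 0:
        raise ValueError("position 0 is not defined in this sketch")
    # Assumption: 1 is the first feature base, -1 is the base just before it.
    return feat_start + n - 1 if n > 0 else feat_start + n


def resolve_subrange(spec, feat_start, feat_end, seq_len):
    """Turn a spec like '-1000..10' into (low, high) sequence positions,
    clipped to 1..seq_len."""
    left, right = spec.split("..")
    low = max(1, resolve_endpoint(left, feat_start, feat_end))
    high = min(seq_len, resolve_endpoint(right, feat_start, feat_end))
    return low, high
```

For a feature at 5000..6000 on a 100000-base sequence, this reading
gives 4000..5009 for -1000..10, 5000..6000 for 1..end, and 5990..6099
for end-10..end+99.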

-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at bio.indiana.edu
