ReadSeq updated !

Don Gilbert gilbertd at sunflower.bio.indiana.edu
Thu Nov 14 00:50:35 EST 1991


* ReadSeq -- 14 Nov 91
*
* Reads and writes nucleic/protein sequences in various
* formats. Data files may have multiple sequences.

Readseq has been updated.  This release fixes several bugs and adds
a number of enhancements (see below).  If you use an earlier version,
or programs that incorporate one, I recommend you update to this
release.

Readseq is particularly useful as it detects many sequence formats;
this detection has been improved.  Validation tests are included
so you can ensure that the current program has compiled and is working
properly.
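
To illustrate what detecting a format from file contents can look like,
here is a minimal sketch in C.  It is not readseq's own detection code:
the enum, the function name guess_format(), and the choice to look only
at the first non-blank line are assumptions made for this example.  It
relies only on well-known leading signatures (">" for Pearson/Fasta,
">P1;" style type codes for NBRF/PIR, "LOCUS" for GenBank, "ID   " for
EMBL, ";" comment lines for IG/Stanford).

```c
#include <string.h>

/* Hypothetical format codes for this sketch; not readseq's constants. */
enum seq_format {
    FMT_UNKNOWN, FMT_FASTA, FMT_GENBANK, FMT_EMBL, FMT_NBRF, FMT_IG
};

/* Guess a sequence format from the first non-blank line of a file.
   Only leading signatures are checked, so this is a rough first guess. */
enum seq_format guess_format(const char *line)
{
    /* Skip leading blanks and tabs. */
    while (*line == ' ' || *line == '\t')
        line++;

    if (strncmp(line, "LOCUS", 5) == 0)
        return FMT_GENBANK;
    if (strncmp(line, "ID   ", 5) == 0)
        return FMT_EMBL;
    if (line[0] == '>') {
        /* NBRF/PIR headers carry a two-character type code, then ';'. */
        if (strlen(line) >= 4 && line[3] == ';')
            return FMT_NBRF;
        return FMT_FASTA;
    }
    if (line[0] == ';')
        return FMT_IG;
    return FMT_UNKNOWN;
}
```

A caller would pass the first non-blank line of the file, for example
guess_format(">seq1 human"), and fall back to treating the data as
Plain/Raw when FMT_UNKNOWN comes back.  A real detector needs to look
at more than one line, since several formats share leading characters.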

If you use it with either GCG format or Gary Olsen VMS sequence editor
format, you should definitely update your copy:
  :( Previous versions delete bases from every Olsen print format file.
  :( Previous versions can duplicate the bases in the last line of a GCG
    format file.  This will occur when the GCG format file was previously
    _written_ by readseq, then read in a second time.  GCG format files
    written by GCG programs were not subject to this flaw.

This program is available thru anonymous ftp, in this manner:
  my_computer> ftp  ftp.bio.indiana.edu  (or IP address 129.79.224.25)
    username:  anonymous
    password:  my_username at my_computer
  ftp> cd molbio/readseq
  ftp> get readseq.shar
  ftp> bye

readseq.shar is a Unix shell archive of the readseq files.
This file can be edited with any text editor to reconstitute the
original files, for those who do not have a Unix system or an
Unshar program.  Read the top of this .shar file for further
instructions.

There are also pre-compiled executables for the following computers:
Silicon Graphics Iris, Sparc (Sun Sparcstation & clones), VMS-Vax,
Macintosh. Use binary ftp to transfer these, except Macintosh.  The
Mac version is just the command-line program in a window, not very
handy.

File conversions handled by readseq:
       1. IG/Stanford            8. Pearson/Fasta
       2. GenBank/GB             9. Zuker
       3. NBRF/PIR              10. Olsen (in only)
       4. EMBL                  11. Phylip3.4/Phylip (out only)
       5. GCG                   12. Phylip3.3/Interleaved (out only)
       6. DNAStrider            13. Plain/Raw
       7. Fitch

Recent changes:
17 Oct 91.
  * corrected bug in reading Olsen format
    (serious-deletion)
10 Nov 91.
  * corrected bug in reading some GCG format files
    (serious-last line duplicated)
  + add format name parsing (-fgenbank, -ffasta, ...)
  + Phylip v3.4 output format (== v3.2, sequential)
  + add checksum output to all forms that have document
  + skip mail headers in seq file
  + add pipe for standard input == seq file (with -p)
  * fold in parts of MacApp Seq object
  * strengthen format detection
  * clarify program structure
  * remove fixed sequence size limit (now dynamic, limited by memory)
  * check and fold in accumulated bug reports:
  *   Now ANSI-C fopen(..,"w") & check open failure
  *   Define -DFIXTOUPPER for nonANSI C libraries that mess
      up toupper/tolower
  = No command-line changes; callers of readseq main() should be okay
  - ureadseq.h functions have changed; client programs need to note.
  + added Unix and VMS Make scripts, including validation tests

This program may be freely copied and used by anyone.
Developers are encouraged to incorporate parts in their
programs, rather than devise their own private sequence
format.

This should compile and run with any ANSI C compiler.
Please advise me of any bugs, additions or corrections.
                                  -- Don
-- 
Don Gilbert                                     gilbert at bio.indiana.edu
biocomputing office, biology dept., indiana univ., bloomington, in 47405



