Duplicate use of entry codes

Thu Aug 27 11:54:00 EST 1992

Summary of this mail:

I am asking permission to redistribute amended versions of your programs
CREATEEMBL.FOR and CREATEGB.FOR. In fact I would be rather pleased if
you would adopt them yourself, as I think others could benefit.

Maybe it's a bit late now - it would have been superb when GCG had a 100KB
limit, but perhaps the current limits will be broken before too long.


Cary O'Donnell


I asked "Jack A.M. Leunissen -- CAOS/CAMM Center" <JACKL at nl.KUN.CAOS.CAOS>
the following:

>> Have you an agreement with NBRF for the redistribution of their s/w?
>> If so - I have made some amendments of my own that I would like to be made
>> available. Who do I ask? Should I put it on EMBL file-server, or via you?

He replied:

>I asked NBRF for permission to modify and distribute the modified subroutines.
>You should contact David George at NBRF about that (GEORGE at GUNBRF.bitnet).
>If you do, ask him what he thinks about distributing the stuff via the file-
>server (I forgot to do so!). 
>Maybe we then can merge things on the server into one XQS-package.

The following is part of the 00README.TXT file I have prepared:

What is this software for?
Two FORTRAN source files are distributed by PIR for reformatting the EMBL
and Genbank distribution tapes: CREATEEMBL.FOR and CREATEGB.FOR.

This save set includes derivatives of those programs that allow the
creation of PIR format databases with different sequence-length limits. Any
sequence that exceeds the defined limit is split into pieces that overlap.
Most importantly, the splitting of sequences is documented: both within the
database created, and in the .NAM file.

See the source code files for extended comment on how this is achieved.
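
The splitting described above can be pictured with a small Python sketch.
This is an illustration only, not a translation of the FORTRAN sources: the
function name split_with_overlap and its arguments max_len and overlap are
hypothetical, and the real programs may place the overlaps differently.

```python
def split_with_overlap(seq, max_len, overlap):
    """Split seq into pieces of at most max_len characters.

    Each piece after the first repeats the last `overlap` characters of
    the previous piece. Returns (start, end, piece) tuples using 1-based
    inclusive coordinates, so the split can be documented.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be >= 0 and smaller than max_len")
    if len(seq) <= max_len:
        # Short enough: keep the entry whole.
        return [(1, len(seq), seq)]
    pieces = []
    step = max_len - overlap  # how far each new piece advances
    start = 0
    while True:
        end = min(start + max_len, len(seq))
        pieces.append((start + 1, end, seq[start:end]))
        if end == len(seq):
            break
        start += step
    return pieces


# Example: a 10-residue "sequence" with a limit of 4 and an overlap of 1.
for first, last, piece in split_with_overlap("ABCDEFGHIJ", 4, 1):
    print(first, last, piece)
```

Returning the coordinates alongside each piece is what would let a log file
record where every fragment came from in the original entry.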

Input files
The files can be the individual EMBL or Genbank divisions, or the concatenated
divisions. Daily updates are assumed to be concatenated entries, where the 
individual entries are in EMBL or Genbank format. The programs recognise the 
specific database names EMNEW, SWISSPROT, GBNEW, GBONLY, GENPEPT.

    ****    The GBONLY source file is assumed to be a set of ****
    ****          concatenated GCG-format sequences          ****

If required this restriction can be avoided by calling the database something 

The protein sequence files SWISSPROT and GENPEPT can be formatted. If the
database is given one of these names, the correct ('P1') protein identification
flag is placed in the PIR format output file. (The GENBANK350KB.FOR program 
has NOT been tested with a GenPept data file).
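
As a rough sketch of the name-based choice described above, the following
Python fragment picks the flag from the database name. It is not taken from
the FORTRAN sources. The function names are hypothetical, the 'DL' default
for nucleic acid databases is an assumption (only 'P1' is stated here), and
the ">flag;code" header layout is assumed from the usual PIR convention.

```python
# Database names that this README says receive the protein flag.
PROTEIN_DATABASES = {"SWISSPROT", "GENPEPT"}


def pir_type_flag(database_name, nucleic_flag="DL"):
    """Return 'P1' for the protein database names, else nucleic_flag.

    The 'DL' default is an assumption made for this illustration.
    """
    if database_name.strip().upper() in PROTEIN_DATABASES:
        return "P1"
    return nucleic_flag


def pir_header_line(database_name, entry_code):
    """Build a PIR-style entry header such as '>P1;CODE' (assumed layout)."""
    return ">{};{}".format(pir_type_flag(database_name), entry_code)


print(pir_header_line("GENPEPT", "ABC123"))
print(pir_header_line("EMNEW", "XYZ789"))
```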

Output files
.REF, .SEQ, .TTL (optional)  - PIR format sequence files.
.HEADER                      - GCG version 7.0 HEADER file
.NAM                         - Documented processing file
*DIVISION.TMP                - An alternative to the SHORTDIR.NDX file showing
                               each entry code's division. (Useful for 
                               concatenated files)

The development of this software began when the GCG software had a 100KB limit
on sequence length. Since the middle of 1991, the GCG package has increased
its limit to 350KB. But maybe this software will become useful again some day....

How to change the sequence limit
The two most important limits to change are the MAXSEQ and OVERLAP parameters
in the source code of EMBL350KB.FOR and GENBANK350KB.FOR. These must be 
altered at the top of the main program, and in the subroutine WRITE_TITLES. 
An example 'Test version' is commented out.

AFRC Computing Division         JANET   : AFRC.ARCB::ODONNELL
West Common                     INTERNET: ODONNELL at ARCB.AFRC.AC.UK
Harpenden                       Tel: (+44) 582 762271 ext 229
Herts AL5 2JE                   Fax: (+44) 582 761710
U.K.                            (AFRC = Agricultural & Food Research Council)
