
How to get a sequence into the EMBL databank?

katerice at embl-heidelberg.de
Mon Jan 18 07:34:54 EST 1993


In article <1993Jan15.143953.1 at lure.latrobe.edu.au>, botbvk at lure.latrobe.edu.au writes:
> Dear Neters,
> 	could somebody tell me how to put a sequence into the databank?
> Where can I get the forms from? and what Email address should I send
> the sequence to? Is there a particular format for the sequence file?
>                                    Thanks you in advance
> 
> Beat von Kanel
> Department of Botany
> La Trobe University
> Bundoora Vic.
> Australia  3083
> 
> Email:botbvk at lure.latrobe.edu.au

Dear Beat,
Here are some details for submitting to the EMBL database.

Submission address is:		datasubs at embl-heidelberg.de
General address is:		datalib at embl-heidelberg.de
Update address is:		update at embl-heidelberg.de
				(for updates and notification of publication)

FTP address:			ftp.embl-heidelberg.de 
Fileserver:			netserv at embl-heidelberg.de


There are two main methods of submission. One is to use the Authorin package
(from IntelliGenetics) for Mac or IBM PC, which we make available by FTP or
on our fileserver (see above).

The other is to fill out our submission form and append the relevant nucleotide
sequence (and amino acid where relevant). I have included the submission form
below which includes directions on how best to fill it out.

We prefer to have the sequence in GCG format, but any simple format will do.

I hope this answers your questions. Please do not hesitate to contact me
further.

Regards,
	Kate
-----------------------------------------------------------------------------
 Kate Rice, EMBL                                | Post: EMBL Data Library
                                                |       European Molecular
                                                |          Biology Laboratory
 Internet: KateRice at EMBL-Heidelberg.DE          |       Postfach 10-2209
                                                |       D-6900 Heidelberg
 Phone:   +49-6221-387523                       |       Germany

   
                      SEQUENCE DATA SUBMISSION FORM


This form solicits the information needed for a nucleotide or amino acid
sequence database entry.  By completing and returning it to us promptly you help
us to enter your data in the database accurately and rapidly.  These data will
be shared among the following databases:  DNA Data Bank of Japan (DDBJ; Mishima,
Japan); EMBL Nucleotide Sequence Database (Heidelberg, FRG); GenBank (Los
Alamos, NM, USA); National Biomedical Research Foundation Protein
Identification Resource (NBRF-PIR; Washington, D.C., USA); Martinsried
Institute for Protein Sequence Data (MIPS; Martinsried, FRG); International
Protein Information Database in Japan (JIPID; Noda, Japan); and Swiss-Prot
Protein Sequence Database (Geneva, Switzerland and Heidelberg, FRG).

Please answer all questions which apply to your data.  If you submit 2 or more
non-contiguous sequences, copy and fill out this form for each additional
sequence.  Please include in your submission any additional sequence data which
are not reported in your manuscript but which have been reliably determined (for
example, introns or flanking sequences).  When submitting nucleic acid
sequences containing protein coding regions, also include a translation
(SEPARATELY from the nucleic acid sequence).  Independently sequenced peptides
receive Swiss-Prot accession numbers.  When the form is complete, send (1) this
form, (2) a copy of your manuscript (if available) and (3) your sequence data
(in machine-readable form) to the address shown below.  Information about the
various ways you can send us your data and about formats for the sequence data
is given in the following two sections.

Thank you.


SUBMITTING DATA TO THE EMBL DATA LIBRARY

We are happy to accept data submitted in either of the following ways:

(1) Electronic file transfer: files can be sent via Internet to
    DATASUBS at EMBL-Heidelberg.DE (ask your local network expert for help or
    phone us).  Please ensure that no line in your file is longer than 80
    characters; longer lines often get truncated when they are sent.

(2) Floppy disks: we can read Macintosh and IBM-compatible diskettes.  Please
    use the 'save as text only' feature of your editor to save your submission
    (i.e., in ASCII format), as otherwise we might have difficulty processing
    it.

The EMBL Data Library can be contacted as follows:

    EMBL Data Library Submissions      E-mail     DATASUBS at EMBL-Heidelberg.DE
    Postfach 10.2209                   Telefax    (+49) 6221 387 519
    D-6900 Heidelberg                  Telephone  (+49) 6221 387 258
    Federal Republic of Germany
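
The 80-character and plain-ASCII requirements for e-mailed files can be checked
before sending.  The following is only a minimal sketch in Python; the function
names find_long_lines and is_plain_ascii are made up for this illustration and
are not part of any EMBL tool.

```python
def find_long_lines(text, limit=80):
    """Return (line_number, length) for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


def is_plain_ascii(text):
    """Return True if the text contains only 7-bit ASCII characters."""
    return all(ord(character) < 128 for character in text)


# Example: the second line is 81 characters long and would be reported.
sample = "ACGT" * 10 + "\n" + "A" * 81 + "\n"
print(find_long_lines(sample))   # [(2, 81)]
print(is_plain_ascii(sample))    # True
```

Any line reported by find_long_lines should be re-wrapped before the file is
mailed.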

When we receive your data we will assign them an accession number, which serves
as a reference that permanently identifies them in the database. We will inform
you what accession number your data have been given and we recommend that you
cite this number when referring to these data in publications. 

If your manuscript has already been accepted for publication, the accession
number can be included at the galley proof stage as a note added in proof.  So
that we can process your data and inform you of your accession number before
you receive the galley proofs, please return this form to us as soon as
possible.  We suggest that the note added in proof should read approximately as
follows:  "The nucleotide sequence data reported in this paper will appear in
the EMBL, GenBank and DDBJ Nucleotide Sequence Databases under the accession
number(s) ________." 


FORMATS FOR SUBMITTED DATA

We would appreciate receiving the sequence data formatted as follows:  

Each sequence should include the names of the authors.  

Each distinct sequence should be listed separately using the same number of
bases/residues per line.  The length of each sequence in bases/residues should
be clearly indicated. 

Enumeration should begin with a "1" and continue in the direction 5' to 3' (or
amino- to carboxy- terminus). 

Amino acid sequences should be listed using the one-letter code. 

Translations of protein coding regions in nucleotide sequences should be
submitted in a separate computer file from the nucleotide sequences themselves.

The code for representing the sequence characters should conform to the
IUPAC-IUB standards, which are described in:  Nucl. Acids Res. 13: 3021-3030
(1985) (for nucleic acids) and J. Biol. Chem. 243: 3557-3559 (1968) and Eur.
J. Biochem. 5: 151-153 (1968) (for amino acids).
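
As an illustration of the formatting points above (authors named, length
stated, the same number of residues on every line, numbering from 1, IUPAC-IUB
characters only), here is a rough Python sketch.  The layout it prints is an
assumption chosen for the example; it is neither GCG format nor an official
EMBL layout, and the function names are hypothetical.

```python
# IUPAC-IUB nucleotide codes, including the ambiguity symbols.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")
# One-letter amino acid codes: the 20 standard residues plus B, Z and X.
IUPAC_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY" + "BZX")


def invalid_characters(sequence, alphabet):
    """Return the sorted characters of `sequence` that are not in `alphabet`."""
    cleaned = "".join(sequence.split()).upper()
    return sorted(set(cleaned) - alphabet)


def format_sequence(sequence, authors, per_line=60):
    """Lay out a sequence with a fixed number of residues per line.

    Each sequence line starts with the 1-based position of its first residue,
    so numbering begins at 1 and runs 5' to 3' (or amino to carboxy terminus).
    """
    cleaned = "".join(sequence.split()).upper()
    lines = [f"Authors: {authors}", f"Length: {len(cleaned)}"]
    for start in range(0, len(cleaned), per_line):
        lines.append(f"{start + 1:>8} {cleaned[start:start + per_line]}")
    return "\n".join(lines)


# Example with a placeholder author name and a toy sequence.
print(invalid_characters("acgtxacgt", IUPAC_NUCLEOTIDES))   # ['X']
print(format_sequence("acgtacgtac", "Surname A.B.", per_line=4))
```

With the default of 60 residues per line, every output line stays well under
the 80-character limit mentioned earlier.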



I.  GENERAL INFORMATION
==============================================================================
Your last name                   first name                middle initials
------------------------------------------------------------------------------
Institution
------------------------------------------------------------------------------
Address


------------------------------------------------------------------------------
Computer mail address                  Telex number
------------------------------------------------------------------------------
Telephone                              Telefax number
==============================================================================
On what medium and in what format are you sending us your sequence data? 
(see instructions at the beginning of this form)
  [ ] electronic mail
  [ ] diskette
        computer:                       operating system:
        editor:                         filename:
  [ ] magnetic tape (specify format)
==============================================================================


II.  CITATION INFORMATION
==============================================================================
These data represent 
[ ] new submission  [ ] correction (Accession number:                    )
==============================================================================
These data are  [ ] published  [ ] in press  [ ] submitted  [ ] in preparation  
                [ ] no plans to publish
------------------------------------------------------------------------------
authors
------------------------------------------------------------------------------
title of paper

------------------------------------------------------------------------------
journal                     volume, first-last pages, year
------------------------------------------------------------------------------
Do you agree that these data can be made available immediately?
  [ ] yes    [ ] no, they can be made available after:                (date)
Data published before the stated date will be made available on publication.
==============================================================================
Does the sequence  which you are  sending with this form  include  data that 
do NOT appear in the above citation? 
  [ ] no     
  [ ] yes, from position _______ to _______  [ ] bases OR 
                                             [ ] amino acid residues 
     (If your sequence contains 2 or more such spans,  use the feature table 
     in section IV to indicate their positions) 
If so, how should these data be cited in the database?
  [ ] published  [ ] in press  [ ] submitted  [ ] in preparation  
  [ ] no plans to publish
------------------------------------------------------------------------------
authors
------------------------------------------------------------------------------
address (if different from that given in section I)


------------------------------------------------------------------------------
title of paper

------------------------------------------------------------------------------
journal                     volume, first-last pages, year
==============================================================================
List references to papers  and/or  database  entries which report sequences 
overlapping with that submitted here.

1st author     journal, vol., pages, year and/or database, accession number
------------------------------------------------------------------------------

------------------------------------------------------------------------------

==============================================================================


III.  DESCRIPTION OF SEQUENCED SEGMENT

Wherever possible, please use standard  nomenclature or conventions.  If  a
question  is not applicable to your sequence, answer by writing N.A. in the
appropriate space; if the information is relevant but not available,  write 
a question mark (?). 
==============================================================================
What kind of molecule did you sequence?   (check all boxes which apply)

 [ ] genomic DNA    [ ] genomic RNA   [ ] cDNA to mRNA   [ ] cDNA to genomic RNA
 [ ] organelle DNA  [ ] organelle RNA    please specify organelle: 
 [ ] tRNA           [ ] rRNA          [ ] snRNA          [ ] scRNA
 for viruses: [ ] virus  or  [ ] provirus  or  [ ] viroid    [ ] DNA or [ ] RNA 
              [ ] ds     or  [ ] ss        or  [ ] circular  [ ] enveloped
                                                         or  [ ] nonenveloped
 [ ] other nucleic acid.  please specify:

 [ ] peptide  [ ] sequence assembled by  [ ] overlap of sequenced fragments     
                                         [ ] homology with related sequence
                                         [ ] other.  please specify:

              [ ] partial:               [ ] N-terminal
                                         [ ] C-terminal  
                                         [ ] internal fragment
==============================================================================
length of sequence                 [ ] bases or  [ ] amino acids
Have you checked for vector contamination?
------------------------------------------------------------------------------
gene name(s) (e.g., lacZ) 
------------------------------------------------------------------------------
gene product name(s) (e.g., beta-D-galactosidase)
------------------------------------------------------------------------------
Enzyme Commission number (e.g., EC 3.2.1.23)
------------------------------------------------------------------------------
gene product subunit structure (e.g., hemoglobin alpha-2 beta-2)
==============================================================================
The following items refer to the original source of the  molecule you have 
sequenced. If there exists no entry in the database for your organism, please
specify further classification details.

  organism (species) (e.g., Mus musculus)             subspecies
                                                      plant cultivar
------------------------------------------------------------------------------
  strain (e.g., K12, BALB/c)                          substrain
------------------------------------------------------------------------------
  name/number of individual/isolate (e.g., patient 123; influenza virus 
  A/PR/8/34)
------------------------------------------------------------------------------
  developmental stage                        [ ] germ line   [ ] rearranged
------------------------------------------------------------------------------
  haplotype                    tissue type                cell type
------------------------------------------------------------------------------
  allele                       variant                    [ ] macronuclear
==============================================================================
The  following  items  refer  to the  immediate experimental  source of the 
submitted sequence.
  name of cell line (e.g., Hela; 3T3-L1) or plant cultivar
------------------------------------------------------------------------------
  clone library                          clone(s), subclone(s)
==============================================================================
The following items refer to the  position of the submitted sequence in the 
genome.
  chromosome (or segment) name/number
------------------------------------------------------------------------------
  map position                   units:  [ ] genome %  [ ] nucleotide number
                                         [ ] other:
==============================================================================
Using single words or short phrases, describe the properties of the sequence 
in terms of: 

  -  its associated phenotype(s);
  -  the biological/enzymatic activity of its product;
  -  the general functional classification of the gene and/or gene product;
  -  macromolecules to which the gene product can bind (e.g., DNA, calcium,
     other proteins);
  -  subcellular localization of the gene product;
  -  homology (>100bp/30aa);
  -  tissues in which protein/mRNA is expressed;
  -  any other relevant information.


==============================================================================


IV.  FEATURES OF THE SEQUENCE

Please  list  below  the  types  and  locations of all significant  features   
experimentally  identified within the sequence.   Be sure that your sequence 
is numbered beginning with "1."  Use < or > if a feature extends beyond the
beginning or end of the indicated sequence span. 

In the column marked                   fill in

      feature          type of feature (see information below)
      from             number of first base/amino acid in the feature
      to               number of last base/amino acid in the feature
      bp               an "x" if numbering refers to position of a base pair  
                       in a nucleotide sequence 
      aa               an "x" if  numbering  refers to  position of an amino 
                       acid residue in a peptide sequence
      id               indicate  method by which the feature was identified.  
                       E  =  experimentally;  S  =  by similarity with known 
                       sequence or to an established consensus sequence; P = 
                       by similarity  to  some  other  pattern,  such  as an 
                       open reading frame
      comp             an  "x"  for a nucleotide sequence feature located on 
                       strand complementary to that reported here

Significant features include:

  -  regulatory signals (e.g., promoters, attenuators, enhancers)
  -  transcribed  regions  (e.g., mRNA, rRNA, tRNA).  (indicate reading frame 
     if start and stop codons are not present)
  -  regions subject to post-transcriptional modification (e.g., introns,
     modified bases)
  -  translated regions
  -  extent of  signal  peptide,  prepropeptide,  propeptide,  mature peptide
  -  regions subject to post-translational modification  (e.g.,  glycosylated 
     or phosphorylated sites)
  -  other  domains/sites  of  interest  (e.g.,  extracellular  domain,  DNA-
     binding domain, active site, inhibitory site)
  -  sites involved in bonding (disulfide, thiolester, intrachain, interchain)
  -  regions of protein secondary structure  (e.g., alpha helix or beta sheet)
  -  conflicts with sequence data reported by other authors
  -  variations and polymorphisms

The first 2 lines of the table are filled in with examples.

==============================================================================
Numbering for features on submitted sequence  [ ] matches manuscript
                                              [ ] does not match manuscript
==============================================================================
             feature               from        to         bp  aa   id    comp
------------------------------------------------------------------------------
EXAMPLE     TATA box               276         282         x        S
------------------------------------------------------------------------------
EXAMPLE      exon 1                301        >382         x
==============================================================================

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

==============================================================================
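
For anyone keeping their feature list in a program before copying it onto the
form, the table above could be modelled roughly as follows.  This is only a
sketch in Python: the Feature class, its field names and the helper functions
are hypothetical and do not correspond to any EMBL software.  The two rows
built at the end are the EXAMPLE rows from the table.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    start: int                 # "from" column, numbered from 1
    end: int                   # "to" column
    start_open: bool = False   # "from" was written with < or >
    end_open: bool = False     # "to" was written with < or >
    unit: str = "bp"           # "bp" or "aa"
    method: str = ""           # "E", "S", "P", or "" if left blank
    complement: bool = False   # "comp" column


def parse_position(text):
    """Split '301', '<12' or '>382' into (number, extends_beyond_span)."""
    text = text.strip()
    is_open = text.startswith(("<", ">"))
    return int(text.lstrip("<>")), is_open


def make_feature(name, start, end, unit="bp", method="", complement=False):
    """Build a Feature from the values as they would be written on the form."""
    start_number, start_open = parse_position(start)
    end_number, end_open = parse_position(end)
    if start_number < 1 or end_number < start_number:
        raise ValueError(f"bad span for {name}: {start} to {end}")
    if unit not in ("bp", "aa"):
        raise ValueError(f"unit must be 'bp' or 'aa', got {unit!r}")
    if method not in ("", "E", "S", "P"):
        raise ValueError(f"id must be E, S or P, got {method!r}")
    return Feature(name, start_number, end_number, start_open, end_open,
                   unit, method, complement)


# The two example rows from the form.
tata_box = make_feature("TATA box", "276", "282", method="S")
exon_1 = make_feature("exon 1", "301", ">382")
print(tata_box)
print(exon_1)
```

Checking the span and the id code up front catches the most common slips
(numbering that starts at 0, or a reversed from/to pair) before the form is
sent.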


Be sure to include your sequence in electronic form


E8/11.92 (Last change: 15-Dec-1992)


