Sequence Release Policies

AT theo at mendel.Berkeley.EDU
Wed Apr 2 20:40:03 EST 1997

To:       Arabidopsis Community
From:     The PGEC Sequencing Laboratory
Subject:  Policies Regarding Release and Annotation of the Arabidopsis
          Genomic Sequences

I want to inform the Arabidopsis community that, in the course of
sequencing BAC and YAC clones of chromosome 1 as part of the SPP
Consortium, we have submitted sequence data to GenBank in three phases
over the last four months.  The three phases are defined below:

Phase I: Phase I sequences are preliminary data obtained during shotgun
sequencing.  They consist of unordered DNA contigs of various sizes from
which E. coli and vector sequences have been removed.  We usually submit
Phase I data once a clone has been assembled into fewer than 10 contigs.
Upon submission, GenBank issues an accession number, which we post on
our web site; the entry is also directly accessible through AtDP.  As
sequencing progresses and the number of contigs decreases, we update
the submitted sequences.

Phase II: Phase II data are sequences assembled into a single contig.
Phase II releases are made as updates of the Phase I entries once the
Phase I data reach that stage.

Phase III: Phase III data are finished BAC or YAC sequences, without
sequence errors or inconsistencies.  They result from extensive
finishing and editing of the genomic sequence, and of the three phases,
Phase III is the most time-consuming to obtain.  After a Phase III
sequence has been submitted to GenBank, we begin annotating it; when
the annotation is complete, the GenBank entry is updated to include it.

I am happy to inform you that our first two Phase III BAC submissions,
F19P19 and F7G19, have been annotated.  I want to thank the Informatics
Group of the Genome Sequencing Center at Washington University for
helping us establish our annotation capabilities and for their help in
annotating our first BAC, F19P19.  The annotation was carried out with
the aceDP software and can be viewed at our web site
(http://pgec-genome.pw.usda.gov).  In addition, the available ESTs have
been assigned to the annotated sequences.  We are currently working
with the AtDP staff to make these annotated data accessible through
their server as well.

We have submitted to GenBank the following sequences:

Type Clone     Phase          Accession      Status
BAC  F19P19    Phase III      AC000104       Annotated
BAC  F7G19     Phase III      AC000106       Annotated
BAC  F21M12    Phase I        AC000132
BAC  F19K23    Phase II       AC000375
YAC  yUP8H12   Phase I        AC000098

Because annotation is laborious, Phase III data will always be
submitted first WITHOUT annotation and then updated as annotation
progresses; the lag between the two will be 1 to 2 weeks.  I believe
that our policy of releasing sequence data immediately is very
important to our colleagues in the Arabidopsis community.  It allows
the community to use the data (to make markers, search for their
favorite genes, etc.) as they come off "the sequencing assembly line".

Sakis Theologis
Plant Gene Expression Center
800 Buchanan Street
Albany, CA 94710
theo at mendel.berkeley.edu

