Release 2 notes

Sima Misra sima at fruitfly.bdgp.berkeley.edu
Wed Nov 1 12:39:09 EST 2000


ANNOTATED DROSOPHILA GENOMIC SEQUENCE
Release 2 Notes
October 2000

RELEASE 2

The annotated D. melanogaster sequence was first released on March 24,
2000, and constitutes Release 1 of the genomic sequence.  Approximately
330 of the gaps in that sequence have now been filled. Some annotations
have been corrected or added by Celera/BDGP, but no annotations have yet
been corrected by FlyBase.  Celera/BDGP recently submitted this new
annotated sequence to GenBank as Release 2.  Because FlyBase/BDGP will
continually update the sequence and annotations on an approximately
six-month cycle, there will be future releases (see below for information
about RELEASE 3).  Multiple versions of sequence and annotations present
organizational challenges both to the public databases and to BDGP, and
will probably cause confusion. We will try to make it easy to distinguish
among the various releases.

For now, Release 2 is available only at the data libraries (NCBI, EBI,
DDBJ); it is not yet on the BDGP or FlyBase web sites, nor in the
"Drosophila genome" BLAST database at the NCBI. In the future, both
Release 1 and Release 2 versions of our GadFly annotation database will be
available at the BDGP and FlyBase web sites. Users must be certain to
check the Release number of any genomic sequence or annotation.  Version
numbers appear after the accession number, for example:

Date            Release         GenBank Version
----            -------         ---------------
March 2000      Release 1       AE003452.1
October 2000    Release 2       AE003452.2

If the genomic sequence did not change between March and October, GenBank
has retained the .1 version number but changed the date to October, for
example:

Date            Release         GenBank Version
----            -------         ---------------
March 2000      Release 1       AE003650.1
October 2000    Release 2       AE003650.1
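
As an illustration of the version convention in the two tables above, here
is a minimal Python sketch that splits an identifier such as AE003452.2 into
its accession and version, and compares the Release 1 and Release 2
identifiers of the same accession.  The function names and the simplified
pattern (letters, digits, a dot, then the version) are assumptions made for
this example; they are not part of any BDGP, FlyBase, or GenBank tool.

```python
import re
from typing import NamedTuple


class VersionedAccession(NamedTuple):
    accession: str  # e.g. "AE003452"
    version: int    # e.g. 2


# Simplified pattern assumed for this sketch: letters, digits, ".", version.
_ACCESSION_VERSION = re.compile(r"^([A-Z]+\d+)\.(\d+)$")


def parse_versioned_accession(text: str) -> VersionedAccession:
    """Split 'AE003452.2' into ('AE003452', 2)."""
    match = _ACCESSION_VERSION.match(text.strip())
    if match is None:
        raise ValueError(f"not an accession.version identifier: {text!r}")
    return VersionedAccession(match.group(1), int(match.group(2)))


def sequence_changed(release1_id: str, release2_id: str) -> bool:
    """Return True if the version number rose between the two releases."""
    old = parse_versioned_accession(release1_id)
    new = parse_versioned_accession(release2_id)
    if old.accession != new.accession:
        raise ValueError("identifiers refer to different accessions")
    return new.version > old.version


print(sequence_changed("AE003452.1", "AE003452.2"))  # sequence was updated
print(sequence_changed("AE003650.1", "AE003650.1"))  # sequence is unchanged
```

Note that an unchanged version number says only that the sequence is the
same; as described above, the record date may still differ.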

Links from FlyBase/BDGP Release 1 pages (e.g., from GadFly annotation
report pages) to accession numbers at NCBI may currently lead to the
Release 2 sequence, although they should lead to the Release 1 sequence.
We are working to fix this.  In the future, accession number links from
FlyBase/BDGP
Release 1 pages will go to Release 1 sequence at the data libraries (NCBI,
EBI, DDBJ), links from FlyBase/BDGP Release 2 pages will go to Release 2
sequence, and so on. Links from FlyBase gene reports will go to the most
recent release.  You can always retrieve a specific version at NCBI by
querying with the accession number followed by its version number (for
example, AE003452.2).
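
For readers who want to script such a query, the following Python sketch
builds a request URL for a versioned accession.  It assumes NCBI's
E-utilities "efetch" service and its "nuccore" database, neither of which is
mentioned in these notes; the helper name efetch_url is hypothetical.  The
sketch only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities efetch endpoint (not referenced in the notes).
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession_version: str) -> str:
    """Build a URL requesting one GenBank flat-file record as plain text."""
    params = {
        "db": "nuccore",          # nucleotide database
        "id": accession_version,  # accession with version, e.g. AE003452.2
        "rettype": "gb",          # GenBank flat-file format
        "retmode": "text",
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


print(efetch_url("AE003452.2"))
```

Including the version suffix in the identifier is what pins the request to
one release of the sequence rather than to whatever is most recent.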

The release number will appear prominently at the top of each GadFly query
and report page, and also at the download sites for sequence and XML-formatted
annotations.  Please make a note of the release number you are working
with.

Because of limited resources, certain analyses (for example, the mapping
of P element insertions) performed on the Release 1 data will not be
repeated on Release 2. However, the Release 1 results will always be
accessible, and we will repeat these analyses for Release 3.

Some statistics comparing the releases:

Number of genes:        13991 in Release 1
                        13744 in Release 2
Number of peptides:     14080 in Release 1
                        14332 in Release 2
Changes in Release 2:
          unchanged peptide sequences:                  13218
          changed peptide sequences:                    748
          new transcripts:                              336
          transcripts deleted/changed name:             114
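
The arithmetic behind these figures can be sketched in a few lines of
Python.  The numbers are copied from the statistics above; the variable
names and the way the categories are combined are assumptions made for
illustration, since the notes do not say how the counts relate to one
another.

```python
# Counts copied from the statistics above.
release1 = {"genes": 13991, "peptides": 14080}
release2 = {"genes": 13744, "peptides": 14332}

# Net change from Release 1 to Release 2 for each count.
net_change = {name: release2[name] - release1[name] for name in release1}

unchanged_peptides = 13218
changed_peptides = 748
new_transcripts = 336
deleted_or_renamed = 114

# Assumption: unchanged + changed are the entries carried over from Release 1.
carried_over = unchanged_peptides + changed_peptides

# Under that assumption, adding the deleted/renamed entries should give the
# Release 1 peptide count.
release1_accounted = carried_over + deleted_or_renamed

# Carried-over plus new entries; this mixes peptide and transcript counts,
# so it need not match the Release 2 peptide total exactly.
release2_estimate = carried_over + new_transcripts

print(net_change)
print(release1_accounted == release1["peptides"])
print(release2["peptides"] - release2_estimate)
```

Under these assumptions the Release 1 peptide count is fully accounted for,
while the Release 2 estimate differs slightly from the reported total; the
notes give no explanation for that difference.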

RELEASE 3

The BDGP is currently finishing the genomic sequence to high quality
(Phase 3) and FlyBase/BDGP is reannotating this finished sequence to
create Release 3, which will gradually be deposited in GenBank during
2001. Release 3 will provide improvements in annotation and sequence
quality relative to Release 2, and will include the corrections submitted
by the public in error reports.

TRANSPOSABLE ELEMENTS

As a result of the whole genome shotgun assembly, the sequence of each
transposon in Releases 1 and 2 is a consensus derived from a number of
elements of that transposon type. The extent of the consensus varies among
the transposons depending on the length of the traces that run from unique
sequence into the transposon.  In most cases, therefore, the sequence shown
is not the actual sequence of the particular transposon at that location.
Users should not draw strong conclusions from any analysis of these
transposable element sequences.

As we finish the sequence to high quality, we will attempt to replace
these consensus sequences with the actual sequences present at each
location in the y; cn bw sp strain.  This corrected sequence will be found
in Release 3.

QUESTIONS?

Thank you for your patience while we devise an appropriate way to manage
the problems presented by the new releases and transposons.  Please
address any questions to bdgp at fruitfly.berkeley.edu.

-Sima Misra
for FlyBase/BDGP


