Greetings, maize researchers.
Please see below to learn about the status of RefGen_v3. Thanks to the
Ware group (USDA-ARS and CSHL) for providing this information!
Carolyn Lawrence
__________________________
Maize B73 Reference Assembly Update & Release
The Maize Genome Sequencing Project is preparing to release a new version
of the maize B73 reference, designated B73 RefGen_v3. The new reference
assembly is improved over the current version, RefGen_v2, primarily in the
inclusion of new genic regions, generated from a whole genome shotgun
assembly of 454 sequences, which fill gaps in the current BAC clone-based
assembly. Annotation of the amended assembly results in the definition of
several hundred new and/or improved gene models. This document describes
the schedule and protocol for public release of the new assembly.
The Project will release RefGen_v3 via maizesequence.org, maizegdb.org, and
gramene.org once the following data sets have been submitted to and
accepted by the International Nucleotide Sequence Database Collaboration
(INSDC), also known as DDBJ/EMBL/GenBank:
1) B73 RefGen_v2 pseudomolecule scaffold sequence and AGP (short for
"A Golden Path", the table that specifies how the component contigs are
combined to build the pseudomolecule scaffold sequences).
2) The 454 whole genome shotgun assembly that provides components of the
new assembly.
3) B73 RefGen_v3 pseudomolecule scaffold sequence and AGP.
Best Practices for Supporting Reference Genomes:
Data providers, including single-organism community databases,
multi-organism browsers, and NCBI, have struggled in recent years to
maintain standardized and consistent representations of genome data within
a given species. The existence of disparate sequence data, coordinate
systems, and identifiers harms the scientific community by preventing
interoperability and fracturing the research literature. Forcing
researchers to reconcile such differences hampers scientific progress.
These problems have prompted new policies amongst data providers to insist
on INSDC submission as a prerequisite for hosting genome data. One example
is the Browser Genome Release Agreement between the Ensembl, NCBI, and UCSC
groups.
In addition to providing a unified source of data, submission to the INSDC
ensures legitimacy of the assembly by application of rigorous standards.
The vetting process includes, among other aspects:
1) Ensuring that component contigs are already accessioned in
DDBJ/EMBL/GenBank.
2) Screening of component contigs for non-target organism contamination.
3) Validating appropriate positioning and classification of gaps.
4) Ensuring that AGP specification agrees with pseudomolecule sequence.
5) Using standardized formatting for accurate representation of all data
and metadata.
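
To make check 4 more concrete, here is a minimal sketch in Python of the
kind of consistency test involved. It assumes the standard tab-delimited
AGP column layout (object, start, end, part number, component type, then
either component ID/start/end/orientation or gap length/type/linkage). The
function name check_agp, the contig names, and all lengths are hypothetical
and chosen for illustration only; this is not the validator that the INSDC
databases actually run.

```python
from typing import Dict, List

GAP_TYPES = {"N", "U"}  # AGP component types that denote gaps

# Hypothetical three-row AGP for one object: contig, gap, contig.
EXAMPLE_AGP = "\n".join([
    "chr1\t1\t100\t1\tW\tctgA\t1\t100\t+",
    "chr1\t101\t150\t2\tN\t50\tscaffold\tyes\tpaired-ends",
    "chr1\t151\t250\t3\tW\tctgB\t1\t100\t-",
])


def check_agp(agp_text: str, seq_lengths: Dict[str, int]) -> List[str]:
    """Return a list of problems found when comparing AGP rows with
    the expected length of each pseudomolecule (object)."""
    problems: List[str] = []
    last_end: Dict[str, int] = {}   # object -> end coordinate of previous row
    last_part: Dict[str, int] = {}  # object -> part number of previous row

    for lineno, line in enumerate(agp_text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.split("\t")
        if len(cols) < 8:
            problems.append(f"line {lineno}: too few columns")
            continue
        obj, ctype = cols[0], cols[4]
        beg, end, part = int(cols[1]), int(cols[2]), int(cols[3])
        span = end - beg + 1

        # Rows for one object must tile it with no holes or overlaps.
        expected_beg = last_end.get(obj, 0) + 1
        if beg != expected_beg:
            problems.append(
                f"line {lineno}: {obj} starts at {beg}, expected {expected_beg}")
        expected_part = last_part.get(obj, 0) + 1
        if part != expected_part:
            problems.append(
                f"line {lineno}: part number {part}, expected {expected_part}")

        # The span on the object must equal the gap or component length.
        if ctype in GAP_TYPES:
            if int(cols[5]) != span:
                problems.append(f"line {lineno}: gap length != object span")
        else:
            comp_span = int(cols[7]) - int(cols[6]) + 1
            if comp_span != span:
                problems.append(f"line {lineno}: component span != object span")

        last_end[obj], last_part[obj] = end, part

    # The last row of each object must end at the sequence length.
    for obj, length in seq_lengths.items():
        if last_end.get(obj) != length:
            problems.append(
                f"{obj}: AGP ends at {last_end.get(obj)}, sequence is {length}")
    return problems
```

Calling check_agp(EXAMPLE_AGP, {"chr1": 250}) returns an empty list; any
mismatch between the table and the sequence length is reported as a
message instead.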
The risk of not submitting to INSDC prior to release is realized when this
validation process necessitates changes to the assembly or coordinate
system, thus causing discrepancy with the released version. Experience
with the submission of RefGen_v2, currently in use throughout the
community, is illustrative of this problem. While this submission is still
in process, feedback from validation has so far included i) sequence
contamination from non-maize organisms; ii) inappropriate gap placement and
length representation; and iii) unacceptable construction of a "chr0" to
represent unanchored scaffolds (chr0 needs to be broken up into individual
scaffolds). We are fortunate that GenBank is making allowances for
RefGen_v2 so as to maintain consistency of annotation coordinates with the
public release already in use.
Process and Status:
The flow chart illustrates the process and status for submissions. For
both annotations and AGP the process is iterative until final acceptance:
test submission, feedback, revision, new test submission. While the issues
with v2 identified so far have been incorporated into the preliminary AGP
of v3, the final approved v2 AGP is critical for a smooth submission of v3
on the heels of v2. Similarly, final approval of the v2 annotation files
will be important for the submission of v3 annotations, as the vast
majority of genes will only require adjustments of coordinates. Final
approval of the 454 whole genome shotgun assembly is also on the critical
path for release of RefGen_v3. However, only a relatively small subset of
the entire assembly is relevant to the v3 AGP, and NCBI is giving priority
to these sequences. They have already passed contamination screening, and
we do not anticipate any further issues with what should be a
straightforward submission of nucleotide sequence.
For a figure describing the GenBank submission process, history, and
status, please visit http://images.maizegdb.org/public/genbank_submission.jpg.
//////////////////////////////////////////////////////////////////////
Carolyn J. Lawrence, Ph.D.
USDA-ARS Research Geneticist
http://www.lawrencelab.org
http://www.maizegdb.org
carolyn.lawrence from ars.usda.gov
(515) 294-4294 Office
(515) 294-5332 Lab
//////////////////////////////////////////////////////////////////////