[Genbank-bb] INSDSeq XML for GenBank Records : Moving to Version 1.4

Mark Cavanaugh cavanaug at ncbi.nlm.nih.gov
Wed Dec 14 15:53:46 EST 2005


Greetings GenBank Users,

This topic is not directly related to GenBank Release/Update
content, or to the GenBank flatfile format, but it is still
probably appropriate for this group.

Some of you may know that DDBJ, EMBL, and GenBank have developed
an XML DTD for sequence records called INSDSeq.

The purpose of INSDSeq is to provide a near-uniform representation
for sequence records from all three participants in the International
Nucleotide Sequence Database (INSD) collaboration.

As it currently stands, INSDSeq does not provide very much
structure beyond what users are already familiar with from the
GenBank, EMBL, and DDBJ flatfile formats.

An example of a GenBank record in INSDSeq format can be viewed
at this URL:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&list_uids=146274&dopt=gbc

Although elements like INSDInterval have been defined to create
a more easily parsable structure for feature locations, overall
INSDSeq is clearly still wedded to the flatfile representation.
Nevertheless, INSDSeq is being provided in the hope that it
will prove useful to those who bulk-process sequence data at the
flatfile-format level of detail.
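As a rough illustration of what "more easily parsable" means in practice, the sketch below reads feature intervals out of INSDSeq XML with Python's standard xml.etree.ElementTree. The XML fragment is a hand-written, hypothetical example shaped like INSDSeq output, not a real record. The element names INSDInterval_from, INSDInterval_to, INSDFeature_key and INSDFeature_intervals are assumed from the INSDSeq DTD. The DTD also permits interval forms without a from/to pair, and this sketch simply skips those.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment shaped like INSDSeq output; not a real record.
SAMPLE = """<INSDSet>
  <INSDSeq>
    <INSDSeq_locus>EXAMPLE</INSDSeq_locus>
    <INSDSeq_feature-table>
      <INSDFeature>
        <INSDFeature_key>CDS</INSDFeature_key>
        <INSDFeature_location>join(10..20,30..40)</INSDFeature_location>
        <INSDFeature_intervals>
          <INSDInterval>
            <INSDInterval_from>10</INSDInterval_from>
            <INSDInterval_to>20</INSDInterval_to>
          </INSDInterval>
          <INSDInterval>
            <INSDInterval_from>30</INSDInterval_from>
            <INSDInterval_to>40</INSDInterval_to>
          </INSDInterval>
        </INSDFeature_intervals>
      </INSDFeature>
    </INSDSeq_feature-table>
  </INSDSeq>
</INSDSet>"""


def feature_intervals(xml_text):
    """Return (feature key, [(from, to), ...]) for each INSDFeature."""
    root = ET.fromstring(xml_text)
    result = []
    for feature in root.iter("INSDFeature"):
        key = feature.findtext("INSDFeature_key")
        spans = []
        for interval in feature.iter("INSDInterval"):
            start = interval.findtext("INSDInterval_from")
            end = interval.findtext("INSDInterval_to")
            # Skip interval forms without an explicit from/to pair.
            if start is not None and end is not None:
                spans.append((int(start), int(end)))
        result.append((key, spans))
    return result


print(feature_intervals(SAMPLE))  # [('CDS', [(10, 20), (30, 40)])]
```

The point of the structured form is that the intervals come out as integers directly, rather than by parsing the "join(10..20,30..40)" location string.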

Currently, NCBI services can deliver INSDSeq data that conforms
to version 1.3 of the DTD.

However, within several weeks, we expect to move to version 1.4
of INSDSeq. Here is a summary of the expected changes:

- Element INSDSeq_create-date is now optional.

- Added element INSDReference_position.

- Removed element INSDReference_medline.

- Added element INSDReference_xref (defined by INSDXref).

- Added elements INSDFeature_operator, INSDFeature_partial5,
  and INSDFeature_partial3 to INSDFeature.

- Added element INSDInterval_iscomp to INSDInterval.

- Added element INSDInterval_interbp to INSDInterval.

- Added element INSDSeq_project, in anticipation of the flatfile
  linetype that will be introduced for Genome-Project identifiers.
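For those who process this data in bulk, one way to prepare for the move is to write readers that accept both 1.3 and 1.4 output. The Python sketch below assumes two things not stated above: that an omitted INSDSeq_create-date is simply absent from the XML, and that INSDInterval_iscomp appears as an empty element carrying a value="true" or value="false" attribute. Please check both against the 1.4 DTD once it is distributed. The XML fragment is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment in the 1.4 style: no INSDSeq_create-date,
# and one interval flagged with INSDInterval_iscomp.
# ASSUMPTION: iscomp is an empty element with a value="true"/"false" attribute.
SAMPLE = """<INSDSeq>
  <INSDSeq_locus>EXAMPLE</INSDSeq_locus>
  <INSDSeq_feature-table>
    <INSDFeature>
      <INSDFeature_key>gene</INSDFeature_key>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>10</INSDInterval_from>
          <INSDInterval_to>50</INSDInterval_to>
          <INSDInterval_iscomp value="true"/>
        </INSDInterval>
      </INSDFeature_intervals>
    </INSDFeature>
  </INSDSeq_feature-table>
</INSDSeq>"""


def read_record(seq):
    """Read fields that differ between 1.3 and 1.4 without assuming either version."""
    # Optional in 1.4: findtext returns None when the element is missing.
    create_date = seq.findtext("INSDSeq_create-date")
    intervals = []
    for interval in seq.iter("INSDInterval"):
        start = interval.findtext("INSDInterval_from")
        end = interval.findtext("INSDInterval_to")
        if start is None or end is None:
            continue  # skip interval forms this sketch does not handle
        flag = interval.find("INSDInterval_iscomp")
        # Absent in 1.3 output; treat a missing flag as "not complementary".
        is_comp = flag is not None and flag.get("value") == "true"
        intervals.append((int(start), int(end), is_comp))
    return create_date, intervals


print(read_record(ET.fromstring(SAMPLE)))  # (None, [(10, 50, True)])
```

The design choice is to treat every element that is new or newly optional in 1.4 as possibly missing, so the same reader runs unchanged on 1.3 data.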

We will follow up on this announcement soon by distributing
the DTD for INSDSeq 1.4.

Mark Cavanaugh
GenBank
NCBI/NLM/NIH/HHS
