PID html links

francis at NCBI.NLM.NIH.GOV francis at NCBI.NLM.NIH.GOV
Wed Mar 27 08:54:28 EST 1996

> From: staffan at (Staffan Bergh)
> Subject: PID html links

> having just finished the latest release of MycDB, and working on
> enhancing the webserver, I looked for a way of providing hot links to
> PID entries in the e-series (produced by EMBL), and couldn't find any.
> For the g-series (produced by NCBI) it appears that one can use the
> Entrez links using the g-number with the leading letter stripped off
> (thus for PID g886302 use:

Yes, the above is correct, and several report formats are available.
Above you chose 'Dopt=r'.

For protein entries, Dopt can be one of:

      'g' GenPept format 
      'r' Report format 
      'f' FASTA format 
      'a' ASN.1 format 
      'd' Entrez document summary format 
      'm' MEDLINE links 
      'p' protein neighbors 
      'n' nucleotide links 
      't' structure links 
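
As an illustration of how these codes fit into a link, here is a minimal
Python sketch that strips the leading letter from a g-series PID and
assembles the query-string part of an Entrez link. The parameter names
db and Dopt come from the discussion above; the name used for the gi
number (uid) and the function name entrez_query_string are assumptions
made for this example, and the server address is deliberately left out.

```python
from urllib.parse import urlencode

# Dopt codes for protein entries, as listed above.
PROTEIN_DOPT = {
    "g": "GenPept format",
    "r": "Report format",
    "f": "FASTA format",
    "a": "ASN.1 format",
    "d": "Entrez document summary format",
    "m": "MEDLINE links",
    "p": "protein neighbors",
    "n": "nucleotide links",
    "t": "structure links",
}


def entrez_query_string(pid, dopt="r", db="p", uid_param="uid"):
    """Build the query-string part of a link for a g-series identifier.

    uid_param is an assumed parameter name, not taken from the post;
    check it against the Entrez linking documentation before use.
    """
    if not pid.startswith("g") or not pid[1:].isdigit():
        raise ValueError("expected a g-series identifier such as g886302")
    if db == "p" and dopt not in PROTEIN_DOPT:
        raise ValueError("unknown Dopt code for protein entries: %r" % dopt)
    # Strip the leading letter: the link takes the bare gi number.
    return urlencode({"db": db, uid_param: pid[1:], "Dopt": dopt})


print(entrez_query_string("g886302"))            # report format
print(entrez_query_string("g886302", dopt="f"))  # FASTA format
```

The resulting string would be appended after a "?" to whatever query
address the Entrez server uses.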

> with db set to either n or p, which seems a bit counterintuitive, it
> returns the aa sequence in both cases ...)

Well, requesting the URL you describe above returns nothing with
db=n, but it does return the report you want with db=p, because the gi
number (PID) is that of a protein.  The gi for the corresponding nucleotide
(after looking for it) is given on the NID line:

LOCUS       MSGDNAB     40571 bp    DNA             BCT       30-JUN-1995
DEFINITION  Mycobacterium leprae cosmid L222 DNA sequence, 27 CDS features.
NID         g886301


so the URL link to this GenBank format report would be:

And in there you would also have the links to all the proteins within.
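
To make the nucleotide lookup concrete, here is a small Python sketch
that reads flat-file header lines like the ones shown above and pulls
out the NID, from which the bare gi number for a db=n query follows.
The function name parse_header is hypothetical, and the parsing is
deliberately simplified: indented continuation lines are skipped.

```python
def parse_header(lines):
    """Map each flat-file keyword (LOCUS, DEFINITION, NID, ...) to its text."""
    fields = {}
    for line in lines:
        # Skip blank lines and indented continuation lines (simplification).
        if not line.strip() or line[0].isspace():
            continue
        keyword, _, value = line.partition(" ")
        fields[keyword] = value.strip()
    return fields


header = [
    "LOCUS       MSGDNAB     40571 bp    DNA             BCT       30-JUN-1995",
    "DEFINITION  Mycobacterium leprae cosmid L222 DNA sequence, 27 CDS features.",
    "NID         g886301",
]

fields = parse_header(header)
nid = fields["NID"]   # identifier with its database prefix
gi = nid[1:]          # bare gi number, leading letter stripped
db = "n"              # a nucleotide gi is queried with db=n, not db=p
print(nid, gi, db)
```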

>   Is access to EMBL PID entries possible, or planned? This would be
> very helpful ...

I will let one of my EMBL colleagues answer this one ...

>   Is it true that all NCBI PID entries are accessible through the
>   Entrez server in the way shown above?

Yes, this is true.  How to link to nucleotide and protein sequence
records, as well as PDB structures, Medline references and, soon,
graphical views of maps in the genomes division of Entrez, is
outlined at this URL:

>   Am I missing something here? Is there some information on PID on the
>   'Net, other than the brief mentions I've already found in the EMBL
> and GenBank release notes?

We have been using PIDs for a few years now, but we've only recently
called them that.  They really are our 'gi' numbers, which identify unique
protein or DNA sequences.  For nucleotide sequences they are called NID
in the GenBank flat file, e.g.:

NID         g886301

and follow the same rule as the PID, in that the prefix indicates 
which database issued the number (you can also see the db_xref 
page at:

which describes this a _bit_ more, if only slightly, and is the place to
see all the other db_xref values that are possible, though not all of
them are in use at this time)
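
The prefix rule can be sketched in a few lines of Python. The two
prefixes below are the ones mentioned in this thread (g issued by NCBI,
e issued by EMBL); the function name split_identifier and the "unknown"
fallback are assumptions for the example, not part of any NCBI or EMBL
tool.

```python
import re

# Prefixes mentioned in this thread; other databases may use other letters.
ISSUER_BY_PREFIX = {"g": "NCBI", "e": "EMBL"}


def split_identifier(identifier):
    """Split a PID or NID such as 'g886301' into (prefix, number, issuer)."""
    match = re.fullmatch(r"([a-z])(\d+)", identifier)
    if match is None:
        raise ValueError("not a prefixed identifier: %r" % identifier)
    prefix, number = match.groups()
    return prefix, int(number), ISSUER_BY_PREFIX.get(prefix, "unknown")


print(split_identifier("g886301"))
print(split_identifier("g886302"))
```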

So the gi numbers are discussed a bit in gbrel.txt (the GenBank release
notes), but I think I have to agree with Staffan: there is not much
discussion of them in a single document.

I think we may have to write one soon, as they are somewhat central to
the whole tracking we do here!

thank you for your suggestion here!

all the best,


| B.F. Francis Ouellette  
| GenBank
| francis at   

> Staffan Bergh
> Biochemistry, KTH, S-100 44 Stockholm, Sweden
> email: staffan at         + Don't let that horse eat that violin
> phone: int+46 8 790 9230              +               cried Chagall's mother
> fax: int+46 8 24 54 52                + but he kept right on painting
>                                       +             -- Lawrence Ferlinghetti
> <A HREF="">Webmaster</A> and 
> <A HREF="">MycDB maintainer</A>
