[Protein-crystallography] Quick-and-dirty searches of both PDB and
EMDB at PDBe
Gerard DVD Kleywegt
(by gerard from xray.bmc.uu.se)
Fri Apr 1 10:12:03 EST 2011
As part of its recent winter update, the Protein Data Bank in Europe (PDBe;
http://pdbe.org/) has improved its facility for searching PDB and EMDB in
tandem. It was designed to let users carry out many of their day-to-day
searches without having to fill out a complex form or learn a special query
syntax. Simply type what you are looking for, click the SEARCH button, and we
will do our best to dig up relevant information, be it in the PDB, in EMDB or
on our website.
QUICK ACCESS TO ENTRIES, SERVICES, SEQUENCES
If you go to the PDBe home page (http://pdbe.org/), you will see a Google-like
search box in the friendly green banner near the top of the page (just below
our motto, "Bringing Structure to Biology"). You can use this search box in a
number of ways:
- type a PDB code (e.g., 1cbs), and you will be taken directly to the summary
page for that entry. You can type any valid code, even if it's not in the
current release, so you can use this facility to obtain information about the
status of entries that have not been released yet (e.g., 2yd0) or entries that
are no longer in the archive (e.g., theoretical models).
- type a valid EMDB code (e.g., 1607) and you will be taken straight to the
summary page for that entry.
HINT: if, instead of being taken directly to a summary page for a certain PDB
or EMDB code, you want to actually search PDB and/or EMDB for references to
that particular code, simply enclose it in double quotes. For instance,
searching for 1mi6 will take you to the summary page for PDB entry 1mi6,
whereas searching for "1mi6" will give you a set of hits in both PDB and EMDB
that all contain a reference to 1mi6.
- type something resembling a PDBe service or resource name and chances are
that the name will be recognised and you will be taken straight to that service
or resource (e.g., autodep, emdep, pdbemotif, pdbepisa, pdbefold, pdbechem,
quips, portfolio, etc.).
- you can search the protein sequences in the PDB by entering seq: (or
sequence:) followed by a (partial) amino-acid sequence in one-letter code
(e.g., seq:GNAAAAKKGSEQESVKEFLAKAKEDFLKKWETPSQNTA). The sequence will be
compared to all protein sequences in the PDB using FastA, and the results will
be presented to you for further analysis in the PDBe sequence browser (see
Of course you can do general text-based searches of the PDB and EMDB as well -
just type one or more search terms in the box and hit the SEARCH button.
- If you type a single search term and it gives hits in the PDB, you will get a
results page with a tree structure on the left which shows in which categories
the term was found. For instance, if you look for Jones, that could be an
author, but it could also be part of the name of a molecule (e.g., Bence Jones
protein). By clicking on an appropriate branch in the tree, you select only
those entries for which the search term occurs in that data category (e.g.,
author or PDB compound).
- If you type more than one search term, only entries that contain all these
terms will be selected as hits. For instance, if you search for "kleywegt po4"
- without the quotes - you will get only one hit, 1CBQ. Note that if you
enclose your search terms in double quotes, you will only get hits that match
exactly (i.e., the complete search expression must occur somewhere in the
entry, not just all of the keywords individually). For instance, searching for
"HCV NS3 protease" yields 31 hits in the PDB if you enclose the terms in double
quotes, but 177 hits if you don't.
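
The difference between unquoted and quoted searches can be sketched in a few lines of Python. This is only an illustration of the matching rule described above, not the PDBe implementation: the function name matches() is made up, and case-insensitive substring matching is an assumption.

```python
def matches(query: str, entry_text: str) -> bool:
    """Return True if entry_text satisfies the query (illustrative only)."""
    query = query.strip()
    text = entry_text.lower()  # assumption: matching ignores case
    # Double-quoted query: the complete expression must occur as written.
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        return query[1:-1].lower() in text
    # Unquoted query: every term must occur somewhere in the entry.
    return all(term in text for term in query.lower().split())


# Made-up entry text, used only to show the two behaviours.
entry = "Crystal structure of the NS3 protease domain from HCV"
print(matches("HCV NS3 protease", entry))    # True: all three terms occur
print(matches('"HCV NS3 protease"', entry))  # False: the exact phrase does not
```

This is why the quoted form of a multi-word search can never return more hits than the unquoted form.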
Note that there are two tabs on the results page - one labelled "PDB entries"
and the other "EMDB entries". If you do a search for Baumeister, you will get
14 hits in the PDB. If you click on the "EMDB entries" tab, you will find that
there are 10 hits in EMDB.
HINT: if you want the EMDB results tab to become active straightaway, prefix
your search term(s) with "emdb:" (without the quotes), e.g. search for
emdb:saibil and you will immediately get the list of 56 EMDB hits.
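
To summarise the query forms described so far, here is a rough Python sketch of how a search string might be sorted into the different kinds of request. It is not the actual PDBe code. The function name, the labels it returns, the order of the checks, and the rule that a four-digit string is treated as an EMDB code are all assumptions made for illustration.

```python
import re

# Service names taken from the list above (the real list is longer).
SERVICES = {"autodep", "emdep", "pdbemotif", "pdbepisa", "pdbefold",
            "pdbechem", "quips", "portfolio"}


def classify_query(query):
    """Return a (kind, value) pair for a search-box string (illustrative)."""
    q = query.strip()
    lower = q.lower()
    # Double quotes force a text search, even for something that looks like a code.
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        return ("exact_text_search", q[1:-1])
    # seq: or sequence: followed by a one-letter amino-acid sequence.
    for prefix in ("sequence:", "seq:"):
        if lower.startswith(prefix):
            return ("sequence_search", q[len(prefix):].strip().upper())
    # emdb: opens the EMDB results tab straightaway.
    if lower.startswith("emdb:"):
        return ("emdb_text_search", q[len("emdb:"):].strip())
    if lower in SERVICES:
        return ("service", lower)
    # Assumption: four digits are read as an EMDB code (e.g., 1607).
    if re.fullmatch(r"[0-9]{4}", lower):
        return ("emdb_entry", lower)
    # Assumption: a digit followed by three letters or digits is a PDB code.
    if re.fullmatch(r"[0-9][a-z0-9]{3}", lower):
        return ("pdb_entry", lower)
    return ("text_search", q)


print(classify_query("1cbs"))         # ('pdb_entry', '1cbs')
print(classify_query('"1mi6"'))       # ('exact_text_search', '1mi6')
print(classify_query("emdb:saibil"))  # ('emdb_text_search', 'saibil')
```

The quoted check comes first so that "1mi6" in double quotes becomes a text search rather than a jump to the entry summary page.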
The search results are sorted by release date by default, with the most
recently released entries at the top. This ensures that if you read an exciting
paper about new ClpC structures, a search for clpc will give you the latest
entries first. You can change the sort order and criterion with a drop-down
menu.
Each entry that is found as a hit in a search is shown in a panel that contains
useful summary information and allows you to launch various searches and
services with a single mouse-click. If you do a search for hiv-1, for example,
you will get many hits in the PDB and two dozen in EMDB:
- For each PDB hit you will see the PDB code, a small image of the structure,
the resolution (for X-ray and EM structures), the title of the entry, and a set
of PDBprints that provide at-a-glance information about the entry (see
http://pdbe.org/pdbprints). Two action buttons are also shown: "Entry summary"
and "Download PDB file" - when pressed, they will do what they promise. If you
click on "More ..." (or on "Expand all ..." at the top of the results tab) you
will see even more information: the release date, information about the
publication describing the structure, possible cross-references to EMDB entries
and four more action buttons ("Quick links to related PDBe services"):
* "Download other files" (takes you to a download page with mmCIF files,
experimental data files, etc.),
* "Quaternary structure" (which takes you straight to the PISA results about
this entry),
* "Similar structures" (which will automatically launch an SSM/PDBeFold
search of the PDB to look for structures with similar folds),
* "Motifs and sites" (which will take you to the PDBeMotif analysis of the
structure - this may not always work for very recent entries, but we are
working on solving this issue).
- For each EMDB hit you will see information similar to that shown for PDB
hits. Instead of a PDB file, there will be an action button to "Download header
file". If the EM map/tomogram has been released, there will also be a button to
download it. If you click "More ..." you will sometimes see "Other EMDB entries
from this publication" (if one paper describes more than one EMDB entry).
NOTE: if you search for hiv-1 today, you will notice that the top two EMDB hits
have release dates of 29 March 2011, but the maps are not yet available for
download. This has to do with the way entries are released in practice (EMDB
and PDB use a weekly release cycle). Once the release date has arrived, an EMDB
entry will be flagged (for release) on the first Thursday following the release
date, which means the map will become available in the next weekly release
(which will be on the first Wednesday after that Thursday).
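
The date arithmetic in this note can be worked through with a short Python sketch. The function names are made up for illustration, and we assume that "following" and "after" mean strictly later than the starting day.

```python
from datetime import date, timedelta

WEDNESDAY, THURSDAY = 2, 3  # date.weekday() counts Monday as 0


def next_weekday_after(day, weekday):
    """First date strictly after `day` that falls on the given weekday."""
    offset = (weekday - day.weekday() - 1) % 7 + 1  # always between 1 and 7
    return day + timedelta(days=offset)


def map_availability(release_date):
    """Return (date the entry is flagged, date the map becomes available)."""
    flagged = next_weekday_after(release_date, THURSDAY)
    available = next_weekday_after(flagged, WEDNESDAY)
    return flagged, available


flagged, available = map_availability(date(2011, 3, 29))
print(flagged)    # 2011-03-31
print(available)  # 2011-04-06
```

So an entry with a release date of Tuesday 29 March 2011 is flagged on Thursday 31 March, and its map appears in the release of Wednesday 6 April.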
LIMITATIONS AND SEARCH TIPS
As you have seen, the Google-like box in the PDBe banner allows you to carry
out many standard searches quickly and accurately, but with some limitations:
- you cannot use regular expressions or operators such as NOT, AND and OR
- you cannot use wildcards (e.g., searching for "vanil*" will not give any
hits)
- if you search by author name, you will get better results if you only provide
the surname (searching for "kleywegt gj" only returns one hit, and it's not a
structure determined by that person)
- at present, there is no way of ranking the results by relevance (the search
includes PDB keywords and the PubMed abstract, both of which can lead to false
positives)
In general, the more search terms you enter, the fewer results you will get (as
they are all required to occur). Useful search terms are (combinations of) the
following:
- surnames of authors (e.g., rossmann, sixma, allerston, akke, walse)
- names of proteins (e.g., HMG CoA reductase, Lon protease, bacteriorhodopsin)
- (parts of) species names (e.g., "plasmodium falciparum")
- common names of chemical compounds (e.g., retinol, sildenafil, nadph)
- database identifiers from EC, PubMed, UniProt, etc. (e.g., entering 20890284
will retrieve two structures that have that number as their PubMed identifier;
searching for CH60_ECOLI will retrieve 30 hits, 28 of which have that as a
UniProt identifier; searching for 1.1.1.27 will return 105 hits, 63 of which
contain a lactate dehydrogenase with that EC number)
- (part of) a valid GO term (e.g., "intracellular protein transport", "signal
sequence", "anchored to membrane")
If you want to carry out more sophisticated searches, in which you can specify
that you want to search for a term in a particular category of information
(e.g., looking for "parkinson" in an abstract rather than as an author, or
looking for "cancer" as part of a reference rather than a keyword), you can use
the PDBe advanced search facilities. An action button labelled "Advanced
search" is available between the green PDBe banner and the search results.
We welcome your comments, bug reports and feature requests on the
"quick-and-dirty" PDBe search facility. Please use the feedback button at the
top of any PDBe web page.
Gerard J. Kleywegt, PDBe, EMBL-EBI, Hinxton, UK
gerard from ebi.ac.uk ..................... pdbe.org
Secretary: Pauline Haslam pdbe_admin from ebi.ac.uk