Use of Sequence Retrieval System (SRS) with LISTA

Reinhard Doelz doelz at comp.bioz.unibas.ch
Wed Dec 15 06:02:56 EST 1993


Dear colleagues,

the following demonstrates the use of the Sequence Retrieval System (SRS)
software (Thure Etzold, Heidelberg) with the LISTA database (Patrick
Linder(1), Reinhard Doelz(2), Marie-Odile Mosse(3), Jaga Lazowska(3) and
Piotr P. Slonimski(3); 1 Dept. of Microbiology, Biozentrum,
Klingelbergstr. 70, 4056 Basel, Switzerland; 2 Biocomputing, Biozentrum,
Klingelbergstr. 70, 4056 Basel, Switzerland; 3 Centre de Genetique
Moleculaire, Laboratoire propre du CNRS associe a l'Universite Pierre et
Marie Curie, F-91190 Gif sur Yvette, France).

Prerequisites: 

(1) Get your software manager to install SRS (available from
ftp.embl-heidelberg.de) on a UNIX or VMS system.
(2) Get your software manager to install all necessary databases and
indices as described in the manuals, including the LISTA database.
(3) Start SRS.

Benefits: 
Walk around databases, sequences, and links, and gather answers to questions
which you didn't dare to ask before (sorry, that was PR).


_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+


EXAMPLE 1

Let us assume you want to know all occurrences of TIF1 homologues in the
EMBL database.

The approach is to search for TIF1 in LISTA, go to the Homology database of
LISTA at the protein level, and look up the resulting entries in EMBL after
having filtered out all yeast entries.


First, you do a search for TIF1 in SRS: 

                                                                          
 +-------------------------------------------------------------------------+
 |            ID [I]: TIF1                                                 |
 |       Synonym [H]:                                                      |
 |    Definition [D]:                                                      |
 |                     separate keys by & (AND), | (OR), or ! (AND NOT)    |
 |                                                                         |
 | query (set) name [Q]: GE1                     select library(s) [S]: @  |
 | connect fields by AND (1) or OR (2) [X]: 1                              |
 |                                 do =>   ([Do])    abort =>   ([F10])    |
 +-------------------------------------------------------------------------+
                                                                          

Then, you link the hit through a HOMOLOGY database to EMBL. This HOMOLOGY
database is nothing more than a systematic BLAST search (BLAST from the
NCBI; Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers,
and David J. Lipman (1990)) versus a non-redundant database. There
are two flavours: LISTAHOP (at the PROTEIN level) and LISTAHON (at the
DNA level).

   1. query: GE1, set of type "Entry-ID", expr:  ([GE-ID: TIF1*])         
                                                                          
+--------------------------------------------------------------------------+
|                                                                          |
| query (set) name [Q]: X1                          query expression:      |
| GE1 > LISTAHOP > EMBL                                                    |
|                                                                          |
|                                    do =>   ([Do])    abort =>   ([F10])  |
|                                                                          |
+--------------------------------------------------------------------------+
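
For readers who prefer code to screens, the link expression
GE1 > LISTAHOP > EMBL can be pictured as two chained lookups. The
following is only a minimal sketch of that idea and not how SRS is
implemented: the function follow_links, both link tables, and the entry
name EMBL:YEAST_HIT are assumptions made up for illustration.

```python
# Toy picture of the SRS link operator ">" as a chain of lookups.
# All cross-references below are invented for the example; in SRS the
# links come from the indices built when the databases are installed.

def follow_links(entries, link_table):
    """Return every target entry reachable from `entries` via `link_table`."""
    targets = set()
    for entry in entries:
        targets.update(link_table.get(entry, ()))
    return targets

# Hypothetical link tables: LISTA -> LISTAHOP and LISTAHOP -> EMBL.
lista_to_listahop = {"LISTA:TIF1": {"LISTAHOP:TIF1"}}
listahop_to_embl = {
    "LISTAHOP:TIF1": {"EMBL:CEEIF4AM", "EMBL:DMEIF4A", "EMBL:YEAST_HIT"},
}

ge1 = {"LISTA:TIF1"}  # stands in for the result of the ID search
# GE1 > LISTAHOP > EMBL: apply one lookup after the other.
x1 = follow_links(follow_links(ge1, lista_to_listahop), listahop_to_embl)
print(sorted(x1))
```

Each ">" simply replaces the current set by the set of entries it
links to, which is why the expressions can be chained freely.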

 
The next step is slightly tricky but gives a good impression of the power
of the SRS system. You could do it less elegantly but much more simply.
This step filters all yeast entries out of the previous result.
 


  1. query: GE1, set of type "Entry-ID", expr:  ([GE-ID: TIF1*])         
  2. query: X1, set of type "Seq-ID", expr:  GE1 > LISTAHOP > EMBL       
                                                                          
                                                                          
                                                                            
+--------------------------------------------------------------------------+
|                                                                          |
| query (set) name [Q]: X2                          query expression:      |
| X1 ! [EMBL-ORG:SACC*]                                                    |
|                                                                          |
|                                    do =>   ([Do])    abort =>   ([F10])  |
|                                                                          |
+--------------------------------------------------------------------------+
                                                                            
What we do here is tell the system to use the results of the
previous query, but filter out all Saccharomyces entries.
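
The "!" (AND NOT) operator with a wildcard behaves like a set difference.
The sketch below illustrates that reading in Python; it is an assumption
about the semantics, not SRS code. The helper and_not, the organism table,
and the placeholder entry EMBL:YEAST_HIT are invented for the example.

```python
from fnmatch import fnmatchcase

def and_not(entries, organism_of, pattern):
    """Keep the entries whose organism does NOT match the wildcard pattern."""
    return {
        entry for entry in entries
        if not fnmatchcase(organism_of[entry].upper(), pattern.upper())
    }

# Hypothetical organism index for three entries of set X1.
organism_of = {
    "EMBL:CEEIF4AM": "Caenorhabditis elegans",
    "EMBL:DMEIF4A": "Drosophila melanogaster",
    "EMBL:YEAST_HIT": "Saccharomyces cerevisiae",  # placeholder entry name
}

x1 = set(organism_of)
x2 = and_not(x1, organism_of, "SACC*")  # mirrors: X1 ! [EMBL-ORG:SACC*]
print(sorted(x2))
```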

This query returned 37 entries; the first few are shown below:

   1. entry: EMBL:CEEIF4AM                                                
DE   C.elegans mRNA for eIF-4A homologue                                  
   2. entry: EMBL:DMEIF4A                                                 
DE   D.melanogaster gene for eIF-4A eukaryotic translation initiation     
DE   factor                                                                 
   3. entry: EMBL:DMHELI                                                      
DE   D.melanogaster RNA helicase mRNA, complete cds.                          
   4. entry: EMBL:DMRM62RH                                                    
DE   Drosophila melanogaster RM62 mRNA for novel RNA helicase                 
   5. entry: EMBL:DMRNAHEL                                                    
DE   Drosophila melanogaster RNA helicase gene, complete cds.                 
   6. entry: EMBL:DMVASA                                               
DE   D.melanogaster antigen Mab46F11 (vasa) mRNA, complete cds.               
   7. entry: EMBL:DMVASA2                                                   
DE   Drosophila melanogaster vasa gene segment 2 (exons 3 to 7)           
   8. entry: EMBL:ATTIF4A1                                                
DE   A.thaliana mRNA for eukaryotic translation initiation factor 4A-1    
   9. entry: EMBL:ATTIF4A2                                                
...


_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_

EXAMPLE 2 

We would like to know which DEAD motif proteins are known in the LISTA 
database. 

The approach is to start with PROSITE and simply map this to LISTA.


 [G] General  [O] SetOptions  [U] Query  [H] Help     
                             +--------------------+                       
                             | [Y] RepeatQuery    |                       
                             | [X] Expression     |                       
                             | [Q] QueryReport    |                       
                             | [W]   MakeWild     |                       
                             | [G] Genes          |                       
                             | [B] GeneHomologies |                       
                             | [S] Sequence       |+----------------+     
                             | [R] SeqRelated...  || [P]>PROSITE    |     
                             | [L] Literature     || [D] PROSITEDOC |     
                             | [H] SearchLists    || [B] BLOCKS     |     
                             +--------------------+| [U] EPD        |     
                                                   | [E] ECD        |     
                                                   | [Z] ENZYME     |     
                                                   | [R] REBASE     |     
                                                   +----------------+     
                                                                          
                                                        
                                                                           
                                                                          
 +-----------------------------------------------------------------------+
 |            ID [I]: DEAD                                               |
 |     Accession [N]:                                                    |
 |    Definition [D]:                                                    |
 |                   separate keys by & (AND), | (OR), or ! (AND NOT)    |
 |                                                                       |
 |  query (set) name [Q]: Q1                                             |
 |  connect fields by AND (1) or OR (2) [X]: 1                           |
 |                               do =>   ([Do])    abort =>   ([F10])    |
 +-----------------------------------------------------------------------+
                                                                          
                 
We find one entry there. Then we select the 'link' option and the screen 
looks like 


[G] General  [O] EntryOptions  [U] Query  [H] Help     
   1. entry: +-----------------+LICASE                                    
DE   DEAD-box| [E] ShowEntry   |ndent helicases signature.                
             | [Q] Quit        |                                          
             | [D] DeleteEntry |                                          
             | [C] CopyEntry   |+--------------------+ +-----------+                     
             | [L] LinkEntry   || [G]>Genes          | | [L]>LISTA |                     
             | [S] SearchBuff  || [B] GeneHomologies | +-----------+                      
             | [H] SaveBuff    || [S] Sequence       |                      
             | [X] o TextData  || [R] SeqRelated...  |                      
             | [Y] o Data      || [L] Literature     |                      
             | [Z] o Text      || [H] SearchLists    |                      
             +-----------------++--------------------+                      
                                                                            
so if we go for G (genes) and L (LISTA) we see at the bottom of the screen 

libraries  - Mapped to "SWISSPROT" -> 33 entries                                
libraries  - Mapped to "EMBL" -> 38 entries     
libraries  - Mapped to "LISTA" -> 14 entries

So we go from PROSITE automatically via SWISSPROT and EMBL to LISTA.
The screen now looks like this:



 [G] General  [O] LinkOptions  [H] Help                        
  1. entry: LISTA:DBP1                                                   
RL   MOL. MICROBIOL. 5:805-812(1991).                                     
   2. entry: LISTA:DBP2                                                   
RL   MOL. CELL. BIOL. 11:1326-1333(1991).                                 
   3. entry: LISTA:DED1                                                     
RL   J. MOL. BIOL. 152:553-568(1981).                                       
RL   NATURE 349:715-717(1991).                                              
   4. entry: LISTA:DRS1                                                     
RL   PROC. NATL. ACAD. USA 89:11131-11135(1992).                            
   5. entry: LISTA:HIS3                                                     
RL   J. MOL. BIOL. 152:553-568(1981).                                       
   6. entry: LISTA:PET56                                                    
RL   J. MOL. BIOL. 152:553-568(1981).                                       
   7. entry: LISTA:PRP5                                                     
RL   PNAS 87:4236-4240(1990).                                             
   8. entry: LISTA:PRP28                                                  
RL   GENES DEV. 5:629-641(1991).                                          
 Navigation mode - coming from "PROSITE:DEAD_ATP_HELICASE", depth: 1            
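
The automatic route from PROSITE via SWISSPROT and EMBL to LISTA can be
thought of as a path search over a graph of library links. How SRS
actually chooses the route is not described here, so the sketch below is
only one plausible picture: the LINKS table is read off the "Mapped to"
messages above, and find_route is a hypothetical helper using a plain
breadth-first search.

```python
from collections import deque

# Assumed link graph, taken from the "Mapped to" messages of this example.
LINKS = {
    "PROSITE": ["SWISSPROT"],
    "SWISSPROT": ["EMBL"],
    "EMBL": ["LISTA"],
    "LISTA": [],
}

def find_route(start, goal, links=LINKS):
    """Breadth-first search for the shortest chain of linked libraries."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in links.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of links connects the two libraries

print(" > ".join(find_route("PROSITE", "LISTA")))
# prints: PROSITE > SWISSPROT > EMBL > LISTA
```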




+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+


EXAMPLE 3

We have done a search for calmodulin and would like to know whether
the protein is also known in LISTA. If so, we need the homologies of this
entry at the DNA level, and we want to align all protein sequences of this
search. The final result will be the output of a TFASTA run.

The approach is to search for calmodulin in SWISSPROT, map this to LISTA, and
look up the entry in the LISTAHON nucleotide database. If everything is
installed as described, we can later on use TFASTA as implemented in the
GCG package (GCG from GCG Inc., Madison) to search the result.

We start by selecting SWISSPROT and searching for calmodulin:


 +-------------------------------------------------------------------------+
 |            ID [I]:                                                      |
 |     Accession [N]:                                                      |
 |    Definition [D]: CALMODULIN                                           |
 |      Keywords [K]:          +-----------------+                         |
 |      Organism [O]:          | [S]   SWISSPROT |                         |
 |       Authors [A]:          | [P]   PIR       |                         |
 |         Title [T]:          | [E]>  EMBL      |                         |
 |     Reference [R]:          | [F]   EMBL_NEW  |                         |
 |       Comment [C]:          | [H]   GB_NEW    |                         |
 |      Features [F]:          | [N]   NRL3D     |                         |
 |                     separate+-----------------+ (OR), or ! (AND NOT)    |
 |                                                                         |
 | query (set) name [Q]: SQ1                     select library(s) [S]: @  |
 | connect fields by AND (1) or OR (2) [X]: 1                              |
 |                                 do =>   ([Do])    abort =>   ([F10])    |
 +-------------------------------------------------------------------------+
 


Next, we map the result to LISTA and inspect the result: 
  2. query: X1, set of type "Entry-ID", expr:  SQ1 > LISTA    

The screen looks like the one above, and we see
             
 libraries  - Mapped to "EMBL" -> 117 entries                                   
 libraries  - Mapped to "LISTA" -> 5 entries 
...done - 5 entries written to set "X1"    

   1. entry: LISTA:CMD1                                                   
RL   CELL 47:423-431(1986).                                                 
   2. entry: LISTA:CMK1                                                     
RL   EMBO J. 10:1511-1522(1991).                                            
RL   J. BIOL. CHEM. 266:12784-12794(1991).                                  
   3. entry: LISTA:CMK2                                                        
RL   EMBO J. 10:1511-1522(1991).                                              
RL   J. BIOL. CHEM. 266:12784-12794(1991).                                    
   4. entry: LISTA:CMP1                                                       
RL   EUR. J. BIOCHEM. 204:713-723(1992).                                      
RL   MOL. GEN. GENET. 227:52-59(1991).                                        
RL   PNAS 88:7376-7380(1991).                                                 
   5. entry: LISTA:CMP2                                                       
RL   MOL. GEN. GENET. 227:52-59(1991).                                      
RL   PNAS 88:7376-7380(1991).                                               

This doesn't help much. Therefore, we map these entries back
to SWISSPROT. This is a very useful feature, as we can get the
SWISSPROT descriptions of the LISTA entries with a single
operation:

   1. entry: SWISSPROT:CALM_YEAST                                         
DE   CALMODULIN.                                                            
GN   CMD1.                                                                  
   2. entry: SWISSPROT:KCC1_YEAST                                           
DE   CALCIUM/CALMODULIN-DEPENDENT PROTEIN KINASE TYPE I (EC 2.7.1.123).     
GN   CMK1.                                                                        
   3. entry: SWISSPROT:KCC2_YEAST                                             
DE   CALCIUM/CALMODULIN-DEPENDENT PROTEIN KINASE TYPE II (EC 2.7.1.123).      
GN   CMK2.                                                                    
   4. entry: SWISSPROT:P2B1_YEAST                                             
DE   PROTEIN PHOSPHATASE 2B CATALYTIC SUBUNIT A1 (EC 3.1.3.16) (CALCINEURIN   
DE   A1) (CALMODULIN-BINDING PROTEIN 1).                                      
GN   CNA1 OR CMP1.                                                            
   5. entry: SWISSPROT:P2B2_YEAST                                           
DE   PROTEIN PHOSPHATASE 2B CATALYTIC SUBUNIT A2 (EC 3.1.3.16) (CALCINEURIN 
DE   A2) (CALMODULIN-BINDING PROTEIN 2).                                    
GN   CNA2 OR CMP2.                                                          

What we really need is CMD1, as this appears to be calmodulin itself.
We briefly check the entry in MEDLINE:

 [G] General  [O] EntryOptions  [U] Query  [H] Help                                
   1. entry: +-----------------+T                                         
DE   CALMODUL| [E] ShowEntry   |                                            
GN   CMD1.   | [Q] Quit        |                                            
   2. entry: | [D] DeleteEntry |T                                           
DE   CALCIUM/| [C] CopyEntry   |+--------------------+I (EC 2.7.1.123).     
GN   CMK1.   | [L] LinkEntry   || [G] Genes          |                             
   3. entry: | [S] SearchBuff  || [B] GeneHomologies |                        
DE   CALCIUM/| [H] SaveBuff    || [S] Sequence       |II (EC 2.7.1.123).      
GN   CMK2.   | [X] o TextData  || [R] SeqRelated...  |+-------------+         
   4. entry: | [Y] o Data      || [L] Literature     || [M]>MEDLINE |         
DE   PROTEIN | [Z] o Text      || [H] SearchLists    |+-------------+NEURIN   
DE   A1) (CAL+-----------------++--------------------+                        
GN   CNA1 OR CMP1.                                                            
   5. entry: SWISSPROT:P2B2_YEAST                                           
DE   PROTEIN PHOSPHATASE 2B CATALYTIC SUBUNIT A2 (EC 3.1.3.16) (CALCINEURIN 
DE   A2) (CALMODULIN-BINDING PROTEIN 2).                                    
GN   CNA2 OR CMP2.                                                          


libraries  - Mapped to "MEDLINE" -> 3 entries  

 [G] General  [O] EntryOptions  [U] Query  [H] Help              
   1. entry: MEDLINE:87028234                                             
   2. entry: MEDLINE:87228267                                               
   3. entry: MEDLINE:93278279                                               

We could look at the entry now:
UI  - 87028234                                                            
AU  - Davis TN                                                              
AU  - Urdea MS                                                                 
AU  - Masiarz FR                                                                
AU  - Thorner J                                                             
TI  - Isolation of the yeast calmodulin gene: calmodulin is an essential protein.                                                                
MH  - Amino Acid Sequence                                                     
MH  - Base Sequence                                                             
MH  - Calcium/METABOLISM                                                      
MH  - Calmodulin/*GENETICS/ISOLATION & PURIFICATION                             
MH  - DNA, Fungal/*ISOLATION & PURIFICATION                                   

... up to the abstract (if one exists).

We go back and map one of the previously targeted five entries to
LISTAHON and get

   1. entry: LISTAHON:SCCMD1                                              
GN   CMD1                                                                   
HT   >fun                                                                       
HT   emb|M14760|SCCMD1 Yeast (S.cerevisiae) CMD1 gene encoding calmodulin,      
HT         complete cds. >genbank:gb|M14760|YSCCMD1 Yeast (S.cerevisiae) CMD1   
HT         gene encoding calmodulin, complete cds.                              
HT         Length = 844                                                       

Now this is nearly what we want. What we really need is a file
of entry names that we can use in a TFASTA search. LISTAHON has
links to EMBL, so we go to EMBL and find 31 EMBL sequences which have
homologies to the CMD1 gene at the DNA level (there were 131 at the protein
level in this case). From the set of entries in EMBL we keep the current set

 [G] General  [O] LinkOptions  [H] Help                    
   1. entry: +------------------+                                      
DE   Candida | [E]>ShowEntry    |gene, complete cds.   
RT   "The iso| [L] LinkEntry    |ization of a calmodulin-encoding gene 
RT   (CMD1) f| [B] Back         |ngus Candida albicans";                  
RL   Gene 106| [D] DeleteEntry  |                                              
   2. entry: | [C] CopyEntry    |                                       
DE   Yeast (S| [T] SelectFields |ne encoding calmodulin, complete cds.  
RT   "Isolati| [K] KeepSet      |odulin gene: Calmodulin is an          
RT   essentia| [S] SearchBuff   |                                       
RL   Cell 47:| [H] SaveBuff     |                     
   3. entry: | [X] o TextData   |                     
DE   A.califo| [Y] o Data       |dulin                                
RT   "Structu| [Z] o Text       | the Aplysia californica Calmodulin    
RT   Gene";  +------------------+
RL   J. Mol. Biol. 216:545-553(1990).                    
   4. entry: EMBL:DDCAL                              
DE   D.discoideum calmodulin mRNA, partial cds.                       
RT   "Identification of the single gene for calmodulin in Dictyostelium    
RT   discoideum";                                      
 
and write a file of entry names: 

 [G] General  [O] SetOptions  [U] Query  [H] Help               
   1. query: SQ1, set of type "Seq-ID", expr:  ([SQ-DEF: CALMODULIN*]) 
   2. query: X1, set of type "Entry-ID", expr:  SQ1 > LISTA
   3. query: X2, set of type "Seq-ID", expr:  X1 > SWISSPROT           
   4. query: L1, set of type "Entry-ID", expr:  [SWISSPROT-ID: CALM_YEAST] ...     
   5. query: L2, set of type "Seq-ID", expr:  [LISTAHON-ID: SCCMD1] > EMBL                                                                              
                                                       
Write file of entry names - filename: L2.FIL  


This file of entry names can then be searched with TFASTA. We exit
the SRS program and invoke the GCG package. Next, we specify

% tfasta swissprot:calm_yeast @L2.FIL -default

(on VMS: 
$ TFASTA SWISSPROT:CALM_YEAST @L2.FIL /DEFAULT
) 

...  CPU time:  0:00:06
 Output File: calm_yeast.tfasta



(Peptide) TFASTA of: calm_yeast  from: 1 to: 147  December 15, 1993  11:48
...


 TO: @L2.FIL  Sequences:         31  Symbols:     26,236  Word Size: 2



The best scores are:					   frame init1 initn opt..

em_fun:sccmd1  Yeast (S.cerevisiae) CMD1 gene encoding ca...(2)  628   628   628
em_in:slcalmodu  Stylonychia lemnae calmodulin gene, comp...(3)  455   455   490
em_in:ptcam  P.tetraurelia calmodulin gene, complete cds    (1)  453   453   488
em_in:s68025  CAM=calmodulin [Paramecium tetraurelia, Gen...(3)  453   453   488
em_in:tpcalw  T.pyriformis mRNA for calmodulin              (2)  453   453   489
em_in:ttcalm  T.thermophila mRNA for calmodulin             (1)  452   452   488
em_in:accalm  A.californica mRNA for calmodulin             (2)  439   439   479
em_ro:mmcalmod  M.musculus mRNA for calmodulin              (3)  436   436   475
em_pr:hscalcbp  Human calmodulin mRNA, complete cds         (1)  436   436   475
em_ov:ggcam  Chicken calmodulin (cam) mrna                  (2)  436   436   475
em_ov:ggcalma  Chicken calmodulin mRNA, complete cds        (1)  436   436   475
em_ro:rnrcm1  R.norvegicus mRNA for calmodulin (pRCM1)      (2)  436   436   475
em_ov:xlcamb  X.laevis calmodulin gene, mrna, clone 71      (2)  436   436   475
em_pr:hscam  Human calmodulin mRNA, complete cds            (2)  436   436   475
em_ro:rncam  Rat calmodulin mRNA, complete cds              (1)  436   436   475
em_ro:rncama  Rat calmodulin mRNA, complete cds             (2)  436   436   475
em_pl:mscal1  Alfalfa cal1 mRNA for calmodulin              (3)  435   435   471
em_pl:phcalpro  Petunia hybrida CAM53 mRNA, complete cds    (2)  434   434   470
em_bb:s45905  CaM-A=calmodulin [Oryzias latipes=medaka, m...(2)  423   423   445
em_ov:olcamd  O. latipes (killifish) mRNA for calmodulin,...(2)  423   423   445
em_fun:cacmd1  Candida albicans calmodulin gene, complete...(1)  416   416   468
em_in:ddcal  D.discoideum calmodulin mRNA, partial cds      (1)  407   407   436
em_ro:rncamps  Rat calmodulin processed pseudogene, compl...(1)  292   373   410
em_pl:gmcam5  Glycine max calmodulin (SCaM-5) mRNA, compl...(3)  363   363   420
em_ro:rncamii3  R.norvegicus CaMII gene for calmodulin II...(1)  293   293   298
em_ov:ggcam3  Chicken CaM gene encoding calmodulin, exon 3  (3)  172   172   173
em_ov:ggcam4  Chicken CaM gene encoding calmodulin, exon 4  (3)  124   124   124
em_ro:rncamii3  R.norvegicus CaMII gene for calmodulin II...(2)  104   104   128
em_ov:ggcam5  Chicken CaM gene encoding calmodulin, exon 5  (1)  104   104   129
em_ro:rnrcm1  R.norvegicus mRNA for calmodulin (pRCM1)      (5)   41    41    52
em_in:accalm  A.californica mRNA for calmodulin             (3)   36    36    37
em_fun:cacmd1  Candida albicans calmodulin gene, complete...(3)   34    34    36
em_ro:rncamps  Rat calmodulin processed pseudogene, compl...(2)   32    32    44
em_bb:s45905  CaM-A=calmodulin [Oryzias latipes=medaka, m...(4)   31    31    39
em_in:s68025  CAM=calmodulin [Paramecium tetraurelia, Gen...(4)   31    31    42
em_in:ptcam  P.tetraurelia calmodulin gene, complete cds    (4)   31    31    42
em_ov:xlcamb  X.laevis calmodulin gene, mrna, clone 71      (6)   29    29    29
em_fun:sccmd1  Yeast (S.cerevisiae) CMD1 gene encoding ca...(4)   29    29    54
em_ov:ggcam2  Chicken CaM gene encoding calmodulin, exon 2  (1)   28    28    30
em_ro:rncamii3  R.norvegicus CaMII gene for calmodulin II...(3)   27    27    27
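
Since the same entry shows up once per reading frame, it can be handy to
reduce the score table to one line per entry before going further. The
sketch below does that in Python for lines laid out like the ones above
(name, description, frame in parentheses, then init1, initn, opt). The
regular expression and the function best_opt_per_entry are our own
illustration, not part of GCG or SRS, and they assume the column order
given in the "frame init1 initn opt" header.

```python
import re

# One score line: name, free-text description, (frame), init1, initn, opt.
SCORE_LINE = re.compile(
    r"^(?P<name>\S+)\s+(?P<desc>.*?)\((?P<frame>\d+)\)"
    r"\s+(?P<init1>\d+)\s+(?P<initn>\d+)\s+(?P<opt>\d+)\s*$"
)

def best_opt_per_entry(lines):
    """Map each entry name to its highest opt score over all frames."""
    best = {}
    for line in lines:
        match = SCORE_LINE.match(line)
        if match is None:
            continue  # skip headers, blank lines, and anything unexpected
        name, opt = match.group("name"), int(match.group("opt"))
        if opt > best.get(name, -1):
            best[name] = opt
    return best

# Four lines copied from the TFASTA output above.
sample = [
    "em_fun:sccmd1  Yeast (S.cerevisiae) CMD1 gene encoding ca...(2)  628   628   628",
    "em_ro:rnrcm1  R.norvegicus mRNA for calmodulin (pRCM1)      (2)  436   436   475",
    "em_ro:rnrcm1  R.norvegicus mRNA for calmodulin (pRCM1)      (5)   41    41    52",
    "em_fun:sccmd1  Yeast (S.cerevisiae) CMD1 gene encoding ca...(4)   29    29    54",
]

best = best_opt_per_entry(sample)
for name in sorted(best, key=best.get, reverse=True):
    print(name, best[name])
```

Parentheses inside the description, such as (S.cerevisiae) or (pRCM1), are
not mistaken for the frame because the frame must be a number followed
directly by the three score columns.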



Ah well, this could then be reloaded into SRS etc etc... 




Voila - it is that simple :-) 


Regards Reinhard 
                                  
-- 
+----------------------------------+-------------------------------------+
|    Dr. Reinhard Doelz            | RFC     doelz at urz.unibas.ch         |
|      Biocomputing                | DECNET  20579::48130::doelz         |
|Biozentrum der Universitaet       | X25     022846211142036::doelz      |
|   Klingelbergstrasse 70          | FAX     x41 61 261- 6760 or 267- 2078     
|     CH 4056 Basel                | TEL     x41 61 267- 2076 or 2247    |   
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
                     ftp mirror at nic.switch.ch 
               -----------------------------------------



