Help searching databases

Reinhard Doelz doelz at comp.bioz.unibas.ch
Wed Mar 2 04:52:05 EST 1994


In article <0hQpzNO00iV7A3=HJY at andrew.cmu.edu>, "Howard M. Bomze" <hb10+ at andrew.cmu.edu> writes:
|> I need some help in searching through the sequence data bases for splice
|> junctions.  Does anybody know of any programs out there that I can Use
|> to do this?  I am trying to look at sequences next to splice junctions
|> but in the EXON.  I don't think this will be too hard but I don't know
|> much about programing.  Thanks in advance for any help.

I am not sure what you mean by 'look at' sequences, but I'll show 
you how to prepare a set of sequences with the SRS system from Thure 
Etzold. 

Start SRS

[U]
[S]				(select a sequence query)
/[D] SPLICE & JUNCTION          
/[T] SPLICE & JUNCTION
/[C] SPLICE & JUNCTION
/[F] SPLICE & JUNCTION          (enter search terms SPLICE _and_ JUNCTION
                                 in Definition, Title, Comment and Features)
/[S] [SPACE] [S] [E] [SPACE]    (deselect Swissprot, select EMBL database, 
                                 you might want to select more DNA databases)
/[X] 2                          (combine the search fields with OR)

The mask now looks as follows (if your terminal does not handle 
line-drawing characters, the box borders will be displayed as x's and 
q's or similar, I'm afraid): 


_____________________________________________________________________________
 [G] General  [O] BuffOptions  [U] Query  [H] Help

 lqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqSequenceqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqk
 x            ID [I]:                                                      x
 x     Accession [N]:                                                      x
 x    Definition [D]: SPLICE & JUNCTION                                    x
 x      Keywords [K]:                                                      x
 x      Organism [O]:                                                      x
 x       Authors [A]:                                                      x
 x         Title [T]:  SPLICE & JUNCTION                                   x
 x     Reference [R]:                                                      x
 x       Comment [C]: SPLICE & JUNCTION                                    x
 x      Features [F]: SPLICE & JUNCTION                                    x
 x                     separate keys by & (AND), | (OR), or ! (AND NOT)    x
 x                                                                         x
 x query (set) name [Q]: SQ1                     select library(s) [S]: @  x
 x connect fields by AND (1) or OR (2) [X]: 1                              x
 x                                 do =>   ([Do])    abort =>   ([F10])    x
 mqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqj

______________________________________________________________________________

Press [RETURN] to run the query. There will be about 100 entries. Next, 
press [O] [W] to view the query results; my screen looks like this: 

  Info of Query "SQ1"

  Query Command(s):
      SQ1 = (
      [SQ-DEF: SPLICE* & JUNCTION*] |
      [SQ-TIT: SPLICE* & JUNCTION*] |
      [SQ-CC: SPLICE* & JUNCTION*])  |
      [SQ-FTS: SPLICE* & JUNCTION*] > PARENT

      86 entries from library "EMBL"
      13 entries from library "EMBL_NEW"
      10 entries from library "GENBANK"

  Created set of type "Seq-ID" has 109 members

[Q] to leave this mode. 
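
The logic of SQ1 can be pictured in a few lines of Python. This is only
an illustration and not SRS code: the record layout (a dict with the
hypothetical keys "def", "tit", "cc" and "fts") and the function names
are assumptions made for the example. It shows that the terms within one
field are combined with AND, that the fields are combined with OR, and
that SPLICE* matches any word beginning with SPLICE.

```python
import re

def field_matches(text, prefixes):
    """True if every prefix starts at least one word of the text (AND)."""
    words = re.findall(r"[A-Z0-9_-]+", text.upper())
    return all(any(w.startswith(p) for w in words) for p in prefixes)

def entry_matches(entry, prefixes, fields=("def", "tit", "cc", "fts")):
    """True if any single field matches all prefixes (OR across fields)."""
    return any(field_matches(entry.get(f, ""), prefixes) for f in fields)

# Hypothetical toy entries, not real database records.
entries = [
    {"id": "A1", "def": "Human gene, splice junctions of exon 3"},
    {"id": "B2", "tit": "Splice sites in yeast", "cc": "junction not annotated"},
    {"id": "C3", "fts": "exon splice junction"},
]
hits = [e["id"] for e in entries if entry_matches(e, ["SPLICE", "JUNCTION"])]
print(hits)
```

Entry B2 is not selected in this sketch, because its two terms sit in
different fields and no single field contains both.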

Next, we need to find all exons. Again, a sequence query, 

[U] [S] /[F] EXON [RETURN] [O] [W] 


  Query Command(s):
      SQ2 = (
      [S2-FTS: EXON*])

   39185 entries from library "EMBL"
    2499 entries from library "EMBL_NEW"
    1713 entries from library "GENBANK"

  Created set of type "Seq-Ft-ID" has 43397 members

[Q] to leave this query. 

Next, we combine the two by mapping the first onto the second. 

[U] [X] SQ1 > SQ2 [RETURN] 

SQ3 will contain about 200 entries: DNA exon sequences from entries 
whose annotation elsewhere mentions SPLICE and JUNCTION. 
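
One way to picture SQ1 > SQ2 is as a join on the parent entry. The
Python sketch below is one reading of the result described here, not
the SRS implementation; the tuple layout and all names are hypothetical.

```python
def link_entries_to_features(entry_ids, features):
    """Keep the features whose parent entry is in entry_ids.

    features is an iterable of (feature_id, parent_entry_id) pairs.
    """
    wanted = set(entry_ids)
    return [fid for fid, parent in features if parent in wanted]

# Hypothetical toy sets standing in for SQ1 (entries) and SQ2 (exon features).
sq1 = ["ENTRY_A", "ENTRY_C"]
sq2 = [("ENTRY_A_1", "ENTRY_A"), ("ENTRY_A_2", "ENTRY_A"),
       ("ENTRY_B_1", "ENTRY_B"), ("ENTRY_C_1", "ENTRY_C")]
sq3 = link_entries_to_features(sq1, sq2)
print(sq3)
```

One entry can carry several exon features, which is why the linked set
(about 200 members) can be larger than SQ1 (109 members).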

You may want to copy this set into your directory with [O] [C] (copy 
set output). Since exons can be rather short, we set a minimum length of 
10 base pairs; the screen will look like this: 

    lqqqqqqqqqqqqqqqqqqqqqq Copy Sequence Feature qqqqqqqqqqqqqqqqqqqqqqqkN...
    x                                                                    x
    x  output directory: /bioy/scratch/doelz/                            x
    x  file name:                                                        x
    x  sequence format [F]:  @                                           x
    x                                                                    x
    x  begin [B]: 0       relative to begin (1) or end (2): 1            x
    x    end [E]: 0       relative to begin (1) or end (2): 2            x
    x       minimum length [M]: 10     maximum length: 0                 x
    x                                                                    x
    x         reject if feature incomplete: Y                            x
    x  reject if selected range incomplete: Y                            x
    x                                                                    x
    x                              do =>   ([Do])    abort =>   ([F10])  x
    x                                                                    x
    mqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqj


Once the copying process starts, you will see success messages as well as 
messages indicating that a sequence does not meet the length constraint 
(10 bp) or that the feature is incomplete (we set this option to Y for Yes). 
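
The options in the copy mask can be read as a small range calculation.
The Python sketch below is a guess at that behaviour, not SRS code. It
assumes 1-based inclusive coordinates, that the begin and end values are
offsets added to the feature's begin or end, and that a maximum length
of 0 means "no upper limit"; all names are hypothetical.

```python
def extract_range(seq, feat_begin, feat_end,
                  begin=0, begin_rel="begin", end=0, end_rel="end",
                  min_len=10, max_len=0):
    """Return the selected range of a feature, or None if it is rejected.

    Coordinates are 1-based and inclusive. `begin` and `end` are offsets
    added to the feature's begin or end position, as chosen by *_rel.
    max_len=0 is taken to mean "no upper limit" (an assumption).
    """
    anchor = {"begin": feat_begin, "end": feat_end}
    start = anchor[begin_rel] + begin
    stop = anchor[end_rel] + end
    if start < 1 or stop > len(seq) or start > stop:
        return None  # selected range incomplete
    length = stop - start + 1
    if length < min_len or (max_len and length > max_len):
        return None  # length constraints not met
    return seq[start - 1:stop]

seq = "ACGT" * 10                      # hypothetical 40 bp sequence
whole = extract_range(seq, 5, 24)      # the whole feature, 20 bp
# Last 10 bp of the feature, i.e. the exon bases next to its 3' boundary:
tail = extract_range(seq, 5, 24, begin=-9, begin_rel="end")
short = extract_range(seq, 5, 9)       # 5 bp feature, rejected (None)
print(whole, tail, short)
```

With non-zero offsets of this kind, only the exon bases next to a
junction would be written out instead of the whole exon.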

Finally, you will end up with about 80 sequences, conveniently written 
in GCG format (PIR format is another option you might want). The following 
is a sample entry on VMS: 

D$DAY:[DOELZ]CFCRE1_S2.GCG;1

FT   repeat_region   1..415
FT                   /note="mini-exon gene repeat

  Sequence extract of feature "CFCRE1_S2"
  Begin position: 0 added to BEGIN
  End position: 0 added to END
  Length constraints: min=10, max=0

D$DAY:[DOELZ]CFCRE1_S2.GCG  Length: 415  Check: 4283  ..

       1  AAGCTTCCGG AAACAACCGG CACAAATTTT GAGGCGGAAG CGCTGCTTTT TTTTGTGTCC
      61  GGGGGGGTGC TCCTTGGGGT CCCCCTGTCC AGCCCCAGCC GGTCGCCCAC CACATAGGAA
     121  TTTGCGAAGG ACCCCCAAAA ATCCCGGTCC CCGGGGCGAG TTGTCCCAAC TTTTTCAAAC
     181  CTCATGAAGA GCTAGTTGCG TCATTGAAAA GTTCGTGTGC AGAAACCCCC TCCCCCACGT
     241  TTGTACAATG GAAGAGTTTA CGATACAGGT TTTCTCACGG TTTTGAGGTG TTTTTTCGAA
     301  AAACAAAAAA TATAGAGGTG TATAGCGCTT ATTTTTGACA CCCCCCTCAA AACATGCTGG
     361  GGGTATAGGT CCTTCCAACT AACGCTATAT AAGTATCAGT TTCTGTACTT TATTG
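
A quick sanity check on such a file is to strip the position numbers and
blanks and compare the number of bases with the Length given in the
header. The Python sketch below assumes the layout shown above (a header
line ending in "..", followed by numbered sequence lines); the function
name is hypothetical, the sample header values are made up for a
shortened entry, and the GCG checksum is not verified.

```python
def read_gcg_sequence(lines):
    """Return the sequence from GCG-style lines, skipping the header.

    Everything up to and including the line ending in '..' is header;
    after that, each line is a position number followed by base blocks.
    """
    bases = []
    in_sequence = False
    for line in lines:
        if not in_sequence:
            in_sequence = line.rstrip().endswith("..")
            continue
        parts = line.split()
        if parts and parts[0].isdigit():
            bases.extend(parts[1:])
    return "".join(bases)

# Shortened, hypothetical sample: made-up header values plus the first
# two base blocks of the entry shown above.
sample = [
    "CFCRE1_S2.GCG  Length: 20  Check: 0  ..",
    "",
    "       1  AAGCTTCCGG AAACAACCGG",
]
seq = read_gcg_sequence(sample)
print(len(seq), seq)
```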



SRS is available as a telnet service and, in full source, via anonymous FTP. 
It is known to run well on VMS and various flavours of UNIX systems. 
GCG Version 8, expected later this year, is rumoured to contain an 
SRS system.



Regards
Reinhard 





DISCLAIMER
Note that the software mentioned resembles Computer Program(s) which 
require a license in order to be run unless stated otherwise in a 
statement codistributed with the software. The use of the program(s) was 
mentioned within a specific problem or example and must not be used to 
conclude that other software products cannot possibly do a similar job. 

-- 
  +---------------------------+-------------------------------------------+
  |    Dr. Reinhard Doelz     | Tel. x41 61 2672247    Fax x41 61 2672078 |
  |      Biocomputing         | electronic Mail       doelz at urz.unibas.ch |
  |Biozentrum der Universitaet+-------------------------------------------+
  |   Klingelbergstrasse 70   | EMBnet         embnet at comp.bioz.unibas.ch |
  |CH 4056 Basel  SWITZERLAND | Switzerland       gopher.embnet.unibas.ch |
  +---------------------------+------------- http://beta.embnet.unibas.ch/



More information about the Embl-db mailing list