Help searching databases
doelz at comp.bioz.unibas.ch
Wed Mar 2 04:52:05 EST 1994
In article <0hQpzNO00iV7A3=HJY at andrew.cmu.edu>, "Howard M. Bomze" <hb10+ at andrew.cmu.edu> writes:
|> I need some help in searching through the sequence data bases for splice
|> junctions. Does anybody know of any programs out there that I can use
|> to do this? I am trying to look at sequences next to splice junctions
|> but in the EXON. I don't think this will be too hard but I don't know
|> much about programming. Thanks in advance for any help.
I am not sure what you mean by 'look at' sequences, but I'll show
you how to prepare a set of sequences with the SRS system from Thure
[S] (select a sequence query)
/[D] SPLICE & JUNCTION
/[T] SPLICE & JUNCTION
/[C] SPLICE & JUNCTION
/[F] SPLICE & JUNCTION (enter the search terms SPLICE _and_ JUNCTION
in Definition, Title, Comment and Features)
/[S] [SPACE] [S] [E] [SPACE] (deselect Swissprot and select the EMBL
database; you might want to select more DNA databases)
/[X] 2 (connect the fields with OR)
The mask now looks as follows (if your terminal isn't good enough, the
_ and | will be displayed as q's and x's or similar, I'm afraid):
[G] General [O] BuffOptions [U] Query [H] Help
x ID [I]: x
x Accession [N]: x
x Definition [D]: SPLICE & JUNCTION x
x Keywords [K]: x
x Organism [O]: x
x Authors [A]: x
x Title [T]: SPLICE & JUNCTION x
x Reference [R]: x
x Comment [C]: SPLICE & JUNCTION x
x Features [F]: SPLICE & JUNCTION x
x separate keys by & (AND), | (OR), or ! (AND NOT) x
x query (set) name [Q]: SQ1 select library(s) [S]: @ x
x connect fields by AND (1) or OR (2) [X]: 1 x
x do => ([Do]) abort => ([F10]) x
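If it helps to see what the key syntax in the mask ("separate keys by &, | or !") does to a single field, here is a minimal Python sketch. It is not SRS code: it assumes case-insensitive matching of terms as word prefixes (the query info shown further down lists the terms as SPLICE* and JUNCTION*), and it assumes that | has the lowest precedence while & and ! are applied left to right. The function names are hypothetical.

```python
import re

def _has_prefix(words: set[str], term: str) -> bool:
    # A term matches if any word in the field starts with it (assumed prefix search).
    term = term.lower()
    return any(w.startswith(term) for w in words)

def field_matches(text: str, expr: str) -> bool:
    """Evaluate a key expression such as 'SPLICE & JUNCTION' against one
    field's text. Assumed precedence: '|' (OR) is lowest; within each OR
    branch, '&' (AND) and '!' (AND NOT) are applied left to right."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    for branch in expr.split("|"):
        # Keep the operators as tokens: 'SPLICE & JUNCTION' -> ['SPLICE', '&', 'JUNCTION']
        tokens = [t.strip() for t in re.split(r"([&!])", branch) if t.strip()]
        if not tokens:
            continue
        result = _has_prefix(words, tokens[0])
        for op, term in zip(tokens[1::2], tokens[2::2]):
            hit = _has_prefix(words, term)
            result = result and (hit if op == "&" else not hit)
        if result:
            return True
    return False
```

For example, "splice junctions" matches SPLICE & JUNCTION under these assumptions, whereas "splicing" does not match SPLICE, because "splicing" does not start with "splice".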
Press [RETURN] to run the query. There will be about 100 entries. Next,
press [O] [W] to view the query results; my screen looks like this:
Info of Query "SQ1"
SQ1 = (
[SQ-DEF: SPLICE* & JUNCTION*] |
[SQ-TIT: SPLICE* & JUNCTION*] |
[SQ-CC: SPLICE* & JUNCTION*]) |
[SQ-FTS: SPLICE* & JUNCTION*] > PARENT
86 entries from library "EMBL"
13 entries from library "EMBL_NEW"
10 entries from library "GENBANK"
Created set of type "Seq-ID" has 109 members
[Q] to leave this mode.
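Conceptually, SQ1 is the union of the per-field matches. The minimal Python sketch below assumes each entry is a plain dict of field texts. The field names, `run_query` and the example entries are hypothetical, and the prefix matching is the same assumption as before. It illustrates why connecting the fields with OR ([X] 2) is the sensible choice here.

```python
import re
from collections import Counter

def terms_in(text: str, terms: list[str]) -> bool:
    # Every term must occur as a word prefix (assumed SPLICE* semantics), case-insensitive.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return all(any(w.startswith(t.lower()) for w in words) for t in terms)

def run_query(entries: list[dict], field_terms: dict[str, list[str]],
              connect: str = "OR") -> list[dict]:
    """Select entries whose fields match; fields are combined with OR
    (mask option [X] 2) or AND (option [X] 1)."""
    combine = any if connect == "OR" else all
    return [e for e in entries
            if combine(terms_in(e.get(field, ""), terms)
                       for field, terms in field_terms.items())]

# Hypothetical example entries; real ones come from the EMBL/GenBank flat files.
EXAMPLE_ENTRIES = [
    {"id": "A1", "library": "EMBL", "definition": "Human beta-globin splice junction"},
    {"id": "A2", "library": "EMBL", "definition": "Mouse actin gene",
     "comment": "Splice junctions were mapped."},
    {"id": "A3", "library": "GENBANK", "definition": "Yeast intron",
     "title": "A new splice site"},
]
KEYS = ["SPLICE", "JUNCTION"]
FIELDS = {"definition": KEYS, "title": KEYS, "comment": KEYS, "features": KEYS}
```

With `connect="AND"` an entry would have to mention both words in every field, so almost nothing survives. Tallying `Counter(e["library"] for e in hits)` mirrors the per-library lines in the query info, and the set size is their sum: 86 + 13 + 10 = 109.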
Next, we need to find all exons, again with a sequence query:
[U] [S] /[F] EXON [RETURN] [O] [W]
SQ2 = (
39185 entries from library "EMBL"
2499 entries from library "EMBL_NEW"
1713 entries from library "GENBANK"
Created set of type "Seq-Ft-ID" has 43397 members
[Q] to leave this query.
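SQ2 differs from SQ1 in its type: its members are features (Seq-Ft-ID) rather than whole entries (Seq-ID). The following is a minimal sketch, assuming the /[F] EXON search amounts to picking features whose key is `exon`. It uses hypothetical (entry_id, index) pairs as feature IDs. The `Feature` record and `select_features` are illustrations only, not SRS data structures.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    entry_id: str    # parent sequence entry
    index: int       # position of the feature in the entry's feature table
    key: str         # feature key, e.g. "exon", "CDS", "repeat_region"
    start: int       # 1-based start
    end: int         # 1-based inclusive end
    complete: bool   # False for partial locations such as "<1..200"

def select_features(features: list[Feature], key: str = "exon") -> set[tuple[str, int]]:
    # A feature-level set: one member per matching feature, so an entry
    # with several exons contributes several members.
    return {(f.entry_id, f.index) for f in features if f.key == key}

# Hypothetical feature table rows
EXAMPLE_FEATURES = [
    Feature("X1", 0, "source", 1, 500, True),
    Feature("X1", 1, "exon", 10, 120, True),
    Feature("X1", 2, "exon", 300, 450, True),
    Feature("X2", 0, "CDS", 1, 90, True),
    Feature("X3", 3, "exon", 1, 200, False),
]
```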
Next, we combine the two by mapping the first onto the second.
[U] [X] SQ1 > SQ2 [RETURN]
In SQ3, you'll find about 200 entries: the exon features of those DNA
sequences whose annotation elsewhere mentions SPLICE and JUNCTION.
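The mapping step can be pictured as a filter on the feature set. This is a sketch that uses hypothetical (entry_id, feature_index) IDs and does not reflect SRS's internals. It also shows why SQ3 can be larger than SQ1: a single entry may carry several exons.

```python
def map_entries_to_features(entry_set: set[str],
                            feature_set: set[tuple[str, int]]) -> set[tuple[str, int]]:
    """Sketch of 'SQ1 > SQ2': keep the members of the feature-level set whose
    parent entry belongs to the entry-level set. Feature IDs are assumed to be
    hypothetical (entry_id, feature_index) pairs."""
    return {fid for fid in feature_set if fid[0] in entry_set}

# Hypothetical example sets
SQ1 = {"X1", "X3"}                                   # entries mentioning SPLICE and JUNCTION
SQ2 = {("X1", 1), ("X1", 2), ("X2", 1), ("X3", 4)}   # exon features
SQ3 = map_entries_to_features(SQ1, SQ2)
```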
You may want to copy this set into your directory with [O] [C] (copy
set output). As exons can be rather short, we set a minimum length of
10 base pairs, and the screen will look like this:
lqqqqqqqqqqqqqqqqqqqqqq Copy Sequence Feature qqqqqqqqqqqqqqqqqqqqqqqk
x output directory: /bioy/scratch/doelz/ x
x file name: x
x sequence format [F]: @ x
x begin [B]: 0 relative to begin (1) or end (2): 1 x
x end [E]: 0 relative to begin (1) or end (2): 2 x
x minimum length [M]: 10 maximum length: 0 x
x reject if feature incomplete: Y x
x reject if selected range incomplete: Y x
x do => ([Do]) abort => ([F10]) x
As the copying proceeds, you will see success messages as well as messages
indicating that a sequence does not meet the length constraint (10 bp)
or that the feature is incomplete (we set this option to Y for Yes).
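The constraints in the Copy Sequence Feature mask can be sketched as follows. This sketch makes several assumptions. Coordinates are treated as 1-based and inclusive. The begin/end values are treated as offsets added to the feature's begin and end, as the "added to BEGIN/END" lines in the sample output suggest. A maximum length of 0 is taken to mean no upper limit. `extract_feature` is a hypothetical name.

```python
from typing import Optional

def extract_feature(seq: str, start: int, end: int, complete: bool,
                    begin_offset: int = 0, end_offset: int = 0,
                    min_len: int = 10, max_len: int = 0) -> Optional[str]:
    """Return the selected subsequence, or None if a constraint rejects it.
    start/end are 1-based, inclusive feature coordinates."""
    if not complete:
        return None                      # 'reject if feature incomplete: Y'
    lo = start + begin_offset            # begin offset, relative to feature begin
    hi = end + end_offset                # end offset, relative to feature end
    if lo < 1 or hi > len(seq) or lo > hi:
        return None                      # 'reject if selected range incomplete: Y'
    length = hi - lo + 1
    if length < min_len or (max_len > 0 and length > max_len):
        return None                      # length constraint (max_len 0 = no limit, assumed)
    return seq[lo - 1:hi]
```

With the defaults (offsets 0, min=10, max=0), a feature is copied exactly as annotated unless it is partial, runs past the sequence ends, or is shorter than 10 bp. These are the three kinds of rejection messages described above.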
Finally, you will end up with about 80 sequences, conveniently written
in GCG format (PIR format is another option). The following
is a sample entry on VMS:
FT repeat_region 1. .415
FT /note="mini-exon gene repeat
Sequence extract of feature "CFCRE1_S2"
Begin position: 0 added to BEGIN
End position: 0 added to END
Length constraints: min=10, max=0
D$DAY:[DOELZ]CFCRE1_S2.GCG Length: 415 Check: 4283 ..
1 AAGCTTCCGG AAACAACCGG CACAAATTTT GAGGCGGAAG CGCTGCTTTT TTTTGTGTCC
61 GGGGGGGTGC TCCTTGGGGT CCCCCTGTCC AGCCCCAGCC GGTCGCCCAC CACATAGGAA
121 TTTGCGAAGG ACCCCCAAAA ATCCCGGTCC CCGGGGCGAG TTGTCCCAAC TTTTTCAAAC
181 CTCATGAAGA GCTAGTTGCG TCATTGAAAA GTTCGTGTGC AGAAACCCCC TCCCCCACGT
241 TTGTACAATG GAAGAGTTTA CGATACAGGT TTTCTCACGG TTTTGAGGTG TTTTTTCGAA
301 AAACAAAAAA TATAGAGGTG TATAGCGCTT ATTTTTGACA CCCCCCTCAA AACATGCTGG
361 GGGTATAGGT CCTTCCAACT AACGCTATAT AAGTATCAGT TTCTGTACTT TATTG
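The header line of the GCG file reports the length and a checksum. Below is a Python sketch of the GCG checksum as it is commonly reimplemented, for instance in Biopython's `Bio.SeqUtils.CheckSum.gcg`. It also reproduces the body layout seen above: 60 residues per line in blocks of 10, each line starting with the position of its first residue. The exact column widths of genuine GCG output are an assumption here.

```python
def gcg_checksum(seq: str) -> int:
    """GCG checksum: each residue's character code (uppercased) is weighted
    by its position, the weight cycling through 1..57; the sum is taken
    modulo 10000."""
    total = 0
    for i, ch in enumerate(seq):
        total += (i % 57 + 1) * ord(ch.upper())
    return total % 10000

def gcg_lines(seq: str, width: int = 60, block: int = 10) -> list[str]:
    # Position of the first residue on each line, then the residues in blocks of 10.
    lines = []
    for pos in range(0, len(seq), width):
        chunk = seq[pos:pos + width]
        blocks = " ".join(chunk[j:j + block] for j in range(0, len(chunk), block))
        lines.append(f"{pos + 1:>8} {blocks}")
    return lines
```

You could run `gcg_checksum` over the 415 residues above and compare the result with the `Check:` value in the header.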
SRS is available as a public telnet login ("telnet hole") and, in full source, via anonymous FTP.
It is known to run well on VMS and various flavours of UNIX systems.
GCG Version 8, expected later this year, is rumoured to contain a
Note that the software mentioned resembles Computer Program(s) which
require a license in order to be run unless stated otherwise in a
statement codistributed with the software. The use of the program(s) was
mentioned within a specific problem or example and must not be used to
conclude that other software products cannot possibly do a similar job.
| Dr. Reinhard Doelz | Tel. x41 61 2672247 Fax x41 61 2672078 |
| Biocomputing | electronic Mail doelz at urz.unibas.ch |
|Biozentrum der Universitaet+-------------------------------------------+
| Klingelbergstrasse 70 | EMBnet embnet at comp.bioz.unibas.ch |
|CH 4056 Basel SWITZERLAND | Switzerland gopher.embnet.unibas.ch |
More information about the Embl-db