Complex query retrieval question

Przemko Tylzanowski przemko at med.kuleuven.ac.be
Thu Jan 24 06:05:52 EST 2002


Hi!
A long time ago there was a program called TargetFinder, which was used to
identify sequences of interest (e.g. targets for transcription factors)
in promoters. Its advantage over EPD was that it was not limited to
600 bp of the promoter. Anyway, the Italians took it offline
(after publishing it!). So I am stuck now.

What I would like to do is the following. First, identify in GenBank (or
EMBL, it does not matter) all sequences of mouse or human origin that
contain promoters, enhancers and/or sequences upstream of the TATA box
(at this point forget about TATA-less promoters). This bit is easy: I can
do it using SRS (the funny part is that it works on the server in England
but not on the one in Brussels...). But here the problems start. What I
get as output is the feature sequence (which I ask for) but also the rest
of the gene. For large genomic sequences this is VERY PAINFUL... What I
would like to do is extract from these initial hits (between 2000 and
4500, depending on the selection of databases) ONLY the sequences
containing the promoter part (THIS WAS POSSIBLE IN SRS4 from the command
line). There I could say: get me the feature and all sequence from -2000
to +100 relative to it. Then I would like to build a database from these
and run Findpatterns or something like that on it. So I guess I need a
combination of SRS and GCG/EMBOSS.
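
To make the "-2000 and +100" request concrete, here is a minimal sketch in Python of the window extraction. It assumes the entry sequence and the promoter feature coordinates have already been obtained by some other means. The function names and parameters are hypothetical, not SRS, GCG or EMBOSS commands. The window is assumed to run from 2000 bp upstream of the feature start to 100 bp downstream of the feature end, clipped to the ends of the entry.

```python
# Sketch only. Coordinates are 0-based and end-exclusive (Python slice style).
# extract_window and reverse_complement are hypothetical helper names.

_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_window(seq, feat_start, feat_end, strand=1,
                   upstream=2000, downstream=100):
    """Return the feature plus flanking sequence, clipped to the entry bounds."""
    if strand >= 0:
        lo = max(0, feat_start - upstream)
        hi = min(len(seq), feat_end + downstream)
        return seq[lo:hi]
    # On the reverse strand, "upstream" lies at higher coordinates,
    # and the result is returned in the orientation of the feature.
    lo = max(0, feat_start - downstream)
    hi = min(len(seq), feat_end + upstream)
    return reverse_complement(seq[lo:hi])
```

Clipping to the entry bounds means short entries simply yield a shorter window rather than an error. Whether that is desirable depends on how the resulting database will be searched.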
So, HOW DO I DO THIS IN SRS6? I know that I could probably write something
in Perl; the problem is that I don't really know it.
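
For the pattern-search step, a script along these lines might be one starting point. It is written in Python rather than Perl and is only a sketch: it is not a reimplementation of Findpatterns. It assumes the motif is given in IUPAC nucleotide codes and searches the forward strand only. The function name find_pattern is hypothetical.

```python
import re

# IUPAC nucleotide codes translated to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_pattern(seq, motif):
    """Return 0-based start positions of all matches, including overlapping ones."""
    body = "".join(IUPAC[base] for base in motif.upper())
    # The lookahead lets the regex engine report overlapping matches.
    regex = re.compile("(?=(" + body + "))")
    return [m.start() for m in regex.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence, not real data.
    print(find_pattern("GGTATAAAGG", "TATAWA"))
```

To cover both strands, the same function could be run a second time on the reverse complement of each sequence.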

Any suggestions or solutions are welcome!

Przemko


--
Przemko Tylzanowski Ph.D.
LSD & Joint
O & N
University of Leuven
Herestraat 49
3000 Leuven
Belgium

phone: (32-16)34-61-96
fax  : (32-16)34-62-00






More information about the Bio-srs mailing list