> Message-Id: <199505270135.VAA20357 at mhade.production.compuserve.com>
> From: Colbert Philippe <74671.1062 at CompuServe.COM>
> To: spin at ief.spb.su
> In-Reply-To: spin at ief.spb.su (Alexandr Spirov)'s message <AAKRUmliI8 at ief.spb.su>
> Subject: Promoter region sequence analysis
> Organization: EIGENSOFT INC
> Status: RO
>
> Dear Mr Spirov
>
> I am currently writing software that will have functions close to
> what you have talked about. For the sake of completeness, can
> you point me to some literature that explains the algorithms
> that you talked about? I am interested in implementing these
> features in my new program.
>
> Thank you
Sorry for the voluminous quotation below (from my manuscript in
preparation), but I hope the text illustrates some problems in
the computer comparison of 5' gene regions. In brief, I can
mention the following problems:
1) The order of target sites, their proximity to or overlap with
one another, and their number are structural features of a higher
level than the bulk nucleotide sequence of the upstream gene
regions; they act as a sort of code.
2) Spacers between target sites can mutate relatively freely.
In evolutionarily diverged species the only homology found is
between the nucleotide sequences inside the target sites, while
the spacer sequences show no sequence homology at all.
3) As a result, neither scanning for sites nor multiple
alignment of the 5' upstream gene regions can reveal
homology between evolutionarily diverged related genes.
A desirable program must treat the sequence of target sites
as such, together with the overall structure of the cluster of sites,
qualitatively comparing and classifying the clusters, and not (or not
only) the nucleotide sequences.
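As an illustration of comparing the order of sites rather than the nucleotides, here is a minimal Python sketch. It is not the program described above: the function names, the labels BCD and ANTP, and the use of difflib's SequenceMatcher as a similarity measure are assumptions made for the example. Only the two consensus strings come from this message, and only the given strand is scanned.

```python
import re
from difflib import SequenceMatcher

# Consensus strings quoted in this message; N stands for any nucleotide.
# The short labels are hypothetical names chosen for this sketch.
CONSENSUS = {"BCD": "GGGATTA", "ANTP": "NNCATTA"}


def consensus_to_regex(consensus):
    """Compile a consensus with N wildcards; the lookahead allows overlapping hits."""
    body = consensus.upper().replace("N", "[ACGT]")
    return re.compile("(?=(" + body + "))")


def site_string(seq, consensus=CONSENSUS):
    """Return the hits on the given strand as a sorted list of (position, label)."""
    seq = seq.upper().replace("-", "")  # drop alignment gaps, ignore case
    hits = []
    for label, cons in consensus.items():
        for match in consensus_to_regex(cons).finditer(seq):
            hits.append((match.start(), label))
    return sorted(hits)


def site_order_similarity(seq_a, seq_b):
    """Compare two regions by the order of their site labels only."""
    labels_a = [label for _, label in site_string(seq_a)]
    labels_b = [label for _, label in site_string(seq_b)]
    return SequenceMatcher(None, labels_a, labels_b).ratio()


# Two toy regions (not real data): same site order, unrelated spacers.
toy_a = "TTGGGATTACCCCCCAACATTAGG"
toy_b = "GGGATTA" + "A" * 12 + "GTCATTA"
print(site_string(toy_a))
print(site_order_similarity(toy_a, toy_b))
```

The spacers of the two toy regions differ completely, yet the site order is the same, so the comparison of labels reports them as alike. A real tool would also need both strands, weights for imperfect sites, and the distances between sites.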
My Figures 1-3 below show these features in comparisons of the
upstream sequences of related but diverged members of a gene family.
Apart from the nucleotide sequences of the target sites themselves
(sites binding BICOID (GGGATTA consensus) and sites binding
Antennapedia-like transcription factors (NNCATTA consensus)), we
cannot find any other homologous sequences(!)
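The contrast between sites and spacers can be measured on a pair of aligned rows. The Python sketch below assumes that lowercase letters in the plain-text figures mark the target sites (that is how the figures in this message appear to be typed) and that both rows have the same aligned length; the function name is hypothetical.

```python
def identity_by_region(row_a, row_b):
    """Fraction of identical columns, separately for site and spacer columns.

    Assumption: a column belongs to a site when it is lowercase in BOTH rows.
    Columns with a gap ('-') in either row are skipped.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    counts = {"site": [0, 0], "spacer": [0, 0]}  # [identical, compared]
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue
        region = "site" if a.islower() and b.islower() else "spacer"
        counts[region][1] += 1
        if a.upper() == b.upper():
            counts[region][0] += 1
    return {
        region: (same / total if total else None)
        for region, (same, total) in counts.items()
    }


# First nine columns of the third block of Fig.1 (Dfd and HHox-4.2 rows).
print(identity_by_region("GTAattaCT", "TTGattaCA"))
```

On this short stretch the site columns are fully identical while fewer than half of the spacer columns agree. Longer rows can be passed in the same way, provided the column alignment of the figure is preserved.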
P.S. See also:
Falb D. & Maniatis T.
"A conserved regulatory unit implicated in tissue-
specific gene expression in DROSOPHILA and MAN"
Genes & Development, 1992, 6, 454-465.
where they pointed out that
"Both Drosophila and human Adh upstream control elements
conserve a functionally related region that has little
primary sequence similarity, but still contains overlapping
binding sites for the transcription factors AEF-1 and C/EBP"
<<Adh - Alcohol dehydrogenase ;-) >>
Surprisingly enough, the 5' flanking regions of the known
Drosophila HOM genes appear to share homology with vertebrate HOX 5'
flanking regions (with regard to ATTA/TAAT-containing sites).
What is more, clusters of homologous elements (comparing
Drosophila genes with their mammalian homologs) are located at similar
positions relative to the translation initiation codon. A comparison of
the 5' flanking regions of Dfd and Ubx with vertebrate
members of the Dfd-paralogous group (human and mouse Hox-4.2 and
mouse Hox-3.5) is shown in my Figs. 1-2.
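Positional homology of this kind can be checked with a short Python sketch. It assumes that each fragment ends immediately upstream of the translation initiation codon, so that offsets counted back from the 3' end are comparable between genes; that assumption, the function names, and the tolerance of 5 nt are all choices made for the example, not values from the manuscript.

```python
import re

# ATTA or TAAT core; the lookahead lets overlapping cores be reported too.
CORE = re.compile(r"(?=(ATTA|TAAT))")


def core_offsets(upstream):
    """Offsets of ATTA/TAAT cores counted back from the 3' end (negative numbers)."""
    seq = upstream.upper().replace("-", "")  # drop alignment gaps, ignore case
    return [match.start() - len(seq) for match in CORE.finditer(seq)]


def shared_positions(seq_a, seq_b, tolerance=5):
    """Pair each core in seq_a with the nearest core in seq_b within the tolerance."""
    offsets_b = core_offsets(seq_b)
    pairs = []
    for a in core_offsets(seq_a):
        close = [b for b in offsets_b if abs(a - b) <= tolerance]
        if close:
            pairs.append((a, min(close, key=lambda b: abs(a - b))))
    return pairs


# Toy fragments (not real data) with cores at roughly the same distances.
toy_a = "CC" + "ATTA" + "C" * 8 + "TAAT" + "CC"
toy_b = "GGG" + "ATTA" + "G" * 9 + "ATTA" + "GGG"
print(core_offsets(toy_a))
print(shared_positions(toy_a, toy_b))
```

Each pair gives the offset of a core in the first fragment and of its nearest counterpart in the second. The spacers of the toy fragments share nothing, but both cores are paired because they lie at similar distances from the 3' end.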
"gggatta"-BICOID site "nncatta"-Antp site
Dfd GAAAattaTGAAGACAACGGGAGCCTTCTAACCCCTTTGTTTAgttaA--AAATattaAA 70
HHox-4.2 TgggattaCCTGAGGGGAATGGGGTGCTGGGGACTGG---------AA--CtacattaAT 60
MHox-4.2 CgagattaCCTGCCGGAAATTGGGACCCCGGGGAT-----AGAattaGAActctattaGC 66
MHox-3.5 CGCTgttaCTCCACGCTGAGCGCTCCGCCTGCCGA--CAACTTGACCCCGCTGACGTCAC 69
Dfd AAGAATACGAAATTTATTTTAACCAACACAATCTTAAAA 108
HHox-4.2 ATCTGGCAGGGGCTCTC-AAATGTGCCATAGCAAGCTAC 98
MHox-4.2 ATCTGTCAGGGACTCTC-AAATGTGGCATGGCAAGTCAC 104
MHox-3.5 GACCGTCTGAATCATCAAGGCCATTTTCAAATCCCATTG 108
Dfd GTAattaCTACTTGCAAAAGCAGCGCCTTtaatCAATAgttaatgtA 155
HHox-4.2 TTGattaCACGTATgttaTTTAgttaAATTTGT-GAAAattaTGAGA 144
MHox-4.2 TTGattaCACGTATgttaTTTAgttaAATTTGT-GAAAattaTGAGA 150
MHox-3.5 GTCTAGCCGTCACATGGTGAGGACCGAATGCGCGGATAattaTGGAG 155
FIG.1. Comparison of Drosophila Dfd, human & mouse Hox-4.2 and
mouse Hox-3.5 5' flanking (promoter) regions. ATTA-containing
elements which share structural and positional homologies are
underlined.
"gggatta" BICOID site "nncatta" Antp-site
FIG.2. Comparison of Drosophila Ubx and human & mouse Hox-4.2
flanking (promoter) regions. ATTA-containing elements
which share structural and positional homologies are
underlined.
In the Dfd/Hox-4.2/Hox-3.5 alignment we can see two conserved
pairs of ATTA-containing sites. The more 5' pair shares
perfect homology with the bicoid binding site and with the Antp/Hox-1.3
binding site. A similar pair of ATTA-containing sites is present in
the Ubx promoter region (Fig.2).
Scr CAattacccattaGAACCATCCAAAACATAAGCCTGCAGGTAGGACGCAAAAGTCTAGCC 60
ZHox-2.1 CAgttaaacattaGATTATATTTTCATATTAAGCAtGCattaTAAAGTATGTGGGATGTT 60
Scr AGTTTGCCCTCAGGATGCCAT-----------CGGattaatT 90
ZHox-2.1 GTATGTGTAAATGGTCAATAGGTACTCTAAACGTTattaacG 101
FIG.3. Comparison of Drosophila Scr & zebrafish Hox-2.1 flanking
(promoter) regions. ATTA/TAAT-containing elements which
share structural and positional homologies are underlined.
There is also remarkable homology between the 5' flanking region
of Scr and that of its zebrafish homolog hox-2.1 (Fig.3). There we
can see a closely adjacent pair of ATTA-containing sequences, one of
which (the more 5' one) is homologous to the Antp binding site, while
the other (the more 3' one) is an ATTAAAT motif.
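Closely adjacent ATTA cores can be located with a few lines of Python. This is only a sketch: the function name and the limit of 5 nt between cores are assumptions, only the strict ATTA core on the given strand is searched, and the pair it reports in the Scr fragment may or may not be the pair meant above.

```python
import re

CORE = re.compile(r"(?=ATTA)")  # zero-width lookahead: reports every ATTA start


def adjacent_pairs(seq, max_gap=5, core_len=4):
    """Return (start, start) pairs of consecutive ATTA cores at most max_gap nt apart.

    The gap is counted from the end of the first core to the start of the second;
    overlapping cores (negative gap) are kept as well.
    """
    seq = seq.upper().replace("-", "")  # drop alignment gaps, ignore case
    starts = [match.start() for match in CORE.finditer(seq)]
    return [
        (a, b)
        for a, b in zip(starts, starts[1:])
        if b - (a + core_len) <= max_gap
    ]


# First 18 nucleotides of the Scr row in Fig.3.
print(adjacent_pairs("CAattacccattaGAACC"))
```

For this fragment the two cores start at positions 2 and 9 (counting from 0), separated by the three nucleotides "ccc".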