Promoter region sequence analysis

Alexandr Spirov spin at ief.spb.su
Tue May 30 13:29:17 EST 1995


>   Message-Id: <199505270135.VAA20357 at mhade.production.compuserve.com>
>   From: Colbert Philippe <74671.1062 at CompuServe.COM>
>   To: spin at ief.spb.su
>   In-Reply-To: spin at ief.spb.su (Alexandr Spirov)'s message <AAKRUmliI8 at ief.spb.su>
>   Subject: Promoter region sequence analysis
>   Organization: EIGENSOFT INC
>   Status: RO
>
>   Dear Mr Spirov
>
>   I am currently writing software that will have functions close to
>   what you have talked about.  For the sake of completeness, can
>   you point me to some literature that explains the algorithms
>   that you talked about?   I am interested in implementing these
>   features in my new program.
>
>   Thank you
>

          Sorry for the lengthy quotation below (from my manuscript
in preparation), but I hope the text will illustrate some problems
in the computer comparison of 5' gene regions. In brief, I can
mention the following problems:
     1) The order of target sites, their proximity to or overlap
with one another, and their number are structural features of a
higher level than the bulk nucleotide sequence of the upstream gene
regions; they act as a sort of code;
     2) Spacers between target sites can mutate relatively freely.
In evolutionarily diverged species, the only homology found is
between the nucleotide sequences inside the target sites, while
the spacer sequences show no sequence homology at all;
     3) As a result, neither scanning for sites nor multiple
alignment of the 5' upstream gene regions can reveal homology
between evolutionarily diverged related genes.

     A desirable program must treat the sequence of target sites
as such, together with the overall structure of the cluster of sites,
qualitatively comparing and classifying the clusters, and not (or not
only) the nucleotide sequences.
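
     The idea of comparing promoters as ordered strings of target
sites, rather than as raw nucleotide sequences, can be sketched in
Python as follows. This is only an illustration under stated
assumptions, not an established algorithm: the site labels BCD and
ANTP, the function names, and the use of a longest common subsequence
of site labels as the similarity measure are all hypothetical choices.
Only the two consensus strings (GGGATTA and NNCATTA) are taken from
this message.

```python
import re

# Consensus motifs quoted in this message; N matches any base.
# The labels "BCD" and "ANTP" are illustrative names, not a standard.
CONSENSUS = {"BCD": "GGGATTA", "ANTP": "NNCATTA"}


def consensus_to_regex(consensus):
    """Turn a consensus with N wildcards into a regular expression."""
    return "".join("[ACGT]" if base == "N" else base for base in consensus)


def site_string(sequence, consensus=CONSENSUS):
    """Return the sorted list of (position, site label) hits in a sequence."""
    sequence = sequence.upper()
    hits = []
    for label, motif in consensus.items():
        # A lookahead is used so that overlapping sites are all reported.
        pattern = re.compile("(?=" + consensus_to_regex(motif) + ")")
        hits.extend((m.start(), label) for m in pattern.finditer(sequence))
    return sorted(hits)


def shared_site_order(sites_a, sites_b):
    """Length of the longest common subsequence of site labels.

    Spacer lengths and spacer sequences are deliberately ignored;
    only the order of the sites is compared.
    """
    a = [label for _, label in sites_a]
    b = [label for _, label in sites_b]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]
```

     Two promoters whose spacers differ completely in length and
sequence still score as similar here, as long as the same sites occur
in the same order. A real program would also need to score spacing,
overlap and site number, which this sketch does not attempt.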

     Figures 1-3 below illustrate this for the upstream sequences of
related but diverged members of a gene family. Apart from the
nucleotide sequences of the target sites themselves (sites binding
BICOID (GGGATTA consensus) and sites binding Antennapedia-like
transcription factors (NNCATTA consensus)), we cannot find any
other homologous sequences(!)
               Regards,
                         A.Spirov.

P.S. See also:
Falb D. & Maniatis T.
"A conserved regulatory unit implicated in tissue-
specific gene expression in DROSOPHILA and MAN"
Genes & Development, 1992, 6, 454-465.
     where they pointed out that
"Both Drosophila and human Adh upstream control elements
conserve a functionally related region that has little
primary sequence similarity, but still contains overlapping
binding sites for the transcription factors AEF-1 and C/EBP"
<<Adh - Alcohol dehydrogenase ;-) >>

===============================================================================
     Surprisingly enough, the 5' flanking regions of the known
Drosophila HOM genes appear to share homology with vertebrate HOX
5' flanking regions (with regard to ATTA/TAAT-containing sites).
What is more, clusters of homologous elements (comparing
Drosophila genes with their mammalian homologs) are located at
similar positions relative to the translation initiation codon. A
comparison of the 5' flanking regions of Dfd and Ubx with vertebrate
members of the Dfd-paralogous group (human and mouse Hox-4.2 and
mouse Hox-3.5) is shown in Figs.1-2.


           "gggatta"-BICOID site                             "nncatta"-Antp site
Dfd        GAAAattaTGAAGACAACGGGAGCCTTCTAACCCCTTTGTTTAgttaA--AAATattaAA 70
HHox-4.2   TgggattaCCTGAGGGGAATGGGGTGCTGGGGACTGG---------AA--CtacattaAT 60
MHox-4.2   CgagattaCCTGCCGGAAATTGGGACCCCGGGGAT-----AGAattaGAActctattaGC 66
MHox-3.5   CGCTgttaCTCCACGCTGAGCGCTCCGCCTGCCGA--CAACTTGACCCCGCTGACGTCAC 69


Dfd        AAGAATACGAAATTTATTTTAACCAACACAATCTTAAAA                      108
HHox-4.2   ATCTGGCAGGGGCTCTC-AAATGTGCCATAGCAAGCTAC                      98
MHox-4.2   ATCTGTCAGGGACTCTC-AAATGTGGCATGGCAAGTCAC                      104
MHox-3.5   GACCGTCTGAATCATCAAGGCCATTTTCAAATCCCATTG                      108

             "atta"                             "atta"
Dfd        GTAattaCTACTTGCAAAAGCAGCGCCTTtaatCAATAgttaatgtA              155
HHox-4.2   TTGattaCACGTATgttaTTTAgttaAATTTGT-GAAAattaTGAGA              144
MHox-4.2   TTGattaCACGTATgttaTTTAgttaAATTTGT-GAAAattaTGAGA              150
MHox-3.5   GTCTAGCCGTCACATGGTGAGGACCGAATGCGCGGATAattaTGGAG              155


     FIG.1. Comparison of Drosophila Dfd, human & mouse Hox-4.2 and
          mouse Hox-3.5 5' flanking (promoter) regions. ATTA-containing
          elements which share structural and positional homologies are
          underlined (shown here in lower case).


                 "gggatta" BICOID site                   "nncatta" Antp-site
Ubx        AAATTAAAAgattaTTA----AG-ATTGAAGT--------CTCAATAAAcattaGT
HHox-4.2   AAAAGCTgggattaCCTGAGGGGAAT-GGGGTGCTGGGGACTGGAACTAcattaAT
MHox-4.2   AAAAGCCgagattaCCTGCCGGAAATTGGGACCCCGGGGA-T-----AGAattaGA


     FIG.2. Comparison of Drosophila Ubx and human & mouse Hox-4.2
          flanking (promoter) regions. ATTA-containing elements
          which share structural and positional homologies are
          underlined (shown here in lower case).


     In the Dfd/Hox-4.2/Hox-3.5 alignment we can see two conserved
pairs of ATTA-containing sites. The more 5' pair shares perfect
homology with the bicoid binding site and with the Antp/Hox-1.3
binding site. A similar pair of ATTA-containing sites is present
in the Ubx promoter region (Fig.2).

            "atta"&
                "nncatta" Antp-site
Scr        CAattacccattaGAACCATCCAAAACATAAGCCTGCAGGTAGGACGCAAAAGTCTAGCC 60
ZHox-2.1   CAgttaaacattaGATTATATTTTCATATTAAGCAtGCattaTAAAGTATGTGGGATGTT 60

                                             "attaat" site
Scr        AGTTTGCCCTCAGGATGCCAT-----------CGGattaatT 90
ZHox-2.1   GTATGTGTAAATGGTCAATAGGTACTCTAAACGTTattaacG 101

     FIG.3. Comparison of Drosophila Scr & zebrafish Hox-2.1 flanking
          (promoter) regions. ATTA/TAAT-containing elements which
          share structural and positional homologies are underlined
          (shown here in lower case).


     There is also remarkable homology between the 5' flanking region
of Scr and that of its zebrafish homolog hox-2.1 (Fig.3). We can see
there a closely adjacent pair of ATTA-containing sequences, one of
which (the more 5') is homologous to the Antp binding site, and
an ATTAAT motif (more 3').
------------------------------------------------------------------------------
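
     The search for closely adjacent ATTA/TAAT-containing elements
described in the excerpt above can be sketched in Python as follows.
This is a minimal illustration, not the method used to prepare the
figures: the function names and the max_gap threshold of 10 bases are
assumptions chosen for the example. TAAT is the reverse complement of
ATTA, so scanning one strand for both words covers both strands.

```python
def find_cores(sequence, core="ATTA"):
    """Return sorted (position, strand) hits of the core on both strands.

    A "-" hit means the reverse complement of the core (TAAT for ATTA)
    was found on the given strand. Assumes the core is not its own
    reverse complement, otherwise each hit would be reported twice.
    """
    sequence = sequence.upper()
    complement = str.maketrans("ACGT", "TGCA")
    reverse_core = core.translate(complement)[::-1]
    hits = []
    for strand, motif in (("+", core), ("-", reverse_core)):
        start = sequence.find(motif)
        while start != -1:
            hits.append((start, strand))
            # Advance by one base so overlapping hits are found too.
            start = sequence.find(motif, start + 1)
    return sorted(hits)


def adjacent_pairs(hits, max_gap=10):
    """Return consecutive hits whose start positions are within max_gap."""
    return [(a, b) for a, b in zip(hits, hits[1:]) if b[0] - a[0] <= max_gap]


# The first 18 bases of the Scr line in Fig.3, written in upper case.
scr_start = "CAATTACCCATTAGAACC"
print(find_cores(scr_start))
print(adjacent_pairs(find_cores(scr_start)))
```

     On this fragment the two ATTA cores at positions 2 and 9 are
reported as one adjacent pair. An ATTAAT word gives one hit on each
strand (ATTA at offset 0 and TAAT at offset 2), which is one way to
read the "ATTA/TAAT" notation used in the excerpt.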





