help: abstract genome - gene finder

Raf rafta4NO^spam at hotmail.com
Tue Mar 9 08:13:34 EST 2004


We followed your advice and extracted all the ORFs using the GETORF tool at
http://bioweb.pasteur.fr/seqanal/interfaces/getorf.html, then matched them
against a non-redundant protein database in the Biology Workbench
(http://workbench.sdsc.edu/). Now we're having trouble interpreting the
results...
Here are our results; if an ORF matches a protein, there is a link:
http://www.clancommunity.com/muziek/phylogenetics/outseq1-80.html
http://www.clancommunity.com/muziek/phylogenetics/outseq2-80.html
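For anyone following the thread, here is roughly what the ORF-extraction step described above does. This is a minimal sketch of the idea, not GETORF's actual algorithm: it assumes an ORF runs from an ATG to the first in-frame stop codon, scans only the forward strand, ignores alternative start codons, and uses Gordon's ~120 bp cutoff. The function names are made up for illustration.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = {"TAA", "TAG", "TGA"}


def translate(seq):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_len=120):
    """Hypothetical forward-strand ORF finder: ATG ... first in-frame stop.

    Returns (start, end, frame, protein) tuples with 1-based inclusive
    coordinates (stop codon included) and frame = (start - 1) % 3 + 1.
    ORFs that run off the end of the sequence are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a new ORF
            elif start is not None and codon in STOPS:
                end = i + 3                    # exclusive end, stop included
                if end - start >= min_len:
                    orfs.append((start + 1, end, frame + 1,
                                 translate(seq[start:i])))
                start = None                   # look for the next ATG
    return orfs


example = "CC" + "ATG" + "GCT" * 40 + "TAA" + "GG"
print(find_orfs(example))  # one ORF: positions 3-128, frame 3, 'M' + 40 x 'A'
```

The translated ORFs are what gets compared against the protein database; a real tool also scans the reverse strand and offers other ORF definitions (for example, stop-to-stop regions).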
If you look at the first sequence, three ORFs overlap: 11, 12, and 13
(positions [795 - 944], [802 - 957], and [830 - 994]).
We know genes can overlap, but is it possible to have three ORFs in one gene?
How do genes and ORFs relate to each other?
Or is there just one gene in these three regions (matching one ORF), so
that the other two ORFs are coincidence?
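A quick way to see how ORFs 11, 12 and 13 relate is to compute the reading frame each one starts in and how much they overlap. The sketch below assumes the positions are 1-based, inclusive, forward-strand coordinates, and numbers frames relative to position 1 (a convention chosen here, not necessarily GETORF's).

```python
from itertools import combinations


def frame_of(start):
    """Forward-strand reading frame (1, 2 or 3) of a 1-based start position."""
    return (start - 1) % 3 + 1


def overlap(a, b):
    """Number of bases shared by two 1-based, inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Coordinates of ORFs 11, 12 and 13 from the first sequence.
orfs = {11: (795, 944), 12: (802, 957), 13: (830, 994)}

for name, (s, e) in orfs.items():
    # Each length is a whole number of codons (150, 156 and 165 bp).
    print(f"ORF {name}: frame {frame_of(s)}, {e - s + 1} bp")

for (n1, r1), (n2, r2) in combinations(orfs.items(), 2):
    print(f"ORF {n1} / ORF {n2}: {overlap(r1, r2)} bp overlap")
```

With these coordinates the three ORFs start in frames 3, 1 and 2, so each is shifted relative to the others and they cannot be pieces of one continuous reading frame. In a compact genome without introns, a protein-coding gene normally corresponds to a single ORF in a single frame, so typically at most one of the three is the real coding sequence; comparing their database hits is one way to pick it.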

The actual assignment is to build a phylogenetic tree from six genomes, so we
first have to find the genes they have in common.
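One common heuristic for deciding which genes two genomes share (a swapped-in technique, not something the assignment prescribes) is "reciprocal best hits": gene a in genome A and gene b in genome B are paired only if each is the other's highest-scoring match. The sketch assumes you already have similarity scores, for example BLAST bit scores or alignment scores, stored as dictionaries; the ORF names and scores are invented.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) -> similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


# Hypothetical scores: ORFs of sequence 1 against sequence 2, and back.
a_vs_b = {("s1_orf11", "s2_orf4"): 210.0, ("s1_orf11", "s2_orf7"): 93.0,
          ("s1_orf2", "s2_orf7"): 90.0}
b_vs_a = {("s2_orf4", "s1_orf11"): 205.0, ("s2_orf7", "s1_orf11"): 95.0,
          ("s2_orf7", "s1_orf2"): 88.0}

print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('s1_orf11', 's2_orf4')]
```

Here s1_orf2's best hit is s2_orf7, but s2_orf7 prefers s1_orf11, so that pair is rejected. Pairs that survive this test across all six genomes are candidates for the shared genes to align and build the tree from.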
We've aligned sequence 1 against sequence 2, with this result:
http://www.clancommunity.com/muziek/phylogenetics/CLUSTALWPROF%20seq1-2.htm
It's clear that there is some resemblance between the two genomes. We hoped
to find ORFs at about the same locations in both genomes so that we could
align them... The region at the end of the two genomes doesn't differ in
many base pairs, but the second sequence only has a reverse ORF
([999 - 760] (REVERSE SENSE)). Is it possible for a gene to be read in
both directions (possibly after some mutations over time)?
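DNA is double-stranded, so a "REVERSE SENSE" ORF is one that lies on the complementary strand: it is read 5' to 3' along the reverse complement, which in forward-strand coordinates means from position 999 down to 760. The sketch below shows how to pull such an ORF out of the forward sequence. It assumes the reported numbers are 1-based, inclusive forward-strand coordinates with start > end marking the reverse strand; `extract_orf` is a hypothetical helper, not part of any tool mentioned here.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_orf(genome, start, end):
    """Extract an ORF given 1-based, inclusive coordinates.

    If start > end (a reverse-sense ORF such as [999 - 760]), the region
    end..start is taken from the forward strand and reverse-complemented,
    so the result reads 5' -> 3' on the opposite strand.
    """
    if start <= end:
        return genome[start - 1:end]
    return reverse_complement(genome[end - 1:start])


# Toy genome whose opposite strand carries ATG AAA TAA at forward positions 3-11.
genome = "GG" + "TTATTTCAT" + "CC"
print(extract_orf(genome, 11, 3))  # ATGAAATAA
```

Extracting the reverse-sense ORF this way gives a sequence in the same orientation as forward-strand ORFs, which makes it directly comparable with ORFs from the other genome.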

The genomes can be found at:
http://www.clancommunity.com/muziek/phylogenetics/SpeciesDataBase.txt

I know these are a lot of newbie questions, but any help is greatly
appreciated :)

raf

> "Gordon D. Pusch" <g_d_pusch_remove_underscores at xnet.com> wrote in message
> news:gi8yiq0y2b.fsf at pusch.xnet.com...
> > "RaFTa" <rafta4NO^spam at hotmail.com> writes:
> >
> > > we've got an assignment from school to identify relevant parts of an
> > > artificial genome, relevant parts being the genes.  we've searched on
> > > google for gene-finding programs, but they always need the type of
> > > species that genome belongs to...
> > >
> > > does anybody know a program that can handle this problem?
> > >
> > > we're informatics students, so we don't really know a lot about the
> > > biotechnology world :)
> >
> > If your "fake" genome has been built out of genes taken from species
> > whose codon usages are not too dissimilar, you might try GLIMMER,
> > <http://www.tigr.org/software/glimmer/>. However, if the "fake" genome
> > has been built from a random "mix-and-match," about the only thing that
> > _might_ work would be to extract all of its "Open Reading Frames" (ORFs)
> > that are long enough to plausibly contain genes (e.g., > ~120 bp),
> > translate them, and then BLAST them against a non-redundant protein
> > database...
> >
> >
> > -- Gordon D. Pusch
> >
> > perl -e '$_ = "gdpusch\@NO.xnet.SPAM.com\n"; s/NO\.//; s/SPAM\.//; print;'
>
>





More information about the Bio-soft mailing list