Locuslink and protein information

maneesh maneesh at drunkenbastards.com
Tue Jul 10 12:36:14 EST 2001


Hi all,
I'm a little new to bioinformatics in general but I was wondering if
anyone would mind helping me out by suggesting a general
methodology....

We have some Affymetrix accessions which I eventually want to
understand in terms of protein domains and where those domains are on
their respective genes...

Now I think my head has stopped spinning from all the relationships
between the major public databases...but it's still not clear to me
how to do this...

I can get a GenBank accession for a particular Affymetrix accession.
From GenBank, I can get to a LocusLink LocusId. If that particular
LocusId has a protein RefSeq, I can get the protein domain information
(right from the LocusLink form). If I have an mRNA RefSeq, I
understand that it is 'blasted' against genomic contigs (which are
assembled from GenBank entries), and if I am lucky (how lucky do I
have to be?) I can get a contig.  What I would like is to get a contig
sequence that is annotated with protein domain information... it is
not clear to me whether or not the contigs are annotated with protein
domain data.

I think I could just grab the CDS entries, grab the NCBI Protein ID,
get the domain information, but how do I go back to the location of
the domain on the gene?
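
To make the question concrete, here is a rough Python sketch of the
mapping I imagine is needed. It assumes I already have the domain as a
1-based residue range on the protein, plus the coding parts of the
exons as forward-strand genomic intervals (for example from the CDS
feature of a contig record). The function names and the example
coordinates are made up for illustration, and reverse-strand genes are
not handled.

```python
def domain_to_cds(aa_start, aa_end):
    """Convert a 1-based inclusive residue range into a 1-based
    inclusive nucleotide range within the CDS (3 bases per residue)."""
    return (aa_start - 1) * 3 + 1, aa_end * 3


def cds_to_genomic(cds_start, cds_end, exons):
    """Map a CDS nucleotide range onto genomic coordinates.

    exons: coding exon intervals as (genomic_start, genomic_end),
    1-based inclusive, forward strand, in transcript order.
    Returns a list of genomic (start, end) segments; a domain that
    spans an intron comes back as more than one segment.
    """
    segments = []
    consumed = 0  # CDS bases covered by the exons seen so far
    for g_start, g_end in exons:
        length = g_end - g_start + 1
        exon_cds_start = consumed + 1
        exon_cds_end = consumed + length
        lo = max(cds_start, exon_cds_start)
        hi = min(cds_end, exon_cds_end)
        if lo <= hi:  # the domain overlaps this exon
            segments.append((g_start + (lo - exon_cds_start),
                             g_start + (hi - exon_cds_start)))
        consumed += length
    return segments


# Hypothetical example: a two-exon CDS and a domain at residues 8-15.
example_exons = [(101, 130), (201, 260)]
cds_range = domain_to_cds(8, 15)
print(cds_to_genomic(cds_range[0], cds_range[1], example_exons))
```

If that is roughly the right idea, then what I am really missing is a
reliable source for the exon coordinates of each CDS on the contig.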

A few other random questions that are bugging me, that I would love
answered are:

a) As far as I can tell, contig entries seem to be entire
chromosomes...am I wrong?
b) I am using Python and PostgreSQL to leech data from the LocusLink
flatfile...but in order to do anything slightly fancy with the
sequences I think I'd need some sequence libs...if there is an obvious
answer to this please let me know. In the meantime I'm going to hunt
around (or should I just stick with Perl?)...
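
To give a sense of what I mean by "slightly fancy", this is about the
level I could hand-roll in plain Python with no extra libraries: a
reverse complement and a translation using the standard genetic code.
It is only a sketch; it assumes upper-case A/C/G/T input, marks any
codon it does not recognise as "X", and ignores ambiguity codes and
alternative codon tables, which is exactly where a proper sequence
library would help.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def translate(cds):
    """Translate a DNA coding sequence codon by codon.

    Trailing bases that do not fill a codon are ignored; unknown
    codons become 'X' and stop codons become '*'.
    """
    usable = len(cds) - len(cds) % 3
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


print(reverse_complement("ATGC"))
print(translate("ATGGCCTAA"))
```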

Thanks in advance for any help you can provide...




More information about the Bio-www mailing list