[Protein-analysis] Re: Newbie question about microarray analysis

Rex Eastbourne rex.eastbourne at gmail.com
Tue May 30 13:43:56 EST 2006


Hi Austin,

I just have a plain list of 200 proteins, without data from the
experiment. I need to cluster the proteins by their inherent
characteristics (function, ancestry). I used the protein database on
the NCBI website to get the sequences. Now I want to take all 200
sequences and get a pairwise measure of how similar each one is to
every other. I figure this requires software that lets me enter all
the proteins at once and see how they're related. I found
ProtoNet, but it seems you can only enter one protein and explore its
specific cluster. Are there any other tools for this I might not be
aware of?
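
To make the all-against-all comparison above concrete, here is a minimal
sketch in Python. It is only an illustration under loud assumptions: the
three sequences and their names are made up, the distance is a crude
alignment-free measure (Jaccard distance over shared 3-letter substrings)
standing in for a proper alignment score such as one from BLAST, and the
0.5 threshold is arbitrary.

```python
from itertools import combinations

def kmers(seq, k=3):
    """Return the set of overlapping k-length substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard_distance(a, b, k=3):
    """1 - shared k-mers / all k-mers; 0.0 means identical k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)

def distance_matrix(seqs, k=3):
    """Symmetric dict-of-dicts of pairwise distances keyed by name."""
    names = list(seqs)
    dist = {n: {n: 0.0} for n in names}
    for x, y in combinations(names, 2):
        dist[x][y] = dist[y][x] = jaccard_distance(seqs[x], seqs[y], k)
    return dist

def single_linkage(dist, threshold):
    """Group names so that any pair closer than `threshold` is linked."""
    clusters = []
    for name in dist:
        linked = [c for c in clusters
                  if any(dist[name][m] < threshold for m in c)]
        merged = {name}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters

# Hypothetical toy sequences, NOT real database entries.
seqs = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQ",
    "protB": "MKTAYIAKQRQISFVKAHFSRQ",  # protA with one residue changed
    "protC": "GSHMLEDPVDAFWWGGQNNTRAA",
}

dist = distance_matrix(seqs)
clusters = single_linkage(dist, threshold=0.5)
print(clusters)
```

With 200 real sequences the same structure applies: fill a 200 x 200
distance matrix, then hand it to whatever clustering method you prefer.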

I'm sorry to keep asking you questions like this -- just referring me
to a website that explains this would be greatly appreciated.

Thank you,

Rex


Austin P. So (Hae Jin) wrote:
> Rex Eastbourne wrote:
> > Thanks again for replying. The k-means algorithm should be a snap. But
> > how do I convert the proteins, which are in the format
> > "UPSP_SLDJK_HUMAN_P12182" to vectors that can be handled by the
> > mathematical algorithm (i.e. what is the "distance" between two
> > proteins)? Is there already a program that does this? (I understand
> > there's something on the NCBI's website.)
>
> So, if I understand the format of the data:
>
> 1. "UPSP_SLDJK_HUMAN_P12182" is just a name...say it is a row id.
> 2. with that name (i.e. in each row), you will have a series of data
> points, each data point corresponding to the amount of protein found
> in patient X (technically you don't have to know if they have the
> disease or not).
> 3. each column (i.e. patient data) will therefore be a
> (multidimensional) data vector, with each protein being an "axis".
>
>           patient1  patient2  patient3  patient4
> protein1         1        50        49         3
> protein2         2        35        30         1
> protein3        30        20        20        31
>
> In this way you can apply k-means (or hierarchical) clustering to the
> column "vectors".
>
> Note that you may not get anything meaningful, since ultimately your
> analysis is only as good as your data...
> 
> Austin
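
As an illustration of the column-vector idea in the quoted message, the
sketch below runs a plain k-means on the four patient columns of the
example table. It is a minimal sketch, not a recommended pipeline: the
choice of k = 2 and the seeding of centroids with the first k patients are
assumptions made only to keep the example small and deterministic.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means. The first k points seed the centroids,
    an arbitrary deterministic choice made for this illustration."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels

# Rows are proteins, columns are patients, as in the table above.
table = {
    "protein1": [1, 50, 49, 3],
    "protein2": [2, 35, 30, 1],
    "protein3": [30, 20, 20, 31],
}

# Transpose so that each patient becomes one vector with a protein per axis.
patients = list(zip(*table.values()))
labels = kmeans(patients, k=2)
print(labels)  # -> [0, 1, 1, 0]
```

Patients 1 and 4 land in one cluster and patients 2 and 3 in the other,
which matches what the numbers in the table suggest by eye.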


