[Protein-analysis] Re: Newbie question about microarray analysis

Austin P. So (Hae Jin) nobody at nowhere.com
Mon May 29 23:54:16 EST 2006


Rex Eastbourne wrote:
> Thanks again for replying. The k-means algorithm should be a snap. But
> how do I convert the proteins, which are in the format
> "UPSP_SLDJK_HUMAN_P12182" to vectors that can be handled by the
> mathematical algorithm (i.e. what is the "distance" between two
> proteins)? Is there already a program that does this? (I understand
> there's something on the NCBI's website.)

So, if I understand the format of the data:

1. "UPSP_SLDJK_HUMAN_P12182" is just a name...say it is a row id.
2. with that name (i.e. in each row), you will have a series of data 
points, each data point corresponding to the amount of protein found in 
patient X (technically you don't have to know whether they have the 
disease or not).
3. each column (i.e. patient data) will therefore be a 
(multidimensional) data vector, with each protein being an "axis".

		patient1	patient2	patient3	patient4
protein1	1	50	49	3
protein2	2	35	30	1
protein3	30	20	20	31
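
To make the "distance" question concrete, here is a minimal sketch that 
turns the toy table above into one vector per patient and measures the 
Euclidean distance between two of them. The numbers are the made-up 
values from the table; the names `table`, `vectors` and `euclidean` are 
only illustrative, and Euclidean distance is an assumption on my part 
(other metrics may suit your data better).

```python
import math

# Toy values copied from the table above: rows are proteins, columns are patients.
table = {
    "protein1": [1, 50, 49, 3],
    "protein2": [2, 35, 30, 1],
    "protein3": [30, 20, 20, 31],
}
patients = ["patient1", "patient2", "patient3", "patient4"]

# Transpose the table: one vector per patient, one coordinate ("axis") per protein.
vectors = {
    name: [table[protein][i] for protein in table]
    for i, name in enumerate(patients)
}

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# patient1 and patient4 have similar profiles, so their distance is small;
# patient1 and patient2 differ a lot, so their distance is large.
print(euclidean(vectors["patient1"], vectors["patient4"]))
print(euclidean(vectors["patient1"], vectors["patient2"]))
```

With real data you would probably want to normalise or log-transform the 
abundances first, so that one highly abundant protein does not dominate 
the distance.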

In this way you can apply k-means (or hierarchical) clustering to the 
column "vectors", using an ordinary distance (e.g. Euclidean) between 
them.
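
As a rough illustration of clustering the patient vectors, here is a 
small k-means sketch in plain Python. It is not a tuned implementation: 
the function name `kmeans`, the choice of k=2, and the naive 
"first k points" initialisation are all assumptions made for this toy 
example, and the vectors are the made-up values from the table.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, max_iter=100):
    # Naive initialisation (an assumption for this sketch):
    # use the first k points as the starting centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# One vector per patient (columns of the toy table), one coordinate per protein.
patient_vectors = [
    [1, 2, 30],    # patient1
    [50, 35, 20],  # patient2
    [49, 30, 20],  # patient3
    [3, 1, 31],    # patient4
]

labels, centroids = kmeans(patient_vectors, k=2)
print(labels)  # patient1 and patient4 share one label, patient2 and patient3 the other
```

On real data the result depends on the starting centroids, so it is 
usual to run k-means several times with random starts and keep the best 
run.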

Note that you may not get anything out of it, since ultimately your 
analysis is only as good as your data...

Austin

