[Protein-analysis] Re: Newbie question about microarray analysis
Austin P. So (Hae Jin)
nobody at nowhere.com
Mon May 29 23:54:16 EST 2006
Rex Eastbourne wrote:
> Thanks again for replying. The k-means algorithm should be a snap. But
> how do I convert the proteins, which are in the format
> "UPSP_SLDJK_HUMAN_P12182" to vectors that can be handled by the
> mathematical algorithm (i.e. what is the "distance" between two
> proteins)? Is there already a program that does this? (I understand
> there's something on the NCBI's website.)
So, if I understand the format of the data:
1. "UPSP_SLDJK_HUMAN_P12182" is just a name...say it is a row id.
2. with that name (i.e. in each row), you will have a series of data
points, each data point corresponding to the amount of protein found in
patient X (technically you don't have to know if they have the disease
or not).
3. each column (i.e. patient data) will therefore be a
(multidimensional) data vector, with each protein being an "axis".
          patient1  patient2  patient3  patient4
protein1         1        50        49         3
protein2         2        35        30         1
protein3        30        20        20        31
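As for the "distance" question, here is a minimal sketch in Python of how the table above turns into vectors. It assumes plain Euclidean distance, which is only one possible choice (correlation-based distances are also common for this kind of data), and the names `abundance`, `patient_vectors` and `euclidean` are made up for illustration.

```python
import math

# Rows are proteins, columns are patients (values copied from the table above).
abundance = {
    "protein1": [1, 50, 49, 3],
    "protein2": [2, 35, 30, 1],
    "protein3": [30, 20, 20, 31],
}
patients = ["patient1", "patient2", "patient3", "patient4"]

# Transpose the table: one vector per patient, one coordinate per protein.
patient_vectors = {
    name: [abundance[protein][i] for protein in abundance]
    for i, name in enumerate(patients)
}

def euclidean(u, v):
    """Straight-line distance between two equal-length vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Distance from patient1 to every other patient.
for name in patients[1:]:
    d = euclidean(patient_vectors["patient1"], patient_vectors[name])
    print(f"patient1 vs {name}: {d:.2f}")
```

With these toy numbers, patient1 sits much closer to patient4 than to patient2 or patient3, which is the kind of grouping a clustering algorithm would pick up.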
In this way you can apply k-means (or hierarchical) clustering to the
column "vectors".
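A rough sketch of what k-means on the column vectors could look like, written in plain Python so nothing needs installing. Several things here are assumptions on my part: k = 2, Euclidean distance, and seeding the centroids with the first two patients so the run is repeatable. The function names (`kmeans`, `euclidean`, `mean`) are illustrative only; a real analysis would more likely use an existing library and some normalisation of the data first.

```python
import math

# Patient column vectors from the example table (one coordinate per protein).
vectors = {
    "patient1": [1, 2, 30],
    "patient2": [50, 35, 20],
    "patient3": [49, 30, 20],
    "patient4": [3, 1, 31],
}

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mean(points):
    # Coordinate-wise average of a non-empty list of vectors.
    return [sum(coords) / len(points) for coords in zip(*points)]

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    move each centroid to the mean of its members, until labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = mean(members)
    return labels, centroids

names = list(vectors)
points = [vectors[n] for n in names]
# Assumption: k = 2, seeded with the first two patients for reproducibility.
labels, centroids = kmeans(points, [list(points[0]), list(points[1])])
clusters = dict(zip(names, labels))
print(clusters)
```

On the toy table this puts patient1 and patient4 in one cluster and patient2 and patient3 in the other. With real data the result depends on k and on the starting centroids, so it is worth rerunning with several different seeds.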
Note that you may not get anything useful out of this, since ultimately
your analysis is only as good as your data...
Austin