Comparing protein sequences.

Scott Coutts SPAMFILTER-scott.coutts at med.monash.edu.au
Wed Dec 1 18:46:28 EST 2004


Stefek Borkowski wrote:

> Scott Coutts wrote:
> 
>>
>> You'd be better off finding both sequences (you can do this by simply
>> using a keyword search) and then doing an alignment of the two
>> sequences. You can do this on the web, using one of the 'clustal'
>> programs, or you can download a stand-alone version of clustal and
>> view your alignment using another downloadable program called
>> 'genedoc'. I don't have the web addresses on hand at the moment, but
>> you can easily find them with a google search.
>>
> Thanks, Scott, for your quick answer. I just figured out that I can use 
> the WWW module of BLAST called "BLAST 2 Sequences". However, I still 
> have a problem interpreting the result. Would you care to comment on 
> the following, please?
> 
> I would like to know how the BLAST report should be interpreted when 
> comparing two sequences with the BLAST 2 Sequences online module. The 
> report goes as follows:
> Identities = 32/114 (28%), Positives = 53/114 (46%), Gaps = 1/114 (0%)
> I would say that the homology of the two proteins is equal to the value 

Firstly, a technical point here... when you're talking about genes, you 
should say 'similarity' rather than 'homology'. Either a gene is a 
homolog of another, or it's not.

   http://homepage.usask.ca/~ctl271/857/def_homolog.shtml
   http://www.biomedcentral.com/news/20040309/01

But anyway...

>
> of "Identities", so it would be 28%. What about the "Positives" then? I 
> came across an estimate in the literature of the homology between the 
> 2 proteins, stating that it is equal to 36%. This seems to be more or 
> less the arithmetic mean of "Identities" and "Positives", namely 
> (28 + 46)/2 = 37%, which seems close to the literature value. Is my way 
> of thinking correct, or not necessarily? In other words, what is the 
> recommended algorithm for estimating the homology of two sequences on 
> the basis of a BLAST report?

I'm not sure what figure they were quoting, but if it is properly quoted 
in the literature as a percentage, then it should include a statement of 
whether it is identities or similarities (positives), and of the region 
over which the count was obtained (if it's not mentioned, then usually 
it's the whole protein).
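
As a rough check on the numbers in the quoted report, here is a short 
Python sketch (written for this note, not part of BLAST) that recomputes 
the percentages from the raw counts and labels each one with what it 
measures and the region it covers. It assumes the displayed percentage is 
the count divided by the alignment length, truncated to a whole number; 
that is consistent with 1/114 being shown as 0%. The 228-position region 
is purely hypothetical and only shows why the region has to be stated.

```python
# Recompute the percentages from the quoted BLAST 2 Sequences line:
# "Identities = 32/114 (28%), Positives = 53/114 (46%), Gaps = 1/114 (0%)"
# Assumption: the displayed value is count * 100 // length (truncated),
# which matches 1/114 being reported as 0%.

def percent(count: int, length: int) -> int:
    """Truncated integer percentage of `count` over `length` positions."""
    return count * 100 // length

alignment_length = 114  # the counts refer to the aligned region only
identities = 32
positives = 53
gaps = 1

report = {
    "identity": percent(identities, alignment_length),
    "similarity (positives)": percent(positives, alignment_length),
    "gaps": percent(gaps, alignment_length),
}
for label, value in report.items():
    print(f"{label}: {value}% over {alignment_length} aligned positions")

# The averaging suggested in the question: (28 + 46) / 2.
# This is neither an identity nor a similarity figure.
mean_of_both = (report["identity"] + report["similarity (positives)"]) / 2
print(f"mean of identity and similarity: {mean_of_both}%")

# The same 32 identities counted over a longer region give a lower figure.
hypothetical_region = 228  # hypothetical, e.g. a longer full-length protein
print(f"identity over {hypothetical_region} positions: "
      f"{percent(identities, hypothetical_region)}%")
```

The same count can therefore be quoted as quite different percentages 
depending on the denominator, which is why the region matters.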

You should read the documentation that comes with BLAST to understand 
how it works. 'Identities' indicates the number of amino acids that are 
exactly the same, and 'positives' indicates the number that are similar 
(i.e. they maintain similar properties, for example, both are 
hydrophobic). You should also consider the E value that you're given.
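
To make the difference between identities and positives concrete, here 
is a minimal Python sketch that counts both over a gapped pairwise 
alignment. BLAST itself counts a position as positive when its score in 
the substitution matrix (BLOSUM62 by default for protein comparisons) is 
greater than zero, so identical residues are always positives too. The 
property groups below are a hypothetical stand-in for that matrix, so 
the counts will not match a real BLAST report exactly, and the aligned 
sequences in the usage lines are made up for illustration.

```python
# Count identities, positives and gaps in a gapped pairwise alignment.
# Assumption: SIMILAR_GROUPS is a simplified, hypothetical substitute for
# a real substitution matrix such as BLOSUM62.

SIMILAR_GROUPS = [  # hypothetical grouping by shared side-chain properties
    set("ILVM"),  # hydrophobic, aliphatic
    set("FWY"),   # aromatic
    set("KRH"),   # positively charged
    set("DE"),    # negatively charged
    set("NQ"),    # amide
    set("ST"),    # small, hydroxyl
]

def is_similar(a: str, b: str) -> bool:
    """True if both residues fall in the same property group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)

def count_matches(aln1: str, aln2: str) -> tuple[int, int, int, int]:
    """Return (identities, positives, gaps, alignment_length)."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have the same length")
    identities = positives = gaps = 0
    for a, b in zip(aln1.upper(), aln2.upper()):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            identities += 1
            positives += 1  # identical residues also count as positives
        elif is_similar(a, b):
            positives += 1
    return identities, positives, gaps, len(aln1)

ident, pos, n_gaps, length = count_matches("MKVLI-D", "MRVIIAE")
print(f"Identities = {ident}/{length}, Positives = {pos}/{length}, "
      f"Gaps = {n_gaps}/{length}")
# prints "Identities = 3/7, Positives = 6/7, Gaps = 1/7"
```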


Scott.



More information about the Microbio mailing list