Comparing protein sequences.
SPAMFILTER-scott.coutts at med.monash.edu.au
Wed Dec 1 18:46:28 EST 2004
Stefek Borkowski wrote:
> Scott Coutts wrote:
>> You'd be better off finding both sequences (you can do this by simply
>> using a keyword search) and then doing an alignment of the two
>> sequences. You can do this on the web, using one of the 'clustal'
>> programs, or you can download a stand-alone version of clustal and
>> view your alignment using another downloadable program called
>> 'genedoc'. I don't have the web addresses on hand at the moment, but
>> you can easily find them with a google search.
> Thanks Scott for your quick answer. I just figured out that I can use
> the WWW module of BLAST called "BLAST 2 Sequences". However, I still
> have a problem interpreting the results. Would you care to comment on
> the following? I would like to know how to interpret the BLAST output
> when comparing two sequences with the BLAST 2 Sequences online module.
> The report goes as follows:
> Identities = 32/114 (28%), Positives = 53/114 (46%), Gaps = 1/114 (0%)
> I would say that the homology of the two proteins is equal to the value
> of "Identities", so it would be 28%.
Firstly, a technical point here... when you're talking about genes, you
should say 'similarity' rather than 'homology'. Either a gene is a
homolog of another, or it's not.
> What about the "Positives" then? I came across an estimate in the
> literature of the homology between the two proteins, stating that it is
> 36%. This seems to be roughly the arithmetic mean of "Identities" and
> "Positives": (28 + 46)/2 = 37%, which is close to the literature value.
> Is my reasoning correct? In other words, what is the recommended way of
> estimating the homology of two sequences from a BLAST report?
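As a rough check on the numbers in this report, the sketch below parses the summary line and recomputes each percentage. It assumes the line always has exactly the layout quoted above. That assumption rests only on this one example. All three ratios use the alignment length (114 columns) as the denominator, not the length of either full protein. The report prints 0% for 1/114 (about 0.88%), which suggests it truncates rather than rounds. That is an inference from these three numbers, not something checked against the BLAST source.

```python
import re

# Matches fields such as "Identities = 32/114 (28%)" in the BLAST summary line.
# Assumption: the layout is exactly as in the single line quoted in this thread.
STATS = re.compile(r"(Identities|Positives|Gaps) = (\d+)/(\d+) \((\d+)%\)")


def parse_stats(line):
    """Return {field: (count, alignment_length, printed_percent)}."""
    return {name: (int(n), int(length), int(pct))
            for name, n, length, pct in STATS.findall(line)}


line = "Identities = 32/114 (28%), Positives = 53/114 (46%), Gaps = 1/114 (0%)"
stats = parse_stats(line)

for name, (n, length, pct) in stats.items():
    exact = 100 * n / length        # exact percentage of the alignment length
    truncated = 100 * n // length   # integer division, i.e. rounded down
    print(f"{name}: {n}/{length} = {exact:.2f}% (printed {pct}%, truncated {truncated}%)")

# Positives include the identities, so the difference is the number of
# aligned positions that are similar but not identical.
print("similar but not identical:", stats["Positives"][0] - stats["Identities"][0])
```

Run as-is, it reports 28.07%, 46.49% and 0.88% against the printed 28%, 46% and 0%. It also reports 21 positions that are similar but not identical.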
I'm not sure what figure they were quoting, but if it is properly quoted
in the literature as a percentage, then it should include a statement of
whether it is identities or similarities (positives), the region over
which the count was obtained (if it's not mentioned, then usually it's the
You should read the documentation that comes with BLAST to understand
how it works. 'Identities' is the number of aligned amino acids that
are exactly the same, and 'Positives' is the number that are identical
or similar (i.e. they maintain similar properties, for example both
hydrophobic), so the Positives count includes the Identities. You
should also consider the E value you're given: it estimates how many
alignments scoring at least this well you would expect to find by
chance alone.
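The distinction between identities and positives can be sketched in a few lines. This is a simplified illustration, not how BLAST scores alignments. BLAST counts an aligned pair as positive when its substitution-matrix score (BLOSUM62 by default for proteins) is greater than zero. The `SIMILAR_GROUPS` table below is instead a hypothetical property grouping chosen for readability. The `expected_hits` helper shows the bit-score form of the Karlin-Altschul relation, E = m * n * 2^(-S'), for sequence lengths m and n and bit score S'. BLAST uses corrected (effective) lengths, so plugging in raw lengths will not exactly reproduce a reported E value. The aligned strings in the example are made up.

```python
# Hypothetical property groups, for illustration only (NOT BLOSUM62).
SIMILAR_GROUPS = [
    set("ILMV"),  # aliphatic, hydrophobic
    set("FWY"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("NQ"),    # amide side chains
    set("ST"),    # small, hydroxyl
]


def is_similar(a, b):
    """True if residues a and b fall in the same hypothetical group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def alignment_counts(top, bottom):
    """Count identities, positives and gaps over two aligned strings ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    identities = positives = gaps = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            identities += 1
            positives += 1  # an identical pair also counts as a positive
        elif is_similar(a, b):
            positives += 1
    return identities, positives, gaps, len(top)


def expected_hits(bit_score, m, n):
    """Karlin-Altschul E value in bit-score form: E = m * n * 2**(-bit_score)."""
    return m * n * 2 ** (-bit_score)


ident, pos, gaps, length = alignment_counts("MKVLIA-DE", "MRILVAGEE")
print(f"Identities = {ident}/{length}, Positives = {pos}/{length}, Gaps = {gaps}/{length}")
```

For the made-up pair above it prints `Identities = 4/9, Positives = 8/9, Gaps = 1/9`. In the bit-score form, each extra bit of score halves the E value.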
More information about the Microbio