ClustalX's NJ Plot explanation? - DND.gif (0/1)
keith at thale.nott.ac.uk
Tue Aug 15 06:48:41 EST 2000
On Mon, 14 Aug 2000, Asteras Amaliadas wrote:
> Hi there,
> I used ClustalX to do multiple alignment of a few proteins. NJ Plot (a
> program which is distributed with ClustalX) produced the attached
> tree. I would be indebted if you could explain to me what the
> distances of the branches mean (what is this 0.05?) or if you could
> point out a relevant reference.
Whilst I can't find any attachment to your post, I can (hopefully) explain
a bit about the numbers you have seen.
Branches on an NJ tree reflect the evolutionary distance between the
sequences in question. This distance is usually inferred from the
'observed' differences between sequences and then adjusted using
Kimura's correction for multiple hits. That is, very divergent sequences
have probably undergone more changes than you can actually see (parallel
substitutions and other multiple hits at the same site).
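To make the idea of a correction concrete, here is a minimal Python sketch of Kimura's two-parameter correction for aligned DNA, which counts transitions and transversions separately. The function name is hypothetical and this is not ClustalX's code; for proteins I believe ClustalX uses a different, protein-specific Kimura formula.

```python
import math

def kimura_2p_distance(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned DNA sequences.

    Only sites where both sequences have A, C, G or T are compared, so gaps
    and ambiguity codes are skipped.
    """
    purines = {"A", "G"}
    transitions = transversions = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps ('-') and ambiguous bases
        sites += 1
        if a == b:
            continue
        if (a in purines) == (b in purines):
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # observed proportion of transitions
    q = transversions / sites  # observed proportion of transversions
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)

seq_a = "A" * 100
seq_b = "G" * 5 + "A" * 95  # 5 transitions in 100 sites: observed 0.05
print(kimura_2p_distance(seq_a, seq_b))  # slightly above 0.05
```

The corrected value is always at least the observed proportion, and the gap widens as the sequences become more divergent.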
The branch lengths are then (I think) expressed as the number of
substitutions per site: nucleotide substitutions per nucleotide site for
DNA, or amino acid replacements per residue for proteins. So if you compare
two sequences whose alignment covers 100 ungapped sites and get a distance
of 0.05, you would expect 5 substitutions to have occurred since the
sequences diverged. Remember, though, that this is the total amount of
change between the two sequences; on a tree it is divided among the
branches that connect them.
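The arithmetic above is just distance multiplied by the number of sites compared. A minimal sketch (hypothetical helper names) that also shows how the observed, uncorrected proportion of differences is counted from two aligned sequences, skipping gapped columns:

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of ungapped aligned sites at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)

def expected_substitutions(distance: float, sites: int) -> float:
    """Distance in substitutions per site times the number of sites compared."""
    return distance * sites

print(p_distance("MKVLA", "MKILA"))        # 1 difference in 5 sites: 0.2
print(expected_substitutions(0.05, 100))   # 5.0
```

Note that a corrected distance (as in Kimura's method) gives a slightly larger expected count than the raw number of differences you can see.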
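E.g. on an unrooted tree, assume that the distance between two sequences
A and B is 0.05:
A -------------------- B
Same sequences shown on a rooted tree, where the 0.05 is split between the
two branches below the root (0.025 each if the root sits at the midpoint):
    +---------- A
----|
    +---------- B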
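One way to see that the rooted and unrooted drawings carry the same information is to add up branch lengths along the path between two leaves. This sketch assumes a tree stored as a plain list of (node, node, length) edges (a hypothetical representation, not ClustalX's .dnd file format) and, for the rooted case, a root placed at the midpoint so each branch is 0.025:

```python
from collections import defaultdict

def path_length(edges, start, end):
    """Sum of branch lengths along the unique path between two tree nodes.

    `edges` is a list of (node, node, branch_length) tuples.
    """
    graph = defaultdict(list)
    for u, v, length in edges:
        graph[u].append((v, length))
        graph[v].append((u, length))
    # Depth-first search; in a tree there is exactly one path between nodes.
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == end:
            return dist
        for neighbour, length in graph[node]:
            if neighbour != parent:
                stack.append((neighbour, node, dist + length))
    raise ValueError(f"{end!r} is not reachable from {start!r}")

unrooted = [("A", "B", 0.05)]
rooted = [("root", "A", 0.025), ("root", "B", 0.025)]
print(path_length(unrooted, "A", "B"))  # 0.05
print(path_length(rooted, "A", "B"))    # 0.05
```

Rooting adds a node but does not change the total A-to-B distance; where the root goes only changes how that total is shared between the two branches.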
For sequences that have not diverged much, I have seen people refer to
these branch lengths as '%divergence', i.e. A and B in the above example
are 5% different. You can get into problems if you lean on this too
heavily, because the corrected distance grows faster than the observed
percentage difference as sequences diverge, but it works as a rule of
thumb.
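To see where the '%divergence' rule of thumb breaks down, this sketch compares the observed percentage of differing residues with Kimura's (1983) empirical protein distance, d = -ln(1 - p - 0.2p^2). I believe this is the formula ClustalX applies to proteins when its multiple-substitution correction is switched on (with a lookup table for very divergent pairs), but treat that as an assumption; the function name is hypothetical.

```python
import math

def kimura_protein_distance(p: float) -> float:
    """Kimura's (1983) empirical correction for a protein p-distance.

    p is the observed proportion of differing residues (valid up to ~0.85).
    """
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0:
        raise ValueError("p too large for this approximation")
    return -math.log(inner)

for percent in (1, 5, 10, 20, 40, 60):
    p = percent / 100
    d = kimura_protein_distance(p)
    print(f"{percent:>3}% different -> corrected distance {d:.3f}")
```

At a few percent difference the corrected distance is barely larger than the percentage suggests; at 60% difference it is already more than 1 substitution per site, well above the naive 0.6.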
The NJ method was developed by Saitou and Nei (1987, Mol. Biol. Evol. 4,
406-425), though branch length measurements have probably been around a
lot longer.
A general warning: very short branch lengths on a tree are often
artifacts of the tree building program and don't reflect a branch in the
'real' (unknown) evolutionary tree.
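On the point about short branches: in neighbor joining the branch lengths are estimated from the distance matrix, and the formula is not constrained to give positive values, so noise in the distances can produce branches at or even below zero. A minimal textbook NJ sketch (the Saitou and Nei method with the standard Studier and Keppler update formulas; not ClustalX's code, and the function name and toy matrix are made up for illustration):

```python
def neighbor_joining(names, dist):
    """Textbook neighbor joining on a symmetric dict-of-dicts distance matrix.

    Returns a list of (child, parent, branch_length) edges; internal nodes
    get made-up labels node0, node1, ...
    """
    d = {a: dict(dist[a]) for a in names}  # working copy we can extend
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[a]: total distance from a to every other active node
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b)
        best_q, a, b = None, None, None
        for i in range(n):
            for j in range(i + 1, n):
                x, y = nodes[i], nodes[j]
                q = (n - 2) * d[x][y] - r[x] - r[y]
                if best_q is None or q < best_q:
                    best_q, a, b = q, x, y
        new = f"node{counter}"
        counter += 1
        # Branch lengths from a and b to their new common ancestor
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges += [(a, new, len_a), (b, new, len_b)]
        # Distances from the new node to every remaining node
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # connect the last two nodes
    return edges

# Made-up, deliberately non-tree-like distances (they break the triangle inequality).
toy = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 3.0},
    "C": {"A": 1.0, "B": 3.0},
}
for child, parent, length in neighbor_joining(["A", "B", "C"], toy):
    print(child, parent, round(length, 3))
```

With this toy matrix the branch to A comes out at -0.5, even though the path lengths between the three sequences still add up to the input distances. Programs differ in how they report such values, which is one more reason to treat very short branches with suspicion.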
Hope this helps. If you want to know more, I'd refer you to the excellent
book 'Molecular Evolution' by Wen-Hsiung Li.
~ Keith Bradnam - Developer, Arabidopsis Genome Resource (AGR)
~ Nottingham Arabidopsis Stock Centre - http://nasc.nott.ac.uk/
~ University Park, University of Nottingham, NG7 2RD, UK
~ Tel: (0115) 951 3091