Out of Africa: another mistake, this time by me

arlin at ac.dal.ca arlin at ac.dal.ca
Mon Nov 18 16:57:18 EST 1991

In article <robison.690432470 at ribo>, robison at ribo.harvard.edu (Keith Robison) writes:
> In another posting arlin at ac.dal.ca writes:
>>...even if we could treat population migrations as
>>character-state changes in lineages, we would not come up with the
>>numbers that Keith Robison derives. A parsimony analysis is carried
>>out by using allowable character state transitions to explain a
>>distribution of observed character states within a pre-specified
>>phylogeny. In the analysis that Keith Robison is presenting, the
>>observed character states are geographic locations (Africa or
>>not-Africa) of the bearers of mtDNA types, and the phylogeny is the
>>supposed phylogeny of the mtDNA types.  The observed character states
>>for extant lineages are given on p. 1505 of the September 27 issue of
>>Science (Vigilant, et al., 1991), along with the tree.  The arguments
>>below will not make sense unless you are looking at the diagram on p.
>>1505.  The question that Keith is trying to answer is this: how many
>>character state changes (Africa-to-not-Africa or not-Africa-to-Africa)
>>does it take to explain the observed character states if A) the
>>ancestral character state was *Africa* or B) *not Africa*.  His
>>answers are A) 12 and B) 28.
>>The first answer is correct but the second is incorrect.  Looking at
>>the tree on p. 1505, it is easy to see that if the ancestral state was
>>*Africa*, then changes to *not-Africa* are needed for 1) type 23; 2)
>>type 28; 3) the ancestor of 49 & 50; 4) type 58; 5) the ancestor of
>>types 74 through 135.  In addition, many descendants of the ancestor
>>of types 74-135 went back to Africa, so we must postulate
>>back-migrations for 6) type 76; 7) the ancestor of types 77 & 78; 8)
>>type 83; 9) type 100; 10) type 103; 11) the ancestor of types 105-107;
>>and 12) type 127.  Keith seems to have correctly identified all of
>>these events (at least he came up with the right total of migrations
>>and back-migrations).
>>If the ancestral state was *not-Africa*, then changes to *Africa* must
>>be postulated for 1) the ancestor of types 1-9; and 2) the ancestor of
>>types 10-135.  Back migrations will then have to be postulated for 3)
>>type 23; 4) type 28; 5) the ancestor of 49 & 50; 6) type 58; 7) the
>>ancestor of types 74 through 135. In addition, many descendants of the
>>ancestor of types 74-135 went back to Africa, so we must postulate
>>migrations for 8) type 76; 9) the ancestor of types 77 & 78; 10) type
>>83; 11) type 100; 12) type 103; 13) the ancestor of types 105-107; and
>>14) type 127.  [these numbers should look familiar-- see previous
>>paragraph].  So the parsimony tally for this hypothesis should be 14,
>>not 28, as Keith Robison suggested (Keith seems not to have recognized
>>the necessity of event #7, and so missed finding the most 
>>parsimonious solution for the out-of-not-Africa hypothesis). 
> Whoa! Take a look at the tree for 1-9.  The only common ancestor of all of
> these terminal nodes is the ROOT OF THE TREE!  

OOPS!!  Thanks for pointing this out, Keith.  In my haste I misread the 
tree, and when I wrote "1-9" and "10-135" in the above paragraphs, I 
really should have said "1-6" and "7-135."  Correcting this error does 
not change the assignments for the most parsimonious solution, nor does
it alter the force of my argument: 12 "migrations" are needed for the
out-of-Africa hypothesis and 14 "migrations" for the out-of-not-Africa
hypothesis.

Keith was astute in noting my mistake, but he then misconstrues its import:
>You could salvage your 
> argument by splitting 1-9 into 1-6 and 7-9.  However, by doing so you have
> won your argument in a way O.Henry would love -- you have constructed an
> alternative scenario in which the root of the tree is not in Africa but
> in which all paths through the tree at one time pass through Africa. 
> Put another way, under such a scenario the common ancestor of all humans
> was not African, but all humans have an African ancestor.

My argument is not really changed: as noted above, I simply misidentified
1-9 and 10-135 as the primary clades, whereas they really should be 1-6 and
7-135.

Yes, Keith, the most parsimonious solution for the out-of-not-Africa
hypothesis is such that all paths from the root to an external node pass 
through the character state "Africa."  This is not a clear indication that
the out-of-not-Africa hypothesis is less parsimonious than the out-of-Africa
hypothesis: parsimony evaluates a hypothesis, not according to whether one
finds it humorous or not, but according to the minimum number of evolutionary
events necessary to explain a distribution of character states under the 
hypothesis.  We both seem to agree that the parsimony solution for
out-of-Africa is 12 events, and the solution for the out-of-not-Africa
hypothesis is something more than 12 (i.e., something less parsimonious).
I submit that the most parsimonious solution for out-of-not-Africa is 14 
events, and that 28 is an incorrect answer, or an answer based on some other
method than parsimony.  According to parsimony, the out-of-not-Africa
hypothesis is only 14/12ths more humorous than the out-of-Africa hypothesis.
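
To make the counting rule concrete, here is a minimal sketch in Python of
how one might tally the fewest state changes on a fixed tree when the root
state is specified in advance.  The tree below is a made-up toy, NOT the
tree on p. 1505, and the names (cost_table, min_changes, toy_tree) are my
own inventions for illustration.  It assumes each change costs 1 in either
direction.

```python
# Sketch only: minimum number of state changes on a fixed rooted tree,
# given a fixed root state. Each change (A->N or N->A) is assumed to cost 1.

INF = float("inf")
STATES = ("A", "N")  # A = Africa, N = not-Africa


def cost_table(node):
    """Return {state: fewest changes below `node` if `node` has that state}."""
    if isinstance(node, str):
        # A tip: its observed state costs nothing, any other state is impossible.
        return {s: (0 if s == node else INF) for s in STATES}
    child_tables = [cost_table(child) for child in node]
    # For each candidate state at this node, let every child pick its
    # cheapest state, paying 1 extra when the child's state differs.
    return {
        s: sum(min(t[c] + (c != s) for c in STATES) for t in child_tables)
        for s in STATES
    }


def min_changes(tree, root_state):
    """Fewest changes needed when the root is forced to be `root_state`."""
    return cost_table(tree)[root_state]


# Hypothetical toy tree: tips are strings, internal nodes are tuples.
toy_tree = (("A", "A"), ("A", (("N", "N"), ("N", "A"))))

print(min_changes(toy_tree, "A"))  # root assumed African
print(min_changes(toy_tree, "N"))  # root assumed not-African
```

On this toy tree the two tallies are 2 and 3: forcing the root to
not-Africa costs one extra change, not a doubling, because the cheapest
solution simply moves to Africa just below the root.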

The hypergeometric test:

This statistical test was chosen for its ostentatious title, and not because
it is apt for the purpose.  Here's why:

Imagine, Keith, that the root of the tree on p.1505 has been removed. Now, 
pin the tree up to your wall and throw a dart at it.  Place the root 
nearest the dart, and assess the character of successive nodes near the root.  
You are testing the null distribution of successive African or not-African
clades in the proximity of a randomly-placed root.  Does this fit the
hypergeometric distribution?  You will find that it does not-- it is obvious that
the nodes are already non-randomly related: African nodes (whether they are
near the root or not) tend to be clustered, and so do non-African nodes.
The hypergeometric test is not applicable, because the null expectation does
not fit the hypergeometric distribution.  This is why this test (you are 
correct in pointing out that it is a valid test used in statistics) is not
used for evaluating trees.   
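
A crude numerical illustration of the point, as a Python sketch.  This is a
one-dimensional stand-in for the dart experiment, not an analysis of the
real tree: the 20 tips, the 10/10 split, and the run length of 3 are all
made-up numbers, and the function names are hypothetical.  The tips are
laid out in tree order so that like states sit together, and the "dart"
picks a random starting tip and looks at the next few tips in order.

```python
from math import comb


def hypergeom_all_african(n_tips, n_african, k):
    """P(k tips drawn at random without replacement are all African)."""
    return comb(n_african, k) / comb(n_tips, k)


def window_all_african(labels, k):
    """Fraction of runs of k adjacent tips that are all African."""
    windows = [labels[i:i + k] for i in range(len(labels) - k + 1)]
    return sum(all(x == "A" for x in w) for w in windows) / len(windows)


# Hypothetical clustered layout: 10 African tips, then 10 non-African tips.
clustered = "A" * 10 + "N" * 10

p_null = hypergeom_all_african(20, 10, 3)    # what the hypergeometric predicts
p_clustered = window_all_african(clustered, 3)  # what adjacency actually gives

print(p_null, p_clustered)
```

With these toy numbers the hypergeometric null gives 120/1140 (about 0.11),
while runs of adjacent tips are all African 8 times out of 18 (about 0.44).
Neighbouring tips are not independent draws, so a small hypergeometric
p-value says little about where the root is.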

I hope we can settle the minor dispute about the parsimony numbers, and then
ask i) how sure can we be of the tree? and ii) should we really be using the
parsimony method as applied in the above example?

Arlin Stoltzfus

Arlin at ac.dal.ca
