Warning: TreeMap 1.0 has bugs in the randomisation test!
4 July 2000
There are two
bugs in the randomisation routine in TreeMap, which have surfaced
as a result of papers presented at the recent Glasgow
meeting on host-parasite cospeciation in August 1999, and a
workshop I gave at Sydney in June 2000. Because many of the tree
and interface code libraries used by TreeMap are shared with other
programs (such as TreeView)
and have changed dramatically in the five years since TreeMap was
written, it has been difficult to create a stop-gap release with
these bugs fixed. A new version of TreeMap (TreeMap 2) is currently being developed,
which will fix the randomisation bugs. This version uses a completely
different (and better) algorithm for reconstructing the history
of host-parasite assemblages ("jungles"). Until TreeMap 2 is released,
please interpret the results of TreeMap 1.0 statistical tests with
great caution.
My sincere
apologies about these bugs. TreeMap was my first program in C++
and my first one for the Macintosh, and it rather shows. My feeling
is that there will be few cases where the conclusions of a study
will be dramatically altered, but please read the following explanation
of the bugs to make up your own mind. The bugs affect BOTH the Macintosh
and Windows versions of the program in the same way.
Bug 1
The first bug
(found by Kevin Johnston) concerns the generation of random trees
using the Markovian (Yule process) model. TreeMap 1.0 does not generate
the correct null distribution of trees. The algorithm used in TreeMap
has two steps: (1) generate a random topology by randomly bifurcating
the tips of a growing binary tree, then (2) randomly assign taxon
labels to the tips of the tree. This last step was inadvertently
omitted in TreeMap 1.0, hence the Markovian trees generated are
only a subset of the actual distribution. This bug has two consequences.
The first is that the null distribution of random trees is incorrect.
Secondly, analyses run on different files for the same taxa may
generate different results. This would occur, for example, if the
same taxa occurred in different orders in the two files. A file
with the parasite tree (a,(b,(c,(d,e)))) would generate a different
distribution of random parasite trees from a file with the parasite
tree ((((e,d),c),b),a). However, different trees within the same
file would generate the same distribution. This bug does not affect
the "proportional-to-distinguishable" option.
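The intended two-step procedure can be sketched as follows. This is a minimal illustration in Python, not TreeMap's actual C++ code: the function names (yule_topology, random_yule_tree, to_newick), the shuffle_labels switch and the nested-list tree representation are all assumptions made for the example. Setting shuffle_labels=False mimics omitting step (2). In this particular sketch the last two labels in the input order then always end up as sister taxa, which illustrates how the order of taxa in a file could influence the trees generated.

```python
import random


def yule_topology(n_tips, rng):
    """Step 1: grow a rooted binary tree by bifurcating randomly chosen tips.

    A tip is a one-element list [label]; an internal node is [left, right].
    Returns the root and the list of tips (labels still unset).
    """
    root = [None]
    tips = [root]
    while len(tips) < n_tips:
        # Pick any current tip with equal probability and split it in two.
        tip = tips.pop(rng.randrange(len(tips)))
        left, right = [None], [None]
        tip[:] = [left, right]  # the tip becomes an internal node in place
        tips.extend([left, right])
    return root, tips


def random_yule_tree(labels, rng, shuffle_labels=True):
    """Steps 1 and 2: random topology, then random assignment of taxon labels."""
    root, tips = yule_topology(len(labels), rng)
    labels = list(labels)
    if shuffle_labels:
        rng.shuffle(labels)  # step 2, the step reported as omitted in TreeMap 1.0
    for tip, label in zip(tips, labels):
        tip[0] = label
    return root


def to_newick(node):
    """Render the nested-list tree as a parenthesised string."""
    if len(node) == 1:
        return str(node[0])
    return "(" + ",".join(to_newick(child) for child in node) + ")"


if __name__ == "__main__":
    rng = random.Random(2000)  # fixed seed only so that runs are repeatable
    taxa = ["a", "b", "c", "d", "e"]
    for _ in range(3):
        print(to_newick(random_yule_tree(taxa, rng)))
```

Running the same loop with shuffle_labels=False and the taxa listed in a different order gives a different set of possible labelled trees, which is the file-order dependence described above.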
Advice
Do not use
the Yule (Markovian) model at the present time. Redo any analyses
using the proportional-to-distinguishable model (but see the next
bug).
Bug 2
The second
bug (found by Jason Taylor) affects randomisation tests using either
the Yule or proportional-to-distinguishable models in the same way.
Because of a programming error the null distribution of the number of
cospeciation events may be slightly biased towards rejecting the null
hypothesis (that the host and parasite trees are no more similar than
random) if the host and parasite phylogenies are small. For
each pair of host and parasite trees TreeMap would always require
a minimum of one host switch, so the case of no host switching
would never appear in the distribution. For example, given a pair of
five-taxon trees that match perfectly, we would expect the maximum of
four possible cospeciation events to occur 1 in 105 times (there are
105 possible rooted trees for five taxa), so in
100 randomisations we expect to see roughly a single instance of
four cospeciations. In fact, due to the bug we never get this result
in TreeMap. The distribution of numbers of cospeciations
that TreeMap displays in the Histogram window will therefore be truncated
at the right end because it lacks the highest possible value.
In practice
this is unlikely to be a serious problem, as usually the reconstruction
can be improved by postulating host switches, unless the trees are
nearly (or actually) identical. Unless the number of taxa is small
(less than six), this bug should not affect the outcome of the test
because the maximum value would rarely occur more than once (if
at all) in a set of randomisations. For example, for trees with
eight taxa each, the maximum possible number of cospeciations is
seven, and this would only occur if the host and parasite trees
were identical. Given that there are 135,135 possible rooted trees
for eight taxa, even 10,000 randomisations would rarely yield trees
with seven cospeciations.
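The tree counts quoted above follow from the number of rooted binary trees on n labelled taxa, (2n-3)!! = 1 x 3 x 5 x ... x (2n-3). The short Python sketch below reproduces those counts and estimates how often the maximum would turn up in a set of randomisations. It rests on two assumptions that should be kept in mind: every labelled tree is taken to be equally likely (which is not the case under the Yule model), and the maximum is taken to arise only when the random tree is identical to the other tree. The function names are made up for the example.

```python
from math import prod


def num_rooted_binary_trees(n_taxa):
    """Number of rooted binary trees on n labelled taxa: (2n-3)!!."""
    if n_taxa < 2:
        return 1
    # Product of the odd numbers 1, 3, 5, ..., 2n-3.
    return prod(range(1, 2 * n_taxa - 2, 2))


def chance_of_identical_tree(n_taxa, n_randomisations):
    """Probability that at least one random tree equals a given tree,
    assuming all labelled rooted binary trees are equally likely."""
    p = 1 / num_rooted_binary_trees(n_taxa)
    return 1 - (1 - p) ** n_randomisations


if __name__ == "__main__":
    print(num_rooted_binary_trees(5))   # 105
    print(num_rooted_binary_trees(8))   # 135135
    print(chance_of_identical_tree(5, 100))
    print(chance_of_identical_tree(8, 10000))
```

Under these assumptions the chance of seeing the maximum at least once is about 0.62 for five taxa in 100 randomisations, and about 0.07 for eight taxa in 10,000 randomisations.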
Advice
Don't use the
randomisation test in cases where there are fewer than six parasite
taxa. If your test did not reject the hypothesis that the host and
parasite trees are no more similar than random (i.e., no cospeciation)
then your result is unlikely to be significantly affected by this
bug. If your test did reject the hypothesis of random trees (i.e.,
consistent with cospeciation) then the result is likely to be unaffected
by the bug unless you have few taxa (in which case the trees would
have to be almost identical to reject the null hypothesis anyway).
What am I doing about it?
For the reasons
given above I have been unable to provide a version of TreeMap 1
with the bug fixed, so I am concentrating on getting TreeMap 2 ready
as soon as possible.
How does this affect your results?
The test results
most likely to be at risk are those using the Yule (Markovian) model.
You should repeat the test using the proportional-to-distinguishable
model. If you have concerns about a specific analysis please
contact me directly.