We introduce a mechanism for analytically deriving upper bounds on the maximum likelihood for genetic sequence data on sets of phylogenies. A simple ‘partition’ bound is introduced for general models. Tighter bounds are developed for the simplest model of evolution, the two-state symmetric model of nucleotide substitution under the molecular clock. This follows earlier theoretical work, which analytic complexity has restricted to this model. A weakness of current numerical computation is that reported ‘maximum likelihood’ results cannot be guaranteed, either for a specified tree (because of the possibility of multiple maxima) or over the full tree space (because the computation is intractable for large sets of trees). The bounds we develop here can be used to conclusively eliminate large proportions of tree space in the search for the maximum likelihood tree. This is vital to the development of a branch and bound search strategy for identifying the maximum likelihood tree. We report the results of a simulation study of approximately 10^6 data sets generated on clock-like trees with five leaves. In each trial, the likelihood value of one specific instance of a parameterised tree is compared with the bound determined for each of the 105 possible rooted binary trees. The proportion of trees eliminated from the search for the maximum likelihood tree ranged from 92% to almost 98%, indicating a computational speed-up factor of between 12 and 44.
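The screening step described above can be illustrated with a short sketch. This is not the authors' implementation: the function names, the use of log-likelihoods, and the bound values in the usage example are hypothetical. The speed-up estimate assumes that run time is dominated by full likelihood optimisation of the trees that survive screening, and it ignores the cost of computing the bounds themselves. The sketch also checks the tree count: there are (2n-3)!! rooted binary trees on n labelled leaves, which gives 1 × 3 × 5 × 7 = 105 for n = 5.

```python
from math import prod


def num_rooted_binary_trees(n_leaves: int) -> int:
    """Number of rooted binary trees on n labelled leaves, (2n-3)!!."""
    if n_leaves < 1:
        raise ValueError("need at least one leaf")
    # Product of the odd numbers 1 * 3 * ... * (2n-3); the empty product is 1.
    return prod(range(1, 2 * n_leaves - 2, 2))


def surviving_trees(upper_bounds: dict, reference_log_lik: float) -> list:
    """Keep only trees whose upper bound could still beat the reference.

    upper_bounds maps a (hypothetical) tree label to an upper bound on the
    maximum log-likelihood attainable on that tree. reference_log_lik is the
    log-likelihood of one concrete parameterised tree. A tree whose bound is
    below the reference cannot be the maximum likelihood tree, so it is
    eliminated; ties are kept to stay conservative.
    """
    return [tree for tree, bound in upper_bounds.items() if bound >= reference_log_lik]


def speedup_factor(fraction_eliminated: float) -> float:
    """Assumed speed-up if only the surviving (1 - f) of trees are optimised."""
    if not 0.0 <= fraction_eliminated < 1.0:
        raise ValueError("fraction_eliminated must lie in [0, 1)")
    return 1.0 / (1.0 - fraction_eliminated)


if __name__ == "__main__":
    print(num_rooted_binary_trees(5))  # 105
    # Hypothetical bounds for three trees, compared with a reference of -121.0.
    bounds = {"T1": -120.0, "T2": -135.5, "T3": -118.2}
    print(surviving_trees(bounds, -121.0))  # ['T1', 'T3']
    print(round(speedup_factor(0.92), 1))  # 12.5
```

Under this simple cost model, eliminating 92% of trees leaves 8% to optimise (a factor of about 12), while eliminating about 97.7% leaves roughly 2.3% (a factor of about 44), which is consistent with the range reported above.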
History
Publication title: Bioinformatics
Volume: 19
Pagination: ii66-ii72
ISSN: 1367-4803
Department/School: School of Natural Sciences
Publisher: Oxford Univ Press
Place of publication: Great Clarendon St, Oxford, England, OX2 6DP
Rights statement: The definitive publisher-authenticated version is available online at: www.oxfordjournals.org