Mitchell_whole_thesis.pdf (1.19 MB)

# Distinguishing convergence on phylogenetic networks

In phylogenetics, the evolutionary history of a group of taxa (for example, species, genera or subspecies) can be modelled using a phylogenetic tree. Alternatively, evolutionary history can be modelled with a phylogenetic network, on which edges that previously evolved independently from a common ancestor may subsequently converge for a period of time. Biological processes that are better represented by networks than by trees include hybridisation, horizontal gene transfer and recombination. Molecular phylogenetics uses the information in biological sequences, for example sequences of DNA nucleotides, to infer a phylogenetic tree or network. This requires models of character substitution, one family of which is the Abelian group-based models. The rate matrices of the Abelian group-based models can be diagonalised, a process often referred to in the literature as Hadamard conjugation. The time-dependent probability distributions giving the probability of each combination of states across all taxa at any site in the sequence are referred to as phylogenetic tensors. The phylogenetic tensors for a given tree or network can be expressed in the diagonalised basis, which may make them easier to analyse. In this thesis we examine the diagonalising matrices of various Abelian group-based models, and we compare the phylogenetic tensors of various trees and networks on two, three and four taxa. If the probability spaces of two trees or networks are not identical, then there are phylogenetic tensors that could have arisen on one but not the other; we call such trees or networks distinguishable from each other. We show that, under the binary symmetric model, no pair of two-taxon trees or networks is distinguishable, whereas some three-taxon trees and networks are distinguishable from each other.
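As an illustration of Hadamard conjugation, the following minimal Python sketch works through the binary symmetric model on a two-taxon tree. It assumes a single substitution rate `alpha`, a uniform root distribution and branch lengths `t1` and `t2`. The function names are hypothetical, and this is not code from the thesis. The rate matrix Q = [[-alpha, alpha], [alpha, -alpha]] is diagonalised by the 2×2 Hadamard matrix H, with eigenvalues 0 and -2·alpha, so the transition matrix is exp(Qt) = H⁻¹ diag(1, e^(-2·alpha·t)) H.

```python
import math

# 2x2 Hadamard matrix (the character table of Z2); its inverse is H / 2.
H = [[1, 1], [1, -1]]


def transition_matrix(alpha, t):
    """exp(Qt) for the binary symmetric model, built as H^-1 diag(1, e^(-2 alpha t)) H."""
    eig = [1.0, math.exp(-2 * alpha * t)]
    return [
        [sum(H[i][k] * eig[k] * H[k][j] for k in range(2)) / 2 for j in range(2)]
        for i in range(2)
    ]


def two_taxon_tensor(alpha, t1, t2):
    """P[i][j] = Pr(taxon 1 in state i, taxon 2 in state j), uniform root distribution."""
    m1 = transition_matrix(alpha, t1)
    m2 = transition_matrix(alpha, t2)
    root = [0.5, 0.5]
    return [
        [sum(root[r] * m1[r][i] * m2[r][j] for r in range(2)) for j in range(2)]
        for i in range(2)
    ]


def hadamard_transform(p):
    """Express a 2x2 tensor in the diagonalised basis: q = (H ⊗ H) p."""
    return [
        [sum(H[k][i] * H[l][j] * p[i][j] for i in range(2) for j in range(2)) for l in range(2)]
        for k in range(2)
    ]


if __name__ == "__main__":
    alpha, t1, t2 = 1.0, 0.3, 0.5
    q = hadamard_transform(two_taxon_tensor(alpha, t1, t2))
    # Up to floating-point rounding, only q[0][0] (= 1) and
    # q[1][1] (= exp(-2 * alpha * (t1 + t2))) are non-zero.
    print(q)
    print(math.exp(-2 * alpha * (t1 + t2)))
```

In the transformed basis the tensor has a single non-trivial entry, e^(-2·alpha·(t1 + t2)). On a two-taxon tree it therefore depends on the branch lengths only through their sum, so `two_taxon_tensor(1.0, 0.3, 0.5)` and `two_taxon_tensor(1.0, 0.8, 0.0)` agree up to rounding. This is a simpler, tree-only instance of the kind of non-distinguishability described above for two taxa.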
We also compare the time parameters of the phylogenetic tensors under different taxon label permutations of a given tree or network. If the time parameters of one taxon label permutation, expressed in terms of those of the other, are all non-negative, then we say that the two taxon label permutations are not network identifiable from each other. We show that some taxon label permutations are network identifiable from each other. We also show that some four-taxon networks satisfy the four-point condition while others do not. There are two structures of four-taxon rooted trees. One of these structures is defined by the cluster {b, c, d}, where the taxa are labelled alphabetically from left to right, starting with a. The network with this structure and with convergence between the two taxa whose most recent common ancestor is the root satisfies the four-point condition. The phylogenetic tensors give rise to polynomial equations that cannot easily be solved for trees or networks with four or more taxa. We show how methods from algebraic geometry, such as Gröbner bases, can be used to solve these polynomial equations, and we show that some four-taxon trees and networks can be distinguished from each other.
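The four-point condition can be checked directly on a matrix of pairwise distances. For any four taxa a, b, c, d, form the three sums d(a,b) + d(c,d), d(a,c) + d(b,d) and d(a,d) + d(b,c); the condition requires that the two largest of these be equal. The minimal Python sketch below assumes such a distance matrix is already available. The abstract does not say how distances are derived from the phylogenetic tensors, and the function names and example values are hypothetical.

```python
import math
from itertools import combinations


def symmetric(pairs):
    """Build a symmetric nested dict from {(x, y): distance}."""
    dist = {}
    for (x, y), value in pairs.items():
        dist.setdefault(x, {})[y] = value
        dist.setdefault(y, {})[x] = value
    return dist


def satisfies_four_point(dist, taxa, tol=1e-9):
    """Four-point condition for one quartet: the two largest pair sums are equal."""
    a, b, c, d = taxa
    sums = sorted([
        dist[a][b] + dist[c][d],
        dist[a][c] + dist[b][d],
        dist[a][d] + dist[b][c],
    ])
    return math.isclose(sums[1], sums[2], rel_tol=0.0, abs_tol=tol)


def all_quartets_satisfy(dist, tol=1e-9):
    """True if every set of four distinct taxa satisfies the four-point condition."""
    return all(satisfies_four_point(dist, q, tol) for q in combinations(sorted(dist), 4))


tree_like = symmetric({
    ("a", "b"): 3, ("c", "d"): 2, ("a", "c"): 3,
    ("a", "d"): 3, ("b", "c"): 4, ("b", "d"): 4,
})
perturbed = symmetric({
    ("a", "b"): 3, ("c", "d"): 2, ("a", "c"): 4,
    ("a", "d"): 3, ("b", "c"): 4, ("b", "d"): 4,
})
print(satisfies_four_point(tree_like, "abcd"))   # True
print(satisfies_four_point(perturbed, "abcd"))   # False
```

The first example uses path-length distances on a quartet tree with split ab|cd. Its pendant edges have lengths 1, 2, 1 and 1 for a, b, c and d, and its internal edge has length 1, so the sums are 5, 7 and 7. Raising d(a, c) to 4 gives sums 5, 8 and 7, which break the condition.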


## Publication status

- Unpublished

## Rights statement

Copyright 2016 the Author

## Repository Status

- Open