Background
Microbiological research relies on model organisms that act as representatives of their species or subspecies; these are frequently well-characterized laboratory strains. However, it has often become apparent that the model strain initially chosen does not represent important features of the species. For micro-organisms, genomic diversity is such that even the best possible choice of initial strain for sequencing may not ensure that the genome obtained adequately represents the species. To acquire information about a species' genome as efficiently as possible, we need a method for choosing strains for analysis on the basis of how well they represent the species.

Results
We develop the Best Total Coverage (BTC) method for selecting one or more representative model organisms from a group of interest, given that rough genetic distances between the members of the group are known. Software implementing a "greedy" version of the method can be used with large data sets; its effectiveness is tested using both constructed and biological data sets.

Conclusion
In both the simulated and biological examples, the greedy-BTC method outperformed random selection of model organisms, and for two of the biological examples it outperformed selection of model strains based on phylogenetic structure. Although the method was designed with microbial species in mind, and is tested here on three microbial data sets, it will also be applicable to other types of organism.
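The abstract does not spell out the BTC scoring function. The sketch below shows the general shape of a greedy selection of representatives from a pairwise distance matrix. It assumes a coverage objective in which every strain is "covered" by its nearest chosen representative, and a selection is better when the summed distance to those representatives is smaller (a greedy k-median heuristic). This is an illustrative stand-in, not the paper's published BTC score. The names `total_cost`, `greedy_select`, and the toy matrix `D` are hypothetical.

```python
from typing import List, Sequence


def total_cost(dist: Sequence[Sequence[float]], chosen: Sequence[int]) -> float:
    """Sum, over all strains, of the distance to the nearest chosen representative.

    Assumed stand-in for a coverage score: lower cost means better coverage.
    """
    return sum(min(dist[i][j] for j in chosen) for i in range(len(dist)))


def greedy_select(dist: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick k representatives, adding at each step the strain that
    most reduces the total cost. Ties go to the lowest-numbered strain."""
    n = len(dist)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of strains")
    chosen: List[int] = []
    for _ in range(k):
        candidates = (c for c in range(n) if c not in chosen)
        best = min(candidates, key=lambda c: total_cost(dist, chosen + [c]))
        chosen.append(best)
    return chosen


# Hypothetical toy data: strains 0-2 form one tight cluster, strains 3-4 another.
D = [
    [0, 1, 2, 9, 9],
    [1, 0, 1, 9, 9],
    [2, 1, 0, 9, 9],
    [9, 9, 9, 0, 1],
    [9, 9, 9, 1, 0],
]

if __name__ == "__main__":
    picks = greedy_select(D, 2)
    print(picks, total_cost(D, picks))  # [1, 3] 3
```

With one representative the sketch picks strain 1, the centre of the larger cluster. With two, it adds strain 3 from the second cluster. The greedy approach avoids enumerating every subset of size k, which becomes infeasible for large data sets. The price is that, like any greedy heuristic, it is not guaranteed to find the optimal set.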
Publication title
BMC Microbiology
Volume
5
Issue
May
Pagination
1-11
ISSN
1471-2105
Department/School
School of Natural Sciences
Publisher
BioMed Central Ltd
Place of publication
236 Grays Inn Road, Floor 6, London, England, WC1X