This article experimentally investigates how well three machine learning algorithms, applied to an ITS dataset, reproduce classes previously published in the biological literature. The ITS dataset consists of nuclear ribosomal DNA sequences, for which rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric, and the sequences cannot be regarded as points in a finite-dimensional space. It is therefore necessary to develop novel machine learning approaches to the analysis of datasets of this sort. This paper introduces a k-committees classifier and compares it with the discrete k-means and Nearest Neighbour classifiers. It turns out that all three machine learning algorithms are efficient and can be used to automate future biologically significant classifications for datasets of this kind. A simplified version of a synthetic dataset, where the k-committees classifier outperforms the k-means and Nearest Neighbour classifiers, is also presented.
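To illustrate classifying sequences with only a pairwise dissimilarity and no vector-space embedding, the following is a minimal sketch and not the paper's implementation. The function names (`edit_distance`, `nearest_neighbour`, `class_medoid`) are hypothetical, plain edit distance is used as a toy stand-in for the alignment scores, and choosing a medoid as the class representative is only an assumption about how a "discrete" centroid might be selected when sequences cannot be averaged.

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; a toy stand-in for an alignment score (assumption)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # deletion, insertion, substitution
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def nearest_neighbour(
    query: str,
    training: Sequence[Tuple[str, str]],
    dissimilarity: Callable[[str, str], float] = edit_distance,
) -> str:
    """Return the label of the training sequence least dissimilar to the query."""
    best_label, _ = min(
        ((label, dissimilarity(query, seq)) for seq, label in training),
        key=lambda pair: pair[1],
    )
    return best_label


def class_medoid(
    sequences: List[str],
    dissimilarity: Callable[[str, str], float] = edit_distance,
) -> str:
    """Pick the member with the smallest total dissimilarity to all members.

    Hypothetical way to obtain a centroid-like representative without
    averaging; only pairwise dissimilarities are needed.
    """
    return min(sequences, key=lambda s: sum(dissimilarity(s, t) for t in sequences))


if __name__ == "__main__":
    training = [("ACGT", "A"), ("ACGA", "A"), ("TTTT", "B"), ("TTTA", "B")]
    print(nearest_neighbour("ACGG", training))
    print(class_medoid(["ACGT", "ACGA", "ACGG", "TTTT"]))
```

Both functions touch the data only through `dissimilarity`, so a real alignment score could be substituted without changing the rest of the sketch.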
History
Publication title
Proceedings of Future Generation Information Technology
Editors
Lee YH, Kim TH, Fang WC, Slezak D
Pagination
308-316
ISBN
978-3-642-10508-1
Department/School
School of Information and Communication Technology
Publisher
Springer-Verlag
Place of publication
New York, USA
Event title
Future Generation Information Technology
Event Venue
Jeju Island, Korea
Date of Event (Start Date)
2009-12-10
Date of Event (End Date)
2009-12-12
Rights statement
The original publication is available at http://www.springerlink.com
Repository Status
Restricted
Socio-economic Objectives
Information systems, technologies and services not elsewhere classified