Detecting potential labeling errors for bioinformatics by multiple voting

Guan, D; Yuan, W; Ma, T; Lee, S

File(s) under permanent embargo

journal contribution

posted on 2023-05-19, 14:02 authored by Guan, D, Yuan, W, Ma, T, Lee, S

Classification techniques are important in bioinformatics analysis because they can separate various bioinformatics data into distinct groups. Obtaining good classifiers requires accurately labeled training data. However, labels in practical bioinformatics applications may be erroneous for various reasons. To identify such mislabeled data, an ensemble-learning-based scheme, single-voting, has been widely used: it generates multiple classifiers and uses their votes to detect mislabeled data. The single-voting scheme consists of two main components: a data partitioning component that generates the multiple classifiers, and a mislabeled-data detection component that identifies the mislabeled data. Existing work in this field focuses mainly on the detection component and neglects data partitioning. However, our analysis shows that data partitioning plays an important role in the single-voting scheme. This analysis led us to propose a novel multiple-voting scheme, which is superior to traditional single-voting because it reduces the unreliable influence of data partitioning. Empirical and theoretical evaluations on a set of bioinformatics datasets illustrate the utility of the proposed scheme.
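The two-component structure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The base classifiers (1-NN, 3-NN, nearest centroid), majority filtering within each partition, the fold and partition counts, and all function names (`single_voting`, `multiple_voting`) are hypothetical choices. Here single-voting flags an instance when most classifiers trained on the other folds disagree with its given label under one random partition. The multiple-voting variant repeats this over several random partitions and keeps only instances flagged by a majority of repetitions, so one unlucky split has less influence.

```python
import random
from collections import Counter
from math import dist  # Python 3.8+


def knn_predict(train, x, k):
    """k-nearest-neighbour prediction: majority label among the k closest points."""
    neighbours = sorted(train, key=lambda p: dist(p[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def centroid_predict(train, x):
    """Nearest-centroid prediction: assign x to the class whose mean is closest."""
    sums, counts = {}, Counter()
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] += 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
    return min(centroids, key=lambda c: dist(centroids[c], x))


# Hypothetical ensemble of base classifiers (an assumption, not from the paper).
CLASSIFIERS = [
    lambda train, x: knn_predict(train, x, 1),
    lambda train, x: knn_predict(train, x, 3),
    centroid_predict,
]


def single_voting(data, n_folds, rng):
    """One random partition: flag an instance if a majority of classifiers,
    trained on the other folds, disagree with its given label."""
    indices = list(range(len(data)))
    rng.shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    flagged = set()
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in indices if i not in held_out]
        for i in fold:
            x, y = data[i]
            wrong = sum(clf(train, x) != y for clf in CLASSIFIERS)
            if wrong > len(CLASSIFIERS) / 2:
                flagged.add(i)
    return flagged


def multiple_voting(data, n_folds=5, n_partitions=9, seed=0):
    """Repeat single-voting over several random partitions and flag an
    instance only if a majority of the repetitions flagged it."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_partitions):
        votes.update(single_voting(data, n_folds, rng))
    return {i for i, v in votes.items() if v > n_partitions / 2}


if __name__ == "__main__":
    gen = random.Random(1)
    # Two synthetic, well-separated 2-D clusters labeled "A" and "B".
    data = [((gen.gauss(0, 0.5), gen.gauss(0, 0.5)), "A") for _ in range(20)]
    data += [((gen.gauss(4, 0.5), gen.gauss(4, 0.5)), "B") for _ in range(20)]
    data.append(((4.0, 4.0), "A"))  # deliberately mislabeled: lies inside cluster B
    print(sorted(multiple_voting(data)))
```

On this synthetic data, the deliberately mislabeled point (index 40) should be the only one flagged. Voting across partitions, rather than trusting a single split, is the design choice aimed at the partitioning sensitivity discussed above.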

History

Publication title

Knowledge-Based Systems

Volume

66

Pagination

28-35

ISSN

0950-7051

Department/School

School of Engineering

Publisher

Elsevier Science BV

Place of publication

PO Box 211, Amsterdam, Netherlands, 1000 AE

Repository Status

Restricted

Socio-economic Objectives

Information systems, technologies and services not elsewhere classified

Keywords

bioinformatics analysis; mislabeled data detection; single-voting; multiple-voting; classification

Licence

In Copyright