File(s) under permanent embargo
Novel feature selection algorithms for improving neural network performance
Thesis posted on 2023-05-27, 11:14, authored by Zhao, Z
Data mining and machine learning have become pivotal in the era of Big Data, as people are eager to make predictions from what they already know and to foresee the unknown. In machine learning, selecting the features that are genuinely useful for the prediction task remains a central issue, because a smaller but more relevant dataset can improve a learner's performance. This thesis investigates the whole feature selection procedure. It begins with a new data preparation method, the Average Random Choosing Method (ARCM), which addresses the class imbalance problem. Experimental results show that ARCM significantly improves the prediction accuracy of Artificial Neural Networks (ANNs) on imbalanced datasets, compared with results reported in other studies. Next, a new filter, Consistency Concentration Based Feature Selection (CCBFS), is introduced. It builds on the concept of consistency based feature selection (CBFS). Earlier consistency based algorithms could not evaluate features individually or handle both nominal and continuous features, but CCBFS can do both. A genetic algorithm (GA) based wrapper is then proposed. This Accumulate Elitism Genetic Algorithm (AEGA) search approach inherits the advantages of GA based methods, namely fast parallel search over a large search space, while mitigating GA limitations such as slow convergence. For ANNs in particular, AEGA is robust to the unstable performance of multilayer perceptrons (MLPs) and shortens training time. Finally, a hybrid structure, Multi Combined Filter-Wrapper Feature Selection (MCFWFS), is introduced. This algorithm uses CCBFS as a pre-selection step to accelerate the convergence of AEGA. The multi combined structure achieves fast computation and a large improvement in predictor performance at the same time. It is designed for a wide range of real-world problems, including those with extremely large datasets.
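The abstract does not give the internals of CCBFS, AEGA or MCFWFS. The following is therefore only a minimal sketch of the general filter-then-wrapper idea under stated assumptions, not the thesis's method. In place of CCBFS's "consistency concentration" measure, the filter uses the classic inconsistency rate from consistency based feature selection. In place of AEGA's accumulation scheme, the wrapper is a plain elitist GA over bit masks. As fitness it uses leave-one-out 1-nearest-neighbour accuracy instead of an MLP, for brevity. The data are assumed to be discrete. All function names (`inconsistency_rate`, `rank_features`, `loo_1nn_accuracy`, `elitist_ga`, `hybrid_select`) and parameters such as `top_k` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, features):
    """Classic consistency measure: group instances that share the same values
    on `features`; within each group, every instance outside the majority
    class counts as an inconsistency."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row[f] for f in features)][y] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


def rank_features(rows, labels, n_features):
    """Filter step: score each feature on its own (lower is better);
    ties are broken by feature index so the ranking is deterministic."""
    scores = {f: inconsistency_rate(rows, labels, [f]) for f in range(n_features)}
    return sorted(scores, key=lambda f: (scores[f], f))


def loo_1nn_accuracy(rows, labels, features):
    """Wrapper fitness stand-in (assumption: replaces the thesis's MLP):
    leave-one-out 1-nearest-neighbour accuracy with Hamming distance."""
    if not features:
        return 0.0
    correct = 0
    for i, (xi, yi) in enumerate(zip(rows, labels)):
        best_d, best_y = None, None
        for j, (xj, yj) in enumerate(zip(rows, labels)):
            if i == j:
                continue
            d = sum(xi[f] != xj[f] for f in features)
            if best_d is None or d < best_d:  # keep the first nearest neighbour
                best_d, best_y = d, yj
        correct += best_y == yi
    return correct / len(rows)


def elitist_ga(candidates, fitness, pop_size=12, generations=20,
               n_elite=2, p_mut=0.1, seed=0):
    """Plain elitist GA over bit masks (hypothetical, not AEGA): the best
    `n_elite` masks survive unchanged, so the best fitness never decreases."""
    rng = random.Random(seed)
    n = len(candidates)

    def decode(mask):
        return [c for c, bit in zip(candidates, mask) if bit]

    def score(mask):
        return fitness(decode(mask))

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = [m[:] for m in pop[:n_elite]]
        while len(next_pop) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)    # parents from the better half
            cut = rng.randint(1, n - 1) if n > 1 else 0  # one-point crossover
            child = [1 - bit if rng.random() < p_mut else bit
                     for bit in a[:cut] + b[cut:]]       # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return decode(best), score(best)


def hybrid_select(rows, labels, n_features, top_k):
    """Filter pre-selection (top_k ranked features), then GA wrapper search."""
    shortlist = rank_features(rows, labels, n_features)[:top_k]
    return elitist_ga(shortlist, lambda feats: loo_1nn_accuracy(rows, labels, feats))


# Toy binary data: feature 0 equals the label, feature 1 is a noisy copy,
# features 2-5 are random noise.
rng = random.Random(1)
labels = [i % 2 for i in range(40)]
rows = []
for y in labels:
    x1 = y if rng.random() < 0.8 else 1 - y
    rows.append([y, x1] + [rng.randint(0, 1) for _ in range(4)])

ranked = rank_features(rows, labels, 6)
selected, acc = hybrid_select(rows, labels, 6, top_k=4)
print("ranking:", ranked, "selected:", selected, "LOO 1-NN accuracy:", acc)
```

On this toy data the filter ranks feature 0 first, because its inconsistency rate is zero. The exact subset the GA returns depends on the seed. Pre-selection shrinks the GA's bit string from `n_features` to `top_k` bits. That smaller search space is the kind of speed-up the abstract attributes to using a filter ahead of the wrapper in MCFWFS.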
The main contributions of this research are a novel data preparation method (ARCM), a novel filter for feature ranking (CCBFS), a novel GA based wrapper (AEGA) and a novel hybrid feature selection structure (MCFWFS). Experiments demonstrate that these algorithms perform strongly on many kinds of datasets and real-world problems in the machine learning categories of classification and multi-class recognition. This research provides new ways to improve machine learning performance and, to some extent, addresses shortcomings of current feature selection methods.
Rights statement: Copyright 2016 the author.

Parts of chapter 4 appear to be the equivalent of a pre-print version of an article published as: Zhao, Z., Xu, S., Kang, B. H., Kabir, M. M. J., Liu, Y., Wasinger, R., 2015. Investigation and improvement of multi-layer perceptron neural networks for credit scoring. Expert Systems with Applications, 42(7), 3508-3516.

A small portion of chapter 4 appears to be the equivalent of a pre-print version of an article published as: Zhao, Z., Xu, S., Kang, B. H., Kabir, M. M. J., Liu, Y., 2014. Investigation of multilayer perceptron and class imbalance problems for credit rating. International Journal of Computer and Information Technology, 3940, 805-812.

Parts of chapter 7 have been published as: Zhao, Z., Xu, S., Kang, B. H., Kabir, M. M. J., Liu, Y., 2015. Instance selection and optimization of neural networks. Information Technology in Industry, 3(1), 1-9, which is published under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) License (https://creativecommons.org/licenses/by/3.0/).