This article investigates internet commerce security applications of a novel combined method, which uses unsupervised consensus clustering algorithms in combination with supervised classification methods. First, a variety of independent clustering algorithms are applied to a randomized sample of data. Second, several consensus functions and sophisticated algorithms are used to combine these independent clusterings into one final consensus clustering. Third, the consensus clustering of the randomized sample is used as a training set to train several fast supervised classification algorithms. Finally, these fast classification algorithms are used to classify the whole large data set. One advantage of this approach is that it makes it easy for domain experts to adjust the training set created by consensus clustering. We apply this approach to profiling phishing emails selected from a very large data set supplied by the industry partners of the Centre for Informatics and Applied Optimization. Our experiments compare the performance of several classification algorithms incorporated in this scheme.
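The four steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic two-dimensional data stands in for email feature vectors, k-means runs with different seeds and cluster counts stand in for the independent clustering algorithms, a majority co-association rule stands in for the consensus functions, and a nearest-centroid rule stands in for the fast supervised classifiers. All function names are hypothetical.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed, iters=30):
    """Plain k-means (illustrative base clusterer); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def consensus(labelings, threshold=0.5):
    """Assumed consensus function: link two points when more than `threshold`
    of the base clusterings put them together, then take connected components."""
    n = len(labelings[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            agree = sum(lab[i] == lab[j] for lab in labelings) / len(labelings)
            if agree > threshold:
                parent[find(i)] = find(j)
    roots = sorted({find(i) for i in range(n)})
    return [roots.index(find(i)) for i in range(n)]

def train_centroids(points, labels):
    """Assumed fast classifier: store one centroid per consensus cluster."""
    groups = {}
    for p, l in zip(points, labels):
        groups.setdefault(l, []).append(p)
    return {l: tuple(sum(x) / len(ps) for x in zip(*ps)) for l, ps in groups.items()}

def classify(p, centroids):
    """Assign a point to the class of its nearest centroid."""
    return min(centroids, key=lambda l: dist2(p, centroids[l]))

# Synthetic stand-in for the large data set: two well-separated groups.
rng = random.Random(0)
blob_a = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(200)]
blob_b = [(rng.gauss(10, 0.3), rng.gauss(10, 0.3)) for _ in range(200)]
data = blob_a + blob_b

# Step 1: independent clusterings of a randomized sample.
sample = rng.sample(data, 40)
runs = [kmeans(sample, k, seed) for seed, k in enumerate([2, 2, 2, 3, 3])]

# Step 2: combine them into one consensus clustering.
sample_labels = consensus(runs)

# Step 3: use the consensus clustering as the training set.
# (A domain expert could adjust sample_labels here before training.)
centroids = train_centroids(sample, sample_labels)

# Step 4: classify the whole data set with the fast classifier.
predicted = [classify(p, centroids) for p in data]
```

In this sketch the expensive consensus step runs only on the 40-point sample, while the full data set is handled by the cheap nearest-centroid rule. The point between steps 2 and 3 is where expert corrections to the training labels would fit.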
History
Publication title
Knowledge Management and Acquisition for Smart Systems and Services
Editors
Byeong-Ho Kang & Debbie Richards
Pagination
235-246
ISBN
978-3-642-15036-4
Department/School
School of Information and Communication Technology