Hierarchical clustering using homogeneity as similarity measure for big data analytics
Version 2 2025-01-15, 01:14
Version 1 2023-05-23, 11:17
conference contribution
posted on 2025-01-15, 01:14 authored by Y Zhao, R Wong, C-H Chi, W Zhou, C Ding, C Wang
In big data analytics, clustering plays a fundamental and decisive role in supporting pattern mining and value creation. To improve user experience and satisfaction with clustering algorithms, one key step is to let users define the quality they prefer for the aggregated clusters (e.g. in terms of the homogeneity and the relative population of each resulting cluster) instead of fixing the number of clusters before the clustering process. In this paper, we first propose a new measure, called the Clustering Performance Index (CPI), that takes into account homogeneity, relative population, and the number of clusters aggregated. We then propose a new hierarchical clustering algorithm that adopts homogeneity as its key similarity measure. Experimental results show that the proposed clustering algorithm achieves a good balance among CPI, the number of clusters aggregated, and the time cost of the algorithm.
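The idea of stopping on a user-defined cluster quality rather than a fixed cluster count can be illustrated with a minimal sketch. The abstract does not give the paper's definition of homogeneity or of CPI, so everything below is an assumption made for illustration: the function names `homogeneity` and `cluster_by_homogeneity`, the `min_homogeneity` parameter, and the choice of 1 / (1 + mean pairwise Euclidean distance) as a stand-in homogeneity score. It is not the authors' algorithm, and CPI is not computed.

```python
from itertools import combinations
from math import dist


def homogeneity(cluster):
    """Hypothetical stand-in score: 1 / (1 + mean pairwise Euclidean distance).

    The paper's actual homogeneity definition is not stated in the abstract.
    A single-point cluster is treated as perfectly homogeneous (1.0).
    """
    if len(cluster) < 2:
        return 1.0
    pairs = list(combinations(cluster, 2))
    mean_dist = sum(dist(a, b) for a, b in pairs) / len(pairs)
    return 1.0 / (1.0 + mean_dist)


def cluster_by_homogeneity(points, min_homogeneity):
    """Greedy agglomerative merging driven by the assumed homogeneity score.

    Starts with one cluster per point and repeatedly merges the pair whose
    union is most homogeneous. Stops when the best available merge would
    fall below `min_homogeneity`, so the number of clusters is an outcome
    of the quality threshold rather than an input.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None  # (score, i, j) of the most homogeneous candidate merge
        for i, j in combinations(range(len(clusters)), 2):
            score = homogeneity(clusters[i] + clusters[j])
            if best is None or score > best[0]:
                best = (score, i, j)
        if best[0] < min_homogeneity:
            break  # no merge satisfies the user's quality requirement
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
result = cluster_by_homogeneity(points, min_homogeneity=0.4)
print(result)
```

With these four points and a threshold of 0.4, the two nearby pairs are merged and the final merge is rejected, so two clusters remain. This naive version re-scores every candidate pair on each pass and is meant only to show the stopping rule, not to reflect the time cost reported in the paper.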
History
Publication title
Proceedings of the 12th IEEE International Conference on Services Computing
Volume
95
Editors
PP Maglio, I Paik, W Chou
Pagination
348-354
ISBN
9781467372817
Department/School
Information and Communication Technology
Publisher
IEEE-Inst Electrical Electronics Engineers Inc
Publication status
Published
Place of publication
New York, USA
Event title
12th IEEE International Conference on Services Computing
Event Venue
New York City, NY, USA
Date of Event (Start Date)
2015-06-27
Date of Event (End Date)
2015-07-02
Rights statement
Copyright 2015 IEEE
Socio-economic Objectives
280115 Expanding knowledge in the information and computing sciences