Hierarchical clustering using homogeneity as similarity measure for big data analytics

Zhao, Y; Wong, R; Chi, C-H; Zhou, W; Ding, C; Wang, C

File(s) under permanent embargo

conference contribution

posted on 2023-05-23, 11:17 authored by Zhao, Y, Wong, R, Chi, C-H, Zhou, W, Ding, C, Wang, C

In big data analytics, clustering plays a fundamental and decisive role in supporting pattern mining and value creation. To improve user experience and satisfaction with clustering algorithms, one key is to let users define the quality of the aggregated clusters they prefer (e.g. in terms of the homogeneity and the relative population of each resulting cluster), rather than fixing the number of clusters before the clustering process begins. In this paper, we first propose a new measure, the Clustering Performance Index (CPI), that takes into consideration homogeneity, relative population, and the number of clusters aggregated. We then propose a new hierarchical clustering algorithm that adopts homogeneity as its key similarity measure. Experimental results show that the proposed clustering algorithm can achieve a good balance among CPI, the number of clusters aggregated, and the time cost of the algorithm.
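The abstract does not give the definitions of homogeneity, CPI, or the merge rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a hypothetical homogeneity score of 1 / (1 + mean squared distance to the cluster centroid), and a hypothetical user-supplied threshold `min_homogeneity` that stands in for a fixed number of clusters: agglomerative merging continues only while the best available merge stays at least that homogeneous. The names `homogeneity`, `homogeneity_clustering`, and `relative_population` are illustrative.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def homogeneity(cluster: Sequence[Point]) -> float:
    """Hypothetical homogeneity score in (0, 1]: 1 / (1 + mean squared
    distance to the centroid). An assumption for illustration only."""
    n = len(cluster)
    dim = len(cluster[0])
    centroid = [sum(p[d] for p in cluster) / n for d in range(dim)]
    msd = sum(
        sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in cluster
    ) / n
    return 1.0 / (1.0 + msd)


def homogeneity_clustering(
    points: Sequence[Point], min_homogeneity: float
) -> List[List[Point]]:
    """Naive agglomerative clustering: repeatedly merge the pair of clusters
    whose union is most homogeneous; stop when even the best merge would
    fall below the user's min_homogeneity threshold (assumed criterion)."""
    clusters: List[List[Point]] = [[p] for p in points]  # start from singletons
    while len(clusters) > 1:
        best_pair, best_h = None, -1.0
        for i, j in combinations(range(len(clusters)), 2):
            h = homogeneity(clusters[i] + clusters[j])
            if h > best_h:
                best_pair, best_h = (i, j), h
        if best_h < min_homogeneity:
            break  # any further merge would violate the quality threshold
        i, j = best_pair
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so delete j first to keep index i valid
        del clusters[i]
        clusters.append(merged)
    return clusters


def relative_population(clusters: List[List[Point]], total: int) -> List[float]:
    """Fraction of all points that falls in each cluster."""
    return [len(c) / total for c in clusters]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
    result = homogeneity_clustering(pts, min_homogeneity=0.5)
    for c in result:
        print(c, round(homogeneity(c), 4))
    print(relative_population(result, len(pts)))
```

With a threshold of 0.5, the five sample points should end up in two clusters (three points near the origin, two near (5, 5)), because merging those two groups would give a homogeneity of roughly 0.08 under the assumed score. The number of clusters is therefore an outcome of the user's quality requirement rather than an input. The exhaustive pairwise search is chosen for clarity, not speed; a real big-data implementation would need a far more efficient merge strategy.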

History

Publication title

Proceedings of the 12th IEEE International Conference on Services Computing

Editors

PP Maglio, I Paik, W Chou

Pagination

348-354

ISBN

9781467372817

Department/School

School of Information and Communication Technology

Publisher

IEEE-Inst Electrical Electronics Engineers Inc

Place of publication

New York, USA

Event title

12th IEEE International Conference on Services Computing

Event Venue

New York City, NY, USA

Date of Event (Start Date)

2015-06-27

Date of Event (End Date)

2015-07-02

Repository Status

Restricted

Socio-economic Objectives

Expanding knowledge in the information and computing sciences

Keywords

clustering; homogeneity; relative population; clustering performance index

Licence

In Copyright
