DynamicWEB : a conceptual clustering algorithm for a changing world
Thesis posted on 2023-05-26, 04:28, authored by Joel Scanlan
This research was motivated by problems in network security, where attackers often deliberately change their identifying information and behaviour to camouflage malicious activity. Addressing this problem has resulted in a new adaptation of the unsupervised machine learning technique COBWEB. In machine learning and data mining, the aim is to extract patterns from data in order to discover the meaning underlying the processes that are taking place. In most cases, each object is observed once, and the extracted patterns can then be used to classify newly observed objects. Conceptual clustering aims to do this in such a way that the learned patterns are human readable. Concept drift algorithms allow concepts to change over time, although most do so in a supervised manner, which presents a challenge when looking for novel classes. This research focuses on the classification of objects that change over time across multiple observations. The objects may change their own characteristics (labelled object drift in this research) or maintain the same characteristics but change their identifier. It is also possible for the concept that describes a group of objects to itself change (known as concept drift). Beyond its possible application within the security domain, the method was generalised and tested across a range of machine learning and data mining domains. In the process it was shown that the method was robust in the presence of concept drift, which occurs when a group of objects that define a given concept change their characteristics, so that the definition of that concept changes over time. The ideas of concept drift and object drift are relevant not only within the computer security field but potentially in any knowledge domain. Therefore, any method presented to address this learning problem should be general enough to be applicable in many application areas.
The new method, entitled DynamicWEB, extends the existing conceptual clustering method COBWEB to allow profiles to be added to and removed from the concept hierarchy. An index structure was implemented using an AVL tree to facilitate fast, scalable searching of the knowledge structure. As the target objects change over time, the profile of each target is updated within the structure, maintaining an up-to-date representation of the domain. The profiles contain derived attributes, which are formed across multiple observations of each object, with the aim of retaining knowledge of how the object has changed over time. As well as preserving context over time, DynamicWEB uses multiple trees and so transforms the learner into an ensemble classifier. In addition to testing the method on the security and network-based datasets, a number of other datasets are also examined. A new dataset (a modified version of Quinlan's weather dataset) is presented in order to illustrate how DynamicWEB operates in the presence of object drift. The method is also tested on several well-known machine learning datasets, some of which exhibit concept drift. Along with these artificial datasets, a group of real-world datasets, including several sourced from the Australian Bureau of Statistics, were also examined, illustrating DynamicWEB's ability to adapt to change. This thesis describes the work done to enable DynamicWEB to adapt to both concept drift and object drift, both of which are characteristic of many application domains. DynamicWEB is also capable of profiling an object across multiple observations to allow for accurate prediction and inter-object relationship discovery.
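The profile update cycle described above (look up an object's profile through an index, remove the stale profile from the concept hierarchy, refresh its derived attributes, and re-insert it) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the names `Profile`, `ProfileIndex`, and `observe` are hypothetical, a plain `dict` stands in for the AVL-tree index, a flat list stands in for the COBWEB concept hierarchy, and the derived attributes chosen here (per-attribute running means and a change counter) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical per-object profile built from repeated observations."""
    identifier: str
    count: int = 0                                # observations seen so far
    changes: int = 0                              # times the raw observation differed from the previous one
    means: dict = field(default_factory=dict)     # running mean per numeric attribute
    seen: dict = field(default_factory=dict)      # per-attribute sample counts for the means
    last: dict = field(default_factory=dict)      # most recent raw observation

    def update(self, observation: dict) -> None:
        """Fold one new observation into the derived attributes."""
        if self.count and observation != self.last:
            self.changes += 1
        self.count += 1
        for name, value in observation.items():
            # Only numeric attributes get a running mean (bool is excluded on purpose).
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                n = self.seen.get(name, 0) + 1
                previous = self.means.get(name, 0.0)
                self.means[name] = previous + (value - previous) / n
                self.seen[name] = n
        self.last = dict(observation)


class ProfileIndex:
    """Assumed stand-in: a dict replaces the AVL-tree index, a list replaces the hierarchy."""

    def __init__(self) -> None:
        self._profiles = {}
        self.hierarchy = []

    def observe(self, identifier: str, observation: dict) -> Profile:
        profile = self._profiles.get(identifier)
        if profile is None:
            profile = Profile(identifier)
            self._profiles[identifier] = profile
        else:
            # Remove the outdated profile before it is changed.
            self.hierarchy.remove(profile)
        profile.update(observation)
        # Re-insert the refreshed profile; a real system would re-cluster it here.
        self.hierarchy.append(profile)
        return profile


index = ProfileIndex()
index.observe("host-a", {"port": 80, "bytes": 100})
profile = index.observe("host-a", {"port": 443, "bytes": 300})
print(profile.count, profile.changes, profile.means["bytes"])
```

In this sketch the hierarchy only ever holds one entry per identifier, which mirrors the idea of keeping a single up-to-date profile for each target. The re-insertion step is where a COBWEB-style learner would place the refreshed profile into the concept tree.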
Rights statement: Copyright 2011 the author