posted on 2023-05-28, 12:15authored byPatwary, MAK
This dissertation addresses the problem of dynamic graph partitioning in a streaming manner in the cloud. This problem applies to real-world graph applications such as PageRank, Social Networks, Shortest Path and so on. The scale of graphs of these applications has increased to such a degree that a single machine is not capable of efficiently processing large graphs. Thereby, efficient graph partitioning and wise resource allocation are necessary for these large graph applications. At the beginning of this study, this dissertationevaluates two existing streaming graph partitioning algorithms in the cloud. After having completed the empirical study of these algorithms in the cloud machines, we identified the following research problems: 1) There are no existing streaming graph partitioning methods to find an optimised number of machines and scale the resources, as per the demands of an ever-increasing graph dataset. 2) How can we minimise the number of edge-cuts while balancing the load in a streaming manner in the cloud environment? 3) How can we use dynamic graph partitioning in a streaming manner to reduce the edge-cuts and the load imbalance during the partitioning? We also address the scaling of the resources as per the demands of dynamic graph data. Streaming graph partitioning is a variant of traditional graph partitioning which accepts graph input in a one-pass manner. This partitioning technique was introduced to overcome a memory bottleneck issue in traditional graph partitioning. In streaming graph partitioning, it is necessary to utilise the resources as per the demands of graph data stream. This thesis proposes an auto-scaling algorithm to determine the required number of machines, based on the upcoming stream data rate and the service time at the worker machines. The proposed method helps to minimise the cost and provide the best use of cloud resources by allocating the number and types of machines wisely. Once the optimised resources and costs are fixed, this study looks into the problem of graph partitioning in a streaming manner, with the aim of minimising inter-machine communication and reducing the computational load imbalance as much as possible. In order to achieve these goals, we propose a window-based, streaming graph partitioning algorithm. The proposed method utilises sliding window technology with a partitioning strategy and the load balancing method as well. After exploring the streaming graph partitioning with the static datasets, we studied the problem of the dynamic behaviour of graph datasets. This problem was how to partition dynamic graph data while minimising the edge-cut and keeping the computational load imbalance to a minimum. In addition to this partitioning technique, we also proposed an auto-scaling algorithm which adaptively scales in and out the machines, as per the demands of the computational cost.
History
Publication status
Unpublished
Rights statement
Copyright 2020 the author Chapter 5 appears to be the equivalent of a post-print version of a conference paper. Copyright 2019 Association for Computing Machinery. Reprinted, with permission, from Patwary, Md A. K., Garg, S, Kang, B., Window-based streaming graph partitioning algorithm, Proceedings of the Australasian Computer Science Week Multiconference, ACS W-2019, Sydney, 2019.