University of Tasmania
Barika_whole_thesis.pdf (9.35 MB)

Scheduling techniques for efficient execution of stream workflows in cloud environments

posted on 2023-05-28, 09:30 authored by Barika, MSM
Advancements in Internet of Things (IoT) technology have led to the development of advanced applications and services that rely on data generated by an enormous number of connected devices such as sensors, mobile devices and smart cars. These applications process and analyse such data as it arrives to unleash the potential of live analytics. Considering that our future world will be fully automated, current IoT applications and services are categorised as data-driven workflows, which integrate multiple analytical components. Examples of these workflow applications are smart farming, smart retail and smart transportation. This kind of workflow application, also known as a stream workflow, is one type of big data workflow application and is becoming increasingly viable for solving more complex real-time data computation problems. Cloud computing technology, which can provide on-demand and elastic resources, is ideal for executing stream workflow applications, but additional challenges arise from the location of data sources and from end users' requirements in terms of data processing and deadlines for decision making.
Existing research in this domain focuses on the streaming operator graph generated by streaming data platforms. This graph differs from a stream workflow: the operator graph has a single data source and one end operator, whereas a stream workflow has multiple input data sources and multiple output streams. Moreover, the majority of those works investigated only one type of runtime change for the streaming operator graph, namely the fluctuation of data, so the structural changes that may happen at runtime have not been studied. Owing to their heterogeneity and dynamic behaviour, stream workflows have unique features that give the scheduling problem different assumptions and optimisation goals from the placement problem of streaming graph operators.
As a consequence, executing stream workflow applications in a cloud environment requires advanced scheduling techniques that address the aforementioned challenges and handle the different runtime changes that may occur during execution. To this end, the Multicloud environment approach opens the door to enhancing the execution of workflow applications by leveraging various clouds to utilise data locality and exploit deployment flexibility. Thus, the problem of scheduling a stream workflow in a Multicloud environment while meeting user real-time data analysis requirements needs to be investigated. In this thesis, we leverage the Multicloud environment approach to design novel scheduling techniques that efficiently schedule outsourced stream workflow applications over various cloud infrastructures while minimising the execution cost. We also design dynamic scheduling techniques that continuously manage resources to handle structural and non-structural changes at runtime, in order to maintain user-defined performance requirements at minimal execution cost. In summary, this thesis makes the following concrete contributions:

  • Comprehensive state-of-the-art survey that analyses various big data workflow orchestration issues spanning three different levels (workflow, data and cloud) by providing a research taxonomy of core requirements, challenges, and current tools, techniques and research prototypes.
  • Simulation toolkit named IoTSim-Stream to model and simulate stream workflow applications in cloud computing environments.
  • Two scheduling algorithms that generate scheduling plans at deployment time to execute stream workflows efficiently on cloud infrastructures with minimal monetary cost.
  • Two-phase adaptive scheduling technique that considers the problem of scheduling stream workflows to support runtime data fluctuations while guaranteeing real-time performance requirements and minimising monetary cost.
  • Pluggable dynamic scheduling technique that manages cloud resources over time to handle structural changes of stream workflows at runtime in a cost-effective manner, along with three plugin scheduling methods.


Publication status

  • Unpublished

Rights statement

Copyright 2020 the author.
Chapter 2 appears to be, in part, the equivalent of the author's version of a published article. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM computing surveys, 52(5), 95, Copyright 2019 Association for Computing Machinery.
Chapter 3 appears to be the equivalent of a pre-print version of an article published as: Barika, M., Garg, S., Chan, A., Calheiros, R. N., Ranjan, R., 2019. IoTSim-Stream: modelling stream graph application in cloud simulation, Future generation computer systems, 99, 86-105.
Chapter 4 appears to be the equivalent of a pre-print version of an article published as: Barika, M., Garg, S., Chan, A., Calheiros, R. N., Scheduling algorithms for efficient execution of stream workflow applications in multicloud environments, IEEE transactions on services computing, doi: 10.1109/TSC.2019.2963382. Copyright 1969, IEEE. In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of University of Tasmania's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to to learn how to obtain a License from RightsLink.
Chapter 6 appears to be the equivalent of a pre-print version of an article published as: Barika, M., Garg, S., Ranjan, R., 2020. Cost effective stream workflow scheduling to handle application structural changes, Future generation computer systems, 112, 348-361.

Repository Status

  • Open
