Administrators & cloud service providers
Open Source communities & application developers
QoS Cloud services
dagSim is a discrete event simulator, developed by Politecnico di Milano, that can be used to study the performance of DAG-based processes such as MapReduce, Tez or Apache Spark applications. It simulates the execution of jobs composed of several stages, each one divided into a set of identical tasks that can run in parallel on multiple computation resources. A Directed Acyclic Graph (DAG) defines the rules according to which stages are executed. The tool offers advanced scripting features based on the Lua programming language, and it supports several probability distributions (including non-parametric models obtained directly from a dataset) to define the execution times of tasks.
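The execution model described above can be illustrated with a minimal Python sketch. This is not dagSim's code or its input format: the function names (`simulate_job`, `constant`, `empirical`), the dictionary layout of a job and the first-come-first-served assignment of tasks to cores are all assumptions made for illustration only.

```python
import heapq
import random
from collections import deque


def constant(value):
    """Sampler that always returns the same task duration."""
    return lambda rng: value


def empirical(measurements):
    """Non-parametric sampler: draw durations directly from measured data."""
    data = list(measurements)
    return lambda rng: rng.choice(data)


def simulate_job(stages, cores, rng=None):
    """Return the completion time of one job (hypothetical job layout).

    stages maps a stage name to a dict with keys:
      "tasks":  number of identical tasks in the stage (assumed >= 1)
      "deps":   names of the stages that must finish first (the DAG edges)
      "sample": function rng -> duration of one task
    """
    if cores < 1:
        raise ValueError("at least one core is required")
    if rng is None:
        rng = random.Random()
    waiting_on = {name: set(spec["deps"]) for name, spec in stages.items()}
    tasks_left = {name: spec["tasks"] for name, spec in stages.items()}
    ready = deque()   # tasks whose stage may run, waiting for a free core
    running = []      # min-heap of (finish_time, sequence, stage)
    for name, deps in waiting_on.items():
        if not deps:
            ready.extend([name] * tasks_left[name])
    clock, free, sequence, finished = 0.0, cores, 0, 0

    while True:
        # Start ready tasks while cores are available.
        while ready and free > 0:
            stage = ready.popleft()
            finish = clock + stages[stage]["sample"](rng)
            heapq.heappush(running, (finish, sequence, stage))
            sequence += 1
            free -= 1
        if not running:
            break
        # Advance the clock to the next task completion event.
        clock, _, stage = heapq.heappop(running)
        free += 1
        tasks_left[stage] -= 1
        if tasks_left[stage] == 0:
            finished += 1
            # Release every stage whose last dependency just completed.
            for name, deps in waiting_on.items():
                if stage in deps:
                    deps.discard(stage)
                    if not deps:
                        ready.extend([name] * tasks_left[name])
    if finished != len(stages):
        raise ValueError("some stages never ran (cycle or unknown dependency)")
    return clock


# A diamond-shaped job: A feeds B and C, which both feed D.
EXAMPLE_JOB = {
    "A": {"tasks": 4, "deps": [], "sample": constant(1.0)},
    "B": {"tasks": 2, "deps": ["A"], "sample": constant(2.0)},
    "C": {"tasks": 2, "deps": ["A"], "sample": constant(3.0)},
    "D": {"tasks": 1, "deps": ["B", "C"], "sample": constant(1.0)},
}
```

Calling `simulate_job(EXAMPLE_JOB, cores=2)` returns the simulated completion time of the job on two cores. Replacing `constant(...)` with `empirical(measured_times)` samples task durations directly from a list of measurements, which mirrors the idea of a non-parametric model obtained from a dataset.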
The tool can be leveraged for distributed cloud services, Big Data infrastructures, business analytics software development, performance assessment, and benchmarking of computing infrastructures in general. dagSim is used in academic environments and for research purposes, and is currently integrated into model-based resource provisioning and auto-scaling algorithms.
User needs:
Predict the performance of DAG-based parallel computation frameworks
Validate benchmarking suites
Accurate results and fast performance
Specific benefits:
Accuracy and efficiency of the simulations
Performance prediction and optimization
User scenario
An autoscaling component of a Big Data application deployment, running queries that can be described as DAG-based workflows, can use the tool after a minimal benchmarking campaign to define the initial resource requirements in terms of Virtual Machines to be provisioned, and to decide when to acquire or release resources so that Service Level Agreements are respected.
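The provisioning decision in this scenario can be sketched in Python as a search for the smallest number of Virtual Machines whose predicted execution time meets the deadline. The sketch rests on several assumptions: `predicted_time` is a toy stand-in for a simulator run (each stage is assumed to execute in waves of its mean task time), and the names `min_vms_for_deadline`, `scaling_action`, `STAGES` and the figures used are hypothetical.

```python
import math


def predicted_time(vms, stages, cores_per_vm):
    """Toy predictor (assumption): each stage runs in ceil(tasks / cores)
    waves, and every wave lasts the mean task time of that stage."""
    slots = vms * cores_per_vm
    return sum(math.ceil(tasks / slots) * mean_time
               for tasks, mean_time in stages)


def min_vms_for_deadline(predict, deadline, max_vms):
    """Smallest VM count whose predicted time meets the deadline, or None."""
    for vms in range(1, max_vms + 1):
        if predict(vms) <= deadline:
            return vms
    return None


def scaling_action(current_vms, required_vms):
    """Translate the requirement into an acquire / release / keep decision."""
    if required_vms > current_vms:
        return "acquire"
    if required_vms < current_vms:
        return "release"
    return "keep"


# Hypothetical benchmark summary: (number of tasks, mean task time in seconds)
# for three stages executed one after the other.
STAGES = [(40, 2.0), (20, 3.0), (4, 5.0)]


def predict(vms):
    """Predicted job time on `vms` machines with 4 cores each (assumed)."""
    return predicted_time(vms, STAGES, cores_per_vm=4)
```

In a real deployment, `predict` would wrap a call to the simulator fed with the benchmarking data, and `min_vms_for_deadline(predict, deadline, max_vms)` would be re-evaluated whenever the workload or the Service Level Agreement changes. The linear scan is chosen because it stays correct even if the predicted time is not strictly decreasing in the number of machines.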
The development of the first version of the tool has been completed. The next version is being debugged. New features that add allocation policies to support different Big Data and High Performance Computing frameworks are being implemented.
The tool can be downloaded from https://github.com/eubr-bigsea/dagSim
The tool is configured through Lua scripts, which nevertheless resemble classical text-based configuration files. For this reason, using the tool does not require specific skills, though defining the model requires at least a BSc including a course in Statistics. Moreover, script generation can be automated, for example with the SparkLogParser developed in the project. Knowledge of the Lua programming language can improve the outcomes the user can achieve with the tool.
The tool is based on the Lua programming language, which is distributed under the MIT license (https://www.lua.org/license.html); this allows the use of its source code at absolutely no cost and with no “copyleft” restrictions.
The cost of using the tool is very limited, since it produces results very efficiently. An organisation that produces Big Data analytics software that can be described with DAGs would require around one day to perform basic benchmarking, and one hour to produce the configuration script and run the tool. Benchmarking and script production can, however, be automated in the deployment phase, for example using the SparkLogParser developed in the project, reducing the usage cost to a few minutes of machine execution time.
Prof. Marco Gribaudo of Politecnico di Milano: marco.gribaudo@polimi.it
Related publications
--> E. Barbierato, M. Gribaudo, and D. Manini, "Fluid approximation of pool depletion systems", 23rd International Conference on Analytical & Stochastic Modelling Techniques & Applications (ASMTA '16), pages 60-75. Springer International Publishing, Cham, 2016.
--> E. Gianniti, A. M. Rizzi, E. Barbierato, M. Gribaudo, and D. Ardagna, "Fluid Petri Nets for the Performance Evaluation of MapReduce Applications", InfQ 2016 - New Frontiers in Quantitative Methods in Informatics, INFQ 2016 Workshop Proceedings.
--> D. Ardagna, E. Barbierato, A. Evangelinou, E. Gianniti, M. Gribaudo, T. B. M. Pinto, A. Guimarães, A. P. Couto da Silva, and J. M. Almeida, "Performance Prediction of Cloud-Based Big Data Applications", ICPE 2018, to appear.
Not directly connected to dagSim, but supporting the tool:
--> E. Barbierato, M. Gribaudo, and M. Iacono, "Modeling Hybrid Systems in SIMTHESys", Eighth International Workshop on Practical Applications of Stochastic Modelling (PASM '16), Electronic Notes in Theoretical Computer Science, vol. 327, pp. 5-25 (October 2016), Elsevier, ISSN: 1571-0661, DOI: 10.1016/j.entcs.2016.09.021
--> E. Barbierato, M. Gribaudo, and M. Iacono, "Simulating hybrid systems within SIMTHESys multi-formalism models", 13th European Workshop on Performance Engineering, Chios (Greece), October 5-7 2016, Lecture Notes in Computer Science, vol. 9951, pp. 189-203 (October 2016), Springer, ISSN: 0302-9743