The EUBra-BIGSEA QoS IaaS infrastructure relies on a set of robust components that make it possible to meet the deadlines associated with the applications. Fast reaction and accuracy are key factors in this challenging process.
In this respect, dagSim is a discrete event simulator that works on a DAG (Directed Acyclic Graph) model of MapReduce, Apache Tez, Spark, and COMPSs jobs and efficiently estimates the performance of Big Data applications. Such frameworks can analyse large amounts of unstructured data very efficiently and have been adopted in multiple application domains, e.g., machine learning, graph processing, and data mining.
In this context, one of the main challenges is that the execution time of a big data job is generally unknown in advance. Because of this, the execution time of big data applications is usually predicted empirically through experimentation, which requires a costly setup. Alternatively, it is possible to develop models and software tools for predicting performance. dagSim addresses these issues.
dagSim is characterized by the following strategic features: fast performance, user-friendliness, no need to train users, and currently no competitors (its application target being highly specific). The existing literature has focused on Hadoop 1.0 only, thus considering Map/Reduce without DAGs and statically allocating containers to the two stages, whereas dagSim allocates resources dynamically, which reflects a more realistic scenario.
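To make the idea of discrete event simulation over a DAG with dynamic resource allocation more concrete, the following is a minimal toy sketch in Python. It is not dagSim's implementation or input format: the function name `simulate`, the dict-based stage description, and the FIFO task policy are all assumptions chosen for illustration. Each stage has a list of task durations and a list of stages it depends on, and any free core picks up the next ready task regardless of which stage it belongs to.

```python
import heapq
from collections import deque


def simulate(stages, cores):
    """Estimate the makespan of a DAG of stages on a fixed pool of cores.

    stages: {name: {"deps": [stage names], "tasks": [task durations]}}
    Hypothetical format for illustration only; every stage is assumed
    to have at least one task and the graph is assumed to be acyclic.
    """
    if cores < 1:
        raise ValueError("at least one core is required")

    remaining = {name: len(spec["tasks"]) for name, spec in stages.items()}
    done = set()        # stages whose tasks have all finished
    released = set()    # stages whose tasks have been queued
    ready = deque()     # FIFO queue of (stage, duration) tasks
    events = []         # min-heap of (finish_time, seq, stage)
    clock, free, seq = 0.0, cores, 0

    def release_ready_stages():
        # Queue the tasks of every stage whose dependencies are complete.
        for name, spec in stages.items():
            if name not in released and all(d in done for d in spec["deps"]):
                released.add(name)
                ready.extend((name, d) for d in spec["tasks"])

    release_ready_stages()
    while ready or events:
        # Dynamic allocation: a free core takes the next ready task,
        # whichever stage it comes from.
        while free > 0 and ready:
            stage, duration = ready.popleft()
            heapq.heappush(events, (clock + duration, seq, stage))
            seq += 1
            free -= 1
        # Advance the clock to the next task completion event.
        clock, _, stage = heapq.heappop(events)
        free += 1
        remaining[stage] -= 1
        if remaining[stage] == 0:
            done.add(stage)
            release_ready_stages()
    return clock


example = {
    "map":    {"deps": [], "tasks": [4, 4, 4, 4]},
    "filter": {"deps": [], "tasks": [2, 2]},
    "join":   {"deps": ["map", "filter"], "tasks": [3, 3]},
}
print(simulate(example, cores=2))  # estimated makespan on 2 cores
print(simulate(example, cores=4))  # estimated makespan on 4 cores
```

Because cores are not tied to a particular stage, the independent "map" and "filter" stages share the pool, and "join" starts only once both have completed. Running the sketch with different core counts shows how such a simulator can compare resource configurations without executing the real application.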
Essential information for potential users
- dagSim is still under development. The internal releases implement different event allocation policies in order to get closer to Big Data and High Performance Computing frameworks that use models other than DAGs.
- dagSim uses a model of the DAG, which must be provided manually.
- The tool currently uses the Lua language (https://www.lua.org); however, Lua is not an essential component and can be removed.
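Since the DAG model is written by hand, a quick sanity check before simulation can catch mistakes such as cycles or references to undefined stages. The following is a minimal sketch in Python using Kahn's topological sort. The dict-based dependency format and the function name `topological_order` are hypothetical and do not reflect dagSim's actual input format.

```python
from collections import deque


def topological_order(deps):
    """Return the stages in a valid execution order, or raise ValueError.

    deps: {stage: [stages it depends on]}  (hypothetical format)
    """
    unknown = {d for ds in deps.values() for d in ds} - deps.keys()
    if unknown:
        raise ValueError(f"undefined stages: {sorted(unknown)}")

    # Deduplicate dependencies while preserving their order.
    parents = {s: list(dict.fromkeys(ds)) for s, ds in deps.items()}
    pending = {s: len(ps) for s, ps in parents.items()}
    children = {s: [] for s in deps}
    for s, ps in parents.items():
        for p in ps:
            children[p].append(s)

    # Kahn's algorithm: repeatedly emit stages with no unmet dependencies.
    queue = deque(s for s, n in pending.items() if n == 0)
    order = []
    while queue:
        s = queue.popleft()
        order.append(s)
        for c in children[s]:
            pending[c] -= 1
            if pending[c] == 0:
                queue.append(c)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


model = {"map": [], "filter": [], "join": ["map", "filter"]}
print(topological_order(model))
```

A model that passes this check is guaranteed to be a DAG in which every stage can eventually be scheduled.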
Currently, dagSim is used within academic environments, though any industrial sector making use of Big Data could deploy it: Communication, Life Sciences, Manufacturing, Financial Services, and Smart Cities, just to mention a few.