The EUBra-BIGSEA QoS IaaS infrastructure relies on a set of robust components that make it possible to meet the deadlines associated with the applications. Fast reaction and accuracy are key factors in this challenging process.
In this respect, dagSim is a discrete event simulator that works on a DAG (Directed Acyclic Graph) corresponding to MapReduce, Apache Tez, Spark, and COMPSs models and can estimate the performance of Big Data applications efficiently. Such frameworks can analyse large amounts of unstructured data very efficiently and have been adopted in multiple application domains, e.g., machine learning, graph processing, and data mining.
In this context, one of the main challenges is that the execution time of a big data job is generally unknown. Because of this, predicting the execution time of big data applications is usually done empirically through experimentation, which requires a costly setup. Alternatively, it is possible to develop models and software tools for predicting performance. dagSim addresses these issues.
dagSim is characterized by the following strategic features: fast performance, user friendliness, no need for user training, and currently no competitors (its application target being highly specific). Existing literature has focused on Hadoop 1.0 only, therefore considering Map/Reduce without DAGs and allocating containers statically to the two stages, whereas dagSim performs a dynamic allocation of resources, reflecting a more realistic scenario.
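To make the idea of discrete event simulation over a DAG with dynamic resource allocation more concrete, the following minimal sketch may help. It is not dagSim's implementation: the function name `simulate_dag`, the input layout, and the fixed per-task durations are assumptions made purely for illustration. Each stage becomes runnable once all of its predecessor stages have finished, and any free core is handed to the next waiting task, regardless of which stage it belongs to.

```python
import heapq
from collections import deque


def simulate_dag(stages, deps, cores):
    """Estimate the makespan of a DAG of stages on a shared pool of cores.

    Hypothetical input layout (an assumption of this sketch):
      stages: {stage_name: (num_tasks, task_duration)}, num_tasks >= 1
      deps:   {stage_name: [stages that must finish first]}
      cores:  number of cores shared dynamically by all runnable stages
    """
    if cores < 1:
        raise ValueError("at least one core is required")

    remaining = {s: n for s, (n, _) in stages.items()}  # unfinished tasks
    done = set()        # fully completed stages
    launched = set()    # stages whose tasks have been queued
    ready = deque()     # tasks waiting for a core (one entry per task)
    events = []         # min-heap of (finish_time, stage_name)
    now = 0.0
    free = cores

    def release_ready_stages():
        # Queue the tasks of every stage whose predecessors are all done.
        for s in stages:
            if s not in launched and all(d in done for d in deps.get(s, [])):
                launched.add(s)
                ready.extend([s] * stages[s][0])

    release_ready_stages()
    while ready or events:
        # Dynamic allocation: give every free core to a waiting task.
        while free > 0 and ready:
            s = ready.popleft()
            heapq.heappush(events, (now + stages[s][1], s))
            free -= 1
        # Jump the clock to the next task completion event.
        now, s = heapq.heappop(events)
        free += 1
        remaining[s] -= 1
        if remaining[s] == 0:
            done.add(s)
            release_ready_stages()
    return now


# Illustrative three-stage job; the numbers are made up.
stages = {"map": (4, 2.0), "shuffle": (2, 3.0), "reduce": (1, 1.0)}
deps = {"shuffle": ["map"], "reduce": ["shuffle"]}
print(simulate_dag(stages, deps, cores=2))  # 8.0
print(simulate_dag(stages, deps, cores=4))  # 6.0
```

The sketch uses deterministic task durations to keep the arithmetic easy to follow. A more realistic simulator would presumably draw task durations from measured or fitted distributions and repeat the run several times, but that is beyond what this example attempts to show.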
Essential information for potential users
Currently, dagSim is used within academic environments, though any industrial sector making use of Big Data could deploy it: Communication, Life Sciences, Manufacturing, Financial Services, and Smart Cities, just to mention a few.