DQaaS - Data Quality-as-a-Service
DQaaS (Data Quality-as-a-Service) aims to provide information about the quality of a requested dataset. Data quality information helps applications and users understand the degree to which a dataset is suitable for their goals. In particular, for a given dataset, the service (i) offers access to a set of quality metrics that are evaluated periodically and (ii) allows applications and users to define and assess personalized quality metrics.
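The two capabilities can be sketched in a few lines of Python. This example does not use the DQaaS API. `QualityCatalog`, `register`, `assess` and the `completeness` metric are hypothetical names chosen for illustration. The sketch assumes a dataset is a list of records (dicts) and that completeness is the fraction of non-missing values. Built-in metrics are pre-registered, and user-defined ("personalized") metrics are added as plain functions.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumption: a dataset is a list of records, each a dict of field -> value.
Record = Dict[str, Any]
Metric = Callable[[List[Record]], float]


def completeness(records: List[Record]) -> float:
    """Fraction of non-missing (non-None) values over all fields of all records."""
    fields = {key for record in records for key in record}
    total = len(records) * len(fields)
    if total == 0:
        return 1.0  # an empty dataset has nothing missing
    filled = sum(1 for record in records for key in fields if record.get(key) is not None)
    return filled / total


class QualityCatalog:
    """Hypothetical registry of built-in and user-defined quality metrics."""

    def __init__(self) -> None:
        # Built-in metric that a service could evaluate periodically.
        self._metrics: Dict[str, Metric] = {"completeness": completeness}

    def register(self, name: str, metric: Metric) -> None:
        # Personalized metric supplied by an application or user.
        self._metrics[name] = metric

    def assess(self, records: List[Record], names: Optional[List[str]] = None) -> Dict[str, float]:
        # Evaluate the selected metrics (all of them by default).
        selected = names if names is not None else list(self._metrics)
        return {name: self._metrics[name](records) for name in selected}


if __name__ == "__main__":
    rows = [
        {"stop_id": 1, "lat": -25.43, "lon": -49.27},
        {"stop_id": 2, "lat": None, "lon": -49.28},
    ]
    catalog = QualityCatalog()
    # A user-defined metric: share of records with a valid latitude.
    catalog.register(
        "valid_latitude",
        lambda rs: sum(1 for r in rs if r.get("lat") is not None and -90 <= r["lat"] <= 90) / len(rs),
    )
    print(catalog.assess(rows))
```

Keeping metrics as plain functions behind a name means periodically evaluated metrics and personalized ones go through the same assessment path.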
DQaaS is designed to deal with Big Data and therefore addresses volume and velocity requirements. In particular, the algorithms have been developed on architectures that support parallelization. When applications or users request real-time quality analyses, only a sample of the data is considered. These choices aim to reduce the impact that the service can have on system performance.
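The following is a minimal sketch of the sampling-plus-parallelism idea. It is not taken from the DQaaS code base. `estimate_completeness` and `partition_counts` are hypothetical names, and uniform random sampling without replacement is an assumed sampling strategy. A thread pool stands in for a distributed engine. Because of Python's GIL, the pool only illustrates the split/aggregate pattern and does not give a real speedup. The pattern works because completeness can be aggregated exactly from per-partition counts of (non-missing values, total values).

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def partition_counts(partition: Sequence[Record], fields: Sequence[str]) -> Tuple[int, int]:
    """Return (non-missing values, total values) for one partition."""
    filled = sum(1 for r in partition for f in fields if r.get(f) is not None)
    return filled, len(partition) * len(fields)


def estimate_completeness(
    records: List[Record],
    fields: Sequence[str],
    sample_fraction: float = 1.0,
    partitions: int = 4,
    seed: int = 0,
) -> float:
    """Estimate completeness on a random sample, computing partitions in parallel."""
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    rng = random.Random(seed)
    # Uniform sample without replacement; the full dataset when sample_fraction == 1.
    k = max(1, round(len(records) * sample_fraction)) if records else 0
    sample = rng.sample(records, k)
    # Round-robin split of the sample into partitions.
    chunks = [sample[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(lambda chunk: partition_counts(chunk, fields), chunks))
    # Per-partition counts add up exactly to the counts of the whole sample.
    filled = sum(r[0] for r in results)
    total = sum(r[1] for r in results)
    return filled / total if total else 1.0
```

A periodic batch evaluation could call `estimate_completeness(rows, ["id", "lat"])` on the full dataset. A real-time request could instead call it with, for example, `sample_fraction=0.1`, trading accuracy of the estimate for lower latency.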
- DQaaS is still under development, and some preliminary tests have been conducted in an academic environment.
- As of the end of April 2017, a first release is available on GitHub (https://github.com/eubr-bigsea/DQaaS). A second release is scheduled for September 2017.
- Currently, DQaaS takes as input the data sources available for the EUBra-BIGSEA use case. Open data can also be used.
- Results are expressed in terms of data quality dimensions.
Politecnico di Milano (POLIMI) designed and developed the tool and the data quality assessment algorithms. Current effort is dedicated to improving the interaction between users/applications and the data quality service. Developed under the EUBra-BIGSEA mandate, the tool can be used by data scientists or software developers to understand the quality of the datasets they are considering for their big data applications. It is an adaptive preprocessing service that can easily be used on various big data platforms.
- Tiago Brasileiro, Cinzia Cappiello, Nadia P. Kozievitch, Demetrio Mestre, Carlos Eduardo Pires, Monica Vitali, “Towards Reliable Data Analyses for Smart Cities”. Accepted for publication in the proceedings of the 21st International Database Engineering & Applications Symposium (IDEAS ’17).