PRIVAaaS is a software toolkit that provides a set of libraries and tools to control and reduce data leakage in the context of Big Data processing and, consequently, to protect sensitive information that is part of the EUBra-BIGSEA framework.
The process is divided into two perspectives which model different aspects of the anonymization problem: the first perspective is related to the anonymization of the loaded input data, while the second is related to the anonymization of the data resulting from the data processing algorithms. The result is output data that is anonymized for the intended usage scenario.

The process starts with a definition step, carried out by users who know the privacy policies that will guide the anonymization process. These policies are expressed as a set of rules implemented by an ontology. To maximize data utility while keeping disclosure risk low, the policies govern two key anonymization phases: 1) the anonymization of raw data, including how each algorithm may use each set of data; and 2) the anonymization of the data that the analytics algorithm provides to the end user, so that knowledge extracted by the algorithm is not unduly accessed.
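
The two phases can be illustrated with a minimal Python sketch. It is not the PRIVAaaS API or policy format: the policy dictionary, the technique names ("suppress", "generalize_decade"), the threshold MIN_GROUP_SIZE, and all function names are hypothetical, and a simple count stands in for the analytics algorithm.

```python
from collections import Counter

# Hypothetical policy: attribute name -> technique (not the PRIVAaaS format).
POLICY = {
    "name": "suppress",
    "age": "generalize_decade",
}
# Assumed threshold: result groups smaller than this are not released.
MIN_GROUP_SIZE = 2


def anonymize_record(record, policy):
    """Phase 1: apply the per-attribute technique to one raw record."""
    out = {}
    for attr, value in record.items():
        technique = policy.get(attr, "keep")
        if technique == "suppress":
            out[attr] = "*"
        elif technique == "generalize_decade":
            low = (int(value) // 10) * 10
            out[attr] = f"{low}-{low + 9}"
        else:
            out[attr] = value
    return out


def count_by(records, attr):
    """Stand-in for the analytics algorithm: count records per value."""
    return Counter(r[attr] for r in records)


def anonymize_result(counts, min_group_size):
    """Phase 2: withhold result groups that are too small to release."""
    return {k: v for k, v in counts.items() if v >= min_group_size}


raw = [
    {"name": "Ana", "age": 34, "city": "Campinas"},
    {"name": "Bruno", "age": 37, "city": "Campinas"},
    {"name": "Carla", "age": 52, "city": "Coimbra"},
]
anonymized = [anonymize_record(r, POLICY) for r in raw]
released = anonymize_result(count_by(anonymized, "city"), MIN_GROUP_SIZE)
print(anonymized[0])
print(released)
```

In this sketch the single-record "Coimbra" group is dropped in phase 2, because releasing a count of one could point to an individual even though the input records were already anonymized in phase 1.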

Features, Benefits & Downloads

  • Implements different anonymization techniques and models.
  • Off-the-shelf component that requires minimal configuration.
  • Based on open source dependencies.
  • Data can be uploaded through JSON/CSV files.
  • The anonymization policy guides the anonymization process, which can run automatically, freeing privacy specialists from this task.
  • The ontology describes a vocabulary for anonymization and can be adapted to different policies for different scenarios.


Essential information for potential users

  • PRIVAaaS is still under development. A Java library has been released for use in the ETL process, and a web service has been released for use during the data analytics process. Improvements and additional anonymization models are in progress.
  • The anonymization process implemented by PRIVAaaS is based on data definition, so it is necessary to know the datasets to which it will be applied.
  • Users need some knowledge of privacy and anonymization to define the anonymization policies, i.e., to correctly select and classify the attributes (identifiers, quasi-identifiers, sensitive) and to choose the anonymization technique to be applied to each attribute, as each one has its own particularities.
  • The anonymization process is automated, based on predefined rules described by the policy.
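
To show why the attribute classification matters, the following Python sketch computes k-anonymity over the quasi-identifiers, a common indicator of re-identification risk. It is only an assumption that such a check is useful here: the source does not state which models PRIVAaaS implements, and the attribute names, the CLASSIFICATION dictionary, and the functions are hypothetical.

```python
from collections import Counter

# Hypothetical classification of the attributes of an example dataset.
CLASSIFICATION = {
    "national_id": "identifier",
    "zip": "quasi-identifier",
    "birth_year": "quasi-identifier",
    "diagnosis": "sensitive",
}


def quasi_identifiers(classification):
    """Return the attributes classified as quasi-identifiers, sorted by name."""
    return sorted(a for a, c in classification.items() if c == "quasi-identifier")


def k_anonymity(records, classification):
    """Size of the smallest group of records sharing all quasi-identifier values."""
    qi = quasi_identifiers(classification)
    groups = Counter(tuple(r[a] for a in qi) for r in records)
    return min(groups.values()) if groups else 0


# Records after anonymization: identifier suppressed, quasi-identifiers generalized.
records = [
    {"national_id": "*", "zip": "130**", "birth_year": "1980-1989", "diagnosis": "flu"},
    {"national_id": "*", "zip": "130**", "birth_year": "1980-1989", "diagnosis": "asthma"},
    {"national_id": "*", "zip": "300**", "birth_year": "1990-1999", "diagnosis": "flu"},
]
k = k_anonymity(records, CLASSIFICATION)
print(k)
```

Here k is 1 because the third record is the only one with its combination of zip and birth_year, so it remains distinguishable. Misclassifying a quasi-identifier (for example, leaving zip unclassified) would hide this from the check, which is why the classification has to be done with care.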



PRIVAaaS has been developed to be used in the EUBra-BIGSEA use cases. Concretely, it was designed to be used in scenarios that involve the processing of massive amounts of data that may contain or lead to privacy-sensitive information.
Furthermore, as data anonymization is currently relevant in several sectors, PRIVAaaS can be used in any sector that takes advantage of Big Data analytics techniques, such as government, health, and private companies.


Papers & References

  • Basso, T.; Matsunaga, R.; Moraes, R.; Antunes, N. “Challenges on Anonymity, Privacy and Big Data”. In: Workshop on Dependability in Evolving Systems, 2016, Cali, Colombia. 7th Latin-American Symposium on Dependable Computing, 2016.
  • Basso, T.; Moraes, R.; Antunes, N.; Vieira, M.; Santos, W.; Meira, W. “PRIVAaaS: privacy approach for a distributed cloud-based data analytics platforms”. In: International Workshop On Assured Cloud Computing And QoS Aware Big Data, 2017, Madrid, Spain. 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2017.
  • Matsunaga, R.; Ricarte, I.; Basso, T.; Moraes, R. “Towards an Ontology-Based definition of Data Anonymization Policy for Cloud Computing and Big Data”. In: International Workshop on Recent Advances in the Dependability Assessment of Complex Systems, 2017, Denver, USA. 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2017.
  • Ferreira, A.; Basso, T.; Silva, H.; Moraes, R. “PRIVA: a policy-based anonymization library for cloud and big data platform”. In: XVIII Workshop de Testes e Tolerância a Falhas, 2017, Belém, Brazil. XXXV Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos, 2017.