Ophidia is a CMCC Foundation research project addressing Big Data challenges in eScience. It exploits advanced parallel computing techniques and a hierarchical storage organization to execute intensive data analysis over multi-terabyte datasets.
Ophidia provides a Big Data analytics framework for parallel I/O and the analysis of multi-dimensional datasets. It leverages the datacube abstraction and comes with an extensive set of OLAP-oriented parallel operators that support, for example, datacube sub-setting, datacube aggregation, NetCDF file import and export, and datacube intercomparison. Additionally, it provides several primitives that operate on n-dimensional arrays, covering sub-setting, data aggregation, array concatenation, algebraic expressions, predicate evaluation, statistical analysis and regression.
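To make the datacube abstraction more concrete, the following is a minimal sketch in plain Python of sub-setting followed by aggregation. It is not Ophidia's API: the `subset` and `reduce_time` helpers, the dictionary layout and all values are assumptions chosen for illustration only.

```python
from statistics import mean

# Toy datacube (illustrative, not Ophidia's storage format):
# each (lat, lon) cell holds an array of values along the time dimension.
lats = [10.0, 20.0, 30.0]
lons = [0.0, 5.0]
cube = {
    (lat, lon): [lat + lon + t for t in range(4)]  # made-up measurements
    for lat in lats
    for lon in lons
}

def subset(cube, lat_range, lon_range):
    """Keep only the cells whose coordinates fall inside the closed ranges."""
    (lat_min, lat_max), (lon_min, lon_max) = lat_range, lon_range
    return {
        (lat, lon): series
        for (lat, lon), series in cube.items()
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    }

def reduce_time(cube, operation):
    """Aggregate each cell's array along time with the given function."""
    return {key: operation(series) for key, series in cube.items()}

# Sub-set to a small region, then aggregate along time.
sub = subset(cube, lat_range=(10.0, 20.0), lon_range=(0.0, 0.0))
time_max = reduce_time(sub, max)
time_avg = reduce_time(sub, mean)
```

In this sketch sub-setting shrinks the set of cells while aggregation collapses the array stored in each cell, which mirrors the split between datacube-level operators and array-level primitives described above.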
Key features and benefits
- Big data analytics framework for scientific multi-dimensional data.
- Hierarchical storage organization to partition and distribute data across multiple nodes.
- Server-side approach with multiple standard interfaces.
- Wide set of parallel (MPI-based) operators and array-based primitives for data analytics.
- Operators for metadata, provenance and search & discovery.
- Workflow support for data analytics experiments.
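The idea of partitioning data across nodes and processing the pieces independently can be sketched as follows. This is a simplified illustration in plain Python, not Ophidia's implementation: the fragment size, the round-robin placement, the node names and the `partition` and `fragment_max` helpers are all assumptions.

```python
def partition(rows, fragment_size, nodes):
    """Split rows into fixed-size fragments and assign them round-robin to nodes."""
    layout = {node: [] for node in nodes}
    fragments = [
        rows[i:i + fragment_size] for i in range(0, len(rows), fragment_size)
    ]
    for index, fragment in enumerate(fragments):
        layout[nodes[index % len(nodes)]].append(fragment)
    return layout

def fragment_max(fragment):
    """Per-fragment work: reduce each row's array to its maximum."""
    return {cell_id: max(series) for cell_id, series in fragment}

# Ten rows, each with a cell id and a short made-up array of values.
rows = [(cell_id, [cell_id * 10 + t for t in range(3)]) for cell_id in range(10)]

# Hypothetical node names; the placement policy is an assumption.
layout = partition(rows, fragment_size=3, nodes=["node-a", "node-b"])

# Fragments are independent, so a parallel runtime could process them
# concurrently; this sketch simply loops over them sequentially.
result = {}
for node, fragments in layout.items():
    for fragment in fragments:
        result.update(fragment_max(fragment))
```

Because each fragment is reduced without reference to any other, the per-fragment step is the natural unit of parallel work, and the final merge only combines already-reduced results.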
Essential information for potential users
The latest Ophidia release is v1.0.0 (released in March 2017).
Open source framework released under the GPLv3 license.
It can be installed on Debian-based and Red Hat-based Linux operating systems. Most library and tool dependencies are resolved automatically when installing the binary packages, while MySQL server and Slurm must be installed and configured manually. This limitation is being addressed (as cluster deployment in a cloud environment) within the EUBra-BIGSEA project, and fully automatic installation is expected to be feasible by the end of that project.
Users can access Ophidia through the Ophidia terminal (a shell-like client) or PyOphidia (Python bindings). To support the user, the terminal provides auto-completion and an online manual for all available commands and operators.
Ophidia is used mainly in scientific domains such as climate change. It has been extended and used in several research projects, including FP7 EUBrazilCloudConnect, FP7 CLIP-C and H2020 INDIGO-DataCloud.
- A. D'Anca, C. Palazzo, D. Elia, S. Fiore, I. Bistinas, K. Böttcher, V. Bennett, G. Aloisio, “On the Use of In-Memory Analytics Workflows to Compute eScience Indicators from Large Climate Datasets”. 1st Workshop on the Integration of Extreme Scale Computing and Big Data Management and Analytics. 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Madrid, Spain, May 14-17, 2017. [To appear]
- S. Fiore, et al., “Distributed and cloud-based multi-model analytics experiments on large volumes of climate change data in the earth system grid federation eco-system”. 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, 2016, pp. 2911-2918.
- D. Elia, S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams, G. Aloisio, “An in-memory based framework for scientific data analytics”. In Proceedings of the ACM International Conference on Computing Frontiers (CF ’16), May 16-19, 2016, Como, Italy, pp. 424-429.