Research Paper - Building A Modern Data Platform Based On The Data Lakehouse Architecture And Cloud-Native Ecosystem
Resources for the research paper "Building A Modern Data Platform Based On The Data Lakehouse Architecture And Cloud-Native Ecosystem". The work is a proof of concept for the core of a Modern Data Platform that uses DataOps, Kubernetes, and the Cloud-Native ecosystem
to build a resilient Big Data platform based on the Data Lakehouse architecture, which serves as the base for
Machine Learning (MLOps) and Artificial Intelligence (AIOps).
The core components of the platform are:
- Infrastructure (Kubernetes)
- Data Ingestion (Argo Workflows + Python)
- Data Storage (MinIO)
- Data Processing/Query (Dremio)
The interactions in the current implementation are visualised with the C4 software architecture model (Context, Containers, Components, and Code).
The following is a simplified view of the initial architecture model, with all abstraction levels combined into one diagram.
The prerequisites are asdf, a Linux operating system, and Docker Engine (tested with asdf 0.11.1, Ubuntu 20.04.5 LTS, and Docker Engine Community 23.0.1).
The following tools are used in the development:
- Helm
- Kubectl
- Kustomize
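Whether these tools are already available can be checked before anything is installed. The following is a minimal sketch, not part of the original resources: it assumes Python 3 is present and that the executables are named `helm`, `kubectl`, and `kustomize` on the `PATH`. The names `REQUIRED_TOOLS` and `missing_tools` are hypothetical.

```python
import shutil

# Tool names taken from the list above. Assumption: the executables are
# called "helm", "kubectl", and "kustomize" on the PATH.
REQUIRED_TOOLS = ["helm", "kubectl", "kustomize"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools are available.")
```

The check only confirms that each executable can be found; it does not verify which version is installed.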
The tools can be installed with the corresponding versions via asdf:

    asdf install

Check the clusters section for more details about the infrastructure setup.
Check the applications section for more details about the application setup.
Check the pipelines section for more details about the pipeline setup.
Check the benchmarking section for more details about the benchmarking setup.
Ahmed AbouZaid: Conceptualization, Methodology, Software, Validation, Data curation, Writing–original draft, Writing–review & editing. Peter J. Barclay: Conceptualization, Methodology, Writing–review & editing, Supervision. Christos Chrysoulas: Conceptualization, Writing–review & editing. Nikolaos Pitropakis: Conceptualization, Writing–review & editing.


