Research Paper - Building A Modern Data Platform Based On The Data Lakehouse Architecture And Cloud-Native Ecosystem

This repository contains the resources for the research paper "Building A Modern Data Platform Based On The Data Lakehouse Architecture And Cloud-Native Ecosystem". The paper presents a proof of concept for the core of a Modern Data Platform that uses DataOps, Kubernetes, and the Cloud-Native ecosystem to build a resilient Big Data platform based on the Data Lakehouse architecture, which serves as the base for Machine Learning (MLOps) and Artificial Intelligence (AIOps).

Architecture

Core Components

The core components of the platform are:

Infrastructure (Kubernetes)
Data Ingestion (Argo Workflows + Python)
Data Storage (MinIO)
Data Processing/Query (Dremio)

Initial Model

To visualise the interactions of the current implementation, the C4 software architecture model (Context, Containers, Components, and Code) is used.

A simplified view of the initial architecture model, with all abstraction levels combined, is provided in initial-architecture-model.png.

Prerequisites

The setup requires asdf, a Linux operating system, and Docker Engine (tested with asdf 0.11.1, Ubuntu 20.04.5 LTS, and Docker Engine Community 23.0.1).

The following tools are used in development:

Helm
Kubectl
Kustomize

They can be installed at the versions pinned in .tool-versions via asdf:

asdf install
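
Before setting up the clusters, it can help to confirm that the prerequisite tools are reachable. The following is a minimal sketch, not part of the repository: the function name `find_missing_tools` is hypothetical, and the executable names are assumed to be the usual lowercase ones for the tools listed above.

```python
import shutil

# Tool names taken from the Prerequisites section. The executable names are
# an assumption (the usual lowercase binaries), not something the repo states.
REQUIRED_TOOLS = ("asdf", "docker", "helm", "kubectl", "kustomize")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The check only inspects the PATH of the Python process, so a tool that is made available solely through interactive shell initialisation (as asdf sometimes is) may be reported as missing even though it works in a terminal.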

Clusters

Check the clusters section for more details about the infrastructure setup.

Applications

Check the applications section for more details about the application setup.

Pipelines

Check the pipelines section for more details about the pipeline setup.

Benchmarking

Check the benchmarking section for more details about the benchmarking setup.

Author Contributions

Ahmed AbouZaid: Conceptualization, Methodology, Software, Validation, Data curation, Writing–original draft, Writing–review & editing.
Peter J. Barclay: Conceptualization, Methodology, Writing–review & editing, Supervision.
Christos Chrysoulas: Conceptualization, Writing–review & editing.
Nikolaos Pitropakis: Conceptualization, Writing–review & editing.
