Skip to content

gridatek/kldp

Repository files navigation

KLDP - Kubernetes Local Data Platform

A batteries-included local data engineering platform running on Minikube. Start with Airflow, add Spark, and scale to a full data stack - all on your laptop.

🎯 Vision

KLDP provides a production-like Kubernetes data platform for local development, testing, and learning. No cloud costs, no complex setup - just run a script and start building pipelines.

🚀 Quick Start

# Prerequisites: Docker, Minikube, Helm, kubectl

# Clone and setup
git clone https://github.com/gridatek/kldp.git
cd kldp

# Start with Airflow only
./scripts/install-airflow.sh

# Or install everything
./scripts/install-all.sh

# Access Airflow UI
minikube service airflow-webserver -n airflow

📦 Components

Phase 1 (Current)

  • Apache Airflow - Workflow orchestration with KubernetesExecutor
  • PostgreSQL - Metadata database
  • MinIO - S3-compatible object storage

Phase 2 (Planned)

  • 🔄 Spark Operator - Distributed data processing
  • 🔄 Sample Pipelines - Airflow + Spark integration examples

Phase 3 (Future)

  • 📋 Prometheus + Grafana - Monitoring and observability
  • 📋 Kafka - Streaming data platform
  • 📋 JupyterHub - Interactive notebooks
  • 📋 Data Catalog - Metadata management

🛠️ Project Structure

kldp/
├── core/
│   └── airflow/              # Airflow Helm configs
├── compute/
│   └── spark/                # Spark operator configs
├── storage/
│   ├── minio/                # Object storage
│   └── postgresql/           # Shared database
├── monitoring/
│   └── observability/        # Prometheus, Grafana
├── scripts/
│   ├── init-cluster.sh       # Initialize minikube
│   ├── install-airflow.sh    # Install Airflow
│   ├── install-spark.sh      # Install Spark operator
│   ├── install-all.sh        # Full stack installation
│   └── destroy.sh            # Cleanup everything
├── examples/
│   ├── airflow-basics/       # Basic Airflow DAGs
│   └── spark-pipeline/       # Airflow + Spark examples
└── docs/
    ├── GETTING_STARTED.md
    ├── ARCHITECTURE.md
    └── TROUBLESHOOTING.md

💻 System Requirements

Minimal (Airflow only)

  • 4 CPU cores
  • 8 GB RAM
  • 20 GB disk space

Recommended (Full stack)

  • 6 CPU cores
  • 12 GB RAM
  • 40 GB disk space

🎓 Use Cases

  • Learning: Hands-on experience with production data tools
  • Development: Test pipelines locally before deploying to prod
  • Prototyping: Experiment with data architectures risk-free
  • Teaching: Workshop and training material

📚 Documentation

🤝 Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

📄 License

MIT License - See LICENSE for details.

🙏 Acknowledgments

Built on top of:


Note: KLDP is optimized for local development. For production deployments, use managed services or properly configured production clusters.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages