🎉
getpwd 0.1.0 released!

Stars

Spark

35 repositories

A simplified, lightweight ETL Framework based on Apache Spark

Scala 586 158 Updated Jan 24, 2024

Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.

Scala 446 251 Updated Sep 4, 2025

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Scala 1,362 782 Updated Jan 28, 2025

Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.

Scala 258 74 Updated Feb 21, 2023

Read and write TensorFlow TFRecord data from Apache Spark.

Scala 294 56 Updated Apr 22, 2024

Apache Spark - A unified analytics engine for large-scale data processing

Scala 42,500 28,974 Updated Dec 16, 2025

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Python 14,642 2,258 Updated Dec 1, 2025

Notes talking about the design and implementation of Apache Spark

5,348 1,837 Updated Apr 2, 2024

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

Python 3,864 942 Updated Jul 10, 2023

Simple and Distributed Machine Learning

Scala 5,191 854 Updated Dec 15, 2025

Koalas: pandas API on Apache Spark

Python 3,368 366 Updated Mar 20, 2024

BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray

Jupyter Notebook 2,690 732 Updated Nov 19, 2025

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.

Java 3,408 1,170 Updated Nov 29, 2025

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning

Scala 2,270 401 Updated Sep 29, 2023

Distributed Deep learning with Keras & Spark

Python 1,577 311 Updated May 1, 2023

Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Jupyter Notebook 1,667 917 Updated Mar 16, 2024

MLeap: Deploy ML Pipelines to Production

Scala 1,527 315 Updated Dec 16, 2025

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Python 1,538 233 Updated Dec 2, 2024

Jupyter magics and kernels for working with remote Spark clusters

Python 1,363 455 Updated Sep 9, 2025

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

Python 2,128 98 Updated Dec 2, 2025

Qubole Sparklens tool for performance tuning Apache Spark

Scala 586 143 Updated Jun 26, 2024

This is a repo documenting the best practices in PySpark.

Jupyter Notebook 462 77 Updated Dec 8, 2022

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Scala 430 116 Updated Jan 14, 2022

Easy-to-use library to bring TensorFlow on Apache Spark

Python 297 45 Updated Oct 11, 2023

Jupyter Notebook extension for Apache Spark integration

JavaScript 191 32 Updated Dec 1, 2020

Create HTML profiling reports from Apache Spark DataFrames

Python 197 76 Updated Feb 2, 2020

Toolkit for Apache Spark ML for Feature clean-up, feature Importance calculation suite, Information Gain selection, Distributed SMOTE, Model selection and training, Hyper parameter optimization and…

HTML 192 44 Updated Jun 1, 2021

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

Scala 157 86 Updated Nov 28, 2025

Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights into your Spark DataFrames.

Python 102 16 Updated Aug 20, 2019

A tool to validate data, built around Apache Spark.

Scala 100 34 Updated Dec 15, 2025