    Repositories list

    • mpire

      Public
      A Python package for easy multiprocessing, but faster than multiprocessing
      Python
      41 · 0 · 0 · 0 · Updated Sep 10, 2021
    • hydra

      Public
      Hydra is a framework for elegantly configuring complex applications
      Python
      755 · 0 · 0 · 0 · Updated Mar 25, 2021
    • klio

      Public
      Smarter data pipelines for audio.
      Python
      52 · 0 · 0 · 0 · Updated Oct 16, 2020
    • Neuraxle

      Public
      Build neat pipelines with the right abstractions to do AutoML. Let your pipeline steps have hyperparameter spaces. Enable checkpoints to cut duplicate calculations. Go from research to production environment easily.
      Python
      62 · 0 · 0 · 0 · Updated Dec 19, 2019
    • faust

      Public
      Python Stream Processing
      Python
      535 · 0 · 0 · 0 · Updated Aug 4, 2018
    • thredo

      Public
      Python
      18 · 0 · 0 · 0 · Updated Aug 1, 2018
    • bloop

      Public
      A hot bloop for your productivity
      Scala
      208 · 0 · 0 · 0 · Updated Jun 5, 2018
    • Stream Framework is a Python library, which allows you to build news feed, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
      Python
      535 · 0 · 0 · 0 · Updated Jan 21, 2018
    • fireant

      Public
      Data analysis and reporting tool for quick access to custom charts and tables in Jupyter Notebooks and in the shell.
      Python
      20 · 0 · 0 · 0 · Updated Nov 30, 2017
    • gain

      Public
      Web crawling framework based on asyncio for everyone.
      Python
      205 · 0 · 0 · 0 · Updated Jun 19, 2017
    • Persimmon

      Public
      A visual dataflow programming language for sklearn
      Python
      41 · 0 · 0 · 0 · Updated May 18, 2017
    • ufora

      Public
      Compiled, automatically parallel Python for data science
      Python
      28 · 0 · 0 · 0 · Updated May 27, 2016
    • dask

      Public
      Task scheduling and blocked algorithms for parallel processing
      Python
      1.8k · 0 · 0 · 0 · Updated Jul 6, 2015
    • disque

      Public
      Disque is a distributed message broker
      C
      536 · 0 · 0 · 0 · Updated Apr 30, 2015
    • HiBench

      Public
      HiBench is a Hadoop benchmark suite.
      Java
      771 · 0 · 0 · 0 · Updated Apr 17, 2015
    • This repository holds the Amazon Elastic MapReduce sample bootstrap actions
      Python
      304 · 0 · 0 · 0 · Updated Apr 14, 2015
    • rabit

      Public
      Reliable Allreduce and Broadcast Interface for distributed machine learning
      C++
      181 · 0 · 0 · 0 · Updated Mar 27, 2015
    • crawler4j

      Public
      Open Source Web Crawler for Java
      Java
      1.9k · 0 · 0 · 0 · Updated Mar 4, 2015
    • grpc-java

      Public
      The Java gRPC implementation. HTTP/2 based RPC
      Java
      4k · 0 · 0 · 0 · Updated Feb 26, 2015
    • Java
      124 · 0 · 0 · 0 · Updated Feb 25, 2015
    • Spark and Redshift integration
      Scala
      348 · 0 · 0 · 0 · Updated Feb 6, 2015
    • Scala
      73 · 0 · 0 · 0 · Updated Jan 31, 2015
    • A tool for managing Apache Kafka.
      Scala
      2.5k · 0 · 0 · 0 · Updated Jan 29, 2015
    • flink

      Public
      Mirror of Apache Flink
      Java
      14k · 0 · 0 · 0 · Updated Jan 16, 2015
    • Runs embedded, in-memory Apache Kafka instances. Helpful for integration testing.
      Scala
      13 · 0 · 0 · 0 · Updated Dec 3, 2014
    • dataduct

      Public
      DataPipeline for humans.
      Python
      81 · 0 · 0 · 0 · Updated Nov 28, 2014
    • A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, Baidu and others) by using proxies (socks4/5, http proxy) and many different IPs, including asynchronous networking support (very fast).
      Python
      747 · 0 · 0 · 0 · Updated Nov 20, 2014
    • pyspider

      Public
      A Powerful Spider System with Web UI
      Python
      3.7k · 0 · 0 · 0 · Updated Nov 18, 2014
    • samoa

      Public
      SAMOA (Scalable Advanced Massive Online Analysis) is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms.
      Java
      77 · 0 · 0 · 0 · Updated Nov 8, 2014
    • kangaroo

      Public
      Hadoop utilities for Kafka
      Java
      35 · 0 · 0 · 0 · Updated Oct 23, 2014