Merged
65 changes: 32 additions & 33 deletions README.md
@@ -1,23 +1,38 @@
![img](docs/flow_php_banner_02_2022.png)

Flow is a PHP based, strongly typed ETL (Extract Transform Load), asynchronous data processing library with constant memory consumption.
Flow is a PHP-based, strongly typed ETL (Extract Transform Load), asynchronous data processing library with constant memory consumption.

[![Latest Stable Version](https://poser.pugx.org/flow-php/flow/v)](https://packagist.org/packages/flow-php/flow)
[![Latest Unstable Version](https://poser.pugx.org/flow-php/flow/v/unstable)](https://packagist.org/packages/flow-php/flow)
[![License](https://poser.pugx.org/flow-php/flow/license)](https://packagist.org/packages/flow-php/flow)
[![Test Suite](https://github.com/flow-php/flow/actions/workflows/test-suite.yml/badge.svg?branch=1.x)](https://github.com/flow-php/flow/actions/workflows/test-suite.yml)

Supported PHP versions
Supported PHP versions: [![PHP 8.1](https://img.shields.io/badge/php-~8.1-8892BF.svg)](https://php.net/) [![PHP 8.2](https://img.shields.io/badge/php-~8.2-8892BF.svg)](https://php.net/)

* [![Supported PHP Version](https://img.shields.io/badge/php-~8.1-8892BF.svg)](https://php.net/)
* [![Supported PHP Version](https://img.shields.io/badge/php-~8.2-8892BF.svg)](https://php.net/)
## Features

* low and constant memory consumption
* asynchronous data processing
* reading from any data source
* writing to any data source
* rich collection of data transformation functions
* direct access to remote filesystems
* partitioning
* grouping & aggregating
* remote file processing
* joins
* sorting
* displaying datasets as ASCII table
* validation against the schema
* window functions
* caching

📈[Project Roadmap](https://github.com/orgs/flow-php/projects/1)

## Installation

This package is a [monorepo](https://tomasvotruba.com/blog/2019/10/28/all-you-always-wanted-to-know-about-monorepo-but-were-afraid-to-ask/).
Please check below packages and select only those that you are going to use,
Please check the below packages and select only those that you are going to use,
this will reduce the number of unnecessary dependencies in your project (less maintenance).

- [ETL](src/core/etl/README.md)
@@ -38,10 +53,12 @@ this will reduce the number of unnecessary dependencies in your project (less ma
- [text](src/adapter/etl-adapter-text/README.md)
- [xml](src/adapter/etl-adapter-xml/README.md)
- Libraries
- [array-dot](src/lib/array-dot/README.md) - auto included
- [array-dot](src/lib/array-dot/README.md)
- [doctrine-dbal-bulk](src/lib/doctrine-dbal-bulk/README.md)
- [Google Dremel algorithm](src/lib/dremel/README.md)
- [Parquet](src/lib/parquet/README.md)

For example if you want to work with json/csv files here are dependencies you will need to install:
For example, if you want to work with JSON/CSV files here are the dependencies you will need to install:

```shell
composer require flow-php/etl:^0.1 flow-php/etl-adapter-csv:^0.1 flow-php/etl-adapter-json:^0.1
```

@@ -53,40 +70,22 @@ In order to understand how Flow works, please read [documentation](src/core/etl/

### [Usage Examples](examples/README.md)

## Features

* low and constant memory consumption
* asynchronous data processing
* reading from any data source
* writing to any data source
* rich collection of data transformation functions
* direct access to remote filesystems
* partitioning
* grouping & aggregating
* remote files processing
* joins
* sorting
* displaying datasets as ASCII table
* validation against schema
* window functions
* caching

## Asynchronous Processing

* [etl-adapter-amphp](https://github.com/flow-php/etl-adapter-amphp)
* [etl-adapter-reactphp](https://github.com/flow-php/etl-adapter-reactphp)

## Building blocks

* DataFrame - Lazy data processing frame.
* Rows - Immutable collection of `Row` objects.
* Row - Immutable, strongly typed collection of `Entry` objects.
* Entry - Immutable, strongly typed object representing cell in a row.
* Entry - Immutable, strongly typed object representing a cell in a row.
* **E**xtractor (Reader) - Memory safe, Data Source returning \Generator, yielding `Rows` to the `Pipeline`
* **T**ransformer - Data transformer receiving and returning `Rows` (in most cases transformer), one instance of `Rows` at a time.
* **L**oader (Writer) - Memory safe representation of Data Sink, responsibility of Loader is to write `Rows` into destination storage, one at time.
* **L**oader (Writer) - Memory safe representation of Data Sink; the responsibility of Loader is to write `Rows` into destination storage, one at a time.
* Pipeline - Interface representing ETL process; each received `Rows` instance is passed through all `Pipes`. Also responsible for error handling.
* Pipe - Loader of Transformer instance existing in `Pipes` collection.
* Pipe - Loader or Transformer instance existing in the `Pipes` collection.
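
The extract, transform, load roles listed above can be sketched in plain PHP. This is a minimal illustration that assumes nothing about Flow's real classes: `extract_numbers`, `add_square`, and `load_as_json_lines` are hypothetical names, and plain arrays stand in for `Rows`. It only shows why a `\Generator` based extractor keeps one batch in memory at a time.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins for illustration only; these are not Flow's classes.

/** Extractor role: lazily yields one small batch of rows at a time. */
function extract_numbers(int $total, int $batchSize): \Generator
{
    for ($start = 1; $start <= $total; $start += $batchSize) {
        $end = min($start + $batchSize - 1, $total);
        $batch = [];
        for ($id = $start; $id <= $end; $id++) {
            $batch[] = ['id' => $id];
        }
        yield $batch; // only this batch is held in memory
    }
}

/** Transformer role: receives one batch and returns a new batch. */
function add_square(array $batch): array
{
    return array_map(
        static fn (array $row): array => $row + ['square' => $row['id'] ** 2],
        $batch
    );
}

/** Loader role: writes one batch to the sink (standard output here). */
function load_as_json_lines(array $batch): void
{
    foreach ($batch as $row) {
        echo json_encode($row), PHP_EOL;
    }
}

// Pipeline role: each batch passes through the transformer, then the loader.
foreach (extract_numbers(5, 2) as $batch) {
    load_as_json_lines(add_square($batch));
}
```

Because the extractor yields batches instead of returning a full array, peak memory depends on the batch size rather than on the total number of rows.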

## Asynchronous Processing

* [etl-adapter-amphp](https://github.com/flow-php/etl-adapter-amphp)
* [etl-adapter-reactphp](https://github.com/flow-php/etl-adapter-reactphp)

### GitHub Stars
