# DataHub Roadmap

Here is DataHub's roadmap for the next six months (through the end of 2021).

We publish only a short, six-month roadmap because we are evolving very fast and want to adapt to the community's needs. We will check items off this roadmap as we make progress over the next few months.

**Caveat**: ETAs are subject to change. Please check with us before committing to your stakeholders to deploy these capabilities at your company.

If you would like to suggest new items or request timeline changes to the existing items, please submit your request through this [form](https://docs.google.com/forms/d/1znDv7_CXXvUDcUsqzq92PgGqPSh_1yeYC3cl2xgizSE/) or submit a GitHub [feature request](https://github.com/linkedin/datahub/issues/new?assignees=&labels=feature-request&template=--feature-request.md&title=A+short+description+of+the+feature+request).

Of course, you always have access to our community through [Slack](https://slack.datahubproject.io) or our [town halls](townhalls.md) to chat with us live!

# Current Roadmap
## Q3 2021 [Jul - Sept 2021]

### Data Profiling and Dataset Previews
Use Case: See sample data for a dataset and statistics on the shape of the data (column distribution, nullability, etc.)
- [ ] Support for data profiling and preview extraction through ingestion pipeline

### Data Quality
- [ ] Support for data quality visualization
- [ ] Support for data health score based on data quality results and pipeline observability
- [ ] Integration with systems like Great Expectations, AWS Deequ, dbt tests, etc.

### Fine-grained Access Control for Metadata
- [ ] Support for role-based access control on metadata (view, edit, etc.)

### Column-level Lineage
- [ ] Metadata Model
- [ ] SQL Parsing

### Operational Metadata
- [ ] Partitioned Datasets
- [ ] Support for operational signals like completeness, freshness, etc.


## Q4 2021 [Oct - Dec 2021]

### Data Lake Ecosystem Integration
- [ ] Spark Delta Lake
- [ ] Apache Iceberg
- [ ] Apache Hudi

### Metadata Trigger Framework
- [ ] Stateful sensors for Airflow
- [ ] Receive events so you can send alerts and emails
- [ ] Slack integration

### ML Ecosystem
- [ ] Features
- [ ] Models
- [ ] Notebooks

### Metrics Ecosystem
- [ ] Measures, Dimensions
- [ ] Relationships to Datasets and Dashboards

### Data Mesh-oriented Features
- [ ] Data Product modeling
- [ ] Analytics to enable Data Meshification

### Collaboration
- [ ] Conversations on the platform
- [ ] Knowledge Posts (Google Docs, Slides, and Sheets)


## Beyond the horizon

### Let us know what you want!
- Submit requests [here](https://docs.google.com/forms/d/1znDv7_CXXvUDcUsqzq92PgGqPSh_1yeYC3cl2xgizSE/) or
- Submit a GitHub [feature request](https://github.com/linkedin/datahub/issues/new?assignees=&labels=feature-request&template=--feature-request.md&title=A+short+description+of+the+feature+request).


# Historical Roadmap
## Q1 2021 [Jan - Mar 2021]

### React UI
### SSO for Authentication
- [x] Support for authentication (login) using OIDC providers (Okta, Google, etc.)

### Tags
Use Case: Support for free-form global tags for social collaboration and to aid discovery
- [x] Edit / Create new tags
- [x] Attach tags to relevant constructs (e.g. datasets, dashboards, users, schema\_fields)
- [ ] How-to guides for deploying DataHub to all the major cloud providers
  - [x] AWS
  - [ ] Azure
  - [x] GCP


### Data Quality
- [x] Integration with common systems like Google Analytics, etc.

### Usage-Based Insights
- [x] Display frequently used datasets, etc.
- [ ] Improved search relevance through usage data

### Role-based Access Control

### No-code Metadata Model Additions
Use Case: Developers should be able to add new entities and aspects to the metadata model easily
- [x] No need to write any code (in Java or Python) to store, retrieve, search and query metadata
- [ ] No need to write any code (in GraphQL or UI) to visualize metadata
