metadv is a Python library for generating SQL models from a declarative YAML configuration. It supports multiple data modeling approaches including Data Vault 2.0, Anchor Modeling, and Dimensional Modeling, with template packages for popular dbt libraries.
- Declarative Configuration: Define your data model structure in a single YAML file
- Multiple Modeling Approaches: Support for Data Vault 2.0, Anchor Modeling, and Dimensional Modeling
- Template Packages: Works with automate_dv, datavault4dbt, and dimensional templates
- Custom Templates: Add your own template packages for different frameworks
- Validation: Validates your configuration before generating models
- CLI & Library: Use as a command-line tool or import as a Python library
pip install metadv- Create a
metadv.ymlfile in your dbt project'smodels/metadv/folder - Define your targets (entities and relations) and source mappings
- Run the generator to create SQL models
See sample_metadv.yml for a complete example configuration.
# Generate Data Vault models using automate_dv
metadv /path/to/dbt/project --package datavault-uk/automate_dv
# Validate only (don't generate)
metadv /path/to/dbt/project --package datavault-uk/automate_dv --validate-only
# Generate to custom output directory
metadv /path/to/dbt/project --package datavault-uk/automate_dv --output ./output
# Show detailed output including warnings
metadv /path/to/dbt/project --package datavault-uk/automate_dv --verbose
# Output results as JSON
metadv /path/to/dbt/project --package datavault-uk/automate_dv --jsonfrom metadv import MetaDVGenerator
# Initialize generator with package name
generator = MetaDVGenerator('/path/to/dbt/project', 'datavault-uk/automate_dv')
# Validate configuration
result = generator.validate()
if result.errors:
print("Validation errors:", [e.message for e in result.errors])
# Generate SQL models
success, error, files = generator.generate()
if success:
print(f"Generated {len(files)} files")
else:
print(f"Error: {error}")| Package | Description | Generated Models |
|---|---|---|
datavault-uk/automate_dv |
Data Vault 2.0 using automate_dv | Stage, Hub, Link, Satellite |
scalefreecom/datavault4dbt |
Data Vault 2.0 using datavault4dbt | Stage, Hub, Link, Satellite |
dimensional |
Dimensional Modeling | Dimension, Fact |
metadv:
# Optional: custom templates directory (relative to project root or absolute)
templates-dir: ./my-templates
# Optional: custom validations directory (relative to project root or absolute)
validations-dir: ./my-validations
# Define your targets (entities and relations)
targets:
- name: customer
type: entity
description: Customer business entity
- name: order
type: entity
description: Order business entity
- name: customer_order
type: relation
description: Customer to order relationship
entities:
- customer
- order
# Define source models and their column mappings
sources:
- name: raw_customers
columns:
- name: customer_id
target:
- target_name: customer # Entity key connection
- name: customer_name
target:
- attribute_of: customer # Attribute connection
- name: raw_orders
columns:
- name: order_id
target:
- target_name: order
- name: customer_id
target:
- target_name: customer_order
entity_name: customer # Which entity in the relation
- name: order_date
target:
- attribute_of: order
multiactive_key: true # Mark as multiactive key| Field | Description |
|---|---|
templates-dir |
Optional path to custom templates directory (relative to project root or absolute). Templates here take precedence over built-in templates. |
validations-dir |
Optional path to custom validations directory (relative to project root or absolute). Custom validators with the same class name as built-in ones will override them. |
targets |
Array of target definitions (entities and relations) |
sources |
Array of source model definitions with column mappings |
| Type | Description | Data Vault Output | Dimensional Output |
|---|---|---|---|
entity |
A business entity (e.g., Customer, Product) | Hub + Satellite | Dimension |
relation |
A relationship between entities | Link + Satellite | Fact |
Each column has a target array that can contain multiple connections:
| Field | Description |
|---|---|
target_name |
Target entity/relation this column identifies (creates key) |
entity_name |
For relation connections: which entity within the relation |
entity_index |
For self-referencing relations: entity position (0-indexed) |
attribute_of |
Target this column is an attribute of (satellite/dimension payload) |
target_attribute |
Custom display name for the attribute |
multiactive_key |
Mark as multiactive key column (useful for Data Vault) |
-
Entity/Relation Key Connections (
target_name): Link a source column to a target. The column becomes a business key. -
Attribute Connections (
attribute_of): Link a source column as an attribute of a target. The column becomes part of the satellite or dimension payload.
For satellites with multiple active records per business key, mark one or more columns as multiactive keys:
- name: phone_number
target:
- attribute_of: customer
multiactive_key: true # This column distinguishes active recordsMultiactive key columns are:
- Used to identify unique records (can be used as child key within the satellite)
- Excluded from the payload columns
- Generate
ma_sat_models instead ofsat_models using condition in templates.yml
metadv validates your configuration and reports:
Errors (must be fixed before generating):
- Relations missing entity connections from sources
Warnings (recommendations):
- Entities without source connections
- Targets without descriptions
- Columns without any connections
Run with --validate-only to check your configuration without generating files.
You can create custom template packages by setting templates-dir in your metadv.yml to point to a directory containing your templates. This directory should contain package folders with:
- A
templates.ymlfile defining template configurations - SQL template files using Jinja2 and Python string.Template syntax
Templates in your custom directory take precedence over built-in templates with the same package name.
You can create custom validators by setting validations-dir in your metadv.yml to point to a directory containing your validation Python files.
- Create a Python file in your validations directory (e.g.,
my_validation.py) - Import the base class from metadv
- Create a class that inherits from
BaseValidator - Implement the
validate()method
# my_validation.py
from metadv.validations.base import BaseValidator, ValidationContext, ValidationMessage
from typing import List
class MyCustomValidator(BaseValidator):
def validate(self, ctx: ValidationContext) -> List[ValidationMessage]:
messages = []
# Your validation logic here
if some_condition:
messages.append(ValidationMessage(
type='error', # or 'warning'
code='my_error_code',
message='Something is wrong'
))
return messagesTo override a built-in validator, create a custom validator with the same class name. For example, to override the EntityNoSourceValidator:
# entity_no_source.py (in your validations-dir)
from metadv.validations.base import BaseValidator, ValidationContext, ValidationMessage
from typing import List
class EntityNoSourceValidator(BaseValidator):
"""Custom implementation that overrides the built-in validator."""
def validate(self, ctx: ValidationContext) -> List[ValidationMessage]:
# Your custom logic here
return []The templates.yml file defines which templates to generate for each domain (entity, relation, source):
# Templates for entity targets (e.g., Hub, Dimension)
entity:
hub: # Template key (arbitrary name)
template: hub.sql # Template file to use
filename: "hub/hub_{entity_name}.sql" # Output filename pattern
scope: entity # Generator scope (see below)
sat:
template: sat.sql
filename: "sat/sat_{entity_name}__{source_name}.sql"
scope: source # One file per source-target pair
condition: has_attributes # Only generate if condition is true
ma_sat:
template: ma_sat.sql
filename: "sat/ma_sat_{entity_name}__{source_name}.sql"
scope: source
condition: is_multiactive # Only for multiactive satellites
# Templates for relation targets (e.g., Link, Fact)
relation:
link:
template: link.sql
filename: "link/link_{relation_name}.sql"
scope: relation
sat:
template: sat.sql
filename: "sat/sat_{relation_name}__{source_name}.sql"
scope: source
condition: has_attributes
# Templates for source models (e.g., Stage)
source:
stage:
template: stage.sql
filename: "stage/stg_{source_name}.sql"
scope: source| Field | Description |
|---|---|
template |
SQL template filename in the package folder |
filename |
Output path pattern with placeholders like {entity_name}, {source_name}, {relation_name} |
scope |
Determines generator type and context passed to template |
condition |
Optional condition that must be true to generate this template |
| Scope | Generator | Description |
|---|---|---|
entity |
TargetGenerator | One file per entity target |
relation |
TargetGenerator | One file per relation target |
source |
SourceTargetGenerator | One file per source-target pair |
attribute |
AttributeGenerator | One file per individual attribute |
source (in source domain) |
SourceGenerator | One file per source model (for staging) |
| Condition | True when |
|---|---|
has_attributes |
Source has attribute columns for this target |
is_multiactive |
Source has multiactive key columns for this target |
Templates receive context variables based on their scope. Use Python ${variable} syntax for initial substitution, then Jinja2 {{ variable }} for dbt rendering:
Entity scope: entity_name, source_refs
Relation scope: relation_name, entities, source_refs, fk_columns
Source scope (source-target): source_name, source_model, entity_name/relation_name, attributes, key_column, columns
Attribute scope: entity_name/relation_name, source_name, source_model, attribute_name, column, key_column
Source scope (source): source_name, columns
MIT License