✅ Robust Data Quality Checks
dbt-assertions ensures thorough data quality assessments at the row level,
enhancing the reliability of downstream models.
🔍 Efficient Exception Detection
Granular row-by-row exception detection identifies and flags the specific rows that fail assertions, streamlining the resolution process.
🛠️ Customizable Assertions & Easy Integration
The easy-to-use macros assertions() and assertions_filter() let users
customize data quality checks directly within the model YAML definition,
adapting to specific data validation needs.
🚀 An Easy Shift from Your Current Workflows
A generic test, generic_assertions(), performs dbt tests as usual,
so you can try the package without compromising your current workflows
or rebuilding your tables.
- Features
- Content
- Install
- Supported databases
- Dependencies
- Variables
- Basic Example
- Documentation
- Contribution
- Acknowledgments
- Contact
dbt-assertions currently supports dbt 1.7.x or higher.
Check dbt package hub for the latest installation instructions, or read the docs for more information on installing packages.
Include in packages.yml
packages:
- package: AxelThevenot/dbt_assertions
version: [">=1.0.0", "<3.0.0"]
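Then pull the package into your project with dbt's standard dependency command (run from your project root):

```shell
# Installs all packages listed in packages.yml, including dbt-assertions
dbt deps
```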
# see https://github.com/AxelThevenot/dbt-assertions/releases/latest for the latest version tag

| Database | assertions | assertions_filter | __unique__ | __not_null__ | generic_assertions |
|---|---|---|---|---|---|
| BigQuery (default) | ✅ | ✅ | ✅ | ✅ | ✅ |
| Snowflake | ✅ | ✅ | ✅ (not nested fields) | ✅ (not nested fields) | ✅ |
| DuckDB | ✅ | ✅ | ✅ (not nested fields) | ✅ (not nested fields) | ✅ |
| Databricks | ✅ | ✅ | ✅ (not nested fields) | ✅ (not nested fields) | ✅ |
| Redshift | ✅ | ✅ (include/exclude fields not supported) | ✅ (not nested fields) | ✅ (not nested fields) | ✅ |
| Athena | ✅ | ✅ | ✅ (not nested fields) | ✅ (not nested fields) | ✅ |
| Clickhouse | ✅ | ✅ | ✅ (not nested fields) | ✅ (not nested fields) | ✅ |
| Spark | ✅ | ✅ | ✅ (not nested fields) | ✅ (not nested fields) | ✅ |
| Others | ℹ️ | ℹ️ | ℹ️ | ℹ️ | ℹ️ |
- ✅: supported
- ✔️: supported not tested
- ❌: not supported
- ℹ️: Opened to contributions ❤️ (see the integration_tests folder on how to test your adapter)
For latest release, see https://github.com/AxelThevenot/dbt-assertions/releases
This package has no dependencies.
The following variable can be defined in your dbt_project.yml file to change the default exceptions column name used by the package.
vars:
  "dbt_assertions:default_column": "exceptions"

Check the basic_example example.
assertions() macro generates a select expression for row-level assertions.
Arguments:
- column (optional[str]): column to read the assertions from.
This macro parses the schema model YAML to extract row-level assertions; custom assertions, unique, and not-null. It then constructs an array of exceptions for each row based on its assertions results.
By default, it will generate assertions based on
your YAML model definition,
reading the configuration of a column named exceptions.
You can call the macro with the column argument to change this default.
SELECT
*,
{{ dbt_assertions.assertions(column='warnings') }},
FROM {{ ref('my_model') }}

Note: this macro generates assertions based on your query result, so it must be placed in the final SELECT of the query.
WITH
[...] -- Other CTEs
final AS (
SELECT
[...]
FROM {{ ref('my_model') }}
)
-- After query results
SELECT
*,
{{ dbt_assertions.assertions() }},
FROM final

assertions_filter() macro generates an expression to filter rows based on
the assertions results generated with the assertions() macro.
Arguments:
- column (optional[str]): Column to read the exceptions from.
- exclude_list (optional[list[str]]): Assertions to exclude in the filter.
- include_list (optional[list[str]]): Assertions to include in the filter.
- reverse (optional[bool]): returns rows without exceptions when
reverse=false (default), and rows with exceptions when reverse=true.
By default, each row with exception(s) will be filtered.
SELECT
*
FROM {{ ref('my_model') }}
WHERE {{ dbt_assertions.assertions_filter() }}

You can change this default behaviour by specifying an
optional exclude_list or include_list argument (not both).
SELECT
*
FROM {{ ref('my_model') }}
WHERE {{ dbt_assertions.assertions_filter(exclude_list=['assertions_id']) }}

The generic_assertions() test generates a query to get rows based on exceptions.
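The reverse argument selects only the failing rows instead, which can be useful for materializing them in a dedicated audit model. A sketch (the audit use case is illustrative):

```sql
SELECT
    *
FROM {{ ref('my_model') }}
-- reverse=true keeps only rows with at least one exception
WHERE {{ dbt_assertions.assertions_filter(reverse=true) }}
```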
By default, it returns the rows with exceptions. You can change this default behaviour by specifying an exclude_list or include_list (not both).
You must define the assertions for the model beforehand. More on YAML definition for assertions.
Arguments:
- column (optional[str]): Column to read the exceptions from.
- exclude_list (optional[list[str]]): Assertions to exclude in the filter.
- include_list (optional[list[str]]): Assertions to include in the filter.
- re_assert (optional[bool]): set to
true if the assertions column is not computed in your table.
Configure the generic test in schema.yml with either of the following.
The configuration valid for both dbt Core and dbt Fusion, with test arguments nested under arguments:
model:
name: my_model
tests:
- dbt_assertions.generic_assertions:
arguments:
[column: <column_name>]
[exclude_list: <list(str_to_filter)>]
[include_list: <list(str_to_filter)>]
[re_assert: true | false]
columns:
  ...

Or the configuration valid for dbt Core only:
model:
name: my_model
tests:
- dbt_assertions.generic_assertions:
[column: <column_name>]
[exclude_list: <list(str_to_filter)>]
[include_list: <list(str_to_filter)>]
[re_assert: true | false]
columns:
  ...

[] represents optional parts. Yes, everything is optional, but let's see it by examples.
In the basic test example,
you can create your test as follows, then run your dbt test command.
models:
- name: basic_test_example_d_site
tests:
- dbt_assertions.generic_assertions:
arguments:
column: exceptions
include_list:
- site_id_is_not_null
# `re_assert: true` to use only if your assertion's column
# is not computed and saved in your table.
re_assert: true
columns:
  ...

The assertions definition must be created under a column definition of your model and respect the following structure.
assertions:
[__unique__: <unique_expression>]
[__not_null__: __unique__ | <not_null_expression>]
[<custom_assertion_id>:
description: [<string>]
expression: <string>
null_as_exception: [<bool>]]
  ...

[] represents optional parts. Yes, everything is optional, but let's see it by examples.
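For orientation, here is a minimal sketch of where this block lives in a model's YAML, following the package's column-level meta convention (the model and column names, my_model and exceptions, are placeholders):

```yaml
models:
  - name: my_model
    columns:
      - name: exceptions
        config:
          meta:
            assertions:
              __unique__:
                - key_1
                - key_2
              __not_null__: __unique__
```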
Custom assertions are the basic assertions.
The package supports any assertion, as long as its expression is valid in a SELECT statement of your underlying database. So you can do a lot of things.
Assertions are represented as key-value pairs, where each key is the ID of an assertion.
Each assertion is defined by at least an expression, which will be rendered
and evaluated as your test.
description and null_as_exception are optional.
assertions:
unique:
description: "Row must be unique."
    expression: "1 = COUNT(1) OVER(PARTITION BY my_id)"
site_creation_date_is_past:
description: "Site must be created in the past."
expression: "site_creation_date <= CURRENT_DATE()"
site_type_pick_list:
description: "The site type be must in its known picklist."
expression: |
site_type IN (
'store',
'ecommerce',
'drive',
'pickup'
      )

null_as_exception is an optional configuration for your assertion.
It defaults to false and is the result returned when your expression evaluates to NULL.
The default is false because one assertion should evaluate one thing;
prefer using the __not_null__ helper instead.
In our previous example, if we also want to reject NULL site types:
assertions:
...
site_type_pick_list:
description: "The site type be must in its known picklist."
expression: |
site_type IN (
'store',
'ecommerce',
'drive',
'pickup'
)
    null_as_exception: true

As guaranteeing uniqueness of rows is a concern in most use cases,
the __unique__ helper is here to avoid writing complex and repetitive queries.
assertions:
unique:
description: "Row must be unique."
    expression: "1 = COUNT(1) OVER(PARTITION BY key_1, key_2)"

The above configuration is the same as writing:
assertions:
__unique__:
- key_1
  - key_2

You can also verify unique keys for nested/repeated structures. It will generate:
- One assertion for the 0-depth guaranteeing uniqueness across the rows.
- One assertion for each repeated field guaranteeing uniqueness within the row.
The following example will generate the assertions:
- unique: Row must be unique over the unique keys.
- nested_1__unique: Items must be unique within nested_1 in the row.
- nested_1.nested_2__unique: Items must be unique within nested_1.nested_2 in the row.
assertions:
__unique__:
- key_1
- key_2
- nested_1:
- key_3
- key_4
- nested_2:
- key_5
      - key_6

As guaranteeing not-null values is also a concern in most use cases,
the __not_null__ helper is here to avoid writing complex and repetitive queries.
assertions:
key_1__not_null:
description: "key_1 is not null."
expression: "key_1 IS NOT NULL"
key_2__not_null:
description: "key_2 is not null."
    expression: "key_2 IS NOT NULL"

The above configuration is the same as writing:
assertions:
__unique__:
- key_1
- key_2
__not_null__:
- key_1
- key_2And as the two helpers are often linked, you can rewrite the assertions as follows, which is also the same.
assertions:
__unique__:
- key_1
- key_2
  __not_null__: __unique__

You can also enforce not-null values on nested/repeated structures. It will generate:
- One assertion for each column of the 0-depth guaranteeing not null.
- One assertion for each column under the repeated field guaranteeing all the values are not null within the row.
The following example will generate the assertions:
- key_1__not_null: key_1 is not null.
- key_2__not_null: key_2 is not null.
- nested_1.key_3__not_null: nested_1.key_3 values are not null.
assertions:
__not_null__:
- key_1
- key_2
- nested_1:
    - key_3

If exceptions is not a naming convention you like,
you can opt for a column name of your choice and the macros will
still work with the column argument.
You can also play with multiple columns.
model:
name: my_model
columns:
...
- errors
config:
meta:
assertions:
__unique__:
- key_1
- key_2
__not_null__: __unique__
- warns
config:
meta:
assertions:
site_creation_date_is_past:
description: "Site must be created in the past."
          expression: "site_creation_date <= CURRENT_DATE()"

And in your model query:
WITH final AS
(
SELECT ...
)
SELECT
*,
{{ dbt_assertions.assertions(column='errors') }},
{{ dbt_assertions.assertions(column='warns') }}
FROM {{ ref('my_model') }}

If you want to contribute, please open a Pull Request or an Issue on this repo. Feel free to reach out to me on LinkedIn.
Special thanks to
- Victor Vaneecloo for the generic test
- Guillaume Blaquiere for their help in terminology
- Daniel Bartley for their help in terminology
- Benoit Perigaud for the Snowflake support
- Ferdy Hulzebos for the Databricks support
- Rich Herman for the Redshift support
- Vaisakh Mohan for the Athena support
- François Lenne for the Clickhouse support
- François Lenne for the Spark support
If you have any questions, please open a new Issue or feel free to reach out on LinkedIn.
Happy coding with dbt-assertions!
