docs/comparisons.md
Lines changed: 2 additions & 2 deletions
Original file line number
Diff line number
Diff line change
@@ -5,7 +5,7 @@
5
5
There are many tools and frameworks in the data ecosystem. This page tries to make sense of it all.
6
6
7
7
## dbt
8
-
[dbt](https://www.getdbt.com/) is a tool for data transformations. It is a pioneer in this space and has shown how valuable transformation frameworks can be. Although dbt is a fanstastic tool, it has trouble scaling with data and organizational size.
8
+
[dbt](https://www.getdbt.com/) is a tool for data transformations. It is a pioneer in this space and has shown how valuable transformation frameworks can be. Although dbt is a fantastic tool, it has trouble scaling with data and organizational size.
9
9
10
10
dbt built their product focused on simple data transformations. By default, it fully refreshes data warehouses by executing templated SQL in the correct order.
11
11
@@ -107,7 +107,7 @@ WHERE d.ds BETWEEN @start_ds AND @end_ds
107
107
#### Data leakage
108
108
dbt does not check whether the data inserted into an incremental table should be there or not. This can lead to problems and consistency issues, such as late-arriving data overriding past partitions. These problems are called "data leakage."
109
109
110
-
SQLMesh wraps all queries in a subquery with a time filter under the hood to enforce that the data inserted for a particular batch is as expected and reproducible everytime.
110
+
SQLMesh wraps all queries in a subquery with a time filter under the hood to enforce that the data inserted for a particular batch is as expected and reproducible every time.
111
111
112
112
In addition, dbt only supports the 'insert/overwrite' incremental load pattern for systems that natively support it. SQLMesh enables 'insert/overwrite' on any system, because it is the most robust approach to incremental loading, while 'Append' pipelines risk data inaccuracy in the variety of scenarios where your pipelines may run more than once for a given date.
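Note on the "Data leakage" hunk above: the wrapping behavior it describes can be illustrated with a minimal sketch. This is not SQLMesh's implementation; the helper `wrap_with_time_filter`, the `events` table, and the `_subquery` alias are hypothetical names chosen for illustration, and SQLite stands in for the warehouse.

```python
import sqlite3


def wrap_with_time_filter(model_sql: str, time_column: str) -> str:
    """Hypothetical helper: wrap a model query in a subquery and
    restrict its output to the batch's time range."""
    return (
        f"SELECT * FROM ({model_sql}) AS _subquery "
        f"WHERE {time_column} BETWEEN :start_ds AND :end_ds"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, ds TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2023-01-01"),
        (2, "2023-01-02"),
        (3, "2022-12-31"),  # late-arriving row that belongs to an earlier batch
    ],
)

# The model query itself has no time filter, so on its own it would leak row 3.
model_sql = "SELECT id, ds FROM events"
batch_sql = wrap_with_time_filter(model_sql, "ds")

rows = conn.execute(
    batch_sql, {"start_ds": "2023-01-01", "end_ds": "2023-01-02"}
).fetchall()
print(sorted(rows))
```

Because the filter is applied outside the model query, the row dated `2022-12-31` is excluded from this batch even though the model author did not filter on `ds`.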
docs/concepts/models/overview.md
Lines changed: 6 additions & 6 deletions
@@ -33,7 +33,7 @@ The `SELECT` expression of a model must follow certain conventions for SQLMesh t
33
33
### Unique column names
34
34
The final `SELECT` of a model's query must contain unique column names.
35
35
36
-
### Explict types
36
+
### Explicit types
37
37
SQLMesh encourages explicit type casting in the final `SELECT` of a model's query. It is considered a best practice to prevent unexpected types in the schema of a model's table.
38
38
39
39
SQLMesh uses the postgres `x::int` syntax for casting; the casts are automatically transpiled to the appropriate format for the execution engine.
@@ -55,11 +55,11 @@ This example demonstrates non-inferrable, inferrable, and explicit aliases:
55
55
```sql linenums="1"
56
56
SELECT
57
57
1, -- not inferrable
58
-
x + 1, -- not infererrable
59
-
SUM(x), -- not infererrable
58
+
x + 1, -- not inferrable
59
+
SUM(x), -- not inferrable
60
60
x, -- inferrable as x
61
61
x::int, -- inferrable as x
62
-
x + 1 AS x, -- explictly x
62
+
x + 1 AS x, -- explicitly x
63
63
SUM(x) as x, -- explicitly x
64
64
```
65
65
@@ -87,7 +87,7 @@ Name is ***required*** and must be ***unique***.
87
87
- Start is used to determine the earliest time needed to process the model. It can be an absolute date/time (`2022-01-01`), or a relative one (`1 year ago`).
88
88
89
89
### cron
90
-
- Cron is used to schedule your model to process or refresh at a certain interval. It uses [croniter](https://github.com/kiorky/croniter) under the hood, so expressions such as `@daily` can be used. A model's `IntervalUnit` is determined implicity by the cron expression.
90
+
- Cron is used to schedule your model to process or refresh at a certain interval. It uses [croniter](https://github.com/kiorky/croniter) under the hood, so expressions such as `@daily` can be used. A model's `IntervalUnit` is determined implicitly by the cron expression.
91
91
92
92
### storage_format
93
93
- Storage format is a property for engines such as Spark or Hive that support storage formats such as `parquet` and `orc`.
@@ -112,7 +112,7 @@ For models that are incremental, the following parameters can be specified in th
112
112
- Batch size is used to optimize backfilling incremental data. It determines the maximum number of intervals to run in a single job. For example, if a model specifies a cron of `@hourly` and a batch_size of `12`, when backfilling 3 days of data, the scheduler will spawn 6 jobs. (3 days * 24 hours/day = 72 hour intervals to fill. 72 intervals / 12 intervals per job = 6 jobs.)
113
113
114
114
## Macros
115
-
Macros can be used for passing in paramaterized arguments such as dates, as well as for making SQL less repetitive. By default, SQLMesh provides several predefined macro variables that can be used. Macros are used by prefixing with the `@` symbol. For more information, refer to [macros](../macros.md).
115
+
Macros can be used for passing in parameterized arguments such as dates, as well as for making SQL less repetitive. By default, SQLMesh provides several predefined macro variables that can be used. Macros are used by prefixing with the `@` symbol. For more information, refer to [macros](../macros.md).
116
116
117
117
## Statements
118
118
Models can have additional statements that run before the main query. This can be useful for loading things such as [UDFs](../glossary.md#user-defined-function-udf).
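Note on the batch size context line in the hunk at line 112 above: its arithmetic can be checked with a short sketch. The function name `backfill_job_count` is hypothetical, and rounding up when the intervals do not divide evenly is an assumption based on batch size being described as a maximum per job.

```python
import math


def backfill_job_count(days: int, intervals_per_day: int, batch_size: int) -> int:
    """Hypothetical helper: number of jobs needed to backfill a date range.

    Assumes each job processes at most `batch_size` intervals, so a
    partial final batch still needs its own job (hence the ceiling).
    """
    total_intervals = days * intervals_per_day
    return math.ceil(total_intervals / batch_size)


# The documented example: @hourly cron, batch_size of 12, 3 days of backfill.
jobs = backfill_job_count(days=3, intervals_per_day=24, batch_size=12)
print(jobs)  # 72 intervals / 12 per job = 6
```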
docs/concepts/models/python_models.md
Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
1
1
# Python models
2
2
3
-
Although SQL is a powerful tool, some use cases are better handled by Python. For example, Pyton may be a better option in pipelines that involve machine learning, interacting with external APIs, or complex business logic that cannot be expressed in SQL.
3
+
Although SQL is a powerful tool, some use cases are better handled by Python. For example, Python may be a better option in pipelines that involve machine learning, interacting with external APIs, or complex business logic that cannot be expressed in SQL.
4
4
5
5
SQLMesh has first-class support for models defined in Python; there are no restrictions on what can be done in the Python model as long as it returns a Pandas or Spark DataFrame instance.
docs/concepts/overview.md
Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
3
3
This page provides a conceptual overview of what SQLMesh does and how its components fit together.
4
4
5
5
## What SQLMesh is
6
-
SQLMesh is a Python framework that automates everything needed to run a scaleable data transformation platform. SQLMesh works with a variety of [engines and orchestrators](../integrations/overview.md).
6
+
SQLMesh is a Python framework that automates everything needed to run a scalable data transformation platform. SQLMesh works with a variety of [engines and orchestrators](../integrations/overview.md).
7
7
8
8
It was created with a focus on both data and organizational scale and works regardless of your data warehouse or SQL engine's capabilities.
docs/guides/projects.md
Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@
4
4
5
5
---
6
6
7
-
Before getting started, ensure that you meet the [prerequsities](../prerequisites.md) for using SQLMesh.
7
+
Before getting started, ensure that you meet the [prerequisites](../prerequisites.md) for using SQLMesh.
8
8
9
9
---
10
10
@@ -58,7 +58,7 @@ To create a project from the command line, follow these steps:
58
58
59
59
To edit an existing project, open the project file you wish to edit in your preferred editor.
60
60
61
-
If using CLI or Notebook, you can open a file in your project for editing by using the `sqlmesh` command with the `-p` varaible, and pointing to your project's path as follows:
61
+
If using CLI or Notebook, you can open a file in your project for editing by using the `sqlmesh` command with the `-p` variable, and pointing to your project's path as follows:
docs/index.md
Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ Here are some challenges that data teams run into, especially when data sizes in
18
18
* Validating changes to data pipelines before deploying to production is an uncertain and sometimes expensive process. Although branches can be deployed to environments, when merged to production, the code is re-run. This is wasteful and generates uncertainty because the data is regenerated.
19
19
20
20
1. Silos transform data lakes to data swamps
21
-
* The difficulty and cost of making changes to core pipelines can lead to duplicate pipelines with minor customizations. The inability to easily make and validate changes causes contributors to follow the "path of least resistence". The proliferation of similar tables leads to additional costs, inconsistencies, and maintenance burden.
21
+
* The difficulty and cost of making changes to core pipelines can lead to duplicate pipelines with minor customizations. The inability to easily make and validate changes causes contributors to follow the "path of least resistance". The proliferation of similar tables leads to additional costs, inconsistencies, and maintenance burden.
22
22
23
23
## What is SQLMesh?
24
24
SQLMesh consists of a CLI, a Python API, and a Web UI to make data pipeline development and deployment easy, efficient, and safe.