
Protocol Documentation


configs.proto

ActionConfig

ActionConfig defines a single action entry in an actions.yaml configuration file.

Field Type Label Description
table ActionConfig.TableConfig
view ActionConfig.ViewConfig
incrementalTable ActionConfig.IncrementalTableConfig
assertion ActionConfig.AssertionConfig
operation ActionConfig.OperationConfig
declaration ActionConfig.DeclarationConfig
notebook ActionConfig.NotebookConfig
dataPreparation ActionConfig.DataPreparationConfig

ActionConfig.AssertionConfig

Field Type Label Description
name string The name of the assertion.
dataset string The dataset (schema) of the assertion.
project string The Google Cloud project (database) of the assertion.
dependencyTargets ActionConfig.Target repeated Targets of actions that this action is dependent on.
filename string Path to the source file that the contents of the action is loaded from.
tags string repeated A list of user-defined tags with which the action should be labeled.
disabled bool If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.
description string Description of the assertion.
hermetic bool If true, this indicates that the action only depends on data from explicitly-declared dependencies. If false, it indicates that the action depends on data from a source that has not been declared as a dependency.
dependOnDependencyAssertions bool If true, assertions dependent upon any of the dependencies are added as dependencies as well.
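The disabled description implies that a disabled action stays in the dependency graph but is skipped at run time. The following is a minimal sketch of that rule, not Dataform's scheduler. It assumes dependencies are given as a hypothetical mapping from each action name to the names it depends on, and it uses Python's standard graphlib module for the ordering. It also assumes that actions downstream of a disabled action still run.

```python
from graphlib import TopologicalSorter


def plan_execution(deps: dict[str, list[str]], disabled: set[str]) -> list[str]:
    """Return the actions to execute, in dependency order.

    deps maps each action name to the names of the actions it depends on.
    Disabled actions remain in the graph (so they can still be depended
    upon and still constrain ordering) but are left out of the run list.
    ASSUMPTION: dependents of a disabled action are still executed.
    """
    order = TopologicalSorter(deps).static_order()
    return [name for name in order if name not in disabled]
```

For example, with deps = {"report": ["clean"], "clean": ["raw"], "raw": []} and "clean" disabled, only "raw" and then "report" are planned.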

ActionConfig.ColumnDescriptor

Field Type Label Description
path string repeated The identifier for the column, using multiple parts for nested records.
description string A text description of the column.
bigqueryPolicyTags string repeated A list of BigQuery policy tags that will be applied to the column.
tags string repeated A list of tags for this column which will be applied.
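The path field identifies a column with one part per nesting level, so ["user", "address", "city"] names the city field inside address inside user. The sketch below follows such a path through a record held as nested Python dicts and joins it into the dotted form BigQuery uses for nested fields. The record shape and both helper names are hypothetical.

```python
from typing import Any


def lookup_column(record: dict[str, Any], path: list[str]) -> Any:
    """Follow a multi-part column path into a nested record."""
    value: Any = record
    for part in path:
        value = value[part]  # descend one level per path part
    return value


def dotted_name(path: list[str]) -> str:
    """Join path parts into a dotted nested-field reference, e.g. a.b.c."""
    return ".".join(path)
```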

ActionConfig.DataPreparationConfig

Field Type Label Description
name string The name of the data preparation.
dependencyTargets ActionConfig.Target repeated Targets of actions that this action is dependent on.
filename string Path to the source file that the contents of the action is loaded from.
tags string repeated A list of user-defined tags with which the action should be labeled.
disabled bool If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.
description string Description of the data preparation.

ActionConfig.DeclarationConfig

Field Type Label Description
name string The name of the declaration.
dataset string The dataset (schema) of the declaration.
project string The Google Cloud project (database) of the declaration.
description string Description of the declaration.
columns ActionConfig.ColumnDescriptor repeated Descriptions of columns within the declaration.

ActionConfig.IncrementalTableConfig

Field Type Label Description
name string The name of the incremental table.
dataset string The dataset (schema) of the incremental table.
project string The Google Cloud project (database) of the incremental table.
dependencyTargets ActionConfig.Target repeated Targets of actions that this action is dependent on.
filename string Path to the source file that the contents of the action is loaded from.
tags string repeated A list of user-defined tags with which the action should be labeled.
disabled bool If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.
preOperations string repeated Queries to run before the main query. This can be useful for granting permissions.
postOperations string repeated Queries to run after the main query.
protected bool If true, prevents the dataset from being rebuilt from scratch.
uniqueKey string repeated If set, uniqueKey is the set of column names that act as the table's unique key. When updating the incremental table, Dataform merges rows that match on uniqueKey instead of appending them.
description string Description of the incremental table.
columns ActionConfig.ColumnDescriptor repeated Descriptions of columns within the table.
partitionBy string The key by which to partition the table. Typically the name of a timestamp or date column. See https://cloud.google.com/dataform/docs/partitions-clusters.
partitionExpirationDays int32 The number of days for which BigQuery stores data in each partition. The setting applies to all partitions in a table, but is calculated independently for each partition based on the partition time.
requirePartitionFilter bool Declares whether the partitioned table requires a WHERE clause predicate filter that filters the partitioning column.
updatePartitionFilter string SQL-based filter for when incremental updates are applied.
clusterBy string repeated The keys by which to cluster partitions. See https://cloud.google.com/dataform/docs/partitions-clusters.
labels ActionConfig.IncrementalTableConfig.LabelsEntry repeated Key-value pairs for BigQuery labels.
additionalOptions ActionConfig.IncrementalTableConfig.AdditionalOptionsEntry repeated Key-value pairs of additional options to pass to the BigQuery API. Some options, for example, partitionExpirationDays, have dedicated type/validity checked fields. For such options, use the dedicated fields.
dependOnDependencyAssertions bool When set to true, assertions dependent upon any dependency will be added as dependencies to this action.
assertions ActionConfig.TableAssertionsConfig Assertions to be run on the dataset. If configured, the relevant assertions are automatically created and run with this dataset as their dependency.
hermetic bool If true, this indicates that the action only depends on data from explicitly-declared dependencies. If false, it indicates that the action depends on data from a source that has not been declared as a dependency.
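The uniqueKey description says incremental updates merge on the key instead of appending. A minimal in-memory sketch of that difference follows, with rows as plain dicts and a hypothetical merge_rows helper. It is only an illustration of the semantics; the actual update runs as SQL in BigQuery. It assumes the existing rows are already unique on the key.

```python
Row = dict[str, object]


def merge_rows(existing: list[Row], incoming: list[Row], unique_key: list[str]) -> list[Row]:
    """Merge incoming rows into existing rows, matching on unique_key.

    An incoming row replaces the existing row with the same key values;
    incoming rows with a new key are appended. With no unique key, every
    incoming row is simply appended.
    ASSUMPTION: existing rows are already unique on unique_key.
    """
    if not unique_key:
        return existing + incoming

    def key_of(row: Row) -> tuple:
        return tuple(row[col] for col in unique_key)

    merged = {key_of(row): row for row in existing}
    for row in incoming:
        merged[key_of(row)] = row  # update on match, insert otherwise
    return list(merged.values())  # dicts keep insertion order
```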

ActionConfig.IncrementalTableConfig.AdditionalOptionsEntry

Field Type Label Description
key string
value string

ActionConfig.IncrementalTableConfig.LabelsEntry

Field Type Label Description
key string
value string
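Each *Entry message in this reference has only a key and a value and is used as a repeated field. This is presumably how the documentation generator renders a map<string, string> field, in which case labels and additionalOptions would be written as an ordinary key/value mapping in actions.yaml. The sketch below collapses such entries into a dict. The helper name is hypothetical, and later entries win on duplicate keys, as when protobuf parses a map field.

```python
def entries_to_map(entries: list[dict[str, str]]) -> dict[str, str]:
    """Collapse repeated {key, value} entries into a single dict.

    A later entry with the same key overwrites an earlier one.
    """
    result: dict[str, str] = {}
    for entry in entries:
        result[entry["key"]] = entry["value"]
    return result
```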

ActionConfig.NotebookConfig

Field Type Label Description
name string The name of the notebook.
location string The Google Cloud location of the notebook.
project string The Google Cloud project (database) of the notebook.
dependencyTargets ActionConfig.Target repeated Targets of actions that this action is dependent on.
filename string Path to the source file that the contents of the action is loaded from.
tags string repeated A list of user-defined tags with which the action should be labeled.
disabled bool If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.
description string Description of the notebook.
dependOnDependencyAssertions bool When set to true, assertions dependent upon any dependency will be added as dependencies to this action.

ActionConfig.OperationConfig

Field Type Label Description
name string The name of the operation.
dataset string The dataset (schema) of the operation.
project string The Google Cloud project (database) of the operation.
dependencyTargets ActionConfig.Target repeated Targets of actions that this action is dependent on.
filename string Path to the source file that the contents of the action is loaded from.
tags string repeated A list of user-defined tags with which the action should be labeled.
disabled bool If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.
hasOutput bool Declares that this action creates a dataset which should be referenceable as a dependency target, for example by using the ref function.
description string Description of the operation.
columns ActionConfig.ColumnDescriptor repeated Descriptions of columns within the operation. Can only be set if hasOutput is true.
dependOnDependencyAssertions bool When set to true, assertions dependent upon any dependency will be added as dependencies to this action.
hermetic bool If true, this indicates that the action only depends on data from explicitly-declared dependencies. If false, it indicates that the action depends on data from a source that has not been declared as a dependency.

ActionConfig.TableAssertionsConfig

Shorthand options for specifying assertions, usable with some table-based action types.

Field Type Label Description
uniqueKey string repeated Column(s) which constitute the dataset's unique key index. If set, the resulting assertion will fail if there is more than one row in the dataset with the same values for all of these column(s).
uniqueKeys ActionConfig.TableAssertionsConfig.UniqueKey repeated
nonNull string repeated Column(s) which may never be NULL. If set, the resulting assertion will fail if any row contains NULL values for these column(s).
rowConditions string repeated General condition(s) which should hold true for all rows in the dataset. If set, the resulting assertion will fail if any row violates any of these condition(s).

ActionConfig.TableAssertionsConfig.UniqueKey

Combinations of column(s), each of which should constitute a unique key index for the dataset. If set, the resulting assertion(s) will fail if there is more than one row in the dataset with the same values for all of the column(s) in the unique key(s).

Field Type Label Description
uniqueKey string repeated
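The checks described for uniqueKey, nonNull and rowConditions can be sketched in memory as follows, with rows as plain dicts and None standing in for NULL. In the real configuration, rowConditions are SQL expressions evaluated by BigQuery; here they are replaced by Python predicates purely for illustration, and all function names are hypothetical. An assertion fails when its function returns a non-empty list.

```python
from collections import Counter
from typing import Callable

Row = dict[str, object]


def unique_key_violations(rows: list[Row], key: list[str]) -> list[tuple]:
    """Key values shared by more than one row (uniqueKey)."""
    counts = Counter(tuple(row[col] for col in key) for row in rows)
    return [k for k, n in counts.items() if n > 1]


def non_null_violations(rows: list[Row], columns: list[str]) -> list[Row]:
    """Rows where any of the given columns is NULL (nonNull)."""
    return [row for row in rows if any(row.get(col) is None for col in columns)]


def row_condition_violations(rows: list[Row], conditions: list[Callable[[Row], bool]]) -> list[Row]:
    """Rows for which at least one condition does not hold (rowConditions).

    ASSUMPTION: Python predicates stand in for SQL condition strings.
    """
    return [row for row in rows if not all(cond(row) for cond in conditions)]
```

For uniqueKeys, each listed key is checked on its own, for example [unique_key_violations(rows, k) for k in unique_keys].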

ActionConfig.TableConfig

Field Type Label Description
name string The name of the table.
dataset string The dataset (schema) of the table.
project string The Google Cloud project (database) of the table.
dependencyTargets ActionConfig.Target repeated Targets of actions that this action is dependent on.
filename string Path to the source file that the contents of the action is loaded from.
tags string repeated A list of user-defined tags with which the action should be labeled.
disabled bool If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.
preOperations string repeated Queries to run before the main query. This can be useful for granting permissions.
postOperations string repeated Queries to run after the main query.
description string Description of the table.
columns ActionConfig.ColumnDescriptor repeated Descriptions of columns within the table.
partitionBy string The key by which to partition the table. Typically the name of a timestamp or date column. See https://cloud.google.com/dataform/docs/partitions-clusters.
partitionExpirationDays int32 The number of days for which BigQuery stores data in each partition. The setting applies to all partitions in a table, but is calculated independently for each partition based on the partition time.
requirePartitionFilter bool Declares whether the partitioned table requires a WHERE clause predicate filter that filters the partitioning column.
clusterBy string repeated The keys by which to cluster partitions. See https://cloud.google.com/dataform/docs/partitions-clusters.
labels ActionConfig.TableConfig.LabelsEntry repeated Key-value pairs for BigQuery labels.
additionalOptions ActionConfig.TableConfig.AdditionalOptionsEntry repeated Key-value pairs of additional options to pass to the BigQuery API. Some options, for example, partitionExpirationDays, have dedicated type/validity checked fields. For such options, use the dedicated fields.
dependOnDependencyAssertions bool When set to true, assertions dependent upon any dependency will be added as dependencies to this action.
assertions ActionConfig.TableAssertionsConfig Assertions to be run on the dataset. If configured, the relevant assertions are automatically created and run with this dataset as their dependency.
hermetic bool If true, this indicates that the action only depends on data from explicitly-declared dependencies. If false, it indicates that the action depends on data from a source that has not been declared as a dependency.

ActionConfig.TableConfig.AdditionalOptionsEntry

Field Type Label Description
key string
value string

ActionConfig.TableConfig.LabelsEntry

Field Type Label Description
key string
value string

ActionConfig.Target

Target represents a unique action identifier.

Field Type Label Description
project string The Google Cloud project (database) of the action.
dataset string The dataset (schema) of the action. For notebooks, this is the location.
name string The name of the action.
includeDependentAssertions bool If true, the assertions of this dependency are also added to the action's dependency targets.
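The action-level dependOnDependencyAssertions flag and the per-target includeDependentAssertions flag both pull a dependency's assertions into the dependency list. The sketch below expands a list of targets under the assumption that either flag is enough to include them. Target is simplified to a name plus the flag (project and dataset are omitted), Python snake_case names stand in for the camelCase fields, and the assertion names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str
    include_dependent_assertions: bool = False  # includeDependentAssertions


def resolve_dependencies(
    targets: list[Target],
    assertions_for: dict[str, list[str]],
    depend_on_dependency_assertions: bool = False,  # dependOnDependencyAssertions
) -> list[str]:
    """Expand dependency targets with the assertions that depend on them.

    assertions_for maps an action name to the assertions on that action.
    ASSUMPTION: either flag being true includes a target's assertions.
    """
    resolved: list[str] = []
    for target in targets:
        resolved.append(target.name)
        if depend_on_dependency_assertions or target.include_dependent_assertions:
            resolved.extend(assertions_for.get(target.name, []))
    return resolved
```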

ActionConfig.ViewConfig

Field Type Label Description
name string The name of the view.
dataset string The dataset (schema) of the view.
project string The Google Cloud project (database) of the view.
dependencyTargets ActionConfig.Target repeated Targets of actions that this action is dependent on.
filename string Path to the source file that the contents of the action is loaded from.
tags string repeated A list of user-defined tags with which the action should be labeled.
disabled bool If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.
preOperations string repeated Queries to run before the main query. This can be useful for granting permissions.
postOperations string repeated Queries to run after the main query.
materialized bool Applies the materialized view optimization, see https://cloud.google.com/bigquery/docs/materialized-views-intro.
description string Description of the view.
columns ActionConfig.ColumnDescriptor repeated Descriptions of columns within the view.
labels ActionConfig.ViewConfig.LabelsEntry repeated Key-value pairs for BigQuery labels.
additionalOptions ActionConfig.ViewConfig.AdditionalOptionsEntry repeated Key-value pairs of additional options to pass to the BigQuery API. Some options, for example, partitionExpirationDays, have dedicated type/validity checked fields. For such options, use the dedicated fields.
dependOnDependencyAssertions bool When set to true, assertions dependent upon any dependency will be added as dependencies to this action.
hermetic bool If true, this indicates that the action only depends on data from explicitly-declared dependencies. If false, it indicates that the action depends on data from a source that has not been declared as a dependency.
assertions ActionConfig.TableAssertionsConfig Assertions to be run on the dataset. If configured, the relevant assertions are automatically created and run with this dataset as their dependency.

ActionConfig.ViewConfig.AdditionalOptionsEntry

Field Type Label Description
key string
value string

ActionConfig.ViewConfig.LabelsEntry

Field Type Label Description
key string
value string

ActionConfigs

ActionConfigs defines the contents of an actions.yaml configuration file.

Field Type Label Description
actions ActionConfig repeated
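Since ActionConfigs holds a repeated actions list of ActionConfig entries, an actions.yaml file is a top-level actions list whose items are keyed by action type. The sketch below writes such a file as its JSON equivalent so it needs only the standard library, then reports each entry's type and filename. The file names are hypothetical, and it assumes each entry sets exactly one of the ActionConfig fields.

```python
import json

# Hypothetical actions.yaml contents, written as the equivalent JSON.
ACTIONS_DOC = """
{
  "actions": [
    {"table": {"filename": "orders.sql", "tags": ["daily"]}},
    {"view": {"filename": "orders_view.sql"}},
    {"operation": {"filename": "grant.sql", "disabled": true}}
  ]
}
"""

# Field names of ActionConfig, as listed in the reference above.
ACTION_TYPES = {
    "table", "view", "incrementalTable", "assertion",
    "operation", "declaration", "notebook", "dataPreparation",
}


def action_kinds(doc: dict) -> list[tuple[str, str]]:
    """Return (action type, filename) for each entry under "actions".

    ASSUMPTION: each entry sets exactly one ActionConfig field.
    """
    result = []
    for entry in doc["actions"]:
        kinds = [key for key in entry if key in ACTION_TYPES]
        if len(kinds) != 1:
            raise ValueError(f"expected exactly one action type, got {kinds}")
        kind = kinds[0]
        # Declarations have no filename field, so default to "".
        result.append((kind, entry[kind].get("filename", "")))
    return result
```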

NotebookRuntimeOptionsConfig

Field Type Label Description
outputBucket string Storage bucket to output notebooks to after their execution.

WorkflowSettings

WorkflowSettings defines the contents of the workflow_settings.yaml configuration file.

Field Type Label Description
dataformCoreVersion string The desired Dataform core version to compile against.
defaultProject string Required. The default Google Cloud project (database).
defaultDataset string Required. The default dataset (schema).
defaultLocation string Required. The default BigQuery location to use. For more information on BigQuery locations, see https://cloud.google.com/bigquery/docs/locations.
defaultAssertionDataset string Required. The default dataset (schema) for assertions.
vars WorkflowSettings.VarsEntry repeated Optional. User-defined variables that are made available to project code during compilation. An object containing a list of "key": value pairs.
projectSuffix string Optional. The suffix to append to all Google Cloud project references.
datasetSuffix string Optional. The suffix to append to all dataset references.
namePrefix string Optional. The prefix to prepend to all action names.
defaultNotebookRuntimeOptions NotebookRuntimeOptionsConfig Optional. Default runtime options for Notebook actions.
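The sketch below treats a parsed workflow_settings.yaml as a plain dict. It checks the four fields marked Required and builds a fully qualified project.dataset.name for an action from the defaults, projectSuffix, datasetSuffix and namePrefix. The helper names are hypothetical, and joining suffixes and the prefix with an underscore is an assumption made for illustration.

```python
from typing import Optional

# Fields the reference above marks as Required.
REQUIRED_FIELDS = (
    "defaultProject",
    "defaultDataset",
    "defaultLocation",
    "defaultAssertionDataset",
)


def missing_required(settings: dict) -> list[str]:
    """Return the required WorkflowSettings fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not settings.get(field)]


def resolve_name(settings: dict, name: str,
                 dataset: Optional[str] = None,
                 project: Optional[str] = None) -> str:
    """Build project.dataset.name, falling back to the defaults.

    ASSUMPTION: suffixes and the prefix are joined with "_".
    """
    project = project or settings["defaultProject"]
    dataset = dataset or settings["defaultDataset"]
    if settings.get("projectSuffix"):
        project = f"{project}_{settings['projectSuffix']}"
    if settings.get("datasetSuffix"):
        dataset = f"{dataset}_{settings['datasetSuffix']}"
    if settings.get("namePrefix"):
        name = f"{settings['namePrefix']}_{name}"
    return f"{project}.{dataset}.{name}"
```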

WorkflowSettings.VarsEntry

Field Type Label Description
key string
value string

Scalar Value Types

.proto Type Notes C++ Java Python Go C# PHP Ruby
double double double float float64 double float Float
float float float float float32 float float Float
int32 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. int32 int int int32 int integer Bignum or Fixnum (as required)
int64 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. int64 long int/long int64 long integer/string Bignum
uint32 Uses variable-length encoding. uint32 int int/long uint32 uint integer Bignum or Fixnum (as required)
uint64 Uses variable-length encoding. uint64 long int/long uint64 ulong integer/string Bignum or Fixnum (as required)
sint32 Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int32s. int32 int int int32 int integer Bignum or Fixnum (as required)
sint64 Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int64s. int64 long int/long int64 long integer/string Bignum
fixed32 Always four bytes. More efficient than uint32 if values are often greater than 2^28. uint32 int int uint32 uint integer Bignum or Fixnum (as required)
fixed64 Always eight bytes. More efficient than uint64 if values are often greater than 2^56. uint64 long int/long uint64 ulong integer/string Bignum
sfixed32 Always four bytes. int32 int int int32 int integer Bignum or Fixnum (as required)
sfixed64 Always eight bytes. int64 long int/long int64 long integer/string Bignum
bool bool boolean boolean bool bool boolean TrueClass/FalseClass
string A string must always contain UTF-8 encoded or 7-bit ASCII text. string String str/unicode string string string String (UTF-8)
bytes May contain any arbitrary sequence of bytes. string ByteString str []byte ByteString string String (ASCII-8BIT)
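The Notes column can be checked with a little arithmetic. A varint stores 7 bits per byte. A negative int32 is sign-extended to 64 bits before encoding, so it always takes 10 bytes. sint32 first applies ZigZag encoding, which maps small negative values to small unsigned ones. A value of 2^28 or more needs 5 varint bytes, whereas fixed32 always uses 4. The sketch below computes these sizes; the function names are illustrative.

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode a non-negative integer as a protobuf varint."""
    size = 1
    while value >= 0x80:  # more than 7 bits left: emit a continuation byte
        value >>= 7
        size += 1
    return size


def int32_wire_size(value: int) -> int:
    """Size of an int32 field value: negatives are sign-extended to 64 bits."""
    return varint_size(value & 0xFFFFFFFFFFFFFFFF)


def zigzag32(value: int) -> int:
    """ZigZag-map a signed 32-bit value as sint32 does: 0,-1,1,-2 -> 0,1,2,3."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
```

For example, -1 costs 10 bytes as int32 but 1 byte as sint32, since zigzag32(-1) is 1.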