Conversation
Force-pushed from 0c58f64 to 1f76d48
Co-authored-by: Vincent Chan <vchan@users.noreply.github.com>
```python
        self._mapping = mapping

    @property
    def mapping(self) -> t.Dict[str, str]:
```
[Nitpick] TBH I really dislike the name `mapping`. Given its name and the type signature, it can be anything at all, and it's impossible to tell without reading the docs (if they are even available). Can we be more specific? Like `physical_tables_to_model`, `model_tables`, or `model_to_table_mapping`, etc.
Changed to `model_tables`.
```python
    def mapping(self) -> t.Dict[str, str]:
        """Mapping of model name to physical table name.

        If a snapshot has not been versioned yet, its view name will be returned.
```
Why return view? So that local evaluation works?
Yeah, in case you haven't pushed a snapshot yet (you can run evaluate before plan).
```python
        return self.engine_adapter.fetchdf(query)


class Context(ExecutionContext):
```
The way we use the base class here is quite sketchy and can lead to unintended consequences. For example, we never invoke the base class constructor and rely only on method overriding, hoping it does the right thing. As the code evolves, custom initialization could be added to the ExecutionContext constructor that wouldn't be part of Context.
I'd rather have an ABC for this with two concrete implementations. Also, we may want to create a context package, since this module is pretty huge already.
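A possible shape for the ABC suggestion, as a sketch. The names (`BaseExecutionContext`, `RemoteExecutionContext`, `LocalContext`) and the single `model_tables` member are illustrative stand-ins, not the actual classes in this PR:

```python
import abc
import typing as t


class BaseExecutionContext(abc.ABC):
    """Shared interface; no constructor behavior to accidentally inherit."""

    @property
    @abc.abstractmethod
    def model_tables(self) -> t.Dict[str, str]:
        """Mapping of model name to physical table name."""


class RemoteExecutionContext(BaseExecutionContext):
    """Concrete implementation that receives a precomputed mapping."""

    def __init__(self, model_tables: t.Dict[str, str]) -> None:
        self._model_tables = model_tables

    @property
    def model_tables(self) -> t.Dict[str, str]:
        return self._model_tables


class LocalContext(BaseExecutionContext):
    """Concrete implementation that starts empty and is populated locally."""

    def __init__(self) -> None:
        self._model_tables: t.Dict[str, str] = {}

    @property
    def model_tables(self) -> t.Dict[str, str]:
        return self._model_tables
```

With an ABC, forgetting to implement the shared surface fails loudly at instantiation time instead of silently falling through to base-class behavior.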
```python
            context, start=start, end=end, latest=latest, **kwargs
        )
        if self.kind == ModelKind.INCREMENTAL:
            assert self.time_column
```
I don't think this is helpful. Shouldn't this be a ConfigError? Just a heads-up: I was going to work on our validation sequence (configuration + model definitions) holistically soon.
This is only for mypy.
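For context, this is the standard type-narrowing pattern the reply is referring to: the `assert` narrows `Optional[str]` to `str` for mypy at that point in the code, rather than acting as runtime config validation. A minimal sketch with a hypothetical `Model` stub:

```python
import typing as t


class Model:
    """Hypothetical stub; the real model carries many more fields."""

    def __init__(self, time_column: t.Optional[str]) -> None:
        self.time_column = time_column

    def filter_expr(self) -> str:
        # Without this assert, mypy flags time_column as possibly None below.
        # It narrows Optional[str] -> str; it is not user-facing validation.
        assert self.time_column
        return f"{self.time_column} BETWEEN @start AND @end"
```

Whether a missing `time_column` should instead raise a ConfigError earlier, during configuration validation, is the open question in the thread.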
```python
        if pyspark and isinstance(df, pyspark.sql.DataFrame):
            self.convert_to_time_column(end)
            df = df.where(
```
This made me realize something: how do we handle time zones? As far as I understand, our start/end macros always return UTC. Spark functions, on the other hand, use the local time zone by default unless UTC is set explicitly in the session config (https://spark.apache.org/docs/latest/sql-ref-syntax-aux-conf-mgmt-set-timezone.html). Is this something we need to take care of, or is it the user's responsibility?
I think it might be OK because all of our timestamps are UTC-aware.
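The reason UTC-aware timestamps sidestep the session-time-zone issue: aware datetimes compare by absolute instant, regardless of the offset they are expressed in. A minimal Python illustration (separately, per the linked docs, Spark's session time zone can be pinned via the `spark.sql.session.timeZone` config if needed):

```python
from datetime import datetime, timedelta, timezone

# The same instant expressed in two different offsets.
utc = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
pst = datetime(2022, 1, 1, 4, 0, tzinfo=timezone(timedelta(hours=-8)))

# Aware datetimes compare by absolute instant, so these are equal.
assert utc == pst

# Naive datetimes have no such anchor; comparing naive to aware raises TypeError.
naive = datetime(2022, 1, 1, 12, 0)
```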
```python
        latest: TimeLike,
        snapshots: t.Dict[str, Snapshot],
        limit: int = 0,
        snapshots: t.Optional[t.Dict[str, Snapshot]] = None,
```
Shall we get rid of snapshots here as well and just provide mapping upstream?
I think it's more convenient this way, so callers don't need to form the mapping themselves.
Also, looking at this code, I realized: does Spark implement running audits yet?
You mean Airflow? Unless they are invoked as part of the evaluation, I don't think so.
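A sketch of the alternative being discussed, forming the model-to-table mapping once upstream from the snapshots dict instead of passing `snapshots` down. The `Snapshot` fields and the table-naming scheme below are hypothetical stand-ins for illustration only:

```python
import typing as t
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical stand-in; the real Snapshot carries full versioning info."""

    name: str
    version: t.Optional[str]

    def table_name(self) -> str:
        # Unversioned snapshots fall back to their view name,
        # mirroring the behavior discussed earlier in this thread.
        if self.version is None:
            return f"{self.name}__view"
        return f"{self.name}__{self.version}"


def to_model_tables(snapshots: t.Dict[str, Snapshot]) -> t.Dict[str, str]:
    """Form the model-name -> physical-table mapping once, upstream."""
    return {name: snap.table_name() for name, snap in snapshots.items()}
```

The trade-off raised in the thread: keeping the `snapshots` parameter is more convenient for callers, while passing only the mapping keeps the downstream signature minimal.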
Force-pushed from 769f29c to 7fbf430
I changed the internal representation of the time format to be Python.