execution context fixes #20 #34

Merged
tobymao merged 7 commits into main from
toby/execution_context
Dec 9, 2022

Conversation

Contributor

@tobymao tobymao commented Dec 9, 2022

I changed the internal representation of the time format to be Python's.

@tobymao tobymao force-pushed the toby/execution_context branch from 0c58f64 to 1f76d48 Compare December 9, 2022 03:45
Comment thread sqlmesh/core/context.py Outdated
Comment thread sqlmesh/core/context.py Outdated
Comment thread sqlmesh/core/context.py
tobymao and others added 2 commits December 8, 2022 20:25
Co-authored-by: Vincent Chan <vchan@users.noreply.github.com>
Co-authored-by: Vincent Chan <vchan@users.noreply.github.com>
Comment thread sqlmesh/core/model.py Outdated
tobymao and others added 2 commits December 8, 2022 21:14
Co-authored-by: Vincent Chan <vchan@users.noreply.github.com>
Co-authored-by: Vincent Chan <vchan@users.noreply.github.com>
Comment thread sqlmesh/core/context.py Outdated
        self._mapping = mapping

    @property
    def mapping(self) -> t.Dict[str, str]:
Collaborator

[Nitpick] TBH I really dislike the name mapping. Given its name and the type signature it can be anything at all and it's impossible to tell without reading the docs (if they are even available). Can we be more specific? Like physical_tables_to_model or model_tables or model_to_table_mapping etc.

Contributor Author

changed to model_tables

Comment thread sqlmesh/core/context.py
    def mapping(self) -> t.Dict[str, str]:
        """Mapping of model name to physical table name.

        If a snapshot has not been versioned yet, its view name will be returned.
Collaborator

Why return view? So that local evaluation works?

Contributor Author

yea, in case you haven't pushed a snapshot yet (because you can run evaluate before plan)
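
A minimal sketch of the fallback discussed in this thread. The `Snapshot` class and its `version`, `table_name`, and `view_name` attributes are hypothetical stand-ins chosen for illustration, not the actual sqlmesh classes, and the table names are made up:

```python
import typing as t
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical stand-in for a snapshot; not the real sqlmesh class."""

    name: str
    view_name: str
    table_name: str
    version: t.Optional[str] = None  # None until the snapshot is versioned


def model_tables(snapshots: t.Dict[str, Snapshot]) -> t.Dict[str, str]:
    """Map each model name to its physical table name.

    Snapshots that have not been versioned yet fall back to their view name,
    so that evaluate can still resolve them before plan has been run.
    """
    return {
        name: snapshot.table_name if snapshot.version else snapshot.view_name
        for name, snapshot in snapshots.items()
    }


snapshots = {
    "db.orders": Snapshot("db.orders", "db.orders", "physical.db__orders__123", version="123"),
    "db.users": Snapshot("db.users", "db.users", "physical.db__users__456"),
}
tables = model_tables(snapshots)
```

Here `db.orders` resolves to its physical table, while the unversioned `db.users` resolves to its view name.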

Comment thread sqlmesh/core/context.py Outdated
        return self.engine_adapter.fetchdf(query)


class Context(ExecutionContext):
Collaborator

@izeigerman izeigerman Dec 9, 2022

The way we use the base class here is quite sketchy and can lead to unintended consequences. For example, we never invoke the base class constructor and rely only on method overriding, hoping it will just do the right thing. As the code evolves, custom initialization can be added to the constructor of ExecutionContext, and it would never run for Context.

I'd rather have an ABC for this instead, with 2 concrete implementations. Also, we may want to create a context package since this module is pretty huge already.
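
A rough sketch of what the ABC suggestion could look like. `BaseContext`, its `table` helper, and the constructor arguments are hypothetical names for illustration only; the real classes have more responsibilities than shown here:

```python
import abc
import typing as t


class BaseContext(abc.ABC):
    """Hypothetical shared interface; the name is illustrative."""

    @property
    @abc.abstractmethod
    def model_tables(self) -> t.Dict[str, str]:
        """Mapping of model name to physical table name."""

    def table(self, model_name: str) -> str:
        # Shared behaviour depends only on the abstract property, so neither
        # subclass relies on the other's constructor having run.
        return self.model_tables[model_name]


class ExecutionContext(BaseContext):
    def __init__(self, model_tables: t.Dict[str, str]) -> None:
        self._model_tables = model_tables

    @property
    def model_tables(self) -> t.Dict[str, str]:
        return self._model_tables


class Context(BaseContext):
    def __init__(self, models: t.List[str]) -> None:
        self._models = models

    @property
    def model_tables(self) -> t.Dict[str, str]:
        # Illustrative only: derive the mapping instead of storing it.
        return {name: name for name in self._models}
```

With this layout each concrete class owns its own initialization, and instantiating `BaseContext` directly raises a `TypeError`.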

Comment thread sqlmesh/core/model.py
    context, start=start, end=end, latest=latest, **kwargs
)
if self.kind == ModelKind.INCREMENTAL:
    assert self.time_column
Collaborator

I don't think this is helpful. Shouldn't this be a ConfigError? Just a heads up that I was going to work on our validation sequence (configuration + model definitions) holistically soon.

Contributor Author

this is only for mypy
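
For context, a small sketch of the two options raised in this thread: an `assert` that only narrows an `Optional` for the type checker, versus an explicit validation error. The `Model` and `ConfigError` classes below are simplified, hypothetical stand-ins, not the project's own:

```python
import typing as t


class ConfigError(Exception):
    """Hypothetical error type, standing in for a project-specific one."""


class Model:
    def __init__(self, time_column: t.Optional[str] = None) -> None:
        self.time_column = time_column

    def column_for_mypy(self) -> str:
        # The assert narrows Optional[str] to str for the type checker.
        # Asserts are stripped under `python -O`, so this is not validation.
        assert self.time_column
        return self.time_column

    def column_validated(self) -> str:
        # Explicit validation also narrows the type and survives `-O`.
        if not self.time_column:
            raise ConfigError("Incremental models require a time column.")
        return self.time_column
```

Both methods satisfy mypy; only the second reports a missing time column as a configuration problem.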

Comment thread sqlmesh/core/model.py

if pyspark and isinstance(df, pyspark.sql.DataFrame):
self.convert_to_time_column(end)
df = df.where(
Collaborator

This made me realize something: how do we handle time zones? As far as I understand, our start / end macros always return UTC. Spark functions, however, use the local time zone by default unless UTC is set explicitly as part of the session config (https://spark.apache.org/docs/latest/sql-ref-syntax-aux-conf-mgmt-set-timezone.html). Is this something that we need to take care of, or is it the user's responsibility?

Contributor Author

i think it might be ok because all of our timestamps are utc aware
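
A small plain-Python illustration of the distinction in question, assuming nothing about how sqlmesh passes values to Spark. Timezone-aware datetimes compare by instant, but once the offset is dropped (for example when a value is rendered as a naive string) the result depends on which zone was in effect. The UTC-8 offset is an arbitrary example:

```python
from datetime import datetime, timedelta, timezone

# A UTC-aware timestamp and the same instant expressed at a UTC-8 offset.
utc_ts = datetime(2022, 12, 9, 3, 45, tzinfo=timezone.utc)
local_ts = utc_ts.astimezone(timezone(timedelta(hours=-8)))

# Aware datetimes compare by instant, so the offset does not matter here.
same_instant = utc_ts == local_ts

# Once the offset is dropped, the two renderings no longer agree.
utc_text = utc_ts.replace(tzinfo=None).isoformat()
local_text = local_ts.replace(tzinfo=None).isoformat()
```

Whether this matters on the Spark side depends on how the boundary values reach Spark and on the session's `spark.sql.session.timeZone` setting, which this sketch does not cover.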

latest: TimeLike,
snapshots: t.Dict[str, Snapshot],
limit: int = 0,
snapshots: t.Optional[t.Dict[str, Snapshot]] = None,
Collaborator

Shall we get rid of snapshots here as well and just provide mapping upstream?

Contributor Author

i think it's more convenient this way so others don't need to form the mapping.

also looking at this code, i realized -- does spark implement running audits yet?

Collaborator

you mean airflow? Unless they are invoked as part of the evaluation I don't think so

@tobymao tobymao force-pushed the toby/execution_context branch from 769f29c to 7fbf430 Compare December 9, 2022 18:37
@tobymao tobymao enabled auto-merge (squash) December 9, 2022 18:39
@tobymao tobymao merged commit 3fee692 into main Dec 9, 2022
@tobymao tobymao deleted the toby/execution_context branch December 9, 2022 18:45

3 participants