Skip to content

Add logic for recognizing jinja code in models#28

Closed
georgesittas wants to merge 1 commit intomainfrom
jo/jinja_rendering
Closed

Add logic for recognizing jinja code in models#28
georgesittas wants to merge 1 commit intomainfrom
jo/jinja_rendering

Conversation

@georgesittas
Copy link
Copy Markdown
Collaborator

@georgesittas georgesittas commented Dec 8, 2022

Posting this so I can get some early feedback; I plan to work on / refine it soon. Some points for discussion:

  1. Do we want to support jinja in the model's metadata? If not, we might be able to parse just this part of the file in order to use the fields inside Model.load (e.g. name is needed to instantiate a model).

  2. How do we want to handle model validation? As far as I can understand, the plan is to store SQL that may contain jinja as a string (see JinjaModel), so validation needs to happen every time we're about to render / execute it? Am I missing something here?

  3. What about jinja context? Do we already have a rough plan of where we'll store the needed information to render correctly?

By the way, a simple way I found to render jinja code from a string is the following (for reference):

>>> from jinja2 import Environment, BaseLoader
>>> template = Environment(loader=BaseLoader).from_string("Hello {{ name }}")
>>> rtemplate.render(**{"name": "foo"})
'Hello foo'

I'm happy to follow a different direction if this doesn't make much sense.

Comment thread sqlmesh/utils/jinja.py Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may not be necessary, maybe just render jinja and see if the outputs are different

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skeptical as to whether this is going to add more overhead than a regex search, but we can try.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then only run that after the regex search

@tobymao
Copy link
Copy Markdown
Contributor

tobymao commented Dec 8, 2022

yea, i think we need to parse the Model first for other reasons as well, like getting the dialect

@tobymao
Copy link
Copy Markdown
Contributor

tobymao commented Dec 8, 2022

i would render the jinja once to validate it, but not validate every time

@tobymao
Copy link
Copy Markdown
Contributor

tobymao commented Dec 8, 2022

what information is needed to render? can you store it in the context?

@crericha
Copy link
Copy Markdown
Collaborator

crericha commented Dec 8, 2022

Do we want to support jinja in the model's metadata? If not, we might be able to parse just this part of the file in order to use the fields inside Model.load (e.g. name is needed to instantiate a model).

Are you referring to the config() method at the top of dbt models, which is wrapped in jinja? If so, my hope was to render that to obtain the config.

@georgesittas
Copy link
Copy Markdown
Collaborator Author

georgesittas commented Dec 8, 2022

what information is needed to render? can you store it in the context?

We can have things like

select * from events where event_type = '{{ var("event_type") }}'

So I guess we'd need to use context information to render the jinja correctly (e.g. by scanning a .yml file with all the user's configs or something).

Are you referring to the config() method at the top of dbt models, which is wrapped in jinja? If so, my hope was to render that to obtain the config.

Good point. Yeah, I guess that won't be very hard to support. Another question related to this is: are we going to assume that jinja => dbt project? Under this assumption, we can always just call a parse function for this config section you're referring to, otherwise it might be a bit trickier.

@georgesittas
Copy link
Copy Markdown
Collaborator Author

georgesittas commented Dec 8, 2022

i would render the jinja once to validate it, but not validate every time

You mean like @ model load time, right? So, we'd validate the model using the rendered SQL but still store the original jinja string (possibly w/out the model meta)?

@tobymao
Copy link
Copy Markdown
Contributor

tobymao commented Dec 8, 2022

i would render the jinja once to validate it, but not validate every time

You mean like @ model load time, right? So, we'd validate the model using the rendered SQL but still store the original jinja string (possibly w/out the model meta)?

yea, shouldn't store model meta

@eakmanrq
Copy link
Copy Markdown
Collaborator

eakmanrq commented Dec 8, 2022

Another question related to this is: are we going to assume that jinja => dbt project?

I think we will have users that like Jinja and if we couple Jinja with dbt then they will say "Well I need to write dbt models so I can use Jinja". So I think ideally we would support Jinja for all models (not just dbt) but not sure on the technical complexity of that.

@tobymao
Copy link
Copy Markdown
Contributor

tobymao commented Dec 8, 2022

right, jinja !+ dbt

@crericha
Copy link
Copy Markdown
Collaborator

crericha commented Dec 8, 2022

Are you referring to the config() method at the top of dbt models, which is wrapped in jinja? If so, my hope was to render that to obtain the config.

Good point. Yeah, I guess that won't be very hard to support. Another question related to this is: are we going to assume that jinja => dbt project? Under this assumption, we can always just call a parse function for this config section you're referring to, otherwise it might be a bit trickier.

My thought is I'd call render at load to get the config information and transform it into sqlmesh model meta data (combining it with any general config from the yaml files). Maybe, render could take in an optional dict of jinja overrides, where I can pass in a lambda for config and handle it myself { "config": myConfigMethod}. We need to think a little bit on how best to handle ref and source

I agree with @eakmanrq and @tobymao regarding Jinja support for all models, regardless if it is dbt or not.

@georgesittas
Copy link
Copy Markdown
Collaborator Author

I see, yeah these points sound reasonable. So then ideally we'd want sqlmesh users to be able to use both our macro system and jinja, but for the latter we won't create any special functionality (like how dbt uses ref, for example), since we already have a "sqlmesh" way to do things.

Although, if we allow jinja usage throughout sqlglot, does that mean that we also need to have a dedicated config file (e.g. a yaml or something similar), so that the user can provide kwargs to be used in their jinja code? Does that make sense?

@crericha
Copy link
Copy Markdown
Collaborator

crericha commented Dec 8, 2022

For ref and source, we'll need to replace those with the sqlmesh naming either on load or when the query is rendered. Does anyone have opinions when it happens?

Comment thread sqlmesh/core/context.py
file_contents = file.read()

if JINJA_RE.search(file_contents):
expressions = [JinjaModel(this=file_contents)]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a jinja model needed?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed per se. The idea for using it was that in this way we can still represent the query as a SQLGlot expression. An alternative would be to store a raw string for jinja queries, but then we'd have to take that new representation into account in places where query is expected to be an expression.

This is still in an early phase, though. As we're moving on, it might make sense to actually change this, so I'm not arguing that this is the way to go.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe render will return a sqlglot expression (jinja or no jinja), no?

Copy link
Copy Markdown
Collaborator Author

@georgesittas georgesittas Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct -- maybe JinjaModel is not necessary after all. I'll think about it and continue working on this PR soon.

@georgesittas georgesittas marked this pull request as draft December 9, 2022 17:58
@georgesittas
Copy link
Copy Markdown
Collaborator Author

georgesittas commented Dec 9, 2022

Closing this PR so that @tobymao can do a fresh first pass on this task. I plan to help with it afterwards.

@georgesittas georgesittas deleted the jo/jinja_rendering branch December 9, 2022 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants