Add logic for recognizing jinja code in models #28
This may not be necessary; maybe just render the jinja and see if the output is different.
I'm skeptical as to whether this is going to add more overhead than a regex search, but we can try.
OK, then only run that after the regex search.
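A minimal sketch of that ordering, assuming a hypothetical `JINJA_RE` that only looks for jinja's opening delimiters and a `render` callable supplied by the caller (neither is the PR's actual code): the regex acts as a cheap pre-filter, and the render-and-compare step only runs when it matches.

```python
import re
from typing import Callable

# Hypothetical pattern: the opening delimiter of a jinja expression ({{),
# statement ({%) or comment ({#).
JINJA_RE = re.compile(r"\{\{|\{%|\{#")


def uses_jinja(sql: str, render: Callable[[str], str]) -> bool:
    """Return True if rendering `sql` as jinja changes it."""
    # Cheap pre-filter: without any jinja delimiters there is nothing to render.
    if not JINJA_RE.search(sql):
        return False
    # Only pay for a full render when the regex matched, then compare outputs.
    return render(sql) != sql
```

Most plain-SQL models never reach the renderer this way, which addresses the overhead concern above.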
Yeah, I think we need to parse the Model first for other reasons as well, like getting the dialect.
I would render the jinja once to validate it, but not validate it every time.
What information is needed to render? Can you store it in the context?
Are you referring to the `config()` method at the top of dbt models, which is wrapped in jinja? If so, my hope was to render that to obtain the config.
We can have things like `select * from events where event_type = '{{ var("event_type") }}'`. So I guess we'd need to use context information to render the jinja correctly (e.g. by scanning a .yml file with all the user's configs or something).
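A sketch of that, assuming the `jinja2` package and a plain dict standing in for variables read from a project config file. The `render_with_vars` helper and its `project_vars` parameter are hypothetical, not sqlmesh or dbt API. Exposing a `var` callable to the template is enough to render the example query above.

```python
from typing import Any, Dict

import jinja2


def render_with_vars(sql: str, project_vars: Dict[str, Any]) -> str:
    """Render `sql`, resolving var("name") from `project_vars` (hypothetical helper)."""

    def var(name: str) -> Any:
        # Fail loudly on unknown variables rather than rendering an empty value.
        if name not in project_vars:
            raise KeyError(f"undefined variable: {name}")
        return project_vars[name]

    # StrictUndefined turns any other undefined template name into an error.
    env = jinja2.Environment(undefined=jinja2.StrictUndefined)
    return env.from_string(sql).render(var=var)


sql = """select * from events where event_type = '{{ var("event_type") }}'"""
print(render_with_vars(sql, {"event_type": "click"}))
# select * from events where event_type = 'click'
```

Where `project_vars` comes from (a scanned .yml file, the context object, etc.) is left open, as in the discussion.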
Good point. Yeah, I guess that won't be very hard to support. A related question: are we going to assume that jinja => dbt project? Under that assumption, we can always just call a parse function for the config section you're referring to; otherwise, it might be a bit trickier.
You mean at model load time, right? So we'd validate the model using the rendered SQL but still store the original jinja string (possibly without the model meta)?
Yeah, we shouldn't store the model meta.
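A minimal sketch of that load-time flow, assuming hypothetical `render` and `validate` callables (for example, jinja rendering followed by a SQL parse) and a hypothetical `LoadedQuery` container: the rendered SQL is checked once when the model is loaded, and only the original jinja query string, without the model meta, is kept for later renders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LoadedQuery:
    """Hypothetical container: keeps the raw jinja query, not the model meta."""

    raw_query: str


def load_query(
    raw_query: str,
    render: Callable[[str], str],
    validate: Callable[[str], None],
) -> LoadedQuery:
    # Render once at load time and validate the result; `validate` is expected
    # to raise if the rendered SQL is invalid.
    validate(render(raw_query))
    # Store the unrendered string so it can be re-rendered with fresh context
    # later, without validating again on every render.
    return LoadedQuery(raw_query=raw_query)
```

The trade-off of validating only at load is that a render with a different context later is not re-checked.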
I think we will have users who like Jinja, and if we couple Jinja with dbt, they will say "Well, I need to write dbt models so I can use Jinja". So ideally we would support Jinja for all models (not just dbt), but I'm not sure about the technical complexity of that.
Right, jinja != dbt.
My thought is that I'd call render at load time to get the config information and transform it into sqlmesh model metadata (combining it with any general config from the yaml files). Maybe render could take an optional dict of jinja overrides, where I can pass in a lambda for config and handle it myself.

I agree with @eakmanrq and @tobymao regarding Jinja support for all models, regardless of whether they are dbt models or not.
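A sketch of the override idea, assuming the `jinja2` package; `render_model`, its `overrides` parameter and `render_and_capture_config` are hypothetical names, not an existing sqlmesh signature. A `config` callable that records its keyword arguments and renders to an empty string lets the loader collect the config while the rest of the template renders normally.

```python
from typing import Any, Callable, Dict, Optional, Tuple

import jinja2


def render_model(
    sql: str, overrides: Optional[Dict[str, Callable[..., Any]]] = None
) -> str:
    """Render `sql`, exposing optional callables (e.g. `config`) to the template."""
    env = jinja2.Environment()
    return env.from_string(sql).render(**(overrides or {}))


def render_and_capture_config(sql: str) -> Tuple[str, Dict[str, Any]]:
    captured: Dict[str, Any] = {}

    def config(**kwargs: Any) -> str:
        # Record the config instead of emitting anything into the SQL.
        captured.update(kwargs)
        return ""

    rendered = render_model(sql, overrides={"config": config})
    return rendered.strip(), captured


sql = "{{ config(materialized='table', tags=['daily']) }}\nselect 1 as x"
rendered, meta = render_and_capture_config(sql)
print(rendered)  # select 1 as x
print(meta)      # {'materialized': 'table', 'tags': ['daily']}
```

Mapping the captured dict onto sqlmesh model metadata (and merging it with the yaml config) would be a separate step, not shown here.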
I see, yeah, these points sound reasonable. So ideally we'd want sqlmesh users to be able to use both our macro system and jinja, but for the latter we won't create any special functionality (like how dbt uses …). Although, if we allow jinja usage throughout sqlglot, does that mean we also need a dedicated config file (e.g. a yaml file or something similar) so that users can provide kwargs to be used in their jinja code? Does that make sense?
For |
ff1990d to fdd8934
```python
file_contents = file.read()

if JINJA_RE.search(file_contents):
    expressions = [JinjaModel(this=file_contents)]
```
Why is a jinja model needed?
It's not needed per se. The idea was that this way we can still represent the query as a SQLGlot expression. An alternative would be to store jinja queries as raw strings, but then we'd have to account for that new representation everywhere a query is expected to be an expression.
This is still at an early stage, though. As we move forward, it might make sense to change this, so I'm not arguing that this is the way to go.
I believe render will return a sqlglot expression (jinja or no jinja), no?
Yes, that's correct -- maybe JinjaModel is not necessary after all. I'll think about it and continue working on this PR soon.
Closing this PR so that @tobymao can do a fresh first pass on this task. I plan to help with it afterwards.
Posting this so I can get some early feedback; I plan to work on / refine it soon. Some points for discussion:
- Do we want to support jinja in the model's metadata? If not, we might be able to parse just this part of the file in order to use the fields inside `Model.load` (e.g. `name` is needed to instantiate a model).
- How do we want to handle model validation? As far as I can understand, the plan is to store SQL that may contain jinja as a string (see `JinjaModel`), so validation needs to happen every time we're about to render / execute it? Am I missing something here?
- What about jinja context? Do we already have a rough plan of where we'll store the information needed to render correctly?
By the way, a simple way I found to render jinja code from a string is the following (for reference):
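For reference, a minimal way to do this with the `jinja2` package (an illustrative sketch, not necessarily the snippet mentioned above) is `Environment.from_string`, which builds a template from an in-memory string:

```python
import jinja2

# Build a template from a string rather than loading it from a file.
template = jinja2.Environment().from_string(
    "select * from {{ table }} where ds = '{{ ds }}'"
)
sql = template.render(table="events", ds="2022-10-01")
print(sql)  # select * from events where ds = '2022-10-01'
```

Using an `Environment` rather than `jinja2.Template` directly leaves room to configure things like `undefined=jinja2.StrictUndefined` later.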
I'm happy to follow a different direction if this doesn't make much sense.