Adding a new column is not a breaking contract change #7333
core/dbt/contracts/graph/nodes.py
Outdated
if breaking_change_reasons:
    raise (ModelContractError(reasons=" and ".join(breaking_change_reasons), node=self))
for key, value in sorted(old.columns.items()):
In terms of performance - we're not iterating over columns for every model, just those with a contract change. This should also extend nicely to detecting breaking changes based on constraint modifications - we could add the constraints to the existing contract checksum and dig into the specific breaking change reasons from there.
That was my instinct as well. The checksum is the fastest way to detect that there have been no changes. If there are changes, we dig in from there, and it's worth iterating over every column to construct the most thorough & helpful error message.
Yes. Something like this was always going to have to happen for constraints.
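A minimal sketch of the "checksum first, dig in only on mismatch" idea from this thread. This is not dbt's actual checksum construction: the `contract_checksum` helper and the plain name-to-type dicts are assumptions made for illustration.

```python
import hashlib


def contract_checksum(columns):
    """Hash column names and data types in a stable, sorted order.

    Hypothetical helper: dbt builds its checksum differently.
    """
    # Sorting makes the checksum independent of dict insertion order.
    spec = "|".join(f"{name}:{columns[name]}" for name in sorted(columns))
    return hashlib.sha256(spec.encode("utf-8")).hexdigest()


old_columns = {"id": "int", "name": "text"}
new_columns = {"id": "int", "name": "text", "created_at": "timestamp"}

# Fast path: identical specs give identical checksums, so no per-column work.
unchanged = contract_checksum(old_columns) == contract_checksum(
    {"name": "text", "id": "int"}
)

# Any difference, even an added column, changes the checksum and would
# trigger the slower per-column comparison.
changed = contract_checksum(old_columns) != contract_checksum(new_columns)
```

Constraints could be folded into the hashed string in the same way, which is roughly the extension suggested above.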
class ContractBreakingChangeError(DbtRuntimeError):
    CODE = 10016
    MESSAGE = "Contract Error"
    MESSAGE = "Breaking Change to Contract"
I changed this, but I don't actually know where this MESSAGE appears. Should this, and the type below, be "Contract Breaking Change Error" for consistency with the exception class name?
I think it appears in a generic message that packages up compilation errors, but I'm not sure it does anywhere else. Let me look for it...
A few exceptions use it to construct the actual "message", but mostly it's not used. Another thing we might want to clean up at some point. So I don't think it matters here.
Other messages use a "type" property to construct a "{type} Error", but that's mostly DbtInternalError.
core/dbt/contracts/graph/nodes.py
Outdated
    )
else:
    # no breaking changes
    return True
It might just be a confusing comment - but this method should return True when there are no changes (breaking or otherwise) to the contract, and False if there is a change to the contract but it is non-breaking. Are we actually handling the latter case?
It does look like the logic isn't right here. I think we need to keep track of adds and return False, so we need "elif columns_added, return False, else True".
I think that the else wouldn't actually be hit, but it does feel better to have it there.
I'm taking a look at this logic now. It's genuinely tricky. I'll add more inline comments.
This should be returning False.
In practice, it's likely that a contract change would be accompanied by a change to the model definition, such that same_body would return False — but we should still have the logic be right here.
I'll add a test that should catch this as well, by selecting just state:modified.contract.
I inverted the logic a bit, so now it's:
- Was the contract being enforced previously?
  - no → return False or True, but never ContractBreakingChangeError
- If so, do the checksums match up?
  - yes → return True
- If the checksums don't match up, are there any changes we consider breaking?
  - breaking → throw a ContractBreakingChangeError with details
  - non-breaking, still a change → return False
Let me know if you think that makes more or less sense than before
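The three-step decision described above could be sketched roughly as follows. This is not the code from the PR: `ContractSketch`, `NodeSketch`, and `ContractBreakingChangeSketch` are hypothetical stand-ins, columns are reduced to a name-to-type dict, and the return value in step 1 (True only if the contract is still unenforced) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class ContractBreakingChangeSketch(Exception):
    """Hypothetical stand-in for ContractBreakingChangeError."""


@dataclass
class ContractSketch:
    enforced: bool = False
    checksum: Optional[str] = None


@dataclass
class NodeSketch:
    contract: ContractSketch
    columns: Dict[str, str] = field(default_factory=dict)


def same_contract(new: NodeSketch, old: NodeSketch) -> bool:
    # Step 1: the contract was not enforced previously, so never raise.
    if not old.contract.enforced:
        # Assumption: "same" only if it is still not enforced.
        return not new.contract.enforced

    # Step 2: matching checksums mean nothing changed.
    # Note: this relies on an unenforced contract having no checksum.
    if new.contract.checksum == old.contract.checksum:
        return True

    # Step 3: checksums differ, so look for breaking changes.
    reasons = []
    if not new.contract.enforced:
        reasons.append("contract is no longer enforced")
    for name, data_type in sorted(old.columns.items()):
        if name not in new.columns:
            reasons.append(f"column removed: {name}")
        elif new.columns[name] != data_type:
            reasons.append(f"column type changed: {name}")
    if reasons:
        raise ContractBreakingChangeSketch("; ".join(reasons))

    # A change (for example an added column), but not a breaking one.
    return False
```

With this shape, adding a column to an enforced contract returns False without raising, which matches the PR title.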
core/dbt/contracts/graph/nodes.py
Outdated
# Note: we don't have contract.checksum for current node, so build
# Breaking change: the contract was previously enforced, and it no longer is
# Note: we don't have contract.checksum for current node, so build it now
self.build_contract_checksum()
ah, good catch - I'd added it up there, but it actually isn't needed (?) - we only need it in the case that self.contract.enforced is False (?) - but it still seems like calling it here wouldn't actually do anything...
dbt-core/core/dbt/contracts/graph/nodes.py
Lines 571 to 575 in 6b42a71
I'm pretty sure we can remove both calls to build_contract_checksum(). It's already being called within NodePatchParser.parse_patch.
Oh, that's a change. Initially I only had it building when contract.enforced is True, and it's been changed so that it always builds.
Actually no, it's "build_contract_checksum" that checks for contract enforced. So it is a no-op, but it probably shouldn't be, because Michelle wanted to also capture that there were additional changes to the columns. So we should remove the check for contract.enforced in build_contract_checksum and only call it in parse_patch if contract.enforced is True, and THEN it won't be a no-op.
Boy this logic makes your head hurt.
@gshank Right 😅
We want the full column spec to be captured in the checksum, so that additions would cause a checksum mismatch — but if the checksums don't match, we then need to dig in and understand whether a column was actually removed/changed, or just added.
I'm happy with the current implementation for now, and it seems to satisfy the test cases we've added (reflecting in-the-wild use cases we'd expect) — but we can open up another ticket to revisit this logic.
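To illustrate the "removed/changed versus just added" distinction, here is a small sketch that classifies column differences once the checksums are known to differ. The `diff_columns` and `is_breaking` helpers are hypothetical names, and columns are simplified to a name-to-type mapping.

```python
def diff_columns(old, new):
    """Classify column differences between two name -> data_type mappings."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = sorted(
        name for name in set(old) & set(new) if old[name] != new[name]
    )
    return {"removed": removed, "changed": changed, "added": added}


def is_breaking(diff):
    # Additions alone are not treated as breaking; removals and type changes are.
    return bool(diff["removed"] or diff["changed"])


old = {"id": "int", "name": "text"}
only_added = diff_columns(old, {"id": "int", "name": "text", "email": "text"})
removed_and_changed = diff_columns(old, {"id": "bigint"})
```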
MichelleArk left a comment
Just one question, not necessarily blocking. Thanks so much for fixing this ✨
# If the checksums match up, the contract has not changed, so same_contract: True
if self.contract.checksum == old.contract.checksum:
    return True
I think this is wrong. If old.contract.enforced is True and self.contract.enforced is False it's a breaking change even when checksums are still equal, and this will always return True for that case. Also the "build_contract_checksum" only needs to run for self.contract.enforced is False, and the line above runs it for both True and False.
> If old.contract.enforced is True and self.contract.enforced is False it's a breaking change even when checksums are still equal, and this will always return True for that case.
Yes! Good point! Let me add a test for this
> Also the "build_contract_checksum" only needs to run for self.contract.enforced is False, and the line above runs it for both True and False.
Per thread above, I don't think we need to call build_contract_checksum at any point within same_contract
Per other thread, we want to capture that the contract has been disabled AND the columns are different.
> If old.contract.enforced is True and self.contract.enforced is False it's a breaking change even when checksums are still equal, and this will always return True for that case.
After looking more closely, the current formulation of the logic does work, because self.contract.checksum is None!
ipdb> self.contract.checksum == old.contract.checksum
False
ipdb> self.contract
Contract(enforced=False, checksum=None)
ipdb> old.contract
Contract(enforced=True, checksum='1698cf5a415f00ae0dee2c6e97bb51c020d46955847b2e9cec53a8e40d1afb13')
I do agree that a more explicit condition for this would be better, rather than implicitly depending on that behavior to remain the case, so I've added it in.
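A short sketch of why an explicit condition is safer than relying on the checksum being None. The `Contract` dataclass here is a simplified stand-in that mirrors only the two fields shown in the debugger output, `contract_was_disabled` is a hypothetical helper, and the checksum strings are shortened placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contract:
    # Simplified stand-in with only the fields visible in the ipdb output.
    enforced: bool = False
    checksum: Optional[str] = None


def contract_was_disabled(old: Contract, new: Contract) -> bool:
    """Explicit check: previously enforced, no longer enforced."""
    return old.enforced and not new.enforced


old = Contract(enforced=True, checksum="1698cf5a")
new = Contract(enforced=False, checksum=None)

# Implicit detection works today only because the new checksum is None.
implicit = new.checksum != old.checksum

# The explicit check does not depend on how the checksum is populated.
explicit = contract_was_disabled(old, new)

# Hypothetical future case: the checksum is built even when not enforced.
new_with_checksum = Contract(enforced=False, checksum="1698cf5a")
still_detected = contract_was_disabled(old, new_with_checksum)
```

In the last case a checksum comparison alone would report "no change", while the explicit check still flags the disabled contract.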
I'm going to merge this for inclusion in v1.5.0-rc1. Even if the logic is a bit verbose, and perhaps still not 100% perfect, we have better test coverage now than before, and it feels like a step closer to the behavior that we want.
resolves #7332
Description
If there's a mismatch in the checksum, then we need to dig into why to understand if the change is actually breaking. I'm not sure if this is the most performant way to do it, but it yields the intended functionality.
I also renamed the exception from ModelContractError to ContractBreakingChangeError, and reworked the message that we display to users:
I first did this with fancier string formatting within CompiledNode.same_contract, but I added another commit that took the approach of storing more structured information on ContractBreakingChangeError, and pretty-formatting the message there.
Checklist
changie new to create a changelog entry
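As an illustration of the approach mentioned in the description (structured information stored on the exception, with formatting done there), here is a rough sketch. The class name, constructor arguments, and message wording are all assumptions; the real ContractBreakingChangeError in dbt-core may differ.

```python
class ContractBreakingChangeSketch(Exception):
    """Hypothetical exception that stores structured reasons and formats them."""

    def __init__(self, node_name, contract_disabled=False,
                 columns_removed=None, type_changes=None):
        self.node_name = node_name
        self.contract_disabled = contract_disabled
        self.columns_removed = list(columns_removed or [])
        # Each entry is a (column_name, old_type, new_type) tuple.
        self.type_changes = list(type_changes or [])
        super().__init__(self.format_message())

    def format_message(self):
        lines = [f"Breaking change to contract for {self.node_name}:"]
        if self.contract_disabled:
            lines.append("  - Contract enforcement was removed")
        if self.columns_removed:
            lines.append("  - Columns were removed: " + ", ".join(self.columns_removed))
        for name, old_type, new_type in self.type_changes:
            lines.append(f"  - Column {name} changed type: {old_type} -> {new_type}")
        return "\n".join(lines)


err = ContractBreakingChangeSketch(
    "my_model",
    columns_removed=["name"],
    type_changes=[("id", "int", "text")],
)
message = str(err)
```

Keeping the reasons as attributes means callers and tests can inspect them directly instead of parsing the formatted string.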