Refactor json_schema.py, implement JSON Schema to YAML #1182
Open
lapp0 wants to merge 1 commit into dottxt-ai:main
lapp0
commented
Sep 30, 2024
db73b4c to
c939c70
Compare
ce488b6 to
3a28324
Compare
3a28324 to
9648b30
Compare
brandonwillard
requested changes
Oct 1, 2024
Member
The JSON schema code is now in outlines-core. Unless there was an issue opened for that to be done separately, it looks like we missed it in #1175.
These changes will need to be moved and—potentially—ported to Rust. At the very least, we can't have two versions of the same basic JSON schema logic.
9648b30 to
3efb728
Compare
Contributor
Author
Seems there was a miscommunication. Thanks for clarifying @brandonwillard, I'll get started on porting the changes to Rust.
3efb728 to
99fa1ec
Compare
Overview

Refactor `json_schema.py` to be more coherent and extensible. Use that extensibility to implement JSON Schema to YAML.

Changes

- Refactor `to_regex` into a class, `JSONSchemaRegexGenerator`, with visitors that implement JSON Schema rules and formatters that implement pattern construction.
- Implement `YAMLRegexGenerator` by subclassing `JSONSchemaRegexGenerator` and overriding some formatters.
- Tests:
  - Update `test_json_schema.py` so its existing tests also apply to YAML.
  - Add tests for `anyOf` and `allOf`.
  - In `test_generate.py::test_generate_json`, test both JSON and YAML modes.

Behavioral Changes
The only behavior changes are:

- Raise `NotImplementedError` for components that are not implemented.
- `anyOf`, `allOf`, `oneOf`:
  - `anyOf`: Previously broken, now ORs sub-patterns.
  - `allOf`: Previously broken, now ANDs sub-patterns via positive lookahead.
  - `oneOf`: Warns the user that it is using `anyOf` instead, and calls `anyOf`.

The rules are now much closer to the JSON Schema spec than on `main`; however, the JSON Schema spec isn't always desirable. Users can enable the JSON Schema compliant validation rules via `strict_json_schema_subset=False`, resulting in:

- `items`: If unspecified, allow additional items without constraints.
- `properties`: If unspecified, allow additional properties without constraints.

json-schema.org test suite
This is a large change-set. To verify correctness, in addition to ensuring current tests pass, `test_json_schema_full.py` tests compliance with JSON Schema by retrieving 1,245 test cases from the official json-schema.org test suite.

[Results table not recoverable from the page; surviving labels: `main`, `NotImplementedError` (acceptable: visible).]

Raising `NotImplementedError` makes it clear to the user why a schema would fail during generation, and it does so before generation.

test_json_schema_to_yaml_compliance

For each of the 263 tests which pass in `test_json_schema_to_json_compliance`, we verify that the corresponding YAML pattern is also correct.

TODO
- Refactor `json_schema` so it's clean and extensible
- Extend `test_json_schema_full.py` to YAML
- Update docs to reflect new behaviour surrounding JSON Schema spec-compliant implementation
- `strict_json_schema_subset`

Further Work

- `json_schema.py` does too much. This new structure makes separation of concerns clear, easing a refactor.
- `JSONSchemaRegexGenerator.to_automata(...)`: Not using a pattern intermediate would simplify things.
- Implement `NotImplemented` components based on users opening issues.
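
To illustrate the visitor/formatter split and the `anyOf`/`allOf` behavior described under Changes and Behavioral Changes, here is a minimal sketch. It is not the code in this PR: `TinyJSONGenerator`, `TinyYAMLGenerator`, and every method name are hypothetical, only two leaf schemas are handled, the "YAML" variant is a deliberate oversimplification (unquoted scalars), and the lookahead-based `allOf` shown here is only valid when the value is the whole string being matched.

```python
import re


class TinyJSONGenerator:
    """Hypothetical toy, not the PR's JSONSchemaRegexGenerator."""

    # Visitors: decide which rule applies to a schema node.
    def visit(self, schema: dict) -> str:
        if "anyOf" in schema:
            return self.format_any_of([self.visit(s) for s in schema["anyOf"]])
        if "allOf" in schema:
            return self.format_all_of([self.visit(s) for s in schema["allOf"]])
        if "pattern" in schema:
            return self.format_string(schema["pattern"])
        if schema.get("type") == "integer":
            return self.format_integer()
        # Fail before generation, with a visible reason.
        raise NotImplementedError(f"unsupported schema: {schema!r}")

    # Formatters: decide how a rule is spelled as a regex.
    def format_string(self, inner: str) -> str:
        return f'"(?:{inner})"'  # JSON strings are double-quoted

    def format_integer(self) -> str:
        return r"-?(?:0|[1-9][0-9]*)"

    def format_any_of(self, patterns: list) -> str:
        # OR the sub-patterns.
        return "(?:" + "|".join(f"(?:{p})" for p in patterns) + ")"

    def format_all_of(self, patterns: list) -> str:
        # AND the sub-patterns: all but the last become positive lookaheads
        # anchored at end of input; the last one consumes the text.
        # Assumes a non-empty list and a top-level value.
        *checks, last = patterns
        return "".join(rf"(?=(?:{p})\Z)" for p in checks) + f"(?:{last})"


class TinyYAMLGenerator(TinyJSONGenerator):
    """Hypothetical subclass: only a formatter changes, visitors are reused."""

    def format_string(self, inner: str) -> str:
        return f"(?:{inner})"  # toy assumption: plain, unquoted scalar


any_schema = {"anyOf": [{"type": "integer"}, {"pattern": "[a-z]+"}]}
all_schema = {"allOf": [{"pattern": "[a-z]+"}, {"pattern": ".{3}"}]}

json_any = TinyJSONGenerator().visit(any_schema)
yaml_any = TinyYAMLGenerator().visit(any_schema)
json_all = TinyJSONGenerator().visit(all_schema)

print(bool(re.fullmatch(json_any, '"abc"')), bool(re.fullmatch(yaml_any, "abc")))
print(bool(re.fullmatch(json_all, '"abc"')), bool(re.fullmatch(json_all, '"abcd"')))
```

The point of the split is that `TinyYAMLGenerator` changes only how a string is spelled; the `anyOf`/`allOf` rules are inherited unchanged, which mirrors the subclass-and-override approach described above.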