[WIP] Add Python test generator #1547
yawpitch wants to merge 20 commits into exercism:master from yawpitch:python-generator
Conversation
generators/generate.py
Outdated

```python
def _get_properties(obj):
    if "cases" in obj:
        for case in obj["cases"]:
            yield from get_properties(case)
```
Travis-CI build fails because this syntax is not valid in Python 2.7.
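For reference, the `yield from` syntax only exists in Python 3.3+; under 2.7 the same traversal needs an explicit inner loop. A minimal sketch of a 2.7-compatible version (using a consistent function name; the nested `cases` structure is assumed from canonical-data.json):

```python
def get_properties(obj):
    """Recursively yield the leaf cases of a canonical-data tree."""
    if "cases" in obj:
        for case in obj["cases"]:
            # Python 2.7-compatible replacement for `yield from`
            for prop in get_properties(case):
                yield prop
    else:
        yield obj
```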
cmccandless left a comment
These are not my conclusive review comments, just my initial thoughts. I will review this more thoroughly tomorrow or the next day.
generators/templates/_default.j2
Outdated

```jinja
import unittest
{{ make_import(data) }}


# Tests adapted from `problem-specifications//canonical-data.json` @ {{ data.version }}
```
Should be `v{{ data.version }}`
Good catch, fixed and pushed.
generators/generate.py
Outdated

```python
    return repr(case["expected"])


def main():
```
I highly recommend the following modification:

```python
def main(args=None):  # where args may be a list of str
    ...
    parser.parse_args(args)  # uses sys.argv if args is None
```

This will make automated testing of generated test suites much easier.
Interesting, hadn't thought of that use case. Fixed and pushed.
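The suggested pattern makes the entry point callable directly from tests; a minimal, self-contained sketch (the `--verbose` flag is made up for illustration):

```python
import argparse

def main(args=None):
    # args may be a list of str; argparse falls back to sys.argv[1:] when None
    parser = argparse.ArgumentParser()
    parser.add_argument("--verbose", action="store_true")
    opts = parser.parse_args(args)
    return opts
```

A test can then call `main(["--verbose"])` directly instead of patching `sys.argv`.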
As is, the following exercises generate passing test suites:

Several others ought to be passing (…). I would say that's a fantastic start!
Bit weirded out by some of the items on that list of passing generated tests ... especially since I don't think there's canonical data for, for instance, … And yeah, there's going to have to be some special-casing if we want the property names to continue to map to what they've been called in previously manually-generated tests ... there's no completely reliable pattern to the interpolations that people have made over time.
Also, for some reason any exercise that has error cases is indenting the test functions an extra level, making them undetectable. I'm not sure where the extra indentation is coming from, but I've confirmed that it doesn't occur if the …

One more thing: the line spacing around the canonical data reference is incorrect: it should be 2 blank lines above, 1 below.
Here is the updated list of passing exercises, excluding those without canonical data:
I'll try and take a look at those two issues tonight; failing that it might have to be in a day or two.
generators/templates/__macros.j2
Outdated

```jinja
{% endmacro %}


{% macro add_assert_raises(case) %}
self.assertRaisesRegex({{ case.property | to_snake }}({{ case | format_input }}), {{ case | format_expect }})
```
By convention, this macro should be:

```jinja
with self.assertRaisesRegex({{ case.error_type | to_camel }}):
    {{ case.property | to_snake }}({{ case | format_input }})
```
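Standalone, the context-manager convention being suggested looks like this; the exercise function, its error message, and the test-class name here are all hypothetical:

```python
import unittest

def hex_to_int(digits):
    # hypothetical exercise function that raises ValueError on bad input
    if any(d not in "0123456789abcdef" for d in digits):
        raise ValueError("Invalid hexadecimal digit")
    return int(digits, 16)

class ErrorCaseTest(unittest.TestCase):
    def test_invalid_digit(self):
        # context-manager form: the call under test goes inside the with-block
        with self.assertRaisesRegex(ValueError, "Invalid hexadecimal"):
            hex_to_int("0xzz")
```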
In most cases, case.error_type will be ValueError. However, there are some exercises that define their own error types. I suspect some sort of file will have to be created that defines exceptions to the "use ValueError" rule (perhaps yaml would be a good choice for this?).
Let me think about that one. Have you got a specific example of an exercise that does this?
Right, see, now that's a good example ... someone has taken the canonical notion of some error being thrown and decided that a custom class name is required. That's an interpolation, where someone has unintentionally broken the ability to automatically generate these tests. And in fact the implementation of that error in the student's version is going to remain the two-line, do-nothing subclass of Exception that is provided in the slug ... so we're not testing the student's work, we're testing the test writer's.
Personally I'd wonder if that's an opportunity to break backwards compatibility and simply check that any exception is being raised with the appropriate error message, rather than a specific, test writer defined exception.
Since the error message is all we get from the canonical data, that seems prudent, but it runs the risk of sacrificing the validity of past Community solutions on the altar of a smoother and more efficient rollout process of tests for future (uncertain) changes to the canonical data.
I agree, however that might make more sense as an exercise unto itself, as the exception hierarchy system in Python is quite dense (and also rather unique) ... like I'd love to have something that shows why you never write a bare `except: do_something()` in production code, but that involves something far more dense than is likely to be able to be defined in canonical data, much less built via an automated script.
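As an aside, the bare-`except` pitfall is easy to demonstrate: a bare clause catches `BaseException` subclasses like `SystemExit` and `KeyboardInterrupt` that `except Exception` deliberately lets through. A small sketch (function names are made up):

```python
def swallow_everything():
    try:
        raise SystemExit(1)  # e.g. a deliberate shutdown request
    except:                  # noqa: E722 -- the anti-pattern in question
        return "swallowed"   # shutdown request silently eaten

def catch_errors_only():
    try:
        raise SystemExit(1)
    except Exception:        # SystemExit is NOT an Exception subclass
        return "swallowed"   # never reached; SystemExit propagates
```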
I mean personally I hate what we're doing right now, where we're policing a specific error message (in English), but I cannot see a pathway by which we convert what little we know about the error state being handled in the canonical data to something we could get imported in the template, unless we leave that up to individual, per-exercise template and configuration.
Which I'm not ruling out, it just sounds like a lot of complexity for a relatively small number of known cases.
An exercise that required the student to handle some of the vast range of possible OS error symbols defined in errno would be brilliant, BTW ... and they're relatively standard, which means there might be both some overlap with other languages and a meaningful way of writing canonical properties that would meaningfully map to meaningful tests that could also be meaningfully generated.
I just said meaningful way too much ... it's quite late here. I'll come back to this tomorrow.
> I mean personally I hate what we're doing right now, where we're policing a specific error message
That's exactly what I'm trying to avoid. All we currently do is ensure that there is indeed a message, and that it is not empty; the Python track does not check for the verbatim error message described in the canonical data.
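That "non-empty message" policy can be expressed directly in the regex: `".+"` requires at least one character in the exception's string, without pinning a verbatim wording. A hypothetical illustration:

```python
import unittest

def do_thing():
    # hypothetical solution function under test
    raise ValueError("stack underflow")

class MessageTest(unittest.TestCase):
    def test_error_has_a_message(self):
        # ".+" only requires *some* message, not a specific one;
        # an empty message would fail this assertion
        with self.assertRaisesRegex(ValueError, ".+"):
            do_thing()
```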
> Which I'm not ruling out, it just sounds like a lot of complexity for a relatively small number of known cases.
It is; I think it could be done independent of whether the exercise has its own unique template or not. Something like this might be the simplest we could get:
```yaml
# generators/errors.yml
forth:
  errors_if_there_is_only_one_value_on_the_stack: IndexError
```

```jinja
# __macros.j2
{% macro add_assert_raises(case) %}
with self.assertRaisesRegex({{ case | error_type }}):
    {{ case.property | to_snake }}({{ case | format_input }})
{% endmacro %}
```

```python
# generate.py
import yaml

with open('errors.yml') as f:
    errors = yaml.load(f)

def error_type(case):
    case_name = to_snake(case['description'])
    current_exercise = ???
    return errors[current_exercise].get(case_name, 'ValueError')
```

It's still a little messy though, and I agree that it does seem like it might be too much effort for too few use cases.
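The lookup sketched above can be made testable on its own; here the hypothetical `errors.yml` is replaced by an in-memory dict so the fallback logic stands alone:

```python
# In the real generator this mapping would be loaded from errors.yml.
ERRORS = {
    "forth": {
        "errors_if_there_is_only_one_value_on_the_stack": "IndexError",
    },
}

def to_snake(text):
    # simplified snake_case conversion for case descriptions
    return text.lower().replace(" ", "_").replace("-", "_")

def error_type(exercise, case):
    """Return the expected exception name, defaulting to ValueError."""
    case_name = to_snake(case["description"])
    return ERRORS.get(exercise, {}).get(case_name, "ValueError")
```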
> An exercise that required the student to handle some of the vast range of possible OS error symbols defined in errno would be brilliant, BTW ... and they're relatively standard, which means there might be both some overlap with other languages and a meaningful way of writing canonical properties that would meaningfully map to meaningful tests that could also be meaningfully generated.
Sounds like a good exercise candidate; you could create an issue in problem-specifications to take the idea further!
Sorry, been dragged away for a few days... mentoring queue keeps getting long, and I live on a boat and have to move it fairly often. I can see where you're going above, and yes, I do think it might be overkill, but it's not too bad an idea. Will need to figure out a means of differentiating a builtin alternative to ValueError from a non-builtin that must be imported from the exercise.
Do you think an `errors.yml` for all exercises is better than an `[exercise].yml` with an *errors* section? I'm thinking the latter is better ... if no directory for the exercise exists, use the default template and no error (or other) config. If the directory exists, check it for both an exercise-specific template and an exercise-specific config.
See https://github.com/yawpitch/python/pull/1
I've determined this is caused by the yapf auto-formatting. In particular, the style configuration item …
If we're already creating individual directories, then it would be simpler to have just `[exercise]/errors.yml` and `[exercise]/template.j2`.
To clarify, you mean directly in `python/exercises/[exercise]`?
Actually, the precedent for track files that aren't user-facing would be `python/exercises/[exercise]/.meta/`.
Right, seems sensible to expect exercise-specific templates and configuration to be in there. Rather than `errors.yml` I think perhaps a `generate.yml` with errors being a section item within it would be better. That way we can also have a section for import overrides for things like class names that won't be parseable from the canonical data. Could help minimize the need for custom templates.
Fix whitespace formatting for exception handling. Handles a few different types of canonical input and expect kinds, but this will need additional thought. Should expand the number of tests that pass after auto-generation.
Ok, just did a push that should, in theory, get several more tests working. And there are a lot that should be pretty easy fixes; I'm just going to have to figure out which is more important, using the property names from the canonical data or retaining the legacy names that are in the tests. Then there are a lot of weird edge cases, where inputs and expectations have been put into arbitrary structures. I think we're probably going to have to define a small suite of format functions for those, then use per-exercise configuration to say which one(s) to use. We could also define a name map that says property X == test name Y in there.
Changes made, still WIP. Will review again later.
Apologies for the late response. This one slipped between the cracks in my workflow somehow.
I think it's perfectly reasonable to have individual templates. How those templates handle the test generation is up to the implementer of that template (and subject to sensible review).
Agreed, and my turn to apologize; I missed this notification in a flood of GitHub emails. Haven't had a lot of time free to work on this one, I'm afraid. Hopefully I can take a longer look at it again soon.
@yawpitch any movement on this?
Afraid not. My free time has been limited and what I've had available has been soaked up with mentoring. I've got a few hours today and once I've cleared my backlog of notifications I'll try and take a look.
First pass at per-exercise configuration; it's crude and we still need to figure out how to map canonical properties to classes, as well as needing a few different ways of handling argument variations ... but it's getting a surprisingly large number of them to passing.
Ok, I've taken a solid first pass at per-exercise configuration and been able to get a significant chunk of exercises to the point where they're generating what I believe are workable tests. There are in some cases some substantial changes to the format of the test files themselves, as the various implementers took a rather wide range of approaches to the naming of tests and the layout of inputs and outputs ... I've gone for consistency over trying to keep a small diff. That said, the diff will be much smaller if you run …

Things I haven't done: tried to handle any exercise that uses a class instead of functions; these will require their own template, though I've got some ideas about how to handle some of the simpler ones in configuration. Also haven't tried to deal with any of the many different ways that input arguments have been re-mapped from the canonical form ... several exercises flip positional places, others use keywords derived from the canonical properties, others just make things up ... will need to implement some more flexible ways of handling these as functions that get enabled via configuration.

Overall though, this should get us a lot closer on a lot more exercises.
```
__pycache__

# virtual environments
venv
```
cmccandless left a comment
Overall, the code changes look great. I won't have the time until maybe next week to fetch your changes and mess around with the generated tests.
```yaml
- description: "encode decode"
  property: "decode"
  input: 'encode("Testing, 1 2 3, testing.")'
  expected: "testing123testing"
```
Is this for track-specific cases?
Yes. If the input value is a string it'll insert it as is, so it renders as `property(STRING)` ... which allows a little more flexibility for cases like this.
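The string-passthrough behaviour described might look like this as a helper (the function name `format_input` matches the filter used in the templates, but this body is an illustrative assumption):

```python
def format_input(value):
    """Render a canonical `input` value as Python source.

    Strings are inserted verbatim, so track-specific cases can supply
    ready-made expressions like encode("Testing, 1 2 3, testing.");
    everything else is rendered with repr().
    """
    if isinstance(value, str):
        return value
    return repr(value)
```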
```python
try:
    from yaml import CLoader as Loader, CDumper as Dumper
except ImportError:
    from yaml import Loader, Dumper
```
I'm guessing this is for Python2/3 compatibility. Would you mind adding comments specifying which is which?
It's not actually related to 2/3 ... if PyYAML was installed from the wheel or libyaml is otherwise available this gives a big speed boost by using the C extension, otherwise it falls back on the pure-Python version.
Ah, of course. I should've taken that hint from the naming schemes. Still, would you mind adding a comment explaining that for easy future reference?
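A commented version of that import might read something like the following (the exact comment wording is a suggestion, not the committed one):

```python
# PyYAML optionally ships C bindings built against libyaml (e.g. when
# installed from a wheel).  The CLoader/CDumper classes give a large
# speed boost over the pure-Python implementations, so prefer them and
# fall back gracefully when the C extension is unavailable.
try:
    from yaml import CLoader as Loader, CDumper as Dumper
except ImportError:
    from yaml import Loader, Dumper
```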
Add comments to explain the yaml import statement.
- `-o/--output` is now `-d/--output-dir`
- `-o/--only` is now used to specify a single exercise for which to generate tests
cmccandless left a comment
I forked your work in my own fork so I could test some ideas. Please check out this branch in my fork or this diff against your current revision (https://github.com/yawpitch/python/commit/e93669ea589c4894563583977acd73cc0577bf58) and let me know what you think.
I also think it would be a good idea to document what is valid in generate.yml. I would be willing to help out with writing this document.
```python
    default="./config.json",
    help="path to the Python track config.json file: (%(default)s)")
parser.add_argument(
    "-o",
```
Can I recommend the following?

```
$ generate.py -h
...
-o EXERCISE, --only EXERCISE            generate tests for just the exercise specified
-d DIRECTORY, --output-dir DIRECTORY    path to the output directory: (./exercises)
```

configlet uses the `-o/--only` flag like this; consistency among maintainer tools would be nice.
Agreed on the -o flag; I'll look at your changes as soon as I possibly can; I'm in the middle of a project right now that's eating up most of my time.
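The recommended interface maps directly onto argparse; a minimal sketch (help strings are illustrative, not quoted from the PR):

```python
import argparse

def build_parser():
    # Sketch of the suggested -o/--only and -d/--output-dir interface.
    parser = argparse.ArgumentParser(prog="generate.py")
    parser.add_argument(
        "-o", "--only", metavar="EXERCISE",
        help="generate tests for just the exercise specified")
    parser.add_argument(
        "-d", "--output-dir", metavar="DIRECTORY", default="./exercises",
        help="path to the output directory: (%(default)s)")
    return parser
```

Note argparse derives the attribute name `output_dir` from `--output-dir` automatically.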
Any updates on this? This ended up being a much bigger project than I thought; thanks for all your work on it so far! And to think I thought we could have finished this the last week of September!
Hi Corey,
Sorry to say I've got no update. I'd like to find time to round off the rough edges, but my sabbatical is coming to a close and I have to concentrate on projects that I need to complete before the work search begins in earnest.
Unfortunately part of that involves getting a new website up and running, and that's soaked up all my time as I had no idea HTML had changed so much since I moved into the back end of VFX. It's still several weeks away from live.
I think the branch is in a usable state for a lot of the simpler exercises, and could at least take the load off a list of white-listed exercises. Would be good to know which ones it's utterly failing on.
Yeah, I think we were a bit ambitious on this one. It wouldn't be if I could dedicate my full time to it, but right now I'm barely able to eke out time for minimal mentoring.
Sorry,
M
That's quite alright. I don't have a lot of time for it myself, but it would be good if, when I do have some bandwidth, I could contribute directly to the work done so far. If that is fine with you, feel free to add me as a contributor on your fork.
Happy for you to. I've sent the add request.
Closing this as it's woefully out of date and we've gone down a different path. |
First pass that actually functions; there's definitely going to be some need for additional passes at the work of converting the canonical input dict to arguments that can be passed to Python functions; the canonical data is all over the place from exercise to exercise, and simply treating the dict as keyword arguments doesn't work well because quite a few exercises use keys that are legal dict keys but illegal keyword arguments (i.e. numbers, language reserved words).
But I think this is a decent start for Hacktoberfest.
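The keyword-argument problem the commit message mentions is easy to demonstrate and to work around; one possible approach, with a made-up helper name and prefix:

```python
import keyword

def safe_kwargs(canonical_input):
    """Rename canonical-data keys that are not legal Python keyword
    arguments (numbers, reserved words) so the dict can be splatted
    into a function call with **kwargs."""
    safe = {}
    for key, value in canonical_input.items():
        if not key.isidentifier() or keyword.iskeyword(key):
            # e.g. "2" -> "arg_2", "lambda" -> "arg_lambda"
            key = "arg_" + "".join(c if c.isalnum() else "_" for c in key)
        safe[key] = value
    return safe
```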