[WIP] Add Python test generator#1547

Closed
yawpitch wants to merge 20 commits into exercism:master from yawpitch:python-generator

Conversation

@yawpitch
Contributor

yawpitch commented Oct 2, 2018

First pass that actually functions; there's definitely going to be some need for additional passes at the work of converting the canonical input dict to arguments that can be passed to Python functions. The canonical data is all over the place from exercise to exercise, and simply treating the dict as keyword arguments doesn't work well, because quite a few exercises use keys that are legal dict keys but illegal keyword arguments (i.e. numbers, language reserved words).
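To illustrate the problem (a hedged sketch, not code from this PR; `safe_kwargs` and the `arg_` prefix are invented names), canonical keys like `"2"` or `"class"` are valid dict keys but cannot be spelled as keyword arguments:

```python
import keyword

def safe_kwargs(case_input):
    """Rename dict keys that are not legal Python keyword arguments."""
    cleaned = {}
    for key, value in case_input.items():
        # "2" fails isidentifier(); "class" is a reserved word -- neither
        # can appear on the left of `=` in a function call
        if not key.isidentifier() or keyword.iskeyword(key):
            key = "arg_" + key  # hypothetical renaming strategy
        cleaned[key] = value
    return cleaned

print(safe_kwargs({"class": 1, "2": 2, "value": 3}))
# → {'arg_class': 1, 'arg_2': 2, 'value': 3}
```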

But I think this is a decent start for Hacktoberfest.

def _get_properties(obj):
    if "cases" in obj:
        for case in obj["cases"]:
            yield from _get_properties(case)
Contributor

Travis-CI build fails because this syntax is not valid in Python 2.7.

@yawpitch
Contributor Author
yawpitch commented Oct 8, 2018

The `yield from` has been replaced with a 2.7-compatible form.

Contributor

cmccandless left a comment

These are not my conclusive review comments, just my initial thoughts. I will review this more thoroughly tomorrow or the next day.

import unittest
{{ make_import(data) }}

# Tests adapted from `problem-specifications//canonical-data.json` @ {{ data.version }}
Contributor

Should be `v{{ data.version }}`

Contributor Author

Good catch, fixed and pushed.

return repr(case["expected"])


def main():
Contributor

I highly recommend the following modification:

def main(args=None):  # where args may be a list of str
    ...
    parser.parse_args(args)  # uses sys.argv if args is None

This will make automating testing of generated test suites much easier.
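The suggested pattern in full (a minimal sketch; the `--only` flag name is illustrative, borrowed from later in this discussion, not from the PR at this point):

```python
import argparse

def main(args=None):
    # args may be a list of str; argparse falls back to
    # sys.argv[1:] when args is None
    parser = argparse.ArgumentParser()
    parser.add_argument("-o", "--only", help="generate tests for a single exercise")
    opts = parser.parse_args(args)
    return opts

# a test can now drive the CLI without patching sys.argv
print(main(["--only", "two-fer"]).only)  # → two-fer
```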

Contributor Author

Interesting, hadn't thought of that use case. Fixed and pushed.

@cmccandless
Contributor

As is, the following exercises generate passing test suites:

  • hello-world
  • reverse-string
  • isogram
  • pangram
  • rna-transcription
  • atbash-cipher
  • acronym
  • flatten-array
  • pig-latin
  • scrabble-score
  • robot-name
  • rail-fence-cipher
  • bracket-push
  • dot-dsl
  • tree-building
  • poker
  • zebra-puzzle
  • simple-linked-list
  • linked-list
  • protein-translation
  • error-handling
  • bank-account
  • ledger

Several others ought to be passing (leap, bob, etc.), but have mismatched property names (is_leap_year vs leap_year, etc.).

I would say that's a fantastic start!

cmccandless previously approved these changes Oct 9, 2018
@yawpitch
Contributor Author

yawpitch commented Oct 9, 2018

I'm a bit weirded out by some of the items on that list of passing generated tests ... especially since I don't think there's canonical data for, for instance, linked-list or simple-linked-list.

And yeah, there's going to have to be some special-casing if we want the property names to continue to map to what they've been called in previously manually-generated tests ... there's no completely reliable pattern to the interpolations that people have made over time.

@cmccandless
Contributor

cmccandless commented Oct 9, 2018

especially since I don't think there's canonical data for, for instance, linked-list or simple-linked-list.

Good point. I ran a pretty naive script to check for passing exercises, so I haven't looked closely at those yet. Found the issue with my script: it doesn't verify that the test suite was modified before running pytest, so it's the known-passing test suite that is running there.

Also, for some reason any exercise that has error cases is indenting the test functions an extra level, making them undetectable. I'm not sure where the extra indentation is coming from, but I've confirmed that it doesn't occur if the `if data.has_error` block is not present.

One more thing: the line spacing around the canonical data reference is incorrect; it should be 2 blank lines above, 1 below.

@cmccandless
Contributor

Here is the updated list of passing exercises, excluding those without canonical data:

  • hello-world
  • reverse-string
  • isogram
  • pangram
  • rna-transcription
  • isbn-verifier
  • atbash-cipher
  • acronym
  • flatten-array
  • pig-latin
  • scrabble-score
  • rail-fence-cipher
  • bracket-push
  • poker
  • zebra-puzzle
  • protein-translation

@yawpitch
Contributor Author

yawpitch commented Oct 9, 2018

I'll try and take a look at those two issues tonight; failing that, it might have to be in a day or two.

{% endmacro %}

{% macro add_assert_raises(case) %}
self.assertRaisesRegex({{ case.property | to_snake }}({{ case | format_input }}), {{ case | format_expect }})
Contributor

By convention, this macro should be:

with self.assertRaisesRegex({{ case.error_type | to_camel }}):
    {{ case.property | to_snake }}({{ case | format_input }})

Contributor

In most cases, case.error_type will be ValueError. However, there are some exercises that define their own error types. I suspect some sort of file will have to be created that defines exceptions to the "use ValueError" rule (perhaps yaml would be a good choice for this?).

Contributor Author

Let me think about that one. Have you got a specific example of an exercise that does this?

Contributor

cmccandless Oct 9, 2018

Contributor Author

Right, see now that's a good example ... someone has taken the canonical notion of some error being thrown and decided that a custom class name is required. That's an interpolation where someone has unintentionally broken the ability to automatically generate these tests. And in fact the implementation of that error in the student's version is going to remain the two-line, do-nothing subclass of Exception that is provided in the stub ... so we're not testing the student's work, we're testing the test writer's.

Personally I'd wonder if that's an opportunity to break backwards compatibility and simply check that any exception is being raised with the appropriate error message, rather than a specific, test writer defined exception.

Since the error message is all we get from the canonical data, that seems prudent, but it runs the risk of sacrificing the validity of past community solutions on the altar of a smoother, more efficient rollout of tests for future (uncertain) changes to the canonical data.

Contributor Author

I agree, however that might make more sense as an exercise in itself, as the exception hierarchy system in Python is quite dense (and also rather unique) ... I'd love to have something that shows why you never `except: do_something()` in production code, but that involves something far more dense than is likely to be definable in canonical data, much less built via an automated script.

I mean personally I hate what we're doing right now, where we're policing a specific error message (in English), but I cannot see a pathway by which we convert what little we know about the error state being handled in the canonical data to something we could get imported in the template, unless we leave that up to individual, per-exercise template and configuration.

Which I'm not ruling out, it just sounds like a lot of complexity for a relatively small number of known cases.

Contributor Author

An exercise that required the student to handle some of the vast range of possible OS error symbols defined in errno would be brilliant, BTW ... and they're relatively standard, which means there might be both some overlap with other languages and a meaningful way of writing canonical properties that would meaningfully map to meaningful tests that could also be meaningfully generated.

Contributor Author

I just said meaningful way too much ... it's quite late here. I'll come back to this tomorrow.

Contributor

I mean personally I hate what we're doing right now, where we're policing a specific error message

That's exactly what I'm trying to avoid. All we currently do is ensure that there is indeed a message, and that it is not empty; the Python track does not check for the verbatim error message described in the canonical data.
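A sketch of that style of assertion (illustrative only, using a builtin failure rather than an exercise function): check that an error is raised and that its message is non-empty, without pinning the exact wording:

```python
import unittest

class ErrorMessageExample(unittest.TestCase):
    def test_raises_with_message(self):
        # assert the call raises, then only that the message is non-empty
        with self.assertRaises(ValueError) as caught:
            int("not a number")
        self.assertNotEqual(str(caught.exception), "")

result = unittest.TestResult()
ErrorMessageExample("test_raises_with_message").run(result)
print(result.wasSuccessful())  # → True
```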

Which I'm not ruling out, it just sounds like a lot of complexity for a relatively small number of known cases.

It is; I think it could be done independent of whether the exercise has its own unique template or not. Something like this might be the simplest we could get:

# generators/errors.yml
forth:
  errors_if_there_is_only_one_value_on_the_stack: IndexError

# __macros.j2
{% macro add_assert_raises(case) %}
with self.assertRaisesRegex({{ case | error_type }}):
    {{ case.property | to_snake }}({{ case | format_input }})
{% endmacro %}

# generate.py
import yaml

with open('errors.yml') as f:
    errors = yaml.load(f)

def error_type(case):
    case_name = to_snake(case['description'])
    current_exercise = ???
    return errors[current_exercise].get(case_name, 'ValueError')

It's still a little messy though, and I agree that it does seem like it might be too much effort for too few use cases.
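A self-contained version of that lookup (names are illustrative; here the exercise slug is passed in explicitly rather than discovered, sidestepping the unresolved part):

```python
DEFAULT_ERROR = "ValueError"

# stand-in for the parsed errors.yml contents
errors = {
    "forth": {
        "errors_if_there_is_only_one_value_on_the_stack": "IndexError",
    },
}

def error_type(exercise, case_name):
    # fall back to ValueError unless this exercise overrides this case
    return errors.get(exercise, {}).get(case_name, DEFAULT_ERROR)

print(error_type("forth", "errors_if_there_is_only_one_value_on_the_stack"))
# → IndexError
print(error_type("bob", "anything"))  # → ValueError
```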

An exercise that required the student to handle some of the vast range of possible OS error symbols defined in errno would be brilliant, BTW ... and they're relatively standard, which means there might be both some overlap with other languages and a meaningful way of writing canonical properties that would meaningfully map to meaningful tests that could also be meaningfully generated.

Sounds like a good exercise candidate; you could create an issue in problem-specifications to take the idea further!

Contributor Author

Sorry, been dragged away for a few days... mentoring queue keeps getting long, and I live on a boat and have to move it fairly often. I can see where you're going above, and yes, I do think it might be overkill, but it's not too bad an idea. Will need to figure out a means of differentiating a builtin alternative to ValueError from a non-builtin that must be imported from the exercise.

Do you think an errors.yml for all exercises is better than an [exercise].yml with an errors section? I'm thinking the latter is better ... if no directory for the exercise exists, use the default template and no error (or other) config. If the directory exists, check it for both an exercise-specific template and an exercise-specific config.

@cmccandless
Contributor

Also, for some reason any exercise that has error cases is indenting the test functions an extra level, making them undetectable. I'm not sure where the extra indentation is coming from, but I've confirmed that it doesn't occur if the `if data.has_error` block is not present.

See https://github.com/yawpitch/python/pull/1

One more thing: the line spacing around the canonical data reference is incorrect; it should be 2 blank lines above, 1 below.

I've determined this is caused by the yapf auto-formatting; in particular, the style configuration item `blank_lines_around_top_level_definition=2`. However, other formatting is incorrect if that item is changed to 1.

@cmccandless
Contributor

cmccandless commented Oct 13, 2018 via email

@yawpitch
Contributor Author

To clarify, you mean directly in python/exercises/[exercise]?

@cmccandless
Contributor

cmccandless commented Oct 13, 2018 via email

@yawpitch
Contributor Author

yawpitch commented Oct 13, 2018 via email

Fix whitespace formatting for exception handling. Handles a few
different types of canonical input and expect kinds, but this
will need additional thought. Should expand the number of tests
that pass after auto-generation.
@yawpitch
Contributor Author

Ok, just did a push that should, in theory, get several more tests working. And there are a lot that should be pretty easy fixes; I'm just going to have to figure out which is more important: using the property names from the canonical data or retaining the legacy names that are in the tests.

Then there are a lot of weird edge cases, where inputs and expectations have been put into arbitrary structures. I think we're probably going to have to define a small suite of format functions for those, then use per-exercise configuration to say which one(s) to use. We could also define a name map that says property X == test name Y in there.
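One possible shape for that (purely a sketch; `FORMATTERS`, the config keys, and the example names are all invented here): a registry of formatting functions plus a per-exercise name map, both selected via configuration:

```python
def format_kwargs(case):
    # render the canonical input dict as keyword arguments
    return ", ".join(
        "{}={!r}".format(k, v) for k, v in sorted(case["input"].items())
    )

def format_positional(case):
    # render the canonical input values positionally
    return ", ".join(repr(v) for v in case["input"].values())

FORMATTERS = {"kwargs": format_kwargs, "positional": format_positional}

# hypothetical per-exercise configuration
config = {"formatter": "kwargs", "name_map": {"leapYear": "is_leap_year"}}

case = {"property": "leapYear", "input": {"year": 1996}}
name = config["name_map"].get(case["property"], case["property"])
call = "{}({})".format(name, FORMATTERS[config["formatter"]](case))
print(call)  # → is_leap_year(year=1996)
```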

@cmccandless cmccandless dismissed their stale review October 15, 2018 17:36

Changes made, still WIP. Will review again later.

@cmccandless
Contributor

Apologies for the late response. This one slipped through the cracks in my workflow somehow.

Then there are a lot of weird edge cases, where inputs and expectations have been put into arbitrary structures. I think we're probably going to have to define a small suite of format functions for those, then use per-exercise configuration to say which one(s) to use. We could also define a name map that says property X == test name Y in there.

I think it's perfectly reasonable to have individual templates. How those templates handle the test generation is up to the implementer of that template (and subject to sensible review).

@yawpitch
Contributor Author

yawpitch commented Nov 9, 2018

Agreed, and my turn to apologize, missed this notification in a flood of GitHub emails. Haven't had a lot of time free to work on this one, I'm afraid. Hopefully I can take a longer look at it again soon.

@cmccandless
Contributor

@yawpitch any movement on this?

@yawpitch
Contributor Author

yawpitch commented Dec 7, 2018 via email

First pass at per-exercise configuration; it's crude and we still need
to figure out how to map canonical properties to classes, as well as
needing a few different ways of handling argument variations ... but
it's getting a surprisingly large number of them to passing.
@yawpitch
Contributor Author

yawpitch commented Dec 9, 2018

Ok, I've taken a solid first pass at per-exercise configuration and been able to get a significant chunk of exercises to the point where they're generating what I believe are workable tests. In some cases there are substantial changes to the format of the test files themselves, as the various implementers took a rather wide range of approaches to the naming of tests and the layout of inputs and outputs ... I've gone for consistency over trying to keep a small diff.

That said, the diff will be much smaller if you run `yapf -i` on the existing test files. I just did `find ./exercises -name '*_test.py' | xargs yapf -i` and saved myself a lot of parsing through whitespace.

Things I haven't done: tried to handle any exercise that uses a class instead of functions; these will require their own template, though I've got some ideas about how to handle some of the simpler ones in configuration.

Also haven't tried to deal with any of the many different ways that input arguments have been re-mapped from the canonical form ... several exercises flip positional places, others use keywords derived from the canonical properties, others just make things up ... will need to implement some more flexible ways of handling these as functions that get enabled via configuration.

Overall, though, this should get us a lot closer on a lot more exercises.

__pycache__

# virtual environments
venv
Contributor

👍

Contributor

cmccandless left a comment

Overall, the code changes look great. I won't have the time until maybe next week to fetch your changes and mess around with the generated tests.

- description: "encode decode"
  property: "decode"
  input: 'encode("Testing, 1 2 3, testing.")'
  expected: "testing123testing"
Contributor

Is this for track-specific cases?

Contributor Author

Yes. If the input value is a string it'll insert it as-is, so `property(STRING)`... allows a little more flexibility for cases like this.

try:
    from yaml import CLoader as Loader, CDumper as Dumper
except ImportError:
    from yaml import Loader, Dumper
Contributor

I'm guessing this is for Python2/3 compatibility. Would you mind adding comments specifying which is which?

Contributor Author

It's not actually related to 2/3 ... if PyYAML was installed from the wheel or libyaml is otherwise available this gives a big speed boost by using the C extension, otherwise it falls back on the pure-Python version.

Contributor

Ah, of course. I should've taken that hint from the naming schemes. Still, would you mind adding a comment explaining that for easy future reference?

Contributor Author

Sure.

yawpitch and others added 4 commits December 11, 2018 15:35
Add comments to explain the yaml import statement.
- -o/--output is now -d/--output-dir
- -o/--only is now used to specify a single exercise for which to generate tests
Contributor

cmccandless left a comment

I pulled your work into my own fork so I could test some ideas. Please check out this branch in my fork or this diff against your current revision (https://github.com/yawpitch/python/commit/e93669ea589c4894563583977acd73cc0577bf58) and let me know what you think.

I also think it would be a good idea to document what is valid in generate.yml. I would be willing to help out with writing this document.

    default="./config.json",
    help="path to the Python track config.json file: (%(default)s)")
parser.add_argument(
    "-o",
Contributor

Can I recommend the following?

$ generate.py -h
...
-o EXERCISE, --only EXERCISE            generate tests for just the exercise specified
-d DIRECTORY, --output-dir DIRECTORY    path to the output directory: (./exercises)

configlet uses the -o/--only flag like this; consistency among maintainer tools would be nice.

Contributor Author

Agreed on the -o flag; I'll look at your changes as soon as I possibly can; I'm in the middle of a project right now that's eating up most of my time.

@cmccandless
Contributor

Any updates on this?

This ended up being a much bigger project than I thought; thanks for all your work on it so far! And to think I thought we could have finished this in the last week of September!

@yawpitch
Contributor Author

yawpitch commented Jan 30, 2019 via email

@cmccandless
Contributor

That's quite alright. I don't have a lot of time for it myself, but it would be good if, when I do have some bandwidth, I could contribute directly to the work done so far. If that is fine with you, feel free to add me as a contributor on your fork.

@yawpitch
Contributor Author

Happy for you to. I've sent the add request.

@yawpitch
Contributor Author

yawpitch commented Sep 9, 2019

Closing this as it's woefully out of date and we've gone down a different path.

@yawpitch yawpitch closed this Sep 9, 2019
@yawpitch yawpitch deleted the python-generator branch February 13, 2020 13:48