-
Notifications
You must be signed in to change notification settings - Fork 8
Joyce feedback #467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Joyce feedback #467
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First review looking a github diff (which is not as nice for notebooks...).
But already I saw some major change required: the jupyter notebook related to the similarity score is not compatible with the current branch (but works against dev). More description in comment.
And in permutation tutorial, the schema has been downgraded to v1 while it was v3.
docs/tutorial/Permutations.ipynb
Outdated
| "%%writefile {schema.name}\n", | ||
| "{\n", | ||
| " \"version\": 3,\n", | ||
| " \"version\": 1,\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The schema had been upgraded to v3, but in this PR, it get retrograded to v1 again.
| "[[0, 312], [1, 1253], 1.0]\n", | ||
| "[[0, 407], [1, 3743], 1.0]\n", | ||
| "[[0, 670], [1, 3550], 1.0]\n" | ||
| "[76, 2345, 1.0]\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we arrive at the problem with tutorials not compatible with the current branch: what is returned by the currently deployed service at testing.es.data61.xyz is not compatible with the version on this branch where the similarity score output type has changed. So to test your tutorial for the state of this branch, you have to build and deploy the entity service locally and test against (actually, no need to build as the docker images should already be on dockerhub for this branch).
| matplotlib | ||
| recordlinkage | ||
| requests | ||
| pandas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alphabetical order please :)
|
A small helper: if you want to run deploy locally the entity-service based on this branch, run: |
gusmith
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just had tiny comment in the Similarity score tutorial, and a bigger comment for a nearly untouched tutorial multiparty-linkage-in-entity-service.ipynb: one comment from Joyce was to show the schema and the datasets from Alice, Bob and Alice. Maybe just the header would be nice.
| " --apikey=\"{credentials['result_token']}\" \\\n", | ||
| " --server \"{url}\" \\\n", | ||
| " --threshold 0.9 \\\n", | ||
| " --threshold 0.75 \\\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot point to the good line, but line 749, there is a typo:
"Now after some delay (depending on the size) we can fetch the mask." should not mention a mask. I guess we could just say "the result"? It is in the "Results" section, in the md cell.
| "scores_matches = []\n", | ||
| "scores_non_matches = []\n", | ||
| "for (_, a), (_, b), score in data:\n", | ||
| "for a, b, score in data:\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👎
addressed the feedback from the Trello card.