Feature pull client data from object store #549
Conversation
joyceyuu left a comment:
Looks good to me. Just a few questions.
```
api.dbinit.image.repository:$(backendImageName)
workers.image.repository:$(backendImageName)
workers.image.pullPolicy:Always
anonlink.objectstore.secure:false
```
Just wondering how the YAML file can recognise `anonlink` here?
These overrides are translated into command-line arguments to `helm template`, much like setting individual values when deploying a chart (e.g., `--set anonlink.objectstore.secure=false`). The path in question, `anonlink.objectstore`, is defined in the `values.yaml` file.
You can see the final command here:

```
/opt/hostedtoolcache/helm/2.16.3/x64/linux-amd64/helm template /home/vsts/work/1/s/deployment/entity-service \
  --name azk09d6034 \
  --namespace test-azure \
  --set global.postgresql.postgresqlPassword=notaproductionpassword \
  --set api.ingress.enabled=false \
  --set workers.replicaCount=4 \
  --set api.www.image.repository=data61/anonlink-nginx \
  --set api.www.image.pullPolicy=Always \
  --set api.app.image.repository=data61/anonlink-app \
  --set api.app.image.pullPolicy=Always \
  --set api.dbinit.image.repository=data61/anonlink-app \
  --set workers.image.repository=data61/anonlink-app \
  --set workers.image.pullPolicy=Always \
  --set anonlink.objectstore.secure=false \
  --set anonlink.objectstore.uploadEnabled=true \
  --set anonlink.objectstore.uploadAccessKey=testUploadAccessKey \
  --set anonlink.objectstore.uploadSecretKey=testUploadSecretKey \
  --set minio.accessKey=testMinioAccessKey \
  --set minio.secretKey=testMinioSecretKey
```
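For context, a `--set` override only takes effect because the corresponding path already exists in the chart's `values.yaml`. A hypothetical sketch of what that section might look like (only the key paths that appear in the command above are taken from this PR; the defaults shown are made up):

```yaml
# deployment/entity-service/values.yaml (illustrative excerpt, not copied from the chart)
anonlink:
  objectstore:
    secure: false            # overridden via --set anonlink.objectstore.secure=false
    uploadEnabled: true      # overridden via --set anonlink.objectstore.uploadEnabled=true
    uploadAccessKey: ""
    uploadSecretKey: ""
```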
> The caller must have both the `project_id` and a valid `upload_token` in order to contribute data;
> both of these are generated when a project is created.
> This endpoint can directly accept uploads up to several hundred MiB, and can pull encoding data from
> an object store for larger uploads.
Nice 💯
```
- type: integer
...
BlockMap:
```
Is `BlockMap` compulsory? What if there are only encodings from an external object store? Would `BlockMap` become arrays of the same block id?
No, it isn't required; see above where it is referenced — only the encodings are required:

```yaml
EncodingUpload:
  description: Object that contains one data provider's encodings
  type: object
  required: [encodings]
  properties:
    encodings:
      ...
    blocks:
      oneOf:
        - $ref: '#/components/schemas/BlockMap'
```
> What if there are only encodings from an external object store? Would BlockMap become arrays of the same block id?
`blocks` (schema/type `BlockMap`) would be the same regardless of whether the encodings are externally provided or directly provided. The blocks are a map from encoding id to block ids:
```json
{
  "1": ["block1", "block2"],
  "2": ["block2"]
}
```

```diff
  - name: postgresql
-   version: 8.3.0
    repository: https://kubernetes-charts.storage.googleapis.com
+   version: 8.9.1
```
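To make the shape of `blocks` concrete, here is a small Python sketch (the helper name and the payload values are mine, not from the codebase) that builds a payload matching the `EncodingUpload` schema quoted earlier and inverts the encoding-id to block-ids map:

```python
from collections import defaultdict

def invert_block_map(blocks):
    """Invert {encoding_id: [block_id, ...]} into {block_id: [encoding_id, ...]}."""
    inverted = defaultdict(list)
    for encoding_id, block_ids in blocks.items():
        for block_id in block_ids:
            inverted[block_id].append(encoding_id)
    return dict(inverted)

# Payload shaped like the EncodingUpload schema: only 'encodings' is required.
payload = {
    "encodings": ["ZW5jMQ==", "ZW5jMg=="],   # base64 strings (illustrative)
    "blocks": {"1": ["block1", "block2"], "2": ["block2"]},
}

print(invert_block_map(payload["blocks"]))
# {'block1': ['1'], 'block2': ['1', '2']}
```

Inverting the map this way is how you would answer "which encodings fall in each block" from the per-encoding representation the API uses.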
How often would you recommend we check the versions of dependent charts? Is there a dependency bot for charts, similar to Dependabot?
I would do it any time you change the Kubernetes deployment template. I haven't seen an automatic Dependabot-like tool for Helm chart dependencies.
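For a manual check, Helm itself has subcommands for this; run from the repository root (the chart path here is the one used elsewhere in this thread):

```
# Show declared dependencies and whether the fetched versions match
helm dependency list deployment/entity-service

# Re-download dependencies to match the declared versions
helm dependency update deployment/entity-service
```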
```diff
  1. Get the entity service URL by running:

- {{- if contains "NodePort" .Values.api.service.type }}
+ {{- if eq .Values.api.service.type "NodePort" "ClusterIP" }}
```
Why do we wrap code in double curly brackets? Is it some JavaScript?
No, it is Helm-template specific: a combination of the Go template language with some extra functions and Helm deployment data. See the docs here.
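As a tiny illustration of the syntax (not from this chart — the service and value names are made up), a Helm `NOTES.txt` fragment using the same construct might look like:

```
{{- if eq .Values.api.service.type "NodePort" }}
  export NODE_PORT=$(kubectl get svc {{ .Release.Name }}-api -o jsonpath="{.spec.ports[0].nodePort}")
{{- end }}
```

`{{ ... }}` delimits a template action; `.Values` is populated from `values.yaml` and `.Release` from the deployment metadata.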
wilko77 left a comment:
The goal of this pull-data-from-an-object-store feature is to support population-sized linkages. Without blocking, that is not really achievable. After all the hard work you put in, we are so close now to achieving this. Why did you decide not to handle blocking information as well?
```yaml
#- $ref: '#/ExternalData'

EncodingArray:
  description: Array of encodings, comprising the base64-encoding (and encoding id?).
```
An API defines the format; it doesn't ask for clarification...
Eek, yeah, I wasn't sure about taking the extra step at this point to allow the user to provide the encoding ids. I decided it wasn't part of this feature. I'll remove the question from the API spec.
```yaml
blocks:
  oneOf:
    - $ref: '#/components/schemas/BlockMap'
## TODO may be useful to handle external blocking data too
```
Definitely. With very small blocks, the blocking info is of the same order of size as the encodings.
This will be part of a follow-up PR.
```python
filename = None
# Set the state to 'pending' in the uploads table
with DBConn() as conn:
    db.insert_encoding_metadata(conn, filename, dp_id, receipt_token, encoding_count=count, block_count=1)
```
What's with the `filename = None` business?
It is just a way of keeping the column in the uploads table set to NULL.
```python
# # Now work out if all parties have added their data
# if clks_uploaded_to_project(project_id):
#     logger.info("All parties data present. Scheduling any queued runs")
#     check_for_executable_runs.delay(project_id, serialize_span(parent_span))
```
Where is `check_for_executable_runs` called in the case of external data?
Thanks, that looks like an oversight on my part. Clearly, pre-queued runs with external data are not covered by a test :-/

I hope this wasn't a surprise, given it was mentioned in the design doc. I specified that only encodings would be supported in the first implementation due to the time available. I wasn't sure if we'd have to create a new binary format for blocking info, and that didn't seem too related to the core functionality of "pulling data". However, I have developed a branch that supports external blocking information (see #551).
…building docker images
…pull data from object store
85e1543 to
54eb646
Changes incorporated in PR #551
Builds on #548.
Adds the REST API changes and backend changes to pull binary encoding data from an object store.
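As a rough illustration of the kind of worker logic such a feature enables — everything here is an assumption: the fixed 128-byte encoding size, the function and parameter names; the real binary format and object-store client are defined by the service — a minimal Python sketch of reading fixed-width binary encodings from a downloaded object:

```python
import io

ENCODING_SIZE = 128  # bytes per encoding; assumed, not the service's actual format

def pull_encodings(stream, count):
    """Read `count` fixed-width binary encodings from a file-like object,
    e.g. the response body of an object-store GET."""
    encodings = []
    for _ in range(count):
        chunk = stream.read(ENCODING_SIZE)
        if len(chunk) != ENCODING_SIZE:
            raise ValueError("object store data truncated")
        encodings.append(chunk)
    return encodings

# Stand-in for an object-store download stream:
fake_object = io.BytesIO(bytes(ENCODING_SIZE * 3))
print(len(pull_encodings(fake_object, 3)))  # 3
```

Reading from a stream rather than loading the whole object keeps memory bounded, which matters for the multi-hundred-MiB uploads the endpoint documentation mentions.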