Collaborator

@hardbyte hardbyte commented Apr 24, 2020

Builds on #548

Adds the REST API and backend changes needed to pull binary encoding data from an object store.

@hardbyte hardbyte requested review from joyceyuu and wilko77 April 24, 2020 05:20
@hardbyte hardbyte marked this pull request as ready for review April 26, 2020 21:04
Contributor

@joyceyuu joyceyuu left a comment

Looks good to me. Just a few questions.

api.dbinit.image.repository:$(backendImageName)
workers.image.repository:$(backendImageName)
workers.image.pullPolicy:Always
anonlink.objectstore.secure:false
Contributor

Just wondering how the YAML file can recognise anonlink here?

Collaborator Author

These overrides are translated into command line arguments to helm template, much like setting individual values when deploying a chart (e.g., --set anonlink.objectstore.secure=false). The path in question, anonlink.objectstore, is defined in the values.yaml file.

You can see the final command here:

/opt/hostedtoolcache/helm/2.16.3/x64/linux-amd64/helm template /home/vsts/work/1/s/deployment/entity-service --name azk09d6034 --namespace test-azure --set global.postgresql.postgresqlPassword=notaproductionpassword --set api.ingress.enabled=false --set workers.replicaCount=4 --set api.www.image.repository=data61/anonlink-nginx --set api.www.image.pullPolicy=Always --set api.app.image.repository=data61/anonlink-app --set api.app.image.pullPolicy=Always --set api.dbinit.image.repository=data61/anonlink-app --set workers.image.repository=data61/anonlink-app --set workers.image.pullPolicy=Always --set anonlink.objectstore.secure=false --set anonlink.objectstore.uploadEnabled=true --set anonlink.objectstore.uploadAccessKey=testUploadAccessKey --set anonlink.objectstore.uploadSecretKey=testUploadSecretKey --set minio.accessKey=testMinioAccessKey --set minio.secretKey=testMinioSecretKey
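As a rough illustration of the translation described above, here is a minimal Python sketch. It is not the pipeline's actual implementation: the function names `overrides_to_helm_args` and `set_nested` are hypothetical, and values are kept as plain strings for simplicity, whereas Helm itself may coerce values such as `false` to other types.

```python
from typing import Any, Dict, List


def overrides_to_helm_args(overrides: List[str]) -> List[str]:
    """Turn 'path:value' override lines into '--set path=value' arguments.

    Hypothetical helper: the real conversion happens inside the CI task.
    """
    args: List[str] = []
    for line in overrides:
        # Split on the first colon only, so values may contain colons.
        path, _, value = line.partition(":")
        args.extend(["--set", f"{path}={value}"])
    return args


def set_nested(values: Dict[str, Any], dotted_path: str, value: str) -> None:
    """Mimic how a dotted --set path addresses a nested key in values.yaml."""
    *parents, leaf = dotted_path.split(".")
    node = values
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


overrides = [
    "workers.image.pullPolicy:Always",
    "anonlink.objectstore.secure:false",
]
helm_args = overrides_to_helm_args(overrides)

# A stand-in for the defaults that values.yaml would define.
values = {"anonlink": {"objectstore": {"secure": "true"}}}
set_nested(values, "anonlink.objectstore.secure", "false")
```

The point of the second helper is only to show why the dotted path works: each segment names one level of nesting in the chart's values.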

The caller must have both the `project_id` and a valid `upload_token` in order to contribute data,
both of these are generated when a project is created.
This endpoint can directly accept uploads up to several hundred MiB, and can pull encoding data from
an object store for larger uploads.
Contributor

Nice 💯

- type: integer


BlockMap:
Contributor

Is BlockMap compulsory? What if there are only encodings from an external object? Would BlockMap become arrays of the same block id?

Collaborator Author

@hardbyte hardbyte Apr 28, 2020

No, it isn't required. See above where it is referenced; only the encodings are required:

EncodingUpload:
      description: Object that contains one data provider's encodings
      type: object
      required: [encodings]
      properties:
        encodings:
          ...
        blocks:
          oneOf:
            - $ref: '#/components/schemas/BlockMap'

What if there are only encodings from an external object? Would BlockMap become arrays of the same block id?

blocks (schema/type BlockMap) would be the same regardless of whether the encodings are externally provided or directly provided. The blocks are a map from encoding id to block ids:

{
  "1": ["block1", "block2"],
  "2": ["block2"]
}
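To make the shape of the map concrete, here is a small Python sketch that inverts a block map into per-block groups of encoding ids. This only illustrates the data structure; the helper name `group_by_block` is hypothetical and is not part of the service.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_block(block_map: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert an encoding-id -> block-ids map into block-id -> encoding-ids."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for encoding_id, block_ids in block_map.items():
        for block_id in block_ids:
            groups[block_id].append(encoding_id)
    return dict(groups)


# Same example as above: encoding "1" is in two blocks, encoding "2" in one.
block_map = {"1": ["block1", "block2"], "2": ["block2"]}
blocks = group_by_block(block_map)
# blocks == {"block1": ["1"], "block2": ["1", "2"]}
```

The map looks the same whether the encodings themselves were uploaded directly or pulled from the object store.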

- name: postgresql
version: 8.3.0
repository: https://kubernetes-charts.storage.googleapis.com
version: 8.9.1
Contributor

How often would you recommend we check the versions of dependent charts? Is there a similar dependency bot for charts?

Collaborator Author

@hardbyte hardbyte Apr 28, 2020

I would do it any time you change the Kubernetes deployment template. I haven't seen an automatic Dependabot-like tool for Helm chart dependencies.

1. Get the entity service URL by running:

{{- if contains "NodePort" .Values.api.service.type }}
{{- if eq .Values.api.service.type "NodePort" "ClusterIP" }}
Contributor

Why do we wrap code in double curly brackets? Is it some JavaScript?

Collaborator Author

No, it is specific to Helm templates: a combination of the Go template language with some extra functions and Helm deployment data. See the docs here

Collaborator

@wilko77 wilko77 left a comment

The goal of this feature, pulling data from an object store, is to support population-sized linkages. Without blocking, this is not really achievable. After all the hard work you put in, we are so close to achieving this. Why did you decide not to handle blocking information as well?

#- $ref: '#/ExternalData'

EncodingArray:
description: Array of encodings, comprising the base64-encoding (and encoding id?).
Collaborator

An API defines the format; it doesn't ask for clarification...

Collaborator Author

Eek, yeah, I wasn't sure about taking the extra step at this point to allow the user to provide the encoding ids. I decided it wasn't part of this feature. I'll remove the question from the API spec.

blocks:
oneOf:
- $ref: '#/components/schemas/BlockMap'
## TODO may be useful to handle external blocking data too
Collaborator

Definitely. With very small blocks, the blocking info is of the same order of size as the encodings.

Collaborator Author

This will be part of a follow-up PR.

filename = None
# Set the state to 'pending' in the uploads table
with DBConn() as conn:
    db.insert_encoding_metadata(conn, filename, dp_id, receipt_token, encoding_count=count, block_count=1)
Collaborator

What's with the filename = None business?

Collaborator Author

It is just a way of keeping the column in the uploads table set to NULL.
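For anyone unfamiliar with the convention, Python DB-API drivers generally map `None` to SQL NULL when it is passed as a query parameter. The sketch below demonstrates this with the standard library's `sqlite3` as a stand-in for the project's real database; the table layout and column names are assumptions made up for the example, not the actual uploads schema.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uploads (dp_id INTEGER, token TEXT, file TEXT, state TEXT)"
)

filename = None  # None is sent to the database as NULL
conn.execute(
    "INSERT INTO uploads (dp_id, token, file, state) VALUES (?, ?, ?, ?)",
    (1, "receipt-token", filename, "pending"),
)

# 'file IS NULL' evaluates to 1 when the column holds NULL.
file_is_null, state = conn.execute(
    "SELECT file IS NULL, state FROM uploads WHERE dp_id = 1"
).fetchone()
conn.close()
```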

# # Now work out if all parties have added their data
# if clks_uploaded_to_project(project_id):
#     logger.info("All parties data present. Scheduling any queued runs")
#     check_for_executable_runs.delay(project_id, serialize_span(parent_span))
Collaborator

Where is check_for_executable_runs called in the case of external data?

Collaborator Author

Thanks, that looks like an oversight on my part. Clearly, pre-queued runs with external data are not covered by a test :-/
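One way to avoid this kind of gap is to route both the direct-upload path and the external-data path through a single completion hook that checks whether all parties have uploaded. The Python sketch below only illustrates that idea: `all_parties_uploaded`, `schedule_queued_runs`, and `on_upload_complete` are hypothetical stand-ins with simplified signatures, not the service's `clks_uploaded_to_project` or `check_for_executable_runs`.

```python
from typing import Dict, List

# Records the project ids for which queued runs were scheduled (illustration only).
scheduled: List[str] = []


def all_parties_uploaded(uploads: Dict[str, bool]) -> bool:
    """Hypothetical stand-in: True once every data provider has uploaded."""
    return all(uploads.values())


def schedule_queued_runs(project_id: str) -> None:
    """Hypothetical stand-in for handing the project to a task queue."""
    scheduled.append(project_id)


def on_upload_complete(project_id: str, uploads: Dict[str, bool], dp_id: str) -> None:
    """Shared hook called after any upload, whether direct or pulled externally."""
    uploads[dp_id] = True
    if all_parties_uploaded(uploads):
        schedule_queued_runs(project_id)


uploads = {"dp1": False, "dp2": False}
on_upload_complete("project-a", uploads, "dp1")  # e.g. a direct upload
after_first = list(scheduled)
on_upload_complete("project-a", uploads, "dp2")  # e.g. data pulled from the object store
```

A test for pre-queued runs with external data could then exercise the same hook as the direct-upload tests.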

@hardbyte
Collaborator Author

The goal of this feature, pulling data from an object store, is to support population-sized linkages. Without blocking, this is not really achievable. After all the hard work you put in, we are so close to achieving this. Why did you decide not to handle blocking information as well?

I hope this wasn't a surprise given it was mentioned in the design doc. I specified that only encodings would be supported in the first implementation due to the time available. I wasn't sure if we'd have to create a new binary format for blocking info and that didn't seem too related to the core functionality of "pulling data". However, I have developed a branch that supports external blocking information (see #551).

@hardbyte hardbyte force-pushed the feature-data-pull-endpoint branch from 85e1543 to 54eb646 Compare April 28, 2020 05:06
@hardbyte hardbyte requested a review from wilko77 April 28, 2020 23:55
@hardbyte
Collaborator Author

Changes incorporated in PR #551

@hardbyte hardbyte closed this Apr 30, 2020
@hardbyte hardbyte deleted the feature-data-pull-endpoint branch April 30, 2020 01:32