@@ -38,18 +38,26 @@ The sections are:
3838- Specifications for implementation (including the REST API).
3939- decisions and alternatives considered. What would be the "best"/"easiest" approach.
4040
41+ ## User Stories
42+
43+ - As a user with a multi-gigabyte dataset on my machine and a slow, intermittent NBN internet connection, I want my upload
44+ to happen as quickly and smoothly as possible.
45+
46+ - As a user with an encoded dataset already available in an S3 bucket, I want Anonlink to fetch it directly so I don't
47+ have to download and then upload it.
4148
4249## Requirements
4350
44- - A method for requesting temporary & restricted object store credentials shall be added to the Anonlink Entity Service
45- API. The temporary credentials will authorize the holder to upload data, the credentials ** must not** provide access
46- to any other data providers encodings, or other working data stored in the object store.
47- - A method for a data provider to provide an object store URI pointing to the encoding data instead of uploading it
48- shall be added to the Anonlink Entity Service API. This method ** may** support pulling directly via http as well.
49- - A deployment option to expose the MinIO object store via an ingress will be added. As this increases the attack
50- surface an internal security review of MinIO must be conducted. Deployment using default or publicly committed
51- credentials must be mitigated.
52- - The maximum size of a clients' encodings will be 1TiB.
51+ - A user shall be able to point to data in an object store instead of directly uploading to Anonlink.
52+ - A user may be able to point to data via an HTTP URL instead of directly uploading to Anonlink.
53+ - The system shall provide a mechanism to grant temporary & restricted object store credentials.
54+ - The system shall accept uploads of up to 1TiB of data.
55+ - In the event of an upload being interrupted, a user shall be able to resume the upload at a later stage without
56+ having to re-upload data already sent.
57+ - The system shall provide an option to expose the MinIO object store.
58+ - The client tool shall not share a user's object store credentials with the service without explicit direction.
59+ - The client tool shall support uploading to the service's object store.
60+ - The client tool may support uploading to an external object store.
5361
5462
5563## High Level Design
@@ -169,7 +177,9 @@ EncodingArray:
169177
170178Returns a set of temporary security credentials that the client can use to upload data to the object store.
171179
172- ` /authorize-external-upload `
180+ `/projects/{project_id}/authorise-external-upload`
181+
182+ A valid upload token is required to authorise this call.
173183
174184The response will contain:
175185
@@ -181,31 +191,33 @@ The response will contain:
181191* `upload`
182192 * `endpoint` (e.g. `minio.anonlink.example.com`)
183193 * `bucket` (e.g. `anonlink-uploads`)
184- * `file` (e.g. `2020/05/Z7hSjluf6gEbxxyy.json`)
194+ * `path` (e.g. `2020/05/Z7hSjluf6gEbxxyy`)
185195
186196This endpoint may fail if the object store does not support creating temporary credentials.
187197This feature may be entirely disabled in the server configuration, see `settings.py` and `values.yaml`.
188198
189- The temporary credentials must be configured to have a security policy only allowing uploads to a particular
190- bucket for a period of time. The client will not be able to list objects or retrieve objects from this bucket.
199+ The temporary credentials must be configured to have a security policy only allowing uploads to a particular folder in
200+ a bucket for a period of time. The client will not be able to list objects or retrieve objects from this bucket.
191201
192202An example policy:
193203
204+ ``` json
205+ {
206+   "Version": "2012-10-17",
207+   "Statement": [
194208 {
195- "Version": "2012-10-17",
196- "Statement": [
197- {
198- "Action": [
199- "s3:PutObject"
200- ],
201- "Effect": "Allow",
202- "Resource": [
203- "arn:aws:s3:::anonlink-uploads/*"
204- ],
205- "Sid": "Upload-access-to-specific-bucket-only"
206- }
207- ]
208- }
209+       "Action": [
210+         "s3:PutObject"
211+       ],
212+       "Effect": "Allow",
213+       "Resource": [
214+         "arn:aws:s3:::anonlink-uploads/2020/05/Z7hSjluf6gEbxxyy/*"
215+       ],
216+       "Sid": "Upload-access-to-specific-bucket-only"
217+     }
218+   ]
219+ }
220+ ```
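
As a minimal sketch, the policy above could be generated per upload from the `bucket` and `path` values returned to the client. The function name `build_upload_policy` is hypothetical and not part of the Anonlink codebase; only the policy structure is taken from the example.

```python
import json


def build_upload_policy(bucket: str, path: str) -> dict:
    """Return an S3-style policy allowing only PutObject under bucket/path/.

    Hypothetical helper: the name and signature are illustrative assumptions.
    """
    prefix = path.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["s3:PutObject"],
                "Effect": "Allow",
                # Restrict writes to the per-upload folder only.
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
                "Sid": "Upload-access-to-specific-bucket-only",
            }
        ],
    }


if __name__ == "__main__":
    policy = build_upload_policy("anonlink-uploads", "2020/05/Z7hSjluf6gEbxxyy")
    print(json.dumps(policy, indent=2))
```

Scoping the `Resource` to the folder rather than the whole bucket is what stops one data provider from writing into another provider's upload location.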
209221
210222
210222A possible future extension is to take advantage of MinIO's [Security Token Service (STS)](https://docs.min.io/docs/minio-sts-quickstart-guide.html)
@@ -218,6 +230,35 @@ Relevant links:
218230* https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html#generating-a-presigned-url-to-upload-a-file
219231* https://docs.min.io/docs/upload-files-from-browser-using-pre-signed-urls.html
220232
233+ *Test/prototype procedure*
234+
235+ First create a new user, as the root credentials can't be used with STS:
236+
237+ ``` sh
238+ $ docker run -it --entrypoint /bin/sh minio/mc
239+ # mc config host add minio http://192.168.1.25:9000 AKIAIOSFODNN7EXAMPLE wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
240+ # mc admin user add minio newuser newuser123
241+ # mc admin policy set minio writeonly user=newuser
242+ ```
243+
244+ Add a _restricted_ section to your AWS credentials file (typically `~/.aws/credentials`):
245+
246+ ```
247+ [restricted]
248+ aws_access_key_id = newuser
249+ aws_secret_access_key = newuser123
250+ region = us-east-1
251+ ```
252+
253+ In another terminal, use the AWS CLI to fetch temporary credentials:
254+
255+ ```
256+ aws --profile restricted --endpoint-url http://localhost:9000 sts assume-role --policy '{"Version":"2012-10-17","Statement":[{"Sid":"Stmt1","Effect":"Allow","Action":"s3:*","Resource":"arn:aws:s3:::*"}]}' --role-arn arn:xxx:xxx:xxx:xxxx --role-session-name anything
257+ ```
258+
259+ Alternatively, use the `minio` or `boto3` client libraries.
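
A rough `boto3` equivalent of the CLI call might look like the sketch below. It assumes the MinIO endpoint accepts the standard STS `AssumeRole` parameters; the role ARN and session name are the same placeholders used in the CLI example, and both function names are hypothetical. The STS client is passed in so the helper can be exercised without a running object store.

```python
import json


def fetch_temporary_credentials(sts_client, policy: dict, duration_seconds: int = 3600) -> dict:
    """Call AssumeRole on the given STS client and return its credentials dict."""
    response = sts_client.assume_role(
        # Placeholder values copied from the CLI example, not real identifiers.
        RoleArn="arn:xxx:xxx:xxx:xxxx",
        RoleSessionName="anything",
        Policy=json.dumps(policy),
        DurationSeconds=duration_seconds,
    )
    # Contains AccessKeyId, SecretAccessKey, SessionToken and Expiration.
    return response["Credentials"]


def make_sts_client(endpoint_url: str, access_key: str, secret_key: str):
    """Create a boto3 STS client pointed at a custom endpoint (requires boto3)."""
    import boto3  # imported lazily so the helper above has no hard dependency

    return boto3.client(
        "sts",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name="us-east-1",
    )
```

With the restricted user created above, usage would be along the lines of `fetch_temporary_credentials(make_sts_client("http://localhost:9000", "newuser", "newuser123"), policy)`.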
260+
261+
221262### Client Side Specification
222263
223264The client side implementation will be in `anonlink-client`; there will be both a public, documented Python API as well as
@@ -226,6 +267,54 @@ an implementation via the command line tool.
226267Uploading will be implemented using either [`boto3`](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html) or
227268[MinIO](https://docs.min.io/docs/python-client-api-reference) at the implementor's discretion.
228269
270+ In the default case, uploading via the client tool will involve making three network requests in sequence:
271+
272+ - Retrieving temporary object store credentials.
273+ - Uploading encodings to the object store.
274+ - Informing the Anonlink Entity Service of the upload.
275+
276+
277+ If the user already has object store credentials (e.g. for `S3`) they can upload without having to request
278+ temporary credentials. The anonlink client tool aims to deal with this transparently.
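
The three-step sequence, including the shortcut for users who already hold credentials, could be orchestrated roughly as follows. This is only a sketch: `upload_encodings` and its callable parameters are hypothetical names, and each callable stands in for a real network request.

```python
from typing import Callable, Optional


def upload_encodings(
    data_path: str,
    request_credentials: Callable[[], dict],
    upload_to_object_store: Callable[[str, dict], str],
    notify_service: Callable[[str], None],
    existing_credentials: Optional[dict] = None,
) -> str:
    """Run the upload flow and return the location of the uploaded object."""
    # Step 1: only ask the service for temporary credentials when the
    # user has not supplied their own.
    if existing_credentials is not None:
        credentials = existing_credentials
    else:
        credentials = request_credentials()

    # Step 2: push the encodings to the object store.
    object_location = upload_to_object_store(data_path, credentials)

    # Step 3: tell the service where the data now lives.
    notify_service(object_location)
    return object_location
```

Passing the steps in as callables keeps the sequencing separate from the choice of `boto3` or MinIO as the upload library.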
279+
280+ #### Example client tool usage
281+
282+ The default case using temporary credentials for our object store:
283+
284+ ``` sh
285+ $ anonlink upload mydata.csv <AUTH params etc>
286+ ```
287+
288+ Where the user wants our client to upload to their own bucket, optionally providing the AWS credential profile to use:
289+
290+ ``` sh
291+ $ anonlink upload mydata.csv [--profile easd] --upload-to=s3://my-own-bucket/mydata.csv <AUTH params etc>
292+ ```
293+
294+ If the user wants to use already uploaded data, they will have to either explicitly provide the anonlink entity service
295+ with credentials that are appropriate to share with the service, or explicitly request temporary credentials.
296+
297+ ``` sh
298+ $ anonlink upload s3://my-own-bucket/mydata.csv [--profile easd] --request-read-only-credentials \
299+ <AUTH params etc>
300+ ```
301+
302+ or
303+ ``` sh
304+ $ anonlink upload s3://my-own-bucket/mydata.csv [--profile easd] --share-aws-credentials-with-server \
305+ <AUTH params etc>
306+ ```
307+
308+ It is very important that the client doesn't assume it can share a user's AWS credentials with the service.
309+
310+ This means a bare invocation such as the following must not result in the user's credentials being shared with the service:
311+ ```
312+ $ anonlink upload s3://my-own-bucket/mydata.csv
313+ ```
314+
315+ The user can give this explicit direction via the additional command line arguments shown above, a `~/.anonlink` config file,
316+ or environment variables.
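
One way the command line tool could enforce this is sketched below with `argparse`. The flag names come from the examples above; the parser layout, the function names, and the returned action strings are assumptions for illustration only, and the config file and environment variable routes are not covered.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical `anonlink upload` sub-command."""
    parser = argparse.ArgumentParser(prog="anonlink")
    subcommands = parser.add_subparsers(dest="command", required=True)
    upload = subcommands.add_parser("upload")
    upload.add_argument("source", help="local file or s3:// URI")
    upload.add_argument("--profile")
    upload.add_argument("--upload-to")
    # The two ways of giving explicit direction are mutually exclusive.
    consent = upload.add_mutually_exclusive_group()
    consent.add_argument("--request-read-only-credentials", action="store_true")
    consent.add_argument("--share-aws-credentials-with-server", action="store_true")
    return parser


def credential_action(args: argparse.Namespace) -> str:
    """Decide what to do with credentials; never share them implicitly."""
    if not args.source.startswith("s3://"):
        return "upload-local-file"
    if args.request_read_only_credentials:
        return "request-read-only-credentials"
    if args.share_aws_credentials_with_server:
        return "share-credentials"
    # No explicit direction was given for already-uploaded data: refuse.
    raise SystemExit(
        "Refusing to share AWS credentials without explicit direction; pass "
        "--request-read-only-credentials or --share-aws-credentials-with-server."
    )
```

Failing loudly in the unflagged case is one possible reading of the requirement; prompting the user interactively would be another.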
317+
229318#### Progress Handling
230319
231320The Python API to upload encodings to the object store will be written to support a user-supplied callback. See example progress
@@ -238,16 +327,23 @@ used to show progress during hashing.
238327#### Error Handling
239328
240329Errors during upload detected by the object store client will be caught and raised as an `AnonlinkClientError`.
241- ` ResponseError ` exceptions should be caught and presented to the command line user without a traceback. The object store credentials
242- may be used during a retry attempt. If the object store credentials have expired the client may request new credentials from
243- the Anonlink Entity Service.
330+ `ResponseError` exceptions should be caught and presented to the command line user without a traceback. The object store
331+ credentials may be reused during a retry attempt. If the object store credentials have expired, the client may automatically
332+ request new credentials from the Anonlink Entity Service.
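
The retry behaviour described above might be structured as in this sketch. `AnonlinkClientError` is redefined here as a stand-in for the real class, `CredentialsExpired` is a hypothetical signal for rejected credentials, and `OSError` is used as an assumed placeholder for whatever transient error the chosen object store client raises.

```python
class AnonlinkClientError(Exception):
    """Stand-in for the client error type named in the text."""


class CredentialsExpired(Exception):
    """Hypothetical signal that the object store rejected expired credentials."""


def upload_with_retry(upload, request_credentials, max_attempts=3):
    """Try an upload several times, refreshing credentials only when they expire."""
    credentials = request_credentials()
    last_error = None
    for _ in range(max_attempts):
        try:
            return upload(credentials)
        except CredentialsExpired as err:
            # Expired credentials: fetch a fresh set before the next attempt.
            last_error = err
            credentials = request_credentials()
        except OSError as err:
            # Transient failure: retry with the same credentials.
            last_error = err
    raise AnonlinkClientError(
        f"upload failed after {max_attempts} attempts"
    ) from last_error
```

The command line tool would then catch `AnonlinkClientError` at the top level and print its message rather than a traceback.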
244333
245334### Deployment Changes
246335
247- Extra policy, user, & bucket must be created in MinIO.
336+ Extra policy, user, & bucket must be created in MinIO. This can be carried out by a new container
337+ `init_object_store` that includes the `mc` command line tool for MinIO. In the Kubernetes deployment
338+ this will run as a job similar to our `init_db` job.
339+
340+ An option to expose MinIO will be added to the deployment.
341+ When the option is enabled, MinIO will be exposed via an ingress.
248342
249343#### Ingress configuration
250344
345+ **Domain** By default MinIO will be available at the `minio` sub-domain of the service's domain(s).
346+
251347**Proxy** The ingress controller may need to be modified to support proxying large uploads through to MinIO.
252348
253349**TLS** As MinIO is going to be exposed to the internet it must be protected with a certificate.
@@ -262,6 +358,16 @@ making our server download illegal data, making our server download terabytes an
262358
263359If using our own object store we can dictate the bucket and file, but we want to support an external object store too.
264360
361+ Because exposing MinIO increases the attack surface, an internal security review of MinIO must be conducted.
362+ The risk of deploying with default or publicly committed credentials must be mitigated.
363+
364+ ## Alternatives (WIP)
365+
366+ The primary alternative that we considered was modifying the Anonlink REST API to handle partial uploads.
367+ The Anonlink API and client would be modified to deal with multipart uploads. This design wasn't
368+ pursued because we would have to design and implement everything on both the server and client side: the retry
369+ mechanisms, integrity checks, handling of dropped connections, proxy handling, etc.
370+
265371## Additional Considerations (WIP)
266372
267373Initially we will support pulling from an S3-like object store. How about direct HTTP/both? Is FTP still common?