[Blob][QuickQuery]Add Quick Query Support #10946

xiafu-msft · 2020-04-21T02:18:42Z

No description provided.

sdk/storage/azure-storage-blob/setup.py

annatisch · 2020-04-23T15:47:42Z

sdk/storage/azure-storage-blob/azure/storage/blob/_quick_query_helper.py

+                                    parsed_result['description'],
+                                    parsed_result['position']) if parsed_result.get('fatal') else None
+            if progress_callback:
+                progress_callback(fatal, self.bytes_processed, self.total_bytes)


We should remove this progress callback - instead we can make this object iterable, so that if a user needs this information that can iterate over the data in the result.

fatal, bytes_processed and total bytes are service defined schema, they are mixed with data in query response body. That means when we use avro to parse the query response body, avro could give us some data(not guaranteed a full record), fatal, bytes_processed or total bytes. If we parse the record for users, we bytes_processed will not make sense, because we can only give user the full record even more bytes for the next record has been returned by arvo parser.

Besides the record could be JSON, or CSV type, the record separator and column separator etc. are based on what user specified which make it more difficult to extract the result from

Also currently query doesn't allow us to do range download.. Once we trigger query, it will keep returning data until there's no more result. So we cannot use ItemPaged....

We had a discussion across all platforms and think make it has a similar interface as download, we will have further discussion if you feel strongly about making it iterable.

Thoughts @annatisch ?

sdk/storage/azure-storage-blob/azure/storage/blob/_models.py

annatisch · 2020-04-23T16:01:20Z

sdk/storage/azure-storage-blob/azure/storage/blob/_blob_client.py

+        return options
+
+    @distributed_trace
+    def quick_query(self, query_expression,  # type: str


I know the feature is called quick query.... but I kinda think maybe the API should be called something different as this name doesn't really describe what it does.... though that would be a cross-language discussion.

renamed to query, all platform are using this name now.

annatisch · 2020-05-29T23:20:05Z

sdk/storage/azure-storage-blob/azure/storage/blob/_blob_client.py

+        return options
+
+    @distributed_trace
+    def query(self, query_expression,  # type: str


I think this should be
def query_data(self, query, **kwargs)
or
def query_blob_data(self, query, **kwargs)

Thoughts?

the returned items are not blobs, it's the partial content of a single blob(eg. the first column of a csv file)

all the platforms are calling it query, so do we want all platforms to rename it?

annatisch · 2020-05-29T23:25:00Z

sdk/storage/azure-storage-blob/azure/storage/blob/__init__.py

    CustomerProvidedEncryptionKey,
-    ContainerEncryptionScope
+    ContainerEncryptionScope,
+    QuickQueryError,


Can we rename both QuickQueryError and QuickQueryReader?

what would you suggest?

sdk/storage/azure-storage-blob/azure/storage/blob/_models.py

* [Storage]STG73 * [Blob][Swagger]Update Swagger (#10943) * [Blob][Swagger]Regenerate Swagger Code * fix container test failure caused by list_containers include type change * [Storage] Internal avro parser. (#10764) * initial avro parser * try fixing test... * falling in love with python compatibility... * make linter happy. * raise StopIteration when there is no more bytes instead of tracking file length. * async avro parser * fix syntax for Python 3.5 * get rid of 'readers_schema' as we only honor schema that has been written to file ('writer_schema'). * pr feedback * trim unused code. * pr feedback. * simplify skip sync in next. * move avro tests from _shared. * Jumbo blob support (#11176) * wip * initial test coverage. * wip. * wip * single upload. * add async tests. * disable 50k block tests. * datalake append. * async datalake * disable tests that send large payload over network. * pr feedback. * Undelete share (#11394) * Undelete container (#11339) * [Storage][Blob] Added support for Object Replication (#11525) * Blob versioning (#11154) * [Blob][QuickQuery]Add Quick Query Support (#10946) * [Blob][STG73]Blob Tags (#11418) * regenerate code (#11964) * fix the bug which caused only showing fatal error (#11997) * [Storage][STG73]Address API Review Comments (#12111) * [Storage][STG73]Address API Review Comments * [Storage][STG73]dict<policy, rules> -> list(ObjectReplicationPolicy) * fix blob tag_value test * expose ObjectReplicationPolicy and ObjectReplicationRule, fix test * fix test * Changefeed (#10755) * [ChangeFeed]Add ChangeFeed Package * test_avro failure * update dev_requirement.txt * change namespace to azure.storage.blob.changefeed * address comments * optimize memory when reading changefeed events * namespace change * set up package change * fix failed tests * readme and kwargs * Update sdk/storage/azure-storage-blob-changefeed/azure/storage/blob/changefeed/_change_feed_client.py Co-authored-by: Rakshith Bhyravabhotla <[email protected]> * address comments * 'azure-storage-blob>=12.3.0' which does not match the frozen requirement 'azure-storage-blob~=1.3' Co-authored-by: Rakshith Bhyravabhotla <[email protected]> * [Storage-Blob] Quick Query API (#11991) * Renamed query error * Renamed query reader * Updated config models * Updated format request params * Updated iterator * fix the bug which caused only showing fatal error * Updated Error message * Fixed query helper * Started test conversion * small fix * Fixed tests * Updated error handling + json model * Updated recordings * Removed old recording * Added iter tests * Iter test recordings * Fix test * Remove extra recording * Fix pylint * Some docs cleanup * Renamed iter_records -> iter_stream * Review feedback * Updated tests * Missing commas * Fix syntax * Fix pylint Co-authored-by: xiafu <[email protected]> * tag sas (#12258) * tag sas * disable undelete_container * pylint * skip undelete_container tests * [Blob][Versioning]Disable Versioning Live Test (#12281) * [Blob][QQ]Default output_format to input_format (#12283) * [Storage][Jumbo]Remove super (#12314) * [Storage][JumboBlob]remove empty super() * pypy3 * change sas version to latest * set tags account location to central canada * re-recording queue * changefeed paths generator * mark tests for vid as playback only * fix changefeed * fix pylint make the test account location for tags to central canada * add a delay before calling find_blobs_by_tags * remove tags header * mark a large file test playback only * revert "mark a large file test playback only" skip upload large file test address comment * move tag permission and filter_by_tags permission to kwargs Co-authored-by: Kamil Sobol <[email protected]> Co-authored-by: Ze Qian Zhang <[email protected]> Co-authored-by: Rakshith Bhyravabhotla <[email protected]> Co-authored-by: annatisch <[email protected]>

xiafu-msft changed the base branch from master to feature/storage-stg73 April 21, 2020 02:18

xiafu-msft changed the title Qq [Blob][QuickQuery]Add Quick Query Support Apr 21, 2020

xiafu-msft requested review from annatisch and lmazuel April 21, 2020 02:23

annatisch reviewed Apr 23, 2020

View reviewed changes

sdk/storage/azure-storage-blob/setup.py Outdated Show resolved Hide resolved

annatisch reviewed Apr 23, 2020

View reviewed changes

sdk/storage/azure-storage-blob/azure/storage/blob/_models.py Outdated Show resolved Hide resolved

annatisch reviewed Apr 23, 2020

View reviewed changes

xiafu-msft force-pushed the feature/storage-stg73 branch from c8b24fe to 8f45e1a Compare May 2, 2020 00:47

xiafu-msft force-pushed the feature/storage-stg73 branch from 8f45e1a to d08d639 Compare May 12, 2020 21:16

annatisch added APIChange This PR contains an addition or change to the API signature and must be reviewed by an architect. Storage Storage Service (Queues, Blobs, Files) labels May 15, 2020

xiafu-msft added 2 commits May 27, 2020 13:07

[Blob][QuickQuery]Add Quick Query Support

a81248b

address comments

6f6613b

xiafu-msft force-pushed the QQ branch from a39aacb to 6f6613b Compare May 27, 2020 22:15

annatisch reviewed May 29, 2020

View reviewed changes

sdk/storage/azure-storage-blob/azure/storage/blob/_models.py Outdated Show resolved Hide resolved

annatisch reviewed May 29, 2020

View reviewed changes

sdk/storage/azure-storage-blob/azure/storage/blob/_models.py Show resolved Hide resolved

optimize memory, address comments

b88b816

xiafu-msft marked this pull request as ready for review June 10, 2020 07:15

xiafu-msft requested review from amishra-dev, rakshith91 and zezha-msft as code owners June 10, 2020 07:15

xiafu-msft requested a review from annatisch June 10, 2020 07:16

pylint fix

380800b

xiafu-msft merged commit 48d1e50 into Azure:feature/storage-stg73 Jun 10, 2020

xiafu-msft added a commit that referenced this pull request Jun 10, 2020

[Blob][QuickQuery]Add Quick Query Support (#10946)

86a3712

xiafu-msft mentioned this pull request Jun 23, 2020

[Storage-Blob] Quick Query API #11991

Merged

[Blob][QuickQuery]Add Quick Query Support #10946

[Blob][QuickQuery]Add Quick Query Support #10946

Uh oh!

Conversation

xiafu-msft commented Apr 21, 2020

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

annatisch May 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

annatisch May 29, 2020 •

edited

Loading