Directory model has very bad performance on large directories. #10

@eap

The _dir_model method contains a loop that calls self.get on every contained blob when content is set (see below). Because the work grows with the number of entries, this causes a massive slowdown when navigating to GCS directories with more than a dozen files.

I suspect it should be straightforward to refactor this to use the google.cloud.storage.Blob objects returned by bucket.list_blobs directly. Similarly, for directories with many sub-directories, you should be able to use the list of prefixes directly rather than calling self.get many times.

Offending code:

def _dir_model(self, path, members, content=True):
    ...
    for blob in blobs:
        ...
        contents.append(self.get(
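
A rough sketch of the refactor suggested above, not tested against the real contents manager. The function name dir_model_from_listing, the exact model keys, and the assumption that paths are plain object names are all hypothetical. The only thing it relies on from google.cloud.storage.Blob is the name, size and updated attributes, so the demo uses stand-in objects instead of a live bucket.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def dir_model_from_listing(path, blobs, prefixes):
    """Build a directory model from a single listing, with no per-entry lookups.

    `blobs` is any iterable of objects exposing `name`, `size` and `updated`
    (as google.cloud.storage.Blob does). `prefixes` is an iterable of
    sub-directory prefixes such as "data/raw/". Both names are hypothetical.
    """
    contents = []
    # Sub-directories come straight from the prefix list.
    for prefix in sorted(prefixes):
        sub_path = prefix.rstrip("/")
        contents.append({
            "type": "directory",
            "name": sub_path.rsplit("/", 1)[-1],
            "path": sub_path,
            "content": None,
        })
    # Files come straight from the blob metadata already returned by the listing.
    for blob in blobs:
        if blob.name.endswith("/"):
            continue  # skip placeholder objects that only mark a directory
        contents.append({
            "type": "notebook" if blob.name.endswith(".ipynb") else "file",
            "name": blob.name.rsplit("/", 1)[-1],
            "path": blob.name,
            "size": blob.size,
            "last_modified": blob.updated,
            "content": None,
        })
    return {
        "type": "directory",
        "name": path.rsplit("/", 1)[-1],
        "path": path,
        "format": "json",
        "content": contents,
    }


# Stand-ins for Blob objects so the sketch runs without GCS access.
stamp = datetime(2020, 1, 1, tzinfo=timezone.utc)
demo_blobs = [
    SimpleNamespace(name="data/a.ipynb", size=10, updated=stamp),
    SimpleNamespace(name="data/b.txt", size=20, updated=stamp),
]
model = dir_model_from_listing("data", demo_blobs, {"data/raw/"})
```

With a real bucket, the inputs would presumably come from one call such as `it = bucket.list_blobs(prefix="data/", delimiter="/")`, then `blobs = list(it)` and `prefixes = it.prefixes`. Note that `it.prefixes` is only populated once the iterator has been consumed.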

I've filed this bug for tracking purposes; I don't have the bandwidth to resolve it at present.
