Add content_encoding and cache_control to gcp_cloud_storage sink #24505

@benjamin-awd

Description

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

GCS automatically transcodes (decompresses) gzipped objects when serving them, based on the object's Content-Encoding. This can be problematic for certain downstream consumers: it disables range requests, and the Content-Length header does not match the file size the client expects.

The Cache-Control header is particularly useful here: when it is set to no-transform, it instructs GCS to bypass its automatic transcoding behavior and serve the file exactly as stored (compressed).

This restores support for HTTP Range Requests and accurate Content-Length headers, allowing ClickHouse (and other parallel processing engines) to download and decompress the files correctly on their end.
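To make the behavior described above concrete, here is a minimal sketch of the serving decision as I understand it from the GCS transcoding documentation. It is a simplified model, not GCS code: the function name `will_transcode` and the dict-based inputs are hypothetical and exist only for illustration.

```python
def will_transcode(object_metadata, request_headers):
    """Simplified model (an assumption, not GCS code) of when GCS
    decompresses a stored object before serving it."""
    # The object must be stored with Content-Encoding: gzip.
    stored_gzip = (
        object_metadata.get("Content-Encoding", "").strip().lower() == "gzip"
    )

    # Cache-Control: no-transform asks GCS to serve the bytes as stored.
    directives = {
        d.strip().lower()
        for d in object_metadata.get("Cache-Control", "").split(",")
    }
    no_transform = "no-transform" in directives

    # A client that sends Accept-Encoding: gzip also gets the stored bytes.
    accepted = {
        e.split(";")[0].strip().lower()
        for e in request_headers.get("Accept-Encoding", "").split(",")
    }
    client_accepts_gzip = "gzip" in accepted

    return stored_gzip and not no_transform and not client_accepts_gzip


if __name__ == "__main__":
    gz = {"Content-Encoding": "gzip"}
    gz_nt = {"Content-Encoding": "gzip", "Cache-Control": "no-transform"}
    print(will_transcode(gz, {}))     # object is decompressed on the fly
    print(will_transcode(gz_nt, {}))  # object is served compressed, as stored
```

In this model, setting no-transform on the object is the only option that does not depend on the client's request headers, which is why it matters for consumers that cannot be configured to send Accept-Encoding: gzip.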

Attempted Solutions

The current workaround is to store raw JSON without compression, but the cost of doing so adds up very quickly.

Proposal

Add support for content_encoding and cache_control options to the gcp_cloud_storage sink.
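As a rough illustration of what the two options might translate to on upload, the sketch below maps them onto HTTP headers. This is Python for brevity and is not how the sink is implemented; the helper name `build_upload_headers` is hypothetical, and only the option names content_encoding and cache_control come from this proposal.

```python
def build_upload_headers(content_type, content_encoding=None, cache_control=None):
    """Hypothetical helper: turn the proposed sink options into the
    headers sent with an object upload. Unset options add no header."""
    headers = {"Content-Type": content_type}
    if content_encoding is not None:
        # Proposed `content_encoding` option, e.g. "gzip".
        headers["Content-Encoding"] = content_encoding
    if cache_control is not None:
        # Proposed `cache_control` option, e.g. "no-transform".
        headers["Cache-Control"] = cache_control
    return headers


if __name__ == "__main__":
    print(build_upload_headers(
        "application/x-ndjson",
        content_encoding="gzip",
        cache_control="no-transform",
    ))
```

Keeping both options optional would leave existing configurations unchanged when neither is set.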

References

No response

Version

No response

Metadata

Assignees

No one assigned

    Labels

    type: feature (a value-adding code addition that introduces new functionality)
