Skip to content

make DefaultLogicalExtensionCodec support serialisation of build in file formats #16944

@milenkovicm

Description

@milenkovicm

Is your feature request related to a problem or challenge?

At the moment ballista overrides LogicalExtensionCodec::try_decode_file_format & LogicalExtensionCodec::try_encode_file_format providing support for:

file_format_codecs: vec![
            Arc::new(ParquetLogicalExtensionCodec {}),
            Arc::new(CsvLogicalExtensionCodec {}),
            Arc::new(JsonLogicalExtensionCodec {}),
            Arc::new(ArrowLogicalExtensionCodec {}),
            Arc::new(AvroLogicalExtensionCodec {}),
        ],

as seen at 1.

Should we want to integrate ballista with datafusion python we would need to provide a custom LogicalExtensionCodec implementing same logic or reuse ballista LogicalExtensionCodec implementation, which would complicate integration a bit.

As this file types are supported out of the box in datafusion would it make sense to implement encoder/decoder or them in DefaultLogicalExtensionCodec?

Describe the solution you'd like

Should this proposal make sense, implement support for it in DefaultLogicalExtensionCodec similar to what is supported in ballista already:

fn try_decode_file_format(
      &self,
      buf: &[u8],
      ctx: &datafusion::prelude::SessionContext,
  ) -> Result<Arc<dyn datafusion::datasource::file_format::FileFormatFactory>> {
      let proto = FileFormatProto::decode(buf)
          .map_err(|e| DataFusionError::Internal(e.to_string()))?;

      let codec = self
          .file_format_codecs
          .get(proto.encoder_position as usize)
          .ok_or(DataFusionError::Internal(
              "Can't find required codec in file codec list".to_owned(),
          ))?;

      codec.try_decode_file_format(&proto.blob, ctx)
  }

  fn try_encode_file_format(
      &self,
      buf: &mut Vec<u8>,
      node: Arc<dyn datafusion::datasource::file_format::FileFormatFactory>,
  ) -> Result<()> {
      let mut blob = vec![];
      let (encoder_position, _) =
          self.try_any(|codec| codec.try_encode_file_format(&mut blob, node.clone()))?;

      let proto = FileFormatProto {
          encoder_position,
          blob,
      };
      proto
          .encode(buf)
          .map_err(|e| DataFusionError::Internal(e.to_string()))
  }

as seen at 2

Describe alternatives you've considered

alternative would be to keep everything as it is.

Additional context

No response

Metadata

Metadata

Assignees

Labels

enhancementNew feature or requestgood first issueGood for newcomershelp wantedExtra attention is neededprotoRelated to proto crate

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions