[SPARK-25260][SQL] Fix namespace handling in SchemaConverters.toAvroType #22251

arunmahadevan · 2018-08-28T08:04:10Z

What changes were proposed in this pull request?

toAvroType converts a Spark data type to an Avro schema. It always appends the record name to the namespace, so it's impossible to have an Avro namespace that is independent of the record name.

When invoked with a Spark data type such as

import org.apache.spark.sql.avro.SchemaConverters
import org.apache.spark.sql.types._

val sparkSchema = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("address", StructType(Seq(
    StructField("city", StringType, nullable = false),
    StructField("state", StringType, nullable = false))),
    nullable = false)))

// Map it to an Avro schema with record name "employee" and top-level namespace "foo.bar".
val avroSchema = SchemaConverters.toAvroType(sparkSchema, false, "employee", "foo.bar")

// result is
// avroSchema.getName = employee
// avroSchema.getNamespace = foo.bar.employee
// avroSchema.getFullName = foo.bar.employee.employee

The patch proposes to fix this so that the result is

avroSchema.getName = employee
avroSchema.getNamespace = foo.bar
avroSchema.getFullName = foo.bar.employee
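
The two results differ only in which namespace ends up on the record, because an Avro full name is the namespace and the record name joined by a dot. A minimal, dependency-free sketch of that composition follows. `AvroName` and `FullNameDemo` are hypothetical names used purely for illustration; they are not part of Avro or Spark.

```scala
// Hypothetical stand-in for an Avro named type; NOT the real org.apache.avro.Schema.
final case class AvroName(name: String, namespace: String) {
  // The full name is "<namespace>.<name>", or just the name when the namespace is empty.
  def fullName: String = if (namespace.isEmpty) name else s"$namespace.$name"
}

object FullNameDemo {
  // Behaviour reported in this PR: the record name has been appended to the namespace.
  val before: AvroName = AvroName("employee", "foo.bar.employee")

  // Behaviour proposed by the patch: the namespace is used exactly as supplied.
  val after: AvroName = AvroName("employee", "foo.bar")

  def main(args: Array[String]): Unit = {
    println(before.fullName) // foo.bar.employee.employee
    println(after.fullName)  // foo.bar.employee
  }
}
```

The sketch only models the naming rule; it does not call the converter or build a real schema.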

How was this patch tested?

New and existing unit tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

arunmahadevan · 2018-08-28T08:05:26Z

cc @gengliangwang @dongjoon-hyun

SparkQA · 2018-08-28T08:25:37Z

Test build #95338 has finished for PR 22251 at commit f474839.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gengliangwang · 2018-08-28T08:52:25Z

external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala

-        }
-
+        val childNameSpace = if (nameSpace != "") s"$nameSpace.$recordName" else recordName
        val fieldsAssembler = builder.record(recordName).namespace(nameSpace).fields()


+1, this line is the only difference for the whole code change. The namespace here should not be the one with recordName at the end.
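
To illustrate why the record takes `nameSpace` while nested fields take `childNameSpace`, here is a simplified, dependency-free model of the recursion. It is a sketch, not the Spark implementation: `Leaf`, `Field`, `Struct`, and `recordNames` are hypothetical, the real converter builds Avro schemas through a builder rather than returning pairs, and the sketch assumes each nested record is named after its field.

```scala
// Simplified stand-ins for Spark's data types; hypothetical, for illustration only.
sealed trait DataType
case object Leaf extends DataType
final case class Field(name: String, dataType: DataType)
final case class Struct(fields: Seq[Field]) extends DataType

object NamespaceSketch {
  // Returns (recordName, nameSpace) for every record the conversion would create.
  def recordNames(dt: DataType, recordName: String, nameSpace: String): Seq[(String, String)] =
    dt match {
      case Struct(fields) =>
        // Same expression as in the diff: nested records live under "<nameSpace>.<recordName>".
        val childNameSpace = if (nameSpace != "") s"$nameSpace.$recordName" else recordName
        // The record itself keeps the namespace it was given.
        (recordName, nameSpace) +: fields.flatMap(f => recordNames(f.dataType, f.name, childNameSpace))
      case _ =>
        Seq.empty // non-record types carry no name or namespace in this model
    }

  val schema: Struct = Struct(Seq(
    Field("name", Leaf),
    Field("address", Struct(Seq(Field("city", Leaf), Field("state", Leaf))))))

  // Seq(("employee", "foo.bar"), ("address", "foo.bar.employee"))
  val withNamespace: Seq[(String, String)] = recordNames(schema, "employee", "foo.bar")

  // Seq(("employee", ""), ("address", "employee"))
  val withoutNamespace: Seq[(String, String)] = recordNames(schema, "employee", "")
}
```

In this model the top-level record keeps the namespace the caller supplied, and only nested records pick up the parent record name as an extra namespace component.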

HyukjinKwon · 2018-08-28T08:55:17Z

external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala

    }
  }

+  test("check namespace - toAvroType") {


@arunmahadevan, can we add a simple end-to-end test as well?

arunmahadevan

It's sort of covered by the existing test cases below. Do you think we need more?

Validate namespace in avro file that has nested records with the same name
conversion to avro and back with namespace

gengliangwang · 2018-08-28T08:56:52Z

external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala

+        nullable = false)))
+    val employeeType = SchemaConverters.toAvroType(sparkSchema,
+      recordName = "employee",
+      nameSpace = "foo.bar")


nit: could you also add a case where nameSpace is ""?

arunmahadevan

Added a test case for toAvroType with an empty namespace.

gengliangwang

LGTM, thanks for the fix!

SparkQA · 2018-08-28T17:59:31Z

Test build #95367 has finished for PR 22251 at commit 2153428.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-08-29T01:24:49Z

Merged to master.

`toAvroType` converts a Spark data type to an Avro schema. It always appends the record name to the namespace, so it's impossible to have an Avro namespace that is independent of the record name.

When invoked with a Spark data type such as

```scala
val sparkSchema = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("address", StructType(Seq(
    StructField("city", StringType, nullable = false),
    StructField("state", StringType, nullable = false))),
    nullable = false)))

// Map it to an Avro schema with record name "employee" and top-level namespace "foo.bar".
val avroSchema = SchemaConverters.toAvroType(sparkSchema, false, "employee", "foo.bar")

// result is
// avroSchema.getName = employee
// avroSchema.getNamespace = foo.bar.employee
// avroSchema.getFullName = foo.bar.employee.employee
```

The patch proposes to fix this so that the result is

```
avroSchema.getName = employee
avroSchema.getNamespace = foo.bar
avroSchema.getFullName = foo.bar.employee
```

New and existing unit tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes apache#22251 from arunmahadevan/avro-fix.

Authored-by: Arun Mahadevan <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
(cherry picked from commit 68ec207)

RB=2106636
BUG=LIHADOOP-53221
G=spark-reviewers
R=mshen,ekrogen
A=ekrogen

[SPARK-25260][SQL] Fix namespace handling in SchemaConverters.toAvroType

f474839

gengliangwang reviewed Aug 28, 2018

HyukjinKwon reviewed Aug 28, 2018

gengliangwang reviewed Aug 28, 2018

gengliangwang approved these changes Aug 28, 2018

Added test case for toAvroType nested with empty namespace

2153428

arunmahadevan force-pushed the avro-fix branch from 5330c45 to 2153428 on August 28, 2018 17:35

asfgit closed this in 68ec207 Aug 29, 2018
