Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
init
  • Loading branch information
ulysses-you committed Aug 11, 2020
commit 556b70911809d5e5b35c6bde86063fbcf3704fc9
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ import org.apache.spark.sql.catalyst.expressions.objects._
import org.apache.spark.sql.catalyst.util.{ArrayData, MapData}
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}
import org.apache.spark.util.Utils


/**
Expand Down Expand Up @@ -377,6 +378,29 @@ object ScalaReflection extends ScalaReflection {
expressions.Literal.create(null, ObjectType(cls)),
newInstance
)

case t if isSubtype(t, localTypeOf[Enumeration#Value]) =>
// package example
// object Foo extends Enumeration {
// type Foo = Value
// val E1, E2 = Value
// }
// the fullName of tpe is example.Foo.Foo, but we need example.Foo so that
// we can call example.Foo.withName to deserialize string to enumeration.
val className = t.asInstanceOf[TypeRef].pre.typeSymbol.asClass.fullName
// this check is for spark-shell which give a default package name like '$line1.$read$$iw'
if (className.startsWith("$")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually understand this limitation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually a jline issue that default package is something like $line32.$read.$iw.$iw.$iw.$iw.$iw. We can't find Enum class using reflect. I choose to forbid it, but please tell me if there exists a better way to go.

throw new UnsupportedOperationException(
s"Enumeration class required package name, but found $className")
}

val clazz = Utils.classForName(className)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative:

val parent = t.asInstanceOf[TypeRef].pre.typeSymbol.asClass
val cls = scala.reflect.runtime.universe
  .runtimeMirror(getClass.getClassLoader)
  .runtimeClass(parent)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this pure scala reflection fix the issues with the REPL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not fixed, exists the same issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, what is the error? Regardless, I'm not sure this limitation should hold up merging the basic support for enums.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check this again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After deep check some code, I found the magic.

  1. we can get FooEnum class with default package using mirror. runtimeClass.
  2. we failed in StaticInvoke which will re-reflect the class if class name end with $.

The StaticInvoke related code and pr #20753:

  val objectName = staticObject.getName.stripSuffix("$")
  val cls = if (staticObject.getName == objectName) {
    staticObject
  } else {
    Utils.classForName(objectName)
  }

I cann't find more comment about this code, seems we can use staticObject directly instead of re-reflect it, is it ?
cc @kiszk

StaticInvoke(
clazz,
ObjectType(getClassFromType(t)),
"withName",
createDeserializerForString(path, false) :: Nil,
returnNullable = false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also be true?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if I'm correct, it would be good to include a test case for this case. What happens if you actually do operations on the null value stored as an enum? Is the nullability of the resulting schema correct. Do operations like .isNotNull work correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need returnNullable = false since Enumeration.withName will never return null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you actually do operations on the null value stored as an enum

We will get an exception, and the behavior is same as Java enum what we supported.

        case other if other.isEnum =>
          createSerializerForString(
            Invoke(inputObject, "name", ObjectType(classOf[String]), returnNullable = false))

}
}

Expand Down Expand Up @@ -562,6 +586,14 @@ object ScalaReflection extends ScalaReflection {
}
createSerializerForObject(inputObject, fields)

case t if isSubtype(t, localTypeOf[Enumeration#Value]) =>
createSerializerForString(
Invoke(
inputObject,
"toString",
ObjectType(classOf[java.lang.String]),
returnNullable = false))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as withName, toString also return a non-null value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I agree.


case _ =>
throw new UnsupportedOperationException(
s"No Encoder found for $tpe\n" + walkedTypePath)
Expand Down Expand Up @@ -739,6 +771,8 @@ object ScalaReflection extends ScalaReflection {
val Schema(dataType, nullable) = schemaFor(fieldType)
StructField(fieldName, dataType, nullable)
}), nullable = true)
case t if isSubtype(t, localTypeOf[Enumeration#Value]) =>
Schema(StringType, nullable = false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is nullable = false, I believe this should be reserved for primitive types that cannot be null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it should be true.

case other =>
throw new UnsupportedOperationException(s"Schema for type $other is not supported")
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ import java.sql.{Date, Timestamp}
import scala.reflect.runtime.universe.TypeTag

import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.FooEnum.FooEnum
import org.apache.spark.sql.catalyst.analysis.UnresolvedExtractValue
import org.apache.spark.sql.catalyst.expressions.{CreateNamedStruct, Expression, If, SpecificInternalRow, UpCast}
import org.apache.spark.sql.catalyst.expressions.objects.{AssertNotNull, NewInstance}
Expand Down Expand Up @@ -90,6 +91,13 @@ case class FooWithAnnotation(f1: String @FooAnnotation, f2: Option[String] @FooA

case class SpecialCharAsFieldData(`field.1`: String, `field 2`: String)

object FooEnum extends Enumeration {
type FooEnum = Value
val E1, E2 = Value
}

case class FooClassWithEnum(i: Int, e: FooEnum)

object TestingUDT {
@SQLUserDefinedType(udt = classOf[NestedStructUDT])
class NestedStruct(val a: Integer, val b: Long, val c: Double)
Expand Down Expand Up @@ -437,4 +445,12 @@ class ScalaReflectionSuite extends SparkFunSuite {
StructField("f2", StringType))))
assert(deserializerFor[FooWithAnnotation].dataType == ObjectType(classOf[FooWithAnnotation]))
}

test("SPARK-32585: Support scala enumeration in ScalaReflection") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider putting a test case in ExpressionEncoderSuite as well as I think that checks various combinations of evaluation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, not realized this test.

assert(schemaFor[FooClassWithEnum].dataType == StringType)
assert(serializerFor[FooClassWithEnum].dataType == StructType(Seq(
StructField("i", IntegerType, false),
StructField("e", StringType, false))))
assert(deserializerFor[FooClassWithEnum].dataType == ObjectType(classOf[FooClassWithEnum]))
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ import org.scalatest.exceptions.TestFailedException
import org.scalatest.prop.TableDrivenPropertyChecks._

import org.apache.spark.{SparkException, TaskContext}
import org.apache.spark.sql.catalyst.ScroogeLikeExample
import org.apache.spark.sql.catalyst.{FooClassWithEnum, FooEnum, ScroogeLikeExample}
import org.apache.spark.sql.catalyst.encoders.{OuterScopes, RowEncoder}
import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
import org.apache.spark.sql.catalyst.util.sideBySide
Expand Down Expand Up @@ -1926,6 +1926,13 @@ class DatasetSuite extends QueryTest
}
}
}

test("SPARK-32585: Support scala enumeration in ScalaReflection") {
checkDataset(
Seq(FooClassWithEnum(1, FooEnum.E1), FooClassWithEnum(2, FooEnum.E2)).toDS(),
Seq(FooClassWithEnum(1, FooEnum.E1), FooClassWithEnum(2, FooEnum.E2)): _*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the above comment, I would add a test case where the enum value is null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the null test.

)
}
}

object AssertExecutionId {
Expand Down