@yanboliang
Contributor

callUDF has been deprecated and will be removed in Spark 2.0, so we should replace its use in ML with udf.
As an initial attempt, I tried wrapping createTransformFunc directly in udf (as shown in the following code snippet), but that hits a TypeTag ... NotSerializableException bug that exists in Scala 2.10.

abstract class UnaryTransformer[IN: TypeTag, OUT: TypeTag, T <: UnaryTransformer[IN, OUT, T]]
  extends Transformer with HasInputCol with HasOutputCol with Logging {
  ......
  override def transform(dataset: DataFrame): DataFrame = {
    transformSchema(dataset.schema, logging = true)
    val transformFunc = udf { input: IN => this.createTransformFunc(input) }
    dataset.withColumn($(outputCol), transformFunc(col($(inputCol))))
  }
  ......
}
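
For context, the difference between the two call styles can be sketched with a concrete function type. This is only an illustration, assuming a Spark 1.x build in which the deprecated callUDF(f, returnType, column) overload is still available; UdfMigrationSketch, plusOne, withCallUDF, and withUdf are hypothetical names and are not part of this PR.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{callUDF, col, udf}
import org.apache.spark.sql.types.DoubleType

object UdfMigrationSketch {
  // Hypothetical transform function standing in for createTransformFunc.
  val plusOne: Double => Double = _ + 1.0

  // Deprecated style (Spark 1.x only): the return type is passed
  // explicitly as a DataType, so no TypeTag is required.
  def withCallUDF(df: DataFrame, in: String, out: String): DataFrame =
    df.withColumn(out, callUDF(plusOne, DoubleType, col(in)))

  // Replacement style: udf derives the return type from implicit
  // TypeTags for the argument and result types of the function.
  def withUdf(df: DataFrame, in: String, out: String): DataFrame = {
    val f = udf(plusOne)
    df.withColumn(out, f(col(in)))
  }
}
```

With concrete types such as Double, the TypeTags that udf needs are resolved at compile time. The snippet above instead has to supply TypeTags for the abstract IN and OUT parameters, which is where the reported Scala 2.10 serialization bug shows up.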

@SparkQA

SparkQA commented Jan 1, 2016

Test build #48565 has finished for PR 10544 at commit b4c4329.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jan 2, 2016

This works too. I have a separate pull request that adds a new API for this: #10547

@rxin
Contributor

rxin commented Jan 2, 2016

BTW is transformFunc a public API that custom transformers are supposed to implement? If it is, this is technically an API breaking change you are making.

@yanboliang
Contributor Author

@rxin transformFunc is not a public API, but I think your PR is more concise and I will close my PR.

@yanboliang yanboliang closed this Jan 2, 2016
@yanboliang yanboliang deleted the spark-12597 branch January 2, 2016 09:43