Skip to content
Prev Previous commit
Next Next commit
address comments and add test cases
  • Loading branch information
kevinyu98 committed Jul 10, 2018
commit caf6b6dd15c3e3fef01aa60e81de76b407feb6e2
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@

package org.apache.spark.sql.catalyst.analysis

import java.util.Locale

import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer
import scala.util.Random
Expand Down Expand Up @@ -1212,17 +1214,21 @@ class Analyzer(
object LookupFunctions extends Rule[LogicalPlan] {
override def apply(plan: LogicalPlan): LogicalPlan = {
val catalogFunctionNameSet = new mutable.HashSet[FunctionIdentifier]()
plan.transformAllExpressions {
case f: UnresolvedFunction if catalogFunctionNameSet.contains(f.name) => f
case f: UnresolvedFunction if catalog.functionExists(f.name) =>
catalogFunctionNameSet.add(f.name)
f
case f: UnresolvedFunction =>
withPosition(f) {
throw new NoSuchFunctionException(f.name.database.getOrElse("default"),
f.name.funcName)
}
}
plan.transformAllExpressions {
case f: UnresolvedFunction if catalogFunctionNameSet.contains(f.name) => f
case f: UnresolvedFunction if catalog.functionExists(f.name) =>
catalogFunctionNameSet.add(normalizeFuncName(f.name))
f
case f: UnresolvedFunction =>
withPosition(f) {
throw new NoSuchFunctionException(f.name.database.getOrElse("default"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I think this should be current database instead of default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya Yeah..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya @dilipbiswal @gatorsmile @liancheng ok, I will change this.

f.name.funcName)
}
}
}

private def normalizeFuncName(name: FunctionIdentifier): FunctionIdentifier = {
FunctionIdentifier(name.funcName.toLowerCase(Locale.ROOT), name.database)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name.database.getOrElse("default")?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the FunctionIdentifier's signature for database is Option, it is not string. since we are just used in this local cache, I think it is ok to not convert to "default" string. I saw when we do the registerFunction in FunctionRegistry.scala, we didn't put the "default" in normalizeFuncName either. What do you think? thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya Hello Simon, I will leave this as it is for now, but if you think it is better to have the Option(name.database.getOrElse("default")) in the catalogFunctionNameSet, let me know. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry for replying late. What I thought is if two FunctionIdentifiers, one is with default database name Some("default"), another is None. They should be equal to each other here.

I actually mean name.database.orElse(Some("default")).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya Shouldn't we be using the current database if database is not specified ? I am trying to understand why we should use "default" here ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinyu98 For built-in functions, we don't need to normalize their database name.

Rethink about it, actually here it is not for function resolution, I think it is OK to leave it as name.database.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya @kevinyu98 We need to check what happens in the following case .

use currentdb;
select currentdb.function1(), function1() from ....

In this case, the 2nd function should be resolved from the local cache if this optimization
were to work. If we just use name.database instead of defaulting to current database , will it still happen ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dilipbiswal @viirya Thanks for pointing this out. If we just use the name.database, the cache will store "None" for the database name, the 2nd function will not resolved from the local cache. We need to use the catalog.getCurrentDatabase for the database name in the cache.
After running more test cases, I think it is better to cache the external function name only, not include the build-in function. If we all agree this approach, I can submit the code for review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For built-in functions, it may no a big deal if we don't find it in this cache. It should be very fast to query built-in functions. I remember the main issue of this ticket is external function lookup where it means more loading on connection with metastore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree @viirya

}
}

Expand Down