[SPARK-22846][SQL] Fix table owner is null when creating table through spark sql or thriftserver #20034
Conversation
@cloud-fan @gatorsmile could you review this issue?
Code context under review:

    /** Returns the configuration for the current session. */
    def conf: HiveConf = state.getConf
    ...
    private val userName = state.getAuthenticator.getUserName
Why does this return null?
ok to test
can you add a test?
The proposed change:

      def conf: HiveConf = state.getConf
    - private val userName = state.getAuthenticator.getUserName
    + private val userName = conf.getUser
@BruceXu1991, I want to reproduce your problem here. Could you describe your environment more specifically? For me, 2.2.1 works like the following.
scala> spark.version
res0: String = 2.2.1
scala> sql("CREATE TABLE spark_22846(a INT)")
scala> sql("DESCRIBE FORMATTED spark_22846").show
+--------------------+--------------------+-------+
| col_name| data_type|comment|
+--------------------+--------------------+-------+
| a| int| null|
| | | |
|# Detailed Table ...| | |
| Database| default| |
| Table| spark_22846| |
| Owner| dongjoon| |
So, does this happen in case of MySQL as Hive metastore?
Yes, I hit this problem using MySQL as the Hive metastore.
What's more, when I execute DESCRIBE FORMATTED spark_22846, a NullPointerException occurs:

    DESCRIBE FORMATTED offline.spark_22846;
    Error: java.lang.NullPointerException (state=,code=0)

and the detailed stack trace:
17/12/22 18:18:10 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
java.lang.NullPointerException
at scala.collection.immutable.StringOps$.length$extension(StringOps.scala:47)
at scala.collection.immutable.StringOps.length(StringOps.scala:47)
at scala.collection.IndexedSeqOptimized$class.isEmpty(IndexedSeqOptimized.scala:27)
at scala.collection.immutable.StringOps.isEmpty(StringOps.scala:29)
at scala.collection.TraversableOnce$class.nonEmpty(TraversableOnce.scala:111)
at scala.collection.immutable.StringOps.nonEmpty(StringOps.scala:29)
at org.apache.spark.sql.catalyst.catalog.CatalogTable.toLinkedHashMap(interface.scala:301)
at org.apache.spark.sql.execution.command.DescribeTableCommand.describeFormattedTableInfo(tables.scala:559)
at org.apache.spark.sql.execution.command.DescribeTableCommand.run(tables.scala:537)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:68)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:767)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:691)
The NPE results from owner being null. The relevant source code is below:

    def toLinkedHashMap: mutable.LinkedHashMap[String, String] = {
      ...
      // line 301:
      if (owner.nonEmpty) map.put("Owner", owner)
      ...
    }
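The failure mode above can be reproduced without any Spark dependency: calling a length-based check on a null string throws, while a null guard does not. Below is a minimal, self-contained Java sketch (the helper names `unsafeHasOwner`/`safeHasOwner` are illustrative, not Spark code):

```java
public class OwnerNpeSketch {
    // Mirrors the failing check: dereferencing a null owner string throws NPE,
    // just as owner.nonEmpty does at interface.scala:301.
    static boolean unsafeHasOwner(String owner) {
        return !owner.isEmpty(); // NPE when owner == null
    }

    // Null-safe variant: guard before dereferencing.
    static boolean safeHasOwner(String owner) {
        return owner != null && !owner.isEmpty();
    }

    public static void main(String[] args) {
        boolean threw = false;
        try {
            unsafeHasOwner(null);
        } catch (NullPointerException e) {
            threw = true;
        }
        System.out.println(threw);                    // true
        System.out.println(safeHasOwner(null));       // false
        System.out.println(safeHasOwner("dongjoon")); // true
    }
}
```

The PR fixes the problem at the source (never storing a null owner) rather than guarding every read site, which keeps `toLinkedHashMap` unchanged.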
Do you know how Hive gets the username internally?
Thanks for your response, Fan.
The current implementation in Spark 2.2.1 is

    private val userName = state.getAuthenticator.getUserName

When state.getAuthenticator is a HadoopDefaultAuthenticator, which is the default in the Hive conf, the username is obtained correctly.
However, when it is a SessionStateUserAuthenticator, as in my case, the username is null.
The simplified code below explains why:
- HadoopDefaultAuthenticator
    public class HadoopDefaultAuthenticator implements HiveAuthenticationProvider {
      @Override
      public String getUserName() {
        return userName;
      }

      @Override
      public void setConf(Configuration conf) {
        this.conf = conf;
        UserGroupInformation ugi = null;
        try {
          ugi = Utils.getUGI();
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
        this.userName = ugi.getShortUserName();
        if (ugi.getGroupNames() != null) {
          this.groupNames = Arrays.asList(ugi.getGroupNames());
        }
      }
    }
    public class Utils {
      public static UserGroupInformation getUGI() throws LoginException, IOException {
        String doAs = System.getenv("HADOOP_USER_NAME");
        if (doAs != null && doAs.length() > 0) {
          return UserGroupInformation.createProxyUser(doAs, UserGroupInformation.getLoginUser());
        }
        return UserGroupInformation.getCurrentUser();
      }
    }
This shows that HadoopDefaultAuthenticator obtains the username through Utils.getUGI(), so the username is HADOOP_USER_NAME or the login user.
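The resolution order in Utils.getUGI() can be sketched in isolation, with the environment passed as a map so it is testable (the `resolveUser` helper is illustrative; the real code goes through UserGroupInformation):

```java
import java.util.Map;

public class UserNameResolutionSketch {
    // Mirrors the order in Hive's Utils.getUGI(): a non-empty HADOOP_USER_NAME
    // environment variable wins (proxy user); otherwise fall back to the login user.
    static String resolveUser(Map<String, String> env, String loginUser) {
        String doAs = env.get("HADOOP_USER_NAME");
        if (doAs != null && doAs.length() > 0) {
            return doAs; // proxy user taken from the environment
        }
        return loginUser;
    }

    public static void main(String[] args) {
        System.out.println(resolveUser(Map.of("HADOOP_USER_NAME", "alice"), "bob")); // alice
        System.out.println(resolveUser(Map.of(), "bob"));                            // bob
    }
}
```

This is why HadoopDefaultAuthenticator always yields a non-null name: there is always at least a login user to fall back to.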
- SessionStateUserAuthenticator
    public class SessionStateUserAuthenticator implements HiveAuthenticationProvider {
      @Override
      public void setConf(Configuration arg0) {
      }

      @Override
      public String getUserName() {
        return sessionState.getUserName();
      }
    }
This shows that SessionStateUserAuthenticator gets the username through sessionState.getUserName(), which is null because no username is passed when the SessionState is instantiated in HiveClientImpl.
So getting the username through conf.getUser may be more compatible with various use cases.
The related code in HiveConf:
    public class HiveConf extends Configuration {
      public String getUser() throws IOException {
        try {
          UserGroupInformation le = Utils.getUGI();
          return le.getUserName();
        } catch (LoginException var2) {
          throw new IOException(var2);
        }
      }
    }
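The difference between the two authenticators can be condensed into a small sketch: one source of the owner name can return null, while a conf-style fallback always yields a name. This is an illustration of the failure and of the general null-safe pattern, not the PR's literal change (the PR simply switches to conf.getUser):

```java
public class OwnerFallbackSketch {
    // Stand-in for HiveAuthenticationProvider.getUserName().
    interface Authenticator { String getUserName(); }

    // Prefer the authenticator's user, but fall back to a conf-derived user
    // (here just a string) when the authenticator returns null, as
    // SessionStateUserAuthenticator does in the reported setup.
    static String ownerOf(Authenticator auth, String confUser) {
        String name = auth.getUserName();
        return (name != null) ? name : confUser;
    }

    public static void main(String[] args) {
        Authenticator sessionStateLike = () -> null;    // username never set on SessionState
        Authenticator hadoopDefaultLike = () -> "bruce"; // resolved via UGI

        System.out.println(ownerOf(sessionStateLike, "login-user"));  // login-user
        System.out.println(ownerOf(hadoopDefaultLike, "login-user")); // bruce
    }
}
```

Either way the table owner written to the metastore is non-null, which is what prevents the later NPE in DESCRIBE FORMATTED.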
Test build #85270 has finished for PR 20034 at commit
thanks, merging to master!
What changes were proposed in this pull request?
Fix the table owner being null when creating a new table through Spark SQL or the Thrift server.
How was this patch tested?
Manual test:
1. First create a table.
2. Then select the table properties from the MySQL database backing the Hive metastore.
Please review http://spark.apache.org/contributing.html before opening a pull request.