
Conversation

@BruceXu1991 BruceXu1991 commented Dec 20, 2017

What changes were proposed in this pull request?

Fix the table owner being null when creating a new table through Spark SQL.

How was this patch tested?

Manual test:
1. First, create a table.
2. Then, select the table properties from the MySQL database that backs the Hive metastore.

Please review http://spark.apache.org/contributing.html before opening a pull request.

@BruceXu1991
Author

@cloud-fan @gatorsmile could you review this issue?

/** Returns the configuration for the current session. */
def conf: HiveConf = state.getConf

private val userName = state.getAuthenticator.getUserName
Contributor

Why does this return null?

@cloud-fan
Contributor

ok to test

@cloud-fan
Contributor

can you add a test?

  def conf: HiveConf = state.getConf

- private val userName = state.getAuthenticator.getUserName
+ private val userName = conf.getUser
Member

@BruceXu1991, I want to reproduce your problem here. Could you describe your environment more specifically? For me, 2.2.1 works like the following.

scala> spark.version
res0: String = 2.2.1

scala> sql("CREATE TABLE spark_22846(a INT)")

scala> sql("DESCRIBE FORMATTED spark_22846").show
+--------------------+--------------------+-------+
|            col_name|           data_type|comment|
+--------------------+--------------------+-------+
|                   a|                 int|   null|
|                    |                    |       |
|# Detailed Table ...|                    |       |
|            Database|             default|       |
|               Table|         spark_22846|       |
|               Owner|            dongjoon|       |

Member

So, does this happen when MySQL is used as the Hive metastore?

@BruceXu1991 BruceXu1991 (Author) Dec 22, 2017

Yes, I hit this problem using MySQL as the Hive metastore.
What's more, when I execute DESCRIBE FORMATTED spark_22846, a NullPointerException occurs:

DESCRIBE FORMATTED offline.spark_22846;
Error: java.lang.NullPointerException (state=,code=0)

Here is the detailed stack trace:

17/12/22 18:18:10 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
java.lang.NullPointerException
        at scala.collection.immutable.StringOps$.length$extension(StringOps.scala:47)
        at scala.collection.immutable.StringOps.length(StringOps.scala:47)
        at scala.collection.IndexedSeqOptimized$class.isEmpty(IndexedSeqOptimized.scala:27)
        at scala.collection.immutable.StringOps.isEmpty(StringOps.scala:29)
        at scala.collection.TraversableOnce$class.nonEmpty(TraversableOnce.scala:111)
        at scala.collection.immutable.StringOps.nonEmpty(StringOps.scala:29)
        at org.apache.spark.sql.catalyst.catalog.CatalogTable.toLinkedHashMap(interface.scala:301)
        at org.apache.spark.sql.execution.command.DescribeTableCommand.describeFormattedTableInfo(tables.scala:559)
        at org.apache.spark.sql.execution.command.DescribeTableCommand.run(tables.scala:537)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:68)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:767)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:691)

The cause of the NPE is that owner is null. The relevant source code is below:

def toLinkedHashMap: mutable.LinkedHashMap[String, String] = {
  ...
  // line 301:
  if (owner.nonEmpty) map.put("Owner", owner)
  ...
}
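
To make the failure mode concrete, here is a minimal sketch (not code from the PR) that reproduces the same StringOps frames seen in the stack trace above:

object NpeSketch extends App {
  val owner: String = null
  // The implicit augmentString conversion wraps the null in StringOps;
  // nonEmpty calls length, which dereferences the underlying null String
  // and throws a NullPointerException, matching the top frames above.
  if (owner.nonEmpty) println(s"Owner: $owner")
}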

Contributor

Do you know how Hive gets the username internally?

@BruceXu1991 BruceXu1991 (Author) Dec 23, 2017

thanks for your response, Fan.

The current implementation in Spark 2.2.1 is:

private val userName = state.getAuthenticator.getUserName

When the implementation of state.getAuthenticator is HadoopDefaultAuthenticator, which is the default in the Hive conf, the username is obtained correctly.

However, when the implementation of state.getAuthenticator is SessionStateUserAuthenticator, which is what is used in my case, the username will be null.

The simplified code below explains the reason:

  1. HadoopDefaultAuthenticator
public class HadoopDefaultAuthenticator implements HiveAuthenticationProvider {
  // field declarations added here for completeness
  private String userName;
  private List<String> groupNames;
  private Configuration conf;

  @Override
  public String getUserName() {
    return userName;
  }

  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
    UserGroupInformation ugi = null;
    try {
      ugi = Utils.getUGI();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
    this.userName = ugi.getShortUserName();
    if (ugi.getGroupNames() != null) {
      this.groupNames = Arrays.asList(ugi.getGroupNames());
    }
  }
}

public class Utils {
  public static UserGroupInformation getUGI() throws LoginException, IOException {
    String doAs = System.getenv("HADOOP_USER_NAME");
    if(doAs != null && doAs.length() > 0) {
      return UserGroupInformation.createProxyUser(doAs, UserGroupInformation.getLoginUser());
    }
    return UserGroupInformation.getCurrentUser();
  }
}

This shows that HadoopDefaultAuthenticator gets the username through Utils.getUGI(), so the username is HADOOP_USER_NAME or the login user.
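
As a minimal illustration of the same lookup (a sketch assuming a Hadoop client on the classpath; UgiSketch is a hypothetical name, not code from the PR or from Hive):

import org.apache.hadoop.security.UserGroupInformation

object UgiSketch extends App {
  // Mirrors Utils.getUGI(): HADOOP_USER_NAME, if set, wins as a proxy user;
  // otherwise the current (login) user is used.
  val ugi = sys.env.get("HADOOP_USER_NAME") match {
    case Some(doAs) if doAs.nonEmpty =>
      UserGroupInformation.createProxyUser(doAs, UserGroupInformation.getLoginUser)
    case _ =>
      UserGroupInformation.getCurrentUser
  }
  println(ugi.getShortUserName) // resolves to a real user, never null here
}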

  2. SessionStateUserAuthenticator
public class SessionStateUserAuthenticator implements HiveAuthenticationProvider {
  // field and constructor added here for completeness
  private final SessionState sessionState;

  public SessionStateUserAuthenticator(SessionState sessionState) {
    this.sessionState = sessionState;
  }

  @Override
  public void setConf(Configuration arg0) {
  }

  @Override
  public String getUserName() {
    return sessionState.getUserName();
  }
}

This shows that SessionStateUserAuthenticator gets the username through sessionState.getUserName(), which is null because no username is supplied when the SessionState is instantiated in HiveClientImpl.
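
Roughly, the construction looks like this (a simplified sketch paraphrased from HiveClientImpl, not a verbatim quote):

import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

// Simplified: the SessionState is built from the HiveConf alone, so
// SessionState.userName stays null, and SessionStateUserAuthenticator
// then returns null from getUserName().
def newStateSketch(hiveConf: HiveConf): SessionState = {
  val state = new SessionState(hiveConf) // the two-arg constructor
                                         // SessionState(conf, userName) is not used
  SessionState.start(state)
  state
}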

So getting the username through conf.getUser may be more compatible with the various use cases.

The related code in HiveConf:

public class HiveConf extends Configuration {

  public String getUser() throws IOException {
    try {
      UserGroupInformation ugi = Utils.getUGI();
      return ugi.getUserName();
    } catch (LoginException le) {
      throw new IOException(le);
    }
  }
}
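
Putting it together, the patched line in context would look roughly like this (an illustrative Scala sketch; HiveClientSketch and tableOwner are hypothetical names, not Spark's actual class):

import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

// Illustrative wrapper, not the real HiveClientImpl: the owner is resolved
// from the HiveConf (via UGI) instead of from the configured
// HiveAuthenticationProvider, which may return null.
class HiveClientSketch(state: SessionState) {
  def conf: HiveConf = state.getConf
  private val userName: String = conf.getUser // was: state.getAuthenticator.getUserName
  def tableOwner: String = userName           // now non-null for both authenticators
}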

@SparkQA SparkQA commented Dec 21, 2017

Test build #85270 has finished for PR 20034 at commit e8c3035.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!
