Move spark.sql.hive.thriftServer.async
wangyum committed Sep 22, 2019
commit eeda69946426ea70d33e6c3a1f8e1d056b41c8b0
@@ -72,9 +72,6 @@ trait SharedSparkSessionBase
// this rule may potentially block testing of other optimization rules such as
// ConstantPropagation etc.
.set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName)
- // Hive Thrift server should not execute SQL queries in an asynchronous way
- // because we may set session configuration.
- .set("spark.sql.hive.thriftServer.async", "false")
conf.set(
StaticSQLConf.WAREHOUSE_PATH,
conf.get(StaticSQLConf.WAREHOUSE_PATH) + "/" + getClass.getCanonicalName)

@@ -28,11 +28,12 @@ import org.apache.commons.lang3.exception.ExceptionUtils
import org.apache.hadoop.hive.conf.HiveConf.ConfVars
import org.apache.hive.service.cli.HiveSQLException

- import org.apache.spark.SparkException
+ import org.apache.spark.{SparkConf, SparkException}
import org.apache.spark.sql.SQLQueryTestSuite
import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
import org.apache.spark.sql.catalyst.util.fileToString
import org.apache.spark.sql.execution.HiveResult
+ import org.apache.spark.sql.hive.HiveUtils
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._

@@ -75,6 +76,11 @@ class ThriftServerQueryTestSuite extends SQLQueryTestSuite {
}
}

+ override def sparkConf: SparkConf = super.sparkConf
+ // Hive Thrift server should not execute SQL queries in an asynchronous way
+ // because we may set session configuration.
+ .set(HiveUtils.HIVE_THRIFT_SERVER_ASYNC, false)
Member

@zsxwing zsxwing Sep 24, 2019

Does this mean this is broken and the user should also turn it off? If so, should we change the default value? Otherwise, our tests are actually testing something that's rarely used.

"Hive Thrift server should not execute SQL queries in an asynchronous way because we may set session configuration."

Could you clarify what the exact issue is? Is it that the background thread is missing some thread-local variables because threads are reused? Can we copy them from the parent thread here? https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala#L186
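
To make the hypothesis in this question concrete, here is a minimal, self-contained sketch of how a value stored in a thread-local on the calling thread is invisible to a reused pool thread unless it is copied over explicitly. It is not taken from Spark: `sessionSetting`, `runOn`, and `demo` are hypothetical names, and whether this is the actual cause of the test failures is not confirmed anywhere in this thread.

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors}

object ThreadLocalPropagationSketch {
  // Hypothetical stand-in for per-session state held in a thread-local.
  val sessionSetting: ThreadLocal[String] = new ThreadLocal[String] {
    override def initialValue(): String = "default"
  }

  // Runs `body` on the given pool and blocks until its result is available.
  private def runOn[T](pool: ExecutorService)(body: => T): T =
    pool.submit(new Callable[T] { override def call(): T = body }).get()

  // Returns (value seen without propagation, value seen with propagation).
  def demo(): (String, String) = {
    val pool = Executors.newSingleThreadExecutor()
    try {
      // Set on the calling (parent) thread only.
      sessionSetting.set("session-A")

      // The pooled thread has its own copy, so it still sees the initial value.
      val unpropagated = runOn(pool)(sessionSetting.get())

      // Capture on the parent thread, install on the worker, then restore.
      val captured = sessionSetting.get()
      val propagated = runOn(pool) {
        val previous = sessionSetting.get()
        sessionSetting.set(captured)
        try sessionSetting.get() finally sessionSetting.set(previous)
      }
      (unpropagated, propagated)
    } finally {
      pool.shutdown()
      sessionSetting.remove()
    }
  }

  def main(args: Array[String]): Unit = {
    val (without, withCopy) = demo()
    println(s"without propagation: $without, with propagation: $withCopy")
  }
}
```

Restoring the previous value in the `finally` clause matters for exactly the reason raised above: pool threads are reused, so state left behind by one task would leak into the next.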

Member

Er, this suite is not meant to be a concurrency stress test. It just targets executing SQL statements one by one.

Member

But it exposes some bugs in the default mode. It seems worth fixing them.

Member

To me, that doesn't imply a bug in the default mode. It simply means we want to run queries one by one in this test suite.

Member

@dongjoon-hyun dongjoon-hyun Sep 25, 2019

Of course, if there is a bug, we should definitely fix it. But let's not enable async mode in this test suite. That is a completely separate issue, isn't it? It should be handled with a separate unit test and a separate patch.

Contributor

With HiveUtils.HIVE_THRIFT_SERVER_ASYNC enabled, the Thrift server will still execute queries one by one. The difference is that it will not block the request:

  • With HIVE_THRIFT_SERVER_ASYNC=false, the client sends a query in a TExecuteStatementReq. The query executes, and only after it finishes does the server respond with a TExecuteStatementResp. Then the client sends a TGetOperationStatusReq to see whether the result was a success or a failure, and then potentially continues fetching results...
  • With HIVE_THRIFT_SERVER_ASYNC=true, the client sends a query in a TExecuteStatementReq, and the server starts it in a background thread and immediately returns a handle in the response. Then the client periodically polls with TGetOperationStatusReq until the query is finished, and then potentially continues fetching results...

In both cases, the Hive JDBC driver executes one query at a time, so there is no concurrency.
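
The two flows above can be sketched with a toy model. This is only an illustration under stated assumptions, not the Thrift server's implementation: `ToyServer`, `OperationHandle`, `executeStatement`, `isFinished`, and `fetchResult` are hypothetical names that loosely mirror the execute request and the status poll, and the "query" is just a short sleep.

```scala
import java.util.concurrent.{Callable, CompletableFuture, Executors, Future => JFuture}

object AsyncVsSyncSketch {
  // Hypothetical handle returned to the client for a submitted statement.
  final class OperationHandle(val future: JFuture[String])

  // Toy stand-in for the server. `async` plays the role of the setting discussed above.
  final class ToyServer(async: Boolean) {
    private val pool = Executors.newSingleThreadExecutor()
    private var events = Vector.empty[String]

    private def record(e: String): Unit = synchronized { events :+= e }
    def eventLog: Vector[String] = synchronized(events)

    // Pretend to execute a query, logging when it starts and ends.
    private def run(query: String): String = {
      record(s"start $query")
      Thread.sleep(20)
      record(s"end $query")
      s"result of $query"
    }

    def executeStatement(query: String): OperationHandle =
      if (async) {
        // Async: start in a background thread and return the handle immediately.
        new OperationHandle(pool.submit(new Callable[String] {
          override def call(): String = run(query)
        }))
      } else {
        // Sync: the call itself blocks until the query has finished.
        new OperationHandle(CompletableFuture.completedFuture[String](run(query)))
      }

    def isFinished(handle: OperationHandle): Boolean = handle.future.isDone
    def fetchResult(handle: OperationHandle): String = handle.future.get()
    def shutdown(): Unit = pool.shutdown()
  }

  // A client that, like a JDBC driver, issues one statement at a time.
  def demo(async: Boolean): Vector[String] = {
    val server = new ToyServer(async)
    try {
      Seq("q1", "q2").foreach { q =>
        val handle = server.executeStatement(q)
        while (!server.isFinished(handle)) Thread.sleep(5) // poll for status
        server.fetchResult(handle)
      }
      server.eventLog
    } finally server.shutdown()
  }

  def main(args: Array[String]): Unit = {
    println(s"sync:  ${demo(async = false)}")
    println(s"async: ${demo(async = true)}")
  }
}
```

In this sketch both modes produce the same event order, because the client waits for each statement to finish before sending the next one. The modes differ only in where the waiting happens: inside the execute call, or in the polling loop.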

I think this setting does not need to be set.


override val isTestWithConfigSets = false

/** List of test cases to ignore, in lower cases. */