[SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store #28707
Conversation
Test build #123447 has finished for PR 28707 at commit
I agree this change is valid, given that Spark doesn't store the schema of the state (so there is no validation between the expected schema and the actual row), but this should be considered a last resort because of its huge limitations. Safety guards must be placed in front of this - like SPARK-27237, which I think covers various general issues by providing clearer guidance on schema incompatibility between the state and the query being run.
override def unsafeRowFormatValidation(row: UnsafeRow, schema: StructType): Unit = {
  if (checkFormat && SQLConf.get.getConf(
      SQLConf.STREAMING_STATE_FORMAT_CHECK_ENABLED) && row != null) {
    if (schema.fields.length != row.numFields) {
This method exposes implementation details of UnsafeRow directly. Could we let UnsafeRow have such a check method instead? UnsafeRow itself is aware of data types, so the check method can receive the list of data types and do the assertion on its own.
Actually, that's the first version I did. Since the checking logic is only used for streaming aggregation queries and also depends on the streaming config, I chose to put it in StreamingAggregationStateManager. WDYT?
I was hoping we could move the core validation logic to either UnsafeRow itself, or some sort of UnsafeRowUtils, maybe somewhere in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util.
This util function would either return a boolean indicating passed/failed integrity check, or it could return more details. I'd probably go with the former first. It would not do any conf checks -- that's the caller's responsibility. This utility is useful for debugging low-level stuff in general, and would come in handy in both Spark SQL and Structured Streaming debugging.
Then we can call that util function from here, after checking the confs. And the exception throwing logic can be left here too.
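For illustration, a rough sketch of the caller side of that split: check the conf, call the shared utility, and keep the exception-throwing in the state manager. The utility object/method names follow the suggestion above and the conf constant follows the diff above, but they are assumptions as far as the final code goes:

```scala
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
import org.apache.spark.sql.catalyst.util.UnsafeRowUtils
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.StructType

// Hypothetical caller in the streaming aggregation state manager: the conf check
// and the exception stay here, while the structural check lives in a shared util
// that just returns a Boolean.
object StateFormatCheckSketch {
  def check(row: UnsafeRow, schema: StructType): Unit = {
    val enabled = SQLConf.get.getConf(SQLConf.STREAMING_STATE_FORMAT_CHECK_ENABLED)
    if (enabled && row != null &&
        !UnsafeRowUtils.validateStructuralIntegrity(row, schema)) {
      // InvalidUnsafeRowException is the exception class added by this PR
      // (import omitted here; it lives alongside the state store code).
      throw new InvalidUnsafeRowException
    }
  }
}
```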
 */
class InvalidUnsafeRowException
  extends SparkException("The UnsafeRow format is invalid. This may happen when using the old " +
    "version or broken checkpoint file. To resolve this problem, you can try to restart the " +
I'm not sure I understand the possible root causes and the proposed solutions. The problem comes from either an incompatible schema (probably due to a change of the query, or a change of the underlying aggregation function) or a corrupted row, neither of which the solutions described here can resolve.
"Old version" here is also ambiguous, because there's another meaning of "version" here, the state format version, which is not expected to introduce such an incompatible format issue. Did you see that case?
Thanks for the comments, I rephrased the error message to make it clearer. Yep, there are several ways that can lead to the invalid format and we need to list them all. Done in ee048bc.
    .booleanConf
    .createWithDefault(true)

  val STREAMING_STATE_FORMAT_CHECK_ENABLED =
This is misleading - we're only detecting the case from streaming aggregation.
BTW, should we have a configuration for this, given that it only does an essential check which all rows must pass?
Thanks, renamed it in ee048bc. Considering it's an extra check and still has overhead, I kept the feature flag for safety.
@xuanyuanking, can you please explain how this will fix the issue where we have changed something in the internal implementation of sum in SPARK-28067? How does that affect previous states, and what would be the expected behavior? At the query level, the sum schema is the same. Is the checkpoint storing information that comes from intermediate states? Are we storing unsafe rows from the updateExpression or the merge phases of aggregation?
And personally I'd rather do the check in StateStore, with the additional overhead of reading "a" row up front, to achieve the same in all stateful operations. For streaming aggregations it initializes "two" state stores, so the overhead goes to "two" rows, but I don't think the overhead matters much. If we're really concerned about the overhead of creating an additional "iterator", or about doing the validation in an early phase (where it might be possible the state store is not accessed), just have a StateStore wrapper wrapping it.
@skambha it doesn't fix the issue; it gives a better error message when we hit the issue.
I think this PR and SPARK-27237 are orthogonal, and we should have both. SPARK-27237 is a bit hard to merge as it changes the checkpoint. We may need more reviews to see if it's future-proof (e.g. when we want to support schema evolution of the state store). Anyway, this PR covers the case where people upgrade from Spark 2.x to 3.x, which is necessary even if we have SPARK-27237.
  val STREAMING_AGGREGATION_STATE_FORMAT_CHECK_ENABLED =
    buildConf("spark.sql.streaming.aggregationStateFormatCheck.enabled")
      .doc("Whether to detect a streaming aggregation query may try to use an invalid UnsafeRow " +
nit: When true, check if the UnsafeRow from the state store is valid or not when running streaming aggregation queries. This can happen if the state store format has been changed.
Thanks, rephrased in 10a7980.
| .doc("Whether to detect a streaming aggregation query may try to use an invalid UnsafeRow " + | ||
| "in the state store.") | ||
| .version("3.1.0") | ||
| .internal() |
we usually put internal() right after buildConf(...)
Thanks, done in 10a7980.
Yep, WIP for the integration test of the state store format validation. I will show you the difference in the error message with/without this patch.
Yes, the two approaches address different sides of this issue. SPARK-27237 requires an extra file to keep the schema, which makes schema checking possible. This one is a guard against random failures or correctness bugs.
rednaxelafx left a comment
Thank you very much for taking on this verification! I've used the same technique of checking unsafe row's structural integrity on quite a few occasions and it's been a very useful thing to have in the toolbox.
Having a util function that does this check inside of Spark would be very handy for future low-level debugging / investigations.
I agree with @cloud-fan that this verification feature is completely orthogonal to baking the schema info into the persisted state.
I'd strongly vote for having the schema info as a part of the persisted state instead of only having a blob that we interpret as UnsafeRows without any guardrails. But doing so changes the binary format of the persisted state, so I'd really love to see it as a piece of the puzzle in a long term plan to improve the state store.
The verification proposed in this PR does not make any changes to the binary format, so it could be useful for both Spark master branch and existing releases.
For the UnsafeRow structural integrity guarantees / heuristics, I'd propose the following candidate invariants to consider, given a row: UnsafeRow and an expectedSchema: StructType:
- schema.fields.length == row.numFields should always be true (already covered in this PR).
- UnsafeRow.calculateBitSetWidthInBytes(row.numFields) <= row.getSizeInBytes should always be true. A stricter < should hold if the expectedSchema contains at least one field. Not covered in this PR yet.
- For variable-length fields: if the null bit says it's null then don't do anything, else extract offset and size:
  - 0 <= size < row.getSizeInBytes should always be true. We can be even more precise than this, where the upper bound of size can only be as big as the variable-length part of the row.
  - offset should be >= the fixed-size part of the row.
  - offset + size should be within the row bounds (already covered by this PR).
  - We can make further assumptions on the UnsafeRow format, by assuming that if field1.ordinal < field2.ordinal, then field1.offset + field1.size <= field2.offset. This assumes that the fields were written in left-to-right order, which doesn't have to be the case, but all the write logic I know of in Spark fits this assumption. So this can be considered an optional heuristic.
- For fixed-length fields that are narrower than 8 bytes (boolean / byte / short / int / float), if the null bit says it's null then don't do anything, else check if the unused bits in the field are all zeros. The UnsafeRowWriter's write() methods make this guarantee.

When I did manual debugging, sometimes I'd also check the first couple of characters in a UTF8String from an UnsafeRow and see if the characters make sense as UTF-8. That's not something easily checkable here so I wouldn't suggest that.
If we know the length of the entire buffer of the backing store for UnsafeRows, we should make sure our offset + size never goes beyond that, too.
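To make the list above concrete, here is a minimal sketch of what such a check could look like as a standalone helper. The object name is invented for this example, and only a subset of the listed invariants is implemented:

```scala
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
import org.apache.spark.sql.types.StructType

// Illustrative sketch only: the real code in this PR may differ in structure and coverage.
object UnsafeRowIntegritySketch {

  def validateStructuralIntegrity(row: UnsafeRow, expectedSchema: StructType): Boolean = {
    // Invariant: the field count must match the expected schema.
    if (expectedSchema.fields.length != row.numFields) return false

    // Values that do not change per field, computed once up front.
    val nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(row.numFields)
    val fixedSize = nullBitsSize + 8 * row.numFields
    val rowSizeInBytes = row.getSizeInBytes

    // Invariant: the row must be large enough for the null bits plus the fixed-width slots.
    if (rowSizeInBytes < fixedSize) return false

    expectedSchema.fields.zipWithIndex.forall { case (field, index) =>
      if (row.isNullAt(index)) {
        true // nothing to check for null fields
      } else if (!UnsafeRow.isFixedLength(field.dataType)) {
        // Variable-length field: the fixed-width slot packs (offset << 32) | size.
        val offsetAndSize = row.getLong(index)
        val offset = (offsetAndSize >> 32).toInt
        val size = offsetAndSize.toInt
        size >= 0 && offset >= fixedSize && offset + size <= rowSizeInBytes
      } else {
        true // fixed-length non-null field: unused-bit checks omitted in this sketch
      }
    }
  }
}
```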
  def values(store: StateStore): Iterator[UnsafeRow]

  /** Check the UnsafeRow format with the expected schema */
  def unsafeRowFormatValidation(row: UnsafeRow, schema: StructType): Unit
Nit: I'd like to use "verb + noun" names for actions, and "nouns" for properties.
Here it'd be some form of "validate structural integrity". WDYT?
val offset = (offsetAndSize >> 32).toInt
val size = offsetAndSize.toInt
if (size < 0 ||
    offset < UnsafeRow.calculateBitSetWidthInBytes(row.numFields) + 8 * row.numFields ||
UnsafeRow.calculateBitSetWidthInBytes(row.numFields) + 8 * row.numFields: this part is loop-invariant. Please hoist it out of the loop manually here. It's the same kind of logic as UnsafeRowWriter's:

this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);
this.fixedSize = nullBitsSize + 8 * numFields;

We may want to use the same or similar names for the hoisted variables.
row.getSizeInBytes on the next line is also loop-invariant. Let's also hoist that out.
Test build #123477 has finished for PR 28707 at commit
+1. How about putting the validation code in a new object?
Test build #123475 has finished for PR 28707 at commit
@rednaxelafx Many thanks for the detailed comments and guidance. I'm addressing them now.
Sure, a separate utils object makes more sense since we want it to be general validation logic.
Will this be included in Spark 3.0.0? If this is to unblock SPARK-28067 being included in Spark 3.0.0 then it's OK to consider this first, but if this is planned for Spark 3.1 then I'm not sure about the priority - are all of you aware that the PR for SPARK-27237 was submitted more than a year ago and is still being deferred?

I still don't get why the proposal restricts its usage to streaming aggregation, whereas the mechanism is a validation of the UnsafeRow which can be applied to all stateful operations. Let's not pinpoint only the problem we've just seen.

Also, from my side the overhead of the validation logic looks trivial compared to the work stateful operators do - we don't do the validation for all rows, we don't even sample, we just check the first one. Unless there's a chance of introducing a show-stopper bug in the validation logic itself (so that we need a way to disable the validation), I don't see the need for a new configuration.
And I think SPARK-27237 doesn't require that sort of "future-proofing", which is preferably reserved for a change that carries risk - it doesn't touch the existing parts of the checkpoint and simply puts the schema information into a new file. If we find a better way to pack the schema information into the checkpoint, we can simply discard/ignore the file or craft logic to migrate smoothly. There's no risk in rolling it back in the future.
Yea we need this PR to unblock backporting SPARK-28067 to 3.0.
What are other stateful operations that use unsafe row? I think we can apply the check everywhere.
This is something we don't know. Adding a flag seems safer.
I'm not saying we shouldn't merge it. I just want to prioritize this PR so that we may be able to include the fix for the sum correctness bug in 3.0.
The state store itself stores UnsafeRow, hence this applies everywhere in stateful operations. I'd propose to do it like #28707 (comment) instead of fixing it everywhere.
@HeartSaVioR After taking a further look, instead of dealing with the iterator, how about adding the validation for all state store operations in
Thanks for adding the test.
It would be nice to see the proposed change as code to avoid misunderstanding, like I proposed in the previous comment (anything including a commit in your fork or a text comment is OK). I'll try out my alternative (wrapping StateStore) and show the code change. Thanks! EDIT: Please deal with the interface whenever possible - there are different implementations of state store providers and we should avoid sticking to a specific implementation.
My alternative with wrapping the state store is something like below. The example code only checks in the get operation, which is insufficient to check the "key" row in the state. That said, the iterator approach still provides more possibilities for validation, though the validation of the unsafe row itself doesn't have enough coverage to check various incompatibility issues (we should definitely have other guards as well), so it's sort of OK to only cover the value side.
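The example code referenced above isn't reproduced in this thread; very roughly, the wrapping idea might look like the sketch below. The class name and the validation hook are assumptions, and a real version would extend the StateStore trait and forward every member rather than exposing only get:

```scala
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
import org.apache.spark.sql.execution.streaming.state.StateStore
import org.apache.spark.sql.types.StructType

// Illustrative wrapper: validate the first non-null value row read from the
// underlying store, then pass everything through untouched.
class ValidatingStateStoreSketch(
    delegate: StateStore,
    valueSchema: StructType,
    validate: (UnsafeRow, StructType) => Boolean) {

  private var validated = false

  def get(key: UnsafeRow): UnsafeRow = {
    val value = delegate.get(key)
    if (!validated && value != null) {
      if (!validate(value, valueSchema)) {
        throw new IllegalStateException("Invalid UnsafeRow detected in the state store")
      }
      validated = true
    }
    value
  }
}
```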
All the comments addressed in 1f71563. Thanks for the review!
Test build #123580 has finished for PR 28707 at commit
|
|
Sorry, my comment was edited so you may have missed the content, but it is also a sort of callout about "pinpointing" - do you think your approach works with other state store providers as well? The root cause isn't bound to the implementation of the state store provider, but this patch only addresses the HDFS state store provider. I guess you're trying to find out how it can be done less frequently, the first time the state is loaded from the file, which is optimal. While I think it could even be done without binding to the state store provider implementation if we really need that (check only once when the provider instance is created), have we measured the actual overhead? If the overhead turns out to be trivial then it won't matter that we run the validation check for each batch. It sounds sub-optimal, but the overhead would be trivial.
…format for all state store
e3d841c to 01007fb
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/UnsafeRowUtils.scala
  extends SparkException("The streaming query failed by state format invalidation. " +
    "The following reasons may cause this: 1. An old Spark version wrote the checkpoint that is " +
    "incompatible with the current one; 2. Broken checkpoint files; 3. The query is changed " +
    "among restart. For the first case, you can try to restart the application without " +
"For the first case": I think it's meant for all the cases?
The resolution is for the first case. The remaining cases listed should be considered user problems.
 * An exception thrown when an invalid UnsafeRow is detected in state store.
 */
class InvalidUnsafeRowException
  extends SparkException("The streaming query failed by state format invalidation. " +
Does it have to be SparkException?
No, changed it to RuntimeException. Done in fd74ff9.
cloud-fan left a comment
LGTM except for a few comments.
@volatile private var storeConf: StateStoreConf = _
@volatile private var hadoopConf: Configuration = _
@volatile private var numberOfVersionsToRetainInMemory: Int = _
@volatile private var isValidated = false
Can we add a TODO that this validation should be moved to a higher level so that it works for all state store implementations?
Thanks, added the TODO in fd74ff9.
Test build #124166 has finished for PR 28707 at commit
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/UnsafeRowUtils.scala
Test build #124171 has finished for PR 28707 at commit
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Test build #124186 has finished for PR 28707 at commit
thanks, merging to master! (I think this patch is too big to backport)
Thanks all for reviewing!
  val STATE_STORE_FORMAT_VALIDATION_ENABLED =
    buildConf("spark.sql.streaming.stateStore.formatValidation.enabled")
      .internal()
      .doc("When true, check if the UnsafeRow from the state store is valid or not when running " +
Change "UnsafeRow" to "checkpoint"? Most end users do not know what UnsafeRow is.
Sure, will submit a follow-up PR today.
What changes were proposed in this pull request?
Address comment in #28707 (comment)
Why are the changes needed?
Hide the implementation details in the config doc.
Does this PR introduce any user-facing change?
Config doc change.
How was this patch tested?
Document only.
Closes #29315 from xuanyuanking/SPARK-31894-follow.
Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Introduce UnsafeRow format validation for streaming state store.
Why are the changes needed?
Currently, Structured Streaming puts the UnsafeRow into the StateStore directly, without any schema validation. This is dangerous behavior when users reuse the checkpoint file during migration. Any change or bug fix related to the aggregate functions may cause random exceptions, or even wrong answers, e.g. SPARK-28067.
Does this PR introduce any user-facing change?
Yes. If underlying format changes are detected when the checkpoint is reused during migration, an InvalidUnsafeRowException will be thrown.
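As a usage note, the check is on by default and controlled by the internal flag shown in the earlier diff; disabling it (for example, if the check itself ever misbehaved) would look roughly like this:

```scala
import org.apache.spark.sql.SparkSession

// spark.sql.streaming.stateStore.formatValidation.enabled defaults to true;
// setting it to false turns the new validation off.
val spark = SparkSession.builder()
  .appName("state-store-format-validation-example")
  .master("local[2]")
  .config("spark.sql.streaming.stateStore.formatValidation.enabled", "false")
  .getOrCreate()
```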
How was this patch tested?
UT added. Will also add integration tests for more scenarios in a separate PR.