13 changes: 13 additions & 0 deletions docs/structured-streaming-programming-guide.md
@@ -2812,6 +2812,19 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f

# Additional Information

**Notes**
**Member:**

I was thinking of adding this information only somewhere in the API or configuration docs. For instance, notes like #19617.

> lots of wondering around SO and user mailing list,

I don't object to noting these things, but usually the site has only key points for some features or configurations.

If there are more instances to describe specifically for structured streaming (where the same SQL configurations could lead to confusion), I am fine with adding this. If not, or if we are less sure for now, I would add them to the API docs or the configuration docs.


- There're couple of configurations which are not modifiable once you run the query. If you really want to make changes for these configurations, you have to discard checkpoint and start a new query.
**Member:**

There're -> "There are a". This should probably be reworded though; doesn't need the 2nd person. "Several configurations are not modifiable after the query has run. To change them, discard the checkpoint and start a new query. These configurations include:"

I'm not sure how much we need to explain why they're not modifiable, but suggesting alternatives is good. I think the example is redundant below.

**Contributor Author:**

The reworded expression looks shorter and better. Will address.

> I'm not sure how much we need to explain why they're not modifiable

Yeah, I agree this is something everyone has a different view on. Some could think the page is a guide, so it doesn't need to include implementation details. Others (including me) could think the information is described nowhere, so it needs to be included here, especially since this is a major difference between batch and streaming queries.

No strong opinion (verbosity vs. less wondering for end users), so I can follow the decision. If you think this is redundant, please let me know so that I can address it.

> I think the example is redundant below.

Will address.

- `spark.sql.shuffle.partitions`
- This is due to the physical partitioning of state: state is partitioned via applying hash function to key, hence the number of partitions for state should be unchanged.
**Member:**

Not sure we need sub- and sub-sub-lists here, but I don't mind much. Do end users need to know why? Those details are more often relevant in source-code docs.

**Contributor Author:**

If I were one of the end users (and I am), I'd wonder why I'm restricted from changing these parameters. There are fairly many developers who don't blindly accept anything and want to understand. I'm wondering whether these details are OK to include in the guide doc, but we also don't have docs explaining the details, or a FAQ/troubleshooting page, so I'm not sure where to put them.

If we are OK with adding these details to the config's doc string in the source code (like below), I'm OK with it. Otherwise, I'm not sure end users can search for and find the description in the source code.

val SHUFFLE_PARTITIONS = buildConf("spark.sql.shuffle.partitions")
.doc("The default number of partitions to use when shuffling data for joins or aggregations.")
.intConf
.createWithDefault(200)

**Contributor:**

(Chiming in to say that I've definitely seen end users get confused and want to know why this restriction is so.)

**Member:**

Sounds good to me, leave it in

- If you want to run less tasks for stateful operations, `coalesce` would help with avoiding unnecessary repartitioning.
**Member:**

less -> fewer

**Contributor Author:**

Nice catch. Will address.

- e.g. `df.groupBy("time").count().coalesce(10)` reduces the number of tasks to 10, even though `spark.sql.shuffle.partitions` may be bigger.
- After `coalesce`, the number of (reduced) tasks will be kept unless another shuffle happens.
- `spark.sql.streaming.stateStore.providerClass`
**Member:**

Ah, okay, so there are more instances to describe here. If so, I'm okay.

- To read previous state of the query properly, the class of state store provider should be unchanged.
**Member:**

To read the previous state, etc. Also, these don't need to be sub-bullet points?

**Contributor Author:**

Will concatenate the two lines via `:` and add "the".

- `spark.sql.streaming.multipleWatermarkPolicy`
- Modification of this would lead to inconsistent watermark values when the query contains multiple watermarks, hence the policy should be unchanged.
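To make the notes above concrete, here is a minimal sketch of pinning these configurations before a checkpointed query's first start. Everything here is illustrative: the `rate` source, the checkpoint path, and the chosen values are stand-ins, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("checkpoint-pinned-confs") // illustrative app name
  .getOrCreate()

// Set the checkpoint-pinned configurations BEFORE the first run of the
// query; on restarts from the same checkpoint they must stay unchanged.
spark.conf.set("spark.sql.shuffle.partitions", "50")
spark.conf.set("spark.sql.streaming.stateStore.providerClass",
  "org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider")

val counts = spark.readStream
  .format("rate") // built-in test source emitting (timestamp, value) rows
  .load()
  .groupBy("value")
  .count()
  .coalesce(10) // fewer tasks for the stateful stage, without another shuffle

val query = counts.writeStream
  .outputMode("complete")
  .format("console")
  .option("checkpointLocation", "/tmp/demo-checkpoint") // hypothetical path
  .start()
```

On a restart, reuse the same checkpoint location and the same values for these configurations; to change them, discard the checkpoint and start a new query.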

**Further Reading**

- See and run the
@@ -266,7 +266,9 @@ object SQLConf {
.createWithDefault(Long.MaxValue)

val SHUFFLE_PARTITIONS = buildConf("spark.sql.shuffle.partitions")
.doc("The default number of partitions to use when shuffling data for joins or aggregations.")
.doc("The default number of partitions to use when shuffling data for joins or aggregations. " +
"Note: For structured streaming, this configuration cannot be changed between query " +
**Contributor:**

s/cannot be/must not be/

**Contributor Author:**

The sentence is borrowed from the existing one:

"Note: This configuration cannot be changed between query restarts from the same " +
"checkpoint location.")

"restarts from the same checkpoint location.")
.intConf
.createWithDefault(200)

@@ -868,7 +870,9 @@ object SQLConf {
.internal()
.doc(
"The class used to manage state data in stateful streaming queries. This class must " +
"be a subclass of StateStoreProvider, and must have a zero-arg constructor.")
"be a subclass of StateStoreProvider, and must have a zero-arg constructor. " +
"Note: For structured streaming, this configuration cannot be changed between query " +
**Contributor:**

s/cannot be/must not be/

**Contributor Author:**

Same here.

"restarts from the same checkpoint location.")
.stringConf
.createWithDefault(
"org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider")
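For readers unfamiliar with the `buildConf` chain edited above, a toy re-implementation of the fluent builder pattern may help. This is NOT Spark's real `ConfigBuilder`; the names and types below are simplified stand-ins for illustration only.

```scala
// Toy sketch of the fluent config-builder pattern used by SQLConf.
// Hypothetical simplified types -- not Spark's actual ConfigBuilder API.
final case class ConfEntry[T](key: String, doc: String, default: T)

final class TypedBuilder[T](key: String, doc: String) {
  // Finalizes the entry with a default value.
  def createWithDefault(default: T): ConfEntry[T] = ConfEntry(key, doc, default)
}

final class ConfBuilder(key: String, docText: String = "") {
  // Each step returns a builder, so calls chain fluently.
  def doc(text: String): ConfBuilder = new ConfBuilder(key, text)
  def intConf: TypedBuilder[Int] = new TypedBuilder[Int](key, docText)
  def stringConf: TypedBuilder[String] = new TypedBuilder[String](key, docText)
}

def buildConf(key: String): ConfBuilder = new ConfBuilder(key)

// Mirrors the shape of the SHUFFLE_PARTITIONS definition in the diff.
val shufflePartitions = buildConf("spark.sql.shuffle.partitions")
  .doc("The default number of partitions to use when shuffling data for joins or aggregations. " +
    "Note: For structured streaming, this configuration cannot be changed between query " +
    "restarts from the same checkpoint location.")
  .intConf
  .createWithDefault(200)
```

The real builder carries more state (type converters, validation, `.internal()` marking), but the chaining shape is the same, which is why the doc-string change in this PR is just an extra string concatenated into the `.doc(...)` argument.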