
Commit 8197ee3

maropu authored and dongjoon-hyun committed
[SPARK-33690][SQL] Escape meta-characters in showString
### What changes were proposed in this pull request?

This PR intends to escape meta-characters (e.g., \n and \t) in `Dataset.showString`.

Before this PR:
```
scala> Seq("aaa\nbbb\t\tccccc").toDF("value").show()
+--------------+
|         value|
+--------------+
|aaa
bbb		ccccc|
+--------------+
```

After this PR:
```
+-----------------+
|            value|
+-----------------+
|aaa\nbbb\t\tccccc|
+-----------------+
```

### Why are the changes needed?

For better output.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added a unit test.

Closes apache#30647 from maropu/EscapeMetaInShow.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
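To make the "better output" concrete, here is a minimal standalone sketch (not taken from the PR) of why a raw newline inside a cell breaks the table grid, exactly as in the Before example above:

```scala
object WhyEscape extends App {
  val cell = "aaa\nbbb" // a cell value containing a raw newline

  // Rendering it between table borders splits the row mid-cell:
  println(s"|$cell|")
  // |aaa
  // bbb|

  // Escaping first keeps the row on one physical line:
  val escaped = cell.replaceAll("\n", "\\\\n")
  println(s"|$escaped|")
  // |aaa\nbbb|
}
```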
1 parent 45af3c9 commit 8197ee3

File tree

4 files changed: +47 -5 lines changed

docs/sql-migration-guide.md

Lines changed: 2 additions & 0 deletions
@@ -26,6 +26,8 @@ license: |
 
 - In Spark 3.2, `spark.sql.adaptive.enabled` is enabled by default. To restore the behavior before Spark 3.2, you can set `spark.sql.adaptive.enabled` to `false`.
 
+- In Spark 3.2, the meta-characters `\n` and `\t` are escaped in the `show()` action. In Spark 3.1 or earlier, the two meta-characters are output as-is.
+
 ## Upgrading from Spark SQL 3.0 to 3.1
 
 - In Spark 3.1, statistical aggregation functions, including `std`, `stddev`, `stddev_samp`, `variance`, `var_samp`, `skewness`, `kurtosis`, `covar_samp`, and `corr`, will return `NULL` instead of `Double.NaN` when `DivideByZero` occurs during expression evaluation, for example, when `stddev_samp` is applied to a single-element set. In Spark version 3.0 and earlier, it returns `Double.NaN` in such cases. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.statisticalAggregate` to `true`.
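One nuance the guide entry leaves implicit: the escaping lives entirely in the rendering path of `show()`, so the data returned by actions such as `collect()` or `first()` still contains the real control characters. A hedged sketch of my own, assuming a Spark 3.2+ spark-shell session with `spark.implicits._` in scope:

```scala
// show() renders the newline as the two visible characters \n,
// but the underlying data is unchanged:
val df = Seq("aaa\nbbb").toDF("value")
df.show()                                      // |aaa\nbbb|
println(df.first.getString(0).contains("\n"))  // true
```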

sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala

Lines changed: 3 additions & 1 deletion
@@ -308,7 +308,9 @@ class Dataset[T] private[sql](
       val str = cell match {
         case null => "null"
         case binary: Array[Byte] => binary.map("%02X".format(_)).mkString("[", " ", "]")
-        case _ => cell.toString
+        case _ =>
+          // Escapes meta-characters not to break the `showString` format
+          cell.toString.replaceAll("\n", "\\\\n").replaceAll("\t", "\\\\t")
       }
       if (truncate > 0 && str.length > truncate) {
         // do not show ellipses for strings shorter than 4 characters.
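The quadruple backslash in the replacement strings is easy to misread. `String.replaceAll` delegates to `java.util.regex.Matcher.replaceAll`, where a backslash is special in the replacement string as well as in the pattern, so producing one literal backslash in the output takes four in source code. A standalone sketch (names are illustrative):

```scala
object EscapeSemantics extends App {
  val raw = "aaa\nbbb\t\tccccc"
  // "\\\\n" in Scala source is the three-character string \\n; Matcher.replaceAll
  // then interprets \\ as one literal backslash, yielding \n in the output.
  val escaped = raw.replaceAll("\n", "\\\\n").replaceAll("\t", "\\\\t")
  println(escaped)        // aaa\nbbb\t\tccccc  (a single physical line)
  println(escaped.length) // 17: each control character became two characters
}
```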

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

Lines changed: 38 additions & 0 deletions
@@ -1235,6 +1235,44 @@ class DataFrameSuite extends QueryTest
     assert(df.showString(10, vertical = true) === expectedAnswer)
   }
 
+  test("SPARK-33690: showString: escape meta-characters") {
+    val df1 = Seq("aaa\nbbb\tccc").toDF("value")
+    assert(df1.showString(1, truncate = 0) ===
+      """+-------------+
+        ||value        |
+        |+-------------+
+        ||aaa\nbbb\tccc|
+        |+-------------+
+        |""".stripMargin)
+
+    val df2 = Seq(Seq("aaa\nbbb\tccc")).toDF("value")
+    assert(df2.showString(1, truncate = 0) ===
+      """+---------------+
+        ||value          |
+        |+---------------+
+        ||[aaa\nbbb\tccc]|
+        |+---------------+
+        |""".stripMargin)
+
+    val df3 = Seq(Map("aaa\nbbb\tccc" -> "aaa\nbbb\tccc")).toDF("value")
+    assert(df3.showString(1, truncate = 0) ===
+      """+--------------------------------+
+        ||value                           |
+        |+--------------------------------+
+        ||{aaa\nbbb\tccc -> aaa\nbbb\tccc}|
+        |+--------------------------------+
+        |""".stripMargin)
+
+    val df4 = Seq("aaa\nbbb\tccc").toDF("value").selectExpr("named_struct('v', value)")
+    assert(df4.showString(1, truncate = 0) ===
+      """+----------------------+
+        ||named_struct(v, value)|
+        |+----------------------+
+        ||{aaa\nbbb\tccc}       |
+        |+----------------------+
+        |""".stripMargin)
+  }
+
   test("SPARK-7319 showString") {
     val expectedAnswer = """+---+-----+
                            ||key|value|
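It is worth spelling out why the array, map, and struct cases above come out escaped as well (my reading of the change, not stated in the PR): the replacement in `Dataset.showString` runs on the final rendered string for the whole cell, after nested values have been stringified, so control characters anywhere inside the rendered `[...]`, `{k -> v}`, or `{...}` text are caught too. A rough sketch:

```scala
object NestedEscape extends App {
  // Stand-in for the rendered form of an array cell like Seq("aaa\nbbb\tccc"):
  val rendered = Seq("aaa\nbbb\tccc").mkString("[", ", ", "]")
  // Escaping the fully rendered cell also covers the nested element:
  println(rendered.replaceAll("\n", "\\\\n").replaceAll("\t", "\\\\t"))
  // prints: [aaa\nbbb\tccc]
}
```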

sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala

Lines changed: 4 additions & 4 deletions
@@ -261,11 +261,11 @@ class ExplainSuite extends ExplainSuiteHelper with DisableAdaptiveExecutionSuite
       "PartitionFilters: \\[isnotnull\\(k#xL\\), dynamicpruningexpression\\(k#xL " +
         "IN subquery#x\\)\\]"
     val expected_pattern3 =
-      "Location: InMemoryFileIndex \\[.*org.apache.spark.sql.ExplainSuite" +
-        "/df2/.*, ... 99 entries\\]"
+      "Location: InMemoryFileIndex \\[\\S*org.apache.spark.sql.ExplainSuite" +
+        "/df2/\\S*, ... 99 entries\\]"
     val expected_pattern4 =
-      "Location: InMemoryFileIndex \\[.*org.apache.spark.sql.ExplainSuite" +
-        "/df1/.*, ... 999 entries\\]"
+      "Location: InMemoryFileIndex \\[\\S*org.apache.spark.sql.ExplainSuite" +
+        "/df1/\\S*, ... 999 entries\\]"
     withNormalizedExplain(sqlText) { normalizedOutput =>
       assert(expected_pattern1.r.findAllMatchIn(normalizedOutput).length == 1)
       assert(expected_pattern2.r.findAllMatchIn(normalizedOutput).length == 1)
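The PR does not spell out why `.*` becomes `\S*` here, but the regex semantics are worth noting: `.` matches spaces and commas, so `.*` can stretch across neighboring entries in the truncated location list, while `\S*` stops at the first whitespace and pins the match to a single path token. A small illustration (the strings are made up):

```scala
object StrictLocationPattern extends App {
  // A made-up location string whose path contains a space:
  val out = "Location: InMemoryFileIndex [odd path/org.apache.spark.sql.ExplainSuite/df2/part-0, ... 99 entries]"

  val loose  = "InMemoryFileIndex \\[.*org.apache.spark.sql.ExplainSuite/df2/.*, ... 99 entries\\]".r
  val strict = "InMemoryFileIndex \\[\\S*org.apache.spark.sql.ExplainSuite/df2/\\S*, ... 99 entries\\]".r

  println(loose.findAllMatchIn(out).length)  // 1: ".*" tolerates the embedded space
  println(strict.findAllMatchIn(out).length) // 0: "\\S*" requires a whitespace-free span
}
```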
