[SPARK-48490][CORE] Unescapes any literals for message of MessageWithContext #46824

panbingkun · 2024-05-31T14:44:54Z

What changes were proposed in this pull request?

This PR aims to unescape any escape sequences in the message of `MessageWithContext`.

Why are the changes needed?

For example, before this PR:

logInfo("This is a log message\nThis is a new line \t other msg")

It will output:

24/05/31 22:53:27 INFO PatternLoggingSuite: This is a log message
This is a new line 	 other msg

But:

logInfo(log"This is a log message\nThis is a new line \t other msg")

It will output:

24/05/31 22:53:59 ERROR PatternLoggingSuite: This is a log message\nThis is a new line \t other msg

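The latter is not the expected result: with the `log` interpolator, `\n` and `\t` are printed literally instead of being rendered as a newline and a tab.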
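The difference can be reproduced without Spark. The sketch below uses Scala's built-in `s` and `raw` interpolators. Here `raw` is only a stand-in, chosen for illustration, for an interpolator that emits its literal parts without processing escapes; it is not how Spark's `log` interpolator is implemented.

```scala
object EscapeSymptomDemo {
  // `s` processes escape sequences in its literal parts.
  val cooked: String = s"first line\nsecond line"

  // `raw` leaves the literal parts untouched, so the backslash and the
  // `n` remain two separate characters. It stands in here for an
  // interpolator that does not process escapes.
  val uncooked: String = raw"first line\nsecond line"

  def main(args: Array[String]): Unit = {
    println(cooked)   // printed on two lines
    println(uncooked) // printed on one line, with the escape shown literally
  }
}
```

The uncooked string is one character longer than the cooked one, because the two-character escape was never collapsed into a single newline.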

Does this PR introduce any user-facing change?

Yes, it fixes a bug.

How was this patch tested?

Added a new UT.
Passed GA.

Was this patch authored or co-authored using generative AI tooling?

No.

panbingkun · 2024-05-31T15:01:33Z

cc @gengliangwang

panbingkun · 2024-05-31T21:36:14Z

@gengliangwang
In addition, we also encountered this issue during migration: #46822

gengliangwang · 2024-05-31T23:44:32Z

Thanks, merging to master

dongjoon-hyun · 2024-06-17T22:14:55Z

common/utils/src/main/scala/org/apache/spark/internal/Logging.scala

 */
 class LogEntry(messageWithContext: => MessageWithContext) {
-  def message: String = messageWithContext.message
+  def message: String = StringEscapeUtils.unescapeJava(messageWithContext.message)


Hi, @panbingkun and @gengliangwang.

This seems to break SparkR somehow. Could you take a look at this?

https://github.com/apache/spark/actions/workflows/build_sparkr_window.yml

-- Error ('test_basic.R:25:3'): create DataFrame from list or data.frame -------
Error in `handleErrors(returnStatus, conn)`: java.lang.IllegalArgumentException: Unable to parse unicode value: serF
    at org.apache.commons.text.translate.UnicodeUnescaper.translate(UnicodeUnescaper.java:55)
    at org.apache.commons.text.translate.AggregateTranslator.translate(AggregateTranslator.java:58)
    at org.apache.commons.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:101)
    at org.apache.commons.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:63)
    at org.apache.commons.text.StringEscapeUtils.unescapeJava(StringEscapeUtils.java:802)
    at org.apache.spark.internal.LogEntry.message(Logging.scala:103)

panbingkun · 2024-06-19

Okay, let me investigate it.

panbingkun · 2024-06-19

@dongjoon-hyun
I have identified the root cause of this issue and am currently attempting to fix it:
#46897 (comment)
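The stack trace shows the unescaper failing on a backslash-u followed by `serF`, which is not a valid unicode escape. One hypothetical way such text could reach a log message is an interpolated value that contains backslashes, such as a Windows-style path. This is an assumption for illustration; the thread only links to #46897 for the actual root cause. The sketch below uses the standard library's `StringContext.processEscapes` as a stand-in for `StringEscapeUtils.unescapeJava` so that it runs without extra dependencies. The two differ in details, but both reject a malformed unicode escape.

```scala
import scala.util.Try

object WholeMessageUnescapeDemo {
  // Hypothetical runtime value containing backslashes, e.g. a
  // Windows-style path. Built with mkString so that the source file
  // itself contains no backslash-u sequence.
  val path: String = Seq("C:", "tmp", "userFiles-1234").mkString("\\")

  // The message after interpolation: literal text plus the runtime value.
  val interpolated: String = "Added file " + path

  // Unescaping the whole message also reinterprets the backslashes that
  // came from the value. Backslash-u followed by "serF" is not a valid
  // unicode escape, so this fails with an IllegalArgumentException.
  val unescaped: Try[String] = Try(StringContext.processEscapes(interpolated))

  def main(args: Array[String]): Unit =
    println(unescaped)
}
```

The point of the sketch is that unescaping after interpolation cannot distinguish escape sequences the developer wrote in the literal from backslashes that arrived in runtime data.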

### What changes were proposed in this pull request?

Even with the fix in #46824, escape sequences (`\r`, `\n`, `\t`, etc.) are not handled properly. For example, when we use `log"\n"`, the StringContext interprets `\n` as a literal backslash `\` followed by `n` instead of a newline character. As a result, the bytes of `log"\n".message` become `[92, 110]` instead of `[10]`.

This PR fixes the issue by using the method `StringContext.processEscapes` in `LogStringContext`.

### Why are the changes needed?

To ensure that escape sequences are properly processed in Spark logs.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

New UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #47050 from gengliangwang/fixEscape.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
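The byte-level behavior described in this commit message can be sketched with a hypothetical interpolator. `rawLog` and `demoLog` below are illustrative names, not Spark's `LogStringContext`, and the sketch assumes that escape processing is applied to each literal part only, so that interpolated values pass through unchanged.

```scala
object PartwiseEscapeDemo {
  // Hypothetical interpolators for illustration; not Spark's LogStringContext.
  implicit class DemoLogContext(private val sc: StringContext) {
    // Concatenates the literal parts exactly as the compiler passes them.
    def rawLog(args: Any*): String = build(sc.parts, args)

    // Processes escapes in the literal parts only, then interleaves the
    // arguments untouched.
    def demoLog(args: Any*): String =
      build(sc.parts.map(StringContext.processEscapes), args)
  }

  // Interleaves parts and args: part0, arg0, part1, arg1, ..., partN.
  private def build(parts: Seq[String], args: Seq[Any]): String = {
    val sb = new StringBuilder(parts.head)
    args.zip(parts.tail).foreach { case (arg, part) => sb.append(arg).append(part) }
    sb.toString
  }

  // Without escape processing: a backslash (92) followed by 'n' (110).
  val rawBytes: Seq[Byte] = rawLog"\n".getBytes("UTF-8").toSeq

  // With per-part escape processing: a single newline (10).
  val cookedBytes: Seq[Byte] = demoLog"\n".getBytes("UTF-8").toSeq

  // The tab in the literal is processed; the backslash inside the
  // interpolated value is left as it is.
  val withValue: String = {
    val dir = Seq("C:", "tmp").mkString("\\")
    demoLog"dir:\t$dir"
  }

  def main(args: Array[String]): Unit = {
    println(rawBytes.mkString("[", ", ", "]"))    // [92, 110]
    println(cookedBytes.mkString("[", ", ", "]")) // [10]
    println(withValue)
  }
}
```

Processing the parts before interleaving keeps the escape handling limited to text the developer typed in the literal, which is the distinction a whole-message unescape cannot make.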

[SPARK-48490][CORE] Unescapes any literals for message of MessageWithContext

31fb7c2

panbingkun marked this pull request as ready for review May 31, 2024 15:01

gengliangwang reviewed May 31, 2024

optimize

a5d44a7

gengliangwang approved these changes May 31, 2024

gengliangwang closed this in 114164b May 31, 2024

dongjoon-hyun reviewed Jun 17, 2024

gengliangwang mentioned this pull request Jun 21, 2024

[SPARK-48490][CORE][FOLLOWUP] Properly process escape sequences #47050

Closed
