Skip to content

Conversation

@panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Jun 19, 2024

What changes were proposed in this pull request?

The PR aims to solve the issue encountered in the following scenarios:
#46897 (comment)
We force the execution of StringEscapeUtils.unescapeJava operation on message of MessageWithContext, which will result in unexpected behavior.

Why are the changes needed?

Fix bug.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass GA
Existed UT

Was this patch authored or co-authored using generative AI tooling?

No.


def basicMsgWithEscapeCharMDC: LogEntry =
log"This is a log message\nThis is a new line \t other msg"
log"This is a log message" + LF + log"This is a new line " + TAB + log" other msg"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the above writing may seem a bit ugly, so far, the only method I can find that meets the requirements and is relatively acceptable

@panbingkun
Copy link
Contributor Author

panbingkun commented Jun 19, 2024

org.apache.commons.text.StringEscapeUtils.unescapeJava("C:\\Users\\runneradmin\\AppData\\Local\\Temp\\spark-9f4b7e74-ea9e-46fc-bde6-1449cc9f6e8e\\userFiles-aaee9fd8-13b2-4cb2-a968-52895086988b\\run-all.R")
Unable to parse unicode value: serF
java.lang.IllegalArgumentException: Unable to parse unicode value: serF
	at org.apache.commons.text.translate.UnicodeUnescaper.translate(UnicodeUnescaper.java:55)
	at org.apache.commons.text.translate.AggregateTranslator.translate(AggregateTranslator.java:58)
	at org.apache.commons.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:101)
	at org.apache.commons.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:63)
	at org.apache.commons.text.StringEscapeUtils.unescapeJava(StringEscapeUtils.java:802)

@panbingkun
Copy link
Contributor Author

@panbingkun panbingkun marked this pull request as ready for review June 19, 2024 12:04
@panbingkun
Copy link
Contributor Author

Regarding the unexpected behaviors that already exist in our Spark, I will work together to fix them once this PR is completed. eg:

logInfo(log"Unzipped from ${MDC(PATH, dfsZipFile)}\n\t${MDC(PATHS, files.mkString("\n\t"))}")

@panbingkun
Copy link
Contributor Author

panbingkun commented Jun 19, 2024

The root cause is that it be recognized as a string, not a character in StringContext, eg: '\n'

  • when write as log"\n", it is actually:
92, 110
  • when write as s"\n", it is actually:
10

* Companion class for lazy evaluation of the MessageWithContext instance.
*/
class LogEntry(messageWithContext: => MessageWithContext) {
def message: String = StringEscapeUtils.unescapeJava(messageWithContext.message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe simply try-catch, and return the original string if it fails?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows users are arguably less in any event.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and logging shouldn't fail the application anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm..., this may have successfully unescaped, but the result does not meet expectations, such as:

println(StringEscapeUtils.unescapeJava("C:\\Users\\runneradmin\\AppData\\Local\\remp\\spark-9f4b7e74-ea9e-46fc-bde6-1449cc9f6e8e"))

We may expect it to output as:

C:\Users\runneradmin\AppData\Local\remp\spark-9f4b7e74-ea9e-46fc-bde6-1449cc9f6e8e

Actual, its output is:

empspark-9f4b7e74-ea9e-46fc-bde6-1449cc9f6e8e

So it's better not to execute the unescape operation and let the user manually use the method I wrote in UT to achieve its expected behavior, although it does look a bit ugly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and logging shouldn't fail the application anyway.

Yes, we can use try ... catch temporarily fixes it, but it has some of my flaws mentioned above.
Or should we do this (use try ... catch) first and let build_sparkr_window pass?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with this approach but I think we need @gengliangwang 's look at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's wait for @gengliangwang's review.
Thanks!

@gengliangwang
Copy link
Member

The root cause is that it be recognized as a string, not a character in StringContext, eg: '\n'
when write as log"\n", it is actually:
92, 110

@panbingkun I don't get it. How does this happen in windows? We are passing entry.message to the log4j methods.


private final val LF_CHAR: Char = '\n'
private final val TAB_CHAR: Char = '\t'
final val LF = MessageWithContext(s"$LF_CHAR", new java.util.HashMap[String, String]())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait..are you suggesting that we should use this constant in log messages? If so, I would prefer try...catch as @HyukjinKwon suggested.

Copy link
Contributor Author

@panbingkun panbingkun Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This is just to make it easier for developers to use commonly used escape characters, such as:
    line break - LF_CHAR, tab - TAB_CHAR
    Of course, we can also use it this way:
    image

  • Additional explanation, if we write it as follows, it will exhibit unexpected behavior:
    image
    Because it is written in this way, in our framework it will consider "\n" as a string rather than a character
    image
    that is to say:

    s"\n" != log"\n".message
    

@panbingkun
Copy link
Contributor Author

panbingkun commented Jun 20, 2024

The root cause is that it be recognized as a string, not a character in StringContext, eg: '\n'
when write as log"\n", it is actually:
92, 110

@panbingkun I don't get it. How does this happen in windows? We are passing entry.message to the log4j methods.

  • It should not only be on Windows, as long as there are strings like "\r", "\t", "\n" in our log messages, similar unexpected behavior will occur.
  • However, it happens that the file path on Windows is like this: "C:\Users\runneradmin\AppData\Local\remp\spark-9f4b7e74-ea9e-46fc-bde6-1449cc9f6e8e", while on Mac OS or Linux systems, the file path is "/Users/runneradmin/AppData/Local/remp/spark-9f4b7e74-ea9e-46fc-bde6-1449cc9f6e8e", so we did not encounter it before.

MessageWithContext(sb.toString(), context)
}

def log(c: Char): MessageWithContext = {
Copy link
Contributor Author

@panbingkun panbingkun Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have provided a new method for the character

@gengliangwang
Copy link
Member

@panbingkun I created a new approach in #47050

@panbingkun panbingkun closed this Jun 21, 2024
@panbingkun
Copy link
Contributor Author

closing in favor of #47050

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants