Skip to content

Conversation

@panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Apr 3, 2024

What changes were proposed in this pull request?

The pr aims to migrate logError in module MLLib with variables to structured logging framework.

Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

}

private def withLogContext(context: java.util.HashMap[String, String])(body: => Unit): Unit = {
protected def withLogContext(context: java.util.HashMap[String, String])(body: => Unit): Unit = {
Copy link
Contributor Author

@panbingkun panbingkun Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change private to protected
because some class extends Logging and override the method logInfo, logWarn, 'logError', eg:
mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala

/**
* Logs a LogEntry which message with a prefix that uniquely identifies the training session.
*/
override def logError(entry: LogEntry): Unit = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can write it as follows:

override def logError(entry: LogEntry): Unit = {
   super.logError(MessageWithContext(prefix + entry.message, entry.context))
}

But it seems that the efficiency is not as high as mentioned above.

@panbingkun
Copy link
Contributor Author

cc @gengliangwang

val MIN_SIZE = Value
val REMOTE_ADDRESS = Value
val POD_ID = Value
val NUM_ITERATIONS = Value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: let's sort the keys.

val msg = s"Classification labels should be in [0 to ${numClasses - 1}]. " +
s"Found $numInvalid invalid labels."
val msg = log"Classification labels should be in " +
log"${MDC(RANGE_CLASSIFICATION_LABELS, s"[0 to ${numClasses - 1}]")}. " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking about making the log keys generic. How about making it RANGE here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah

s"Found $numInvalid invalid labels."
val msg = log"Classification labels should be in " +
log"${MDC(RANGE_CLASSIFICATION_LABELS, s"[0 to ${numClasses - 1}]")}. " +
log"Found ${MDC(NUM_CLASSIFICATION_LABELS, numInvalid)} invalid labels."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking about making the log keys generic. How about making it COUNT here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah


if (rawCoefficients == null) {
val msg = s"${optimizer.getClass.getName} failed."
val msg = log"${MDC(OPTIMIZER_CLASS_NAME, optimizer.getClass.getName)} failed."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple duplicated codes in the changes of this PR. Let's create a method to reduce duplications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, very good suggestion

case e: java.lang.AssertionError =>
logError(s"FAILED for numIterations=$numIterations, learningRate=$learningRate," +
s" subsamplingRate=$subsamplingRate")
logError(log"FAILED for numIterations=${MDC(NUM_ITERATIONS, numIterations)}, " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a method to reduce duplicated code in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@panbingkun
Copy link
Contributor Author

@gengliangwang all done.

@panbingkun panbingkun marked this pull request as ready for review April 4, 2024 00:57
@gengliangwang
Copy link
Member

Thanks, merging to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants