[SPARK-47598][CORE] MLLib: Migrate logError with variables to structured logging framework #45837
    -  private def withLogContext(context: java.util.HashMap[String, String])(body: => Unit): Unit = {
    +  protected def withLogContext(context: java.util.HashMap[String, String])(body: => Unit): Unit = {
Why change private to protected?
Because some classes extend Logging and override the methods logInfo, logWarning and logError, e.g.:
mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala
      /**
       * Logs a LogEntry which message with a prefix that uniquely identifies the training session.
       */
      override def logError(entry: LogEntry): Unit = {
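The reason for widening the modifier can be illustrated with a simplified sketch. `MiniLogging` and `PrefixedLogger` are hypothetical stand-ins for Spark's `Logging` trait and a subclass such as the one in `Instrumentation.scala`, and the per-thread log context is modeled with a plain field. The sketch assumes Scala 2.13+ (for `scala.jdk.CollectionConverters`) and only shows that an overriding `logError` in a subclass can call `withLogContext` when it is `protected`; with `private` the override would not compile.

```scala
import java.util.{HashMap => JHashMap}
import scala.collection.mutable.ArrayBuffer
import scala.jdk.CollectionConverters._

// Hypothetical, simplified stand-in for a logging trait (not Spark's Logging).
trait MiniLogging {
  // Records (message, context) pairs so the sketch's behaviour can be inspected.
  val emitted: ArrayBuffer[(String, Map[String, String])] = ArrayBuffer.empty

  // Models the per-thread log context with a plain field.
  private var current: Map[String, String] = Map.empty

  // `protected` rather than `private`: subclasses that override logError
  // still need to run their own emit call inside the context.
  protected def withLogContext(context: JHashMap[String, String])(body: => Unit): Unit = {
    val saved = current
    current = context.asScala.toMap
    try {
      body
    } finally {
      current = saved
    }
  }

  // Captures the message together with whatever context is active right now.
  protected def emit(message: String): Unit = emitted += ((message, current))

  def logError(message: String, context: JHashMap[String, String]): Unit =
    withLogContext(context) { emit(message) }
}

// Hypothetical subclass that prefixes every message, in the spirit of Instrumentation.
// This override compiles only because withLogContext is not private.
class PrefixedLogger(prefix: String) extends MiniLogging {
  override def logError(message: String, context: JHashMap[String, String]): Unit =
    withLogContext(context) { emit(prefix + message) }
}
```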
Maybe we can write it as follows:
    override def logError(entry: LogEntry): Unit = {
      super.logError(MessageWithContext(prefix + entry.message, entry.context))
    }
But that seems less efficient than the version above.
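The suggested alternative delegates to `super.logError` with a rebuilt message, so the prefix is added without touching the log context directly. A minimal sketch of that shape, with hypothetical `SimpleMessage`, `BaseLogger` and `SessionLogger` types standing in for Spark's `LogEntry` and `MessageWithContext`; it only shows that the original context survives the rebuild and says nothing about the relative cost raised in the thread.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a message that carries its own key/value context.
final case class SimpleMessage(message: String, context: Map[String, String])

// Hypothetical base logger that just records what it is asked to log.
class BaseLogger {
  val emitted: ArrayBuffer[SimpleMessage] = ArrayBuffer.empty
  def logError(entry: SimpleMessage): Unit = {
    emitted += entry
  }
}

// Mirrors the suggested override: prefix the text, keep the context as-is, delegate to super.
class SessionLogger(prefix: String) extends BaseLogger {
  override def logError(entry: SimpleMessage): Unit =
    super.logError(SimpleMessage(prefix + entry.message, entry.context))
}
```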
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala
    val MIN_SIZE = Value
    val REMOTE_ADDRESS = Value
    val POD_ID = Value
    val NUM_ITERATIONS = Value
Nit: let's sort the keys.
    -  val msg = s"Classification labels should be in [0 to ${numClasses - 1}]. " +
    -    s"Found $numInvalid invalid labels."
    +  val msg = log"Classification labels should be in " +
    +    log"${MDC(RANGE_CLASSIFICATION_LABELS, s"[0 to ${numClasses - 1}]")}. " +
I am thinking about making the log keys generic. How about making it RANGE here?
Yeah
    -    s"Found $numInvalid invalid labels."
    +  val msg = log"Classification labels should be in " +
    +    log"${MDC(RANGE_CLASSIFICATION_LABELS, s"[0 to ${numClasses - 1}]")}. " +
    +    log"Found ${MDC(NUM_CLASSIFICATION_LABELS, numInvalid)} invalid labels."
I am thinking about making the log keys generic. How about making it COUNT here?
Yeah
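With generic keys, one key such as `COUNT` can label a count in any message, so a log query on that field works across call sites. A sketch under simplified assumptions: `LogKey`, `RANGE`, `COUNT`, `MDC` and `contextOf` here are hypothetical stand-ins, modeled as case objects so that each key's name is simply its `toString`.

```scala
// Hypothetical, simplified keys and MDC pair (not Spark's definitions).
sealed trait LogKey
case object RANGE extends LogKey
case object COUNT extends LogKey

final case class MDC(key: LogKey, value: Any)

// Builds the key/value context that would accompany a rendered message.
def contextOf(mdcs: MDC*): Map[String, String] =
  mdcs.map(m => m.key.toString -> s"${m.value}").toMap

// Two unrelated messages share the generic COUNT key.
val numClasses = 3
val labelContext = contextOf(MDC(RANGE, s"[0 to ${numClasses - 1}]"), MDC(COUNT, 2))
val iterationContext = contextOf(MDC(COUNT, 10))
```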
       if (rawCoefficients == null) {
    -    val msg = s"${optimizer.getClass.getName} failed."
    +    val msg = log"${MDC(OPTIMIZER_CLASS_NAME, optimizer.getClass.getName)} failed."
There is a lot of duplicated code in this PR's changes. Let's create a helper method to reduce the duplication.
Okay, very good suggestion
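One way to act on that suggestion is a single helper that builds the "optimizer failed" message, which each estimator then calls. This is a sketch only: `OptimizerFailure`, `StructuredMessage` and the plain-string key are hypothetical, and the actual change would use Spark's `log` interpolator with the `OPTIMIZER_CLASS_NAME` key shown in the diff.

```scala
// Hypothetical stand-in for a structured log message (text plus key/value context).
final case class StructuredMessage(message: String, context: Map[String, String])

// Hypothetical helper: one definition instead of the same message in every estimator.
object OptimizerFailure {
  def message(optimizer: AnyRef): StructuredMessage = {
    val className = optimizer.getClass.getName
    StructuredMessage(s"$className failed.", Map("OPTIMIZER_CLASS_NAME" -> className))
  }
}
```

At each call site the `if (rawCoefficients == null)` branch would then call the helper instead of repeating the interpolation.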
       case e: java.lang.AssertionError =>
    -    logError(s"FAILED for numIterations=$numIterations, learningRate=$learningRate," +
    -      s" subsamplingRate=$subsamplingRate")
    +    logError(log"FAILED for numIterations=${MDC(NUM_ITERATIONS, numIterations)}, " +
Let's create a method to reduce duplicated code in this file.
Done
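For the test file, one way to remove the repeated catch-and-log code is a wrapper that runs a check and logs the hyperparameters only when it fails. A sketch under stated assumptions: `withFailureLogged`, `failedParamsMessage`, `StructuredMessage`, and the `LEARNING_RATE` and `SUBSAMPLING_RATE` key names are hypothetical (only `NUM_ITERATIONS` appears in the diff), and rethrowing after logging is also an assumption about the surrounding test.

```scala
import scala.collection.immutable.ListMap

// Hypothetical stand-in for a structured log message (text plus ordered context).
final case class StructuredMessage(message: String, context: ListMap[String, String])

// Hypothetical helper: builds the "FAILED for ..." message in one place.
def failedParamsMessage(numIterations: Int, learningRate: Double, subsamplingRate: Double): StructuredMessage =
  StructuredMessage(
    s"FAILED for numIterations=$numIterations, learningRate=$learningRate, subsamplingRate=$subsamplingRate",
    ListMap(
      "NUM_ITERATIONS" -> numIterations.toString,
      "LEARNING_RATE" -> learningRate.toString,
      "SUBSAMPLING_RATE" -> subsamplingRate.toString))

// Hypothetical wrapper: runs `body`; on an assertion failure, logs the parameters and rethrows.
def withFailureLogged(numIterations: Int, learningRate: Double, subsamplingRate: Double)
    (logError: StructuredMessage => Unit)(body: => Unit): Unit = {
  try {
    body
  } catch {
    case e: AssertionError =>
      logError(failedParamsMessage(numIterations, learningRate, subsamplingRate))
      throw e
  }
}
```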
@gengliangwang all done.
Thanks, merging to master.
What changes were proposed in this pull request?
The PR aims to migrate `logError` calls with variables in the `MLLib` module to the structured logging framework.
Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.
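In structured logging, each log line carries both the rendered text and a key/value context that log tooling can index. The following minimal sketch illustrates that idea with a simplified `log` string interpolator and `MDC` pair. `StructuredLog`, `Structured` and the lowercase key names are hypothetical and are not Spark's actual classes.

```scala
import scala.collection.immutable.ListMap

// Hypothetical, simplified model of structured log messages (not Spark's API).
object StructuredLog {
  // A key/value pair that is rendered into the text and recorded in the context.
  final case class MDC(key: String, value: Any)

  // The rendered text plus the context map a structured (e.g. JSON) layout would emit.
  final case class Structured(message: String, context: ListMap[String, String]) {
    // Concatenating two messages keeps both the text and the context entries.
    def +(other: Structured): Structured =
      Structured(message + other.message, context ++ other.context)
  }

  // Enables log"..." where every interpolated argument must be an MDC.
  implicit class StructuredInterpolator(sc: StringContext) {
    def log(args: MDC*): Structured = {
      require(sc.parts.length == args.length + 1, "mismatched parts and arguments")
      val rendered = args.map(a => s"${a.value}")
      // parts(0) + value(0) + parts(1) + value(1) + ... + parts(n)
      val text = sc.parts.head + sc.parts.tail.zip(rendered).map { case (part, v) => v + part }.mkString
      Structured(text, ListMap(args.map(_.key).zip(rendered): _*))
    }
  }
}

import StructuredLog._

// Example: the text reads as before, while "range" and "count" become queryable fields.
val numClasses = 3
val entry = log"Labels should be in ${MDC("range", s"[0 to ${numClasses - 1}]")}. " +
  log"Found ${MDC("count", 2)} invalid labels."
```

Because `log` only accepts `MDC` arguments in this sketch, a plain variable cannot be interpolated by accident without a key.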
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No.