Skip to content

Conversation

@asl3
Copy link
Contributor

@asl3 asl3 commented Jun 12, 2024

What changes were proposed in this pull request?

This PR bans logging messages using logInfo, logWarning, logError and containing variables without MDC wrapper

Why are the changes needed?

This makes development and PR review of the structured logging migration easier.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manual test, verified it will throw errors on invalid logging messages.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the BUILD label Jun 12, 2024
@asl3
Copy link
Contributor Author

asl3 commented Jun 12, 2024

</check>

<check customId="logInlineVariable" level="error" class="org.scalastyle.file.RegexChecker" enabled="true">
<parameters><parameter name="regex">log(?:Info|Warning|Error)\(s".*(\$|\+\s*[^\s"]).*"\)</parameter></parameters>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to detect the ".format(..)` as well.
Also, need to exclude the testing code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar rules also need to be added to dev/checkstyle.xml to check Java code

Copy link
Contributor

@LuciferYang LuciferYang Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, need to exclude the testing code.

This might be difficult, as for scalastyle-maven-plugin, there is currently no mechanism similar to Java checkstyle's SuppressionFilter that can ignore certain rules in specific files or code blocks. The rule ignoring of scalastyle is global and cannot be targeted at specific files or code blocks. After defining the check rule, we may only be able to skip the relevant checks in a way similar to the following:

// scalastyle:off logInlineVariable
logInfo(s"....")
// scalastyle:on logInlineVariable

@pan3793
Copy link
Member

pan3793 commented Jun 12, 2024

@gengliangwang It might be aggressive to ban all log{Info|Warning|Error} calls with string at the current stage, especially without SPARK-47688.

I suppose that the values defined in LogKeys are public APIs, and they should not be removed after 4.0.0 is released. Well, the current list might need to polish, for example, ALPHA, BOOT, INIT might be too general to represent the concrete concept, they were exposed because the current implementation requires all substitutable variables MUST be a LogKeys.

Values of LogKeys like *_TIME are inconsistent, some values have the unit but some do not.

@pan3793
Copy link
Member

pan3793 commented Jun 12, 2024

I strongly suggest reconsidering SPARK-47688 and being conservative on defining LogKeys in the first GA version that delivers this awesome feature, and evaluating each new LogKeys carefully in the subsequent versions to avoid breaking public API as much as possible.

also cc @cloud-fan @LuciferYang

@gengliangwang
Copy link
Member

@pan3793 I appreciate the feedback and concerns raised regarding the migration of variables in our new structured logging framework for Apache Spark. I'd like to address these concerns and provide some clarity on our approach.

Why Migrate All Variables?

The primary reason for migrating all variables in our log messages is to avoid ongoing debates about which specific keys should be included as log keys. By disallowing the inlining of variables in log messages and ensuring all variables are explicitly named and migrated, we achieve a consistent and comprehensive logging structure. This approach ensures uniformity across all log entries.

Current Status of Migration

The migration process is nearly complete, which means we've already invested significant effort into ensuring that our log entries adhere to the new framework. Moving forward, as long as we follow the new logging rules, we maintain control over all the variables in our logs. This control is crucial for effective log analysis and troubleshooting.

Managing Variable Overload

To address concerns about the potential overload of variables for users, we have a couple of solutions:

  • Configuration Allow List: We can build a configuration option to pass an allow list of log keys, enabling users to specify which variables should appear in their logs.
  • Partitioned Log Keys: We can partition the log keys so that only essential ones appear by default, while others show up in verbose logging mode. This way, users can choose the level of detail they want to see based on their needs.

Our goal with these changes is to enhance the clarity, consistency, and utility of our logs.

@pan3793
Copy link
Member

pan3793 commented Jun 13, 2024

@gengliangwang thanks for the summary.

To address concerns about the potential overload of variables for users ...

Major concerns come from public API exposure. Spark is conservative for deleting deprecated public API, for example, HiveContext has been marked as deprecated since 2.0 and it still lives. Nearly a thousand LogKeys were added in a short time, and were exposed as public API.

The primary reason for migrating all variables in our log messages is to avoid ongoing debates about which specific keys should be included as log keys.

IMO the debates are valuable, I do think NOT all variables are suitable for the LogKey concept. Additionally, we may need to tune the original logs to adapt to the new structured logging framework. For example, exposing the TASK_ID to all task-specific logs makes it easy to filter out each task's logs.

@gengliangwang
Copy link
Member

gengliangwang commented Jun 13, 2024

Nearly a thousand LogKeys were added in a short time, and were exposed as public API.

LogKey is totally internal. It is under the package org.apache.spark.internal

Additionally, we may need to tune the original logs to adapt to the new structured logging framework. For example, exposing the TASK_ID to all task-specific logs makes it easy to filter out each task's logs.

Yes, we should improve this. But this is orthogonal to what this PR tries to do. With the enforcement of making variables as MDC, we can remind developers to put variables as MDC or verbose MDC in new log entries.
On the other hand, new log entries may not use the new framework, and the project eventually becomes useless.

gengliangwang pushed a commit that referenced this pull request Jun 19, 2024
### What changes were proposed in this pull request?
This PR migrates Scala logging to comply with the scala style changes in [#46979](#46947)

### Why are the changes needed?
This makes development and PR review of the structured logging migration easier.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested by ensuring `dev/scalastyle` checks pass

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46980 from asl3/logging-migrationscala.

Authored-by: Amanda Liu <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
@asl3 asl3 marked this pull request as ready for review June 24, 2024 23:27
val scalaStyleOnCompileConfig: String = {
val in = "scalastyle-config.xml"
val base_style_config = "scalastyle-config.xml"
val prod_style_config = "scalastyle-config-prod.xml"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Is this a convention? Does something named -prod only check the format of a specific directory?
  2. It is necessary to synchronize the modifications to the scalastyle-maven-plugin in the parent pom.xml.

Copy link
Member

@gengliangwang gengliangwang Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here: scalastyle-config-prod.xml is for production code, which means it won't check the style of tests.

But I can't tell the difference of the target files from the PR changes...

@asl3 asl3 marked this pull request as draft July 2, 2024 15:58
gengliangwang pushed a commit that referenced this pull request Jul 9, 2024
### What changes were proposed in this pull request?
This PR makes additional Scala logging migrations to comply with the scala style changes in #46947

### Why are the changes needed?
This makes development and PR review of the structured logging migration easier.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Tested by ensuring dev/scalastyle checks pass

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47256 from asl3/morestructuredloggingmigrations.

Authored-by: Amanda Liu <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
gengliangwang added a commit that referenced this pull request Jul 11, 2024
### What changes were proposed in this pull request?
This PR makes additional Scala logging migrations to comply with the scala style changes in #46947

### Why are the changes needed?
This makes development and PR review of the structured logging migration easier.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Tested by ensuring dev/scalastyle checks pass

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47275 from asl3/formatstructuredlogmigrations.

Lead-authored-by: Amanda Liu <[email protected]>
Co-authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
gengliangwang pushed a commit that referenced this pull request Jul 18, 2024
### What changes were proposed in this pull request?
This PR migrates `src/main/scala/org/apache/spark/util/logging/FileAppender.scala` to comply with the scala style changes in #46947

### Why are the changes needed?
This makes development and PR review of the structured logging migration easier.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Tested by ensuring dev/scalastyle checks pass

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47394 from asl3/asl3/migratenewfiles.

Authored-by: Amanda Liu <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
### What changes were proposed in this pull request?
This PR makes additional Scala logging migrations to comply with the scala style changes in apache#46947

### Why are the changes needed?
This makes development and PR review of the structured logging migration easier.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Tested by ensuring dev/scalastyle checks pass

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47256 from asl3/morestructuredloggingmigrations.

Authored-by: Amanda Liu <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
### What changes were proposed in this pull request?
This PR makes additional Scala logging migrations to comply with the scala style changes in apache#46947

### Why are the changes needed?
This makes development and PR review of the structured logging migration easier.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Tested by ensuring dev/scalastyle checks pass

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47275 from asl3/formatstructuredlogmigrations.

Lead-authored-by: Amanda Liu <[email protected]>
Co-authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
### What changes were proposed in this pull request?
This PR migrates `src/main/scala/org/apache/spark/util/logging/FileAppender.scala` to comply with the scala style changes in apache#46947

### Why are the changes needed?
This makes development and PR review of the structured logging migration easier.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Tested by ensuring dev/scalastyle checks pass

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47394 from asl3/asl3/migratenewfiles.

Authored-by: Amanda Liu <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
@gengliangwang
Copy link
Member

I am closing this one since we already merged #47239

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants