Skip to content

Conversation

@Blackbaud-JasonBodnar
Copy link

Description

Lower the levels of two logging statements. According to the Log4J docs WARN should be used for "an event that might possible lead to an error". Starting up a client is not an event that could lead to an error.

All SDK Contribution checklist:

  • [ X ] The pull request does not introduce [breaking changes]
  • [ X ] CHANGELOG is updated for new features, bug fixes or other significant changes.
  • [ X ] I have read the contribution guidelines.

General Guidelines and Best Practices

  • [ X ] Title of the pull request is clear and informative.
  • [ X ] There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • [ NA ] Pull request includes test coverage for the included changes.

@github-actions
Copy link
Contributor

Thank you for your contribution @Blackbaud-JasonBodnar! We will review the pull request and get back to you soon.

@github-actions github-actions bot added Community Contribution Community members are working on the issue Cosmos customer-reported Issues that are reported by GitHub users external to the Azure organization. labels Jun 26, 2024
@kushagraThapar
Copy link
Member

Thanks @Blackbaud-JasonBodnar for this PR, while it makes sense to treat this log as info, however, I think the reason this was put as warn is because of the importance of client creation event. We have seen customers switching off the info logs in their application and ends up creating multiple cosmos clients without noticing it. That's why this was changed from info to warn here - #39169

@FabianMeiswinkel please provide more information on this change if the above doesn't make sense!

@Blackbaud-JasonBodnar
Copy link
Author

Thanks @Blackbaud-JasonBodnar for this PR, while it makes sense to treat this log as info, however, I think the reason this was put as warn is because of the importance of client creation event. We have seen customers switching off the info logs in their application and ends up creating multiple cosmos clients without noticing it. That's why this was changed from info to warn here - #39169

@FabianMeiswinkel please provide more information on this change if the above doesn't make sense!

But you are breaking standards. WARN should only be used for potential problems. If there is an issue with multiple clients being created then that should be logged as WARN.

Organizations take WARN and ERROR messages very seriously often alerting employees when too many are seen during a time window. My organization reviews our WARN and ERROR messages every day and these messages just obscure real messages that need to looked into.

Looking at the commit you refer to I see in one file:

  private def logInfoOrWarning(msg: => String): Unit = {
    if (this.verboseLoggingAfterReEnqueueingRetriesEnabled.get()) {
      log.logWarning(msg)
    } else {
      log.logInfo(msg)
    }
  }

Why wasn't something similar used for the CosmosClientBuilder? Or better yet, log it at the appropriate INFO or DEBUG level and when there is an issue tell customers to set their logging appropriately so they can see these messages. That's why we have different log levels and it is so easy to change the level of what is logged for each instance.

@Blackbaud-JasonBodnar
Copy link
Author

The other issue with logging these as WARN is that costs us money. We retain INFO, WARN and ERROR logs for 90 days. Other levels are only retained for a week. We're deployed in 4 zones in Azure with at least two nodes in each zone. If we release 4 times a day that's a minimum of 64 messages of useless information a day that we're now keeping for 90 days vs 7 days.

@kushagraThapar
Copy link
Member

@Blackbaud-JasonBodnar - we understand the concern, however, this specific log level is very important for our debugging purposes, because customers unintentionally create multiple clients, and sometimes not log info messages.

That being said, for your use case, we can change these log levels behind an environment flag, which will allow our customers to opt-out of these log levels as default warn and log them as info.
Would you be interested to make that follow up change in this PR? You can check our Configs.java file which contains all the environment variables and how they are mapped.

@Blackbaud-JasonBodnar
Copy link
Author

@kushagraThapar : What percentage of your customers are unintentionally creating multiple clients? And what percentage of them are not seeing INFO level logs? And what percentage of them are then contacting you? And what do you tell them? Check your logs? Could you not tell them "Make sure you are logging INFO level messages and then check your logs?"

My point is it seems like you're forcing the burden of a small number of customers onto the rest of your customers.

Sure, I can change the PR so this is configurable that's like asking people to opt out of spam when you really should be asking people to opt into spam if they really want or need it.

@kushagraThapar
Copy link
Member

@Blackbaud-JasonBodnar this is not just limited to the number of clients (which more often than not - I would say majority of customers create it :)) - but this is also about the information that is logged as part of this log - which contains many important aspects like tcp info, preferred regions, and many other important things that customers should consider while building their applications because this changes the behavior of their applications.

It is not about just seeing this log or ignoring it, rather a proactive warning showing what all configurations you are using to build the client which many customers seem to ignore.

If it helps, you can turn off the warning logging for these 2 classes in your application log4j properties as well but logging them as warning is important for the SDK use case and business.

@Blackbaud-JasonBodnar
Copy link
Author

@kushagraThapar : What you just described is why they should be logged as INFO.

INFO Designates informational messages that highlight the progress of the application at coarse-grained level.
WARN Designates potentially harmful situations.

If it helps, you can turn off the warning logging for these 2 classes in your application log4j properties as well but logging them as warning is important for the SDK use case and business.

Yes, but what happens when these classes start logging real warnings that really need attention? I don't see them. That's why you have different logging levels and why WARN is used for warnings and INFO is used for providing information about connections and such.

Why don't you log a warning (or even error) when a second client is created? You could even include text informing them to enable INFO logs to see all client creation.

@github-actions
Copy link
Contributor

Hi @Blackbaud-JasonBodnar. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

@github-actions github-actions bot added the no-recent-activity There has been no recent activity on this issue. label Sep 20, 2024
@Blackbaud-JasonBodnar
Copy link
Author

Hi @Blackbaud-JasonBodnar. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

Still active. Waiting for it to be merged.

@github-actions github-actions bot removed the no-recent-activity There has been no recent activity on this issue. label Sep 20, 2024
@dvanackere-lpg
Copy link

I completely agree with @Blackbaud-JasonBodnar. We closely monitor all warnings and errors in our production environment and those logs generates false positives which is very annoying. Is there a reason why it is sill not merged?

@kushagraThapar
Copy link
Member

@Blackbaud-JasonBodnar @dvanackere-lpg I understand your use case and concerns about logging these as WARN messages. However, unfortunately we can't make this change as it is, reason being whenever a customer reaches out to us as a SEV-A ticket with any issues they are seeing, we really need to have access to these couple of logs as they show how the client was initialized with all the config and other important settings. We cannot ask the customer to enable this log (if it was INFO) and ask them to come back again if the issue happens again. We have to honor our supportability and customer trust.

That being said, I will make a change where we can expose a JVM property which can log this in INFO if you would like, however, a general note, this log is of utmost importance and should not be ignored. Will keep you posted with my change, and it should unblock your use case.

@jcbodnar
Copy link

jcbodnar commented Nov 6, 2024

Is it not possible to log these only when there's an error -- IE multiple clients are created in the same process? For those of us doing things correctly why should we have to see this log?

@kushagraThapar
Copy link
Member

@jcbodnar to provide more clarity on this, creating multiple clients is not an error, rather a practice against recommendation. It is not entirely an issue, if your compute can handle multiple Cosmos Clients in the same JVM, however, we recommend against that.

Another thing, this particular log prints out details that are required even if there is only a single cosmos client, for example machineId, Correlation Id, and other internal details like nativeTransport, fastClientOpen, etc. which are very critical for debugging if an issue arises in any condition like network issues, intermittent availability hits, etc.

Imagine your production system is down, and I ask you to re-enable this log and reach out to me next time when your production is down again, how frustrating that is for any customer :)

That being said, adding a JVM flag and hiding this log warn behind that flag should solve your issue, or do you have any other suggestions? I am open to feedback.

@github-actions
Copy link
Contributor

Hi @Blackbaud-JasonBodnar. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

@github-actions github-actions bot added the no-recent-activity There has been no recent activity on this issue. label Jan 10, 2025
@github-actions
Copy link
Contributor

Hi @Blackbaud-JasonBodnar. Thank you for your contribution. Since there hasn't been recent engagement, we're going to close this out. Feel free to respond with a comment containing /reopen if you'd like to continue working on these changes. Please be sure to use the command to reopen or remove the no-recent-activity label; otherwise, this is likely to be closed again with the next cleanup pass.

@github-actions github-actions bot closed this Jan 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Community Contribution Community members are working on the issue Cosmos customer-reported Issues that are reported by GitHub users external to the Azure organization. no-recent-activity There has been no recent activity on this issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants