fix: log swallowed exceptions in core server components by SoftFx · Pull Request #1023 · SoftFx/Hierarchical-Sensor-Monitoring

SoftFx · 2026-03-23T15:53:45Z

Summary

Three catch blocks in the server core were silently swallowing exceptions with no logging, making production issues impossible to diagnose.

TreeValuesCache — MigrateDatabaseV2: logs sensorId + exception when value deserialization fails, instead of silently continue-ing
AlertScheduleProvider — LoadSchedulesFromDb: logs parse error with schedule Id/Name when _parser.Parse() throws, before falling back to raw entity
TreeStateSnapshot — constructor: added NLog logger, logs deserialization error when state snapshot fails to load instead of silently setting HasData = false

Impact

Previously these failures were completely invisible in logs. Now operators can see exactly which sensors/schedules are corrupted and investigate.

🤖 Generated with Claude Code

Three silent catch blocks were hiding errors with no diagnostic info: - TreeValuesCache.MigrateDatabaseV2: log sensorId when value deserialization fails during V2 migration instead of silently skipping the record - AlertScheduleProvider.LoadSchedulesFromDb: log parse error and entity id/name when schedule parsing fails and fallback to raw entity is used - TreeStateSnapshot ctor: add NLog logger and log deserialization error when state snapshot fails to load, instead of silently setting HasData=false Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-23T15:55:29Z

PR Review: fix/log-swallowed-exceptions

Overall assessment: Clean, focused, and correct fix. The three catch-all blocks were genuine observability gaps and the logging additions are well-targeted with useful context (IDs, names). No bugs or security issues introduced.

What's good

Each log message includes actionable identifiers (entity.Id, entity.Name, sensorId) so operators can correlate log entries with specific data.
Existing behavior (fallback / continue / HasData = false) is preserved unchanged — safe, low-risk fix.
TreeStateSnapshot correctly uses a static readonly Logger on a sealed class.

Issues

Minor — Logger style inconsistency

TreeStateSnapshot adds:

private static readonly Logger _logger = LogManager.GetCurrentClassLogger();

But TreeValuesCache uses:

private readonly Logger _logger = LogManager.GetLogger(nameof(TreeValuesCache));

And AlertScheduleProvider uses an instance (not static) field with GetCurrentClassLogger(). All three approaches work correctly with NLog, but using a consistent pattern across the codebase reduces confusion. static readonly is the idiomatic NLog recommendation.

Minor — `using` directive ordering in `TreeStateSnapshot.cs`

The new using NLog; and using System; are inserted after the HSMDatabase.* project usings:

using HSMDatabase.AccessManager;
using HSMDatabase.AccessManager.DatabaseEntities.SnapshotEntity;
using HSMServer.Core.DataLayer;
using NLog;        // ← added here
using System;      // ← added here
using System.Threading.Tasks;

Convention (and StyleCop/Roslyn defaults) place System.* and framework namespaces before third-party and project namespaces. Suggest:

using System;
using System.Threading.Tasks;
using HSMDatabase.AccessManager;
using HSMDatabase.AccessManager.DatabaseEntities.SnapshotEntity;
using NLog;
using HSMServer.Core.DataLayer;
using HSMServer.Core.TreeStateSnapshot.States;

Worth noting — pre-existing fragile fallback in `AlertScheduleProvider`

This PR didn't introduce it, but the catch block in LoadSchedulesFromDatabase is now more visible:

catch (Exception ex)
{
    _logger.Error(ex, $"Failed to parse alert schedule. Id = {entity.Id}, Name = {entity.Name}. Using raw entity as fallback.");
    schedule = new AlertSchedule()
    {
        Id = new Guid(entity.Id),   // throws if entity.Id is null or malformed
        ...
    };
}

If entity.Id is null or not a valid GUID, new Guid(entity.Id) will throw a second unhandled exception that propagates out of the foreach, aborting loading of all remaining schedules. Since this PR adds logging just above it, it's now a good moment to also wrap the fallback in a nested try/catch or validate entity.Id first. Not a blocker for this PR, but worth a follow-up issue.

Summary

Category	Finding	Severity
Logger style	Inconsistent `static` vs instance field across three files	Nit
`using` order	`System` and `NLog` usings out of conventional order	Nit
Pre-existing	Fallback in `AlertScheduleProvider` can throw again on bad `entity.Id`	Follow-up

No blockers. Approve with the above nits noted.

…atabaseV2 AlertScheduleProvider: skip unparseable schedules instead of adding them to cache as broken objects. A partially-constructed AlertSchedule with no parsed working hours has unpredictable IsWorkingTime behavior - skipping is safer than a fallback that silently misfires alerts. TreeValuesCache.MigrateDatabaseV2: add success/failed counters and log migration summary so data loss during migration is visible in logs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-23T15:58:38Z

PR Review: fix/log-swallowed-exceptions

Overall direction is correct — silently swallowed exceptions are genuinely harmful for observability. However there are a few issues worth addressing before merging.

1. Breaking behavioral change in `AlertScheduleProvider.cs` (Bug)

File: src/server/HSMServer.Core/AlertSchedule/AlertScheduleProvider.cs

The old code caught parse failures and fell back to constructing a minimal AlertSchedule from raw entity data (Id, Name, Timezone, Schedule), so the schedule was still usable even if parsing failed. The new code logs and continues — the schedule is skipped entirely.

This is a significant semantic change: any schedule that previously survived a parse error via the fallback path will now simply not exist in _cache. Downstream code that looks up a schedule by ID will get a miss instead of a degraded-but-functional entry.

The PR description says "before falling back to raw entity" but the fallback has actually been removed, which makes the description inaccurate.

Recommendation: Either keep the fallback (add the log and keep the fallback object creation), or explicitly justify why dropping the schedule is safer than the fallback. If the fallback object was itself causing problems (e.g. invalid state that crashes later), document that reasoning.

// Consider: log AND fall back, rather than log AND skip
catch (Exception ex)
{
    _logger.Error(ex, $"Failed to parse alert schedule. Id = {entity.Id}, Name = {entity.Name}. Using raw entity as fallback.");
    schedule = new AlertSchedule()
    {
        Id = new Guid(entity.Id),
        Name = entity.Name,
        Timezone = entity.Timezone,
        Schedule = entity.Schedule,
    };
}

2. Inaccurate `success` counter in `MigrateDatabseV2` (Minor Bug)

File: src/server/HSMServer.Core/Cache/TreeValuesCache.cs

success++ is placed after the if (!policy.TimeIsUp(val.Time)) block, so it counts every record that deserialized successfully — including records not actually written to the database because TimeIsUp was true (value expired).

The log message says "{success} records migrated" which implies they were stored, but expired records are counted too.

Additionally, records where GetSensor(sensorId) returns null are silently skipped with no logging and no counter. These could represent a meaningful data loss during migration.

Recommendation:

if (!policy.TimeIsUp(val.Time))
{
    _database.AddSensorValue(sensorId, val);
    success++;   // only count what was actually stored
}
// else: optionally log skipped-as-expired records

And consider a counter or log for null-sensor records:

var sensor = GetSensor(sensorId);
if (sensor == null)
{
    _logger.Warn($"Sensor not found during V2 migration, skipping. SensorId = {sensorId}");
    failed++;
    continue;
}

3. Misleading log message in `TreeStateSnapshot.cs` (Nit)

File: src/server/HSMServer.Core/TreeStateSnapshot/TreeStateSnapshot.cs

"Starting with empty state" could be misleading if part of the snapshot was processed before the exception occurred (e.g., some sensors or keys already populated). A more precise message like "Snapshot deserialization failed, discarding partial state" would better reflect what actually happens.

Summary

Severity	File	Issue
Medium	`AlertScheduleProvider.cs`	Fallback silently removed — schedules now dropped on parse error instead of degraded-but-loaded
Low	`TreeValuesCache.cs`	`success` counter over-counts (includes expired/not-stored records); null-sensor records unaccounted
Nit	`TreeStateSnapshot.cs`	Log message may be slightly misleading

The logging additions themselves are the right call. The TreeStateSnapshot change is clean. The main concern is the AlertScheduleProvider behavioral change which could manifest as missing alert schedules in production.

github-actions · 2026-03-25T12:17:19Z

PR Review: fix/log-swallowed-exceptions

Overall direction is correct — silent catch blocks are a real observability problem. However, there are a few bugs and a behavioral regression introduced by the fix.

Bug: `entity.Id` logs as `"System.Byte[]"` — `AlertScheduleProvider.cs:138`

AlertScheduleEntity.Id is byte[] (see HSMDatabase.AccessManager/DatabaseEntities/AlertScheduleEntity.cs). String-interpolating it directly produces "System.Byte[]" in the log message, which is useless for diagnosis.

The original code correctly converted it: new Guid(entity.Id).

Current (broken):

_logger.Error(ex, $"Failed to parse alert schedule. Id = {entity.Id}, Name = {entity.Name}. ...");

Fix:

_logger.Error(ex, $"Failed to parse alert schedule. Id = {new Guid(entity.Id)}, Name = {entity.Name}. ...");

Behavioral regression: corrupt schedule is now silently dropped — `AlertScheduleProvider.cs:139`

The old code fell back to a stub AlertSchedule populated from the raw entity fields, keeping the schedule ID present in _cache. The new code does continue, removing the entry entirely.

This is a breaking behavior change: any downstream code that looks up a schedule by ID (e.g., when evaluating alert policies) will now get a cache miss for a schedule that previously existed. Depending on how callers handle null or missing cache entries, this could surface as NullReferenceExceptions or silently broken alerting for sensors using that schedule.

The PR description still says "before falling back to raw entity," but the implementation no longer does that.

If skipping is truly the intended behavior, callers that reference this schedule ID need to be audited for how they handle missing entries.

`success` counter is misleading — `TreeValuesCache.cs:1869`

success++ increments for every record where the sensor was found and deserialization succeeded, including records filtered out by policy.TimeIsUp(val.Time) (i.e., records that were intentionally not written to the database). The summary log says "records migrated" which implies they were written.

Additionally, records where GetSensor(sensorId) returns null are silently skipped — they're counted in neither success nor failed. The migration summary gives no indication of how many records were skipped this way.

Minor, but the counter name and log message should either reflect "successfully processed" instead of "migrated," or the counter should only increment after AddSensorValue is called.

`AddSensorValue` exceptions are unhandled — `TreeValuesCache.cs:1866`

The PR adds a catch around ConvertFromJson, but _database.AddSensorValue(sensorId, val) at line 1866 remains unguarded. If it throws, the entire migration aborts mid-loop with no summary log, and any partial progress is lost with no indication of where it stopped. This was pre-existing, but in the context of a PR that's hardening this method, it's worth addressing.

`TreeStateSnapshot.cs` — looks good

Adding the logger and logging the deserialization error before setting HasData = false is correct and clean. No issues here.

Merge branch 'master' into fix/log-swallowed-exceptions

6900495

GonzoSpire merged commit b86fef0 into master Mar 26, 2026
1 check passed

GonzoSpire deleted the fix/log-swallowed-exceptions branch March 26, 2026 09:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: log swallowed exceptions in core server components#1023

fix: log swallowed exceptions in core server components#1023
GonzoSpire merged 3 commits intomasterfrom
fix/log-swallowed-exceptions

SoftFx commented Mar 23, 2026

Uh oh!

github-actions bot commented Mar 23, 2026

Uh oh!

github-actions bot commented Mar 23, 2026

Uh oh!

github-actions bot commented Mar 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

SoftFx commented Mar 23, 2026

Summary

Impact

Uh oh!

github-actions bot commented Mar 23, 2026

PR Review: fix/log-swallowed-exceptions

What's good

Issues

Minor — Logger style inconsistency

Minor — using directive ordering in TreeStateSnapshot.cs

Worth noting — pre-existing fragile fallback in AlertScheduleProvider

Summary

Uh oh!

github-actions bot commented Mar 23, 2026

PR Review: fix/log-swallowed-exceptions

1. Breaking behavioral change in AlertScheduleProvider.cs (Bug)

2. Inaccurate success counter in MigrateDatabseV2 (Minor Bug)

3. Misleading log message in TreeStateSnapshot.cs (Nit)

Summary

Uh oh!

github-actions bot commented Mar 25, 2026

PR Review: fix/log-swallowed-exceptions

Bug: entity.Id logs as "System.Byte[]" — AlertScheduleProvider.cs:138

Behavioral regression: corrupt schedule is now silently dropped — AlertScheduleProvider.cs:139

success counter is misleading — TreeValuesCache.cs:1869

AddSensorValue exceptions are unhandled — TreeValuesCache.cs:1866

TreeStateSnapshot.cs — looks good

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Minor — `using` directive ordering in `TreeStateSnapshot.cs`

Worth noting — pre-existing fragile fallback in `AlertScheduleProvider`

1. Breaking behavioral change in `AlertScheduleProvider.cs` (Bug)

2. Inaccurate `success` counter in `MigrateDatabseV2` (Minor Bug)

3. Misleading log message in `TreeStateSnapshot.cs` (Nit)

Bug: `entity.Id` logs as `"System.Byte[]"` — `AlertScheduleProvider.cs:138`

Behavioral regression: corrupt schedule is now silently dropped — `AlertScheduleProvider.cs:139`

`success` counter is misleading — `TreeValuesCache.cs:1869`

`AddSensorValue` exceptions are unhandled — `TreeValuesCache.cs:1866`

`TreeStateSnapshot.cs` — looks good