Merged
Changes from 1 commit
Refactor risk/notable query to dispatch search sid
xqi-splunk committed Jun 23, 2025
commit 1349ff0c1ccae6b54aa562e48d14636ef4625e1e
31 changes: 23 additions & 8 deletions contentctl/objects/correlation_search.py
@@ -218,6 +218,9 @@ class CorrelationSearch(BaseModel):
# cleanup of this index
test_index: str | None = Field(default=None, min_length=1)

# The search ID of the last dispatched search; this is used to query for risk/notable events
sid: str | None = Field(default=None)

# The logger to use (logs all go to a null pipe unless ENABLE_LOGGING is set to True, so as not
# to conflict w/ tqdm)
logger: logging.Logger = Field(
@@ -473,6 +476,8 @@ def dispatch(self) -> splunklib.Job:
f"Job {job.sid} has finished running in {time_to_execute} seconds."
)

self.sid = job.sid

return job # type: ignore
except HTTPError as e:
raise ServerError(
@@ -581,10 +586,15 @@ def get_risk_events(self, force_update: bool = False) -> list[RiskEvent]:

# TODO (#248): Refactor risk/notable querying to pin to a single savedsearch ID
# Search for all risk events from a single scheduled search (indicated by orig_sid)
query = (
f'search index=risk search_name="{self.name}" [search index=risk search '
f'search_name="{self.name}" | tail 1 | fields orig_sid] | tojson'
)
if self.sid is None:
Contributor
What is the use case of maintaining this code path? Do we expect to be invoking get_risk_events without a successful dispatch returning an SID?

Collaborator
In lines 1029 to 1040, there is code that makes sure the indexes are currently empty and that the detection is starting from a disabled state, by calling self.risk_event_exists().

These validations happen before dispatch(), so self.sid is always None in this case. If we drop the previous old query, the new query would always return empty and might give a false positive.

Contributor
Ahh I see. In that case I have two thoughts:

  1. Do we still need this cleanup check at the start of every test? This was very necessary when this was first implemented, as we did no filtering on SIDs of any kind, or even on the search name. We just said that the indexes needed to start empty, and if anything ended up in them, it was a success (so cleaning up beforehand was essential). Now, we search on both the detection name and the dispatched SID, so the cleanup prior to testing becomes less relevant. What do we think about dropping it? @xqi-splunk @pyth0n1c

  2. If we want to keep the cleanup prior to testing, then rework risk_event_exists() s.t. it takes a flag indicating whether we're checking for any risk/notable events related to this search, or for those specific to the SID. e.g. a param like for_sid which defaults to True. If for_sid is True and the SID is still None, it should throw an error. If for_sid is False, it should just search for any events matching the detection (no need for the subquery filtering).

Thoughts?

Contributor Author
Since we are filtering by SID (and on top of that the search name) I do not think this is required. We do run the post cleanup here as well:

Given that we are presently relying on the caller to do Attack Data Cleanup, I think this change is okay, and it is a small speed optimization of a second or two per test due to running less SPL (having this cleanup function remove attack data would cause issues here anyway if we passed test_index as a value other than None).

tl;dr - I would remove the cleanup check at the start of every test.

Collaborator
For the second point, I wonder what the use case of for_sid set to False is. Since we allow a maximum of 3 retries, I feel it might be a bit risky to simply search for any events matching the detection; that might produce duplicated risk/notable objects. Suppose we hit an extreme case where the risk/notable objects aren't generated on the first attempt but are somehow generated during the second attempt.

Contributor
For option 2, I was only suggesting that bool field in case we still wanted to do pre-test cleanup. But I agree with Eric for all the reasons mentioned. I would remove this code path (or raise an error when self.sid is None) and remove the pre-test cleanup.

Collaborator
Removed the cleanup check and removed the if self.sid is None: code path.

# query for validating detection is starting from a disabled state
query = (
f'search index=risk search_name="{self.name}" [search index=risk search '
f'search_name="{self.name}" | tail 1 | fields orig_sid] | tojson'
)
else:
# query after the detection has been enabled and dispatched
query = f'search index=risk search_name="{self.name}" orig_sid="{self.sid}" | tojson'
result_iterator = self._search(query)

# Iterate over the events, storing them in a list and checking for any errors
@@ -657,10 +667,15 @@ def get_notable_events(self, force_update: bool = False) -> list[NotableEvent]:
return self._notable_events

# Search for all notable events from a single scheduled search (indicated by orig_sid)
query = (
f'search index=notable search_name="{self.name}" [search index=notable search '
f'search_name="{self.name}" | tail 1 | fields orig_sid] | tojson'
)
if self.sid is None:
Contributor
Same question as above

Collaborator
Same reason as above.

Contributor
See my comment above

Collaborator
Removed the if self.sid is None: code path.

# query for validating detection is starting from a disabled state
query = (
f'search index=notable search_name="{self.name}" [search index=notable search '
f'search_name="{self.name}" | tail 1 | fields orig_sid] | tojson'
)
else:
# query after the detection has been enabled and dispatched
query = f'search index=notable search_name="{self.name}" orig_sid="{self.sid}" | tojson'
result_iterator = self._search(query)

# Iterate over the events, storing them in a list and checking for any errors