Linux Target pids should work with existing and future tasks within processes #209
Conversation
ProcFS on Linux allows us to enumerate processes and tasks; however, we currently only offer process enumeration. Add iter_proc_tasks() so callers can iterate over all the tasks within a process. Signed-off-by: Beau Belgrave <[email protected]>
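As a rough illustration of what task enumeration over ProcFS involves, here is a minimal sketch using a hypothetical standalone helper (the crate's actual iter_proc_tasks() signature may differ):

```rust
use std::fs;
use std::io;

/// List the task (thread) IDs of a process by reading /proc/<pid>/task.
/// Hypothetical helper for illustration; not the real iter_proc_tasks() API.
fn proc_task_ids(pid: u32) -> io::Result<Vec<u32>> {
    let mut tids = Vec::new();
    for entry in fs::read_dir(format!("/proc/{}/task", pid))? {
        let entry = entry?;
        // Each directory entry under /proc/<pid>/task is named after a TID.
        if let Some(tid) = entry.file_name().to_str().and_then(|s| s.parse().ok()) {
            tids.push(tid);
        }
    }
    Ok(tids)
}
```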
The perf_event_open() syscall can target specific tasks, but it will not automatically track new tasks, nor will it enumerate the existing tasks. Add methods to enumerate the current tasks for target_pids and include these when enabling ring buffers. Add the inherit flag to tell perf_event_open() to carry the perf_event settings through to child tasks. Signed-off-by: Beau Belgrave <[email protected]>
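To make the inherit behavior concrete, here is a minimal sketch of opening a per-task event with inherit set, assuming the perf-event-open-sys crate for bindings (one-collect has its own plumbing, and the real code also redirects output to a leader ring buffer via group_fd; this standalone example uses a dummy software event):

```rust
use perf_event_open_sys as sys;

// Open a perf event targeting one task on one CPU. With `inherit` set, the
// kernel applies the same attributes to children the task later creates via
// fork()/clone(), which is what keeps future threads covered.
fn open_inherited_event(pid: libc::pid_t, cpu: libc::c_int) -> std::io::Result<libc::c_int> {
    let mut attrs = sys::bindings::perf_event_attr::default();
    attrs.size = std::mem::size_of::<sys::bindings::perf_event_attr>() as u32;
    attrs.type_ = sys::bindings::PERF_TYPE_SOFTWARE;
    attrs.config = sys::bindings::PERF_COUNT_SW_DUMMY as u64;
    attrs.set_disabled(1); // enable later, once everything is wired up
    attrs.set_inherit(1);  // carry settings through to child tasks

    // group_fd = -1 here; the PR instead passes a leader fd so events from
    // these tasks land in the leader's ring buffer.
    let fd = unsafe { sys::perf_event_open(&mut attrs, pid, cpu, -1, 0) };
    if fd < 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(fd)
}
```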
brianrob left a comment:
LGTM. A couple of questions below.
```rust
/* Find all unique tasks IDs */
for pid in pids.drain(..) {
    tasks.insert(pid);
}
```
brianrob:
If we insert pid here, then we're going to end up with duplicate pids when this function returns, no?
Beau Belgrave (author):
tasks is a HashSet, so we will only keep unique task IDs. I wanted to ensure we did not have duplicates: for example, if /proc/<pid>/task is given a task rather than a process, it will also include parent tasks. The HashSet prevents duplicates, and pids is drained fully, so we end up with a unique set after this function (pids drains into the HashSet, which then repopulates the drained pids vec).
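In isolation, the drain-then-repopulate pattern looks like this (a minimal sketch, not the exact code from the PR):

```rust
use std::collections::HashSet;

// Drain `pids` into a HashSet so only unique task IDs remain, then move
// them back into the Vec. Order is not preserved, but duplicates are gone.
fn dedup_pids(pids: &mut Vec<i32>) {
    let tasks: HashSet<i32> = pids.drain(..).collect();
    pids.extend(tasks);
}
```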
brianrob:
You're right - I missed that it calls drain to empty them out.
```rust
    &self.leader_ids,
    &mut self.ring_bufs,
    &common,
    None)?;
```
brianrob:
I see that we're not keeping track of the fds that get created here since we're not passing a Vec<PerfDataFile> to add_cpu_bufs. Presumably this is because we're redirecting these events to the leader buffers that we actually look at. When a fork/clone operation occurs, does this redirection automatically happen since we're setting the inherit flag? I'm assuming so, since that's the only thing I can see that would allow these events to flow from new tasks created after we call build.
Beau Belgrave (author):
Yep, inherit will do the redirection within the kernel for fork/clone. Unfortunately, we have to keep the FD in memory for perf_events to keep the redirected event active (I wish this wasn't the case), so we end up with a lot of FDs. Ideally, we'd only need one FD per CPU and have perf_event ref-count the redirected ring buffers, but that's not how it currently works. I'll be poking around with the tracing kernel folks to see if we can make this better over time.
brianrob left a comment:
LGTM. Thanks!
With microsoft/one-collect#209, we now allow specifying a particular process to be traced.
Closes #192.
perf_event_open() requires an FD per PID + CPU + tracepoint combination, which results in a lot of FDs, so this change also raises the default rlimit for open files to the maximum to ensure we can have as many files open as possible.
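A sketch of what raising the limit involves, assuming the libc crate (the PR's actual implementation may differ):

```rust
// Raise the soft RLIMIT_NOFILE limit to the hard maximum so the process can
// hold one FD per PID/CPU/tracepoint combination.
fn raise_nofile_limit() -> std::io::Result<()> {
    unsafe {
        let mut rl: libc::rlimit = std::mem::zeroed();
        if libc::getrlimit(libc::RLIMIT_NOFILE, &mut rl) != 0 {
            return Err(std::io::Error::last_os_error());
        }
        rl.rlim_cur = rl.rlim_max; // soft limit up to the hard limit
        if libc::setrlimit(libc::RLIMIT_NOFILE, &rl) != 0 {
            return Err(std::io::Error::last_os_error());
        }
    }
    Ok(())
}
```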
I've tested scenarios where existing threads are already running, and we get data back for them correctly.
I've also tested starting collection on a process without any extra threads and then having new threads spawn.
In both scenarios, the new threads' data shows up via nettrace within PerfView.