Draft
32 commits
2929949
feat(k8s): add eventInformer to podTracker
cognifloyd Apr 28, 2022
be144f4
feat(k8s): ignore event deletion
cognifloyd Apr 28, 2022
7602600
feat(k8s): begin handling event stream
cognifloyd Apr 28, 2022
457a664
refactor: create eventInformer from eventInformerFactory
cognifloyd Apr 29, 2022
7d674e8
refactor: rename selector=>labelSelector
cognifloyd Apr 29, 2022
71c9b34
enhance: register eventInformerFactory on podTracker
cognifloyd Apr 29, 2022
c3e4b50
enhance: add podTracker.inspectContainerEvent
cognifloyd Apr 29, 2022
3e89df6
enhance: add signal for running container
cognifloyd Apr 29, 2022
8ba7ad5
enhance: only mark containers as running/terminated if it is the corr…
cognifloyd Apr 29, 2022
038bf97
enhance(k8s): exit WaitContainer if build is canceled
cognifloyd Apr 29, 2022
b5ab8eb
enhance(k8s): add containerTracker.Events() function
cognifloyd Apr 29, 2022
57011c1
tests: fix inspectContainerStatuses test
cognifloyd Apr 29, 2022
639b219
bugfix(k8s): Make sure image pull errors are detected
cognifloyd Apr 29, 2022
092840b
refactor(k8s): use channels to signal imagePull success/errors
cognifloyd Apr 29, 2022
e3d6d93
fix: comment typos
cognifloyd Apr 29, 2022
aa1078e
enhance: capture ImagePull errors from ContainerStatuses as well
cognifloyd Apr 29, 2022
9c64abd
enhance(k8s): handle more image pull event types
cognifloyd Apr 29, 2022
9686c23
tests(k8s): fix tests for RunContainer
cognifloyd May 3, 2022
dd189e0
tests(k8s): test RunContainer and WaitContainer with canceled build
cognifloyd May 3, 2022
ca14e32
tests(k8s): test AssembleBuild with canceled build
cognifloyd May 4, 2022
2a6d109
tests(k8s): test RunContainer with PodTracker failure (increase cover…
cognifloyd May 4, 2022
08c219c
tests(k8s): test inspectContainerStatuses with Running or ImagePullError
cognifloyd May 4, 2022
40da3f8
chore: prune some comments
cognifloyd May 4, 2022
4726072
tests: fix inspectContainerStatuses test with an errgroup
cognifloyd May 4, 2022
0d2f315
tests: test RunContainer with an ImagePullError
cognifloyd May 4, 2022
ccf9fb2
tests: test getTrackedPodEvent
cognifloyd May 4, 2022
f76339d
tests: test podTracker.HandleEventAdd, podTracker.HandleEventUpdate
cognifloyd May 4, 2022
e6876d1
refactor: drop unused Events function
cognifloyd May 4, 2022
ecceeac
tests: test inspectContainerEvent image pull events
cognifloyd May 4, 2022
b945f33
tests: test inspectContainerEvent edge cases
cognifloyd May 4, 2022
9ec81fa
tests: test more pull policies in SetupContainer
cognifloyd May 4, 2022
65cbe59
chore: delete dead code.
cognifloyd May 5, 2022
enhance(k8s): handle more image pull event types
I've documented a bunch of other event reasons to facilitate
expanded event handling in the future.
Also, I keep running across these messages, so this helps
me remember what they are and where they come from.
cognifloyd committed May 4, 2022
commit 9c64abd4f11e77db7b10c4ecadc7f5c4dcdda4c9
50 changes: 41 additions & 9 deletions runtime/kubernetes/container.go
@@ -23,12 +23,36 @@ import (
"k8s.io/apimachinery/pkg/util/wait"
)

// These are known kubernetes event Reasons.
const (
// kubernetes event reasons.
reasonBackOff = "BackOff"
reasonFailed = "Failed"
reasonPulled = "Pulled"
reasonPulling = "Pulling"
// nolint: godot // commented code is not a sentence
// known scheduler event reasons can be found here:
// https://github.com/kubernetes/kubernetes/blob/v1.23.6/pkg/scheduler/schedule_one.go
//reasonFailedScheduling = "FailedScheduling"
//reasonScheduled = "Scheduled"

// known kubelet event reasons are listed here:
// https://github.com/kubernetes/kubernetes/blob/v1.23.6/pkg/kubelet/events/event.go

// kubelet image event reasons.
Contributor

Would it make sense to just use their constants, or at least wrap them, so we know when they change?

Member Author

They have not packaged the code so that it can be used externally.
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/events/event.go

If I run go get github.com/kubernetes/kubernetes, I get the error: module declares its path as: k8s.io/kubernetes
If I run go get k8s.io/kubernetes, I get an error that k8s.io/kubernetes@<version> requires k8s.io/api@v0.0.0: reading k8s.io/api/go.mod at revision v0.0.0: unknown revision v0.0.0

They seem to carefully curate what can be imported by external projects, so we can't import this.

Contributor

Cool, just checking. When I went to the code, I noticed they were exported from the package, so I thought they might be importable.

Contributor

@jbrockopp jbrockopp May 5, 2022

Thought I'd provide a bit of clarity on this subject...

The tl;dr is you can vendor from k8s.io/kubernetes but it's not pretty or straightforward 😅

Here's the reason why go get k8s.io/kubernetes fails:

kubernetes/kubernetes#79384 (comment)

And to actually vendor from k8s.io/kubernetes, you have to add a replace directive for all nested packages:

kubernetes/kubernetes#79384 (comment)

If we want, some people have scripted this approach so that updating the version is easier:

kubernetes/kubernetes#79384 (comment)

To see a real-world example of what this looks like in the go.mod file:

https://github.com/kubernetes-sigs/cloud-provider-huaweicloud/blob/9589bc854c29e399476479f9c5aa9599b2064da5/go.mod#L18-L40
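
For illustration, here is a minimal sketch of the replace-directive pattern described above. It assumes Kubernetes v1.23.6 (the version linked elsewhere in this change) and the convention that k8s.io/kubernetes v1.X.Y pairs with staging modules published as v0.X.Y. The module path example.com/worker is hypothetical, and the list is abbreviated: every staging module that k8s.io/kubernetes requires would need its own entry.

```
module example.com/worker // hypothetical module path

go 1.17

require k8s.io/kubernetes v1.23.6

// k8s.io/kubernetes requires its staging modules at v0.0.0,
// so each one has to be pinned to a published version.
replace (
	k8s.io/api => k8s.io/api v0.23.6
	k8s.io/apimachinery => k8s.io/apimachinery v0.23.6
	k8s.io/client-go => k8s.io/client-go v0.23.6
	// ...and so on for every other k8s.io staging module
)
```

Each version bump of k8s.io/kubernetes means updating every replace line in step, which is the maintenance cost weighed in this thread.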

Member Author

Eww... All that work just to reuse some constants... Not worth it imo.

Contributor

Agreed 😄

reasonPulling = "Pulling"
reasonPulled = "Pulled"
reasonFailed = "Failed" // Warning: image, container, pod
reasonInspectFailed = "InspectFailed" // Warning
reasonErrImageNeverPull = "ErrImageNeverPull" // Warning
reasonBackOff = "BackOff" // Normal: image, container

// nolint: godot // commented code is not a sentence
// kubelet container event reasons.
//reasonCreated = "Created"
//reasonStarted = "Started"
//reasonKilling = "Killing"
//reasonPreempting = "Preempting"
//reasonExceededGracePeriod = "ExceededGracePeriod"
// kubelet pod event reasons.
//reasonFailedKillPod = "FailedKillPod"
//reasonFailedCreatePodContainer = "FailedCreatePodContainer"
//reasonNetworkNotReady = "NetworkNotReady"
)

// InspectContainer inspects the pipeline container.
@@ -468,23 +492,31 @@ func (p *podTracker) inspectContainerEvent(event *v1.Event) {

p.Logger.Tracef("container event for %s: [%s] %s", tracker.Name, event.Reason, event.Message)

// check if the event mentions the target image
// if the relevant messages does not include our image
// it is probably for "kubernetes/pause:latest"
// check if the event mentions the target image.
// If the relevant messages does not include our image, then
// either it is for "kubernetes/pause:latest", which we don't care about,
// or it is a generic message that is basically useless like:
// event.Reason => event.Message
// Failed => Error: ErrImagePull
// BackOff => Error: ImagePullBackOff
// Many of these generic messages come from this part of kubelet:
// https://github.com/kubernetes/kubernetes/blob/v1.23.6/pkg/kubelet/kuberuntime/kuberuntime_container.go
if strings.Contains(event.Message, tracker.Image) {
switch event.Reason {
// examples: event.Reason => event.Message
case reasonFailed, reasonBackOff:
// The image related messages come from the image manager in kubelet:
// https://github.com/kubernetes/kubernetes/blob/v1.23.6/pkg/kubelet/images/image_manager.go
case reasonFailed, reasonBackOff, reasonInspectFailed, reasonErrImageNeverPull:
// Failed => Failed to pull image "image:tag": <containerd message>
// BackOff => Back-off pulling image "image:tag"
// InspectFailed => Failed to apply default image tag "<image>": couldn't parse image reference "<image>": <docker error>
// InspectFailed => Failed to inspect image "<image>": <docker error>
// ErrImageNeverPull => Container image "image:tag" is not present with pull policy of Never
tracker.ImagePullErrors <- event
return
case reasonPulled:
// Pulled => Successfully pulled image "image:tag" in <time>
// Pulled => Container image "image:tag" already present on machine
tracker.imagePulledOnce.Do(func() {
p.Logger.Debugf("container image pulled: %s in pod %s, %v", tracker.Name, p.TrackedPod, event.Message)

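
The filtering logic in this hunk can be sketched in isolation. The following is a rough, standalone approximation, not the code from this change: `classifyImageEvent` and `pullOutcome` are hypothetical names, the real method sends events on `tracker.ImagePullErrors` and uses `tracker.imagePulledOnce` instead of returning a value, and only the cases visible in the hunk are covered.

```go
package main

import (
	"fmt"
	"strings"
)

// Kubelet image event reasons, copied from the constants in the diff.
const (
	reasonPulled            = "Pulled"
	reasonFailed            = "Failed"
	reasonInspectFailed     = "InspectFailed"
	reasonErrImageNeverPull = "ErrImageNeverPull"
	reasonBackOff           = "BackOff"
)

// pullOutcome is a hypothetical result type used only in this sketch.
type pullOutcome string

const (
	outcomeIgnored   pullOutcome = "ignored"
	outcomePullError pullOutcome = "pull-error"
	outcomePulled    pullOutcome = "pulled"
)

// classifyImageEvent is a hypothetical helper that mirrors the switch in
// inspectContainerEvent: events whose message does not mention the target
// image are skipped, and the rest are grouped by reason.
func classifyImageEvent(reason, message, image string) pullOutcome {
	// Generic messages such as "Error: ErrImagePull" do not name the image.
	if !strings.Contains(message, image) {
		return outcomeIgnored
	}

	switch reason {
	case reasonFailed, reasonBackOff, reasonInspectFailed, reasonErrImageNeverPull:
		return outcomePullError
	case reasonPulled:
		return outcomePulled
	default:
		return outcomeIgnored
	}
}

func main() {
	image := "alpine:3.15" // hypothetical target image

	// prints "pull-error"
	fmt.Println(classifyImageEvent(reasonFailed, `Failed to pull image "alpine:3.15": not found`, image))
	// prints "ignored" because the generic message does not name the image
	fmt.Println(classifyImageEvent(reasonBackOff, "Error: ImagePullBackOff", image))
	// prints "pulled"
	fmt.Println(classifyImageEvent(reasonPulled, `Container image "alpine:3.15" already present on machine`, image))
}
```

Returning a value instead of writing to a channel keeps the sketch easy to test; in the tracker itself the channel and `sync.Once` are what signal waiting goroutines.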