feat(proposal): Add `InitStep` resource #771

cognifloyd · 2023-03-15T23:35:34Z

Implementation:

The UI needs to be handled by someone else as I don't know Elm.

proposals/2023/03-15_init-step.md

jbrockopp

We should drop Step from the name of the resource i.e. Init

The reason why is because today's init process can be either a stage or step:

https://github.com/go-vela/server/blob/main/compiler/native/initialize.go

cognifloyd · 2023-04-05T01:59:08Z

The init "stage" is effectively just a holder for the init step. That was only needed because pipelines cannot have Stages and Steps-- if there are Stages, then all steps have to be in a stage.

So, there is no need for an init "stage" any more.

Also, by moving to InitStep, we can have more than just a single log to report initialization. Conceptually these are steps of initialization.

Also, using "Inits" makes the code much harder to grok. I started with something like that, really disliked it. It makes more sense to me to lean into Step so we don't have another concept floating around.

I'm open to other names, though I really didn't like Init.

jbrockopp

To provide some context, init was chosen because it replicates built-in functionality for Golang:

https://golangdocs.com/init-function-in-golang
https://medium.com/golangspec/init-functions-in-go-eac191b3860a

We tried to chose a name that would ensure we don't conflict with user-declared steps/services in the pipeline so we based it off an already "reserved" keyword from our language of choice.

Can you provide some reasoning or justification behind why not init?

jbrockopp · 2023-04-10T19:39:11Z

proposals/2023/03-15_init-step.md

+
+Replace the injected "init" step (a pseudo-container) with `InitStep`, a new resource.
+
+Each `Build` will have one or more `InitSteps` that logically group related parts of the log data.


Can you clarify how many InitSteps are to be expected for each Build?

I ask because each individual Vela installation at Cargill and Target have ran > 1 million builds.

Each of those builds can already have multiple services and steps leading to a large amount of data in those tables as well as the logs table.

If we're going to now also add multiple InitSteps per build, and if I'm following other information in this proposal, we'll have an InitStep for each step/service in the build, and each InitStep has a corresponding entry in the logs table, we're talking about an exponential growth in data.

Yes. There would be many more rows in the InitSteps and Logs tables.

If we move forward with creating an InitStep for each Step and Service, then:
In the logs table, the line(s) that the runtime currently prefixes to the step logs would be in a separate row from the rest of the step output.

For kubernetes, I would like to report on more than just the command to be run. Displaying the final container definition (in yaml) would be helpful as well as reporting the asynchronous events the k8s provides (pulling image, exponential back off pulling image, image pulled, container starting, container started, container terminated, crash loop back off, failed to mount volume, ...).

Today, Vela does not give me enough information about a running build, I have to use other kubernetes native tools to watch the build. So, as I started working on improving the design around init-reporting, I realized InitStep could be a way to first capture and then expose this additional meta state info about steps. However, I'm a kubernetes admin, and most of my Vela users will not have access to the tooling I do. So, I need to find a way to capture and expose this to those users.

As I write this, I'm trying to think of what value this might provide to docker runtime users:

More collapsible log sections in the UI (esp in the big block of build init)

similar to GHA and Circle CI, the command will be shown in a separate collapsible place from the rest of the command's output. And no fake shell prompt ($) is required.

InitSteps is merely one way to provide all these benefits.

With your millions of builds, how could we better tag/structure/organize the entries in the logs table without creating the exponential increase that scares you? Maybe change as more fields to the log struct so that the command(s) can be structurally separated from the output without increasing the number of rows? Turn the log struct into a postgres JSON type?

jbrockopp · 2023-04-10T19:40:18Z

proposals/2023/03-15_init-step.md

+Each `Build` will have one or more `InitSteps` that logically group related parts of the log data.
+
+An `InitStep` is a report by a "reporter" about a discrete part of the build setup. The "reporter" would be a logical part of the stack, like the "Pipeline Compiler" or the Runtime. For example, we could have these reporters be discrete initsteps (Reporter: name of the `InitStep`):
+- Pipeline Compiler: report info/debug logs


Is there an open ticket or request for this information?

Trying to understand the value behind providing this information.

No open ticket. I asked in slack as I was trying to figure out what to do about the "init" step.

https://gophers.slack.com/archives/CNRRKE8KY/p1677684779756489?thread_ts=1677684670.251639&cid=CNRRKE8KY

I asked about the "purpose of the injected init step".

@kneal replied with (he mentioned the compiler, emphasis mine):

The high level idea was to record the ephemeral setup to enable pipeline users to troubleshoot builds without admin level intervention.

Historically there was all sorts of things talked about being added in there. Compile info, runtime setup, image pulls, secret setup, etc

Environment injection info was another thing I was personally interested in looking at adding too. It's been a bunch of random things honestly.

But the idea is just provide as much info to the user as possible to self solve without admins and give them a understanding of what's going on under the hood.

...

Yeah, the init idea in general was built pretty MVP I think. There's a lot of angles for improvement. Even visually honestly, reading that much output isn't super useful. It would be great to have the details from the commands collapsable like Actions does in their setup info.,

Me (@cognifloyd):

So, I guess to make the PR flow more manageable we'd want to actually add support in reverse order, starting with the UI/CLI where they allow for an optional Init field for reporting all that, and then build backwards to actually add the backend support.

@kneal mentioned the compiler here (my emphasis):

It doesn't necessarily have to be reverse order either. You could setup the compiler and worker to use a new init layer. Get all that in main and then go back and rip things out once the frontend layer is ready.

We'd just be tracking twice but I'm guessing the performance hit is pretty negligible

I think compatibility is more important than that tiny performance loss for a minor period of time.

And then @kneal mentioned the compiler again here (in discussion about moving the init step to some top level field on the Build struct; emphasis is original, not mine):

If you have init get added in at the pipeline.Build struct layer. You might actually be able to add a whole shit ton of info to that field starting from the compiler layer ...

Adding more info, is likely a double edge sword tho until the output is more easily consumable.

So, I focused on the logs that could be added from the compiler because @kneal mentioned it several times. It's apparently a blind spot where users have to ask Vela admins for help diagnosing issues. That seems reasonable to me, but it only makes sense when there is a build that such logs can be associated with.

jbrockopp · 2023-04-10T19:44:50Z

proposals/2023/03-15_init-step.md

+- Docker Runtime: report volume setup
+- Kubernetes Runtime: Pod YAML
+
+`InitStep`s are always part of a build. An `InitStep` may also be associated with a `Step` or a `Service`. Today, the runtime prefixes each step's log with the command (`$ command ...`). An `InitStep` linked to a `Step`--or a `Service`--will allow the worker to cleanly report what it runs separate from the output of that command. The UI, in turn will also be able to visually separate the command and output. The worker could also log other information like environment vars, or for kubernetes, the YAML container definition in the pod.


So we'll have the Log for the InitStep contain the $ command ... portion for each step/service from:

https://github.com/go-vela/server/blob/main/compiler/native/script.go#L70-L98

And then the Log entry for that step/service will contain the output from invoking that command?

Yes. I want to separate the Vela generated bits from the command step or service output.

jbrockopp · 2023-04-10T19:47:01Z

proposals/2023/03-15_init-step.md

+
+Currently, we're abusing the `Step` and `Container` models to allow reporting on build setup (eg when the Docker Runtime initializes the Network and Volumes). This means that we have to check for the special "init" stage/step/container in many places. 
+
+We are also simulating a shell in that init step, printing simulated commands and output even though the runner does not actually run them, so the init step does not accurately represent which part of vela is doing what thing to prepare for running the user's pipeline.


Can you clarify this with an example?

AFAIK, we manually generate the "shell output" for those commands to replicate the runtime CLI i.e. docker

But the docker commands we generate are literal representations of what API call is invoked on the host

That may be true in the docker runtime, but the init commands displayed by the kubernetes runtime are generally pointless noise. They are NOT literal representations of the equivalent kubectl commands -- some of the API features we use now are used internally by kubectl but are not exposed directly on the cli. Instead of showing commands in the kubernetes runtime, I want to show kubernetes resources, preferably in yaml format. The imitated kubectl commands do not provide value at best, are misleading at worst (Vela is not just a wrapper around kubectl, constantly shelling out to kubectl to communicate with k8s, which is what printing those commands suggests).

A stream of resources in one big init block is also not helpful. Ideally the init output should be collapsible and syntax highlighting friendly. Ie: it needs to be more structured / less ad-hoc to facilitate UI enhancements.

jbrockopp · 2023-04-10T19:52:24Z

proposals/2023/03-15_init-step.md

+In the worker, we frequently need to iterate over the containers for steps/services. But the "init" stage/step is not really a container, so we have to identify which container's are not actually containers so they can be skipped. So far, the worker relies on `Name="init"`, but that does not work in all cases. when the executor is checking trusted+privileged settings in `AssembleBuild`, it checks for `Image="#init"` instead because service containers can be named "init" by users.
+
+This issue is even worse with the kubernetes runtime. There, the number of containers has to be counted and indexed. Given a particular step or service the Kubernetes runtime has to look up which container it needs to edit. So there are many places where that count/index has to be adjusted by one to account for the init step. Then with the injected clone step, figuring out when to add or subtract one or 2 to get the index can be confusing. Also, the kubernetes runtime breaks when running a pipeline with a service named "init" because the container setup is skipped in  one place but not another. That was uncovered by attempting to use it in the executor AssembleBuild test.


This problem could be solved by removing init from Stages/Steps and instead create a separate field:

https://github.com/go-vela/types/blob/676c45c911b5ed2c6019dfcb20852c551fc8873a/pipeline/build.go#L16-L29

->

// Build is the pipeline representation of a build for a pipeline. // // swagger:model PipelineBuild type Build struct { ID string `json:"id,omitempty" yaml:"id,omitempty"` Version string `json:"version,omitempty" yaml:"version,omitempty"` Metadata Metadata `json:"metadata,omitempty" yaml:"metadata,omitempty"` Environment raw.StringSliceMap `json:"environment,omitempty" yaml:"environment,omitempty"` Worker Worker `json:"worker,omitempty" yaml:"worker,omitempty"` InitStep InitStep `json:"init,omitempty" yaml:"init,omitempty"` Secrets SecretSlice `json:"secrets,omitempty" yaml:"secrets,omitempty"` Services ContainerSlice `json:"services,omitempty" yaml:"services,omitempty"` Stages StageSlice `json:"stages,omitempty" yaml:"stages,omitempty"` Steps ContainerSlice `json:"steps,omitempty" yaml:"steps,omitempty"` }

Once we escape the confines of Steps/Stages, why stop with only one InitStep? With more than one, we can have more structured logs.

But yes, if we restrict each build to only one InitStep, adding a separate Build.InitStep field does resolve many shortcomings of the current design. I used an InitStep slice (Build.InitSteps) in this proposal because it had the benefit of enabling more structured/organized logging instead of one big wall of Init log for each build.

jbrockopp · 2023-04-10T19:53:41Z

proposals/2023/03-15_init-step.md

+Today, only the worker can safely add details to the init step log. With this change,
+the server can also add end-user visible logging, especially in places like the compiler
+where it would be helpful to highlight compile errors in the UI/CLI.
+
+In the Build and Webhook API endpoints, we can create the `InitStep` and `Log` as soon as we
+have the `RepoID` and `Build.Number`. Failures before this point can't be associated
+with a particular build, so they cannot bubble up to the end-user. The endpiont will
+then pass in a `library.Log` to the compiler that it creates specifically for the compiler.
+It can also record relevant its own log messages when handling requests like:


Today, we don't create a build unless the pipeline compiles successfully.

Am I correct in understanding that we'd always generate a build now even if the pipeline doesn't compile?

No. That is not my intention. The init step logs should only be created if the build has been created. I thought the build number could only be assigned once the build was persisted. Most of the compiler logic happens before the build is created, but not all of it (unless I misunderstand the sources).

cognifloyd · 2023-04-11T05:53:45Z

Apparently with the GitHub mobile app, I can't queue up my review responses and submit them all at once, sorry.

About the name:

To provide some context, init was chosen because it replicates built-in functionality for Golang:

... why not init?

I don't like using Init alone as a noun. Wiktionary does say it is a computing noun, and uses the example "the ... system init".

In golang, init is an adjective, not a noun. Go has an "init function" not an "init". You define it with func init() {} not just init {}. "Init" is too vague on its own. I suppose if there was a good adjective it might be ok to use something like "BuildInit".

For Vela, we can use "init" to describe something. So what is it that we are describing with "init"?

I started with Init, Inits, and InitSlice, but it felt jarring; so I looked for a noun that was like "step" and included the idea of "init" or "setup". Everything I came up with is already used in various workflow systems, and could be confusing: Action, Task, Node, Phase, Workflow, Setup, Plan, ....

As Vela already has a concept of a Step, we don't need to find another noun, we can lean into it and make a special step for init info, an InitStep.

cognifloyd · 2023-04-12T21:49:59Z

I think I have an alternative idea on how to achieve these goals without an InitStep object. Basically:

allow database.Log to be associated directly with a build, not just a step or service.
- when step_id=NULL and service_id=NULL, then the log entry is for the build as a whole (build_id still cannot be NULL)
- Instead of submitting logs that are attached to an "init" step, we use some new build-specific endpoints:
  - GET /api/v1/repos/:org/:repo/builds/:build/init/:name/logs
  - POST /api/v1/repos/:org/:repo/builds/:build/init/:name/logs
  - PUT /api/v1/repos/:org/:repo/builds/:build/init/:name/logs
  - DELETE /api/v1/repos/:org/:repo/builds/:build/init/:name/logs
- When I read this API path, I read "init logs", so "init" is not standing alone. It describes the logs. The API path ends up being the only place that needs this "init" string.
we could extend database.Log and library.Log with additional fields:
- for build init logs we add:
  - Name: a file name (like pod.yaml) or other descriptive url-safe string
    - To start off with, Name can default to - or step which would result in one of these:
      - /api/v1/repos/github/octocat/builds/55/init/-/logs
      - /api/v1/repos/github/octocat/builds/55/init/step/logs
    - but the worker/runtime could opt to create more than that. For example:
      - /api/v1/repos/github/octocat/builds/55/init/pod.yaml/logs
    - The log table would get another constraint: UNIQUE(build_id, name) (in addition to the current UNIQUE(step_id) and UNIQUE(service_id) and the build_id index.
  - Reporter: a well known name like Compiler, xyz Runtime, ...
- for optional use on all logs (build, step, or service):
  - Mimetype: defines a specific datatype for the contents of the Data field.
    - NULL is equivalent to text/plain (basically what we do now)
- for step and service logs (these would need the same secret scrubbing that the Data field gets):
  - Command: This separates the command that vela runs from its output (which goes in the Data field).
    - The UI/CLI can handle presentational bits like prefixing $
    - The CLI can optionally skip printing the command
    - The UI can collapse the command separate from the command output
      - Some of my commands are very long, which makes it annoying to scroll past the script to get to the script output.
  - Environment: Show the env vars (scrubbed of course) for the step or service.
    - This is a very helpful feature in GHA, CircleCI and other CI services.
  - AdHocData: additional adhoc logging that the worker could add beyond what the pipelines command printed.
    - Other possible names: Events, InitData, AdHoc ...
    - This would be helpful for the kubernetes runtime to report additional event information about pulling images, starting the container, etc.

The AdHocData field idea is still in its infancy, and I'm not sure what shape that feature should take yet. The other fields are clear (imo) wins over the status quo, adding more structure to the log data.

Yes, I'm imagining that a build could have more than one Log entry, but generally it would be only one unless the runtime has a good reason to split off some part of the init logs. Steps and Services, on the other hand, would still only have one log entry, but they would have additional fields to split out what currently goes in the Data field as well as logging some additional details about the step/service.

@jbrockopp Does that sound more palatable?

jbrockopp · 2023-04-20T13:00:13Z

@cognifloyd apologies it took so long to respond but I wanted to ensure I provided enough detail to explain my logic.

I am against creating an InitStep for each step/service in the pipeline that stores the logs from commands.

In general, the output for what command was run is smaller than the actual output from running the command.

By default, Vela is setup with compressing logs in the DB because we have some users who push MBs of logs per step.

It's important we note that with compression, you need to reach a saturation point before you get value from it.

To be stated differently, a certain length must be achieved in logs before you get value from compression.

Here is a Go playground example for reference: https://go.dev/play/p/DPwS2axFJSJ

That example shows compression actually increases the size of the log entry until we have > 70 characters.

For a single log entry this may not be a big deal, but if you compound that with millions of builds which contain tens of millions of steps/services, this is the exponential growth I had concerns about which I referred to earlier.

Can you clarify why having one InitStep per build would not satisfy the use-case you're looking for?

cognifloyd · 2023-04-21T15:32:46Z

A couple of the fields I was looking at adding to Log only make sense if there are multiple init logs for a Build:

Mimetype: (for eventual syntax highlighting in the UI) only works if I can guarantee that all of the output in an initstep matches that type.
Reporter: which service added the log.

Most of this proposal is comes from "what if we had multiple InitSteps per build, what does that enable". Since that does not seem valuable to you, and no one else has chimed in, I'll just close this.

We probably just need to associate a Log directly with a Build and skip the intermediate step resource.

cognifloyd added 4 commits March 15, 2023 14:59

stub init step proposal

61381cd

chore: Fill out remaining parts of init step proposal

9b2fb7c

Add today's date to proposal

545e951

chore: reorder README

33fe579

cognifloyd self-assigned this Mar 15, 2023

cognifloyd requested a review from a team as a code owner March 15, 2023 23:35

cognifloyd commented Mar 16, 2023

View reviewed changes

proposals/2023/03-15_init-step.md Show resolved Hide resolved

enhance: InitStep can be associated with Step/Service

c1a670c

This was referenced Mar 24, 2023

refactor(api): moved common Post*Stream logic to helper func go-vela/server#795

Closed

feat(api/log)!: support paging on GetBuildLogs go-vela/server#798

Merged

Merge branch 'main' into init-step

8c66625

jbrockopp requested changes Apr 4, 2023

View reviewed changes

jbrockopp requested changes Apr 10, 2023

View reviewed changes

cognifloyd closed this Apr 21, 2023


		Replace the injected "init" step (a pseudo-container) with `InitStep`, a new resource.

		Each `Build` will have one or more `InitSteps` that logically group related parts of the log data.


		Currently, we're abusing the `Step` and `Container` models to allow reporting on build setup (eg when the Docker Runtime initializes the Network and Volumes). This means that we have to check for the special "init" stage/step/container in many places.

		We are also simulating a shell in that init step, printing simulated commands and output even though the runner does not actually run them, so the init step does not accurately represent which part of vela is doing what thing to prepare for running the user's pipeline.

		In the worker, we frequently need to iterate over the containers for steps/services. But the "init" stage/step is not really a container, so we have to identify which container's are not actually containers so they can be skipped. So far, the worker relies on `Name="init"`, but that does not work in all cases. when the executor is checking trusted+privileged settings in `AssembleBuild`, it checks for `Image="#init"` instead because service containers can be named "init" by users.

		This issue is even worse with the kubernetes runtime. There, the number of containers has to be counted and indexed. Given a particular step or service the Kubernetes runtime has to look up which container it needs to edit. So there are many places where that count/index has to be adjusted by one to account for the init step. Then with the injected clone step, figuring out when to add or subtract one or 2 to get the index can be confusing. Also, the kubernetes runtime breaks when running a pipeline with a service named "init" because the container setup is skipped in one place but not another. That was uncovered by attempting to use it in the executor AssembleBuild test.

feat(proposal): Add InitStep resource #771

feat(proposal): Add InitStep resource #771

Uh oh!

Conversation

cognifloyd commented Mar 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

jbrockopp left a comment

Choose a reason for hiding this comment

Uh oh!

cognifloyd commented Apr 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jbrockopp left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cognifloyd commented Apr 11, 2023

Uh oh!

cognifloyd commented Apr 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jbrockopp commented Apr 20, 2023

Uh oh!

cognifloyd commented Apr 21, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat(proposal): Add `InitStep` resource #771

feat(proposal): Add `InitStep` resource #771

cognifloyd commented Mar 15, 2023 •

edited

Loading

cognifloyd commented Apr 5, 2023 •

edited

Loading

cognifloyd commented Apr 12, 2023 •

edited

Loading