Skip to content

Conversation

@soltysh
Copy link

@soltysh soltysh commented Nov 28, 2022

/assign @deads2k

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 28, 2022
The major issue with splitting the test binaries to be built from two separate
repositories is that developers wishing to run kubernetes tests will have to
build the binary manually, from a separate repository and ensure the binary
is available during test runs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just pull down the latest tests image.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

developers wishing to run kubernetes tests will have to
build the binary manually, from a separate repository

@bparees it's not clear to me how it affects the suite kubernetes/conformance currently included in the openshift-tests binary when running:

openshift-tests run kubernetes/conformance [args...]

Will the suite kubernetes/conformance keeps available without additional steps or we'll need to run it from the upstream way (building/downloading the binary [, using sonobuoy's embedded k8s suites], etc) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really a question for @soltysh at this point. The current implementation includes the k8s tests in the openshift-tests image and in a separate k8s image, so you can run them from either location (i.e. just having openshift-tests is sufficient to run the k8s tests, though the version of the k8s tests may vary between the two images).

But that is currently done by vendoring the k8s tests into origin, which ultimately we want to stop doing. So i think the end solution looks more like:

separate test images for different tests/suites/components/areas
an aggregated image (openshift-tests) that is built by pulling all those separate images together

and only the aggregated image ships in the payload, and it's all that is needed to run all the tests. That would maintain the current end user experience.

but that is not what has been built/delivered so far, out of a need for expediency.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtulio there are 2 possible ways with the current approach:

  1. run openshift-tests as you did before, which will try to pull the appropriate k8s-tests binary from the release image, and it will use that binary for the test execution;
  2. use OPENSHIFT_SKIP_EXTERNAL_TESTS=true env variable, when invoking openshift-tests which will use the embedded version of k8s conformance tests;

We are guaranteeing that the latter approach will also work for cases when pulling the binary from release is not possible for various reasons. When it comes to k8s conformance (which don't change between minor versions) that shouldn't be a problem for you.

Have a look at Risks and Mitigations section, if you don't find answers there, lemme know I'll gladly add whatever details are required.

@dhellmann
Copy link
Contributor

@elmiko @rvanderp3 @mtulio @bostrt @lobziik @julienlim FYI, since this will affect the VCSP work and certification tool. You'll want to work with @soltysh to make sure someone from your team is on the reviewer list for this enhancement.

### Non-Goals

1. Abandon existing tests.
2. Change the available functionality of `openshift-tests`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC it is not a goal to change the current content of openshift-tests suites, mainly the conformance ones (openshift/conformance and kubernetes/conformance), is that correct?

My concern is the e2e tests from kubernetes/K8s suite are in openshift/conformance, this suite is a base suite used to validate external clusters by partners on VCSP program - and we understand that the e2e tests in this suite cover all needed to validate an OCP installation.

$ ./openshift-tests run openshift/conformance --dry-run |grep -F "$(./openshift-tests run kubernetes/conformance --dry-run |tail -n1)"
"[sig-storage] Subpath Atomic writer volumes should support subpaths with secret pod [Excluded:WindowsDocker] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"
$ ./openshift-tests run openshift/conformance --dry-run |grep -F "$(./openshift-tests run kubernetes/conformance --dry-run |head -n1)"
"[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] listing mutating webhooks should work [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"
$ ./openshift-tests run openshift/conformance --dry-run |grep -F "$(./openshift-tests run kubernetes/conformance --dry-run |shuf |head -n1)"
"[sig-api-machinery] Watchers should observe an object deletion if it stops meeting the requirements of the selector [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything you said is true, i'm not clear what the concern is though?

we can meet the goal of having k8s tests run as part of the openshift/conformance suite w/o including the k8s tests in the openshift-tests binary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can meet the goal of having k8s tests run as part of the openshift/conformance suite w/o including the k8s tests in the openshift-tests binary.

That's what I would like to make sure I got correctly to track changes on the VCSP program/OPCT.

Overall the two steps in this proposal will bring a huge benefit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtulio correct, we will ensure the current workloads continue working, at least wrt k8s conformance tests.

@elmiko
Copy link
Contributor

elmiko commented Dec 14, 2022

adding myself to cc list for this pr, i have not had a chance to fully digest it yet though.

Copy link
Contributor

@elmiko elmiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense to me and generally seems like a good idea.

@openshift-bot
Copy link

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 14, 2023
@bertinatto
Copy link
Member

/remove-lifecycle stale

@openshift-bot
Copy link

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 25, 2023
@openshift-bot
Copy link

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Feb 1, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 1, 2023

@openshift-bot: Closed this PR.

Details

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@soltysh soltysh force-pushed the test_improvements branch from 239f7ca to 49b77b4 Compare June 20, 2023 14:20
@soltysh soltysh changed the title [WIP] Initial take for improved platform tests Improved platform tests Jun 20, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 20, 2023
2. Remove the requirement that all test code lives in a single repository.
3. Maintain `openshift-tests` API for running tests.
4. Ensure that kubernetes tests are still vendored in `openshift-tests` binary,
to allow for a fallback when the release payload image is not reachable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this one as temporary....once we start having all teams moving their e2e tests into their own repos, we're just going to have to accept that the test images need to be available or the test job fails.

and we're not going to want to maintain this vendoring indefinitely just to give us a fallback (not to mention that falling back is risky since now you're not necessarily running the tests you thought you were)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some cases the fallback will be a must, microshift comes to my mind immediately, since they don't provide release image. I'm not sure about hypershift, either. So at least for k8s-related tests I think that might be longer than just temporary. We'll need to re-assess this at a later point in time. That's why I've explicitly called out k8s tests only for backwards compatibility.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this to "Even more future work" section, there's still work about upgrade tests, so this is a good future thing to look at .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if nothing else, fallback needs to be an opt-in behavior (do not fallback to another version of the test code silently or by default).

but this is another reason why producing a single image that contains all the test binaries would be a better solution in my mind.

cc @dhellmann since he can better speak to what microshift does today or would want to support in the future.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that explicit fallback is being tracked in openshift/origin#28000 which will eventually allow us to drop the fallback we currently have.


This proposal might lead to unnecessary proliferation of external tests binaries,
which might cause problems when the release image is not available, or for local
development.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hard to imagine a situation where the release image is not available to someone who's trying to run tests

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about microshift?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll have to defer to @dhellmann on how they manage this today (where they get the openshift-tests binary from, how they distribute it to customers) on the implications there, but in my mind it's another argument for why we're better off having a single test image that contains all the test binaries. Microshift can then consume/reference that image.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no release payload for MicroShift. It looks like the CI job that uses openshift-tests pulls it from an image built as part of the CI system? https://github.com/openshift/release/blob/master/ci-operator/config/openshift/microshift/openshift-microshift-main.yaml#L47

@pacevedom, @pmtk , or @copejon can any of you explain where that binary comes from?

We do not ship the openshift-tests binary as part of MicroShift.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the breadcrumbs it's: inputs.test-bin which is

  test-bin:
    name: "4.14"
    namespace: ocp
    tag: tests

Which is imported: Tagging ocp/4.14:tests into pipeline:test-bin.
I think it's this image config

tl;dr: we just take it from a promoted image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the workflow where someone is trying to test outside of our CI system and has no release payload?

if they aren't having to modify the tests themselves, they could still use the test binaries from a recent payload image(since @soltysh's current proposal is that there would be multiple test images, one per binary, i defer to him on exactly how a consumer/client identifies all the images needed, pulls them, and extracts all the binaries, but that's something that CI will need to be doing too and i expect it will be relatively straightforward and handled by the openshift-tests binary itself)

if they are modifying the tests locally, then i'd expect them to be able to:

  1. build/get a copy of the openshift-tests wrapper binary
  2. invoke it while pointing it at their local externalized test binary for their component

(it would only run the tests that are built into openshift-tests plus the tests in their binary, but usually when people are doing this they are already narrowing the set of tests down to a few specific ones anyway).

Do those two options cover your use cases?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Number 2 is covered in risks sections, where I'm talking about possibility to use locally available binaries instead of pulling them from release. It's not implemented, but it's rather simple. I'll add it to future work section.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only downside is you need to know which binaries you care about, but in case of microshift it's rather simple b/c you'll most likely care about k8s mostly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off the top of my head, MicroShift would need tests for Routes, SCCs, and oc, as well. Our current goal is to make the entire suite "work" (pass or skip), so we can add a job to ensure new tests also work on MicroShift (pass or skip). When the suite is split up, we could avoid doing that for the entire suite and focus on the parts that are most important.

If there's going to be a way to use local binaries for the tests, we should be able to come up with a way to get the "right" binaries, so I think my concerns would be covered by adding a bit of detail to the future work section as you suggest.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on Fabio's comment we'll stick with in-binary k8s tests for now, and we'll slowly work our way out in the future, remembering about cases like CSI and Microshift.

@soltysh
Copy link
Author

soltysh commented Jun 26, 2023

just wanted to share that i've been working with upstream sig cloud provider in attempts to create more generic tests for CCMs, you can see the ideas we've been discussing here https://hackmd.io/@elmiko/BJGn1SQU3

@elmiko thx for that, but I think k8s is in better situation, since they could just export the framework for easier consumption, and the fact that they are directly invoking ginkgo they don't require the same sophistication when splitting the tests into separate binaries, since you can easily treat each one as a separate unit (they can be combined, but I'd probably discourage that).

Our case is a bit different in that we have the set of monitors which are probing the cluster throughout the entire test run, and only then we execute single tests. Upstream k8s doesn't have that functionality, which makes their tests' execution a bit simpler 😉

and minimize time when we start running newest kubernetes tests.
2. Remove the requirement that all test code lives in a single repository.
3. Maintain `openshift-tests` API for running tests.
4. Ensure that kubernetes tests are still vendored in `openshift-tests` binary,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is a must for now, but I'm not sure if it's worth to keep this goal in the long-term. The fallback argument is important during the first stages, but eventually it's all about maintainability winning the game.

Copy link
Member

@bertinatto bertinatto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I went through this enhancement and except for nits, I don't have anything to add.

I propose we merge this and work out the remaining details in separate changes.

@soltysh soltysh added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 30, 2023
the kubernetes tests vendored since they don't provide release image from which
we can extract the kubernetes test binaries.

Re-evaluate [alternative approach](#build-time-assembly---aggregating-binaries)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth starting this conversation with testplatform in 4.15 to understand what capabilities our CI build system provides and how difficult this actually is long term. I'd really like to see us not have vendored tests. That will discourage separation beyond a few well known cases.

@deads2k
Copy link
Contributor

deads2k commented Jul 6, 2023

We have a significant improvement already merged. I'm ok landing this as a description, but we need to avoid losing sight of future improvements since we're still stuck vendoring at the moment.

@deads2k
Copy link
Contributor

deads2k commented Jul 6, 2023

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 6, 2023
Copy link
Member

@bertinatto bertinatto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 7, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 7, 2023

@soltysh: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit c77298b into openshift:master Jul 7, 2023
@soltysh soltysh deleted the test_improvements branch July 7, 2023 14:40
To achieve the first step of the proposal, we need to:
1. Provide a library for building `<binary>-tests` which exposes two commands:
- `list` - responsible for listing tests in JSON format;
- `run` - run a single test, returning results in ginkgo compatible format;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be run-test, consistently with the main openshift-tests command. Having a compatible API potentially enables interesting future evolutions. For example, external tests may eventually grow some independence from openshift-tests for debugging purposes, and implement other bits of openshift-tests' API.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.