Reduce CI Workload by Removing Some Spark Variants and Using Callable Workflows for GitHub Actions #5153

We now have 30 GitHub Actions workflows that run as part of the CI test suite, and they are starting to have a noticeable impact on CI runners.

We test Spark with a large number of combinations of Java versions and Scala versions.

We previously only tested the "latest" Spark version (i.e. Spark 3.2) with Scala 2.13.

We are now testing:
- Spark 2 with Java 8 (1 workflow)
- Spark 3.0, 3.1, 3.2, 3.3 with Java 8 and Scala 2.12 (4 workflows)
- Spark 3.0, 3.1, 3.2, 3.3 with Java 11 and Scala 2.12 (4 workflows)
- Spark 3.2, 3.3 with Java 8 and Scala 2.13 (2 workflows)
- Spark 3.2, 3.3 with Java 11 and Scala 2.13 (2 workflows)

That brings the total to 13 Spark-specific CI variants that run on every PR that touches `core` or `spark`.
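To make the count concrete, here is a rough sketch of how the Spark 3 combinations above could be written as a single `matrix`. The workflow name, job name, and the test step are placeholders I made up, not our actual configuration. The full cross product is 4 Spark versions × 2 Java versions × 2 Scala versions = 16 jobs; the two `exclude` entries each remove 2 jobs (one per Java version), leaving 12, and the separate Spark 2 / Java 8 workflow makes 13.

```yaml
# Hypothetical workflow; names and the test command are placeholders.
name: spark-ci-sketch
on: pull_request

jobs:
  spark-3:
    runs-on: ubuntu-latest
    strategy:
      # fail-fast defaults to true: if one matrix job fails, the other
      # in-progress and queued jobs of this matrix are cancelled.
      fail-fast: true
      matrix:
        # Quoted so YAML does not parse '3.0' as the number 3.
        spark: ['3.0', '3.1', '3.2', '3.3']
        java: ['8', '11']
        scala: ['2.12', '2.13']
        exclude:
          # Scala 2.13 is only tested with Spark 3.2 and 3.3.
          # A partial match excludes every combination it matches.
          - spark: '3.0'
            scala: '2.13'
          - spark: '3.1'
            scala: '2.13'
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: ${{ matrix.java }}
      # Placeholder for the project's real build/test command.
      - run: echo "Spark ${{ matrix.spark }}, Scala ${{ matrix.scala }}, Java ${{ matrix.java }}"
```

Trimming the `java` or `scala` lists, or adding further `exclude` entries, is then the place where variants would be dropped.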

We should consider reducing the number of JRE and Scala version combinations that are run for each Spark version, as CI is starting to take noticeably longer.

We should also look (again) into refactoring our CI test suites to use callable workflows, so that all tests stem from one root workflow (very much like an Airflow DAG) and, if any one test fails, they all stop. At present we only get this for free within a set of CI jobs generated from one `matrix` (such as Java 11 and Java 8 with Scala 2.12).

This would reduce the number of CI slots occupied by tests that will have to be run again anyway (because something else failed).
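As a minimal sketch of what a callable (reusable) workflow could look like: the file path, workflow name, input names, and test step below are all hypothetical. The only fixed parts are the `workflow_call` trigger and the `inputs` context, which is how GitHub Actions marks a workflow as callable and passes parameters into it.

```yaml
# .github/workflows/spark-tests.yml (hypothetical path and input names)
name: spark-tests
on:
  workflow_call:
    inputs:
      spark:
        required: true
        type: string
      scala:
        required: true
        type: string
      java:
        required: true
        type: string

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: ${{ inputs.java }}
      # Placeholder for the project's real build/test command.
      - run: echo "Spark ${{ inputs.spark }}, Scala ${{ inputs.scala }}, Java ${{ inputs.java }}"
```

Note that calling this from a root workflow does not by itself cancel sibling jobs on failure; that still comes from `fail-fast` on a matrix or from `needs` dependencies in the caller.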

We can also run the faster tests first and only call out to the more expensive tests (such as Spark, Flink, etc.) once those have passed.
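A rough sketch of that ordering in a root workflow, assuming reusable workflows exist at the hypothetical paths `spark-tests.yml` (taking `spark`, `scala` and `java` inputs) and `flink-tests.yml`. Job names, paths, and the version choices are illustrative only. Jobs that declare `needs` are skipped when the job they depend on fails, so the expensive suites never start if the fast tests are red.

```yaml
# Hypothetical root workflow; job names and called paths are placeholders.
name: ci-root
on: pull_request

jobs:
  core-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder for the fast test suite.
      - run: echo "run the fast core tests here"

  spark-tests:
    # Only starts once core-tests has succeeded.
    needs: core-tests
    strategy:
      matrix:
        spark: ['3.2', '3.3']
    uses: ./.github/workflows/spark-tests.yml
    with:
      spark: ${{ matrix.spark }}
      scala: '2.12'
      java: '11'

  flink-tests:
    needs: core-tests
    uses: ./.github/workflows/flink-tests.yml
```

Because everything runs inside one workflow run, a single failure in `core-tests` frees the runner slots that the Spark and Flink jobs would otherwise have taken.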

I tried the callable workflow approach before, but at the time it wasn't worth the effort. I think now it probably is.
